Showing posts with label modularity. Show all posts

15 April 2013

Separation of concerns for AWS DB Instance setup

Database node in AWS?  Wow, I'm out of my league on this, maybe writing things down will help me get some clarity. :)

This is a writeup of my thoughts about how to properly separate concerns for a production db node setup in AWS.

Constraints:
  • utilize the available AWS automation tools at every appropriate point
  • reduce the number of decisions that a NoSQL DBA would have to make when bringing a new db node online (storage, machine type, disk configuration)
  • reduce the number of tweaks that a NoSQL DBA would have to make to a setup script to bring a node up (goal: fully automated)
How this played out in my head:
Mongo: You can run these handy MongoDB CloudFormation templates.
Me: How am I going to get a 20-node cluster?  Copy/paste in the CF template?
Me: Copy/paste alarm beeping really loud...
Me: Who am I asking to do this copy/paste in the future, just my proof-of-concept team members, or also NoSQL DBAs?
Proof-of-concept team: When are you going to finally have the Mongo cluster set up?
Me: Need to split the prod setup from the head-to-head setup...  => creating this page to record my prod setup thoughts :)
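
One way out of the copy/paste trap: generate the CloudFormation template instead of pasting node blocks by hand. A minimal sketch of the idea -- the AMI id, instance type, and resource naming here are placeholder assumptions, not real values:

```ruby
require 'json'

# Build a CloudFormation template with `count` identical db node instances.
# "ami-PLACEHOLDER" and "m1.xlarge" are illustrative stand-ins; a real
# script would take these from the storage/instance decisions below.
def cluster_template(count, ami: "ami-PLACEHOLDER", type: "m1.xlarge")
  resources = {}
  (1..count).each do |i|
    resources["DbNode#{i}"] = {
      "Type" => "AWS::EC2::Instance",
      "Properties" => {
        "ImageId"      => ami,
        "InstanceType" => type
      }
    }
  end
  { "AWSTemplateFormatVersion" => "2010-09-09",
    "Resources" => resources }
end

puts JSON.pretty_generate(cluster_template(20))
```

A 20-node cluster becomes one function call, and the NoSQL DBA never touches the template body.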
There seem to be 4 different concerns when setting up a db node:
  1. Base machine image, including the following:
    • software pre-installed, but unconfigured
    • appropriate user accounts pre-created
    • appropriate BIOS & OS settings for a DB node
  2. Storage configuration, pre-configured for the following concerns:
    • Q: How many volumes?
    • Q: How large should the volumes be?
    • Q: What type of volumes should exist? (ephemeral vs. EBS; single volume vs. RAID0/1/10)
    • Q: How durable does the storage need to be? (based on published failure rates)
    • NOTE: All of the above questions depend on the db technology, starting with vendor recommendations, with our tweaks added on.
    • NOTE: All of the above questions should be answered and saved in as reusable of a form as feasible (or at least documented for proof-of-concept tests).
  3. Volume construction, including the following:
    • creating any necessary RAID structures over top of the block devices
    • mounting the resulting storage volumes with the appropriate filesystem
    • carving up the space among different mount points to appropriately cap certain kinds of usage
    • using the appropriate flags for optimum filesystem use (noatime, nodiratime, etc)
    • formatting the volumes appropriately
  4. Running instance parameters, including the following:
    • Q: How much memory is needed?
    • Q: How many cores are needed?
    • Q: Is EBS optimization needed?
Each of these concerns has an impact on the choices made when setting up a database node in AWS.  And luckily, each set of concerns seems easy to save in template form, separate from the others, and ready to be deployed when needed.
  1. Base machine image
    • pre-created AMI
    • script in VCS to take a stock AMI on a given OS and produce a new AMI (solves OS upgrade, etc)
  2. Storage configuration
    • volume configuration is saved with the AMI, I think
  3. Volume construction
    • needs to be done at first boot
    • db service startup script could be patched to call the volume construction lazily
    • RAID setup software could be pre-deployed in #1, like: https://github.com/jsmartin/raidformer
    • boot script could be laid down as part of #1, or deployed as part of #4
    • can be saved in a CloudFormation script, but not really in any reusable form
  4. Running instance parameters
    • just have this documented somewhere so we know how these were chosen
    • possible to script this, this is the sweet spot for CloudFormation
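
The volume construction step (#3) could boil down to a first-boot script that emits the right commands for the saved storage configuration. A sketch of that idea -- the device names, RAID0, and ext4 choices are illustrative assumptions, not vendor recommendations:

```ruby
# Given the block devices attached at boot, produce the shell commands a
# first-boot script would run: RAID assembly, formatting, and mounting
# with the filesystem flags from concern #3. Pure string-building; nothing
# here touches the system.
def volume_construction_commands(devices, mount_point: "/data")
  raid_dev = "/dev/md0"
  [
    "mdadm --create #{raid_dev} --level=0 --raid-devices=#{devices.size} #{devices.join(' ')}",
    "mkfs.ext4 #{raid_dev}",
    "mkdir -p #{mount_point}",
    "mount -o noatime,nodiratime #{raid_dev} #{mount_point}"
  ]
end

puts volume_construction_commands(%w[/dev/xvdb /dev/xvdc])
```

The db service startup script could call this lazily on first boot, as suggested above.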

22 August 2012

Modular Apps

In the last 12 months, it seems like apps are getting a lot more modular.

I discovered news.me before Christmas last year -- which felt modular at an app level -- RSS => social, and the reciprocal daily digest.  In fact, I've been reading their daily Twitter summary basically ever since, even though I am effectively a read-only non-citizen on Twitter.

It's old news to a lot of people, but when I saw IFTTT and Wappwolf, my mind was blown.  This is modularity at an app level, not just a code level.

Here is a good outline of what kind of things are possible with this new breed of app-level integrator apps:

http://www.readwriteweb.com/hack/2012/04/4-cool-things-you-can-do-with

Aside from the fairly major hole of IFTTT's lack of HTTPS support, it looks awesome.  When that's fixed I'll be investing a fair amount of time/automation into it.

21 December 2011

Intrusion Detection through Stackable Filesystems

I've always wondered what exploit might be running on my system, and never had any time to devise/install a detection system that would have the right balance of useful detection (maximize) and performance impact (minimize).

When I stumbled upon unionfs a couple weeks ago, I thought that was an interesting idea from a change-logging perspective.  It's sometimes useful to be able to keep a filesystem-based diff of what a certain operation does to a system, and then bake it onto the system when I know it did what I wanted to.  The takeaway for me was that unionfs's performance overhead had the opportunity to be so low because it was so thin and so baked into the kernel.

This compounded with my recent discovery of, and fiddling with, fusefs (in user-land), and I wondered what kind of useful logic I could put underneath a filesystem.  The GlusterFS feature set, the recent LessFS GC stuff, the bup-fuse stuff, and the S3FS stuff are all just *really* cool.  I ended up gazing longingly at the big backlog of fuse filesystem suggestions on their wiki, and wandered back into the unionfs space.

So, when I saw the I3FS paper (linked from here) about the modular application of this technique to intrusion detection, this really triggered the Useful *AND* Performant neurons and I got really excited.

Unfortunately, on first glance, the stackable filesystems stuff seemed pretty cryptic to set up in a lightweight, just-works kind of way (think custom mount command-lines complete with arcane stacking options).

It would be soooo awesome to have an easy-to-compose ruby DSL for doing some kind of rack-like filesystem mashup with a kernel-level unionfs layer underneath a user-land fusefs layer, but all expressed in the same DSL.
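
To make that concrete, here is a purely hypothetical sketch of what such a DSL might feel like -- no such library exists, every name here is made up, and `to_mount_commands` only shows what the composed stack might translate to instead of actually mounting anything:

```ruby
# Imaginary rack-like DSL for composing filesystem layers. Kernel-level
# unionfs and user-land fusefs layers are declared in one place; the
# arcane mount incantations become an implementation detail.
class FsStack
  Layer = Struct.new(:kind, :source)

  def initialize(&block)
    @layers = []
    instance_eval(&block)
  end

  def union(source)
    @layers << Layer.new(:unionfs, source)
  end

  def fuse(source)
    @layers << Layer.new(:fusefs, source)
  end

  # Render the stack as the mount commands it would imply.
  def to_mount_commands(target)
    @layers.map { |l| "mount -t #{l.kind} #{l.source} #{target}" }
  end
end

stack = FsStack.new do
  union "/base:/overlay"            # kernel-level union layer
  fuse  "ids_logger#/var/log/ids"   # user-land IDS logging layer
end
puts stack.to_mount_commands("/mnt/watched")
```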

This would be an awesome tool to put on top of the Arch-derived clone I want to put up for people at work.  There are folks who care about living closer to the edge of linux stuff, but who don't care to install from scratch, and who might not care that much about not having a full Gnome stack if things just work.  And if I could give them the same tuned IDS-on-the-desktop solution (or upgrade their developer stack by letting them pull a filesystem delta over), that would be really cool.

The observation [PDF, linked from] that modularity makes development cheap is one of the attributes valued highly by the Ruby community as well.  It is one of the key things that makes Ruby as a community awesome.

These kinds of ideas really matter, and making them so cheap and stable that you don't have to think about them really matters even more.

NOTE: The I3FS paper is really pretty old (2004), and the whole unionfs stack is older than that.  The fuse stack came into the kernel a long time ago too (2005).  So while this is new to me, it's been around for quite a long time.  I'm playing catch-up.

Unicode Support Better Now in Ruby

Unicode support in Ruby has historically been one of Ruby's major problems for me.

With the advent of Ruby 1.9, of course, Unicode support started being added to the language.  However, it's not as straightforward as in Java, which has supported some version of Unicode from the beginning.

Even though it's been a rough decade, things are finally looking up.  In fact, I actually like the way things got factored after all of the mess.

The thing I like is that the minimal amount of support is included in the standard library, and it's easy to compose things in non-standard ways for weird scenarios or data in improperly encoded formats.

The core library has support for strings of codepoints and bytes and a flexible set of encoding facilities.
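
A quick sketch of what that looks like in practice -- a String knows its encoding and exposes both character/codepoint and byte views:

```ruby
# -*- coding: utf-8 -*-
s = "héllo"                      # a UTF-8 source string

puts s.encoding                  # UTF-8
puts s.length                    # 5 characters
puts s.bytesize                  # 6 bytes ("é" is two bytes in UTF-8)
puts s.codepoints.inspect        # the Unicode codepoints behind the chars
puts s.encode("ISO-8859-1").bytesize  # re-encoded, "é" fits in one byte
```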

In addition, there are two libraries of interest:
  • unicode_utils - includes implementations of word-break functionality, grapheme boundaries, etc.
  • jcode (if you're stuck on ruby 1.8.x)

This series of posts gives you a full understanding of the topic.  Highly recommended!

This post gives a high-level view of where things were at around 2006, much of which is valuable background.

This post has a good summary of unicode-related resources, as does this stackoverflow question.

Semantic Versioning

After having skimmed the semantic versioning proposal/spec, I really like it, and I'm going back for a deep read.

The most notable violator of this that has bitten me in the past has been the jersey framework, and maybe earlier versions of commons-collections.
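
Ruby already ships a decent approximation of this ordering in `Gem::Version`. Its rules aren't identical to the semver spec (prerelease tags are dotted rather than hyphenated, for one), but it illustrates the idea:

```ruby
require 'rubygems'  # Gem::Version lives here

versions = %w[1.9.0 1.10.0 1.10.0.beta 2.0.0].map { |v| Gem::Version.new(v) }
puts versions.sort.map(&:to_s).inspect
# Numeric segments compare numerically, so 1.9.0 sorts before 1.10.0,
# and a prerelease like 1.10.0.beta sorts before the 1.10.0 release.
```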

03 September 2009

Exchanging gravitons with the monolith

When I was telling my colleague about my plans to aggressively modularize the monolith, he said:
You're exchanging gravitons with the monolith.
And he was right. The effect of the warped space-time in the close vicinity of the monolith was such that I lost sight of the new work we had recently embarked on. The trajectory I was on got skewed by the gravitational pull.

It is not sufficient to be doing work that is improving modularization. It is required to be doing work to deliver user value all the time -- and to take advantage of every opportunity to modularize in order to deliver user value.

Letting modularization become an end in itself is sometimes tempting, but in the end it is not the right path forward.

27 June 2009

Heroes and Monoliths

I'm sure it's no surprise to anyone, but huge, monolithic software is extremely hard to ship.

It used to be that when I encountered a particularly complex software problem, I would work at mastering the complexity until (by hook or by crook) I figured it out. Then I would proclaim myself victor. It was often the case that people were waiting for me to figure out the hard problem, and would congratulate me on a successful resolution.

However, those congratulations feel empty to me now. I think I can finally appreciate what Dijkstra meant in The Humble Programmer (and here), argument #6:
... the only problems we can really solve in a satisfactory manner are those that finally admit a nicely factored solution. ... By the time that we are sufficiently modest to try factored solutions only, because the other efforts escape our intellectual grip, we shall do our utmost best to avoid all those interfaces impairing our ability to factor the system in a helpful way. And I cannot but expect that this will repeatedly lead to the discovery that an initially untractable problem can be factored after all.
Now I remain unsatisfied until I've solved that problem at hand "in a satisfactory manner", i.e. in a manner that is tractable to read, understand, and change in the future. Not just for me, but for any other competent software developer that comes behind me.

I think this links to the IEEE Code of Ethics, items #5 and #6, in which computing professionals agree:
5. to improve the understanding of technology, its appropriate application, and potential consequences;

6. to maintain and improve our technical competence and to undertake technological tasks for others only if qualified by training or experience, or after full disclosure of pertinent limitations;
While I was writing this, someone came over and asked me a question about complex software I wrote part of, and I helped him. And I laughed at myself for being unsatisfied about helping him. :)

31 March 2009

DRY plus Proper Responsibilities

It is not sufficient to blindly apply the DRY principle.

Where the single representation ends up living is just as important as avoiding the duplication in the first place.

Having the single, unambiguous, authoritative representation of a piece of knowledge living in a place that it doesn't belong is only slightly better than having it duplicated, once in a place where it seems to belong, and once in a place it does not.

So included in the task of reducing (or avoiding) duplication is the consideration of where to put the common code, *including* the task of inventing a new place if it doesn't seem to fit anywhere, now that it's being used from multiple places.

Usually, a new place is waiting to be invented -- that is why there was duplication in the first place: the place didn't exist, or else it would have been fairly obvious where to put the logic and how to reuse it.

Perhaps that is part of what "authoritative" means.
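
A toy sketch of the point -- the duplicated knowledge (here, how an order total is computed) gets a newly invented home rather than hiding inside either caller. All names below are made up for illustration:

```ruby
# The invented home: one authoritative place for the pricing knowledge
# that used to be duplicated across the invoice and cart code.
module Pricing
  TAX_RATE = 0.08

  def self.total(subtotal)
    (subtotal * (1 + TAX_RATE)).round(2)
  end
end

# Both former duplication sites now defer to the authoritative home:
puts Pricing.total(100.0)
puts Pricing.total(19.99)
```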