This makes me want to move to Provo:
17 April 2013
Dystopia - from NetflixOSS's Adrian Cockcroft:
We have spent years striving to build perfect apps running on perfect kernels on perfect CPUs connected by perfect networks, but this utopia hasn't really arrived.
Instead we live in a dystopian world of buggy apps changing several times a day running on JVMs running on an old version of Linux running on Xen running on something I can't see, that only exists for a few hours, connected by a network of unknown topology and operated by many layers of automation.
Reminds me of Fallacies of Distributed Computing, but from the perspective of people who have actually vanquished the beast with weapons that are usable by others.
15 April 2013
Database node in AWS?
Wow, I'm out of my league on this; maybe writing things down will help me get some clarity. :)
This is a writeup of my thoughts about how to properly separate concerns for a production db node setup in AWS. The goals:
- utilize the available AWS automation tools at every appropriate point
- reduce the number of decisions that a NoSQL DBA would have to make when bringing a new db node online (storage, machine type, disk configuration)
- reduce the number of tweaks that a NoSQL DBA would have to make to a setup script to bring a node up (goal: fully automated)
How this played out in my head:
Mongo: You can run these handy MongoDB CloudFormation templates.
Me: How am I going to get a 20-node cluster? Copy/paste in the CF template?
Me: Copy/paste alarm beeping really loud...
Me: Who am I asking to do this copy/paste in the future, just my proof-of-concept team members, or also NoSQL DBAs?
Proof-of-concept team: When are you going to finally have the Mongo cluster set up?
Me: Need to split the prod setup from the head-to-head setup... => creating this page to record my prod setup thoughts :)
There seem to be 4 different concerns when setting up a db node:
1. Base machine image, including the following:
   - software pre-installed, but unconfigured
   - appropriate user accounts pre-created
   - appropriate BIOS & OS settings for a DB node
2. Storage configuration, pre-configured for the following concerns:
   - Q: How many volumes?
   - Q: How large should the volumes be?
   - Q: What type of volumes should exist? (ephemeral vs. EBS; single volume vs. RAID0/1/10)
   - Q: How durable does the storage need to be? (based on published failure rates)
   - NOTE: The answers to all of the above questions depend on the db technology, starting with vendor recommendations, with our tweaks added on.
   - NOTE: All of the above questions should be answered and saved in as reusable a form as feasible (or at least documented for proof-of-concept tests).
3. Volume construction, including the following:
   - creating any necessary RAID structures on top of the block devices
   - mounting the resulting storage volumes with the appropriate filesystem
   - carving up the space among different mount points to appropriately cap certain kinds of usage
   - using the appropriate flags for optimum filesystem use (noatime, nodiratime, etc.)
   - formatting the volumes appropriately
4. Running instance parameters, including the following:
   - Q: How much memory is needed?
   - Q: How many cores are needed?
   - Q: Is EBS optimization needed?
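To make the volume construction concern a bit more concrete, here is a minimal sketch of what a first-boot helper might generate. It only builds the command lines (it does not run them), so it is safe to try anywhere. The function name, the device names (`/dev/xvdf`, `/dev/xvdg`), the `/data` mount point, and the choice of ext4 are all assumptions of mine for illustration, not vendor recommendations. It also skips the "carving up the space among mount points" step entirely.

```python
from typing import List


def volume_construction_commands(
    devices: List[str],
    raid_level: int = 0,
    md_device: str = "/dev/md0",         # assumed RAID device name
    mount_point: str = "/data",          # assumed mount point
    fs_type: str = "ext4",               # assumed filesystem
    mount_options: str = "noatime,nodiratime",
) -> List[List[str]]:
    """Return the commands (as argv lists) to build one data volume.

    Nothing is executed here; the caller decides whether to print the
    commands, review them, or hand them to subprocess.run().
    """
    if not devices:
        raise ValueError("at least one block device is required")

    commands: List[List[str]] = []
    if len(devices) > 1:
        # Several block devices: put a software RAID array on top of them.
        commands.append([
            "mdadm", "--create", md_device,
            f"--level={raid_level}",
            f"--raid-devices={len(devices)}",
            *devices,
        ])
        target = md_device
    else:
        # Single volume: no RAID layer, format the device directly.
        target = devices[0]

    # Format first, then mount with the chosen filesystem flags.
    commands.append([f"mkfs.{fs_type}", target])
    commands.append(["mkdir", "-p", mount_point])
    commands.append(["mount", "-o", mount_options, target, mount_point])
    return commands


if __name__ == "__main__":
    for cmd in volume_construction_commands(["/dev/xvdf", "/dev/xvdg"]):
        print(" ".join(cmd))
```

Keeping the storage answers (how many devices, which RAID level) as plain arguments is the point: the same helper could serve a single-volume proof-of-concept node and a RAID0 or RAID10 prod node without edits.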
Each of these concerns has an impact on the choices made when setting up a database node in AWS. Luckily, each set of concerns seems easy to save in template form, separately from the others, ready to be deployed when needed.
1. Base machine image
   - pre-created AMI
   - script in VCS to take a stock AMI on a given OS and produce a new AMI (solves OS upgrade, etc.)
2. Storage configuration
   - volume configuration is saved with the AMI, I think
3. Volume construction
   - needs to be done at first boot
   - db service startup script could be patched to call the volume construction lazily
   - RAID setup software could be pre-deployed in #1, like: https://github.com/jsmartin/raidformer
   - boot script could be laid down as part of #1, or deployed as part of #4
   - can be saved in a CloudFormation script, but not really in any reusable form
4. Running instance parameters
   - at a minimum, document the chosen values somewhere so we know how to launch a node
   - this can be scripted; it is the sweet spot for CloudFormation
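One way around the copy/paste alarm for a 20-node cluster might be to generate the CloudFormation template instead of editing it by hand. The sketch below is my own illustration, not the MongoDB-provided templates: the function name, the `DbNodeNN` logical IDs, the `/dev/sdf`-and-up device naming, and the placeholder AMI ID and instance type are all assumptions. It only uses the `AWS::EC2::Instance` properties `ImageId`, `InstanceType`, `EbsOptimized`, and `BlockDeviceMappings`.

```python
import json
from typing import Dict, List


def db_cluster_template(
    node_count: int,
    image_id: str,
    instance_type: str,
    volume_sizes_gb: List[int],
    ebs_optimized: bool = False,
) -> Dict:
    """Build a CloudFormation template (as a dict) with one EC2 instance per db node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")

    resources = {}
    for n in range(1, node_count + 1):
        # Storage configuration: one EBS volume per requested size,
        # named /dev/sdf, /dev/sdg, ... (assumed naming; only sensible
        # for a handful of volumes).
        block_devices = [
            {
                "DeviceName": f"/dev/sd{chr(ord('f') + i)}",
                "Ebs": {"VolumeSize": size},
            }
            for i, size in enumerate(volume_sizes_gb)
        ]
        # Logical IDs must be alphanumeric, hence DbNode01, DbNode02, ...
        resources[f"DbNode{n:02d}"] = {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": image_id,              # base machine image (#1)
                "InstanceType": instance_type,    # memory and cores (#4)
                "EbsOptimized": ebs_optimized,    # EBS optimization (#4)
                "BlockDeviceMappings": block_devices,
            },
        }

    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


if __name__ == "__main__":
    # Placeholder values; substitute a real AMI ID and instance type.
    template = db_cluster_template(
        node_count=20,
        image_id="ami-12345678",
        instance_type="m1.xlarge",
        volume_sizes_gb=[100, 100],
        ebs_optimized=True,
    )
    print(json.dumps(template, indent=2))
```

The generator script itself could live in VCS next to the AMI-building script, so the node count and instance parameters become arguments rather than something a DBA edits inside a template.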