Database node in AWS? Wow, I'm out of my league on this; maybe writing things down will help me get some clarity. :)
This is a writeup of my thoughts about how to properly separate concerns for a production db node setup in AWS. Goals:
- utilize the available AWS automation tools at every appropriate point
- reduce the number of decisions that a NoSQL DBA would have to make when bringing a new db node online (storage, machine type, disk configuration)
- reduce the number of tweaks that a NoSQL DBA would have to make to a setup script to bring a node up (goal: fully automated)
How this played out in my head:
Mongo: You can run these handy MongoDB CloudFormation templates.
Me: How am I going to get a 20-node cluster? Copy/paste in the CF template?
Me: Copy/paste alarm beeping really loud...
Me: Who am I asking to do this copy/paste in the future, just my proof-of-concept team members, or also NoSQL DBAs?
Proof-of-concept team: When are you going to finally have the Mongo cluster set up?
Me: Need to split the prod setup from the head-to-head setup... => creating this page to record my prod setup thoughts :)
There seem to be 4 different concerns when setting up a db node:
- Base machine image, including the following:
  - software pre-installed, but unconfigured
  - appropriate user accounts pre-created
  - appropriate BIOS & OS settings for a DB node
- Storage configuration, pre-configured for the following concerns:
  - Q: How many volumes?
  - Q: How large should the volumes be?
  - Q: What type of volumes should exist? (ephemeral vs. EBS; single volume vs. RAID0/1/10)
  - Q: How durable does the storage need to be? (based on published failure rates)
  - NOTE: All of the above questions depend on the db technology, starting with vendor recommendations, with our tweaks added on.
  - NOTE: The answers to all of the above questions should be saved in as reusable a form as feasible (or at least documented for proof-of-concept tests).
- Volume construction, including the following:
  - creating any necessary RAID structures on top of the block devices
  - mounting the resulting storage volumes with the appropriate filesystem
  - carving up the space among different mount points to appropriately cap certain kinds of usage
  - using the appropriate mount options for optimum filesystem use (noatime, nodiratime, etc.)
  - formatting the volumes appropriately
- Running instance parameters, including the following:
  - Q: How much memory is needed?
  - Q: How many cores are needed?
  - Q: Is EBS optimization needed?
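To make the volume construction concern a bit more concrete, here is a rough Python sketch that only builds the shell commands for the RAID, format, and mount steps (it does not run them). The function name, the device names like /dev/xvdf, the /dev/md0 array name, the ext4 filesystem, and the /data mount point are all placeholders I made up for illustration; the real values would come from the storage configuration answers. Carving the space into multiple mount points is left out.

```python
from typing import List


def volume_construction_commands(
    devices: List[str],
    raid_level: int = 0,
    md_device: str = "/dev/md0",      # assumed array name
    fs_type: str = "ext4",            # assumed filesystem
    mount_point: str = "/data",       # assumed mount point
    mount_flags: str = "noatime,nodiratime",
) -> List[str]:
    """Return shell commands for RAID creation, formatting, and mounting.

    Nothing is executed here; the caller decides how (and whether) to run them.
    """
    if not devices:
        raise ValueError("at least one block device is required")

    commands = []
    if len(devices) > 1:
        # Build a software RAID array on top of the raw block devices.
        commands.append(
            "mdadm --create {md} --level={lvl} --raid-devices={n} {devs}".format(
                md=md_device, lvl=raid_level, n=len(devices), devs=" ".join(devices)
            )
        )
        target = md_device
    else:
        # A single volume needs no RAID layer.
        target = devices[0]

    # Format first, then mount with the access-time flags turned off.
    commands.append("mkfs -t {fs} {dev}".format(fs=fs_type, dev=target))
    commands.append("mkdir -p {mp}".format(mp=mount_point))
    commands.append(
        "mount -o {flags} {dev} {mp}".format(flags=mount_flags, dev=target, mp=mount_point)
    )
    return commands


if __name__ == "__main__":
    for cmd in volume_construction_commands(["/dev/xvdf", "/dev/xvdg"]):
        print(cmd)
```

Keeping this as a pure "produce the commands" step means the same logic could be called from a first-boot script or from a patched db service startup script.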
Each of these concerns has an impact on the choices made when setting up a database node in AWS. And luckily, each set of concerns seems to be easily saved in template form, separate from the others, and ready to be deployed when needed.
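One way to picture "saved in template form, separate from each other" is a small Python sketch where each concern is its own record and a node spec just composes them. Every class name, field name, and value here is hypothetical (including the AMI id and instance types, which are placeholders); this is only meant to show the separation, not a real tool.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class BaseImage:
    ami_id: str  # placeholder id, not a real AMI


@dataclass(frozen=True)
class StorageConfig:
    volume_count: int
    volume_size_gb: int
    volume_type: str  # e.g. "ebs" or "ephemeral"


@dataclass(frozen=True)
class VolumeLayout:
    raid_level: int
    filesystem: str
    mount_point: str
    mount_flags: Tuple[str, ...] = ("noatime", "nodiratime")


@dataclass(frozen=True)
class InstanceParams:
    instance_type: str
    ebs_optimized: bool


@dataclass(frozen=True)
class DbNodeSpec:
    """A node is just the four concerns put together."""
    image: BaseImage
    storage: StorageConfig
    layout: VolumeLayout
    instance: InstanceParams


# Assumed example values for illustration only.
poc_node = DbNodeSpec(
    image=BaseImage(ami_id="ami-EXAMPLE"),
    storage=StorageConfig(volume_count=4, volume_size_gb=200, volume_type="ebs"),
    layout=VolumeLayout(raid_level=10, filesystem="ext4", mount_point="/data"),
    instance=InstanceParams(instance_type="m3.xlarge", ebs_optimized=True),
)

# Changing one concern (a bigger machine) leaves the other three untouched.
prod_node = replace(
    poc_node, instance=InstanceParams(instance_type="m3.2xlarge", ebs_optimized=True)
)
```

The point of the design is that a NoSQL DBA would pick from a few pre-made pieces rather than editing one large setup script.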
- Base machine image
  - pre-created AMI
  - script in VCS to take a stock AMI on a given OS and produce a new AMI (solves OS upgrade, etc.)
- Storage configuration
  - volume configuration is saved with the AMI, I think
- Volume construction
  - needs to be done at first boot
  - db service startup script could be patched to call the volume construction lazily
  - RAID setup software could be pre-deployed in the base machine image (#1), like: https://github.com/jsmartin/raidformer
  - boot script could be laid down as part of the base machine image (#1), or deployed as part of the running instance parameters (#4)
  - can be saved in a CloudFormation template, but not really in any reusable form
- Running instance parameters
  - just have this documented somewhere so we know how
  - possible to script this; this is the sweet spot for CloudFormation
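As a sketch of how scripting the instance parameters might avoid the copy/paste problem for a 20-node cluster, the Python below generates a CloudFormation template as a plain dict, with one `AWS::EC2::Instance` resource per node. The function names, the `DbNodeNN` logical ids, the /dev/sdf-style device naming, the gp2 volume type, and the AMI id and instance type in the usage lines are all assumptions for illustration, not values from any real setup.

```python
import json
from typing import Dict, List


def db_node_resource(
    ami_id: str, instance_type: str, ebs_optimized: bool, volume_sizes_gb: List[int]
) -> Dict:
    """Build one AWS::EC2::Instance resource for a db node."""
    # Data volumes are named /dev/sdf, /dev/sdg, ... (an assumed convention).
    mappings = [
        {
            "DeviceName": "/dev/sd" + chr(ord("f") + i),
            "Ebs": {"VolumeSize": size, "VolumeType": "gp2"},
        }
        for i, size in enumerate(volume_sizes_gb)
    ]
    return {
        "Type": "AWS::EC2::Instance",
        "Properties": {
            "ImageId": ami_id,
            "InstanceType": instance_type,
            "EbsOptimized": ebs_optimized,
            "BlockDeviceMappings": mappings,
        },
    }


def cluster_template(node_count: int, **node_kwargs) -> Dict:
    """Repeat the same node definition node_count times in one template."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "DbNode{:02d}".format(i + 1): db_node_resource(**node_kwargs)
            for i in range(node_count)
        },
    }


if __name__ == "__main__":
    template = cluster_template(
        20,
        ami_id="ami-EXAMPLE",        # placeholder, not a real AMI
        instance_type="m3.xlarge",   # assumed instance type
        ebs_optimized=True,
        volume_sizes_gb=[100, 100],
    )
    print(json.dumps(template, indent=2))
```

The node count and instance parameters become arguments instead of hand-edited template text, so the same script could serve both the proof-of-concept team and the DBAs.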