17 April 2013

Dystopia - Fallacies Conquered

Dystopia - from NetflixOSS's Adrian Cockcroft:
We have spent years striving to build perfect apps running on perfect kernels on perfect CPUs connected by perfect networks, but this utopia hasn't really arrived.
Instead we live in a dystopian world of buggy apps changing several times a day running on JVMs running on an old version of Linux running on Xen running on something I can't see, that only exists for a few hours, connected by a network of unknown topology and operated by many layers of automation.
Reminds me of the Fallacies of Distributed Computing, but from the perspective of people who have actually vanquished the beast with weapons that others can use.

15 April 2013

Separation of concerns for AWS DB Instance setup

Database node in AWS?  Wow, I'm out of my league on this; maybe writing things down will help me get some clarity. :)

This is a writeup of my thoughts about how to properly separate concerns for a production db node setup in AWS.

Constraints:
  • utilize the available AWS automation tools at every appropriate point
  • reduce the number of decisions that a NoSQL DBA would have to make when bringing a new db node online (storage, machine type, disk configuration)
  • reduce the number of tweaks that a NoSQL DBA would have to make to a setup script to bring a node up (goal: fully automated)
How this played out in my head:
Mongo: You can run these handy MongoDB CloudFormation templates.
Me: How am I going to get a 20-node cluster?  Copy/paste in the CF template?
Me: Copy/paste alarm beeping really loud...
Me: Who am I asking to do this copy/paste in the future, just my proof-of-concept team members, or also NoSQL DBAs?
Proof-of-concept team: When are you going to finally have the Mongo cluster set up?
Me: Need to split the prod setup from the head-to-head setup...  => creating this page to record my prod setup thoughts :)
There seem to be 4 different concerns when setting up a db node:
  1. Base machine image, including the following:
    • software pre-installed, but unconfigured
    • appropriate user accounts pre-created
    • appropriate BIOS & OS settings for a DB node
  2. Storage configuration, pre-configured for the following concerns:
    • Q: How many volumes?
    • Q: How large should the volumes be?
    • Q: What type of volumes should exist? (ephemeral vs. EBS; single volume vs. RAID0/1/10)
    • Q: How durable does the storage need to be? (based on published failure rates)
    • NOTE: All of the above questions depend on the db technology, starting with vendor recommendations, with our tweaks added on.
    • NOTE: All of the above questions should be answered and saved in as reusable a form as feasible (or at least documented for proof-of-concept tests).
  3. Volume construction, including the following:
    • creating any necessary RAID structures over top of the block devices
    • mounting the resulting storage volumes with the appropriate filesystem
    • carving up the space among different mount points to appropriately cap certain kinds of usage
    • using the appropriate mount flags for optimum filesystem use (noatime, nodiratime, etc.)
    • formatting the volumes appropriately
  4. Running instance parameters, including the following:
    • Q: How much memory is needed?
    • Q: How many cores are needed?
    • Q: Is EBS optimization needed?
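To make concern #3 a bit more concrete, here is a rough sketch that only builds the command lines a first-boot script might run; it doesn't execute anything. The device names (/dev/xvdb, /dev/xvdc), the /data mount point, the /dev/md0 array name, and ext4 as the filesystem are all assumptions for illustration, and carving space into several mount points (e.g. with LVM) is left out.

```python
from typing import List


def volume_construction_plan(devices: List[str], mount_point: str,
                             fs_type: str = "ext4", raid_level: int = 0,
                             md_device: str = "/dev/md0") -> List[List[str]]:
    """Return argv lists for RAID assembly, formatting, and mounting.

    All names and defaults here are illustrative assumptions; real values
    would come from the saved storage configuration (concern #2).
    """
    if not devices:
        raise ValueError("need at least one block device")
    plan: List[List[str]] = []
    if len(devices) > 1:
        # Combine the block devices into one md array (RAID0 by default).
        plan.append(["mdadm", "--create", md_device, f"--level={raid_level}",
                     f"--raid-devices={len(devices)}", *devices])
        target = md_device
    else:
        # A single device is formatted directly; no RAID layer needed.
        target = devices[0]
    # Format first, then create the mount point and mount.
    plan.append([f"mkfs.{fs_type}", target])
    plan.append(["mkdir", "-p", mount_point])
    # noatime/nodiratime skip the access-time write on every read.
    plan.append(["mount", "-t", fs_type, "-o", "noatime,nodiratime",
                 target, mount_point])
    return plan


if __name__ == "__main__":
    for argv in volume_construction_plan(["/dev/xvdb", "/dev/xvdc"], "/data"):
        print(" ".join(argv))
```

Keeping the plan as data makes it easy to print and review before handing each argv list to `subprocess.run(argv, check=True)` on a real node.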
Each of these concerns has an impact on the choices made when setting up a database node in AWS.  Luckily, each set of concerns seems easy to save in template form, separate from the others, and ready to be deployed when needed.
  1. Base machine image
    • pre-created AMI
    • script in VCS to take a stock AMI on a given OS and produce a new AMI (solves OS upgrade, etc)
  2. Storage configuration
    • volume configuration is saved with the AMI, I think
  3. Volume construction
    • needs to be done at first boot
    • db service startup script could be patched to call the volume construction lazily
    • RAID setup software could be pre-deployed in #1, like: https://github.com/jsmartin/raidformer
    • boot script could be laid down as part of #1, or deployed as part of #4
    • can be saved in a CloudFormation script, but not really in any reusable form
  4. Running instance parameters
    • just have this documented somewhere so we know how 
    • possible to script this, this is the sweet spot for CloudFormation
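For the copy/paste worry above, a small generator could stamp out N identical instance resources from one definition. This is a minimal sketch, assuming a hypothetical AMI ID and an instance type that supports EBS optimization; the logical names (DbNode1, DbNode2, ...) and the two-ephemeral-disk layout are illustrative choices, not a recommendation.

```python
import json
from typing import Any, Dict


def db_cluster_template(node_count: int, image_id: str, instance_type: str,
                        ebs_optimized: bool = True,
                        ephemeral_disks: int = 2) -> Dict[str, Any]:
    """Build a CloudFormation template with node_count identical db nodes."""
    # Storage configuration (concern #2): map instance-store volumes
    # to /dev/sdb, /dev/sdc, ... (device layout is an assumption).
    mappings = [
        {"DeviceName": f"/dev/sd{chr(ord('b') + i)}",
         "VirtualName": f"ephemeral{i}"}
        for i in range(ephemeral_disks)
    ]
    resources: Dict[str, Any] = {}
    for n in range(1, node_count + 1):
        resources[f"DbNode{n}"] = {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": image_id,              # base AMI (concern #1)
                "InstanceType": instance_type,    # running params (concern #4)
                "EbsOptimized": ebs_optimized,    # running params (concern #4)
                "BlockDeviceMappings": mappings,  # storage config (concern #2)
            },
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


if __name__ == "__main__":
    # "ami-12345678" is a placeholder, not a real image.
    print(json.dumps(db_cluster_template(20, "ami-12345678", "m1.xlarge"),
                     indent=2))
```

Going from 3 nodes to 20 then means changing one argument instead of pasting resource blocks.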

08 March 2013

Living in the Future

What does it feel like to hoist yourself into the future, and start living in the future again after having drifted for a while?

I'm having to catch up on cloud deployment as part of a new team I'm on.  Wow, there has been a lot of change in the last 5 years.  I feel swamped.

I remember feeling swamped like this a while ago.  Reading about taking crazy risks reminded me of that feeling, and it makes me wonder what the risks are of introducing significant change into the group I'm working in.

I guess I could get fired for being too inconsiderate of people who don't like change or the associated risks.

I've also had a long-running idea about making genealogy data editable in a distributed version control way.  Although the idea is still undeveloped, because I currently have trouble focusing on it, I've been working on how I could make it more viable.

So now that I'm feeling the need to get on the early adopter curve again, I saw Paul Graham's How to Get Startup Ideas and realized that the article was about both things:

  • living in the future
  • developing new ideas

From Paul's article:
It takes time to come across situations where you notice something missing. And often these gaps won't seem to be ideas for companies, just things that would be interesting to build. Which is why it's good to have the time and the inclination to build things just because they're interesting.
Live in the future and build what seems interesting. Strange as it sounds, that's the real recipe.

I guess what I'm not used to is the process of shedding my natural loss aversion, and of accepting and realizing the innovation risks that may come.

06 March 2013

Innovation Risks


Instead of responding to "innovation" as a buzzword, I want to make sure that I always just think about innovation in terms of social changes, large or small.

As software people, we probably tend to be much more change-tolerant on average than many people in the non-software population -- I believe that's one reason why we gravitate toward the soft-ware part of things.

But there are particular kinds of innovation risk, I think:
1) effort investment risk
2) future opportunity risk
3) legacy replacement risk
4) replacement rate risk

Business people often talk about the expected ROI of a particular proposal.  That's what I'd put in category #1.  If I expect a return, it'd better be worth the effort I put into it.  This is standard stuff for software developers.  We do estimates to establish expected ROI, then we do the work and see the results.

When it comes to risk categories #2 and #3, I think there is wide variation in risk tolerance among software developers.  Often this is because of variation in our perception of value or our backgrounds, and, even among similar backgrounds, variation in how well we recall the hard lessons of experience.  Some of the most experienced among us look farther ahead, and therefore avoid certain risks because they look similar to times when they got bitten in the past.

In addition, #4 seems different from #3, because even though some people may be willing to absorb the cost of a significant change once or twice, they may not be willing to keep absorbing changes of the same magnitude at the same frequency.

I think that common responses to these different kinds of risk are as follows:
1) proper planning (mitigates effort investment risk)
2) proper deliberation (mitigates future opportunity risk by measuring which opportunity to chase)
3) caution, loss aversion (enhances legacy replacement risk by clouding judgement)
4) apathy, rejection (enhances replacement rate risk by inhibiting trust and hurting relationships)

The difference is that most humans feel a disproportionate amount of loss aversion for things they perceive as their own.  That's what distinguishes #2 from #3.

We fall into a trap when we over-deliberate or let loss aversion dictate our learning environment, and the world has changed in a way that causes us to mispredict failure based on our prior experience.  Sometimes it's more valuable to walk away from something of value even when we don't know what we're looking for in its replacement, because we have a distinct feeling that non-linear improvement is needed, or because we trust someone else's concept of where we can eventually end up.  Sometimes, something we failed at earlier is now possible, but only in a way we are ignorant of, and therefore only in a way we cannot predict.

Ignorant and highly motivated young blood (or adventurous veterans) in our field are what keep us taking inordinate risks and learning from the experiences that come from them.

Do experience and capability make us better innovators?  Does our level of context make us more capable of effecting positive change?  Not necessarily, I think.

Maybe, to a point.  Once we've achieved a certain level of experience, I believe our efforts have higher overall effectiveness only if we are capable of avoiding the expert trap, and are able to forget & re-learn appropriate parts of our experience in the current context.

Some material that is relevant to this topic:
- http://matt.might.net/articles/programmers-resolutions/
- http://blog.8thlight.com/uncle-bob/2013/03/05/TheStartUpTrap.html (warn: unfortunate language)
- http://www.lessonsoffailure.com/developers/habits-kill-career/ (warn: contains a crude analogy)
- http://tcagley.wordpress.com/tag/zen/
- http://pragprog.com/book/trevan/driving-technical-change

Just remember the following pragmatic rallying cry:
"If it's not broke, let's not invite the UN to fix it." (heard on Linux Radar podcast)

04 March 2013

xcl: X Clipboard Helper

As a pragmatic command-line user, I just found a new way to easily interact with the clipboard.

Yes, yes, I knew about 'xclip' before now -- but it was just way too hard to use because it made me type (and remember) lots of options for simple clipboard operations.

Introducing 'xcl', a simple, helpful wrapper around 'xclip':

This uses a little-known trick from bash (really from the 'test' or '[' builtin), which allows me to detect whether a file descriptor is attached to a live terminal.
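Presumably the check is `[ -t 0 ]`, which is true when file descriptor 0 (stdin) is a terminal. Here is a hypothetical Python rendering of the same idea, not the xcl script itself: piped input gets copied into the clipboard, and a bare invocation pastes the clipboard to stdout. The `-selection clipboard`, `-i`, and `-o` options are standard xclip flags; the copy/paste split is my assumption about how xcl behaves.

```python
import subprocess
import sys
from typing import List


def xclip_args(stdin_is_tty: bool) -> List[str]:
    """Choose the xclip command based on whether stdin is a terminal."""
    # Same question bash's `[ -t 0 ]` asks: is fd 0 attached to a terminal?
    if stdin_is_tty:
        return ["xclip", "-selection", "clipboard", "-o"]  # paste to stdout
    return ["xclip", "-selection", "clipboard", "-i"]      # copy from stdin


def main() -> int:
    args = xclip_args(sys.stdin.isatty())
    # stdin/stdout are inherited, so piped input flows into `xclip -i`
    # and the clipboard from `xclip -o` flows out to our stdout.
    return subprocess.call(args)


if __name__ == "__main__":
    sys.exit(main())
```

Usage, with xcl.py as a hypothetical file name: `echo hello | python3 xcl.py` copies, and a plain `python3 xcl.py` prints the clipboard.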

Hope this is helpful to you.

27 November 2012

X Clipboard Cleaner

Ever copy/paste from a web page and have the web page's style mess up the document you are pasting into?

That's pretty common for me.

Chrome/Firefox both preserve HTML style when copying text to the clipboard, and the following are common targets:

  • email
  • HTML-enabled editor widget
  • word processor document
My old process was:
  1. copy text/link from web page
  2. open up a text editor
  3. paste inside the text editor
  4. copy the same text again, this time without the style
  5. paste into the target location
I finally got sick of doing this manually all the time, so I wrote a script and put it in my app launcher bar.

Here is a gist with a script you can download:
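As a rough sketch of the same idea (not the script from the gist), assuming xclip is installed: ask for the clipboard's plain-text target, then write that text straight back, so the browser's HTML target disappears. The `run` parameter is my own addition so the xclip calls can be swapped out for testing.

```python
import subprocess
from typing import Callable

CLIPBOARD = ["xclip", "-selection", "clipboard"]


def clean_clipboard(run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bytes:
    """Replace the clipboard contents with their plain-text form."""
    # Request only the UTF8_STRING target, i.e. text without HTML markup.
    text = run([*CLIPBOARD, "-o", "-t", "UTF8_STRING"],
               check=True, stdout=subprocess.PIPE).stdout
    # Taking ownership again with plain text only drops the text/html target.
    run([*CLIPBOARD, "-i"], input=text, check=True)
    return text


if __name__ == "__main__":
    clean_clipboard()
```

Bound to a launcher button, something like this replaces steps 2 through 4 of the old process.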

04 October 2012

Rewriting Git History for Fun

For starters, if you're looking for a good JDBC connection pool, look no further than c3p0 [github, doc].

A few years back, commons-dbcp wasn't very stable under load, so I went looking for an alternative and landed on c3p0.  Ever since then, Steve Waldman has been making it better and better.  Except for a 2-year hiatus when he wasn't working on it, he's been very responsive and willing to accept feedback and make improvements.  I'm seriously impressed by the project.

Anyway, since I've been a long-time user and fan, when I saw Steve put his software on GitHub, I went for a look.

Here is what I saw:

But what about all those prior releases in source snapshot form, that I was used to seeing from SourceForge?

I realized that perhaps I could contribute a little to the project, so I went and grabbed all the source release zips from SourceForge, created a local git repo, created a commit for each release, tagged it, and spliced Steve's recent work on the top.
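The commit-and-tag loop might look roughly like this. It's a hypothetical sketch that only builds the git command lines; the tag naming, the commit message, and passing the author with `--author` (instead of `git config`) are my own choices, and it assumes the work tree has already been replaced with exactly one release's files before each release's commands run.

```python
from typing import List, Tuple

Release = Tuple[str, str]  # (version, ISO-8601 release date), hypothetical


def import_plan(releases: List[Release], author: str) -> List[List[str]]:
    """Build git argv lists that record one commit and one tag per release."""
    plan: List[List[str]] = []
    # Same-format ISO-8601 dates sort chronologically as strings,
    # so the oldest release becomes the root commit.
    for version, date in sorted(releases, key=lambda r: r[1]):
        # -A stages additions, modifications, and deletions alike.
        plan.append(["git", "add", "-A", "."])
        # --date sets the author date only; the committer date would need
        # GIT_COMMITTER_DATE in the environment as well.
        plan.append(["git", "commit", f"--author={author}", f"--date={date}",
                     "-m", f"Import release {version}"])
        plan.append(["git", "tag", f"release-{version}"])  # hypothetical tag name
    return plan


if __name__ == "__main__":
    demo = [("0.9.1", "2005-01-02T08:30:00"), ("0.8.5", "2004-03-01T00:00:00")]
    for argv in import_plan(demo, "A U Thor <author@example.com>"):
        print(argv)
```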

Then I submitted a GitHub issue to ask Steve what he thought.  It wasn't really a pull request, because the new history was completely disconnected from his.

Here is the top of the new history:

And here is the earliest part of that history:

The tools used were:
- git config author.name - to give credit where credit was due
- git config author.email
- curl - to download all the releases
- unzip | sed - to figure out what the commit date should be
- git commit --date "[release date]" - to create the commits
- vi .git/info/grafts - to temporarily splice the new history on the old
- git filter-branch --tag-name-filter - to rewrite the new history permanently on top
- git tag -f - to replace the existing release tags to point to the new rewritten history
- git push --tags - to push it up to github
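The `unzip | sed` step could also be done in Python. This sketch, my assumption about one reasonable way to pick the date rather than the exact pipeline used, takes the newest file timestamp inside a release zip and formats it for `git commit --date`.

```python
import sys
import zipfile
from datetime import datetime
from typing import BinaryIO, Union


def release_date(zip_source: Union[str, BinaryIO]) -> str:
    """Return the newest entry timestamp in a zip, formatted for --date."""
    with zipfile.ZipFile(zip_source) as zf:
        # date_time is (year, month, day, hour, minute, second); zip stores
        # no timezone, so git will read the result as local time.
        # max() raises ValueError if the archive is empty.
        newest = max(info.date_time for info in zf.infolist())
    return datetime(*newest).isoformat()


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, release_date(path))
```

Using the newest entry rather than the oldest is a judgment call: a release is presumably cut after its last file change.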

If I could consult for projects / companies to do this kind of VCS conversion work and actually get paid for it -- wouldn't that be awesome!