27 August 2014

Rubber Ducking with Git

You've probably heard of the phenomenon: when you try to explain a hard problem to someone else, you suddenly know the answer, even though the other person did nothing but listen to you ramble.

On the C2 wiki, this is called Rubber Ducking.

The theory I have about the phenomenon is that in a problem-solving situation, the mind develops a lot of parallel ideas and possible solutions, including ones you are not consciously aware of.  When you try to describe the problem and your ideas to someone else, the act of explaining helps you see the situation more clearly and links the ideas together, so that you become aware of more possibilities than you could see before.

But I've always had a problem talking to inanimate objects.  Call me less imaginative, I guess.  Or timid, maybe.

Well, I've had the feeling for a while now that using Git with small commits makes me more productive.  And I just realized why: I'm using my future self as a rubber ducky, and the act of writing explanatory commit messages for that future self is a source of ideas for me.

09 April 2014

Heartbleed Reaction Part 2

A particularly relevant statement from http://heartbleed.org (server side):
"Fortunately many large consumer sites are saved by their conservative choice of SSL/TLS termination equipment and software. Ironically smaller and more progressive services or those who have upgraded to latest and best encryption will be affected most."

I couldn't find any up-to-the-minute registry of sites that are affected on the server side.  The scan posted on GitHub is fairly out of date at this point, and from what I can tell it only takes the homepage into consideration, not sites that only forward to https for things like login / checkout.

Here is the best one-off checker I could find (server side):
- https://www.ssllabs.com/ssltest/

Also, it may not be necessary to update Chrome/Firefox, based on the following language on the security stackexchange site:
- http://security.stackexchange.com/questions/55119/does-the-heartbleed-vulnerability-affect-clients-as-severely
"Chrome (all platforms except Android): Probably unaffected (uses NSS)"
"Chrome on Android: 4.1.1 may be affected (uses OpenSSL). Source. 4.1.2 should be unaffected, as it is compiled with heartbeats disabled."
"Mozilla products (e.g. Firefox, Thunderbird, SeaMonkey, Fennec): Probably unaffected, all use NSS"

The potential vulnerability of clients is discussed here:
- http://security.stackexchange.com/questions/55119/does-the-heartbleed-vulnerability-affect-clients-as-severely
- https://www.openssl.org/news/secadv_20140407.txt (language: "client or server")
- http://heartbleed.com
"Furthermore you might have client side software on your computer that could expose the data from your computer if you connect to compromised services."

My guess is that curl connecting to an https site would be affected, along with other programs that use OpenSSL.  Maybe a chat client, or programs that download their own "auto-updates" over SSL.  Those are the only kinds of things that come to mind right now.
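
A quick way to get a feel for client-side exposure is to see which OpenSSL your command-line tools are built against.  A minimal check (the vulnerable releases are 1.0.1 through 1.0.1f; note that some distros backport fixes without changing the version string, so this is only a rough signal):

    openssl version
    # e.g. "OpenSSL 1.0.1f 6 Jan 2014" falls in the vulnerable range
    curl --version
    # curl lists the SSL library it was built with in its version output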

Reacting to Heartbleed

It's 2:37am and I can't sleep.  It feels like the internet fell down around my ears.

What I am doing:

  1. Got educated at http://heartbleed.com
  2. Updated Chrome to the 34.x version manually (promoted to stable yesterday)
  3. Checked for vulnerability in sites I use
  4. Completely clearing cookies and cache on ALL my computers, family & work, including phones
  5. Installing LastPass and resetting ALL my passwords as I become confident that each site is patched
    • I am assuming that all my user/passwords are either already known at this point, or can be discovered by anyone who recorded SSL traffic in the past 2 years
  6. Wondering what will happen because of this

UPDATE: Updating Chrome seems not to be strictly necessary, as stated here.  But I'm upgrading anyway, because the Chrome stable release on 8 Apr. 2014 has a lot of other security fixes in it.

UPDATE: More details that I've learned are here in a follow-up post.

03 April 2014

Paying Down Learning Risk

I've heard: "Solve the hardest problem first."  As a rule of thumb, that works great to reduce risk early on a software project.  But I found myself saying something different to a co-worker recently:

    "Sometimes I start with the hardest problem, but sometimes I like to start with a really easy problem.  Why do I do that?"

Why would it be a good idea to start with the easiest problem?  What kind of risk are you trying to pay down in a given situation?

Here are some reasons that would justify breaking the "hardest problem first" rule:
  • If you need to gain experience in a new domain, starting with something easy can help you get experience before tackling the harder problems.
  • If the world has changed out from underneath an existing system in an unpredictable way, starting with changing something easy or predictable can help you observe the source of the chaos.
  • If you are sharing work, handing the easy work items out to others based on their learning goals can help them learn better.
  • If tackling a hard problem will take a very long time, and others are waiting for you, then picking an easier part of the problem can help ease integration while still letting you engage on the hard problem.

The kind of risk you want to pay down first is important.  Here are the kinds of risk that would be paid down by the above behaviors:
  • risk of getting lost while learning
  • risk of being unable to bring order to a chaotic system
  • risk of assigning impossible tasks to someone who just wants to ramp up
  • risk of high integration costs because of trying to change too much at once

Most of the time, the risk caused by the uncertainty inherent in solving a hard problem is the most important risk to pay down first.  But sometimes, there are other factors at play, and other subtle variables that need to be managed to achieve a successful group outcome.

Thank you to Michael Nelson for his instructive collaboration on this topic.

29 March 2014

Painless localhost demos

Quick demo?  Easier than heroku?  Look at @ngrok.

I have periodically needed something that lets me painlessly set up a demo from my laptop that I could just email a link to anyone on the internet.
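
With ngrok the whole thing is one command.  A minimal sketch, assuming the current ngrok CLI (the exact command syntax has changed between versions):

    ngrok http 3000
    # prints a public forwarding URL that tunnels to localhost:3000;
    # email that URL and the demo runs straight off the laptop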

I guess that ngrok.com would be a pretty valuable target to pwn.  Maybe it wouldn't be too hard to install on my own host.

Thanks to @lmonson for retweeting about @ngrok.

20 March 2014

Frame heap idea for Ruby

After listening to @srawlins' great talk about the new allocation stats features from @tmm1 in ruby 2.1, I got to thinking about garbage collection in ruby.  BTW, MWRC 2014 is awesome so far!

I was wondering if there is some hybrid model that takes advantage of the call stack patterns that exist in server-side apps, or in apps that do a lot of iterative processing work.

If I could annotate a certain stack frame as a collection barrier, then I could write something that would modify collection behavior at that call boundary.  On the way in, I could pre-allocate per-call heaps that would be used up to capacity by all code from that point in the call graph on down.  On the way out, collection could be triggered to get rid of the garbage.
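
To make that concrete, here is a rough sketch of what the annotation might look like.  None of this exists in ruby today; GC.frame_heap, its block form, and the capacity option are all hypothetical:

    # Hypothetical API -- nothing like GC.frame_heap exists in ruby today.
    # Entering the block would pre-allocate a per-call heap; objects allocated
    # below this frame would come from that heap, and leaving the block would
    # trigger a collection of whatever garbage the call graph produced.
    GC.frame_heap(capacity: 8 * 1024 * 1024) do
      rows = File.readlines("data.csv").map { |line| line.split(",") }
      rows.count { |row| row.first == "match" }   # survivors would merge into the global heap
    end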

The benefits I can see here are as follows:
  • there is greater locality of reference in the heap for objects allocated within a certain call graph
  • objects that survive the frame heap collection would be merged into the global heap
  • objects that overflow the frame heap space would either cause another linked frame heap to be allocated, or would overflow to the global heap
  • there is greater opportunity for compacting the heap, since per-call-graph collections imply compaction
  • different collection barriers can be collected in parallel, because their call graphs don't overlap even when the calls happen concurrently, so concurrent collection logic is potentially lock-free

Possible points of annotation as collection barrier:
  • rack request
  • http client request
  • json parse / save
  • File.readlines w/ block
  • long-running loop{}s early in the stack

My thinking about heaps and garbage collection has been shaped by what I've seen happen over the years in the HotSpot JVM heap structures.  First there was mark/sweep, with large pauses.  Then generational garbage collection came along, optimized for collecting young garbage, only falling back to a full mark/sweep for objects that survive many early collection cycles.  Then there was the garbage-first collector, which did memory region analysis, prioritizing the regions of heap likely to contain the most garbage.

Ruby has seen some garbage collection innovation recently, and there is more to come in ruby 2.2.  Maybe this is a novel idea that can help out, or maybe it isn't new and has already been tried and failed; either way, it seemed worth putting out there.

12 March 2014

Map-reduce-friendly graph traversal

You may be well acquainted with this, but I thought I'd pass it along.

    https://giraph.apache.org/intro.html

It does iterative graph processing.  The cool thing is that it lets you break down your graph traversal into substeps that are distributed.

Here is some sample code:

  // Called once per active vertex per superstep (Pregel/BSP model).
  public void compute(Iterable<DoubleWritable> messages) {
    // Take the smallest candidate distance proposed by any neighbor.
    double minDist = Double.MAX_VALUE;
    for (DoubleWritable message : messages) {
      minDist = Math.min(minDist, message.get());
    }
    // If it improves on this vertex's current distance, record it and
    // send updated distances to all neighbors for the next superstep.
    if (minDist < getValue().get()) {
      setValue(new DoubleWritable(minDist));
      for (Edge<LongWritable, FloatWritable> edge : getEdges()) {
        double distance = minDist + edge.getValue().get();
        sendMessage(edge.getTargetVertexId(), new DoubleWritable(distance));
      }
    }
    // Go dormant; the vertex is woken up again only if a new message arrives.
    voteToHalt();
  }

Anyway, it looks like a very interesting project.

If I can do map-reduce over change history, then it would be possible to do contribution graph processing -- deep inspection of the graph of users and the records they edit, possibly overlaid with the graph of users and the records they are connected to by ancestry.

For example, given a defined group of users, it would be interesting to see how their edits overlap with each other.  Another example would be to determine which top 1000 editing users made the most "distant" edits, based on some kind of ancestral distance measure.

07 March 2014

Unanswered Questions

In my experience, programmers vary in their ability to tolerate ambiguity, or in their ability to proceed without an answer to a critical question.

For myself, I've had the sense that I have advanced in my ability to tolerate not knowing the answer to an important question.  However, it's only because of some coping mechanisms I've built up over time.  And without those coping mechanisms, I still basically stink at dealing with ambiguity at a core human level.

There is a sequence that I go through all the time:

  1. No explanation yet
  2. What is the real question?
  3. How to file open questions while I'm working to find the real question?
  4. What is the TTL on open questions?
  5. How do I review open questions?
  6. How do I avoid forgetting to revisit something important?


And I always feel uneasy when it gets to #4-#6.  I realize that GTD is all about managing a fixed-size attention span and keeping track of the things that fall outside it.  However, I stink at the paperwork part of GTD, so I just try to apply some of its principles in the context of open questions.

Here is a list of ideas that relate to each other and are related to this overall theme:

  • Ambiguous results
  • Anomalous results
  • Open question
  • Loss reflex
  • Disorientation cost
  • Orientation rate
  • Orientation ability
  • Orientation cost
  • Learning pipeline
  • Fixed buffer size of open questions
  • Open question LRU/LN cache eviction (least recently used, least needed)
  • Isolating open questions in code


Here are some articles that relate to this theme:



This post is totally alpha and I don't even know where to go with it, but I wanted to get it out there to think about it some more, since I always think better after pressing "Post" than before.

05 March 2014

Why Instead of What

Writing a good commit message is an important collaborative skill on a software project.

See:
http://who-t.blogspot.de/2009/12/on-commit-messages.html
http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html
https://github.com/torvalds/linux/pull/17#issuecomment-5661185

While all the formatting advice can seem pretty nitpicky, there are a couple principles that stand out to me:
  1. Above all, clearly and concisely state WHY you are making the change
  2. Also, include all the considerations that went into this change and why you are making THIS change instead of any others you considered
Even if the formatting isn't perfect or wouldn't meet Linux kernel mailing list standards, having that context later, when you are scratching your head wondering "why?", is invaluable.
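
As an illustration, here is the shape of a message that follows those two principles.  The details are made up, and the layout follows the 50-character subject / 72-character body convention from the articles above:

    Cache geocoding results to cut page load time

    The profile page calls the geocoding service on every request, and
    that call is now the slowest part of the page.  Caching the result
    per address keeps the page within our load-time budget.

    We considered batching the geocoding calls instead, but that would
    have required changing the service API, so caching was the smaller
    change for now.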

03 March 2014

Code doc: "Responsible for ..."

Instead of writing class documentation like:

/**
 * This class does X.
 */

Please start writing class documentation like this instead:

/**
 * Responsible for X.
 */

The difference seems subtle at first, but the latter is better because it sends you down the path of the Single Responsibility Principle as a default way of thinking about things.

If it's hard to write a sentence fragment "responsible for X" because X is really X, Y, and Z, then that is a code smell that can point to a need for more refined subdivision of responsibility.

28 February 2014

Feeling Good with Creative Tension

This blog went dark after last summer because I had to focus on Boy Scout summer camp and moving my family to a new location.  But now those projects are largely taken care of, and I have some mental space to speak here again.

I have been reading two books recently: Creating, by Robert Fritz, and Feeling Good, by David D. Burns.

In Creating, the author says that it is more important to acknowledge the tension between your creative vision and current reality, despite the pain that comes from realizing how far away from your vision you might currently be.  This has been very hard to internalize for me.  I feel depressed when things that I care about look so impossible.

Well, I gave up on Creating for a while, and one of the books that I hauled to my new house was Feeling Good.  I found the core of the book in the description of the interaction between "feelings" and "thoughts".

Chapter 3 - Understanding Your Moods: You Feel the Way You Think

There, the author lays out a list of common mental errors that present themselves in people who suffer depressive symptoms.  The premise of the book is that cognitive therapy focused on correcting those mental errors helps relieve the symptoms.

Reading these books together put the two thoughts in juxtaposition in my mind, and the result was interesting.  I realized that I was making some of the common mental errors in the context of creating my vision of a distributed versioned family tree.

So the principle I want to put into practice now personally is "Feeling Good with Creative Tension" by recognizing the mental errors and correcting them as quickly as is practical.

Despite the hope that these thoughts bring, I have had to refocus my efforts after being assigned to a project at work that deals with innovation on the entire corpus of pedigree data.  The things I have been learning at a whole dataset level are valuable to me in thinking about how to make a distributed tree even work.

26 February 2014

Kent Beck about steering code via tests

I was about to send this to my team, but then I realized this is a blog post.

Here is Kent's article:


And here is a related tweet from him on that topic:

    "selecting the next test is an act of design"

I've always thought about "design" as choosing which classes exist and how they relate to each other, and what methods look like, etc.  Or at a system level, deciding which services exist and what their core of responsibility should be.

However, I never thought about gaming my own mind (like applying genetic algorithms to my own thoughts) by choosing to introduce tests in different orders, so as to produce a different neurophysical response from myself.

Perhaps "design" is not just about a static system and what to include.  Perhaps "design" is about acknowledging the dynamism of the human/computer programming environment, and leveraging that dynamic nature in order to get a more optimum response.