29 March 2014

Painless localhost demos

Quick demo?  Easier than Heroku?  Look at @ngrok.

Every so often I need a painless way to set up a demo on my laptop and then just email a link to it to anyone on the internet.

I guess that ngrok.com would be a pretty valuable target to pwn.  Maybe it wouldn't be too hard to install on my own host.

Thanks to @lmonson for retweeting about @ngrok.

20 March 2014

Frame heap idea for Ruby

After listening to @srawlins's great talk about the new allocation stats features from @tmm1 in ruby 2.1, I got to thinking about garbage collection in ruby.  BTW, MWRC 2014 is awesome so far!

I was wondering if there is some hybrid model that takes advantage of the call stack patterns that exist in server-side apps, or in apps that do a lot of iterative processing work.

If I could annotate a certain stack frame as a collection barrier, then I could write something that would modify collection behavior at that call boundary.  On the way in, I could pre-allocate per-call heaps that would be used up to capacity by all code from that point in the call graph on down.  On the way out, collection could be triggered to get rid of the garbage.

The benefits I can see here are as follows:
  • there is greater locality of reference in the heap for objects allocated within a certain call graph
  • objects that survive the frame heap collection would be merged into the global heap
  • objects that overflow the frame heap space would either cause another linked frame heap to be allocated, or would overflow to the global heap
  • there is greater opportunity for compacting the heap, since per-call-graph collections imply compaction
  • different collection barriers can be collected in parallel, because the call graphs won't overlap even if they run concurrently, so the collection logic is potentially lock-free
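
To make the mechanics a little more concrete, here is a toy model in Java.  It is only a sketch of the bookkeeping, not how any real Ruby VM or JVM allocates memory.  Every name in it (FrameHeapSketch, FrameHeap, enter, allocate, retain) is made up for illustration, and "retain" is a stand-in for whatever a real collector would discover by tracing references.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Toy model of the frame heap idea; it only tracks where objects would live. */
final class FrameHeapSketch {
    /** Stand-in for the global heap. */
    final List<Object> globalHeap = new ArrayList<>();

    /** Call on the way in to a collection barrier (hypothetical API). */
    FrameHeap enter(int capacity) {
        return new FrameHeap(capacity);
    }

    /** Responsible for the objects allocated beneath one collection barrier. */
    final class FrameHeap implements AutoCloseable {
        private final int capacity;
        private final List<Object> slots = new ArrayList<>();
        // Identity-based set: survival is about the object, not equals().
        private final Set<Object> retained =
                Collections.newSetFromMap(new IdentityHashMap<>());

        FrameHeap(int capacity) {
            this.capacity = capacity;
        }

        /** Places the object in the frame heap, or in the global heap on overflow. */
        <T> T allocate(T obj) {
            if (slots.size() < capacity) {
                slots.add(obj);
            } else {
                globalHeap.add(obj);
            }
            return obj;
        }

        /** Marks an object as still referenced after the barrier exits. */
        void retain(Object obj) {
            retained.add(obj);
        }

        /** On the way out: survivors merge into the global heap, the rest is dropped. */
        @Override
        public void close() {
            for (Object obj : slots) {
                if (retained.contains(obj)) {
                    globalHeap.add(obj);
                }
            }
            slots.clear();
            retained.clear();
        }
    }
}
```

Because FrameHeap is AutoCloseable, a barrier could be written as a try-with-resources block around something like a request handler, so the collection happens on every exit path.  This sketch takes the "overflow to the global heap" option and leaves out the linked-frame-heap alternative.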

Possible points of annotation as collection barrier:
  • rack request
  • http client request
  • json parse / save
  • File.readlines w/ block
  • long-running loop{}s early in the stack

My thinking about heap and garbage collection is formed by what I've seen happen over the years in the HotSpot JVM heap structures.  First there was mark/sweep, with large pauses.  Then generational garbage collection came along, optimized for collecting new garbage, and only doing mark/sweep when an object has survived many early collection cycles.  Then there was the garbage-first collector which did memory region analysis, prioritizing entire regions of heap that are likely to contain the most garbage.

Ruby has seen some garbage collection innovation recently, and there is more to come in ruby 2.2.  Maybe this is a novel idea that can help out.  Or maybe it isn't new and has already been tried and has failed, but it was at least worth getting it out there.

12 March 2014

Map-reduce-friendly graph traversal

You may be well acquainted with this, but I thought I'd pass it along.


It does iterative graph processing.  The cool thing is that it lets you break down your graph traversal into substeps that are distributed.

Here is some sample code:

  public void compute(Iterable<DoubleWritable> messages) {
    // Take the shortest distance offered by any incoming message.
    double minDist = Double.MAX_VALUE;
    for (DoubleWritable message : messages) {
      minDist = Math.min(minDist, message.get());
    }
    // On improvement, record it and send each neighbor its new candidate distance.
    if (minDist < getValue().get()) {
      setValue(new DoubleWritable(minDist));
      for (Edge edge : getEdges()) {
        double distance = minDist + edge.getValue().get();
        sendMessage(edge.getTargetVertexId(), new DoubleWritable(distance));
      }
    }
  }
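
The sample above only makes sense inside the framework, so here is a rough single-machine sketch of the same idea in plain Java.  It is my own illustration, not the project's API: the class name, the Edge type, and the inbox/outbox maps are all made up.  Each pass of the while loop plays the role of one substep, and the per-vertex logic mirrors the compute method.  It assumes edge weights are non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Single-machine sketch of shortest paths by message passing in substeps. */
final class SuperstepShortestPaths {
    /** A weighted, directed edge (hypothetical helper type). */
    static final class Edge {
        final int target;
        final double weight;

        Edge(int target, double weight) {
            this.target = target;
            this.weight = weight;
        }
    }

    /** graph.get(v) lists the outgoing edges of vertex v. */
    static double[] run(List<List<Edge>> graph, int source) {
        double[] value = new double[graph.size()];
        Arrays.fill(value, Double.MAX_VALUE);

        // Messages waiting for each vertex at the start of the current substep.
        Map<Integer, List<Double>> inbox = new HashMap<>();
        inbox.put(source, Collections.singletonList(0.0));

        while (!inbox.isEmpty()) {
            Map<Integer, List<Double>> outbox = new HashMap<>();
            for (Map.Entry<Integer, List<Double>> entry : inbox.entrySet()) {
                int vertex = entry.getKey();
                // Same logic as compute(): keep the best offer, then notify neighbors.
                double minDist = Double.MAX_VALUE;
                for (double message : entry.getValue()) {
                    minDist = Math.min(minDist, message);
                }
                if (minDist < value[vertex]) {
                    value[vertex] = minDist;
                    for (Edge edge : graph.get(vertex)) {
                        outbox.computeIfAbsent(edge.target, k -> new ArrayList<>())
                              .add(minDist + edge.weight);
                    }
                }
            }
            // Messages sent in this substep are delivered in the next one.
            inbox = outbox;
        }
        return value;
    }
}
```

Within a substep each vertex only touches its own value and its own inbox, which is what makes the work easy to spread across machines.  Vertices that are never reached keep Double.MAX_VALUE.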

Anyway, it looks like a very interesting project.

If I can do map reduce over change history, then it would be possible to do contribution graph processing -- deep inspection of the graph of users and records they edit, possibly overlaid with the graph of users and which records they are connected to by ancestry.

For example, given a defined group of users, it would be interesting to see how their edits overlap with each other.  Another example would be to determine which of the top 1000 editing users made the most "distant" edits, based on some kind of ancestral distance measure.
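
As a very rough sketch of the first example, the overlap between two users could be scored as the share of records that both of them edited.  Picking the Jaccard index for this is my assumption, as are the class name and the idea of using record ids as strings.  Nothing here comes from a real data set.

```java
import java.util.HashSet;
import java.util.Set;

/** Scores how much two users' edited-record sets overlap (Jaccard index). */
final class EditOverlap {
    /** Returns |a ∩ b| / |a ∪ b|, or 0.0 when neither user has edited anything. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> both = new HashSet<>(a);
        both.retainAll(b);      // records edited by both users
        Set<String> either = new HashSet<>(a);
        either.addAll(b);       // records edited by at least one of them
        return (double) both.size() / either.size();
    }
}
```

For a whole group of users, the same score could be computed for every pair, which is the kind of pairwise work that should distribute well.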

07 March 2014

Unanswered Questions

In my experience, programmers vary in their ability to tolerate ambiguity, or in their ability to proceed without an answer to a critical question.

For myself, I've had the sense that I have advanced in my ability to tolerate not knowing the answer to an important question.  However, it's only because of some coping mechanisms I've built up over time.  And without those coping mechanisms, I still basically stink at dealing with ambiguity at a core human level.

There is a sequence that I go through all the time:

  1. No explanation yet
  2. What is the real question?
  3. How to file open questions while I'm working to find the real question?
  4. What is the TTL on open questions?
  5. How do I review open questions?
  6. How do I avoid forgetting to revisit something important?

And I always feel uneasy when it gets to #4-#6.  I realize that GTD is all about managing a fixed-size attention span, and keeping track of things that fall outside that attention span.  I stink at the paperwork part of GTD, but I try to apply some of its principles in the context of open questions.

Here is a list of ideas that relate to each other and are related to this overall theme:

  • Ambiguous results
  • Anomalous results
  • Open question
  • Loss reflex
  • Disorientation cost
  • Orientation rate
  • Orientation ability
  • Orientation cost
  • Learning pipeline
  • Fixed buffer size of open questions
  • Open question LRU/LN cache eviction (least recently used, least needed)
  • Isolating open questions in code
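
The "fixed buffer size" and "LRU eviction" items can be taken literally.  Here is a small sketch of a fixed-size buffer of open questions that evicts the least recently touched one, built on java.util.LinkedHashMap in access order.  The class name and the question/note pairing are made up, and the "least needed" half of the idea is not modeled, since I don't know how to measure it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Fixed-size buffer of open questions; the least recently used one is evicted. */
final class OpenQuestions extends LinkedHashMap<String, String> {
    private final int capacity;

    OpenQuestions(int capacity) {
        // The third argument (true) orders entries by access instead of insertion.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    /** Called by LinkedHashMap after each put; returning true drops the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > capacity;
    }
}
```

Calling get on a question counts as revisiting it, so a question I keep coming back to stays in the buffer while a neglected one silently falls out, which is exactly the part that makes me uneasy.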

Here are some articles that relate to this theme:

This post is totally alpha and I don't even know where to go with it, but I wanted to get it out there to think about it some more, since I always think better after pressing "Post" than before.

05 March 2014

Why Instead of What

Writing a good commit message is an important collaborative skill on a software project.


While all the formatting advice can seem pretty nitpicky, there are a couple of principles that stand out to me:
  1. Above all, clearly and concisely state WHY you are making the change
  2. Also, include all the considerations that went into this change and why you are making THIS change instead of any others you considered

Even if the formatting isn't perfect or wouldn't meet the Linux kernel mailing list standards, just having the context later when you are scratching your head wondering "why?" is invaluable.

03 March 2014

Code doc: "Responsible for ..."

Instead of writing class documentation like:

 * This class does X.

please start writing it like:

 * Responsible for X.

The difference seems subtle at first, but the latter is better because it sends you down the path of the Single Responsibility Principle as a default way of thinking about things.

If it's hard to write a sentence fragment "responsible for X" because X is really X, Y, and Z, then that is a code smell that can point to a need for more refined subdivision of responsibility.
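
Here is a small made-up example of where that leads.  Suppose the first draft of the doc comment came out as "Responsible for parsing settings lines and storing the results."  The "and" is the smell, so the sketch below splits it into two classes.  The class names and the key=value format are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Responsible for parsing a "key=value" line into its two parts.
 */
final class SettingParser {
    static String[] parse(String line) {
        int eq = line.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("missing '=': " + line);
        }
        return new String[] {
            line.substring(0, eq).trim(),
            line.substring(eq + 1).trim()
        };
    }
}

/**
 * Responsible for holding parsed settings and answering lookups.
 */
final class Settings {
    private final Map<String, String> values = new HashMap<>();

    /** Delegates parsing, so this class never needs to know the line format. */
    void load(String line) {
        String[] keyValue = SettingParser.parse(line);
        values.put(keyValue[0], keyValue[1]);
    }

    String get(String key) {
        return values.get(key);
    }
}
```

Each doc comment now fits in one short fragment, and a change to the line format only touches SettingParser.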