09 April 2014

Heartbleed Reaction Part 2

A particularly relevant statement from http://heartbleed.org (server side):
"Fortunately many large consumer sites are saved by their conservative choice of SSL/TLS termination equipment and software. Ironically smaller and more progressive services or those who have upgraded to latest and best encryption will be affected most."

I couldn't find any up-to-the-minute registry of sites affected on the server side.  The scan posted on GitHub is fairly out of date at this point, and from what I can tell it only takes the homepage into consideration, not sites that merely forward to https for things like login / checkout.

Here is the best one-off checker I could find (server side):
- https://www.ssllabs.com/ssltest/

Also, it may not be necessary to update Chrome/Firefox, based on the following language on the security stackexchange site:
- http://security.stackexchange.com/questions/55119/does-the-heartbleed-vulnerability-affect-clients-as-severely
"Chrome (all platforms except Android): Probably unaffected (uses NSS)"
"Chrome on Android: 4.1.1 may be affected (uses OpenSSL). Source. 4.1.2 should be unaffected, as it is compiled with heartbeats disabled."
"Mozilla products (e.g. Firefox, Thunderbird, SeaMonkey, Fennec): Probably unaffected, all use NSS"

The potential vulnerability of clients is discussed here:
- http://security.stackexchange.com/questions/55119/does-the-heartbleed-vulnerability-affect-clients-as-severely
- https://www.openssl.org/news/secadv_20140407.txt (language: "client or server")
- http://heartbleed.com
"Furthermore you might have client side software on your computer that could expose the data from your computer if you connect to compromised services."

My guess is that curl going to an https site would be affected, as would other programs that use OpenSSL.  Maybe a chat client, or programs downloading their own "auto-updates" over SSL.  Those are the only kinds of things that come to mind right now.
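
One quick client-side check along those lines: ask a program which OpenSSL it is linked against.  Below is a sketch using Ruby's stdlib, since Ruby itself is one of those OpenSSL-linked clients.  The heartbleed_range? helper is hypothetical and only a first-pass filter: distros often backport fixes without changing the version string, so a "vulnerable" answer here is a reason to dig further, not a verdict.

```ruby
# Sketch: report which OpenSSL this Ruby is linked against, and whether the
# version string falls in the Heartbleed range (1.0.1 through 1.0.1f).
require 'openssl'

def heartbleed_range?(version_string)
  # 1.0.1 with no patch letter, or patch letters a-f, shipped the buggy
  # heartbeat code; 1.0.1g (7 Apr 2014) is the fixed release.
  !!(version_string =~ /\bOpenSSL 1\.0\.1[a-f]?(\s|\z)/)
end

puts OpenSSL::OPENSSL_VERSION
puts(heartbleed_range?(OpenSSL::OPENSSL_VERSION) ?
  "version string is in the Heartbleed range" :
  "version string is outside the Heartbleed range")
```

The same idea applies to curl: its version banner names the SSL library it was built against, so the linked library, not the program itself, is what you have to check.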

Reacting to Heartbleed

It's 2:37am and I can't sleep.  It feels like the internet fell down around my ears.

What I am doing:

  1. Got educated at http://heartbleed.com
  2. Updated Chrome to the 34.x version manually (promoted to stable yesterday)
  3. Checked for vulnerability in sites I use
  4. Completely clearing cookies and cache on ALL my computers, family & work, including phones
  5. Installing LastPass and resetting ALL my passwords as I become confident that each site is patched
    • I am assuming that all my user/passwords are either already known at this point, or can be discovered by anyone who recorded SSL traffic in the past 2 years
  6. Wondering what will happen because of this
UPDATE: The Chrome update seems not to be strictly necessary, as stated here.  But I'm upgrading anyway, because the Chrome stable release on 8 Apr. 2014 has a lot of other security fixes in it.

UPDATE: More details that I've learned are here in a follow-up post.

03 April 2014

Paying Down Learning Risk

I've heard: "Solve the hardest problem first."  As a rule of thumb, that works great to reduce risk early on a software project.  But I found myself saying something different to a co-worker recently:

"Sometimes I start with the hardest problem, but sometimes I like to start with a really easy problem."  Why do I do that?

Why would it be a good idea to start with the easiest problem?  What kind of risk are you trying to pay down in a given situation?

Here are some reasons that would justify breaking the "hardest problem first" rule:

  • If you need to gain experience in a new domain, starting with something easy can help you get experience before tackling the harder problems.
  • If the world has changed out from underneath an existing system in an unpredictable way, starting with changing something easy or predictable can help you observe the source of the chaos.
  • If you are sharing work, handing the easy work items out to others based on their learning goals can help them learn better.
  • If tackling a hard problem will take a very long time, and others are waiting for you, then picking an easier part of the problem can help ease integration while still letting you engage on the hard problem.
The kind of risk you want to pay down first is important.  Here are the kinds of risk that would be paid down by the above behaviors:
  • risk of getting lost while learning
  • risk of being unable to bring order to a chaotic system
  • risk of assigning impossible tasks to someone who just wants to ramp up
  • risk of high integration costs because of trying to change too much at once
Most of the time, the risk caused by the uncertainty inherent in solving a hard problem is the most important risk to pay down first.  But sometimes, there are other factors at play, and other subtle variables that need to be managed to achieve a successful group outcome.

Thank you to Michael Nelson for his instructive collaboration on this topic.

29 March 2014

Painless localhost demos

Quick demo?  Easier than Heroku?  Look at @ngrok.

I have periodically needed something that lets me painlessly set up a demo from my laptop that I could just email a link to anyone on the internet.

I guess that ngrok.com would be a pretty valuable target to pwn.  Maybe it wouldn't be too hard to install on my own host.

Thanks to @lmonson for retweeting about @ngrok.

20 March 2014

Frame heap idea for Ruby

After listening to @srawlins' great talk about the new allocation stats features from @tmm1 in ruby 2.1, I got to thinking about garbage collection in ruby.  BTW, MWRC 2014 is awesome so far!

I was wondering if there is some hybrid model that takes advantage of the call stack patterns that exist in server-side apps, or in apps that do a lot of iterative processing work.

If I could annotate a certain stack frame as a collection barrier, then I could write something that would modify collection behavior at that call boundary.  On the way in, I could pre-allocate per-call heaps that would be used up to capacity by all code from that point in the call graph on down.  On the way out, collection could be triggered to get rid of the garbage.

The benefits I can see here are as follows:
  • there is greater locality of reference in the heap for objects allocated within a certain call graph
  • objects that survive the frame heap collection would be merged into the global heap
  • objects that overflow the frame heap space would either cause another linked frame heap to be allocated, or would overflow to the global heap
  • there is greater opportunity for compacting the heap, since per-call-graph collections imply compaction
  • different collection barriers can be collected in parallel, because their call graphs don't overlap even when they run concurrently, so concurrent collection logic is potentially lock-free

Possible points of annotation as collection barrier:
  • rack request
  • http client request
  • json parse / save
  • File.readlines w/ block
  • long-running loop{}s early in the stack
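
The "on the way out, collection is triggered" half of the idea can be mocked up today with the real GC.stat / GC.start APIs.  The with_collection_barrier name and the whole API shape below are hypothetical -- current MRI has no per-call heaps, so this sketch only emulates the collection trigger at the frame boundary, not the pre-allocated frame heap itself:

```ruby
# Hypothetical sketch of a "collection barrier" for a single call frame.
# Counts allocations made inside the barrier, then collects on the way out.
# (:total_allocated_objects is the GC.stat key name in ruby 2.2+; it is a
# cumulative lifetime counter, so subtracting two readings gives the
# allocations made in between, regardless of collections.)
def with_collection_barrier
  allocated_before = GC.stat[:total_allocated_objects]
  result = yield
  GC.start  # on the way out of the barrier, collect the garbage made inside it
  allocated = GC.stat[:total_allocated_objects] - allocated_before
  [result, allocated]
end

# Example: a rack-request-like unit of work that makes a pile of short-lived objects
result, allocated = with_collection_barrier do
  Array.new(1000) { |i| i.to_s }.join(",").length
end
puts "result=#{result}, objects allocated inside barrier=#{allocated}"
```

A real implementation would hook the allocator rather than just counting, so that allocations between the two readings land in a frame-local heap.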

My thinking about heaps and garbage collection is informed by what I've seen happen over the years in the HotSpot JVM heap structures.  First there was mark/sweep, with large pauses.  Then generational garbage collection came along, optimized for collecting young garbage, and only doing mark/sweep when an object had survived many early collection cycles.  Then came the garbage-first collector, which does memory region analysis, prioritizing the regions of heap likely to contain the most garbage.

Ruby has seen some garbage collection innovation recently, and there is more to come in ruby 2.2.  Maybe this is a novel idea that can help out.  Or maybe it isn't new and has already been tried and failed; either way, it was worth getting it out there.

12 March 2014

Map-reduce-friendly graph traversal

You may be well acquainted with this, but I thought I'd pass it along.

    https://giraph.apache.org/intro.html

It does iterative graph processing.  The cool thing is that it lets you break down your graph traversal into substeps that are distributed.

Here is some sample code:

  public void compute(Iterable<DoubleWritable> messages) {
    // Shortest-paths step: take the minimum distance seen in incoming messages
    double minDist = Double.MAX_VALUE;
    for (DoubleWritable message : messages) {
      minDist = Math.min(minDist, message.get());
    }
    // If we found a shorter path, record it and notify our neighbors
    if (minDist < getValue().get()) {
      setValue(new DoubleWritable(minDist));
      for (Edge<LongWritable, FloatWritable> edge : getEdges()) {
        double distance = minDist + edge.getValue().get();
        sendMessage(edge.getTargetVertexId(), new DoubleWritable(distance));
      }
    }
    // This vertex goes dormant until a new message wakes it up
    voteToHalt();
  }

Anyway, it looks like a very interesting project.

If I can do map-reduce over change history, then it would be possible to do contribution graph processing -- deep inspection of the graph of users and the records they edit, possibly overlaid with the graph of users and the records they are connected to by ancestry.

For example, given a defined group of users, it would be interesting to see how their edits overlap with each other.  Another example would be to determine which top 1000 editing users made the most "distant" edits, based on some kind of ancestral distance measure.
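
As a sketch of that first example, pairwise edit overlap for a group of users could be a Jaccard measure over the sets of record ids each user has edited.  The edit_overlap helper and the sample data below are made up for illustration; in Giraph this would be distributed across vertices rather than computed in memory like this:

```ruby
require 'set'

# Hypothetical sketch: pairwise edit overlap for a defined group of users,
# as Jaccard similarity over the record ids each user has edited.
def edit_overlap(edits_by_user)
  edits_by_user.keys.combination(2).map do |a, b|
    sa = Set.new(edits_by_user[a])
    sb = Set.new(edits_by_user[b])
    # intersection over union: 1.0 means identical edit sets, 0.0 means disjoint
    [[a, b], (sa & sb).size.to_f / (sa | sb).size]
  end.to_h
end

edits = {
  "alice" => [1, 2, 3, 4],
  "bob"   => [3, 4, 5],
  "carol" => [9],
}
overlap = edit_overlap(edits)
# alice and bob share records 3 and 4 out of the 5 distinct records either touched
puts overlap[["alice", "bob"]]
```

The "distant edits" example would swap the Jaccard measure for some ancestral distance between a user and the records they edit, but the pairwise-scoring shape stays the same.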

07 March 2014

Unanswered Questions

In my experience, programmers vary in their ability to tolerate ambiguity, or in their ability to proceed without an answer to a critical question.

For myself, I've had the sense that I have advanced in my ability to tolerate not knowing the answer to an important question.  However, it's only because of some coping mechanisms I've built up over time.  And without those coping mechanisms, I still basically stink at dealing with ambiguity at a core human level.

There is a sequence that I go through all the time:

  1. No explanation yet
  2. What is the real question?
  3. How to file open questions while I'm working to find the real question?
  4. What is the TTL on open questions?
  5. How do I review open questions?
  6. How do I avoid forgetting to revisit something important?


And I always feel uneasy when it gets to #4-#6.  I realize that GTD is all about managing a fixed-size attention span, and keeping track of things that fall outside that attention span.  I stink at the paperwork part of GTD, but I try to apply some of its principles in the context of open questions.

Here is a list of ideas that relate to each other and are related to this overall theme:

  • Ambiguous results
  • Anomalous results
  • Open question
  • Loss reflex
  • Disorientation cost
  • Orientation rate
  • Orientation ability
  • Orientation cost
  • Learning pipeline
  • Fixed buffer size of open questions
  • Open question LRU/LN cache eviction (least recently used, least needed)
  • Isolating open questions in code


This post is totally alpha and I don't even know where to go with it, but I wanted to get it out there to think about it some more, since I always think better after pressing "Post" than before.