12 March 2014

Map-reduce-friendly graph traversal

You may be well acquainted with this, but I thought I'd pass it along.

    https://giraph.apache.org/intro.html

It does iterative graph processing.  The cool thing is that it lets you break down your graph traversal into substeps that are distributed.

Here is some sample code:

  public void compute(Iterable<DoubleWritable> messages) {
    double minDist = Double.MAX_VALUE;
    for (DoubleWritable message : messages) {
      minDist = Math.min(minDist, message.get());
    }
    if (minDist < getValue().get()) {
      setValue(new DoubleWritable(minDist));
      for (Edge edge : getEdges()) {
        double distance = minDist + edge.getValue().get();
        sendMessage(edge.getTargetVertexId(), new DoubleWritable(distance));
      }
    }
    voteToHalt();
  }

Anyway, it looks like a very interesting project.

If I can do map reduce over change history, then it would then be possible to do contribution graph processing -- deep inspection of the graph of users and records they edit, possibly overlaid with the graph of users and which records they are connected to by ancestry.

For example, given a defined group of users, it would be interesting to see how their edits overlap with each other.  Another example would be to determine which top 1000 editing users made the most "distant" edits, based on some kind of ancestral distance measure.

No comments:

Post a Comment