Deliberate Thinking: 2011-12

31 December 2011

Useful Git Tips

Every one of these tips was useful to me:

http://mislav.uniqpath.com/2010/07/git-tips/

It amazes me that these were written over a year ago. I've been trying to learn as much about git as I can, yet I never stumbled across any of them in the manpages.

I found this by looking for an option that would let "git remote update" only fetch a subset of the remotes I have attached.

It is rewarding to feel like I'm working with a rich toolset.

Thanks to @mislav for posting this.

BTW: Poking around in his tweet stream also yielded this gem. I've always wondered if there was a lightweight XPath for JSON, and there it is. Some background is in order, as much of the "XPath for JavaScript" mentality was based on early jQuery thinking.

Subscribing to FamilySearch Updates

Keeping up with all that's going on at FamilySearch is harder than it needs to be, but I finally figured out how.

Most sites give you a few large categories of feeds to track, but FamilySearch has 14 separate feeds, some of which overlap (each feed is available by appending '/feed' to its URL). I couldn't find any "everything" feed URL.

Because I work for FamilySearch, and because I care about what goes on around me, I want to subscribe to everything, if only to skim. The most important announcements get forwarded through internal email, but the additional articles are very useful context, especially for an employee.

Unfortunately, the default "subscribe" link just subscribes you to feeds #2, #3, & #4, or Indexing, Record Search, & Indexing, respectively. The "Subscribe" link is visible on an individual article's page, but isn't even present on the main blog page.

Here is the feed link you can put in your feed reader to subscribe to everything:

https://www.familysearch.org/taxonomy/term/1+2+3+4+5+6+7+8+9+10+11+12+13+14/0/feed

When feed/category/blog #15 gets defined, this feed URL will be out of date again. If anyone knows a better way, please let me know.
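Until a real "everything" feed turns up, the URL can at least be regenerated instead of retyped. This is a minimal sketch that assumes the pattern in the link above (term IDs joined with '+') keeps working for other ranges, which I have not verified. The helper name everything_feed_url is made up for illustration.

```ruby
# Hypothetical helper: builds the combined feed URL for term IDs 1..max_term_id.
# The URL pattern is copied from the link above; that the site accepts
# other ranges is an assumption.
FEED_BASE = "https://www.familysearch.org/taxonomy/term"

def everything_feed_url(max_term_id)
  raise ArgumentError, "max_term_id must be >= 1" if max_term_id < 1
  terms = (1..max_term_id).to_a.join("+")
  "#{FEED_BASE}/#{terms}/0/feed"
end

puts everything_feed_url(14)  # the URL posted above
puts everything_feed_url(15)  # what it would become once feed #15 exists
```

The feed reader subscription still has to be updated by hand, so this only removes the typing, not the underlying problem.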

I want an "everything" subscribe link that I won't have to update in my feed reader, or maybe an "everything but trivia" link.

24 December 2011

Missed Tweets Through News.me

I've never really been a twitter nut. There are already enough sources of distraction for me, and I don't need to add another. It just seemed really noisy.

However, people have been saying and linking to important things on twitter for a while now. I'm just not discerning enough and don't have enough time to filter the firehose in a useful way.

The most helpful summary of interesting news I skim regularly is the weekly LinkedIn emails. In fact, I think it may have been through a link chain from one of those emails that I stumbled on unionfs, as reported in the last post. If I could have the same thing for twitter, but personalized to the people I follow, that would be really helpful!

Ironically, as part of the effort to make this blog more stumble-upon-able, I was looking for a way to auto-post on twitter and facebook whenever I post on this blog, and I found twitterfeed.com, which does all of that and more. And it's easy to set up.

After setting that up this morning, I wondered: "What is the company behind twitterfeed.com?", and found betaworks. They run some very interesting sites/companies, some of which I knew about (bit.ly), some of which were new to me (findings.com, chartbeat.com). The interesting one for me right now was another site called news.me.

Turns out that news.me is just the kind of thing that has the chance of making twitter useful to me. If things go well, you'll probably hear more about it.

21 December 2011

Intrusion Detection through Stackable Filesystems

I've always wondered what exploit might be running on my system, and never had any time to devise/install a detection system that would have the right balance of useful detection (maximize) and performance impact (minimize).

When I stumbled upon unionfs a couple weeks ago, I thought that was an interesting idea from a change-logging perspective. It's sometimes useful to be able to keep a filesystem-based diff of what a certain operation does to a system, and then bake it onto the system once I know it did what I wanted. The takeaway for me was that unionfs's performance overhead could be so low because it is so thin and so baked into the kernel.

This compounded with my recent discovery of and fiddling with fusefs (in user-land), and I wondered what kind of useful logic I could put underneath a filesystem. The GlusterFS feature set, the recent LessFS GC stuff, the bup-fuse stuff, and the S3FS stuff are all just *really* cool. I ended up gazing longingly at the big backlog of fuse filesystem suggestions on their wiki, and wandered back into the unionfs space.

So, when I saw the I3FS paper (linked from here) about the modular application of this technique to intrusion detection, this really triggered the Useful *AND* Performant neurons and I got really excited.

Unfortunately, on first glance, the stackable filesystems stuff seemed pretty cryptic to set up in a lightweight, just-works kind of way (think custom mount command-lines complete with arcane stacking options).

It would be soooo awesome to have an easy-to-compose ruby DSL for doing some kind of rack-like filesystem mashup with a kernel-level unionfs layer underneath a user-land fusefs layer, but all expressed in the same DSL.
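To make the wish a little more concrete, here is a rough sketch of what the surface of such a DSL might look like. Everything in it is hypothetical: the FsStack class, the kernel and userland methods, the paths, and the handler name are invented for illustration, and nothing here actually mounts anything. It only records an ordered stack of layers, the way a rack builder records middleware.

```ruby
# Hypothetical DSL sketch: records a stack of filesystem layers, bottom first.
# It does not call mount or talk to unionfs/fuse at all.
class FsStack
  Layer = Struct.new(:kind, :space, :options)

  attr_reader :layers

  # Evaluate the block against a fresh stack and return the stack.
  def self.build(&block)
    stack = new
    stack.instance_eval(&block)
    stack
  end

  def initialize
    @layers = []
  end

  # Declare a kernel-level layer (for example a unionfs mount).
  def kernel(kind, options = {})
    @layers << Layer.new(kind, :kernel, options)
  end

  # Declare a user-land layer (for example a fuse filesystem).
  def userland(kind, options = {})
    @layers << Layer.new(kind, :userland, options)
  end

  # One-line description of the stack, bottom layer first.
  def describe
    @layers.map { |l| "#{l.kind} (#{l.space})" }.join(" -> ")
  end
end

stack = FsStack.build do
  kernel   :unionfs, :branches => ["/base", "/changes"]  # made-up paths
  userland :fusefs,  :handler  => "ids_logger"           # made-up handler
end

puts stack.describe
```

The hard part, turning each recorded layer into the right mount invocation, is exactly the cryptic bit this sketch leaves out.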

This would be an awesome tool to put on top of the Arch-derived clone I want to put up for people at work. There are folks who care about living more on the edge of linux stuff, but who don't care to install from scratch, and who also might not mind lacking a full Gnome stack if things just work. And if I could give them the same tuned IDS-on-the-desktop solution (or upgrade their developer stack by letting them pull a filesystem delta over), that would be really cool.

The observation that modularity makes development cheap [PDF, linked from] points at one of the attributes valued highly by the Ruby community as well. It is one of the key things that makes the Ruby community awesome.

These kinds of ideas really matter, and making them so cheap and stable that you don't have to think about them really matters even more.

NOTE: The I3FS paper is really pretty old (2004), and the whole unionfs stack is older than that. The fuse stack came into the kernel a long time ago too (2005). So while this is new to me, it's been around for quite a long time. I'm playing catch-up.

Unicode Support Better Now in Ruby

Unicode support in Ruby has historically been one of Ruby's major problems for me.

With the advent of Ruby 1.9, of course, Unicode support started being added to the language. However, it's not as straightforward as in Java, which supported some version of Unicode from the beginning.

Even though it's been a rough decade, things are finally looking up. In fact, I actually like the way things got factored after all of the mess.

The thing I like is that the minimal amount of support is included in the standard library, and it's easy to compose things in non-standard ways for weird scenarios or data in improperly encoded formats.

The core library has support for strings of codepoints and bytes and a flexible set of encoding facilities.
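A small illustration of what that looks like in practice, using only core String methods from Ruby 1.9 and later. The sample word and the "mislabeled Latin-1 bytes" scenario are my own choices for the example.

```ruby
# "café" written with an escape, so the example does not depend on the
# encoding of this source file.
s = "caf\u00e9"

puts s.encoding              # UTF-8
puts s.length                # 4 characters
puts s.bytesize              # 5 bytes: the e-acute takes two bytes in UTF-8
p s.codepoints.to_a          # [99, 97, 102, 233]
p s.bytes.to_a.last(2)       # [195, 169]

# Transcoding: same characters, different bytes.
latin1 = s.encode("ISO-8859-1")
puts latin1.bytesize         # 4

# Improperly labeled data: Latin-1 bytes that claim to be UTF-8.
raw = [0x63, 0x61, 0x66, 0xE9].pack("C*").force_encoding("UTF-8")
puts raw.valid_encoding?     # false

# Relabel the bytes (no conversion), then transcode for real.
fixed = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts fixed == s              # true
```

The split between force_encoding (change the label, keep the bytes) and encode (keep the characters, change the bytes) is what makes the weird cases easy to compose.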

In addition, there are two libraries of interest:

unicode_utils - includes implementations of word-break functionality, grapheme boundaries, etc.
jcode (if you're stuck on ruby 1.8.x)

This series of posts gives you a full understanding of the topic. Highly recommended!

This post gives a high-level view of where things were at around 2006, much of which is valuable background.

This post has a good summary of unicode-related resources, as does this stackoverflow question.

Semantic Versioning

After having skimmed the semantic versioning proposal/spec, I really like it, and I'm going back for a deep read.

The most notable violator of this that has bitten me in the past is the Jersey framework, and maybe earlier versions of commons-collections.
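The promise I care about most is that, within one major version, upgrading should not break me. A minimal sketch of that rule follows, assuming plain MAJOR.MINOR.PATCH strings (pre-release and build suffixes are deliberately not handled). The helper names parse_version and compatible_upgrade? are made up for illustration.

```ruby
# Parse "MAJOR.MINOR.PATCH" into three integers.
# Simplification: pre-release tags and build metadata are rejected.
def parse_version(str)
  m = /\A(\d+)\.(\d+)\.(\d+)\z/.match(str)
  raise ArgumentError, "not MAJOR.MINOR.PATCH: #{str.inspect}" unless m
  m.captures.map { |part| part.to_i }
end

# True when moving from `from` to `to` should be safe under the spec:
# same major version, and not a downgrade.
def compatible_upgrade?(from, to)
  old_v = parse_version(from)
  new_v = parse_version(to)
  return false if old_v[0].zero?  # 0.y.z makes no compatibility promise
  new_v[0] == old_v[0] && (new_v <=> old_v) >= 0
end

puts compatible_upgrade?("1.2.3", "1.3.0")  # true
puts compatible_upgrade?("1.9.0", "2.0.0")  # false
puts compatible_upgrade?("0.3.1", "0.4.0")  # false
```

A library that breaks its API in what this check calls a compatible upgrade is the kind of violator I mean.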

19 December 2011

Some Git/Ruby stuff of interest

While taking the git survey a while back, I encountered:

git-annex: awesome git-enabled large file archive support

which led me to:

git-bup: way cool! git-like backup, even serves as a remote for git-annex

also this morning ran into:

rib: irb, except better, including rib-heroku
rest-core: like rack, except for RESTful clients
gemgem: easy gem building, without all the hoe dependency cruft

I just wish that gem-man read the README{,.md} in the root of a gem (like gemgem does) if that gem didn't have a manpage. Sounds like a good patch to send. That will help me understand how gems can access their own filesystem post-install.
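As a starting point for that patch, here is a minimal sketch of how the README lookup might work. It uses RubyGems' Gem::Specification.find_by_name and gem_dir to locate the installed gem. The helper names readme_in and gem_readme are made up, and none of this reflects how gem-man or gemgem actually work internally.

```ruby
require "rubygems"

# Return the path of README or README.md directly inside dir, or nil.
# A plain README wins over README.md when both exist.
def readme_in(dir)
  Dir.glob(File.join(dir, "README{,.md}")).sort.first
end

# Look up an installed gem by name and return its README path, or nil
# when the gem is not installed or ships no README at its root.
def gem_readme(gem_name)
  spec = Gem::Specification.find_by_name(gem_name)
  readme_in(spec.gem_dir)
rescue Gem::LoadError
  nil
end

# "rake" is only an example name; any installed gem would do.
puts gem_readme("rake") || "no README found"
```

The same gem_dir call is presumably how a gem could find its own files after install, which is the part I wanted to understand.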

Thought you might like to know.

And btw, rbenv is way nicer to interact with than rvm, at least that's been my experience over the last couple weeks.

15 December 2011

srev-taming: Migrating Subversion to Git

After reading about MediaWiki's pending svn => git migration, it sounded very familiar to me, because I was proposing such a conversion at work about a year ago.

Mass svn => git migration tooling doesn't really exist in the way I wish.

The most popular tool is probably svn2git. But even with all the conversion goodies baked into it, it's still based on git-svn.

Anything based on git-svn means that for an svn repo that has multiple projects in it, you have to take N passes over the repository, which is a huge hit for repos in the 20,000+ commit range.

The most performant conversion tool is svn-fe, but it doesn't do much more than import at the repo boundary: 1 svn repo => 1 git repo. And it doesn't even begin to deal with the multiple svn histories we have accumulated as projects migrated from one repo to another.

The closest thing that exists is a set of scripts posted as part of this thread.

Here is a rough cut of a project I wish existed:

"srev-taming" ~~ "svn-migrate"
(anagram, in the spirit of "snerp-vortex" ~~ "svn-exporter")

scan of SVN dump for projects & codelines => annotated projects & branches list
easy invocation of svn-fe for mass-import => generate me a command-line
editable auto-detected project tags & codelines => language that easily expresses projects, codelines & grafts
easy clone/filter-branch invocation to extract & stitch codelines together (based on projects & branches list and configured grafts)
post-import filter for author fixup, SVN & migration artifact removal => generation of starting author list & svn URL minifier
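The first item in that list, the scan, could start out as something like the sketch below. It assumes the conventional trunk/branches/tags layout and relies only on the Node-path headers of the SVN dump format. The scan_codelines name and the "(root)" label are made up, and the line-based matching is a heuristic: file content inside the dump that happens to start with "Node-path: " would fool it.

```ruby
require "set"
require "stringio"

PREFIX = "Node-path: "

# Optional project prefix, then trunk, branches/NAME, or tags/NAME.
LAYOUT = %r{\A(?:(.*?)/)?(trunk|branches/[^/]+|tags/[^/]+)(?:/|\z)}

# Scan an SVN dump stream and return { project => Set of codelines }.
# Open real dump files in binary mode ("rb"), since they contain file data.
def scan_codelines(dump_io)
  projects = Hash.new { |h, k| h[k] = Set.new }
  dump_io.each_line do |line|
    next unless line.start_with?(PREFIX)
    path = line[PREFIX.length..-1].chomp
    m = LAYOUT.match(path)
    next unless m
    projects[m[1] || "(root)"] << m[2]
  end
  projects
end

# Tiny hand-written sample standing in for a real dump.
sample = StringIO.new(<<DUMP)
Node-path: projA/trunk/src/main.c
Node-path: projA/branches/1.0/src/main.c
Node-path: projB/tags/v2/README
Node-path: docs/notes.txt
DUMP

scan_codelines(sample).each do |project, codelines|
  puts "#{project}: #{codelines.to_a.sort.join(', ')}"
end
```

The output of a pass like this is what I would want to hand-edit into the projects & branches list before any import happens.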

Unless I get explicitly assigned to do the migration, I guess it's not going to be a high enough priority for me to do anything about it. At least svn-fe is being maintained and enhanced.

But I'm posting it here to see if anything exists that I don't know about -- or in case it sparks some ideas for someone to create this thing.