Forking, Continued

21 January 2009

David Welton has posted a thoughtful reply to my comment. Unfortunately, he missed my point. I can only assume this is due to a lack of clarity on my part. (My comment was very brief.)

I mentioned the Network Graph and Fork Queue, but David mentioned neither. I suspect he doesn’t know what they are, probably because I didn’t explain them :)

So, to be clearer, let me propose an alternative to David’s workflow. He says, “Here’s a concrete example of how things might go wrong.” I see his example and think, “That’s a concrete example of how things have gone right.”

(I’m going to make this visual so you don’t have to take my word for it or keep jumping between here and GitHub to try it out.)

The Network Graph

root@fortrock:~# gem1.8 search -r actionwebservice

*** REMOTE GEMS ***

actionwebservice (1.2.6)
datanoise-actionwebservice (2.2.2)
dougbarth-actionwebservice (2.1.1)
nmeans-actionwebservice (2.1.1)

The problem is obvious: which actionwebservice should we use?

Well, we know three of those gems are on GitHub (the username prefix gives them away). And all three of the GitHub gems have higher version numbers than the RubyForge gem. So let’s check GitHub first.

I go to http://github.com/search and search for actionwebservice. (Actually I do this from my LaunchBar template, but you get the idea.)

In the results, I see that the first actionwebservice listed has the most forks and watchers, and the most recent activity.

We could stop right here and choose datanoise’s fork. But let’s be sure.

I click on the repo and arrive at http://github.com/datanoise/actionwebservice. I click on the Network tab.

I arrive at http://github.com/datanoise/actionwebservice/network and glance at the graph.

There are commits unique to certain forks, but datanoise has the most activity and the most recent commits.

At this point, there’s no doubt: datanoise is the most recent, most active version of actionwebservice. Will it always be? Who knows. But for now, this is the best choice.

The Fork Queue

We can take it a step further, too. We see there are unmerged commits – changes people have made which have not been pulled into datanoise. We can examine them straight from the Network Graph, or we can attempt to merge them into actionwebservice.
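If you'd rather look at unmerged commits from a terminal, something like the following should give you the same picture. This is only a sketch: `mainline` and `fork` are made-up local repositories standing in for the GitHub forks, and the file names and commit messages are invented so the script can run offline.

```shell
#!/bin/sh
# Sketch only: throwaway local repositories stand in for the GitHub forks.
set -e
work=$(mktemp -d)
cd "$work"

# Identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "mainline" plays the most active repository in the network.
git init -q mainline
cd mainline
git symbolic-ref HEAD refs/heads/master
echo "base" > lib.rb
git add lib.rb
git commit -q -m "Base"
cd "$work"

# "fork" plays someone else's fork with a commit mainline never pulled.
git clone -q mainline fork
cd fork
echo "fix" >> lib.rb
git commit -q -am "Fix only in this fork"
cd "$work"

# Mainline keeps moving in the meantime.
cd mainline
echo "feature" > feature.rb
git add feature.rb
git commit -q -m "Newer mainline work"

# Fetch the fork and list the commits it has that mainline lacks.
git remote add fork "$work/fork"
git fetch -q fork
unmerged=$(git log --oneline master..fork/master)
echo "$unmerged"
```

The `master..fork/master` range reads as "commits reachable from the fork but not from mainline", which is roughly what the unmerged commits on the graph represent.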

I fork datanoise’s actionwebservice using the ‘fork’ button, still on the Network Graph’s page.

Now I have my own version of actionwebservice.

Time to visit the Fork Queue. (Only people with write access to the repository see this tab.)

Okay, so it looks like a lot of the changes made will not apply cleanly.

If any of these commits had a green background, we’d be able to apply them right there on the site (as explained by the legend). But we can’t.

Pull Requests

Since we can’t apply the commits, we could send datanoise a message asking him to check over his Fork Queue and merge in the changes that look promising. We could do it ourselves, too, by resolving the conflicts. All we’d need is to clone our repository, add one of the others as a remote, then merge and fix the conflicts before pushing back to our version. After that we could even send datanoise a pull request, since we’ve done the work for him. Or just wait for him to check his Fork Queue and see that our changes are green.
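The do-it-yourself route could look roughly like this. Again, a sketch under assumptions: `mainline`, `other`, `myfork.git` and `local` are hypothetical local repositories playing the parts of the forked repository, another fork, our fork on the site, and our clone of it. The conflicting README lines are invented purely to force a conflict.

```shell
#!/bin/sh
# Sketch only: throwaway local repositories stand in for the GitHub forks.
set -e
work=$(mktemp -d)
cd "$work"

# Identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "mainline" plays the repository we forked.
git init -q mainline
cd mainline
git symbolic-ref HEAD refs/heads/master
echo "ActionWebService" > README
git add README
git commit -q -m "Base"
cd "$work"

# "other" plays a fork whose change will not apply cleanly.
git clone -q mainline other
cd other
echo "ActionWebService (other fork's fix)" > README
git commit -q -am "Fix from the other fork"
cd "$work"

# Meanwhile mainline changes the same line.
cd mainline
echo "ActionWebService 2.2.2" > README
git commit -q -am "Mainline change"
cd "$work"

# "myfork.git" plays our fork on the site; "local" is our clone of it.
git clone -q --bare mainline myfork.git
git clone -q myfork.git local
cd local

# Add the other fork as a remote and try to merge it.
git remote add other "$work/other"
git fetch -q other
git merge other/master || echo "merge stopped on a conflict"

# Resolve the conflict by hand, then conclude the merge.
echo "ActionWebService 2.2.2 (with the other fork's fix)" > README
git add README
git commit -q -m "Merge other fork and resolve conflict"

# Push the result back to our fork.
git push -q origin master
```

Once the merge commit is in our fork, the other fork's commit is part of our history, which is the point at which a pull request makes sense.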

I’m just going to delete my fork, though.

Keeping Current

Even ignoring the forking and the Fork Queue, the point is this: it’s not hard to see which project is the most active. Yes, we (GitHub) need to make it more clear. We want to say “this is the fork you’re looking for” on the first page you see. And we want that to change as the most active, latest fork changes. But, for the time being, you can figure that all out with a single click.

What happens when it does change, though? Perhaps datanoise will lose interest and someone else will take up development. That’s the beautiful part, and that’s why you just can’t do GitHub without Git (or any equally powerful DVCS): switching the remote and pulling in changes from the new repository is trivial. Git doesn’t care where you pull from. It is not married to a remote URL in the same way a centralized VCS like Subversion is.

Imagine hitting the datanoise repository and being informed “this may not be the most active repository in the network. Check Person X’s.” Switch your remote to the new one, pull, and you’re up to date. This is the type of information the Network Graph makes available; we just need to make it more visible and plain. (Heck, maybe that message could be printed out when you git pull from an inactive remote.)
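To show how little is involved in switching, here is a small sketch. The repositories `oldfork`, `newfork` and `checkout` are hypothetical local stand-ins for the quiet fork, the fork that took over, and your working copy; with real forks you would use their clone URLs instead of local paths.

```shell
#!/bin/sh
# Sketch only: throwaway local repositories stand in for the GitHub forks.
set -e
work=$(mktemp -d)
cd "$work"

# Identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "oldfork" plays the repository we have been tracking.
git init -q oldfork
cd oldfork
git symbolic-ref HEAD refs/heads/master
echo "v1" > CHANGELOG
git add CHANGELOG
git commit -q -m "Last commit before the old fork went quiet"
cd "$work"

# "checkout" is our working copy, cloned from the old fork.
git clone -q oldfork checkout

# "newfork" plays the fork that took over development.
git clone -q oldfork newfork
cd newfork
echo "v2" >> CHANGELOG
git commit -q -am "Development continues here"
cd "$work"

# Point our existing remote at the new fork and pull.
cd checkout
git remote set-url origin "$work/newfork"
git pull -q --ff-only origin master
```

Nothing about the working copy has to be rebuilt. The history is shared, so the pull is a plain fast-forward.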

For instance, hitting dougbarth’s actionwebservice fork and checking the Network Graph makes it obvious that datanoise is the repo we want.

If we were previously using dougbarth’s fork, it would be clear that datanoise’s is the one to watch in the future.

Moving Forward

It may seem strange, and perhaps even like a lot of work. “Why should I have to check to see which is the most current? In the old model, there’s always a canonical repository.”

In the old model, actionwebservice wouldn’t have made it past 1.2.6. Welcome to distributed version control.