Categories
advocacy community programming

2012: the year of UNcollaborative development, or: when GitHub kills Open Source

What happens when you get 2 developers working together, sharing their source? What about 10? … or a 100?

There was a dream, 20 years ago, that the total would be greater than the sum of the parts. That developers could *re-use* each-other’s code.

Sadly, that dream – in 2012 – is poisoned.

What I’m going to describe here happens a lot – although in absolute terms, I hope it’s just a drop in the ocean. Maybe it’s nothing to worry about. Or maybe … well. In the last 15-odd GitHub projects I’ve tried to use, it affected more than a third of them. Such tiny stats are statistically meaningless, of course – but if you look at the causes of this, I think it’s more likely part of a general trend – and that really is worrying.

So. What’s going on?

The curse of Github

I love GitHub, I’m a paying member (and I regularly sell it to clients and colleagues) but … in some ways, it’s IMHO actively preventing collaboration.

Just to be clear: it doesn’t have to be this way – you can run your own projects on GitHub and prevent this happening.

But … GitHub makes this the path of least resistance, and that means – in the world of Open Source – it’s the path that gets most followed

When you fix a bug on GitHub, you have to wait for the original project author to “accept” your fix.

If they don’t accept it, as far as collaboration goes: you’re screwed. There is no “plan B” for collaboration.

Your only option is to tell the world:

“Stop using his project! It sucks! Use my project instead! I promise I’ll be a better merger!”

But then … if *you* stop accepting fixes for a while, one of the developers fixing YOUR bugs will have to do the same thing.

And each of these “Stop! Use mine instead!” calls is one-way: once another developer who’s making use of the source moves to a sub-fork, they can never go back. In theory, the original Author could do a back-dated merge … but in reality, that won’t happen, because of the cost involved:

Back-dated merging is combinatorially expensive

In practice, that’s more expensive than a normal person can afford, in terms of time and effort.

For each SubAuthor they want to back-merge with, they have to check every single change that person has made … against every change that they’ve merged already, from every single source. Otherwise they break the previously-merged code. Usually, each individual SubAuthor makes an incompatible change sooner or later – and so prevents the original Author from ever merging with them.

It’s no surprise – usually by this point the Sub Author has given up on the original Author (can you blame them? the Author has disappeared and ignored merge requests for months or years by this point)

So, in practice, very few GitHub authors (so far: none that I’ve seen) re-merge SubAuthor projects once the SubAuthor has really got going. On the projects I’ve been involved in, when a popular SubAuthor disappears for a while, there’s been a desperate scramble by the SubSubAuthors to find the guy/gal and beg/bribe/bully them into merging – otherwise we know that our combined efforts are about to be blown up.

What? Well …

The actions of the Author can undo the work of the Collaborators

Say you have Author “A”, and 3 people making changes and fixes to the code (“B”, “C”, and “D”).

At first, while A accepts merges quickly, B, C and D are all sharing code together – in practice, they are collaborating. However, they are not truly sharing code – GitHub does not allow this – they are sharing code with a Master (A), who is forwarding their work to all 3 of them.

When A disappears, B C and D can no longer collaborate. If A disappears with merges pending … then B/C/D find they have 3 distinct codebases, and no way within GitHub to do a simple cross-merge.

Now, the situation is not lost – if B, C, and D get in contact (somehow) and negotiate which one of them is going to become “the primary SubAuthor” (somehow), and they issue manual patches to each other’s code (surprisingly tricky to do on GitHub) … then they can resume collaboration. I’ve done this myself – it works. But it’s massively more complex than the process they were using before, which was *one-click-merge*.

In practice, at this point B/C/D will stop collaborating. Sad, but true. This happens over and over again on GitHub projects – when a SubAuthor arises, the other collaborators stop collaborating and become new SubAuthors in their own right.

Often it feels like watching a religion split, with each of the senior priests declaring themself “the new Prophet”, and going forth to spread (their) word…

Net effect: GitHub may be killing open-source projects

In theory, GitHub is wonderful.

But the combination of its bad design around some core use-cases, and its intransigence when it comes to the VERY common case of a single person disappearing … have lead to the point where I believe it’s killing projects. This is a gross generalization – and not every project that loses its Author will get this problem – but I’ve encountered more and more “dead” projects on GitHub over the course of 2011.

Of course … the way GitHub is designed, *those projects do not appear to be dead*. Often they appear to be very much “alive” – there’s tonnes of activity.

But all that activity is going on in radically different and massively incompatible forks. It’s wasted time and energy, it’s programmers fixing the same bugs – multiple times – because they are NOT collaborating any more.

In the case I cited at the start, 100-plus developers have (probably) re-written the same fixes for the same problems.

i.e. the total effect of this project is tending towards ONE HUNDRED TIMES less than the sum of its parts.

Note: LESS … not more!

There’s some value there, still – anyone can come along and start from the original project and make their own fork. But it’s a sad and sorry fraction of what the Open Source world dreamed of when the word Collaboration was fresh and exciting.

This is UnCollaboration. And its becoming depressingly common.