2012: the year of UNcollaborative development, or: when GitHub kills Open Source

What happens when you get 2 developers working together, sharing their source? What about 10? … or 100?

There was a dream, 20 years ago, that the total would be greater than the sum of the parts. That developers could *re-use* each other’s code.

Sadly, that dream – in 2012 – is poisoned.

What I’m going to describe here happens a lot – although in absolute terms, I hope it’s just a drop in the ocean. Maybe it’s nothing to worry about. Or maybe … well. In the last 15-odd GitHub projects I’ve tried to use, it affected more than a third of them. A sample that small is statistically meaningless, of course – but if you look at the causes, I think it’s more likely part of a general trend – and that really is worrying.

So. What’s going on?

The curse of GitHub

I love GitHub, I’m a paying member (and I regularly sell it to clients and colleagues) but … in some ways, it’s IMHO actively preventing collaboration.

Just to be clear: it doesn’t have to be this way – you can run your own projects on GitHub and prevent this happening.

But … GitHub makes this the path of least resistance, and that means – in the world of Open Source – it’s the path most often followed.

When you fix a bug on GitHub, you have to wait for the original project author to “accept” your fix.

If they don’t accept it, as far as collaboration goes: you’re screwed. There is no “plan B” for collaboration.

Your only option is to tell the world:

“Stop using his project! It sucks! Use my project instead! I promise I’ll be a better merger!”

But then … if *you* stop accepting fixes for a while, one of the developers fixing YOUR bugs will have to do the same thing.

And each of these “Stop! Use mine instead!” calls is one-way: once another developer who’s making use of the source moves to a sub-fork, they can never go back. In theory, the original Author could do a back-dated merge … but in reality, that won’t happen, because of the cost involved:

Back-dated merging is combinatorially expensive

In practice, that’s more expensive than a normal person can afford, in terms of time and effort.

For each SubAuthor they want to back-merge with, they have to check every single change that person has made … against every change that they’ve merged already, from every single source. Otherwise they break the previously-merged code. Usually, each individual SubAuthor makes an incompatible change sooner or later – and so prevents the original Author from ever merging with them.
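To get a feel for how quickly that checking grows, here is a rough sketch in Python. It is a simplified model, not a measurement: it assumes every incoming change has to be compared once against every change already merged, and the function name and the numbers are made up for illustration.

```python
def back_merge_checks(already_merged, fork_change_counts):
    """Count change-vs-change comparisons needed to back-merge several forks.

    Assumption (hypothetical model): each incoming change is checked once
    against every change that is already in the Author's codebase.
    """
    checks = 0
    merged = already_merged
    for count in fork_change_counts:
        # Every change in this fork is compared with everything merged so far.
        checks += count * merged
        # Once accepted, this fork's changes become part of the baseline.
        merged += count
    return checks


# An Author with 50 merged changes who back-merges three forks of 20 changes each.
print(back_merge_checks(50, [20, 20, 20]))  # prints 4200
```

Sixty incoming changes turn into thousands of comparisons, and the baseline keeps growing with every fork that gets merged.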

It’s no surprise – usually by this point the SubAuthor has given up on the original Author (can you blame them? The Author has disappeared and ignored merge requests for months or years by this point).

So, in practice, very few GitHub authors (so far: none that I’ve seen) re-merge SubAuthor projects once the SubAuthor has really got going. On the projects I’ve been involved in, when a popular SubAuthor disappears for a while, there’s been a desperate scramble by the SubSubAuthors to find the guy/gal and beg/bribe/bully them into merging – otherwise we know that our combined efforts are about to be blown up.

What? Well …

The actions of the Author can undo the work of the Collaborators

Say you have Author “A”, and 3 people making changes and fixes to the code (“B”, “C”, and “D”).

At first, while A accepts merges quickly, B, C and D are all sharing code together – in practice, they are collaborating. However, they are not truly sharing code – GitHub does not allow this – they are sharing code with a Master (A), who is forwarding their work to all 3 of them.

When A disappears, B, C and D can no longer collaborate. If A disappears with merges pending … then B/C/D find they have 3 distinct codebases, and no way within GitHub to do a simple cross-merge.

Now, the situation is not lost – if B, C, and D get in contact (somehow) and negotiate which one of them is going to become “the primary SubAuthor” (somehow), and they issue manual patches to each other’s code (surprisingly tricky to do on GitHub) … then they can resume collaboration. I’ve done this myself – it works. But it’s massively more complex than the process they were using before, which was *one-click-merge*.
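The hub-and-spoke shape of that arrangement can be sketched as a tiny graph model in Python. This is only an illustration under stated assumptions: the graph, the function names and the `offline` set are hypothetical, and "connected" here just means "there is a chain of one-click merges between two people".

```python
from collections import deque
from itertools import combinations


def connected(graph, start, goal, offline):
    """Breadth-first search that refuses to route through offline people."""
    if start in offline or goal in offline:
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt not in offline:
                seen.add(nxt)
                queue.append(nxt)
    return False


def collaborating_pairs(graph, people, offline=frozenset()):
    """List the pairs of people whose work can still reach each other."""
    return [(x, y) for x, y in combinations(people, 2)
            if connected(graph, x, y, offline)]


# Author A is the hub; B, C and D only ever merge with A.
star = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}

print(collaborating_pairs(star, ["B", "C", "D"]))
# [('B', 'C'), ('B', 'D'), ('C', 'D')]
print(collaborating_pairs(star, ["B", "C", "D"], offline={"A"}))
# []
```

With A active, every pair is connected; take A away and no pair is, even though nobody else changed anything.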

In practice, at this point B/C/D will stop collaborating. Sad, but true. This happens over and over again on GitHub projects – when a SubAuthor arises, the other collaborators stop collaborating and become new SubAuthors in their own right.

Often it feels like watching a religion split, with each of the senior priests declaring themself “the new Prophet”, and going forth to spread (their) word…

Net effect: GitHub may be killing open-source projects

In theory, GitHub is wonderful.

But the combination of its bad design around some core use-cases, and its intransigence when it comes to the VERY common case of a single person disappearing … have led to the point where I believe it’s killing projects. This is a gross generalization – and not every project that loses its Author will get this problem – but I’ve encountered more and more “dead” projects on GitHub over the course of 2011.

Of course … the way GitHub is designed, *those projects do not appear to be dead*. Often they appear to be very much “alive” – there’s tonnes of activity.

But all that activity is going on in radically different and massively incompatible forks. It’s wasted time and energy, it’s programmers fixing the same bugs – multiple times – because they are NOT collaborating any more.

In the case I cited at the start, 100-plus developers have (probably) re-written the same fixes for the same problems.

i.e. the total effect of this project is tending towards ONE HUNDRED TIMES less than the sum of its parts.

Note: LESS … not more!

There’s some value there, still – anyone can come along and start from the original project and make their own fork. But it’s a sad and sorry fraction of what the Open Source world dreamed of when the word Collaboration was fresh and exciting.

This is UnCollaboration. And it’s becoming depressingly common.

26 replies on “2012: the year of UNcollaborative development, or: when GitHub kills Open Source”

True.

But not using GitHub (or similar) is even worse: there are patches I submitted to dead projects on, e.g., Google Code. I never heard back from the maintainer, and the project is effectively unmaintained.

So what is better? Forking on github or throwing away the patch because the maintainer does not maintain?

Thank you for this post and for pointing out the issue. What I miss here is that it is probably a problem of distributed version control systems in general. As soon as there is one big central repository, it boils down to central version control with all its problems.

But a DVCS still eases contributions from “primary SubAuthors”. That requires proper management from the project owner(s), though, so one is left with a people problem. At least for the Linux kernel this seems to work.

Maybe github/bitbucket/etc. could focus more on these SubAuthor managing problems.

It would seem that an ideal fix would be if GitHub would allow a project owner to designate a new project lead.
This would solve a lot of issues you raise:
a) The SubAuthors will not have to decide between them who is the “lead”, which is hard due to human tendencies.
b) GitHub can transfer pointers internally so that it can continue to be point and click for every committer.
c) GitHub will actually save by not having to host multiple repositories
d) Many owners will be happy to set up a successor, especially if this will allow them to resume being the center of the code if later they wish to.
e) Newcomers to the project will be able to continue to know which is the “main” fork.

I find this to be the biggest issue. There are so many forks of many projects, that there is no way to find what you are looking for. This is especially bad as many of these forks are not intended to become their own projects – they are just conveniences to allow pushing to the host.

If GitHub does not do this, BitBucket or Assembla should – they are clearly looking for an initiative with which to start creating communities.

Looking at a single project, this problem can be very painful. But from a distance, looking at the whole world of open source, it is not so bad, I think.
The simple answer is that only projects with wise and proficient authors will survive. A lot of projects (probably most of them) are doomed to die, disappearing into the dust. But out of that large majority of forgotten projects, a few (the really good ones) rise from the ashes, become more and more popular, and probably migrate to their own web sites.
Obviously GitHub could be much better, but this is only a GitHub problem.

While I think I understand your point of view, mine is a bit different…

The pull request and forking functionality greatly simplifies a prospective contributor’s entry into a project. It has happened on a few occasions that I fix a bug/add a feature to a project and, instead of going through the whole “find project page/bug tracker/mailing list, contact developer, submit patch”-process (as I admittedly should have), I simply don’t. With pull requests this whole process is reduced to a few clicks on GitHub. If a non-GitHub project is dead or stagnating, your patches would also lie on the bug tracker/mailing list, unused. It is the same “use it, don’t use it” scenario that we had before GitHub, but simpler.

Furthermore the network graph allows one to easily see work being done on forks of any given project. Before GitHub (and distributed version control systems) contributors had their own checkouts/clones on their private machines and had no “simple” way to share their changes. With GitHub you simply push your changes and it will appear in the *original* project’s network graph. It is therefore quite simple for ad hoc (albeit uncoordinated) collaboration or at least finding similar changes by other people. Again, this is the scenario we have always had, but simpler thanks to GitHub.

My view is thus that, while GitHub does not really solve the problems you have identified, it certainly did not create them and, in fact, makes them simpler to deal with.

I completely agree: GitHub encourages one man shows and forks and discourages collaboration. However, I doubt they want it that way, they made organisations to enable collaboration, and it works very well. The problem is that few projects use them (mostly large ones like jquery and clojure).

What differentiated GitHub from all the other code hosting sites from day one was that they focus on the person, not the project. That’s pretty cool.

However, that’s also their curse. So far, I always went the pull request way instead of forking or starting my own project. But I always thought that has put me at a disadvantage:

People (e.g. potential employers) looking at my GitHub projects will see that I’ve forked another project, but many users fork projects without ever contributing, so nobody knows whether I’m a developer of that project. They don’t see at a glance which organisations I’m part of. In short: They don’t see what projects I’m actually working on. All they see is the projects I started.

On the other hand, if people are looking at projects, they see the founder, not the collaborators. What if one of the collaborators did far more work than the founder? They need to dig in to the history to figure that out.

Here’s what I think GitHub could do to solve these problems:
1. Make it easier to move a project from an individual user to an organisation – no forks with “this project has moved” READMEs required.
2. Display the collaborators of a project prominently on the project page.
3. On user pages, display the projects users are collaborating on as prominently as the projects they’ve founded.

All of these sound like valid issues that github could address and make their service even better – but I don’t think it rises to the level of “kill open source” by any means.

OSS projects seem to follow a power law distribution with a few getting lots of attention/success, and a long tail of failed projects (look at the sourceforge wasteland) – just like the evolutionary tree of life…

Everything you describe is predicated on one condition: Bad ownership. The easiest way to avoid this problem is to be a good steward of your project – and that’s always been what made the biggest difference in OSS (after having users who actually need your sw).

In short: github as it is, is just how the OSS dynamic is playing out today. Just as it’s morphed to this model, it’ll continue to adapt. SW wants to be free, people want to collaborate. Life finds a way :) Leave off the sky-is-falling rhetoric and this is a great list of improvements.

“Bad ownership. The easiest way to avoid this problem is to be a good steward of your project”

Your “good steward” argument only works with hierarchical, controlled, non-chaotic (or dictatorial) development models. Which is the opposite of most of the development I see happening with Git (pace Linux).

I believe the single biggest problem with GitHub now is that it disempowers the multitude working on a project, and gives them no way to share power, share authority. Especially if the original author disappears.

The project URL – the one that others will have quoted all over the internet – will ALWAYS point to the original Author’s dead, unsupported, out of date code. And no matter how much the fork’ers improve their own copies, they can never change that.

This post doesn’t even *mention* the organizations support fhd mentioned. As soon as a project gets big enough to have that many potential contributors (which is a good problem to have), then the project should be transitioned to living under a GitHub organization.

I do think GitHub could do a better job of encouraging this, however.

This problem has nothing to do with GitHub. Trust me, this problem existed way before Git or GitHub were written. Consider these two facts:

– All Open Source projects need centralized maintainers. (Without maintainers, every codebase would quickly become overrun by spam.)
– Open Source Maintainers sometimes get tired of doing the unpaid work of being a maintainer.

Luckily, there are plenty of simple solutions:

1) Ask the original author to update their README to point to your version.
https://github.com/wycats/bundler

2) Create an “organization” to own the code. The organization can have a list of maintainers with the “commit bit”. In extreme cases, the password to this organization can be passed from maintainer to maintainer.
https://github.com/rails

3) Set up a web page that tells people where the code is. (GitHub can even host it.) This doesn’t solve the problem, but at least provides an indirect pointer.
http://gembundler.com/

> When you fix a bug on GitHub, you have to wait for the original project author to “accept” your fix.

No, that was the old way in SVN. With git, no fork is “better” than any other fork. The idea of “maintainer” is entirely a social convention. If you maintain the code, you are the maintainer. What do you do if there are a lot of forks and no maintainer? Well, if it’s important to you, you start maintaining it.

> I think it’s more likely part of a general trend

The vast majority of open source projects are not going to become so popular that someone pays the maintainer to maintain them. Therefore, the vast majority of projects are headed for abandonment sooner or later. (See Sturgeon’s Law.)

You only noticed the problem because GitHub made it *easier* for others to pick up the ball and run with it. So there is less “complete abandonment” of projects. This is a good thing.

(This post is especially ironic considering your previous post: “T-Machine may briefly disappear”. Hmm, did someone forget to do some maintenance? If only I could fork your site..)

@BraveNewCurrency

If you could “ask the original author” *anything* … then this problem wouldn’t exist. This all starts with the point where the original author won’t even spend the time to click an “accept” button.

If you think “no fork is better than any other fork”, you’re not using the GitHub website, which very clearly gives different weightings to different forks, as a side effect of the UX design.

Finally … the T-Machine post was because a previously high-quality Domain company stopped accepting credit cards. I fail to understand what that has to do with maintenance tasks.

I disagree with the article.

Of course, many Github projects may not offer active collaboration, but that’s not the same as killing open source. It’s actually still supporting open source by making it publicly available.

The point is that you need active committers/mergers for any project to work. For example, Drupal CMS has a well-established and active core commit culture. And they are thriving, using custom Git servers to maintain core and extensions.

Github offers anyone the tools to build up an active collab culture. But this takes time and effort, and many smaller projects simply lose the maintainer’s interest because they moved on to other projects. That’s not Github’s fault.

@Morningtime

I think you mis-read the post.

This:

“many smaller projects simply lose the maintainer’s interest because they moved on to other projects. That’s not Github’s fault.”

…is not what we’re talking about. We’re talking about what happens AFTER the maintainer “moves on” – and what happens then is very much GitHub’s fault, for good or bad.

ya. you don’t like it. fork it. simple. Sometimes applying a patch doesn’t mean the patch is working.

One thing it feels like you’re missing is that local Git offers far more possibilities. Don’t lock yourself into GitHub’s user interface – if you need to do something more complicated, you can always pull in a number of other people’s remotes and cherry-pick/merge to your heart’s content.
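As a rough sketch of that local workflow, the Python helper below only builds the list of Git commands you would type; it does not run anything. The user names, the repository name and the commit id are hypothetical, and the branch name `master` is an assumption about how the forks are laid out.

```python
def cross_merge_commands(repo, forks, branch="master", picks=()):
    """Build (but do not run) the Git commands for pulling in other forks.

    repo   -- repository name shared by all forks (hypothetical)
    forks  -- GitHub user names whose forks you want to pull in (hypothetical)
    branch -- branch to merge from each fork (assumed to be "master")
    picks  -- commit ids to cherry-pick afterwards (hypothetical)
    """
    commands = []
    for user in forks:
        # Register each fork as a named remote, then download its history.
        commands.append(f"git remote add {user} https://github.com/{user}/{repo}.git")
        commands.append(f"git fetch {user}")
    for user in forks:
        # Merge that fork's branch into whatever is currently checked out.
        commands.append(f"git merge {user}/{branch}")
    for sha in picks:
        # Or take single commits instead of whole branches.
        commands.append(f"git cherry-pick {sha}")
    return commands


for line in cross_merge_commands("some-project", ["dev-b", "dev-c"], picks=["abc1234"]):
    print(line)
```

Run by hand in a local clone, these are the same steps the one-click merge hides; conflicts between the forks still have to be resolved manually.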

The problem starts with the way github puts the developer / organisation first; it is designed to put people before projects, which is good for the ego, but bad for the users / projects / packagers / collaboration, eventually.

It’s actually sad that bzr / Launchpad didn’t catch on, they get a number of the things you mention right (focus on the project, had project-centric teams from the very beginning); the only thing I don’t like about Launchpad is that it is not exactly obvious how you get to the code itself, even if it takes, like, 3 clicks.

No one can guard against all project members losing interest, but with teams you have better chances to get a response, and, maybe, be designated the new project driver if you want.

Part of running a “healthy” software project is ensuring that the project (not just the code) is well managed and maintained. GitHub makes this easier, but it isn’t going to do it for you. If there is only *one* maintainer, and he/she is hit by a bus, goes into a drug-induced coma, or simply stops caring, then tough luck.

…still, a more “project centric” interface would be nice, but since supporting open-source projects is not GitHub’s primary business (as it is with Launchpad, etc.) I don’t think this will happen.

@Alex – your comment is technically true, but it goes against the grain of Open Source, and if we all believed it, there would be very little open source software. IMHO it would be a true and rapid death of OSS. OSS was founded (philosophically) on the long-tail: anything you make that is useful, you should share the source.

Not “only a handful of projects, with rich backers, can afford to exist”.

UPDATE:

A project I contribute to is actively missing out on programmers and users BECAUSE THE ORIGINAL AUTHOR WON’T DO ANYTHING (he ignores twitter @messages, he ignores emails, he ignores the project).

Forks on GitHub *are not allowed* to have:

– Documentation
– Issue Tracker

…so the whole project is gradually going FUBAR, because of GitHub’s top-down, command-and-control approach to projects :(.

We *cannot fix* items in the tracker, and we *are not allowed* to have our own tracker. WTF is a project supposed to do?

Ah, correction: we could run a separate, independent tracker – I couldn’t find the option, but a colleague just showed me where to find it :)

Still getting dozens of bug reports that *are already fixed* in the main fork, but no-one can see that, because the original author has disappeared :(

Great article. I’m looking for a way to search all the commits in https://github.com/pinax/django-notification/network for anything to do with a Django 1.4 fix. It seems the only way to do this is manually via the network graph. You used to be able to see commits on forks via the ‘Fork Queue’ but I never got to play with that. I also found http://dev.choonkeat.com/branchesapp/ which looks interesting, but still doesn’t really solve the problem of being able to pick all the cool new features off forks when I find them. There are 176 forks for the repo above… I’d love to hear others’ workflows for finding gems in forks when you’re not the root parent fork.

Many thanks,

Nathan
