Staging: What Stack Overflow Can Learn from Git and Garbage Collection

Well, there’s been another I Hate Stack Overflow post on reddit (The decline of Stack Overflow: How trolls have taken over your favorite programming Q&A site), along with some aftershocks. I wrote my own essay on the subject a few months ago (My Love-Hate Relationship with Stack Overflow: Arthur S., Arthur T., and the Soup Nazi), and have had time to ponder things. Unlike Mr. Slegers’s blog post, I don’t believe the purely negative rant is useful without some shred of suggestion for improvement. Hence this post. (It’s here, rather than on embeddedrelated.com because Stephane rightly prefers technical content, so I’m trying to move the really tangential stuff to my personal website.)

I’ve been coming up the Git learning curve over the last few months. While I find a good chunk of the syntax annoying because of its non-intuitiveness and inconsistency (git reset --hard, git branch -m… the fact that git add has to be done on every commit, so shouldn’t it be called git stage? And WTF is rebase, really?), there is one feature I like. Well, most of the time I do.

Git separates the selection of files for the next commit from the commit itself. This is different from Mercurial or SVN, which by default commit the changes in all tracked files. So with Git, you have to get in the habit of telling Git before you commit, “Hey, you stupid git! These are the files I’m going to commit, when I get around to committing changes!” by using git add or your favorite UI. It’s called staging. And the reason it’s so useful is that it helps partition changes into individual pieces, and because it’s an explicit step, it makes it less likely you’ll accidentally commit unwanted changes to the repository. You can even split the changes in a single file into separate commits, by staging or unstaging “hunks” (particular sections of a file).

Staging. Huh.

It got me thinking. And I had a flash of insight this morning. Stack Overflow has this problem caused by the dual use of the site. As I said in my article:

Well? So what if not everyone writes perfect questions that meet all the guidelines for Quality? So what? I mean really, what are the consequences here? There’s a tension here, because there are two completely different answers depending on which of the two purposes you think is most important.

If it’s the long-term Q&A archive, of course it matters! You want to keep around good questions and answers. You don’t want the bad ones. Get rid of them. Strive for Quality.

If it’s the short-term help for programming questions, on the other hand, the No Soup for You approach just hurts people. It hurts people who don’t know all of the rules. It hurts people who make the occasional error (as we all do). And it hurts people who are doing nothing wrong but are subject to an errant judgement by someone with power.

So here’s the insight: SO’s problem is that all questions are dumped into one bucket. When you post a question, it has the same behavior as a 5-year-old question. People can upvote, downvote, comment, vote to close, edit, you name it. So the same standards for long-term questions are applied to short-term questions. And the people who get pissed off at the low quality of site questions get angry and impatient and vote to close after a few minutes. The Soup Nazis win: grovel and follow the rules for asking a good, long-term curated question, or No Soup for You!

Instead, the site should be following Git’s lead by using a staging area, partitioning the questions into new ones and long-term ones. It’s a lot like the Generational Garbage Collection used in Java and other languages… which I already mentioned in my Soup Nazi article. Oh. Drat. I forgot.

Except I really didn’t finish my point. I just alluded to generational garbage collection and implied it was some kind of behind-the-scenes mechanism, like GC as seen by the programmer, where all memory looks the same from the programmer’s point of view. If I have some object in Java, all this generational stuff happens in the VM, where I don’t see it. I never know whether an object is part of the ephemeral generation or the long-lived generation, and I don’t care. Technically there doesn’t have to be generational garbage collection at all. It could even be handled by elves at the North Pole or garden gnomes or secret mutant squirrels. Don’t care. Memory is memory, just as long as it works. (That’s abstraction for you.)

In SO, explicit partitioning in the style of Git staging would be a good thing. We should know whether a question is a new question, or a curated question. Promoting a question from new to curated should be a very visible process, based on some criteria (upvote and downvote count, input from high-rep users, etc.): when enough people think a question is good, it should be promoted to the bucket of curated questions, and it should be possible for Stack Overflow users to look only at the new questions or only at the curated questions. Two buckets. If you’re a SO curmudgeon who cares about curation: forget the new questions, they’re just noise, look at the curated ones, spend your effort there. Want to help people? Then look at the new questions. Yeah, there’s going to be noise. Tough. Them’s the breaks. Some of the new questions don’t deserve to be curated. That’s ok. But they shouldn’t be unceremoniously closed just because someone thinks they’re not appropriate material for curation. That just alienates people, both the newcomers to the site and some of us oldtimers.
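To make the two-bucket idea a bit more concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the Question fields, the Bucket names, and especially the thresholds (MIN_NET_SCORE, MIN_ENDORSEMENTS) are my own assumptions, not anything Stack Overflow actually does. The point is just that promotion is an explicit, visible step, and that a question which isn't promoted simply stays in the new bucket instead of getting closed.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    NEW = "new"          # the "staging area": fresh, uncurated questions
    CURATED = "curated"  # the long-term archive


@dataclass
class Question:
    title: str
    upvotes: int = 0
    downvotes: int = 0
    high_rep_endorsements: int = 0  # hypothetical "input from high-rep users"
    bucket: Bucket = Bucket.NEW     # every question starts out as new


# Hypothetical thresholds, picked only for this example.
MIN_NET_SCORE = 5
MIN_ENDORSEMENTS = 1


def should_promote(q):
    """Return True if a question meets the (assumed) promotion criteria."""
    net_score = q.upvotes - q.downvotes
    return net_score >= MIN_NET_SCORE and q.high_rep_endorsements >= MIN_ENDORSEMENTS


def promote_eligible(questions):
    """Move eligible new questions to the curated bucket; return those promoted."""
    promoted = []
    for q in questions:
        if q.bucket is Bucket.NEW and should_promote(q):
            q.bucket = Bucket.CURATED
            promoted.append(q)
    return promoted


def view(questions, bucket):
    """Let a user look at only one bucket at a time."""
    return [q for q in questions if q.bucket is bucket]
```

A curmudgeon would call view(questions, Bucket.CURATED) and never see the noise; someone who wants to help would call view(questions, Bucket.NEW). Note there is no "close" operation anywhere in the sketch: failing to qualify for curation is not treated as an offense.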

In any case, I think SO has to do something soon. Otherwise the atmosphere will drive out people with good will and the remaining site users will consist of the n00bs and the Soup Nazis.

What do you think?