Experimenting with Git at Slide (Part 1/3)

For the past two months I've been experimenting, with varying levels of success, with Git inside of Slide, Inc. Currently Slide makes use of Subversion and relies heavily on Subversion branches for everything from project-specific branches to release branches (branches that can live anywhere from under 12 hours to three weeks). There are plenty of other blog posts about the pitfalls of branching in Subversion that I won't go into here; suffice it to say, it is...sub-par. Below is a rough diagram of our current general workflow with Subversion. (I've had some other developers ask me "why don't you just work in trunk?", to which I usually wax poetic about the chaos of trunk when any project gets over 5 active developers; Slide engineering is somewhere between 30 and 50 engineers.)

There's always a catch
Subversion at Slide
Toying with Git
Git at Slide

There's always a catch

There are three major problems we've run up against in using Subversion as our version control system at Slide:

Subversion's "branches" make context switching difficult
Depending on the age of a branch cut from trunk/, merges and maintenance range from difficult to impossible
Merging Subversion branches into each other causes a near total loss of revision history

Given that branches are a critical part of Slide's development process, we've historically looked at branch-strong version control systems, such as Perforce, as alternatives. Before I joined Slide in April of 2007, I was a heavy user of Perforce for my own consulting projects as well as for some of my work with the FreeBSD project as part of the Summer of Code program. In fact, my boss sent out a "Perforce Petition" to our engineering list on my third day at Slide...we still haven't switched away from Subversion.

I hadn't given it a second thought until earlier this year, when the team I was working with grew to the point that, between me and four other engineers, we were pushing a release anywhere from one to three times a week. That meant we were creating a Subversion "branch" multiple times a week, and a significant part of my daily routine became merging to our release branch and refreshing project branches from trunk/. All of a sudden Git was looking prettier and prettier, despite some of its warts. At this point I was already using Git for some of my personal projects that I never have time for, so I knew at the bare minimum that it was functional. What I didn't know was how to deploy and use it with a large engineering team that works in very high-churn, short iterations, like Slide's.

Subversion at Slide

Moving our source tree out of Subversion and into any other system was destined to be painful. The tree at Slide is deceptively large: we have a substantial amount of Python running around (as Slide is built, top-to-bottom, in Python), an incredible amount of Adobe Flash assets (.swf files) and Adobe Illustrator assets (.ai files), and plenty of other binary files, like images (.png/gif/jpeg). Currently a full checkout of trunk/ is roughly 2.5GB including artwork, Flash, server and web application code. We also have roughly 88k revisions in Subversion, the summation of three years of the company's existence. Fortunately somebody along the line wrote a script (in Perl, however) called "git-svn(1)" that is designed to do exactly what I needed: move a giant tree from Subversion to Git, from start to finish (similar to svn2p4 in Perforce parlance).

Toying with Git

When I first ran the command `git-svn init $SVN`, I let it run for somewhere close to 6-7 hours before it completed, and I was shocked at the size of the generated repository. If our Git repository were left unpacked, .git/ alone would be close to 9GB; with the actual code on top of it, ~11GB. I decided that maybe packing this repository would be a good idea, so I ran `git gc` and went to grab a coke from the fridge ... and the machine ran out of memory. One of our quad-core, 8GB RAM, shared development machines ran out of memory?!
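The import-then-pack sequence described above can be sketched as a small Python wrapper. This is only an illustration under stated assumptions: `git svn clone` (which performs `git svn init` followed by `git svn fetch`) and `git gc` are real commands, but the helper names, the placeholder URL and the target directory are made up for the example and are not what was actually run at Slide.

```python
import subprocess


def import_commands(svn_url, target_dir):
    """Return the commands for a full Subversion import followed by a repack."""
    return [
        # `git svn clone` runs `git svn init` and then `git svn fetch`.
        ["git", "svn", "clone", svn_url, target_dir],
        # Pack loose objects; `-C` runs the command inside target_dir.
        ["git", "-C", target_dir, "gc"],
    ]


def run_import(svn_url, target_dir, dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    for cmd in import_commands(svn_url, target_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical URL and directory, shown as a dry run only.
    run_import("https://svn.example.com/repo", "repo-git")
```

Keeping the default as a dry run seemed sensible for a sketch, since the real import took hours on a tree of this size.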

After lurking in #git on Freenode for a while, I determined two things:

Apparently nobody uses Git for projects this large
Git was retaining too much memory, like a memory leak, but just don't call it a memory leak.

To compound this, the rules for memory usage with Git are vastly different between a 32-bit machine and a 64-bit machine, and because we're just that cool, we're using 64-bit machines across the board. The amount of memory Git decides to keep resident while doing repository-intensive operations like `git gc` is 256MB on 32-bit machines and 8GB on 64-bit machines. As these machines are shared between developers, we make use of ulimit(1). When you limit memory usage with ulimit(1), it restricts address space, meaning both virtual and resident memory. When Git tried to mmap(2) gigabytes of address space to do its operations, the kernel stepped in and started returning ENOMEM to Git, which promptly exited.
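The ulimit/mmap interaction described above can be reproduced without Git at all. The following is a minimal sketch, assuming a 64-bit Linux machine with Python 3.7+: a child process caps its own address space with `RLIMIT_AS` (the limit behind `ulimit -v`) and then asks for a mapping larger than the cap. The 1 GiB cap and 4 GiB request are arbitrary illustration values, not Slide's actual settings, and `mmap_under_limit` is a made-up helper name.

```python
import subprocess
import sys

# Script run in a child process so the limit never affects the caller.
CHILD = r"""
import errno, mmap, resource

limit = 1 << 30  # 1 GiB address-space cap (illustrative value)
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
try:
    mmap.mmap(-1, 4 << 30)  # request 4 GiB of anonymous memory
except OSError as exc:
    # Report the symbolic errno name, e.g. ENOMEM.
    print(errno.errorcode.get(exc.errno, str(exc.errno)))
else:
    print("mapped")
"""


def mmap_under_limit():
    """Run the child and return what happened to the oversized mmap."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(mmap_under_limit())
```

On Linux the mapping should be refused with ENOMEM even though no memory has been touched yet, which is the same failure mode Git hit: the limit applies to address space requested, not to memory actually used.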

After raising this enough times, I finally caught spearce who was able to identify the problem and supply a patch that fixed the memory allocation issues with Git and a repository of Slide's size. First obstacle overcome, now I could actually test a Git workflow inside of Slide.

Git at Slide

Now that I could pack the repository on our development machines, I could get .git/ down to a reasonable 3GB, making an entire tree ~5.5GB (far more manageable than 11GB). Despite Git being a decentralized version control system, we still needed some semblance of centralization to enforce a few basic rules for a sane workflow:

A centralized place to synchronize distributed versions of the repository
Changesets cannot be lost, ever.
QA must not be over-burdened when testing releases

This meant we needed a centralized, secure repository, which left us two options: Git over WebDAV (https/http) or Gitosis. After discovering that `git-http-push(1)`, the executable responsible for doing Git pushes over WebDAV, has tremendous memory issues, I abandoned that as an option (a `git push` of the repository resulted in memory usage peaking at 11GB virtual, 3.5GB resident memory).

If you are looking to deploy Git for a larger audience in a corporate environment, I highly recommend Gitosis. Gitosis allows SSH to be used as the transport protocol for Git, and provides authentication by use of limited-shell user accounts and SSH keys; it's not perfect, but it's the closest thing to maintainable for larger installations of Git (in my opinion).
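To give a feel for what maintaining Gitosis involves: access is described in a `gitosis.conf` file kept in the `gitosis-admin` repository, next to a `keydir/` directory holding each member's public key as `<name>.pub`. The sketch below generates such a file with Python's `configparser`. The group name, member names and repository name are hypothetical, and this is not Slide's actual configuration.

```python
import configparser
import io


def gitosis_conf(group, members, writable):
    """Render a minimal gitosis.conf granting `members` push access."""
    config = configparser.ConfigParser()
    # Global section; left empty in this sketch.
    config.add_section("gitosis")
    section = f"group {group}"
    config.add_section(section)
    # Each member name is expected to match a key file keydir/<name>.pub.
    config.set(section, "members", " ".join(members))
    # Repositories the group may push to.
    config.set(section, "writable", " ".join(writable))
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical team and repository names.
    print(gitosis_conf("team", ["alice", "bob"], ["slide"]))
```

Since the configuration itself lives in a Git repository, adding an engineer amounts to committing a key file and editing one line, then pushing.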

So far the experimenting with Git at Slide is pretty localized to just my team, but with a combination of Gitosis, git-svn(1) and some "best practices" defined for handling the new system, we've successfully continued development for over a month without any major issues.

As this post is already quite lengthy, I'll be discussing the following two parts of our experimenting in subsequent posts:

Team Development with Git
Git back to Subversion, mostly automatically.

Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide