Howdy!

Welcome to my blog where I write about software development, cycling, and other random nonsense. This is not the only place I write; you can find more words I typed on the Buoyant Data blog, Scribd tech blog, and GitHub.

Old Navy Sucks.

I'm going to go ahead and admit something, something that's difficult for most men to admit in my situation. I shop at Old Navy. I'm sorry, I like their collared shirts. Sue me.

This past weekend I decided to use an oldnavy.com gift card that I was given to buy some new jeans (as my favorite pair now has a hole in the knee). A "cute" side effect of redeeming an oldnavy.com gift card was that I needed to create an oldnavy.com account. "Cute".

After I created my account, with a site-specific password (I generate throw-away passwords for sites that abuse the privilege of my business), I received the following email:


Like I said, "cute". Damn idiots.

Amazon Sucks Too

On the topic of online shopping "sucking", I have been sitting on this beautiful screenshot for a while.

A couple of months ago I bought a watch on Amazon. Not a spectacular watch, a very basic Seiko analog watch that I had previously owned but had lost. I went on to Amazon to buy "my watch", and after finding it, I happily ordered the watch.

Shortly after the watch arrived, I noticed a huge influx of quite topical SPAM.



I'm pleased to say that I've not purchased anything from Amazon since I discovered that Amazon, or somebody that Amazon deals with, sold my information to everybody.

This still makes my blood boil. Rat bastards.

Git Protip: A picture is worth a thousand words (git tag)

I've been sending weekly "Protip" emails about Git to the rest of engineering here at Slide for a while now, using the "Protips" as a means of introducing more interesting and complex features Git offers. Below is the fourth Protip written to date.




While the concept of "tagging" or "labeling" code is not a new or original idea introduced with Git, our use of tags in a regular workflow does not predate the migration to Git. At its most basic level, a "tag" in any version control system takes a "picture" of how the tree looks at a certain point in time so that it can be re-created later. This can be extremely helpful for both local and team development. Take the following scenario for local development using tags:

Tim is extremely busy; most of his days working at an exciting, fast-paced start-up seem to fly by. With one particular project Tim is working on, a lot of code is changing at a very fast pace, and the branch he's currently working in is stable one minute and destabilized the next. Tim has two basic options for leaving himself "bread-crumbs" to step back in time to a stable or an unstable state. The first, complicated option is to mark his commit messages with something like "STABLE", etc., so he can git diff or git reset --hard from the current HEAD to the last stable point of the branch.


The second option is to make use of tags. Whenever Tim reaches a stable point in his tumultuous development, he can simply run:
git tag wip-protips_`date "+%s"`
(or something similar; `date` is added to ensure the tag is unique). If Tim finds himself too far down the wrong path, he can roll back his branch to the latest tag (git reset --hard protiptag), create a new stable branch based on that tag (git checkout -b wip-protip-2 protiptag), or diff his current HEAD against the tag to see everything he's changed since his branch was stable (git diff protiptag...HEAD).
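To make the tag workflow concrete, here is a minimal sketch that plays out Tim's scenario in a throwaway repository. It assumes git is installed; the file name app.txt, the commit messages and the demo identity are made up for illustration and are not part of the original Protip.

```shell
set -eu

# Throwaway repository so the demo never touches real work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "stable" > app.txt
git add app.txt
git commit -q -m "Working state"

# Mark the stable point; the epoch timestamp keeps the tag name unique.
stable_tag="wip-protips_$(date "+%s")"
git tag "$stable_tag"

echo "half-finished experiment" > app.txt
git commit -q -am "Destabilizing change"

# Which files have changed since the branch was last stable?
changed_files=$(git diff --name-only "$stable_tag" HEAD)
echo "Changed since $stable_tag: $changed_files"

# Too far down the wrong path: roll the branch back to the tag.
git reset -q --hard "$stable_tag"
restored=$(cat app.txt)
echo "app.txt is back to: $restored"
```

Keep in mind that git reset --hard throws away uncommitted changes, so it is worth running git status first when trying this on a real branch.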



This local development scenario can become a team development scenario involving tags if, for example, Tim needed QA to start testing portions of his branch (his changes are just that important). Since the current HEAD of Tim's branch is incredibly unstable, he can push his tag to the central repository with the command git push origin tag protiptag, so that QA can push a stage using the tag, which marks the last stable point in the branch's history.

Tags are similar to most other "refs" in Git insofar as they are distributable: if I execute git fetch your-repo --tags, I can pull the tags you've set in "your-repo" and apply them locally to aid development. The distributed nature is primarily how tags in Git differ from tags in Subversion; nearly all of the rest of the concept is exactly the same.
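The hand-off between Tim and QA can be sketched with three local repositories. This is only an illustration under stated assumptions: a bare repository in a temporary directory stands in for the central "origin", and the directory names, identity and file contents are invented.

```shell
set -eu

work=$(mktemp -d)
git init -q --bare "$work/origin.git"    # stands in for the central repository

# Tim's repository: commit, tag the stable point, push only the tag.
git init -q "$work/tim"
cd "$work/tim"
git config user.name "Tim"
git config user.email "tim@example.com"
git remote add origin "$work/origin.git"
echo "stable" > app.txt
git add app.txt
git commit -q -m "Stable point"
git tag protiptag
git push -q origin tag protiptag

# QA's repository: fetch the tags and check out the tagged state.
git init -q "$work/qa"
cd "$work/qa"
git remote add origin "$work/origin.git"
git fetch -q origin --tags
qa_tags=$(git tag -l)
git checkout -q protiptag
qa_file=$(cat app.txt)
echo "QA sees tag '$qa_tags' with app.txt containing: $qa_file"
```

Note that Tim never pushes his unstable branch here; the tag alone carries the commits QA needs.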

Currently at Slide, tag usage is dominated by the post-receive hook in the central repository, where every push into the release branch of the central repository ("origin") is tagged. This allows us to quickly "revert" bad live pushes temporarily, by simply pushing the last "good" tagged release, to ensure minimal site destabilization (while we correct live issues outside of the release branch).
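The actual Slide hook isn't shown here, so the following is only a guess at the general shape such a hook could take. Git hands a post-receive hook one line per updated ref on standard input ("old id, new id, ref name"); the function name tag_release_pushes and the release_<epoch>_<short hash> naming scheme are hypothetical.

```shell
set -eu

# Hypothetical hook body. Git feeds post-receive one line per updated ref
# on stdin: "<old id> <new id> <ref name>".
tag_release_pushes() {
    while read -r oldrev newrev refname; do
        [ "$refname" = "refs/heads/release" ] || continue
        case "$newrev" in
            *[!0]*)  # a real commit id; an all-zero id means the branch was deleted
                # Hypothetical naming scheme: release_<epoch>_<short hash>
                git tag "release_$(date "+%s")_$(git rev-parse --short "$newrev")" "$newrev"
                ;;
        esac
    done
}

# Demo: simulate pushes in a throwaway repository instead of a real hook run.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > site.txt
git add site.txt
git commit -q -m "Release candidate"
newrev=$(git rev-parse HEAD)
zero=0000000000000000000000000000000000000000

# One push to "release" (tagged) and one to another branch (ignored).
printf '%s %s %s\n' "$zero" "$newrev" "refs/heads/release" | tag_release_pushes
printf '%s %s %s\n' "$zero" "$newrev" "refs/heads/wip-protips" | tag_release_pushes
release_tags=$(git tag -l 'release_*')
echo "Tags created: $release_tags"
```

In a real deployment the function body would live in hooks/post-receive inside the central repository, and rolling back would amount to pushing the previous release_* tag live.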

For more specific usage of `git tag` refer to the git-tag(1) man page



Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide


Proposal: Imuse, an IMAP-capable FUSE filesystem

I've spent the better part of my weekend messing around with mail clients, and once again Evolution comes out on top, and once again I'm not happy about it. I tried: Claws, Thunderbird, Alpine (formerly Pine), Mutt, Balsa, KMail and TkRat. None of them worked as well as I wanted. Is it too much to ask for a mail client that doesn't puke and die on large amounts (>2GB) of IMAP mail? That supports proper jwz mail threading? And that caches IMAP mail locally so I can actually access it while disconnected? Turns out it actually is too much to ask.

That's not what this is about though. While hunting around, I started to look at my Slide IMAP mail account, and saw something interesting: it looks suspiciously like a filesystem. The general layout I have right now is something like this:
  • /
    • INBOX
    • Sent
    • Drafts
    • Development/
      • Commits
      • Pushes
    • External/
      • Git
      • Hudson
    • Metrics
    • QA/
      • Exceptions
      • Trac


Clearly, it's a very filesystem-esque looking tree of mail (and a couple gigabytes of it). When you start to really dig into e-mail technology, you really get a feeling for how royally screwed up the whole ecosystem is. Between Exchange, IMAP and POP3 (and their SSL counterparts), mbox and Maildir, and of course the venerable SMTP; e-mail technology is a clusterfuck. No wonder barely anybody can implement an e-mail client that doesn't suck.

At a basic level, mail is organized into messages and folders. Messages map very easily to actual files on the filesystem, and folders naturally map into actual directories on the filesystem. Imagine that you could choose any program you wanted to read and write your email. The only prerequisite: can it read from the filesystem? You could have any program register to receive filesystem events to notify you when mail "appears" in specific directories, and you could move mail around with a simple drag-and-drop in Nautilus/Thunar/Finder. What about writing mail though? Easy enough: you create a new file in the "Drafts" folder, writes would naturally be propagated to the "Drafts" folder on the IMAP server, and when you were done with the message, you could copy or move it into the "Sent" folder, which would have a hook to recognize the new file, and send it. The IMAP tree from above starts to look something like this:
  • ~/Imuse
    • Settings
    • Accounts/
      • Slide/
        • INBOX
        • Sent
        • Drafts
        • Development/
          • Commits
          • Pushes
        • External/
          • Git
          • Hudson
        • Metrics
        • QA/
          • Exceptions
          • Trac
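The layout above can be mocked up with plain directories to get a feel for it. This sketch involves neither IMAP nor FUSE: it simply assumes the mailbox names arrive as a newline-separated list using "/" as the hierarchy delimiter (IMAP servers are free to use a different delimiter), and it creates "Settings" as an empty directory purely as a placeholder.

```shell
set -eu

root=$(mktemp -d)/Imuse

# Hypothetical input: mailbox names as a server might list them,
# assuming "/" is the hierarchy delimiter.
mailboxes="INBOX
Sent
Drafts
Development/Commits
Development/Pushes
External/Git
External/Hudson
Metrics
QA/Exceptions
QA/Trac"

mkdir -p "$root/Settings"

# One directory per mailbox under Accounts/<account name>/.
printf '%s\n' "$mailboxes" | while IFS= read -r mailbox; do
    mkdir -p "$root/Accounts/Slide/$mailbox"
done

# Show the resulting tree in a stable order.
find "$root" -type d | LC_ALL=C sort
```

Each message would then be a file inside one of these directories, which is the part a real FUSE implementation would have to generate on demand.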


"Accounts" and "Settings" would likely need to be "special" insofar that Imuse would just create them out of thin air, Accounts would need to be a virtual directory to actually contain the appropriate account listings, and in Settings I'd likely want to have a couple of flat configuration "files" that you could edit in order to actually configure Imuse appropriately.

If there are simply lists of files in each of the Accounts' folders, each representing a particular email, then the problem of dealing with all my e-mail becomes a much easier one to handle; then it's just a matter of picking my filesystem browser of choice. Even then it's not really limited to filesystem browsers like Nautilus; the scope of programs that I can use to access my mail is opened up to $EDITOR as well. Most editors like Notepad++, Vim, Emacs, Gedit, and TextMate support the ability to view a directory and open its contents up for reading/editing. I'm a big fan of using Vim, so Imuse coupled with vtreeexplorer would be phenomenal, to say the least.

I've started toying around with building FUSE filesystems and I've pushed my experimenting up to GitHub in my imuse repository. It's currently in C, since I cannot get either of the two FUSE Python bindings to work properly. This presents a certain level of difficulty, since the standard means of accessing IMAP data from C seems to be c-client, which is reasonably well documented, but lacks sample code. On the other hand, if I can get the Python bindings to cooperate, then I have access to the wonderful Twisted Mail library (or even the basic imaplib).

Given my obvious time restrictions, I wanted to open the idea up to more eyes and ears to see what others thought and maybe even find somebody else willing to pitch in. For the time being however, Evolution is still sifting through my mail, and I'm still not enjoying it :(


But Who Will Write The Tests?

In addition to frothing at the mouth about Git, I've been really getting into the concept of automated unit tests lately (thus my interest in Hudson). Just like code comments however, tests are good, no tests is bad, wrong tests is worse. That means once you give in to the almighty power of unit testing, you are saddled with the curse of knowing that you will have to update them, forever.

Taking up Test-driven Development is like having a child, if you are at a point in your life where you're ready to accept that kind of responsibility, it can be wonderful, a lot of work, but ultimately you will feel satisfied with your new role as a Responsible Developer (tm). If you're not prepared to take on the burden that TDD will present you with, you will likely regret it or neglect your tests (Deadbeat Developer, I like this metaphor).

In the Top Friends Team at Slide, we practice the more "loose" definition of TDD; tests are not written before functionality is written, but rather functionality is written, and then as part of the QA and release process, the appropriate and accompanying tests are written. Our basic workflow is usually as follows:
  • Tickets are written and assigned to milestones and developers in Trac
  • Branch is created in central Git repository
  • General plan-of-action is discussed between developers
  • Hack-hack-hack
  • Code complete is reached, QA starts to test milestone
  • Developers write tests if needed for functionality
  • Once QA signs off, and tests look solid, code is shipped live


There are two primary flaws with this workflow. The first and most obvious one is that it is far too easy to "forget to write the tests." That is, the project scheduled to start development tends to "flow forward" into the allotted test-writing time. As important as test coverage is, at the end of the day Slide did not raise funding on having solid test coverage, and our priorities lie in shipping software, first and foremost. Solving the flow-forward of scheduled projects into any available space is something that can be worked on, but never solved; it really comes down to discipline between those in charge of setting up any given project's particular roadmap.

The second, more subtle flaw in this workflow, and I think all Test-driven Development workflows, revolves around the writer of the tests. The fundamental nature of almost all bugs in software is human error; our natural tendency to make mistakes means that nothing we do will ever be perfect, including our tests. If Developer A is writing a couple of new methods to handle data validation prior to that data going into the database, chances are that Developer A's life is going to be made far easier by writing some test cases that run some predefined user input through his validation code. Therein lies the problem: if the developer doesn't think of a particular edge case when he's writing the code to handle the data validation, the chances he'll remember and account for that particular edge case while he's working on the unit tests are nil.

How do you really ensure that tests are of high enough quality to actually catch errors and regressions?

I think a certain extent of intra-team test writing and code review, depending on the level of communication between developers, can really help. In this case less developer communication is better. If Developer A tells Developer B how his code works, Developer B is now going to have an unnecessary expectation when he starts to write tests for Developer A's code. If Developer B reviews the code for what it actually is, instead of what Developer A thinks it is, the tests that will ultimately be written will be more thorough than if Developer A had written the whole suite himself.

This still isn't sufficiently fool-proof to where I feel all that confident in test coverage, the tests being written are subject to the availability, thoroughness and understanding that Developer B brings to the table. Inside a small team like this one, one of those is almost always in short supply (usually availability).

One approach I'm anxious to try is the more active involvement of QA engineers in the test writing process, both in the pre-fail and post-fail scenarios. The pre-fail scenario is one like that which I detailed above, where new code is being written. In this case a QA engineer's experience can help guide the developer on what sets of user input have typically caused issues in the past. The second case, post-fail, is actually already occurring at Slide; a live issue, data validity bug, or regression is caught by QA engineers who detail the reproduction case in Trac, and as a result a regression test can be written for that specific issue.

This still is subject to the three things I cited above: availability, thoroughness and understanding of those involved. I still have a lot of unanswered questions about the ideal QA and Dev workflow however. How does this scale to a team of tens or hundreds? Who writes the tests for large teams? What about a team of 1 Dev and 1 QA? What about the lone hacker? How do you write quality code, without getting bogged down in the mush of writing thousands of tests for everything you can imagine could go wrong?

Who writes the tests?

Extremely brief review of the Nokia n810

A coworker of mine was kind enough to let me borrow his Nokia n810 for a couple days to try it out, as he knew I was considering purchasing one for myself. I'm very glad I tried it before buying it, since I'm not going to buy it now (sorry Nokia! The princess is in another castle!)

The thought of a handheld, wireless capable, Linux device is very intriguing for me. That said, I'm not sure what I would even do with it! As I mentioned in my previous post, I like to feel cool, and the prospect of answering the question "is that Linux in your pocket or are you just happy to see me" is far too enticing to pass up. Regardless, I think the n810 suffers from some critical hardware, and software, deficiencies.

Hardware
The n810 is powered by a 400MHz ARM processor, and comes equipped with either 128MB or 256MB of RAM (from what I can tell). I'm not entirely certain which is to blame for the sluggishness of the experience, but my guess is the RAM. Particularly when running the browser (Gecko-based) I would experience "hiccups" where the device spent a few seconds registering input before actually following a clicked link. This may be more the fault of the software, but for an internet tablet, the sluggishness of the browser in both user interaction and rendering time was absolutely infuriating.

The built-in keyboard is smooth, a little too smooth for my taste; I found myself constantly struggling to hit the right keys with my fingers (my thumb is the width of 2.5 columns of keys). Unlike most US keyboard layouts, the n810 keyboard has a lot of keys in "weird" places that I could not get a hang of over the course of a weekend. I eventually gave up on trying to chat or use SSH on the device because I found it so painful to try to type on the device.

The battery life was nothing to write home about, closer to a laptop battery life, instead of a phone's battery life.


The Software
Despite being Linux-based, the device doesn't feel like Linux at all, which I think is a good thing for the mass-market. The "Home" screen was pretty slick, with the ability to add applets to the "desktop" to report things like weather, time, VPN status, etc. A cross-between systray and Dashboard, the Home screen was where I felt most comfortable in the device (the "home" screen in my Smartphone is set up with similar informational panels). Once leaving "Home" I was soon frustrated again, I still haven't figured out whether or not the "Accounts" preference in the Control Panel (for IM accounts) and the installation of Pidgin are the same thing or not. Email and IM, the two other foundations of what I would expect from an "internet tablet" were weak. Neither of them cooperated with any of the IMAP/SSL or Jabber/SSL servers I use, and they both seemed to be targeted at webmail and chat services like GMail and GTalk.

Maemo does use .deb packages for installation, so I could pretty easily find some of my favorite open source applications in the Maemo repositories, unfortunately the GUI frontend for apt-get on Maemo allows for only one operation at a time (no checking multiple boxes and then clicking "Install") so adding new software was literally a 30 minute operation.


Conclusion
I don't think I'm being too negative in saying that I'm disappointed in Nokia for releasing what I think is such a substandard product. With the ubiquity of wireless in San Francisco, having a nice solid ultra-portable machine that I can actually fit into my pocket is exciting. The Nokia n810 is certainly not that machine.

This week I'm shipping my ASUS Eee PC off for my little sister, so I'm starting to look more and more for something even more portable to fill the void, right now the leader is the OQO model 02 which is about 2 times the price of the n810, and ships with Vista by default, but with Ubuntu and close to 6 hours of battery life I think it could be the ultra-portable that I've been looking for.

I'm using Git because it makes me feel cool

Let's be honest for a second, anybody who knows me knows that I'm clearly an insecure person; I spend the majority of my time trying my best to appear cool. I've owned a lot of Macs in my life, not because they're solid machines with a fantastic operating system, but because I felt so damn smug and cool whenever I was doing anything on my Macs. I also developed Mac software for a while, not because it was my passion or Objective-C and Cocoa are practically God's gift to software, but because Mac developers are so cool, what with the black-rimmed glasses and fancy coffees. Hell, I remember when I finally traded my MacBook Pro for a Thinkpad running Linux; it had nothing to do with an ideological stance against Apple's treatment of developers or frustrations with Leopard, it was all about the new geek-chic that was Linux. Thus far, my life has basically been one big quest for more leet-points.

Then came Git.

When I started out in the software world, I was using CVS, which was a notch less cool than a slim IBM salesman's tie. The constant moaning and groaning of fellow developers using CVS, combined with the shame that I felt when I finally told my parents about my use of CVS was too much to bear. I had to switch.

I remember the first time I tried Subversion, I remember talking to Dave and saying "Meh, I'll stick with CVS!" Soon enough, just like the Macarena, Subversion swept the nation up. Subversion was the newest, coolest thing ever, developers rushed into the streets exclaiming "it sucks less than CVS! It sucks less than CVS!" I switched over to Subversion and all of a sudden I was cool again. One by one, open source projects I knew about switched over to Subversion, then SourceForge switched over to Subversion and in an instant, Subversion replaced CVS and became the mainstream version control system. Subversion had grown up, gotten married, a 401k and health insurance, how uncool.

After joining Slide, which used Subversion, I found myself burning up inside. Here I was at this hip start-up, really feeling cool, but still using the same version control system that uncool companies like Yahoo! and Sun use. I would not stand for this. As 2007 became 2008 the writing was on the wall, Git was our new bicycle. It had been blessed by Saint Torvalds and clearly we needed to get in on the ground floor of the new cool before it became mainstream.

We needed to switch to Git immediately. Who cares if Git is extremely fast, it's not like time is money or something ridiculous like that. What do I care if Git handles branches and merge histories unlike CVS or Subversion? With its immense coolness-factor, I didn't even consider that Git will allow us to work in a decentralized workflow or a centralized workflow, nope, didn't even cross my mind. If one were to make a list of Pros and Cons of Git versus whichever other version control system, you could just put "Pro: Cool" at the top of the list, underlined, in bold, and the rest would be moot as far as I'm concerned.

Unlike Subversion or Perforce, Git doesn't have corporate backing, Git is distributed, like a guerilla-force sweeping through the jungle ready to pownce on an unsuspecting platoon; that's freakin' cool. Git rides a motorcycle, wears a leather jacket, makes women swoon and kicks ass and/or jukeboxes.

Git is the Fonz. Cool.

Don't make any false assumptions about my feelings towards Git, it's not like it's a clearly superior version control system or anything, I'm using it only because I want to be cool too.


Git Protip: By committing that revision, you fucked us (git revert)

I've been sending weekly "Protip" emails about Git to the rest of engineering here at Slide for a while now, using the "Protips" as a means of introducing more interesting and complex features Git offers. Below is the third Protip written to date.




The concept of "revert" in Git versus Subversion is an interesting, albeit subtle, change, mostly due to the subtle differences between Git and Subversion's means of tracking commits. Just as with Subversion, you can only revert a committed change; unlike Subversion, there is a 1-to-1 mapping of a "commit" to a "revert". The basic syntax of revert is quite easy: git revert 0xdeadbeef, and just like a regular commit, you will need to push your changes after you revert if you want others to receive the revert as well.

In the following example of a revert of a commit, I also use the "-s" argument on the command line to denote that I'm signing-off on this revert (i.e. I've properly reviewed it).


xdev3% git revert -s c20054ea390046bd3a54693f2927192b2a7097c2
----------------[Vim]----------------
Revert "merge-to-release unhide 10000 coin habitat"

This reverts commit c20054ea390046bd3a54693f2927192b2a7097c2.

Signed-off-by: R. Tyler Ballance <tyler@slide.com>
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch wip-protips
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# modified: bt/apps/pet/data.py


+ python bt/qa/git/post-commit.py -m svn@slide.com
Sending a commit mail to svn@slide.com
Created commit a6e93b8: Revert "merge-to-release unhide 10000 coin habitat"
1 files changed, 4 insertions(+), 3 deletions(-)



Reverting multiple commits


Since git revert will generate a new commit for you every time you revert a previous commit, reverting multiple commits is not as obvious (side note: I'm aware of the ability to squash commits, or --no-commit for git-revert(1); I dislike compressing revision history when I don't believe there should be compression). If you want to revert a specific merge from one branch into the other, you can revert the merge commit (provided one was generated when the changes were merged). Take the following example:

commit 81a94bb976dfaaaae42ae2600b7e9e88645ebd81
Merge: 8134d17... d227dd8...
Author: R. Tyler Ballance <tyler@slide.com>
Date: Thu Nov 20 10:15:16 2008 -0800

Merge branch 'master' into wip-protips



I want to revert this merge since it refreshed my wip-protips branch from master, and brought in a lot of changes that have destabilized my branch. In the case of reverting a merge commit, you need to specify -m and a parent number to tell Git which parent of the merge is the mainline that it should pivot off of; -m 1 (the branch you were on when you merged) usually suffices. So the revert of the commit above will look something like this:

git revert 81a94bb976dfaaaae42ae2600b7e9e88645ebd81 -m 1



Then my revert commit will be committed after I review the change in Vim:

commit 8cae4924c4c05dadaaeccb3851cfc9ec1b8efd0f
Author: R. Tyler Ballance <tyler@slide.com>
Date: Thu Nov 20 10:20:44 2008 -0800

Revert "Merge branch 'master' into wip-protips"

This reverts commit 81a94bb976dfaaaae42ae2600b7e9e88645ebd81.
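For anyone who wants to try this without risking a real branch, here is a small sketch that reproduces the situation in a throwaway repository. It assumes git is installed; the file names, commit messages and demo identity are invented, and only the branch names mirror the example above.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch explicitly
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

# Work on the topic branch.
git checkout -q -b wip-protips
echo "tip" > protip.txt
git add protip.txt
git commit -q -m "Protip work"

# Meanwhile, master picks up a change we will not want.
git checkout -q master
echo "churn" > churn.txt
git add churn.txt
git commit -q -m "Destabilizing change on master"

# Refresh wip-protips from master; this creates a merge commit whose
# first parent is wip-protips and whose second parent is master.
git checkout -q wip-protips
git merge -q --no-edit master
merge_commit=$(git rev-parse HEAD)

# Back out everything the merge brought in, keeping parent 1 as the mainline.
git revert --no-edit -m 1 "$merge_commit" > /dev/null
git log -1 --format=%s
ls
```

One caveat worth knowing: after a merge has been reverted, merging master again later will not bring those same changes back on its own, since Git still considers them merged; the revert commit itself has to be reverted first.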



Let's take the extreme case where I don't have a merge commit to pivot off of, or I have a particular set of bare revisions that I need to revert in one pass. Here you can start to tie Git subcommands like git-rev-list(1) together to accomplish this. This hypothetical situation might occur if some swath of changes has been applied to a team-master and needs to be backed out. Without a merge commit to key off of, you have to revert the commits one by one, but that doesn't mean you have to revert each one by hand:
for r in `git rev-list master...master-fubar --since="8:00" --before="12:00" --no-merges`; do git revert --no-edit -s $r; done
In the above example, I use git-rev-list(1) to give me a list of revisions between the times of 8 a.m. and 12 p.m., excluding merge commits, that have occurred on "master-fubar" but not on "master". (Strictly speaking, the three-dot range master...master-fubar selects commits that are on either branch but not on both; if "master" may have commits of its own in that window, use the two-dot form master..master-fubar to limit the list to "master-fubar".) Since git-rev-list(1) returns a list of commit hashes, newest first, by default, I can loop through those commit hashes and revert each one, undoing the most recent change first. The inner part of the loop signs off on the revert (-s) and then tells git-revert(1) to auto-commit it without opening the commit message in Vim (--no-edit). What this then outputs is the following:

xdev% for r in `git rev-list master...master-fubar --since="8:00" --before="12:00" --no-merges`; do git revert --no-edit -s $r; done
Finished one revert.
Created commit b6810d7: Revert "a test, for you"
1 files changed, 1 insertions(+), 1 deletions(-)
Finished one revert.
Created commit 83156bd: Revert "These are not the droids you are looking for
1 files changed, 2 insertions(+), 0 deletions(-)
Finished one revert.
Created commit 782f328: Revert "commented out stuff"
1 files changed, 0 insertions(+), 3 deletions(-)
Finished one revert.
Created commit 2b8d664: Revert "back on again"
1 files changed, 1 insertions(+), 1 deletions(-)
xdev%
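The same loop can be rehearsed in a scratch repository before pointing it at a shared branch. This is a sketch under assumptions: the file name and commit messages are made up, the time filters are dropped because every commit is created within the same second, and the range uses the two-dot form so that only commits on "master-fubar" are listed.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch explicitly
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "line 1" > notes.txt
git add notes.txt
git commit -q -m "Base"

# Three commits on master-fubar that we will want to back out.
git checkout -q -b master-fubar
for n in 2 3 4; do
    echo "line $n" >> notes.txt
    git commit -q -am "Add line $n"
done

# rev-list prints the newest commit first, so the most recent change is
# reverted first and every revert applies cleanly on top of the last.
for r in $(git rev-list --no-merges master..master-fubar); do
    git revert --no-edit -s "$r" > /dev/null
done

git log --format=%s master..master-fubar
```

After the loop the branch contains three "Revert ..." commits on top of the three original ones, and its files match master again, which git diff --quiet master HEAD can confirm.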



For specific usage of "git-revert" or "git-rev-list" refer to the git-revert(1) man page or the git-rev-list(1) man page



Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

Git Protip: Learning from your history (git log)

I've been sending weekly "Protip" emails about Git to the rest of engineering here at Slide for a while now, using the "Protips" as a means of introducing more interesting and complex features Git offers. Below is the second Protip written to date.





One of the major benefits of using Git is that the entire repository is local and easily searched/queried. For this, Git has a very useful command called git log, which allows you to inspect revision histories in numerous different ways between file paths, branches, etc. There are a couple of basic scenarios where git log has become invaluable, for me at least, both to properly review code and to track changes effectively from point A to point B.


  • What's Dave been working on lately? (with diffs)
    • git log -p --no-merges --author=dave

    The --no-merges option will prevent git log from displaying merge commits which are automatically generated whenever you pull from one Git branch to another



  • Before I merge this branch down to my team master, I want to know what files have been changed and what revisions
    • git log --name-status master-topfriends...proj-topfriends-thing


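    With git log, Git supports unidirectional and bidirectional branch lookups. For example, say the left branch has commits "A, B" and the right branch has commits "A, C". The .. syntax (left..right) will output "C", the commits on the right branch that are not on the left, whereas ... (left...right) will output "B, C", the commits that are on either branch but not on both. git diff accepts the same syntax with a different meaning: left..right compares the two branch tips directly, while left...right shows the changes on the right branch since it diverged from the left.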



  • I just got back from a vacation, I wonder what's changed?
    • git log --since="2 weeks ago" --name-status -- templates

    At the tail end of a git log command you can specify particular paths to look up the histories for with the -- operator; in this case, I will be looking at the changes that have occurred in the templates directory over the past two weeks



  • Most recent X number of commits? (with diffs)

    • git log -n 10 --no-merges -p
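The difference between the two-dot and three-dot range syntax mentioned in the list above is easy to check in a scratch repository. The sketch below assumes git is installed and uses two invented branches, "left" and "right", with empty commits named A, B and C.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/left     # start on a branch called "left"
git config user.name "Demo User"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "A"        # shared by both branches
git branch right
git commit -q --allow-empty -m "B"        # only on left
git checkout -q right
git commit -q --allow-empty -m "C"        # only on right

# Two dots: commits reachable from right but not from left.
two_dot=$(git log --format=%s left..right)

# Three dots: commits on either side but not both. --left-right marks
# each commit with "<" (left side) or ">" (right side).
three_dot=$(git log --left-right --format='%m %s' left...right | LC_ALL=C sort)

echo "left..right:"
echo "$two_dot"
echo "left...right:"
echo "$three_dot"
```

The sort is only there to make the output order predictable, since the demo commits all share the same timestamp.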

All git log commands automatically filter into less(1) so you can page through the output like you would normally if you executed a svn log | less. Because git log is simply reading from the locally stored revision history you can quickly grep the history by any number of different search criteria to gain a better understanding of how the code base is changing and where.

For more specific usage of `git log` refer to the git log man page




Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

Why we chose Git, a rebuttal.

One thing I learned early on in the internet, when it was more of a cobbling, instead of a series, of tubes, was not to feed trolls. That said, I found that my post "Delightfully Wrong About Git" had found its way into such silly news aggregation machines as DZone, Reddit and Hacker News. Some of the points raised in the comments were valid and warrant a response, while the majority of them were the standard responses to any discussion about version control: "psh, dumb. should have used [Bazaar/Mercurial/Darcs/Subversion/Team Foundation System]"

Why not another (D)VCS?
One of the most resounding criticisms/questions was this one: why not Bazaar? Why not Mercurial! My favorite, albeit childish, retort is "why?" But I can say that I have tried a variety of version control systems: Git, Bazaar, CVS, Subversion, Perforce and some other proprietary VCSes at previous employers. While both Darcs and Mercurial seem to be very solid DVCSes, they suffer from a problem of momentum, Darcs in particular. They both appear to be victims of Git's success; while there is inherently nothing wrong with either of them, they are competing with Linus' love-child, Git. When choosing to move to a new VCS in a company that is well over 50 employees, the staying power of the technology you choose is important. I feel confident that Git will not only be supported, but actively developed and improved for years to come.

More importantly than that though, I like Git. Is that not enough right there? Slide makes excessive use of branches, tags and other "complex" VCS concepts that centralized systems like CVS and Subversion have trouble with. With Subversion, creating branches in the volume in which we create branches spiralled out of control, with branches becoming "stale" quickly, meaning that if we didn't refresh the branch regularly with updates from trunk it would be nearly impossible to cleanly merge back down into trunk. With my current Git clone of our primary repository, I have 23 branches (roughly 6 personal local branches, 5 old branches, and 12 active branches). Our primary Git repository has been online for about 6 months and currently has 68 branches in it, roughly 55 of which are active.
Why all the love for Git, but nobody every talks about Bazaar, Mercurial, Darcs, etc? Sure Git is faster, but unless you've got a enormous code base (like the linux kernel), it seems like Bazaar or Mercurial would be a better choice than Git.

One of the better known selling points of Git is that it's fast. My clone of the primary Slide Git repository weighs in at a hefty 7.1GB. The latest revision number in our Subversion repository is in the 103,000 range; tack onto that a tree that is just over 2GB in size, and you've got a lot of history to keep track of. Git handles this without breaking a sweat, despite hitting the disk extremely hard when switching to a very out-of-date branch. With this fix from Nico, the last of the mmap(2) allocation issues we were experiencing vanished as well.

Stop re-inventing the wheel!
One of the more interesting sentiments I noticed while perusing the various comments on my previous post was that we are "re-inventing the wheel" by writing scripts, hooks and other wrappers to use a product like Git. The notion that having scripts and hooks for something you use in daily development is re-inventing the wheel, or gratuitous, strikes me as laughable at best. We're developers. We write scripts. Why didn't I ever write a myriad of scripts when I was an avid Subversion user? I did. There's an enormous difference between writing scripts to compensate for a poorly performing product, and writing scripts to further enhance your or your colleagues' workflows; Git's hook support falls into the latter category.

The "religion" aspect of the whole version control debate was never considered in our transition to Git, nor was the buzz. I'm far more interested in what makes other VCSes better or worse than Git, so that Git can be improved instead of a justification to ditch Git for yet-another-dvcs. I like to think of the various tools like version control that we developers use as something more relatable: work pants. A good pair of work pants should be flexible enough to allow you to get your work done, modest enough to stay out of the way and most importantly, a good pair of work pants should keep your junk safe ;)

I'm still happy to answer more specific questions about Git and how/why it works for us as well as it has, but I think most of the questions I've seen thus far have been answered above.




Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

Git Protip: Stash the goods yo (git stash)

For about a month now I've been sending weekly "Protip" emails about Git to the rest of engineering here at Slide. I've been using them to slowly and casually introduce some of the more "interesting" features Git has to offer as we move away from Subversion entirely. Below is the first Protip I sent around, I'll be sure to send the rest in good time.





Given the nature of how Git is structured, in that your "working copy" is also your "repository", you might find yourself switching branches relatively often. While you can actually switch branches with uncommitted changes outstanding in your Git directory, it's not advised (as you might forget and end up committing changes in the wrong branch, etc). You have two options if you are halfway through some work: one is to commit a checkpoint revision, the other is to make use of the git stash command.


A scenario where this becomes especially useful would be: Bill is working in his local branch "wip-funtime" on replacing large swaths of unnecessary, useless code, and Ted accidentally pushes some of Bill's other changes from another branch live and things break. Bill could commit his code and write a fairly uninformative log message like "checkpoint", which cheapens the value of the revision history of his changes, or Bill can use git stash to snapshot his current working state and context switch. In this case Bill would execute the following commands:


  • git stash
  • git checkout master-media
  • perform hotfixes
  • git checkout wip-funtime
  • git stash pop


After performing the git stash pop command, Bill's Git repository will be in the exact same state, with all his uncommitted changes intact, as it was when he originally stashed and context-switched.
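
Bill's sequence above can also be scripted. Below is a minimal Python sketch, not something from our actual tooling: the helper name `hotfix_detour` and its `run` parameter are made up for illustration, and the branch names are just the ones from the scenario. By default it shells out to git with subprocess.check_call, so it stops at the first command that fails.

```python
import subprocess


def hotfix_detour(work_branch, hotfix_branch, do_hotfix, run=subprocess.check_call):
    """Stash uncommitted work, run do_hotfix() on hotfix_branch, then restore.

    `run` is any callable taking an argv list (hypothetical hook for dry
    runs); it defaults to subprocess.check_call, which raises if a git
    command exits non-zero.
    """
    run(["git", "stash"])                    # snapshot the dirty working tree
    run(["git", "checkout", hotfix_branch])  # context switch
    do_hotfix()                              # caller edits and commits here
    run(["git", "checkout", work_branch])    # back to the original branch
    run(["git", "stash", "pop"])             # re-apply the stashed changes


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    def show(argv):
        print(" ".join(argv))

    hotfix_detour("wip-funtime", "master-media",
                  lambda: print("(perform hotfixes)"), run=show)
```

One thing a real script would want to guard against: if there is nothing to stash, the final `git stash pop` will pop whatever older stash happens to be on top of the list.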


For specific usage of `git stash` refer to the git stash man page



Example usage of `git stash`


Stashing changes away

tyler@starfruit:~/source/git/main/bt> git stash
Saved working directory and index state "WIP on master-topfriends: 7b1ce9e... TOS copy fix"
(To restore them type "git stash apply")
HEAD is now at 7b1ce9e TOS copy fix
tyler@starfruit:~/source/git/main/bt>

Looking at the stash

tyler@starfruit:~/source/git/main/bt> git stash list
stash@{0}: WIP on master-topfriends: 7b1ce9e... TOS copy fix
stash@{1}: On master-topfriends: starfruit complete patchset
stash@{2}: On wip-classmethod: starfruit patches
tyler@starfruit:~/source/git/main/bt>

Grabbing the latest from the stash

tyler@starfruit:~/source/git/main/bt> git stash pop
Dropped refs/stash@{0} (94b9722b5a999c32c4361d795ee8f368d8412f9a)
tyler@starfruit:~/source/git/main/bt>

Grabbing a specific stash

tyler@starfruit:~/source/git/main/bt> git stash list
stash@{0}: WIP on master-topfriends: 7b1ce9e... TOS copy fix
stash@{1}: On master-topfriends: starfruit complete patchset
stash@{2}: On wip-classmethod: starfruit patches
tyler@starfruit:~/source/git/main/bt> git stash apply 2
# On branch master-topfriends
# Changed but not updated:
# (use "git add ..." to update what will be committed)
#
# modified: db/dbroot.py
# modified: gogreen/coro.py
# modified: py/bin/_makepyrelease.py
# modified: py/initpkg.py
# modified: py/misc/_dist.py
# modified: py/misc/testing/test_initpkg.py
# modified: py/path/local/local.py
# modified: py/test/terminal/terminal.py
tyler@starfruit:~/source/git/main/bt>




Git integration with Hudson and Trac.

As I mentioned in my previous post about Git at Slide, I wanted to answer some questions that we had to answer to migrate to Git for our development workflow. One of the major questions that had to be answered, especially for our QA department to sign off on the idea was:
How will Git integrate with Hudson, Trac and our other pieces of development infrastructure?


For us to use any version control system, centralized or decentralized, there had to be a "central" point for changes to integrate into in order for us to properly test releases and then ship them to the live site. With this requirement, we oriented our use of Git around a centralized repository which developers pull from, and push to on a regular basis.

In order for Git to integrate into Trac and Hudson, we opted for baking the functionality we needed into the post-receive hook on the centralized repository instead of relying on GitTrac or the Hudson Git plugin to do what we needed.

You can find the script below, or in this GitHub repository. The script requires the Trac XML-RPC plugin to be installed in order to properly annotate tickets when changes are pushed into the central repository. The notation syntaxes that the post-receive.py script supports in commit messages are:
re #12345
qa #12345
attn bbum,fspeirs


As one might expect, the first notation, "re #12345", will simply annotate a ticket with the commit message and the branch into which the commit was pushed. The "qa #12345" notation is part of an internal convention for marking tickets in Trac as "Ready for QA", which lets our QA engineers know when tickets are ready to be verified; a "qa" note in a commit message will reference the commit and change the status of the ticket in question. The final notation that the script supports, "attn bbum,fspeirs", is purely for calling attention to a code change, or for asking for a code review. When a commit is pushed to the central repository with "attn" in the commit message, an email with the commit message and diff will be sent to the specified recipients.
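
To make the three notations concrete, here is a small standalone sketch that pulls them out of a commit message. The regular expressions are the same ones post-receive.py uses; the `parse_notations` wrapper and the sample message are made up for illustration, and the sketch is written for Python 3 while the hook itself is Python 2.

```python
import re

# Same patterns as find_re, find_qa and find_attn in post-receive.py
RE_PATTERN = re.compile(r'(?i)\s+re\s*#([0-9]+)')
QA_PATTERN = re.compile(r'(?i)\s+qa\s*#([0-9]+)')
ATTN_PATTERN = re.compile(r'(?i)\s+attn\s*([A-Za-z,]+)')


def parse_notations(message):
    """Return the tickets and reviewers mentioned in a commit message."""
    reviewers = []
    for group in ATTN_PATTERN.findall(message):
        # "bbum,fspeirs" -> ["bbum", "fspeirs"]
        reviewers.extend(name for name in group.split(',') if name)
    return {
        're': [int(t) for t in RE_PATTERN.findall(message)],
        'qa': [int(t) for t in QA_PATTERN.findall(message)],
        'attn': reviewers,
    }


if __name__ == '__main__':
    sample = 'Fix the friend-list race re #12345\n\nqa #12346\nattn bbum,fspeirs'
    print(parse_notations(sample))
```

Note that every pattern requires leading whitespace, so a notation at the very start of a commit message is not picked up.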

In addition to updating Trac tickets, pushes into any branch that has a Hudson job affiliated with it will use the Hudson External API to queue a build for that branch. In effect, if you "git push origin master", the post-receive.py script will ping Hudson and ask it to queue a build of the "master" job.
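
The Hudson half boils down to an HTTP GET against the job's build URL. A rough Python 3 sketch of that idea follows; the hook itself shells out to wget, so the `hudson_build_url` and `queue_build` helpers here are illustrative names rather than anything from the script, and 'http://hudson' is the same placeholder host used in its configuration.

```python
from urllib.parse import quote
from urllib.request import urlopen


def hudson_build_url(hudson_url, job):
    """Build the URL that queues a build of `job` when requested."""
    # Percent-encode the job name so branch names with odd characters survive
    return '%s/job/%s/build' % (hudson_url.rstrip('/'), quote(job, safe=''))


def queue_build(hudson_url, job, timeout=10):
    """Ping Hudson; only the request matters, the response body is ignored."""
    with urlopen(hudson_build_url(hudson_url, job), timeout=timeout) as response:
        return response.getcode()


if __name__ == '__main__':
    # Only prints the URL; call queue_build() against a real Hudson instance.
    print(hudson_build_url('http://hudson', 'master'))
```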

I have included the script inline below for those wary of clicking links like this one to the GitHub repository containing the script. Enjoy :)

'''
Copyright (c) 2008 Slide, Inc

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
'''

'''
For questions, patches, etc contact R. Tyler Ballance
'''
import getpass
import os
import re
import socket
import smtplib
import sys
import time
import xmlrpclib

from optparse import OptionParser

MAIL_SERVER = 'your_mail_server.com'
MAIL_SUFFIX = '@mycompany.com'
BUILD_HUDSON = True
HUDSON_URL = 'http://hudson'
TRAC_XMLRPC_URL = 'URL_TO_TRAC/projects/MYPROJECT/login/xmlrpc'

def rpcProxy(user='qatracbot', password=None):
    # The bot's password defaults to the TRAC_PASS environment variable
    password = password or os.getenv('TRAC_PASS')
    return xmlrpclib.ServerProxy('https://%s:%s@%s' % (user, password, TRAC_XMLRPC_URL))

def _send_commit_mail(user, address, subject, branch, commits, files, diff):
    print 'Sending a GITRECEIVE mail to %s' % address
    message = 'Commits pushed to %s:\n--------------------------------------\n\n%s\n--------------------------------------\n%s\n--------------------------------------\n%s' % (branch, commits, files, diff)
    _send_mail(user, address, subject, message)

def _send_attn_mail(user, destuser, diff):
    print 'Sending a "please review" mail to %s' % destuser
    message = '''Good day my most generous colleague! I would hold you in the highest esteem and toast you over my finest wines if you would kindly review this for me\n\n\t - %(user)s\n\nDiff:\n------------------------------------------------\n%(diff)s''' % {'diff' : diff, 'user' : user}
    addresses = []
    for d in destuser.split(','):
        addresses.append('%s%s' % (d, MAIL_SUFFIX))
    _send_mail(user, addresses, 'Please review this change', message)

def _send_mail(user, address, subject, contents):
    try:
        if not isinstance(address, list):
            address = [address]
        s = smtplib.SMTP(MAIL_SERVER)
        message = 'From: %s%s\r\nTo: %s\r\nSubject: %s\r\n\r\n%s\n' % (user, MAIL_SUFFIX, ', '.join(address), subject, contents)
        s.sendmail('%s%s' % (user, MAIL_SUFFIX), address, message)
        s.quit()
    except Exception:
        # A failed mail should never block the push
        print 'Failed to send the email :('

def _update_ticket(ticket, message, options=None):
    rpc = rpcProxy()
    rpc.ticket.update(ticket, message, options or {})
    return rpc

def find_re(commit):
    return map(int, re.findall(r'(?i)\s+re\s*#([0-9]+)', commit))

def handle_re(branch, commit, ticket):
    print 'Annotating ticket #%s' % ticket
    # {{{ ... }}} is a Trac preformatted block
    message = '''The following was committed in "%(branch)s":
{{{
%(commit)s }}}
''' % {'branch' : branch, 'commit' : commit}
    _update_ticket(ticket, message)

def find_qa(commit):
    return map(int, re.findall(r'(?i)\s+qa\s*#([0-9]+)', commit))

def handle_qa(branch, commit, ticket):
    print 'Marking ticket #%s as "ready for QA"' % ticket
    message = '''The following was committed in "%(branch)s":
{{{
%(commit)s }}}
''' % {'branch' : branch, 'commit' : commit}
    _update_ticket(ticket, message, options={'status' : 'qa'})

def find_attn(commit):
    return re.findall(r'(?i)\s+attn\s*([A-Za-z,]+)', commit)

def handle_attn(branch, commit, attn):
    # Unpack commit from this: "commit 5f4c31f3c31347c62d68ecb5f2c9afa3333f4ad0\nAuthor: R. Tyler Ballance \nDate: Wed Nov 12 16:57:32 2008 -0800 \n\n Merge commit 'git-svn' \n\n \n \n"
    try:
        # strip() first, since chunks after the first one begin with newlines
        commit_hash = commit.strip().split('\n')[0].split(' ')[1]
    except IndexError:
        return # malformed commit text, nothing to send
    diff = os.popen('git show --no-color %s --pretty=format:"Author: %%cn <%%ce>%%n%%s%%n%%n%%b%%n%%n%%H"' % commit_hash).read().rstrip()
    _send_attn_mail(getpass.getuser(), attn, diff)

def mail_push(address, oldrev, newrev, refname):
    user = getpass.getuser()
    machine = socket.gethostname()
    base_git_diff = 'git diff %s %s' % (oldrev, newrev)
    files_diffed = os.popen('%s --name-status' % (base_git_diff)).read().rstrip()
    full_diff = os.popen('%s -p --no-color' % (base_git_diff)).read().rstrip()
    # git rev-parse --not --branches | grep -v "$new" | git rev-list "$old".."$new" --stdin
    commits = os.popen('git rev-parse --not --branches | grep -v "%s" | git rev-list %s..%s --stdin --pretty=format:"Author: %%cn <%%ce>%%nDate: %%cd %%n%%n %%s %%n%%n %%b %%n %%n-------[post-receive marker]------%%n" --first-parent ' % (newrev, oldrev, newrev)).read().rstrip()
    branch = refname.split('/')[-1]
    mail_subject = 'GITRECEIVE [%s/%s] %s files changed' % (machine, branch, len(files_diffed.split('\n')))

    if branch == 'master-release':
        print 'Tagging release branch'
        tagname = 'livepush_%s' % (time.strftime('%Y%m%d%H%M%S', time.localtime()))
        sys.stderr.write('Creating a tag named: %s\n\n' % tagname)
        os.system('git tag %s' % tagname)
        mail_subject = '%s (tagged: %s)' % (mail_subject, tagname)

    if BUILD_HUDSON:
        print 'Queuing the Hudson job for "%s"' % branch
        # HUDSON_URL already carries the http:// scheme
        os.system('/usr/bin/env wget -q -O /dev/null %s/job/%s/build' % (HUDSON_URL, branch))

    _send_commit_mail(user, address, mail_subject, branch, commits, files_diffed, full_diff)

    if branch == 'master':
        return # we don't want to update tickets and such for master/merges

    # Split the rev-list output back into individual commits, oldest first
    commits = [c for c in commits.split('-------[post-receive marker]------') if c.strip()]
    commits.reverse()
    for c in commits:
        if c.find('Squashed commit') >= 0:
            continue # Skip squashed commits

        for attn in find_attn(c):
            handle_attn(branch, c, attn)

        for ticket in find_re(c):
            handle_re(branch, c, ticket)

        for ticket in find_qa(c):
            handle_qa(branch, c, ticket)


if __name__ == '__main__':
    op = OptionParser()
    op.add_option('-m', '--mail', dest='address', help='Email address to mail git push messages to')
    op.add_option('-o', '--oldrev', dest='oldrev', help='Old revision we\'re pushing from')
    op.add_option('-n', '--newrev', dest='newrev', help='New revision we\'re pushing to')
    op.add_option('-r','--ref', dest='ref', help='Refname that we\'re pushing')
    opts, args = op.parse_args()

    if not opts.address or not opts.oldrev or not opts.newrev or not opts.ref:
        print '*** You left out some needed parameters! ***'
        sys.exit(1)

    mail_push(opts.address, opts.oldrev, opts.newrev, opts.ref)






Delightfully Wrong About Git

A very long time ago I mentioned on Twitter that I was looking at Git as a replacement for Subversion and Perforce for my personal projects, but lamented that moving to Git at Slide would not be feasible.
Like most disagreements I've had with people on technology in the past, immediately after I said it, I actively tried to prove myself wrong. Back in April when I made the statement above, Subversion 1.4 was "good enough" (just barely) for what we wanted to do as far as source control, but I became more and more curious about whether or not we could move to Git.


Back in April, after spending a week with projects like Tailor and git-svn(1) I started to look at the potential of moving just my team over to Git for evaluation purposes. By the end of May I had requested Git to be installed on the machines that we use for development on a day-to-day basis and we moved the team over to Git by the second week of June.

What followed were six months of sloshing uphill. Some of the most notable questions that we had to figure out in this time frame were:
  • Whereas in the Subversion architecture with a central repository there is a very clear development focal point for sharing code between developers, what is this in the Git workflow?
  • How do you ensure developers don't forget code was committed "in that one branch, in that one repository" and keep track of code?
  • How will Git integrate with Hudson, Trac and our other pieces of development infrastructure? (answered here)
I'll be answering these questions and sharing some of the scripts, hooks, and documentation we've written internally to make moving to Git throughout the company a reality. I wish I could say I was responsible for it all, but there were a number of other engineers that were extremely important in defining best practices, and what this shiny new world without Subversion would look like.

At the end of the day, I'm pleased as punch with the transition. I don't hate Subversion, I just love Git; call me "spoiled" but I think we deserve something more than a system that strives to be "a better CVS".

Update: I've posted an addendum: Why we chose Git, a rebuttal



Reliable Locks in Hudson

There has been some amount of discussion on the Hudson users' list recently about the status of the "Locks and Latches" plugin. The plugin allows one to create "locks" for Jobs in a similar manner to how locks work in a multithreaded programming environment. The need for such a plugin becomes very clear once you start to run multiple jobs that depend on some set of shared resources. Take the following example:
  • Jobs A,B,C must run unit tests that fetch data from a test site
  • Slave #1 can only run one instance of Apache at a time


How one would accomplish this with the Locks and Latches plugin would be to create a lock like "Site Lock" in the Hudson configuration, and then bind Jobs A, B, C to that Lock. Making the (large) assumption that the plugin works correctly and locks properly in order to prevent A and B from running concurrently, this would be enough to satisfy the requirements we have for the scenario above. Unfortunately it seems the plugin is largely unmaintained and buggy; in the past couple of weeks of experimenting with such a setup on a variety of different slaves we've noticed that the locks aren't always respected, causing some locked jobs to execute in parallel, spewing bad test results and build failures (the crux of this issue seems to have been reported by Sergio Fernandes in #2450).

The Loopback Slave
The easiest way I found to work around the shortcomings of the Locks and Latches plugin was to "break up" the Locks. Locks are only really useful if you have more than one "executor" on a Hudson node, which is what allows Hudson to execute jobs simultaneously. In essence, if you only have one executor, the Hudson queueing system will technically perform your "lock" for you by default. And thus the "loopback slave" was born! When explaining this to a co-worker, I likened my workaround to the fork(2) call, whereas the Locks and Latches plugin is much more of a pthread_mutex_lock(3) call. According to the "Distributed Builds" page on the Hudson wiki, you can start a slave agent headlessly on any machine, so why not the master node?
Above is the configuration of one such "loopback slave" that took the place of one of the executors on the master node.
After setting up the loopback slave, it's just a matter of tying the Job to that node for building.

In short, our setup before was: Jobs A, B, C all use the Lock "Site Lock" in order to queue properly. With this change there is no lock, and Jobs A, B, C are all bound to the loopback slave on the master node in place of the lock. While certainly not ideal, given the frustrations of the Locks and Latches plugin going unmaintained, this is the best short-term solution I've come up with thus far.
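
The idea that a single executor gives you a lock for free can be shown with a toy model. The Python sketch below has nothing to do with Hudson's internals; it uses a one-worker thread pool as a stand-in for a node with one executor, and the job names and the `run_jobs` helper are invented for the example. Jobs submitted to it queue up and run one at a time, which is all the lock was supposed to guarantee.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_jobs(job_names, executors=1):
    """Run fake jobs on a pool and report the peak number running at once."""
    state = {'running': 0, 'peak': 0}
    guard = threading.Lock()  # protects the counters, not the jobs

    def job(name):
        with guard:
            state['running'] += 1
            state['peak'] = max(state['peak'], state['running'])
        time.sleep(0.01)  # pretend to hit the shared test site
        with guard:
            state['running'] -= 1
        return name

    # max_workers plays the role of the node's executor count
    with ThreadPoolExecutor(max_workers=executors) as node:
        finished = list(node.map(job, job_names))
    return finished, state['peak']


if __name__ == '__main__':
    finished, peak = run_jobs(['Job A', 'Job B', 'Job C'])
    print(finished, 'peak concurrency:', peak)
```

With one executor the peak can never exceed one; raising `executors` is the toy equivalent of the multi-executor node where a working lock would be needed.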




Hudson Build Bookmarklet

During the usual friday-frenzy I sat down and wrote a quick 10 minute little bookmarklet to start a Hudson job. Unlike most bookmarklets that "do things" this one actually "does things" but doesn't take you away from your current page. Using the Hudson Remote Access API you can query information from Hudson programmatically, but you can also kick off builds remotely with nothing more than a simple HTTP request to the properly formed URL.

By dragging the link below to your bookmark bar, and updating the URL within ("http://hudson/") to the URL of your Hudson instance, you can queue a Hudson build from any page at any time (without leaving the page).


The Bookmarklet





The Code


Build Hudson Job


How it actually works

After talking the concept of making cross-domain HTTP requests over with Sergio, he suggested just using an "IMG" tag (or "IFRAME") to accomplish the task. The bookmarklet doesn't actually have to send any form parameters or receive any data; Hudson just needs to receive any HTTP request to the right URL. By creating the IMG object in JavaScript, and appending it to the body of the page the user is currently on, it'll effectively con the browser into making the HTTP request without needing to pull off any XMLHttpRequest hacks. One of the more interesting things that we found out when playing with the end of the bookmarklet was that if we returned "false" or tried to wrap the whole thing in a closure, the link would still execute and the browser would change pages. However, if we stuck an "alert()" call into the tail end of the bookmarklet JavaScript, execution would stop and the link wouldn't change the page in the browser (tested in Firefox 3).


Happy Hudsoning :)

Git back into Subversion, Mostly Automagically (Part 3/3)

Thus far I've covered most of the issues and hurdles we've addressed while experimenting with Git at Slide in parts 1 and 2 of this series. The one thing I've not covered, which is very important to address, is how to work in the "hybrid" environment we currently have at Slide, where one team works with Git and the rest of the company works in Subversion. Our setup involves having a "Git-to-Subversion" proxy repository such that everything to the "left" of that proxy repository is entirely Subversion without exceptions and everything to the "right" of that repository is entirely Git, without exceptions. Part of my original motivation for putting this post at the end of the series was that, when I originally wrote the first post on "Experimenting with Git at Slide", I actually didn't have this part of the process figured out. That is to say, I was bringing code back and forth between Git and Subversion courtesy of git-svn(1) and some gnarly manual processes.

No Habla Branching

The primary issue when bringing changesets from Git to Subversion is based in the major differences between how the two handle branching and changesets to begin with. In theory, projects like Tailor were created to help solve this issue by first translating both the source and destination repositories into an intermediary changeset format in order to cross-apply changes from one end to the other. Unfortunately after I spent a couple days battling with Tailor, I couldn't get it to properly handle some of the revisions in Slide's three year history.

If you've ever used git-svn(1) you might be familiar with the git-svn dcommit command, which will work for some percentage of users that want to maintain dual repositories between Git and Subversion. Things break down, however, once you introduce branching into the mix.
Up until Subversion 1.5, Subversion had no concept of "merge tracking" (even in 1.5, it requires both the server and client to be 1.5, and it makes nasty use of svn props). Without general support for "merge tracking", the concept of a changeset sourcing from a particular branch and the concept of a "merge commit" are entirely foreign in the land of Subversion. In less mumbo jumbo, this effectively means that the "revisions" that you would want to bring from Git into Subversion need to be "flattened" when being "dcommitted" into Subversion's trunk. Git supports a means of flattening revision history when merging and pulling by way of the "--squash" command line argument, so this flattening for git-svn is possible.

Giant Disclaimer

What I'm about to write I dutifully accept as Git-heresy, a nasty hack and not something I'm proud of.

Flattening into Subversion

First the icky bash script that supports properly flattening revisions into the "master" branch in the git-svn repository and dcommits the results:

#!/bin/bash

MERGE_BRANCH=mergemaster
REPO=$1
BRANCH=$2

if [[ -z "${1}" || -z "${2}" ]]; then
    echo "===> You must provide a \"remote\" and a \"refspec\" for Git to use!"
    echo "===> Exiting :("
    exit 1;
fi

LATEST_COMMIT=`git log --max-count=1 --no-merges --pretty=format:"%H"`

function master
{
    echo "==> Making sure we're on 'master'"
    git checkout master
}

function setup_mergemaster
{
    master
    echo "==> Killing the old mergemaster branch"
    git branch -D $MERGE_BRANCH

    echo "==> Creating a new mergemaster branch"
    git checkout -b $MERGE_BRANCH
    git checkout master
}

function cleanup
{
    rm -f .git/SVNPULL_MSG
}

function prepare_message
{
    master

    echo "===> Pulling from ${REPO}:${BRANCH}"
    git pull ${REPO} ${BRANCH}
    git checkout ${MERGE_BRANCH}

    echo "==> Merging across change from master to ${MERGE_BRANCH}"
    git pull --no-commit --squash . master

    cp .git/SQUASH_MSG .git/SVNPULL_MSG

    master
}

function merge_to_svn
{
    git reset --hard ${LATEST_COMMIT}
    master
    setup_mergemaster

    echo "===> Pulling from ${REPO}:${BRANCH}"
    git pull ${REPO} ${BRANCH}
    git checkout ${MERGE_BRANCH}

    echo "==> Merging across change from master to ${MERGE_BRANCH}"
    git pull --no-commit --squash . master

    echo "==> Committing..."
    git commit -a -F .git/SVNPULL_MSG && git-svn dcommit --no-rebase

    cleanup
}

setup_mergemaster

prepare_message

merge_to_svn

master

echo "===> All done!"

Gross, isn't it? There were some interesting things I learned when experimenting with this script, but first I'll explain how the script is used. As I mentioned above there is the "proxy repository"; this script operates on the git-svn driven proxy repository, meaning it is only invoked when code needs to be propagated from Git to Subversion, as opposed to Subversion to Git, which git-svn properly supports by default in all cases. Since this is a proxy repository, all the "real" code and goings-on occur in the "primary" Subversion and "primary" Git repositories, so the code travels along this path: Primary_SVN <-> [proxy] <-> Primary_Git
This setup means when we "pull" (or merge) from Primary_Git/master we are going to be flattening at that point in order to properly merge it into the Primary_SVN. Without further ado, here's the breakdown on the pieces of the script:

function setup_mergemaster
{
    master
    echo "==> Killing the old mergemaster branch"
    git branch -D $MERGE_BRANCH

    echo "==> Creating a new mergemaster branch"
    git checkout -b $MERGE_BRANCH
    git checkout master
}

What the setup_mergemaster function is responsible for is deleting any prior branch that has been used for merging into the proxy repository and Primary_SVN. It gives us a "mergemaster" branch in the git-svn repository that is effectively at the same chronological point in time as the master branch before any merging occurs.

function prepare_message
{
    master

    echo "===> Pulling from ${REPO}:${BRANCH}"
    git pull ${REPO} ${BRANCH}
    git checkout ${MERGE_BRANCH}

    echo "==> Merging across change from master to ${MERGE_BRANCH}"
    git pull --no-commit --squash . master

    cp .git/SQUASH_MSG .git/SVNPULL_MSG

    master
}

The prepare_message function is part of the nastiest code in the entire script. In order to get an accurate "squashed commit" commit message when the changesets are pushed into Primary_SVN, we have to generate the commit message separately from the actual merging. Since this function is performing a `git pull` from "master" into "mergemaster", the changesets that are being pulled are going to be the only ones that show up (for reasons I'm about to explain).

function merge_to_svn
{
    git reset --hard ${LATEST_COMMIT}
    master
    setup_mergemaster

    echo "===> Pulling from ${REPO}:${BRANCH}"
    git pull ${REPO} ${BRANCH}
    git checkout ${MERGE_BRANCH}

    echo "==> Merging across change from master to ${MERGE_BRANCH}"
    git pull --no-commit --squash . master

    echo "==> Committing..."
    git commit -a -F .git/SVNPULL_MSG && git-svn dcommit --no-rebase

    cleanup
}

If you noticed the "LATEST_COMMIT" code in the full script block above, here's where it's used, and it is one of the most important pieces of the entire script. Basically the LATEST_COMMIT piece of the script grabs the latest non-merge-commit hash from the `git log` output and saves it for later use (here), where it's used to roll the proxy repository back to the point in time just before the last merge commit. This is done to avoid issues with git-svn(1) not understanding how to handle merge commits whatsoever. After rolling back the proxy repository, a new "mergemaster" branch is created. After the mergemaster branch is created, the actual Primary_Git changesets that differ between the proxy repository and Primary_Git are pulled into the proxy repository's master branch, and squashed into the mergemaster branch, where they are subsequently committed with the commit message that was prepared before. The "prepare_message" part of the script becomes important at that step because the "squashed commit" message that Git generates at this point in time will effectively contain every commit that has ever been proxied across in this manner.
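
As a toy illustration of what that `git log --max-count=1 --no-merges` line selects, here is a short Python sketch over a hand-written history. It is not how Git implements the option; the commit records and the `latest_non_merge` helper are invented for the example, with a merge commit modelled simply as a commit that has more than one parent.

```python
def latest_non_merge(history):
    """Return the hash of the newest commit with at most one parent.

    `history` is a list of (hash, parent_hashes) tuples, newest first.
    Returns None when every commit in the list is a merge.
    """
    for commit_hash, parents in history:
        if len(parents) <= 1:
            return commit_hash
    return None


if __name__ == '__main__':
    # Hypothetical history of the proxy repository's master branch
    history = [
        ('m2', ['c3', 'g2']),  # merge commit pulled in from Primary_Git
        ('c3', ['c2']),        # plain commit
        ('c2', ['c1']),
        ('c1', []),            # root commit
    ]
    print(latest_non_merge(history))
```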

After the "merge_to_svn" function has been run the "transaction" is entirely completed and the changesets that once differed between Primary_SVN/trunk and Primary_Git/master are now normalized.

Mostly Automagically

In the near future I intend on incorporating this script into the post-receive hook on Primary_Git in such a way that will truly propagate changesets automatically from Primary_Git into Primary_SVN, but currently I'm utilizing one of my new favorite "hammers", Hudson (see: One-line Automated Testing). Currently there are two jobs set up for proxying changesets across. The first, "Subversion-to-Git", simply polls Subversion for changes and executes a series of commands when changes come in: git-svn fetch && git merge git-svn && git push $Primary_Git master. This is fairly straightforward and fits in line with what git-svn(1) is intended to do. The other job that I created is "Git-to-Subversion", which must be manually invoked by a user, but will still automatically take care of squashing commits into Primary_SVN/trunk (i.e. bash svnproxy.sh $Primary_Git master).

Wrap-up

Admittedly, this sort of setup leaves a lot to be desired. In the ideal world, Tailor would have coped with both our Git and our Subversion repositories in such a way that would have made this script nothing more than a silly idea I had on a bus. Unfortunately that wasn't the case, and the time budget I had for figuring out a way to force Tailor to work was about 47.5 hours less than it took me to sit down and write the script above. I'd be interested to see what solutions other organizations are utilizing to migrate from one system to the other, but at the time of this writing I can't honestly say I've heard much about people dealing with the "hybrid" scenario that we currently have at Slide.



Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

Team Development with Git (Part 2/3)

In my last post on Git, Experimenting with Git at Slide, I discussed most of the technical hurdles that stood in our way when evaluating Git for a Subversion tree that has 90k+ revisions and over 2GB of data held within. As I've learned from any project that involves more than just myself, technology is only half the battle; the other half is the human element. One of the most difficult things to "migrate" when switching to something as critical to a developer's workflow as a VCS is habits, good ones and bad ones.

The Bad Habits

When moving my team over to Git, I was able to identify some habits that I view as "bad" that could either be blamed on how we have used Subversion here at Slide, or the development workflow that Subversion encourages. For the sake of avoiding flamewars, I'll say it's 51% us, 49% the system.

  • The Occasional Committer
Chances are that if you're working on "something super important!" you fall into this bad habit. Because of the nature of trunk in Subversion, if you commit half-finished work into a team-branch or trunk itself, you could cause plenty of pain for your fellow developers. As a result, you tend to commit at the end of a long day working on something, or only after something has been completed. The 9 hours of sweat and frustration you've spent pounding away on 300 lines of code is now summed up in one commit message:
Turns out there was a race-condition here, re #52516
Three months from now, when you return to the same 300 lines of code and try to figure out what the hell led to this mess, you're left with the commit message above, and nothing more.
  • The Less-than-attentive Developer
I've worked on a Mac for the majority of my time at Slide, as do most of my compatriots, and sooner or later one of two things will happen: svn add some/directory/ and/or svn commit. This usually results in a second commit coming into the tree with a commit message like:
Whoops, accidentally checked in resource forks
This isn't that large of a problem, except for the implication of the second command there: svn commit will commit all outstanding changes in your working copy, starting in the current working directory and recursing through child directories. I'm probably more anal-retentive about my commits than most, so I usually do a diff before I commit to make sure I'm aware of what I'm about to commit, but I've seen plenty of developers skip this step.
  • The Over-Confident Merger
I've fallen into this trap numerous times when merging "old" branches back into trunk, especially with binary files that may have been changed in trunk, or in my branch (hell if I know!). One thing I can speak to anecdotally from our work at Slide is that the probability of nonsensical conflicts rises with a branch's age. Our repository progresses at a rate of about 50 commits to trunk per day (~150 commits across the board); if a branch is cut from trunk, it can usually become extremely difficult within two weeks to merge it back into trunk without constant "refreshes", or merges from trunk into the branch.

If you're not careful when folding that branch back down into trunk, you can inadvertently revert old binary files or even text files to previous states, which will usually cause other individuals in the engineering organization to gripe at you and your QA department to pelt you with rocks. For bonus points, you could (as I have done before) accidentally commit conflicting files, earning a gold star and a dunce hat for the day. This merging pain led me to originally write my merge-safe.py script so long ago.


The Slide Way to Git

Fortunately for us, I think the decentralized nature of Git has helped us enforce some best practices when it comes to the bad habits above. "The Occasional Committer" is all but done away with thanks to the ability to atomically commit and revert revisions on a whim and have those changes not propagated to other developers until there has been an explicit push or pull.

Unfortunately however, "The Less-than-attentive Developer" isn't solved so easily. To date I've sat next to two engineers that were new to Git, and watched them both execute the same fateful command: git add .
Not realizing their mistake, they accidentally commit a truckload of build and temporary files (.so, .swp, .pyc, etc) interspersed with their regular work that they meant to commit. Git cannot prevent a developer from shooting themselves in the foot, but it does prevent them from shooting everybody else in the foot along with it (unless they commit, and then push their changes upwards).
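
One way to catch this before it lands in a commit is to check the list of staged paths against the usual suspects. Below is a minimal Python sketch of that check. The function name, the pattern list, and the sample paths are all made up for illustration, and it assumes the paths were gathered beforehand with something like `git diff --cached --name-only`.

```python
from fnmatch import fnmatch

# Patterns for the build and temporary files mentioned above.
UNWANTED_PATTERNS = ("*.so", "*.swp", "*.pyc")


def find_unwanted(paths, patterns=UNWANTED_PATTERNS):
    """Return the paths whose file name matches any unwanted pattern."""
    flagged = []
    for path in paths:
        name = path.rsplit("/", 1)[-1]  # compare against the file name only
        if any(fnmatch(name, pattern) for pattern in patterns):
            flagged.append(path)
    return flagged


# Hypothetical list of staged paths.
staged = ["app/views.py", "app/views.pyc", "lib/_speedups.so", ".README.swp"]
print(find_unwanted(staged))
# prints ['app/views.pyc', 'lib/_speedups.so', '.README.swp']
```

Listing the same patterns in a .gitignore file keeps them out of git add . in the first place. A check like this is only a second line of defense.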

"The Over-confident Merger" grows more and more confident in the Git-based workflow. Since Git handles changesets atomically, it becomes trivial to merge branch histories together or cherry-pick one revision and apply it to an entirely separate branch. I've not yet seen a Git conflict that wasn't a true conflict, insofar as it was quite literally one line of code changing in two different ways between branch histories. As an aside, when using git-svn, be prepared for all the merging "fun" that Subversion has to offer when propagating changes between the two systems.

Basic Team Workflow

The manner in which we use Git is more like a centralized-decentralized version control system. We still have a "master" repository, which provides a central synchronization point when pushing stage servers, or when bringing code into Subversion to be pushed to live servers. For any particular project one of the developers will create a branch that will serve as the primary project branch; take the "superpoke-proj" branch as an example. That developer will push this branch to "origin" (or the master repository) such that other developers can "track" that branch and contribute code. For the purposes of this example, let's say Paul and Peter are working in "superpoke-proj". While Paul is working he will incrementally commit his work, but once he has resolved/fixed a ticket, he will perform a git push and then mark the ticket appropriately such that a QA engineer can verify the fix. If Paul and Peter are working on something that "breaks the build" but they need to collaborate on it together, Paul can perform a git pull from Peter and vice versa, and again, once they're done those changes will be pushed to origin. This model allows developers to work in relative isolation so they're not inadvertently stepping on each other's toes, but also close enough that they can collaborate in explicit terms, i.e. when they are ready for changes to be propagated to each other or the rest of the team.

Conclusion

Our workflow, like most things at companies under 500 employees, is still a "work in progress™". I think we've found the right balance thus far for the team between freedom and process to allow for sufficient mucking around in the codebase, in a way that provides the most time actually writing code with as little time as possible spent dealing with the overhead of anything else (merging, etc). There's nothing inherently special in the way we use Git, but we've found that it works for the way we work, which is to say on a very tight release schedule that requires multiple branches per week and plenty of merging from branch to branch, whether it be from another team or another part of the same team.

Of course, your mileage may vary.




Facebook be riddled with swashbucklers!

I've seen a lot of user feedback about how confusing and "boring" the new Facebook redesign is, but I'm glad to know they are still having fun down there in Palo Alto, even if it's with subtle changes to their site.
Arrr Facebok


To enable the pirate localization, find the language combo box at the bottom-left portion of the Facebook homepage.


Stay classy Facebook.