Howdy!

Welcome to my blog where I write about software development, cycling, and other random nonsense. This is not the only place I write; you can find more words I typed on the Buoyant Data blog, the Scribd tech blog, and GitHub.

Awesomely Bad

A coworker of mine, @teepark, and I recently fell in love with tiling window managers, Awesome in particular. The project has been interesting to follow, to say the least. When I first installed Awesome from the openSUSE package repository, I had version 2: it was fairly basic, relatively easy to configure, and enough to hook me on the idea of a tiling window manager. After conferring with @teepark, I discovered that he had version 3, which was much better, had some fancy new features, and an incremented version number; therefore I required it.

In general, I'm a fairly competent open-source contributor and user. Autoconf and Automake, while I despise them, aren't mean and scary to me, and I'm able to work with them to fit my needs. I run Linux on two laptops and a few workstations, not to mention the myriad of servers I'm either directly or peripherally responsible for. I grok open source. Thusly, I was not put off by the idea of grabbing the latest "stable" tarball of Awesome to build and install it. That began my slow and painful journey to get this software built and installed.

  • Oh, it needs Lua, I’ll install that from the repositories.
  • Hm, what's this xcb I need that isn't in the repositories? I guess I'll have to build that myself. Oh, but wait, there are different subsets of xcb? xcb-util, xcb-proto, libxcb-xlib, xcb-kitchensink, etc.
  • Well, I need xproto as well, which isn’t in the repositories either.
  • CMake? Really guys? Fine.
  • ImLib2, I’ve never even heard of that!
  • libstartup-notification, huh? Fine, I'll build this too.

After compiling what felt like an eternity of subpackages, I discovered a number of interesting things about the varying versions of Awesome v3. The configuration file format has changed a few times, even between one release candidate and another. I ran across issues other people had hit that effectively required recompiling X11's libraries to link against the newly built xcb libraries in order to work (/usr/lib/libxcb-xlib.so.0: undefined reference to _xcb_unlock_io). Nothing I tried worked as I might expect; if I couldn't recompile the majority of my system to be "bleeding edge" I was screwed. The entire affair was absolutely infuriating.

There were a few major things that I think the team behind Awesome failed miserably to accomplish, things every open source developer should consider when releasing software:

  • If you depend on a hodge-podge of libraries, don't depend on the bleeding edge of each package.
  • Maintain an open dialogue with those that package your software, don’t try to make their job hell.
  • When a user cannot build your packages with the latest stable versions of their distribution without almost rebuilding their entire system, perhaps you’re “doin’ it wrong”
  • Changing file formats, or anything major between two release candidates is idiocy.
  • If you don't actually care about your users, be sure to state it clearly, so we don't bother using or trying to improve your poor-quality software.

In the end, I decided that Haskell isn't scary enough to keep me from installing XMonad, so I've started replacing Awesome with XMonad on my machines, and I'm not looking back. Ever.

Read more →

Jython, JGit and co. in Hudson

At the Hudson Bay Area Meetup/Hackathon that Slide, Inc. hosted last weekend, I worked on the Jython plugin and released it just days after releasing a strikingly similar plugin, the Python plugin. I felt that an explanation might be warranted as to why I would do such a thing.

For those that don’t know, Hudson is a Java-based continuous integration server, one of the best CI servers developed (in my humblest of opinions). What makes Hudson so great is a very solid plugin architecture allowing developers to extend Hudson to support a wide variety of scripting languages as well as notifiers, source control systems, and so on (related post on the growth of Hudson’s plugin ecosystem). Additionally, Hudson supports slaves on any operating system that Java supports, allowing you to have a central manager (the “master” Hudson server/node) and a vast network of different machines performing tasks and executing jobs. Now that you’re up to speed, back to the topic at hand.

Jython versus Python plugin: why bother with either, as @gboissinot pointed out in this tweet? The interesting thing about the Jython plugin, particularly when you use a large number of slaves, is that once it is installed you suddenly have the ability to execute Python scripts on every single slave, regardless of whether they actually have Python installed. The more "third party" functionality that can be moved into Hudson by way of the plugin system, the fewer dependencies and the less difficulty in setting up slaves to help handle load.

Take the "git" versus the "git2" plugin: the git plugin was recently criticized on the #hudson channel because of its use of the JGit library, versus "git2", which invokes git(1) on the command line. The latter approach is flawed for a number of reasons; in particular, relying on the git command-line executables and scripts to return consistent formatting is specious at best, even if you aren't relying on "porcelain" (git community terminology for the front-end-ish scripts and code sitting on top of the "plumbing"; the breakdown is detailed here). The command-line approach also means you now have to ensure that every one of your slaves likely to execute builds has the appropriate packages installed. On the flip side, with the JGit-based approach, the Hudson slave agent can transfer the appropriate bytecode to the machine in question and execute it without relying on system dependencies.
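To make that concrete, here's a tiny, hypothetical sketch (not code from either plugin) of what shelling out to git(1) looks like; it only works when the git binary is installed on the slave and its output format stays exactly as you expect, whereas a library like JGit ships along with the plugin itself:

# Hypothetical sketch: the command-line approach assumes git(1) exists on
# every slave and that its output formatting never changes underneath you.
import subprocess

def last_commit(repo_path):
    # Raises OSError if the git binary isn't even installed on this slave.
    proc = subprocess.Popen(
        ["git", "log", "-1", "--pretty=format:%H\t%s"],
        cwd=repo_path,
        stdout=subprocess.PIPE,
    )
    out, _ = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("git log failed in %s" % repo_path)
    sha, subject = out.decode("utf-8").split("\t", 1)
    return sha, subject

print(last_commit("/path/to/some/checkout"))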

The Hudson Subversion plugin takes a similar approach, being based on SVNKit.

Being a Python developer by trade, I am certainly not in the "Java fanboy" camp, but the efficiencies gained by incorporating Java-based libraries in Hudson plugins and extensions are a no-brainer; the reduction of dependencies on the systems in your build farm will save you plenty of time in maintenance and version woes alone. In my opinion, the benefits of JGit, Jython, SVNKit, and the other Java-based libraries that are running some of the most heavily used plugins in the Hudson ecosystem continue to outweigh the costs, especially as we find ourselves bringing more and more slaves online.

Read more →

Template Theory

Since becoming the (de facto) maintainer of the Cheetah project I've been thinking more and more about what a templating engine should do and where the boundary between template engine and language is drawn. At their most basic level, template engines are a means of programmatically generating large strings or otherwise massaging chunks of text. What tends to separate template engines from one another is the language they're written in and the level of "host-language" access they offer the author of the template.

Cheetah is special in that, for all intents and purposes, Cheetah is Python, which blurs the line between the controller layer and the view layer, since Cheetah templates are compiled into literal Python code. In fact, one of the noted strengths of Cheetah is that Cheetah templates can subclass from regular Python objects defined in normal Python modules, and vice versa. That being the case, how do you organize your code, and where should particular portions physically reside in the source tree? What qualifies code to be entered into a .py file versus a .tmpl file? If you zoom out from this particular problem to a larger scope, I believe there is a much larger question to be answered here: as a language, what should Cheetah provide?
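As a quick, contrived sketch of that blurred line (nothing from the Cheetah tree, just the public Template API), a template can reach straight into a plain Python object handed to it through the searchList, and the compiled template is itself just a Python class you render with str():

from Cheetah.Template import Template

# A plain Python object defined in a normal Python module.
class Page(object):
    title = "Template Theory"

    def tags(self):
        return ["cheetah", "python", "templates"]

# The template source would normally live in a .tmpl file.
source = """<h1>$page.title</h1>
#for $tag in $page.tags()
<li>$tag</li>#slurp
#end for
"""

# Cheetah compiles this down to Python; rendering is just str() on an instance.
page = Template(source, searchList=[{"page": Page()}])
print(page)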

Since Cheetah compiles down to Python, does it merit introducing all the Python constructs that one has at their disposal within Cheetah, including:

  • Properties
  • Decorated methods
  • Full/multiple inheritance
  • Metaclasses/class factories

Attacked from the other end, what Cheetah-specific language constructs are acceptable to be introduced into Cheetah as a Python-based hybrid language? Currently some of the language constructs that exist in Cheetah that are distinct to Cheetah itself are:

  • #include
  • #filter
  • #stop
  • #shBang
  • #block
  • #indent
  • #transform
  • #silent
  • #slurp
  • #encoding

Some of the examples of unique Cheetah directives are necessary in order to manipulate template output in ways that aren’t applicable to normal Python (take #slurp, #indent, #filter for example), but where does one draw the line?

To add yet another layer of complexity to the problem, Cheetah is not only used in the traditional Model-View-Controller setup (e.g. Django + Cheetah templates), but it's also used to generate other code, i.e. Cheetah is sometimes used as a means of generating source code (bash, C, etc.).

In My Humble Opinion

Cheetah, at least to me, is not a lump of text files that you can perform loops and use variables in; it is a fully functional, object-oriented, Pythonic, text-aware programming language. Whether or not it compiles to Python or is fully interoperable with Python is largely irrelevant (that is not to say that we don't make use of this feature). As far as "what should Cheetah provide?" goes, I think the best way to answer the question is to not think about Cheetah as Python, or as a "strict" template engine (Mako, Genshi, etc.), but rather as a domain-specific language for complex text generation and templating. When deciding what Python features to expose as directives in Cheetah (the language), the litmus test that should be evaluated against is: does this make generating text easier?

Cheetah need not have hash-directives for every feature available in Python; the idea of requiring metaclasses in Cheetah is ridiculous at best, but a feature like decorators could prove quite useful in text processing/generation (e.g. function output filters), along with proper full inheritance.

My goals with Cheetah are ultimately to make our lives easier developing rich interfaces for our various web properties, but also to make "things" faster, where "things" can fall under a few different buckets: development time, execution time, and maintenance time.

Cheetah will likely look largely the same a year from now, and if we (the developers of Cheetah) have done our jobs correctly, it should be just as simple to pick up and learn, but even more powerful and expressive than before.

Read more →

Slide Open Source

It's not been a secret that I'm a big fan of open source software; I would likely not be where I am today had I not started with projects like OpenBSD or FreeBSD way back when. While my passion for open source software and the "bazaar" method of developing software might not be shared by everybody at Slide, Inc, everybody can certainly get on board with the value of incorporating open source into our stack, which is composed almost entirely of Python (and an assortment of other web technologies).

Along those lines, there's been some amount of discussion about what we can or should open source from what we've developed at Slide, but we've not really pushed anything out into the ether as of yet. Today, however, I think we finally got our foot in the door as far as contributing back to the open source community as a whole: we're now on GitHub as "slideinc", yay! (Coincidentally, we have a slideinc Twitter account too.)

Currently the only project that's come directly out of Slide and been shared via the slideinc GitHub account is PyVE, a Python Virtual Earth client that I hacked together recently to tinker with some geocoding (released under a 3-clause BSD license). In the (hopefully) near future we'll continue to open source some other components we've either created or extended internally.

If you're not a GitHub user, you should definitely check GitHub out; it's a pretty impressive site. If you are a GitHub user, or a Python developer, you should "follow" the slideinc user on GitHub to catch the cool stuff that we may or may not ever actually release ;)

Read more →

Breathing life into a dead open source project

Over the past couple years that I have been working at Slide, Inc. I’ve had a love/hate relationship with the Cheetah templating engine. Like almost every templating engine, it allows for abuse by its users, which can result in some templating code that looks quite horrendous, contributing significantly to some negative opinions of the templating engine. At one point, I figured an upgrade of Cheetah would help correct some of these abuses and I distinctly remember pushing to upgrade to the 2.xx series of Cheetah. I then found out that I had unintentionally volunteered myself to oversee the migration and also to update any ancient code that was lying around that depended on “features” (see: bugs) in Cheetah prior to the 2.xx series. We upgraded to Cheetah 2.xx and life was good, but Cheetah was practically dead.

The last official release of Cheetah was in November of 2007; this is not altogether uncommon in the world of open source development. Projects come and go, some reach a point in their growth and development where they're abandoned, or their community dissipates, etc. As time wore on, I found myself coming up with a patch here and there that corrected some deficiency in Cheetah, but I also noticed that many others were doing the same. There was very clearly a need for the project to continue moving forward, and with my introduction to both Git and GitHub as a way of distributing development, I did what any weekend hacker is prone to do: I forked it.

Meet Community Cheetah

On January 5th, 2009 I started to commit to my local fork of the Cheetah code base (taken from the Cheetah CVS tree), making sure my patches were committed but also taking the patches from a number of others on the mailing list. By mid-March I had collected enough patches to properly announce Cheetah Community Edition v2.1.0 to the mailing list. I was entirely unprepared for the response.

Whereas the previous 6 months of posts to the mailing list averaged about 4 messages a month, March exploded to 88 messages, 20 of them in the thread announcing Cheetah CE (now deemed Community Cheetah (it had a better ring to it, and an available domain name to boot)). All of a sudden the slumbering community is awake and the patches have started to trickle in.

We've fixed some issues with running Cheetah on Python 2.6, Cheetah now supports compiling templates in parallel, issues with import behavior have been fixed, and a number of smaller features have been added. In 2008 there were six commits to the Cheetah codebase; thus far in 2009 there have been over seventy (I'm still waiting on a few patches from colleagues at other startups in Silicon Valley as well).

I'm not going to throw up a "Mission Accomplished" banner just yet; Cheetah still needs a large amount of improvement. It was written during a much different era of Python, and the changes in Python 2.6 and moving forward to Python 3.0 present new challenges in modernizing a template engine that was introduced in 2001.

Being a maintainer

Starting your own open source project is tremendously easy, especially with the advent of hosts like Google Code or GitHub. What's terrifying and difficult is when other people depend on your work. By stepping up and becoming the de-facto maintainer of Community Cheetah, I've opened myself up to a larger collection of expectations than I originally anticipated. I feel as if I have zero credibility with the community at this point, which means I painstakingly check the changes that are committed and review as much code as possible before tagging a release. I'm scared to death of shipping a bad release of Community Cheetah and driving people away from the project; the nightmare scenario I play over in my head when tagging a release in Git is somebody going "this crap doesn't work at all, I'm going to stick with Cheetah v2.0.1 for now" such that I cannot get them to upgrade to subsequent releases of Community Cheetah. I think creators of a project have a lot of "builtin street cred" with their users and community of developers, whereas I still have to establish my street cred through the introduction of bug fixes/features, knowledge of the code base, and generally being available through the mailing list or IRC.

Moving Forward

Currently I'm preparing the third Community Cheetah release, v2.1.1 (which I tagged today); it comes almost a month after the previous one and introduces a number of fixes, but also some newer features like the #transform directive, Markdown support, and 100% Python 2.6 compatibility.

Thanks to an intrepid contributor, Jean-Baptiste Quenot, we have a v2.2 release lined up for the near future which fixes a large number of Unicode-specific faults that Cheetah currently has (the code can currently be found in the unicode branch) and moves the internal representation of code within the Cheetah compiler/parser to a unicode string object in Python.

I eagerly look forward to more and more usage of Cheetah. Even with other templating engines out there for Python, like Mako and Genshi, I still feel Cheetah sits far above the others in its power and versatility; it has just been neglected for far too long.

If you’re interested in contributing to Cheetah, you can fork it on GitHub, join the mailing list or find us on IRC (#cheetah on Freenode).

This experiment on restarting an open source project is far from over, but we’re off to a promising start.

Read more →

Do not fear continuous deployment

One of the nice things about living in Silicon Valley is that you have relatively easy access to a number of the developers you may work with through open source projects, mailing lists, IRC, etc. Today Kohsuke Kawaguchi of Sun Microsystems, the founder of the Hudson project, stopped by the Slide offices to discuss Hudson and the "cloud", continuous deployment, and our workflow with Hudson here at Slide. Continuous deployment was the most interesting topic for me, and the most relevant in terms of the importance of Hudson in our current infrastructure.


Since reading Timothy Fitz's post on the setup for "continuous deployment" at IMVU, I've become obsessed to a certain degree with pushing Slide in that direction as an engineering organization. Currently we push a number of times a day as necessary, and it's almost as if we have manual continuous deployment as it is right now; there's just a lot of room for optimizations and automation to cut down on the tedium and allow for more beer drinking.


@agentdero continuous deployment = when build is green, autoship? sounds terrifying...

     (@tlipcon)



As a concept, continuous deployment can be quite scary: "wait, some robot is going to deploy code to my production site, wha!" It's important to remember that the concept of continuous deployment doesn't necessarily mean that no QA is involved in the release process; it is, however, ideal to have enough good test cases that you can do a fully automated unit/integration/system test run. The biggest difficulty with the entire concept of "continuous deployment", however, is not writing tests or actually implementing a system to deploy; it forces you to understand your releases and production environment. It's about eliminating the guesswork from your process and reducing the amount of human error (or potential for human error) involved in deployments.

In my opinion, continuous deployment isn't about making a hard switch, firing your QA and writing boat-loads of tests to ensure that you can push the production site straight from "trunk" as much as humanly possible. Continuous deployment is far more about solidifying your understanding of your entire stack, evolving your code base to where it is both more testable and better covered by your tests, then putting your money where your mouth is and relying on those tests. If your codebase moves rapidly, unit/integration/system tests are only going to be up to date and valuable if you actually rely on them. If breaking a single unit test pre-deployment becomes a Big Deal™, then the developer responsible for the code being deployed will make sure that: (a) the test is valid and up to date and (b) the code that the test is covering does not contain any actual regressions.


Take the typical repository layout for most companies, which is, as far as I've seen, made up of a volatile trunk, a stable release branch, and a number of project branches. In an engineering department QA would be responsible for ensuring that projects are properly vetted before merging from project branches (also called "topic branches" in the Git community) into the more volatile trunk branch. Once the CI server (i.e. Hudson) picks up on changes in trunk, the testing process would begin at that particular revision. Provided the test suites passed with flying colors, Hudson would start to kick off the process to do a slow/sampled deploy as Timothy describes in his post. If the tests failed however, alarms would start beeping, sirens would wail and there would be much gnashing of teeth: somebody has now broken trunk and is blocking any other deployments coming down the pipe. In this "disaster scenario" the QA involved in the process would be thoroughly shamed (obviously) but then given the choice to block future pushes while the developer(s) create a fix, or to revert their changes out of trunk and take them back to a project branch to correct the deficiencies. This attention to detail has a larger benefit in that developers won't become numb to test failures to the point where they're no longer important.
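For illustration only (these script names are made up, and this is neither Slide's nor IMVU's actual tooling), the gate at the end of that pipeline boils down to something like:

# Hypothetical deployment gate: only promote a revision whose tests are green,
# and roll it out to a couple of sample hosts before touching the full fleet.
import subprocess
import sys

def run(cmd):
    return subprocess.call(cmd) == 0

def main():
    if not run(["python", "run_all_tests.py"]):
        # Alarms, sirens, gnashing of teeth -- and no deployment.
        sys.exit("test suite failed; blocking this revision")

    # The slow/sampled deploy: canary hosts first, then everybody else.
    for host in ["web-canary-01", "web-canary-02"]:
        if not run(["./deploy_host.sh", host]):
            sys.exit("canary deploy to %s failed; halting rollout" % host)

    if not run(["./deploy_remaining.sh"]):
        sys.exit("full rollout failed")

if __name__ == "__main__":
    main()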


What good is writing tests if there aren't real consequences for them failing? Releases shouldn't be a scary time of the day/week/month; you should certainly be nervous (keeps you sharp), but if you fear releases then there is probably an error in your release process that allows for too much uncertainty: inadequate test coverage, insufficient blackbox testing, poor release practices, etc. Continuous deployment might not be the magic solution to your woes of shipping software, but the practice of moving towards continuous deployment will greatly improve your release process whether or not you ever actually make the switch over to a fairly automated deployment process as the engineers at IMVU have.


How confident are you in your test coverage?
Read more →

V8 and FastCGI, Exploring an Idea

Over the past couple years I've talked a lot of trash about JavaScript (really, a lot) but I've slowly started to come around to a more neutral stance: it turns out I actually hate browsers, and I like JavaScript just fine by itself! While the prototype-based object system is a little weird at first coming from a more classical object-oriented background, the concept grows on you the more you use it.

Since I hate browsers so much (I really do), I was pleased as punch to hear that Google's V8 JavaScript Engine was embeddable. While WebKit's JavaScriptCore is quite a nice JavaScript engine, it doesn't lend itself to being embedded the same way that V8 does. The only immediate downside to V8 is that it's written entirely in C++, which does provide some hurdles to embedding (for example, I'm likely never going to be able to embed it into a Mono application), but for the majority of cases embedding the engine into a project shouldn't be all that difficult.

A few weekends ago I started exploring the possibility of running server-side JavaScript courtesy of V8, after reading about mod_v8 I felt more confident to try my project: FastJS.

In a nutshell, FastJS is a FastCGI server to process server-side JavaScript, which means FastJS can hook up to Lighttpd, Nginx, or even Apache via mod_fcgi. Currently FastJS is in a state of "extremely unstable and downright difficult"; there's not a lot there as I'm exploring what should be provided by the FastJS server-side software and what should be provided by JavaScript libraries. As it stands now, FastJS preloads the environment with jQuery 1.3.2 and a "fastjs" object which contains some important callbacks like:
fastjs.write()  // write to the output stream
fastjs.log()    // write to the FastCGI error.log
fastjs.source() // include and execute other JavaScript files


On the server side, a typical request looks something like this (for now):
2009-03-09 05:04:06: (response.c.114) Response-Header:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-type: text/html
X-FastJS-Request: 1
X-FastJS-Process: 11515
X-FastJS-Engine: V8
Date: Mon, 09 Mar 2009 09:04:06 GMT
Server: lighttpd/1.4.18


Below is an example of the current test page "index.fjs":


// Note: the HTML string literals below are approximate; the original
// markup was mangled when the post was formatted.
var index = new Object();

index.header = function() {
  fastjs.write("<html><head><title>FastJS</title></head>");
  fastjs.write("<body>");
  fastjs.write("<h1>FastJS Test Page</h1>");
};

index.footer = function() {
  fastjs.write("</body></html>");
};

index.dump_attributes = function(title, obj) {
  fastjs.write("<h2>");
  fastjs.write(title);
  fastjs.write("</h2>");

  for (var k in obj) {
    fastjs.write(k + " = ");

    if (typeof(obj[k]) != "string")
      fastjs.write(typeof(obj[k]));
    else
      fastjs.write(obj[k]);

    fastjs.write("<br/>\n");
  }
};

(function() {
  index.header();

  fastjs.source("pages/test.fjs");

  index.dump_attributes("window", window);
  index.dump_attributes("location", location);
  index.dump_attributes("fastjs.env", fastjs.env);
  index.dump_attributes("fastjs.fcgi_env", fastjs.fcgi_env);

  index.footer();

  fastjs.log("This should go into the error.log");
})();

The code above generates a page that looks pretty basic, but informative nonetheless.


Pretty fun in general to play with; I think I'm near the point where I can stop writing more of my terrible C/C++ code and get back into the wonderful land of JavaScript. As it stands now, here's what still needs to be done:
  • Proper handling of erroring scripts via an informative 500 page that reports on the error
  • Templating? Lots of fastjs.write() calls are likely to drive you mad
  • Performance concerns? As of now, the whole stack (jQuery + .fjs) is evaluated on every page request.
  • Tests! I should really get around to writing some level of integration tests to make sure that FastJS is returning expected results for particular chunks of .fjs scripts


The project is hosted on GitHub right now, here, and is under a 2-clause BSD license.
Read more →

Git Protip: Split it in half, understanding the anatomy of a bug (git bisect)

I've been sending "Protip" emails about Git to the rest of engineering here at Slide for a while now, using the "Protips" as a means of introducing more interesting and complex features Git offers.


There are those among us who can look at a reproduction case for a bug and just know what the bug is. For the rest of us mere mortals, finding out what change or set of changes actually introduced a bug is extremely useful for figuring out why a particular bug exists. This is even more true for the more elusive bugs, or the cases where code "looks" correct and you're stumped as to why the bug exists now when it didn't yesterday/last week/last month. The options you have available in most classical version control systems are to sift through diffs or wade through log message after log message, trying to spot the particular change that introduced the regression you're now tasked with resolving.

Fortunately (of course) Git offers a handy feature to assist you in tracking down regressions as they're introduced: git bisect. Take the following scenario:
Roger has been working on some lower level changes in a project branch lately. When he left work last night, he ran his unit tests (everything passed), committed his code and went home for the day. When he came in the next morning, per his typical routine, he synchronized his project branch with the master branch to ensure his code wasn't stomping on released changes. For some reason however, after synchronizing his branch, his unit tests started to fail indicating that a bug was introduced in one of the changes that was integrated into Roger's project branch.

Before switching to Git, Roger might have spent an hour looking over changes trying to pinpoint what went wrong, but now Roger can use git bisect to figure out exactly where the issue is. Taking the commit hash from his last good commit, Roger can walk through changes and pinpoint the issue as follows:

## Format for use is: git bisect start [<bad> [<good>...]] [--] [<paths>...]
xdev4% git bisect start HEAD 324d2f2235c93769dd97680d80173388dc5c8253
Bisecting: 10 revisions left to test after this

[064443d3164112554600f6da39a36ffb639787d7] Changed the name of an a/b test.
xdev4%

This will start the bisect process, which is interactive, and start you halfway between the two revisions specified above. Following the scenario above, Roger would then run his unit tests. Upon their success, he'd execute "git bisect good", which would move the tree halfway between that "good" revision and the "bad" revision. Roger will continue doing this until he lands on the commit that is responsible for the regression. Knowing this, Roger can either revert that change, or make a subsequent revision that corrects the regression introduced.

A sample of what this sort of transcript might look like is below:

xdev4% git bisect good
Bisecting: -1 revisions left to test after this
[bcf020a6c4ac7cc5df064c66b182b2500470000a] Merge branch 'cjssp' into master
xdev4% git bisect bad
bcf020a6c4ac7cc5df064c66b182b2500470000a is first bad commit
xdev4% git show bcf020a6c4ac7cc5df064c66b182b2500470000a
commit bcf020a6c4ac7cc5df064c66b182b2500470000a
Merge: 62153e2... 064443d...
Author: Chris <chris@foo>

Date: Tue Jan 27 12:57:45 2009 -0800

Merge branch 'cjssp' into master

xdev4% git bisect log
# bad: [7a5d4f3c90b022cb66fd8ea1635c5de6768882d7] Merge branch 'foo' into master
# good: [d1014fd52bebd3c56db37362548e588165b7f299] Merge branch 'bar'
git bisect start 'HEAD' 'd1014fd52bebd3c56db37362548e588165b7f299' '--' 'apps'

# good: [064443d3164112554600f6da39a36ffb639787d7] Changed the name of an a/b test. PLEASE PICK ME UP WITH NEXT PUSH. thx
git bisect good 064443d3164112554600f6da39a36ffb639787d7
# bad: [bcf020a6c4ac7cc5df064c66b182b2500470000a] Merge branch 'cjssp' into master
git bisect bad bcf020a6c4ac7cc5df064c66b182b2500470000a
xdev4% git bisect reset
xdev4%

Instead of spending an hour looking at changes, Roger was able to quickly walk a few revisions and run the unit tests he has to figure out which commit was the one causing trouble, and then get back to work squashing those bugs.

Roger is, like most developers, inherently lazy, and running through a series of revisions running unit tests sounds like "work" that doesn't need to be done. Fortunately for Roger, git-bisect(1) supports the subcommand "run", which goes hand in hand with unit tests or other tests. In the example above, let's pretend that Roger had a test case exhibiting the bug he was noticing. What he could actually do is let git bisect automatically run a test script against each revision to find the offending commit, i.e.:

xdev4% git bisect start HEAD 324d2f2235c93769dd97680d80173388dc5c8253
Bisecting: 10 revisions left to test after this

[064443d3164112554600f6da39a36ffb639787d7] Changed the name of an a/b test.
xdev4% git bisect run ./mytest.sh

After executing the run command, git-bisect(1) will binary-search the revisions between GOOD and BAD, testing whether "mytest.sh" returns a zero (success) or non-zero (failure) return code, until it finds the commit that causes the test to fail. The end result should be the exact commit that introduced the regression into the tree; after finding this, Roger can either grab his rubber chicken and go slap his fellow developer around, or fix the issue and get back to playing Nethack.
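For the curious, a stand-in for that script could be as small as the following (the module name here is made up; any executable works as long as it exits zero on success and non-zero on failure):

#!/usr/bin/env python
# Hypothetical stand-in for mytest.sh: `git bisect run` only cares about the
# exit code -- zero means this revision is good, non-zero means it's bad.
import sys
import unittest

# Made-up test module that exhibits the regression; substitute your own.
from myapp.tests import test_validation

suite = unittest.defaultTestLoader.loadTestsFromModule(test_validation)
result = unittest.TextTestRunner(verbosity=1).run(suite)
sys.exit(0 if result.wasSuccessful() else 1)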

All in all git-bisect(1) is extraordinarily useful for pinning down bugs and diagnosing issues as they're introduced into the code base.


For more specific usage of `git bisect` refer to its man page here: git-bisect(1) man page



Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide
Read more →

Head in the clouds

I've spent the entire day thinking about "cloud computing", which is quite a twist for me. Seeing "impressive" conferences centered around "cloud computing", I've ridiculed the concept mercilessly; it has a phenomenally high buzzword/usefulness ratio, which makes it difficult to take seriously. It tends to have an air of idiocy attached to it, of the same style that the re-invention of thin clients did a few years back. That said, I think the concept is sound, and useful for a number of companies and uses (once distilled of the buzz).

Take Slide, for example. We have a solid amount of hardware, hundreds of powerful machines constantly churning away on a number of tasks: serving web pages, providing back-end services, processing database requests, recording metrics, etc. If I start the work week needing a new pool of machines either set up or allocated for a particular task, I can usually have hardware provisioned and live by the end of the week (depending on my Scotch offering to the Operations team, I can get it as early as the next day). If I can have the real thing, I clearly have no need for cloud computing or virtualization.

That's what I thought, at least, until I started to think more about what would be required to get Slide closer to the lofty goal of continuous deployment. As I was involved in pushing for and setting up our Hudson CI server, I constantly check on the performance of the system and help make sure jobs are chugging along as they should be; I've become the de facto Hudson janitor.


Our current continuous integration setup involves one four-core machine running three separate instances of our server software as different users, processing jobs throughout the day. One "job" typically consists of a full restart of the server software (Python) and running literally every test case in the suite (we walk the entire tree aggregating tests). On average the completion of one job takes close to 15 minutes and executes around 400+ test cases (and growing). Fortunately, and unfortunately, our Hudson machine is no longer able to service this capacity during the development peak in the middle of the day; this is where the "cloud" comes in.

We have a few options at this point:
  • Setup another one or more machines
  • Rethink how we provision hardware for continuous integration


The fundamental problem with provisioning resources for continuous integration, at least at Slide, is that the requirements are bursty at best. We typically queue a job for a particular branch when a developer executes a git push (via the Hudson API and a post-receive hook). From around 9 p.m. until 9 a.m. we need maybe two actual "executors" inside Hudson to handle the workload the night-owl developers tend to place on Hudson; from 12 p.m. until 7 p.m., however, our needs fluctuate rapidly between needing 4 executors and 10 executors. To exacerbate things further, due to "natural traffic patterns" in how we work, mid-afternoon on Wednesday and Thursday requires even more resources as teams are preparing releases and finishing up milestones.

The only two possible solutions to the problem are to build a continuous integration farm with the full knowledge that capacity will remain unused for large amounts of time, or to look into "cloud computing" with service providers like Amazon EC2, which will allow Hudson slaves to be provisioned on demand. The maintainer of Hudson, Kohsuke Kawaguchi, has already started work on "cloud support" for Hudson via the EC2 plugin, which makes this a real possibility. (Note: using EC2 for this at Slide was Dave's idea, not mine :))

Using Amazon EC2 isn't the only way to solve this "bursty" problem, however; we could just as easily solve the problem in-house by provisioning Xen guests across a few machines. The downside of doing it yourself is the amount of time between when you know you need more capacity and when you can actually add that capacity to your own little "cloud". Considering Amazon has an API for not only running instances but terminating them, it certainly provides a compelling reason to "outsource" the problem to Amazon's cloud.

I recommend following Kohsuke's development of the EC2 plugin for Hudson closely, as continuous integration and "the cloud" seem like a match made in heaven (alright, that pun was unnecessary, it sort of slipped out). At the end of the day it comes down to a very fundamental business decision: which is more cost effective, building my own farm of machines, or using somebody else's?

(footnote: I'll post a summary of how and what we eventually do to solve this problem)
Read more →

Old Navy Sucks.

I'm going to go ahead and admit something, something that's difficult for most men to admit in my situation. I shop at Old Navy. I'm sorry, I like their collared shirts. Sue me.

This past weekend I decided to use an oldnavy.com gift card that I was given to buy some new jeans (as my favorite pair now has a hole in the knee). A "cute" side effect of redeeming an oldnavy.com gift card was that I needed to create an oldnavy.com account. "Cute".

After I created my account, with a site-specific password (I generate throw-away passwords for sites that abuse the privilege of my business), I received the following email:


Like I said, "cute". Damn idiots.
Read more →

Amazon Sucks Too

On the topic of online shopping "sucking", I have been sitting on this beautiful screenshot for a while.

A couple of months ago I bought a watch on Amazon. Not a spectacular watch, a very basic Seiko analog watch that I had previously owned but had lost. I went on to Amazon to buy "my watch", and after finding it, I happily ordered the watch.

Shortly after the watch arrived, I noticed a huge influx of quite topical SPAM.



I'm pleased to say that I've not purchased anything from Amazon since I discovered that Amazon, or somebody that Amazon deals with, sold my information to everybody.

This still makes my blood boil. Rat bastards.
Read more →

Git Protip: A picture is worth a thousand words (git tag)

I've been sending weekly "Protip" emails about Git to the rest of engineering here at Slide for a while now, using the "Protips" as a means of introducing more interesting and complex features Git offers. Below is the fourth Protip written to date.




While the concept of "tagging" or "labeling" code is not a new or original idea introduced with Git, our use of tags in a regular workflow does not predate the migration to Git. At its most basic level, a "tag" in any version control system takes a "picture" of how the tree looks at a certain point in time such that it can be re-created later. This can be extremely helpful for both local and team development; take the following scenario for local development using tags:

Tim is extremely busy; most of his days working at an exciting, fast-paced start-up seem to fly by. With one particular project Tim is working on, a lot of code is changing at a very fast pace and the branch he's currently working in is stable one minute and destabilized the next. Tim has two basic options for leaving himself "bread-crumbs" to step back in time to a stable or an unstable state. The first, more complicated option is to mark his commit messages with something like "STABLE", etc. so he can git diff or git reset --hard from the current HEAD to the last stable point of the branch.


The second option is to make use of tags. Whenever Tim reaches a stable point in his tumultuous development, he can simply run:
git tag wip-protips_`date "+%s"`
(or something similar, `date` added to ensure the tag is unique). If Tim finds himself too far down the wrong path, he can roll back his branch to the latest tag (git reset --hard protiptag), create a new stable branch based on that tag (git checkout -b wip-protip-2 protiptag), or diff his current HEAD against the tag to see what all he's changed since his branch was stable (git diff protiptag...HEAD).



This local development scenario can become a team development scenario involving tags if, for example, Tim needed QA to start testing portions of his branch (his changes are just that important). Since the current HEAD of Tim's branch is incredibly unstable, he can push his tag to the central repository so QA can push a stage using the tag to the last stable point in the branch's history with the command: git push origin tag protiptag

Tags are similar to most other "refs" in Git insofar as they are distributable; if I execute git fetch your-repo --tags, I can pull the tags you've set in "your-repo" and apply them locally to aid development. The distributed nature is primarily how tags in Git differ from Subversion; the rest of the concept is nearly the same.

Currently at Slide, tag usage is dominated by the post-receive hook in the central repository, where every push into the central repository ("origin") on the release branch is tagged. This allows us to quickly "revert" bad live pushes temporarily, by simply pushing the last "good" tagged release, to ensure minimal site destabilization (while we correct live issues outside of the release branch).
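A hook along those lines can be only a few lines, since Git feeds each updated ref to post-receive on stdin as "old-sha new-sha refname"; the sketch below is hypothetical, not our actual hook:

#!/usr/bin/env python
# Hypothetical post-receive hook: tag every push to the release branch so a
# bad live push can be rolled back by re-pushing the last "good" tag.
import subprocess
import sys
import time

for line in sys.stdin:
    old_sha, new_sha, refname = line.split()
    if refname == "refs/heads/release":
        tag = "release-%s" % time.strftime("%Y%m%d-%H%M%S")
        subprocess.call(["git", "tag", tag, new_sha])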

For more specific usage of `git tag` refer to the git-tag(1) man page



Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

Read more →

Proposal: Imuse, an IMAP-capable FUSE filesystem

I've spent the better part of my weekend messing around with mail clients, and once again Evolution comes out on top and once again, I'm not happy about it. I tried: Claws, Thunderbird, Alpine (formerly Pine), Mutt, Balsa, KMail and TkRat. None of them worked as well as I wanted; is it too much to ask for a mail client that doesn't puke and die on large amounts (>2GB) of IMAP mail? Supports proper jwz mail threading? And caches IMAP mail locally so I can actually access it while disconnected? Turns out it actually is too much to ask.

That's not what this is about though. While hunting around, I started to look at my Slide IMAP mail account and saw something interesting: it looks suspiciously like a filesystem. The general layout I have right now is something like this:
  • /
    • INBOX
    • Sent
    • Drafts
    • Development/
      • Commits
      • Pushes
    • External/
      • Git
      • Hudson
    • Metrics
    • QA/
      • Exceptions
      • Trac


Clearly, it's a very filesystem-esque looking tree of mail (and a couple gigabytes of it). When you start to really dig into e-mail technology, you get a feeling for how royally screwed up the whole ecosystem is. Between Exchange, IMAP and POP3 (and their SSL counterparts), mbox and Maildir, and of course the venerable SMTP, e-mail technology is a clusterfuck. No wonder barely anybody can implement an e-mail client that doesn't suck.

At a basic level, mail is organized into messages and folders. Messages map very easily to actual files on the filesystem, and folders naturally map to actual directories on the filesystem. Imagine being able to choose any program you wanted to read and write your email. The only pre-requisite: can it read from the filesystem? You could have any program register to receive filesystem events to notify you when mail "appears" in specific directories, and you could move mail around with a simple drag-and-drop in Nautilus/Thunar/Finder. What about writing mail though? Easy enough: you create a new file in the "Drafts" folder, writes would naturally be propagated to the "Drafts" folder on the IMAP server, and when you were done with the message, you could copy or move it into the "Sent" folder, which would have a hook to recognize the new file and send it. The IMAP tree from above starts to look something like this:
  • ~/Imuse
    • Settings
    • Accounts/
      • Slide/
        • INBOX
        • Sent
        • Drafts
        • Development/
          • Commits
          • Pushes
        • External/
          • Git
          • Hudson
        • Metrics
        • QA/
          • Exceptions
          • Trac


"Accounts" and "Settings" would likely need to be "special" insofar that Imuse would just create them out of thin air, Accounts would need to be a virtual directory to actually contain the appropriate account listings, and in Settings I'd likely want to have a couple of flat configuration "files" that you could edit in order to actually configure Imuse appropriately.

If there are simply lists of files in each of the Accounts' folders, each representing a particular email, then the problem of dealing with all my e-mail becomes a much easier one to handle; it's just a matter of picking my filesystem browser of choice. Even then it's not really limited to filesystem browsers like Nautilus; the scope of programs that I can use to access my mail is opened up to $EDITOR as well. Most editors like Notepad++, Vim, Emacs, Gedit, and TextMate all support the ability to view a directory and open its contents up for reading/editing. I'm a big fan of using Vim, so Imuse coupled with vtreeexplorer would be phenomenal to say the least.

I've started toying around with building FUSE filesystems and I've pushed my experimenting up to GitHub in my imuse repository. It's currently in C, since I cannot get either of the two FUSE Python bindings to work properly. This presents a certain level of difficulty, since the standard means of accessing IMAP data from C seems to be c-client, which is reasonably well documented but lacks sample code. On the other hand, if I can get the Python bindings to cooperate, then I have access to the wonderful Twisted Mail library (or even the basic imaplib).
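As a taste of how simple the Python side could be, here's a sketch using nothing but the standard library's imaplib to list folders the way Imuse would expose them as directories (the host, credentials, and mount path are placeholders, and the parsing assumes the common "/" hierarchy delimiter):

# Sketch only: map IMAP folders onto the directory layout Imuse would expose.
import imaplib
import os

conn = imaplib.IMAP4_SSL("mail.example.com")   # placeholder server
conn.login("tyler", "not-my-real-password")    # placeholder credentials

typ, mailboxes = conn.list()
for entry in mailboxes:
    # A typical entry looks like: (\HasNoChildren) "/" "QA/Exceptions"
    folder = entry.split(' "/" ')[-1].strip('"')
    local_dir = os.path.join(os.path.expanduser("~/Imuse/Accounts/Slide"), folder)
    print("%s -> %s" % (folder, local_dir))

conn.logout()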

Given my obvious time restrictions, I wanted to open the idea up to more eyes and ears to see what others thought and maybe even find somebody else willing to pitch in. For the time being however, Evolution is still sifting through my mail, and I'm still not enjoying it :(

Read more →

But Who Will Write The Tests?

In addition to frothing at the mouth about Git, I've been really getting into the concept of automated unit tests lately (thus my interest in Hudson). Just like code comments however, tests are good, no tests is bad, wrong tests is worse. That means once you give in to the almighty power of unit testing, you are saddled with the curse of knowing that you will have to update them, forever.

Taking up Test-driven Development is like having a child: if you are at a point in your life where you're ready to accept that kind of responsibility, it can be wonderful, a lot of work, but ultimately you will feel satisfied with your new role as a Responsible Developer (tm). If you're not prepared to take on the burden that TDD will present you with, you will likely regret it or neglect your tests (Deadbeat Developer, I like this metaphor).

In the Top Friends Team at Slide, we practice the more "loose" definition of TDD; tests are not written before functionality is written, but rather functionality is written, and then as part of the QA and release process, the appropriate and accompanying tests are written. Our basic workflow is usually as follows:
  • Tickets are written and assigned to milestones and developers in Trac
  • Branch is created in central Git repository
  • General plan-of-action is discussed between developers
  • Hack-hack-hack
  • Code complete is reached, QA starts to test milestone
  • Developers write tests if needed for functionality
  • Once QA signs off, and tests look solid, code is shipped live


There are two primary flaws with this workflow. The first and most obvious one is that it is far too easy to "forget to write the tests." That is, the next project scheduled to start development tends to "flow forward" into the allotted test-writing time. As important as test coverage is, at the end of the day Slide did not raise funding on having solid test coverage, and our priorities lie in shipping software, first and foremost. The flow-forward of scheduled projects into any available space is something that can be worked on but never solved; it really comes down to discipline among those in charge of setting up any given project's particular roadmap.

The second, more subtle flaw in this workflow, and I think in all Test-driven Development workflows, revolves around the writer of the tests. The fundamental nature of almost all bugs in software is human error; our natural tendency to make mistakes means that nothing we do will ever be perfect, including our tests. Say Developer A is writing a couple of new methods to handle data validation prior to that data going into the database. Chances are that Developer A's life is going to be made far easier by writing some test cases to run through some predefined user input and pass his validation code over it. Therein lies the problem: if the developer doesn't think of a particular edge case when he's writing the code to handle the data validation, the chance he'll remember and account for that particular edge case while he's working on the unit tests is nil.
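A trivial, made-up example of that blind spot: the validation code and the tests come from the same head, so the missing edge case shows up in neither.

# Hypothetical illustration of the shared blind spot between code and tests.
def validate_age(value):
    # Developer A assumed the form always submits a number, so the
    # non-numeric case is never handled at all.
    return 0 < int(value) < 150

# Developer A's own tests exercise only the inputs he already thought of...
assert validate_age("34")
assert not validate_age("-1")
# ...so the edge case he forgot in the code ("banana" raises ValueError
# instead of returning False) is missing from the test suite too.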

How do you really ensure that tests are of high enough quality to actually catch errors and regressions?

I think a certain extent of intra-team test writing and code review, depending on the level of communication between developers, can really help. In this case less developer communication is better. If Developer A tells Developer B how his code works, Developer B is now going to have an unnecessary expectation when he starts to write tests for Developer A's code. If Developer B reviews the code for what it actually is, instead of what Developer A thinks it is, the tests that will ultimately be written will be more thorough than if Developer A had written the whole suite himself.

This still isn't sufficiently fool-proof to where I feel all that confident in test coverage; the tests being written are subject to the availability, thoroughness and understanding that Developer B brings to the table. Inside a small team like this one, one of those is almost always in short supply (usually availability).

One approach I'm anxious to try is the more active involvement of QA engineers in the test-writing process, both in the pre-fail and post-fail scenarios. The pre-fail scenario is one like that which I detailed above, where new code is being written. In this case a QA engineer's experience can help guide the developer on what sets of user input have typically caused issues in the past. The second case, post-fail, is actually already occurring at Slide; a live issue, data validity bug, or regression is caught by QA engineers who detail the reproduction case in Trac, and as a result a regression test can be written for that specific issue.

This is still subject to the three things I cited above: the availability, thoroughness and understanding of those involved. I still have a lot of unanswered questions about the ideal QA and Dev workflow, however: how does this scale to a team of tens or hundreds? Who writes the tests for large teams? What about a team of 1 Dev and 1 QA? What about the lone hacker? How do you write quality code without getting bogged down in the mush of writing thousands of tests for everything you can imagine could go wrong?

Who writes the tests?
Read more →

Extremely brief review of the Nokia n810

A coworker of mine was kind enough to let me borrow his Nokia n810 for a couple days to try it out, as he knew I was considering purchasing one for myself. I'm very glad I tried it before buying it, since I'm not going to buy it now (sorry Nokia! The princess is in another castle!)

The thought of a handheld, wireless-capable Linux device is very intriguing to me. That said, I'm not sure what I would even do with it! As I mentioned in my previous post, I like to feel cool, and the prospect of answering the question "is that Linux in your pocket or are you just happy to see me" is far too enticing to pass up. Regardless, I think the n810 suffers from some critical hardware and software deficiencies.

Hardware
The n810 is powered by a 400MHz ARM processor and comes equipped with either 128MB or 256MB of RAM (from what I can tell). I'm not entirely certain which is to blame for the sluggishness of the experience, but my guess is the RAM. Particularly when running the browser (Gecko-based), I would experience "hiccups" where the device spent a few seconds registering input before actually following a clicked link. This may be more the fault of the software, but for an internet tablet, the sluggishness of the browser in both user interaction and rendering time was absolutely infuriating.

The built-in keyboard is smooth, a little too smooth for my taste; I found myself constantly struggling to hit the right keys with my fingers (my thumb is the width of 2.5 columns of keys). Unlike most US keyboard layouts, the n810 keyboard has a lot of keys in "weird" places that I could not get the hang of over the course of a weekend. I eventually gave up on trying to chat or use SSH on the device because I found it so painful to type on.

The battery life was nothing to write home about, closer to a laptop's battery life than a phone's.


The Software
Despite being Linux-based, the device doesn't feel like Linux at all, which I think is a good thing for the mass market. The "Home" screen was pretty slick, with the ability to add applets to the "desktop" to report things like weather, time, VPN status, etc. A cross between a systray and Dashboard, the Home screen was where I felt most comfortable on the device (the "home" screen on my smartphone is set up with similar informational panels). Once I left "Home" I was soon frustrated again; I still haven't figured out whether the "Accounts" preference in the Control Panel (for IM accounts) and the installation of Pidgin are the same thing or not. Email and IM, the two other foundations of what I would expect from an "internet tablet", were weak. Neither of them cooperated with any of the IMAP/SSL or Jabber/SSL servers I use, and they both seemed to be targeted at webmail and chat services like GMail and GTalk.

Maemo does use .deb packages for installation, so I could pretty easily find some of my favorite open source applications in the Maemo repositories; unfortunately, the GUI frontend for apt-get on Maemo allows only one operation at a time (no checking multiple boxes and then clicking "Install"), so adding new software was literally a 30-minute operation.


Conclusion
I don't think I'm being too negative in saying that I'm disappointed in Nokia for releasing what I think is such a substandard product. With the ubiquity of wireless in San Francisco, having a nice, solid ultra-portable machine that I can actually fit into my pocket is exciting; the Nokia n810 is certainly not that machine.

This week I'm shipping my ASUS Eee PC off to my little sister, so I'm starting to look more and more for something even more portable to fill the void. Right now the leader is the OQO model 02, which is about twice the price of the n810 and ships with Vista by default, but with Ubuntu installed and close to 6 hours of battery life I think it could be the ultra-portable that I've been looking for.
Read more →

I'm using Git because it makes me feel cool

Let's be honest for a second, anybody who knows me knows that I'm clearly an insecure person; I spend the majority of my time trying my best to appear cool. I've owned a lot of Macs in my life, not because they're solid machines with a fantastic operating system, but because I felt so damn smug and cool whenever I was doing anything on my Macs. I also developed Mac software for a while, not because it was my passion or Objective-C and Cocoa are practically God's gift to software, but because Mac developers are so cool, what with the black-rimmed glasses and fancy coffees. Hell, I remember when I finally traded my MacBook Pro for a Thinkpad running Linux; it had nothing to do with an ideological stance against Apple's treatment of developers or frustrations with Leopard, it was all about the new geek-chic that was Linux. Thus far, my life has basically been one big quest for more leet-points.

Then came Git.

When I started out in the software world, I was using CVS, which was a notch less cool than a slim IBM salesman's tie. The constant moaning and groaning of fellow developers using CVS, combined with the shame that I felt when I finally told my parents about my use of CVS was too much to bear. I had to switch.

I remember the first time I tried Subversion; I remember talking to Dave and saying "Meh, I'll stick with CVS!" Soon enough, just like the Macarena, Subversion swept the nation. Subversion was the newest, coolest thing ever; developers rushed into the streets exclaiming "It sucks less than CVS! It sucks less than CVS!" I switched over to Subversion and all of a sudden I was cool again. One by one, the open source projects I knew about switched over to Subversion, then SourceForge switched over, and in an instant Subversion replaced CVS and became the mainstream version control system. Subversion had grown up, gotten married, picked up a 401k and health insurance. How uncool.

After joining Slide, which used Subversion, I found myself burning up inside. Here I was at this hip start-up, really feeling cool, but still using the same version control system that uncool companies like Yahoo! and Sun use. I would not stand for this. As 2007 became 2008, the writing was on the wall: Git was our new bicycle. It had been blessed by Saint Torvalds, and clearly we needed to get in on the ground floor of the new cool before it became mainstream.

We needed to switch to Git immediately. Who cares if Git is extremely fast; it's not like time is money or something ridiculous like that. What do I care if Git handles branches and merge histories unlike CVS or Subversion? With its immense coolness-factor, I didn't even consider that Git would allow us to work in a decentralized workflow or a centralized workflow; nope, didn't even cross my mind. If one were to make a list of pros and cons of Git versus whichever other version control system, you could just put "Pro: Cool" at the top of the list, underlined, in bold, and the rest would be moot as far as I'm concerned.

Unlike Subversion or Perforce, Git doesn't have corporate backing; Git is distributed, like a guerrilla force sweeping through the jungle ready to pounce on an unsuspecting platoon; that's freakin' cool. Git rides a motorcycle, wears a leather jacket, makes women swoon, and kicks ass and/or jukeboxes.

Git is the Fonz. Cool.

Don't make any false assumptions about my feelings towards Git; it's not like it's a clearly superior version control system or anything. I'm using it only because I want to be cool too.

Read more →

Git Protip: By commiting that revision, you fucked us (git revert)

I've been sending weekly "Protip" emails about Git to the rest of engineering here at Slide for a while now, using the "Protips" as a means of introducing more interesting and complex features Git offers. Below is the third Protip written to date.




The concept of "revert" in Git versus Subversion is an interesting, albeit subtle, change, due mostly to the differences in how Git and Subversion track commits. Just as with Subversion, you can only revert a committed change; unlike Subversion, there is a 1-to-1 mapping of a "commit" to a "revert". The basic syntax of revert is quite easy: git revert 0xdeadbeef. Just like a regular commit, you will need to push your changes after you revert if you want others to receive the revert as well.

In the following example of reverting a commit, I also use the "-s" argument on the command line to denote that I'm signing off on this revert (i.e. that I've properly reviewed it).


xdev3% git revert -s c20054ea390046bd3a54693f2927192b2a7097c2
----------------[Vim]----------------
Revert "merge-to-release unhide 10000 coin habitat"

This reverts commit c20054ea390046bd3a54693f2927192b2a7097c2.

Signed-off-by: R. Tyler Ballance <tyler@slide.com>
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch wip-protips
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# modified: bt/apps/pet/data.py


+ python bt/qa/git/post-commit.py -m svn@slide.com
Sending a commit mail to svn@slide.com
Created commit a6e93b8: Revert "merge-to-release unhide 10000 coin habitat"
1 files changed, 4 insertions(+), 3 deletions(-)
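As noted above, this revert only exists in my local repository until I publish it. A minimal sketch of that last step, assuming the shared repository is the usual "origin" remote:

# push the branch carrying the freshly created revert commit so that
# everybody else receives it on their next pull
git push origin wip-protips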



Reverting multiple commits


Since git revert will generate a new commit for you every time you revert a previous commit, reverting multiple commits is not as obvious (side note: I'm aware of the ability to squash commits, or of --no-commit for git-revert(1); I simply dislike compressing revision history when I don't believe there should be any compression). If you want to revert a specific merge from one branch into the other, you can revert the merge commit (provided one was generated when the changes were merged). Take the following example:

commit 81a94bb976dfaaaae42ae2600b7e9e88645ebd81
Merge: 8134d17... d227dd8...
Author: R. Tyler Ballance <tyler@slide.com>
Date: Thu Nov 20 10:15:16 2008 -0800

Merge branch 'master' into wip-protips



I want to revert this merge since it refreshed my wip-protips branch from master and brought in a lot of changes that have destabilized my branch. In the case of reverting a merge commit, you need to specify -m and a number to tell Git which parent is the mainline branch to pivot off of; -m 1 usually suffices. So the revert of the commit above will look something like this:

git revert 81a94bb976dfaaaae42ae2600b7e9e88645ebd81 -m 1



Then my revert commit will be created after I review the change in Vim:

commit 8cae4924c4c05dadaaeccb3851cfc9ec1b8efd0f
Author: R. Tyler Ballance <tyler@slide.com>
Date: Thu Nov 20 10:20:44 2008 -0800

Revert "Merge branch 'master' into wip-protips"

This reverts commit 81a94bb976dfaaaae42ae2600b7e9e88645ebd81.
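If you're ever unsure which number to hand to -m, you can ask Git for the merge commit's parents. A quick sketch using the merge commit from this example:

# print the parent hashes of the merge commit, in order; the first hash is
# parent 1 (the wip-protips side of this merge), the second is parent 2 (master)
git log -1 --pretty=format:%P 81a94bb976dfaaaae42ae2600b7e9e88645ebd81

Reverting with -m 1 pivots on that first parent, backing out everything the merge brought in from master while leaving the rest of wip-protips alone.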



Now let's take the extreme case where I don't have a merge commit to pivot off of, or where I have a particular set of bare revisions that I need to revert in one pass; here you can start to tie Git subcommands together, like git-rev-list(1), to accomplish this. This hypothetical situation might occur if some swath of changes has been applied to a team-master and needs to be backed out. Without a merge commit to key off of, you have to revert the commits one by one, but that doesn't mean you have to revert each one by hand:
for r in `git rev-list master...master-fubar --since="8:00" --before="12:00" --no-merges`; do git revert --no-edit -s $r; done
In the above example, I use git-rev-list(1) to give me the list of revisions between 8 a.m. and 12 p.m. that exist on "master-fubar" but not on "master", excluding merge commits (strictly speaking, the three-dot syntax lists commits that are on either branch but not both; if "master" hasn't moved, that amounts to the same thing). Since git-rev-list(1) returns commit hashes newest-first by default, I can loop through them and revert each change starting with the most recent, which is generally the order you want since it unwinds the changes in reverse of how they were applied. The inner part of the loop signs off on the revert (-s) and then tells git-revert(1) to auto-commit it without opening the commit message in Vim (--no-edit). What this then outputs is the following:

xdev% for r in `git rev-list master...master-fubar --since="8:00" --before="12:00" --no-merges`; do git revert --no-edit -s $r; done
Finished one revert.
Created commit b6810d7: Revert "a test, for you"
1 files changed, 1 insertions(+), 1 deletions(-)
Finished one revert.
Created commit 83156bd: Revert "These are not the droids you are looking for
1 files changed, 2 insertions(+), 0 deletions(-)
Finished one revert.
Created commit 782f328: Revert "commented out stuff"
1 files changed, 0 insertions(+), 3 deletions(-)
Finished one revert.
Created commit 2b8d664: Revert "back on again"
1 files changed, 1 insertions(+), 1 deletions(-)
xdev%



For specific usage of "git-revert" or "git-rev-list", refer to the git-revert(1) and git-rev-list(1) man pages.



Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide
Read more →

Git Protip: Learning from your history (git log)

I've been sending weekly "Protip" emails about Git to the rest of engineering here at Slide for a while now, using the "Protips" as a means of introducing more interesting and complex features Git offers. Below is the second Protip written to date.





One of the major benefits of using Git is that the entire repository is local and easily searched or queried. For this, Git has a very useful command called git log which allows you to inspect revision histories in numerous different ways: by file path, by branch, etc. There are a couple of basic scenarios where git log has become invaluable, for me at least, both to properly review code and to track changes effectively from point A to point B.


  • What's Dave been working on lately? (with diffs)
    • git log -p --no-merges --author=dave

    The --no-merges option will prevent git log from displaying merge commits, which are generated automatically whenever you pull (merge) from one Git branch into another.



  • Before I merge this branch down to my team master, I want to know what files have been changed and what revisions
    • git log --name-status master-topfriends...proj-topfriends-thing


    Git supports, with git log and with git diff, both unidirectional and bidirectional branch lookups. For example, say the left branch has commits "A, B" and the right branch has commits "A, C". With git log, the .. syntax will output "C" (what the right branch has that the left doesn't), whereas ... will output "B, C" (everything that is on one branch but not the other); a short sketch contrasting the two follows after this list.



  • I just got back from a vacation, I wonder what's changed?
    • git log --since="2 weeks ago" --name-status -- templates

    At the tail end of a git log command you can specify particular paths to look up the history for with the -- operator; in this case, I will be looking at the changes that have occurred in the templates directory over the past two weeks.



  • Most recent X number of commits? (with diffs)

    • git log -n 10 --no-merges -p
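To make the two-dot versus three-dot distinction from the second bullet concrete, here is a small sketch; the branch names are made up, with master as the "left" branch holding commits "A, B" and feature as the "right" branch holding "A, C":

# two dots: commits on feature that master does not have -> "C"
git log --pretty=oneline master..feature

# three dots: the symmetric difference, commits on either branch but not
# on both -> "B, C"
git log --pretty=oneline master...feature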

All git log commands automatically pipe their output through less(1), so you can page through it just as you would if you executed svn log | less. Because git log is simply reading from the locally stored revision history, you can quickly grep the history by any number of different search criteria to gain a better understanding of how the code base is changing and where.
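As a couple of concrete sketches of that kind of searching (the search string and the function name below are made up for illustration):

# commits whose log message mentions "top friends"
git log --grep="top friends"

# commits that added or removed a line containing get_coin_balance
# (the so-called pickaxe search)
git log -S"get_coin_balance"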

For more specific usage of `git log`, refer to the git log man page.




Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide
Read more →