rtyler

Taking control of Git

2018-11-25T00:00:00+00:00

In the development of service-oriented applications, we often use the phrase “source of truth” when referring to data and its ownership, the expectation being that there is generally a single source of truth in the system. Take DNS, for example: we generally trust that a nameserver somewhere out there is acting as the single source of truth for a single domain, such as brokenco.de. Without this guarantee, much of our experience on the internet would break down. For the software we write, GitHub has increasingly become the source of truth for the source code itself, so much so that systems have been built on top of GitHub which further wed the software ecosystem to a single source of truth, such as Golang’s dependency definition conventions.

I have no fear of the GitHub acquisition by Microsoft, but I do concern myself with increasingly large single points of failure. A single entity owning too much of my interactions or data makes me feel uneasy. A little timer starts in my brain: how long until the good vibes run out, and this ends up screwing me?

With our source code living in Git, switching up the source of truth has never been easier. I recently set out to take back control of the source of truth for my own free and open source work. Using a server I have at my disposal, I deployed Gitea. Originally based on Gogs, Gitea has been rather pleasant and simple to work with.

Fortunately, somebody else has written a tool, the gitea-github-migrator, which made initializing the Gitea instance with my repositories quite simple. Due to some GitHub rate limits and other weird transient network errors, I ended up running the migrator over and over again until everything was synchronized properly to my server.

A quick look at my GitHub profile and you may notice that nothing has been deleted. My objective is to own the source of truth, not to reduce the redundancy for my source code. Unfortunately as of today, Gitea cannot automatically push to another Git remote (issue #3480), but creating a script which can be configured as a post-receive hook is easy enough:

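#!/bin/sh
# Post-receive hook: mirror every ref to the GitHub repository of the same name.
# GITEA_REPO_USER_NAME and GITEA_REPO_NAME are expected to come from the hook environment.

echo
echo "Mirroring changes to GitHub under ${GITEA_REPO_USER_NAME}/${GITEA_REPO_NAME}"
echo
git push --mirror "git@github.com:${GITEA_REPO_USER_NAME}/${GITEA_REPO_NAME}.git"
echo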

To support this script I needed to set up a few things:

A newly generated SSH public/private key pair for Gitea to use.
The new SSH public key added to my GitHub account.
The above script, gitea-github-mirror, installed on the server’s filesystem.
A post-receive hook, which executes gitea-github-mirror, configured on each repository I wished to mirror.

Once the desired repositories had been set up, I only needed to change my local repositories to point at the Gitea server for their origin remote. Not-too-coincidentally, this is where my previous blog post about transparently switching SSH between Tor and the LAN comes in.
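Repointing an existing clone is a one-line change. The sketch below runs against a throwaway repository so it can be tried safely; the hostname gitea.example.com and the example-user/example-repo path are placeholders, not a real server or repository.

```shell
#!/bin/sh
set -e

# Throwaway repository standing in for an existing local clone.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" remote add origin git@github.com:example-user/example-repo.git

# Point origin at the self-hosted Gitea instance instead (placeholder URL).
git -C "$repo" remote set-url origin git@gitea.example.com:example-user/example-repo.git

# Confirm where fetches and pushes will now go.
new_origin=$(git -C "$repo" remote get-url origin)
echo "origin now points at: $new_origin"
```

In a real clone only the git remote set-url line is needed; everything else is scaffolding for the demonstration.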

I can now treat GitHub like a public backup for these repositories, and maintain control over the source of truth for each repository I own and maintain.

Mirroring other repositories

Gitea has another feature worth mentioning in this same vein, one which I am only now starting to use: (pull-based) repository mirroring. Inevitably I find myself relying on third-party repositories either as Git submodules, or for source-builds of some piece of software. Rather than trust that those repositories will exist in perpetuity in somebody else’s GitHub organization or user account, Gitea mirroring allows me to create an automatically-updated mirror of the upstream repository. I’ve since found myself creating new organizations in Gitea to house different collections of libraries and tools I depend on, all automatically synchronized by Gitea.

Data provenance is an important subject to me, and while not everything is as easily decentralized as Git, I believe it’s worth the effort to try to own your data as much as possible. For those things which are easily added into source control, Gitea and a modicum of extra disk space do the job nicely!

(Of course, this blog post was published to GitHub pages, after being mirrored from my Gitea instance.)

A rebase-based workflow

2010-04-02T00:00:00+00:00

When I first started working with Git in mid 2008, I was blissfully oblivious to the concept of a “rebase” and why somebody might ever use it. At Slide we were crazy for merging (see diagram to the right); everything pretty much revolved around merges between branches. To add insult to injury, development revolved around a single central repository which everyone had the ability to push to. Merges compounded upon merges led to a frustratingly complex merge history.

When I first arrived at Apture, we were still using Subversion, just as Slide was when I arrived there (I have a Git-effect on companies). To work effectively, I had to use git-svn(1) so that I could commit changes that weren’t quite finished on a day-to-day basis. Rebasing is fundamental to the git-svn(1) workflow, as Subversion requires a linear revision history; I would typically work in the master branch and execute git svn rebase prior to git svn dcommit to ensure that my changes could be properly committed at the head of trunk.

When we finally switched from Subversion to Git, we adopted an “integration-manager workflow”, which is far more conducive to rebase being useful than the purely centralized repository workflow I had previously used at Slide.

From the [Pro Git](http://progit.org/book/ch5-1.html) site

In addition to the publicly readable repositories for each developer, we use Gerrit religiously, which I’ll cover in a later post.

We use rebase heavily in this workflow to accomplish three main goals:

Linear revision history
Concise commits covering a logical change
Reduction of merge conflicts

Creating a solid linear revision history, while not immediately important, is nicer in the longer term allowing developers (or new hires) to walk the history of a particular file or module and see a clear progression of changes.

Creating concise commits is probably the most important reason to use rebase. When working in a topic branch I will typically commit every 20-40 minutes. In order to not break my flow, the commit messages will typically be brief and cover only a few lines of changes. Atomic commits are great when writing code, but they’re lousy at informing other developers about the changes. To turn them into something more informative, an “interactive rebase” can be used. For example, collapsing the commits in a topic branch ticket-1234 would look like:

git checkout ticket-1234
git rebase -i master

This will bring up an editor with a list of commits, where you can “squash” commits together and re-write the final commit message to be more informative.
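For a feel of what that editor session does, here is a minimal sketch that performs the same squash non-interactively in a throwaway repository. It relies on Git's GIT_SEQUENCE_EDITOR and GIT_EDITOR environment variables to stand in for the editor; the file names, commit messages and the helper script squash-all.sh are made up for the illustration.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
repo="$work/repo"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master   # use the branch name from the post
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"

# Three small work-in-progress commits on the topic branch.
git checkout -q -b ticket-1234
for n in 1 2 3; do
    echo "step $n" >> feature.txt
    git add feature.txt
    git commit -q -m "wip $n"
done

# Stand-in for the editor: keep the first "pick", turn the rest into "squash".
cat > "$work/squash-all.sh" <<'EOF'
#!/bin/sh
sed '2,$s/^pick /squash /' "$1" > "$1.new" && mv "$1.new" "$1"
EOF
chmod +x "$work/squash-all.sh"

# GIT_EDITOR=true accepts the combined commit message as-is.
GIT_SEQUENCE_EDITOR="$work/squash-all.sh" GIT_EDITOR=true git rebase -i master

topic_commits=$(git rev-list --count master..ticket-1234)
echo "commits unique to ticket-1234: $topic_commits"
```

When doing this by hand you would make the same pick-to-squash edits in the editor and then rewrite the combined message, rather than accepting it unchanged as the sketch does.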

The Workflow

For the purposes of the example, let’s use the topic branch from above (ticket-1234) which we’ll assume has 3 commits unique to it.

Fetch the latest changes from the upstream “master” branch
- git fetch origin
Rebase the topic branch, effectively piling the 3 commits on top of the latest tip of the upstream “master” branch
- git rebase origin/master
Collapse the 3 commits in the topic branch down into one commit
- git rebase -i origin/master
(Later) Bring those commits down into the “master” branch
- git checkout master && git rebase ticket-1234

With an interactive rebase, you can chop commits up, re-order them, squash them, and so on. With the non-interactive rebase, you can pile your commits on top of an upstream head, making your changes apply cleanly to the latest code in the upstream repository.
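The steps of the workflow can be strung together against throwaway repositories. This is only a sketch: a local bare repository plays the part of the upstream “origin”, a second clone plays a colleague pushing to master, and all file names and commit messages are invented. The interactive squash step is left out because it needs an editor.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# A bare repository plays the role of the upstream "origin".
git init -q --bare "$work/origin.git"
git -C "$work/origin.git" symbolic-ref HEAD refs/heads/master

# My working repository, with one commit pushed to origin/master.
git init -q "$work/mine"
cd "$work/mine"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/origin.git"
echo "base" > README
git add README
git commit -q -m "Initial commit"
git push -q origin master

# Topic branch with 3 commits unique to it.
git checkout -q -b ticket-1234
for n in 1 2 3; do
    echo "step $n" >> feature.txt
    git add feature.txt
    git commit -q -m "wip $n"
done

# Meanwhile, a colleague pushes a new commit to the upstream master.
git clone -q "$work/origin.git" "$work/colleague"
(
    cd "$work/colleague"
    echo "hotfix" > hotfix.txt
    git add hotfix.txt
    git commit -q -m "Upstream hotfix"
    git push -q origin HEAD:master
)

# Fetch the latest changes from the upstream master branch.
git fetch -q origin
# Pile the 3 topic commits on top of the latest upstream tip.
git rebase -q origin/master
# (Later) Bring those commits down into master; this is a fast-forward.
git checkout -q master
git rebase -q ticket-1234

ahead=$(git rev-list --count origin/master..master)
merges=$(git rev-list --count --merges master)
echo "master is $ahead commits ahead of origin/master, with $merges merge commits"
```

Because the topic branch was rebased before being brought into master, the history stays linear and no merge commit is created.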

git ready has a few nice articles on the subject as well, such as an intro to rebase and an article on squashing commits with rebase.

Pre-tested commits with Hudson and Git

2009-12-31T00:00:00+00:00

A few months ago Kohsuke, author of the Hudson continuous integration server, introduced me to the concept of the “pre-tested commit”, a feature of the TeamCity build management and continuous integration system. The concept is simple: the build system stands as a roadblock between your commit and trunk, and only after the build system determines that your commit doesn’t break things does it allow the commit to be introduced into version control, where other developers will sync and integrate that change into their local working copies. The reasoning and workflow put forth by TeamCity for “pre-tested commits” is very dependent on a centralized version control system; it solves an issue Git or Mercurial users don’t really run into. Those using Git can commit their hearts out all day long and it won’t affect their colleagues until they merge their commits with others.

In some cases, allowing buggy or broken code to be merged in from another developer’s Git repository can be worse than in a central version control system, since the recipient of the broken code might perform a knee-jerk git-revert(1) command on the merge! When you revert a merge commit in Git, what happens is you not only revert the merge, you revert the commits associated with that merge commit; in essence, you’re reverting everything you just merged in when you likely just wanted to get the broken code out of your local tree so you could continue working without interruption. To solve for this problem-case, I utilize a “pre-tested commit” or “pre-tested merge” workflow with Hudson.

My workflow with Hudson for pre-tested commits involves three separate Git repositories: my local repo (local), the canonical/central repo (origin) and my “world-readable” (inside the firewall) repo (public). For pre-tested commits, I utilize a constantly changing branch called “pu” (potential updates) on the world-readable repo. Inside of Hudson I created a job that polls the world-readable repo (public) for changes in the “pu” branch and will kick off builds when updates are pushed. Since the content of public/pu is constantly changing, the git-push(1) commands to it must be “forced-updates” since I am effectively rewriting history every time I push to public/pu.

To help with forcefully pushing updates from my current local branch to public/pu, I use the following git alias:

% git config alias.pup "\!f() { branch=\$(git symbolic-ref HEAD | sed 's/refs\\/heads\\///g');\
      git push -f \$1 +\${branch}:pu;}; f"

While a little obfuscated, this pup alias forcefully pushes the contents of the current branch to the specified remote repository’s pu branch. I find this is easier than constantly typing out: git push -f public +topic:pu
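For anyone who finds the escaped alias hard to follow, roughly the same thing can be written as a plain shell function. This is a sketch rather than a drop-in replacement for the alias: it assumes a Git recent enough to support git symbolic-ref --short, and the scratch repositories and file names below exist only to show it working.

```shell
#!/bin/sh
set -e

# Force-push the current branch to the "pu" branch of the given remote.
pup() {
    remote=$1
    branch=$(git symbolic-ref --short HEAD) || return 1
    git push -q -f "$remote" "+${branch}:pu"
}

# Scratch setup: a bare repository stands in for the world-readable "public" repo.
work=$(mktemp -d)
git init -q --bare "$work/public.git"
git init -q "$work/local"
cd "$work/local"
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add public "$work/public.git"
git symbolic-ref HEAD refs/heads/topic   # start on a branch named "topic"
echo "hack, hack, hack" > notes.txt
git add notes.txt
git commit -q -m "Work in progress"

pup public

local_tip=$(git rev-parse HEAD)
pu_tip=$(git -C "$work/public.git" rev-parse refs/heads/pu)
echo "public/pu is at $pu_tip"
```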

In list form, my workflow for taking a change from inception to origin is:

hack, hack, hack
commit to local/topic
git pup public
Hudson polls public/pu
Hudson runs potential-updates job
Tests fail?
- Yes: Rework commit, try again
- No: Continue
Rebase onto local/master
Push to origin/master

Using this pre-tested commit workflow I can offload the majority of my testing requirements to the build system’s cluster of machines instead of running them locally, meaning I can spend the majority of my time writing code instead of waiting for tests to complete on my own machine in between coding iterations.

Code Review with Gerrit, a mostly visual guide

2009-12-07T00:00:00+00:00

Update: Some of this information is out of date. Instead of pushing to the gerrit master branch I recommend setting up “replication” and using the “Submit” button inside of the “Review” page.

A while ago, when Paul, Jason and I worked together, I became a big fan of code reviews before merging code. It was no surprise really: we were the first to adopt Git at the company and our workflow was quite ad-hoc, so the need to federate knowledge within the group meant code reviews were a pretty big deal. At the time, we mostly did code reviews in person by way of “hey, what’s this you’re doing here?” or by literally sending patch emails with git-format-patch(1) to the team mailing list so all could participate in the discussion about what merits “good code” exhibited versus “less good code.” Now that I’ve left that company and joined another one, I’ve found myself in another small-team situation, where my teammates place high value on code review. Fortunately this time around better tools exist, namely Gerrit.

I’m a bit hazy on the history behind Gerrit. What I do know is that its primary developer, Shawn Pearce (spearce), is one of the Git “inner circle” who contributes heavily to Git itself as well as JGit, a Git implementation in Java which sits underneath Gerrit’s internals. What makes Gerrit unique in the land of code review systems is how tightly coupled Gerrit is with Git itself, so much so that you submit changes by pushing as if the Gerrit server were “just another Git repo.”

I recommend building Gerrit from source for now. spearce is planning a proper release of the recent Gerrit developments shortly before Christmas, but who has that kind of patience? To build Gerrit you will need Maven and the Sun JDK 1.6.

Setting up the Gerrit daemon

First you should clone one of Gerrit’s dependencies, followed by Gerrit itself:

banana% git clone git://android.git.kernel.org/tools/gwtexpui.git
banana% git clone git://android.git.kernel.org/tools/gerrit.git

Once both clones are complete, you can start by building one and then the other (which might take a while, go grab yourself a coffee, you’ve earned it):

banana% (cd gwtexpui && mvn install)
banana% cd gerrit && mvn clean package

After Gerrit has finished building, you’ll have a .war file ready to run Gerrit with (note: depending on when you read this article, your path to gerrit.war might have changed). First we’ll initialize the directory “/srv/gerrit” as the location where the executing Gerrit daemon will store its logs, data, etc:

banana% java -jar gerrit-war/target/gerrit-2.0.25-SNAPSHOT.war init -d /srv/gerrit
*** Gerrit Code Review v2.0.24.2-72-g4c37167
***

Initialize '/srv/gerrit' [y/n]? y

*** Git Repositories
***

Location of Git repositories   [git]:

*** SQL Database
***

Database server type           [H2/?]:

*** User Authentication
***

Authentication method          [OPENID/?]:

*** Email Delivery
***

SMTP server hostname           [localhost]:
SMTP server port               [(default)]:
SMTP encryption                [NONE/?]:
SMTP username                  :

*** SSH Daemon
***

Gerrit SSH listens on address  [*]:
Gerrit SSH listens on port     [29418]:

Gerrit Code Review is not shipped with Bouncy Castle Crypto v144
  If available, Gerrit can take advantage of features
  in the library, but will also function without it.
Download and install it now [y/n]? y
Downloading http://www.bouncycastle.org/download/bcprov-jdk16-144.jar ... OK
Checksum bcprov-jdk16-144.jar OK
Generating SSH host key ... rsa... dsa... done

*** HTTP Daemon
***

Behind reverse HTTP proxy (e.g. Apache mod_proxy) [y/n]? n
Use https:// (SSL)             [y/n]? n
Gerrit HTTP listens on address [*]:
Gerrit HTTP listens on port    [8080]: 

Initialized /srv/gerrit

After running through Gerrit’s brief wizard, you’ll be ready to start Gerrit itself (note: this command will not detach from the terminal, so you might want to start it within screen for now):

banana% java -jar gerrit-war/target/gerrit-2.0.25-SNAPSHOT.war daemon -d /srv/gerrit

Now that you’ve reached this point you’ll have Gerrit running a web application on port 8080, and listening for SSH connections on port 29418, congratulations! You’re most of the way there :)

Creating users and groups

Welcome to Gerrit

The first thing you should do after starting Gerrit up is log in to make sure your user is the administrator. You can do so by clicking the “Register” link in the top right corner, which should present you with an OpenID login dialog.

After logging in with your favorite OpenID provider, Gerrit will allow you to enter information about yourself (SSH key, email address, etc). It’s worth noting that the email address is very important, as Gerrit uses the email address to match your commits to your Gerrit account.

When you create your SSH key for Gerrit, it’s recommended that you give it a custom entry in ~/.ssh/config along the lines of:

Host gerrithost
    User 
    Port 29418
    Hostname 
    IdentityFile 

After you click “Continue” at the bottom of the user information page, you will be taken to your dashboard, which lists your changes waiting to be reviewed as well as changes waiting to be reviewed by you.

Now that your account is all set up, let’s create a group for “integrators”. Integrators in Git parlance are those that are responsible for reviewing code and integrating it into the “official” repository (typically integrators are project maintainers or core developers). Be sure to add yourself to the “Integrators” group; we’ll use this group later to create more granular permissions on a particular project.

Projects in Gerrit

Creating a new project in Gerrit is fairly easy but a little different, in that there isn’t a web UI for doing so, only a command line one:

banana% ssh gerrithost gerrit create-project -n

For the purposes of my examples moving forward, we’ll use a project created in Gerrit for one of the Python modules I maintain, py-yajl. After creating the “py-yajl” project with the command line, I can visit Admin > Projects, select “py-yajl” and edit some of its permissions. Here we’ll give “Integrators” the ability to Verify changes as well as Push Branch.

With the py-yajl project all set up in Gerrit, I can return to my Git repository, add a “remote” for Gerrit, and push my master branch to it:

banana% git checkout master
banana% git remote add gerrithost ssh://gerrithost/py-yajl.git
banana% git push gerrithost master

This will give Gerrit a baseline for reviewing changes against and allow it to determine when a change has been merged down. Before getting down to business and starting to commit changes, it’s recommended that you install the Gerrit Change-Id commit-msg hook documented here which will help Gerrit track changes through rebasing; once that’s taken care of, have at it!

banana% git checkout -b topic-branch
banana% 
banana% git commit 
banana% git push gerrithost HEAD:refs/for/master

The last command will push my commit to Gerrit. The command is kind of weird looking, so feel free to put it behind a git-alias(1). After the push is complete, however, my changes will be awaiting review in Gerrit.
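One possible alias for that push, as a sketch: the alias name push-review is invented for this example, and gerrithost is the remote name used in the push command above. The scratch repository is only there so the commands can be run anywhere; in practice the git config line would be run inside your own clone.

```shell
#!/bin/sh
set -e

# Scratch repository so the alias lands in a local, disposable config.
repo=$(mktemp -d)
git init -q "$repo"

# "git push-review" becomes shorthand for pushing HEAD to Gerrit's review ref for master.
git -C "$repo" config alias.push-review "push gerrithost HEAD:refs/for/master"

alias_value=$(git -C "$repo" config --get alias.push-review)
echo "git push-review expands to: git $alias_value"
```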

At this point, you’d likely wait for another reviewer to come along and either comment on your code inline in the side-by-side viewer or otherwise approve the commit by clicking “Publish Comments”.

After comments have been published, the view in My Dashboard has changed to indicate that the change has not only been reviewed but also verified:

Upon seeing this, I can return back to my Git repository and feel comfortable merging my code to the master branch:

banana% git checkout master
banana% git merge topic-branch
banana% git push origin master
banana% git push gerrithost master

The last command is significant again: by pushing the updated master branch to Gerrit, we indicate that the change has been merged, which is also reflected in My Dashboard.

Tada! You’ve just had your code reviewed and subsequently integrated into the upstream tree; pat yourself on the back. It’s worth noting that while Gerrit is under steady development, it is being used by the likes of the Android team, the JGit/EGit team and countless others. Gerrit contains a number of nice subtle features, like double-clicking a line inside the side-by-side diff to add a comment to that line specifically, the ability to “star” changes (similar to bookmarking) and too many others to go into detail in this post.

While it may seem like this was a fair amount of set-up to get code reviews going, the payoff can be tremendous: Gerrit facilitates a solid Git-oriented code review process that scales very well with the number of committers and changes. I hope you enjoy it :)

On GitHub and how I came to write the fastest Python JSON module in town

2009-12-04T00:00:00+00:00

Perhaps the title is a bit too much ego stroking, but yes, I did write the fastest Python module for decoding JSON strings and encoding Python objects to JSON. I didn’t, however, write the parser behind the scenes.

Over the summer I discovered “Yet Another JSON Library” on GitHub, written by Lloyd Hilaiel. Jonesing for a Saturday afternoon project, I started the “py-yajl” project to see if I could implement a Python C module atop Lloyd’s marvelous parsing library. After tinkering with the project for a while I got a working prototype building (learning how to define custom types in Python along the way) and let the project stagnate as my weekend ended and the workweek resumed.

A little over a week ago “autodata”, another GitHub user, sent me a “Pull Request” with some minor changes to make py-yajl build cleaner on amd64; my interest in the project was suddenly reignited. Amazing what a little interest can do for motivation. Over the 10 days following autodata’s pull request I discovered that a former colleague of mine and fellow GitHub user “teepark” had forked the project as well, working on Python 3 support. Going from zero to two people interested in the project, I quickly converted the code from a stagnant, borderline embarrassing dump of C code into a leak-free, swift JSON library for Python. Not one to miss out on the fun, I pinged Lloyd, who quickly became as enamored as I was with making py-yajl the best Python JSON module available. He forked the project and almost immediately sent a number of pull requests my way with further optimizations to py-yajl, such as:

Swapping out the use of Python lists for a custom pointer stack for maintaining internal state
Accelerating parsing and handling of Number objects
Pruning a few memory leaks here and there

Thanks to mikeal’s JSON post and jsonperf.py script, Lloyd and I could both see how py-yajl was stacking up against cjson, jsonlib, jsonlib2 and simplejson; things got competitive. Below are the most recent jsonperf.py results with py-yajl v0.1.1:

json.loads:         6470.22037ms
simplejson.loads:   202.21063ms  
yajl.loads:         145.32621ms
cjson.decode:       102.44788ms

json.dumps:         2309.15286ms
cjson.encode:       276.49586ms   
simplejson.dumps:   201.59785ms
yajl.dumps:         161.00153ms

Over the coming days or weeks (as time permits) I’m planning on adding JSON stream parsing support, i.e. parsing a stream of data as it’s coming in off a socket or file object, as well as a few other miscellaneous tasks.

Given the nature of GitHub’s social coding dynamic, not only did py-yajl get off the ground as a project, but Yajl itself gained an IRC channel (#yajl on Freenode) and a mailing list (yajl@librelist.com). To date I have over 20 unique repositories on GitHub (i.e. authored by me) but the experience around Yajl has been the most exciting and finally proved the “social coding” concept beneficial to me.

Do you love Git too?

2009-11-03T00:00:00+00:00

In addition to RSS feeds, one of my favorite sources of reading material is the Git mailing list; I’m not really active, I simply enjoy reading the discussions around code and the best solutions for certain problems. If you read the list long enough, you’ll start to appreciate the time and attention the Git core developers (spearce, peff and junio (a.k.a. gitster)) put into cultivating the code and in cultivating new contributors. Of all the open source projects I watch to one extent or another, Git is very effective at bringing in new contributors and getting their contributions vetted for inclusion.

If you’re a heavy Git user (like me) you can certainly see the results of their tireless efforts, Junio’s (git.git’s maintainer) in particular. I highly recommend checking out his Amazon wishlist to thank him for his efforts.

Jython, JGit and co. in Hudson

2009-07-21T00:00:00+00:00

At the Hudson Bay Area Meetup/Hackathon that Slide, Inc. hosted last weekend, I worked on the Jython plugin and released it just days after releasing a strikingly similar plugin, the Python plugin. I felt that an explanation might be warranted as to why I would do such a thing.

For those that don’t know, Hudson is a Java-based continuous integration server, one of the best CI servers developed (in my humblest of opinions). What makes Hudson so great is a very solid plugin architecture allowing developers to extend Hudson to support a wide variety of scripting languages as well as notifiers, source control systems, and so on (related post on the growth of Hudson’s plugin ecosystem). Additionally, Hudson supports slaves on any operating system that Java supports, allowing you to have a central manager (the “master” Hudson server/node) and a vast network of different machines performing tasks and executing jobs. Now that you’re up to speed, back to the topic at hand.

Jython versus Python plugin: why bother with either, as @gboissinot pointed out in this tweet? The interesting thing about the Jython plugin, particularly when you use a large number of slaves, is that with its installation you suddenly have the ability to execute Python scripts on every single slave, regardless of whether or not they actually have Python installed. The more “third party” software that can be moved into Hudson by way of the plugin system, the fewer dependencies there are and the less difficult it is to set up slaves to help handle load.

Take the “git” versus the “git2” plugin. The git plugin was recently criticized on the #hudson channel because of its use of the JGit library, versus “git2” which invokes git(1) on the command line. The latter approach is flawed for a number of reasons; in particular, relying on the git command line executables and scripts to return consistent formatting is specious at best, even if you aren’t relying on “porcelain” (git community terminology for front-end-ish scripts and code sitting on top of the “plumbing”; the breakdown is detailed here). The command-line approach also means you now have to ensure every one of your slaves that is likely to be executing builds has the appropriate packages installed. On the flip side, however, with the JGit-based approach, the Hudson slave agent can transfer the appropriate bytecode to the machine in question and execute that without relying on system dependencies.

The Hudson Subversion plugin takes a similar approach, being based on SVNKit.

Being a Python developer by trade, I am certainly not in the “Java Fanboy” camp, but the efficiencies gained by incorporating Java-based libraries in Hudson plugins and extensions are a no-brainer: the reduction of dependencies on the systems incorporated in your build farm will save you plenty of time in maintenance and version woes alone. In my opinion, the benefits of JGit, Jython, SVNKit, and the other Java-based libraries that are running some of the most highly used plugins in the Hudson ecosystem continue to outweigh the costs, especially as we find ourselves bringing more and more slaves online.

Git Protip: Split it in half, understanding the anatomy of a bug (git bisect)

2009-03-06T00:00:00+00:00

I've been sending "Protip" emails about Git to the rest of engineering here at Slide for a while now, using the "Protips" as a means of introducing more interesting and complex features Git offers.

There are those among us who can look at a reproduction case for a bug and just know what the bug is. For the rest of us mere mortals, finding out what change or set of changes actually introduced a bug is extremely useful for figuring out why a particular bug exists. This is even more true for the more elusive bugs or the cases where code "looks" correct and you're stumped as to why the bug exists now, when it didn't yesterday/last week/last month. In most classical version control systems, the options available to you are to sift through diffs or wade through log message after log message, trying to spot the particular change that introduced the regression you're now tasked with resolving.

Fortunately (of course) Git offers a handy feature to assist you in tracking down regressions as they're introduced, git bisect. Take the following scenario:

Roger has been working on some lower level changes in a project branch lately. When he left work last night, he ran his unit tests (everything passed), committed his code and went home for the day. When he came in the next morning, per his typical routine, he synchronized his project branch with the master branch to ensure his code wasn't stomping on released changes. For some reason however, after synchronizing his branch, his unit tests started to fail indicating that a bug was introduced in one of the changes that was integrated into Roger's project branch.

Before switching to Git, Roger might have spent an hour looking over changes trying to pinpoint what went wrong, but now Roger can use git bisect to figure out exactly where the issue is. Taking the commit hash from his last good commit, Roger can walk through changes and pinpoint the issue as follows:



## Format for use is: git bisect start [<bad> [<good>...]] [--] [<paths>...]
xdev4% git bisect start HEAD 324d2f2235c93769dd97680d80173388dc5c8253
Bisecting: 10 revisions left to test after this
[064443d3164112554600f6da39a36ffb639787d7] Changed the name of an a/b test.
xdev4%

This will start the bisect process, which is interactive, and start you halfway between the two revisions specified above (see the image below). Following the scenario above, Roger would then run his unit tests. Upon their success, he'd execute "git bisect good" which would move the tree halfway between that "good" revision and the "bad" revision. Roger will continue doing this until he lands on the commit that is responsible for the regression. Knowing this, Roger can either revert that change, or make a subsequent revision that corrects the regression introduced.

A sample of what this sort of transcript might look like is below:



xdev4% git bisect good
Bisecting: -1 revisions left to test after this
[bcf020a6c4ac7cc5df064c66b182b2500470000a] Merge branch 'cjssp' into master
xdev4% git bisect bad
bcf020a6c4ac7cc5df064c66b182b2500470000a is first bad commit
xdev4% git show bcf020a6c4ac7cc5df064c66b182b2500470000a
commit bcf020a6c4ac7cc5df064c66b182b2500470000a
Merge: 62153e2... 064443d...
Author: Chris <chris@foo>
Date:   Tue Jan 27 12:57:45 2009 -0800

    Merge branch 'cjssp' into master

xdev4% git bisect log
# bad: [7a5d4f3c90b022cb66fd8ea1635c5de6768882d7] Merge branch 'foo' into master
# good: [d1014fd52bebd3c56db37362548e588165b7f299] Merge branch 'bar'
git bisect start 'HEAD' 'd1014fd52bebd3c56db37362548e588165b7f299' '--' 'apps'
# good: [064443d3164112554600f6da39a36ffb639787d7] Changed the name of an a/b test.  PLEASE PICK ME UP WITH NEXT PUSH.  thx
git bisect good 064443d3164112554600f6da39a36ffb639787d7
# bad: [bcf020a6c4ac7cc5df064c66b182b2500470000a] Merge branch 'cjssp' into master
git bisect bad bcf020a6c4ac7cc5df064c66b182b2500470000a
xdev4% git bisect reset
xdev4%

Instead of spending an hour looking at changes, Roger was able to quickly walk a few revisions and run the unit tests he has to figure out which commit was the one causing trouble, and then get back to work squashing those bugs.

Roger is, like most developers, inherently lazy, and running through a series of revisions running unit tests sounds like "work" that doesn't need to be done. Fortunately for Roger, git-bisect(1) supports the subcommand "run", which goes hand in hand with unit tests or other tests. In the example above, let's pretend that Roger had a test case exhibiting the bug he was noticing. He could let git bisect run execute a test script that runs his unit tests and finds the offending revision, i.e.:



xdev4% git bisect start HEAD 324d2f2235c93769dd97680d80173388dc5c8253
Bisecting: 10 revisions left to test after this
[064443d3164112554600f6da39a36ffb639787d7] Changed the name of an a/b test.
xdev4% git bisect run ./mytest.sh

After executing the run command, git-bisect(1) will binary search the revisions between GOOD and BAD, checking whether "mytest.sh" returns a zero (success) or non-zero (failure) exit code, until it finds the commit that causes the test to fail. The end result should be the exact commit in which the regression was introduced into the tree. After finding it, Roger can either grab his rubber chicken and go slap his fellow developer around, or fix the issue and get back to playing Nethack.
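
To make the run subcommand a bit more concrete, here is a minimal sketch that builds a throwaway repository with a planted regression and lets git bisect run track it down. Everything in it is invented for illustration: the file app.txt, the commit messages, and the contents of mytest.sh are assumptions, not Roger's real test suite.

```shell
#!/bin/sh
# Sketch: build a throwaway repository with a known regression and let
# `git bisect run` find it. File names and commit messages are invented.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Roger"
git config user.email "roger@example.invalid"

# Ten commits; the fifth introduces the "bug" (a line reading BROKEN).
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -eq 5 ]; then
        echo "BROKEN" >> app.txt
    else
        echo "change $i" >> app.txt
    fi
    git add app.txt
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# Stand-in for mytest.sh, left untracked so it survives every checkout.
# Exit 0 marks the revision good; exit 1 marks it bad.
cat > mytest.sh <<'EOF'
#!/bin/sh
if grep -q BROKEN app.txt; then
    exit 1
fi
exit 0
EOF
chmod +x mytest.sh

good=$(git rev-list --max-parents=0 HEAD)   # the root commit is known good
git bisect start HEAD "$good" >/dev/null
git bisect run ./mytest.sh >/dev/null

# When bisect finishes, refs/bisect/bad points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $culprit"

git bisect reset >/dev/null 2>&1
```

One detail worth knowing when writing a real test script: git bisect run treats exit code 125 as "this revision cannot be tested, skip it", which is handy when a commit in the middle of the range doesn't even build.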

All in all git-bisect(1) is extraordinarily useful for pinning down bugs and diagnosing issues as they're introduced into the code base.

For more specific usage of `git bisect` refer to its man page: git-bisect(1).

Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

Git Protip: A picture is worth a thousand words (git tag)

2009-01-15T00:00:00+00:00

I've been sending weekly "Protip" emails about Git to the rest of engineering here at Slide for a while now, using the "Protips" as a means of introducing more interesting and complex features Git offers. Below is the fourth Protip written to date.

While the concept of "tagging" or "labeling" code is not a new or original idea introduced with Git, our use of tags in a regular workflow does not predate the migration to Git. At its most basic level, a "tag" in any version control system takes a "picture" of how the tree looks at a certain point in time so that it can be re-created later. This can be extremely helpful for both local and team development. Take the following scenario for local development using tags:

Tim is extremely busy; most of his days working at an exciting, fast-paced start-up seem to fly by. With one particular project Tim is working on, a lot of code is changing at a very fast pace and the branch he's currently working in is stable one minute and destabilized the next. Tim has two basic options for leaving himself "bread-crumbs" to step back in time to a stable or an unstable state. The first, more complicated option is to mark his commit messages with something like "STABLE" so he can git diff or git reset --hard from the current HEAD to the last stable point of the branch.

The second option is to make use of tags. Whenever Tim reaches a stable point in his tumultuous development, he can simply run:
git tag wip-protips_`date "+%s"`
(or something similar; `date` is added to ensure the tag is unique). If Tim finds himself too far down the wrong path, he can roll back his branch to the latest tag (git reset --hard protiptag), create a new stable branch based on that tag (git checkout -b wip-protip-2 protiptag), or diff his current HEAD against the tag to see everything he's changed since his branch was stable (git diff protiptag...HEAD).
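
As a rough end-to-end illustration of Tim's bread-crumb workflow, the sketch below tags a stable point in a throwaway repository, wanders off, inspects the difference, and then rolls back. The file notes.txt, the commit messages, and the identity settings are made up for the example.

```shell
#!/bin/sh
# Sketch: tag a stable point, break things, inspect the damage, roll back.
# The repository, file name, and commit messages are invented.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Tim"
git config user.email "tim@example.invalid"

echo "stable work" > notes.txt
git add notes.txt
git commit -q -m "stable point"

# Mark the stable point; the epoch timestamp keeps the tag name unique.
tag="wip-protips_$(date +%s)"
git tag "$tag"

# Wander down the wrong path.
echo "experiment gone wrong" >> notes.txt
git commit -q -a -m "unstable experiment"

# See which files changed since the branch was last stable...
changed=$(git diff --name-only "$tag" HEAD)
echo "changed since $tag: $changed"

# ...then roll the branch back to the tag.
git reset -q --hard "$tag"
subject=$(git log -1 --format=%s)
echo "HEAD is now: $subject"
```

Because the tag name is kept in a variable here, there is no need to remember the timestamp; in day-to-day use, git tag -l 'wip-protips_*' lists the bread-crumbs left so far.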

This local development scenario can become a team development scenario involving tags if, for example, Tim needed QA to start testing portions of his branch (his changes are just that important). Since the current HEAD of Tim's branch is incredibly unstable, he can push his tag to the central repository with the command git push origin tag protiptag, so QA can push a stage using the tag, which points at the last stable point in the branch's history.

Tags are similar to most other "refs" in Git insofar as they are distributable: if I execute git fetch your-repo --tags, I can pull the tags you've set in "your-repo" and use them locally to aid development. The distributed nature is the primary way tags in Git differ from tags in Subversion; the rest of the concept is nearly the same.

Currently at Slide, tag usage is dominated by the post-receive hook in the central repository, where every push into the central repository ("origin") on the release branch is tagged. This allows us to quickly "revert" bad live pushes temporarily, by simply pushing the last "good" tagged release, to ensure minimal site destabilization (while we correct live issues outside of the release branch).
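
For the curious, a hook along those lines might look something like the sketch below. This is not the actual hook we run; the branch name refs/heads/release, the release-<timestamp> tag format, and the release_tags helper are all assumptions made for illustration. The only part taken from Git itself is the input format: a post-receive hook receives one line per updated ref on stdin, in the form "old-sha new-sha refname".

```shell
#!/bin/sh
# Hypothetical post-receive hook sketch: tag every push to the release branch.
# Git feeds the hook one line per updated ref: "<old-sha> <new-sha> <refname>".

# Print "<tag-name> <new-sha>" for each update to the release branch.
# The branch name and tag format are assumptions, not a real naming scheme.
release_tags() {
    stamp=$1
    while read -r oldrev newrev refname; do
        # An all-zero new sha means the ref was deleted; nothing to tag.
        case "$newrev" in
            *[!0]*) ;;
            *) continue ;;
        esac
        if [ "$refname" = "refs/heads/release" ]; then
            echo "release-$stamp $newrev"
        fi
    done
}

# Only create tags when installed as hooks/post-receive, so the function
# above can be sourced and exercised on its own.
if [ "${0##*/}" = "post-receive" ]; then
    release_tags "$(date +%Y%m%d-%H%M%S)" | while read -r name sha; do
        git tag "$name" "$sha"
    done
fi
```

Feeding the helper a couple of fake lines shows which pushes would be tagged:

```shell
printf '%s\n' "aaa bbb refs/heads/release" "aaa ccc refs/heads/master" | release_tags 20090115
```

Only the release branch line produces output; the push to master is ignored.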

For more specific usage of `git tag` refer to the git-tag(1) man page


Find me on github (rtyler)

2009-01-05T00:00:00+00:00

Rod reminded me with his comment in one of my other posts that I've not yet mentioned github.

I've got a bunch of my nonsense thrown up on github.com/rtyler, it's awesome (no really, github rocks my socks, those guys are good people).