Howdy!

Welcome to my blog where I write about software development, cycling, and other random nonsense. This is not the only place I write; you can find more of my words on the Buoyant Data blog, the Scribd tech blog, and GitHub.

Do not fear continuous deployment

One of the nice things about living in Silicon Valley is that you have relatively easy access to a number of the developers you may work with through open source projects, mailing lists, IRC, etc. Today Kohsuke Kawaguchi of Sun Microsystems, the founder of the Hudson project, stopped by the Slide offices to discuss Hudson and the "cloud", continuous deployment, and our workflow with Hudson here at Slide. Continuous deployment was the most interesting topic for me, and the most relevant given how important Hudson has become in our current infrastructure.


Since reading Timothy Fitz's post on the setup for "continuous deployment" at IMVU, I've become obsessed to a certain degree with pushing Slide in that direction as an engineering organization. Currently we push a number of times a day as necessary, so it's almost as if we have manual continuous deployment as it is; there's just a lot of room for optimization and automation to cut down on the tedium and allow for more beer drinking.


@agentdero continuous deployment = when build is green, autoship? sounds terrifying...

     (@tlipcon)



As a concept, continuous deployment can be quite scary: "wait, some robot is going to deploy code to my production site, wha!" It's important to remember that the concept of continuous deployment doesn't necessarily mean that no QA is involved in the release process; it is, however, ideal to have enough good test cases that you can do a fully automated unit/integration/system test run. The biggest difficulty with the entire concept of "continuous deployment" is not writing tests or actually implementing a system to deploy; it's that it forces you to understand your releases and production environment. It's about eliminating the guesswork from your process and reducing the amount of human error (or potential for human error) involved in deployments.

In my opinion, continuous deployment isn't about making a hard switch, firing your QA and writing boat-loads of tests to ensure that you can push the production site straight from "trunk" as much as humanly possible. Continuous deployment is far more about solidifying your understanding of your entire stack, evolving your code base to where it is both more testable and better covered by your tests, then putting your money where your mouth is and relying on those tests. If your codebase moves rapidly, unit/integration/system tests are only going to be up to date and valuable if you actually rely on them. If breaking a single unit test pre-deployment becomes a Big Deal™, then the developer responsible for the code being deployed will make sure that: (a) the test is valid and up to date and (b) the code that the test is covering does not contain any actual regressions.


Take the typical repository layout for most companies which is, as far as I've seen, made up of a volatile trunk, a stable release branch, and then a number of project branches. In an engineering department, QA would be responsible for ensuring that projects are properly vetted before merging from project branches (also called "topic branches" in the Git community) into the more volatile trunk branch. Once the CI server (i.e. Hudson) picks up on changes in trunk, the testing process would begin at that particular revision. Provided the test suites passed with flying colors, Hudson would start to kick off the process to do a slow/sampled deploy as Timothy describes in his post. If the tests failed, however, alarms would start beeping, sirens would wail, and there would be much gnashing of teeth: somebody has now broken trunk and is blocking any other deployments coming down the pipe. In this "disaster scenario" the QA involved in the process would be thoroughly shamed (obviously) but then given the choice to either block future pushes while the developer(s) create a fix, or revert their changes out of trunk and take them back to a project branch to correct the deficiencies. This attention to detail has a larger benefit: developers won't become numb to test failures to the point where they're no longer important.
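To make that less abstract, below is a minimal sketch of what the "green build, then deploy" gate might look like as a shell script the CI server runs against trunk; run_all_tests.sh and deploy_sampled.sh are hypothetical stand-ins for whatever test runner and deployment tooling you actually use.

#!/bin/sh
# Hypothetical post-test deployment gate, run by the CI server against trunk.
# run_all_tests.sh and deploy_sampled.sh are stand-ins for your own tooling.

REV=`git rev-parse HEAD`

if ./run_all_tests.sh; then
    echo "tests passed at ${REV}, kicking off a sampled deploy"
    ./deploy_sampled.sh "${REV}"
else
    echo "tests FAILED at ${REV}, blocking the deployment pipeline" >&2
    exit 1
fi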


What good is writing tests if there aren't real consequences for them failing? Releases shouldn't be a scary time of the day/week/month; you should certainly be nervous (it keeps you sharp), but if you fear releases then there is probably an error in your release process that allows for too much uncertainty: inadequate test coverage, insufficient blackbox testing, poor release practices, etc. Continuous deployment might not be the magic solution to your software-shipping woes, but the practice of moving towards continuous deployment will greatly improve your release process whether or not you ever actually make the switch to a fully automated deployment process as the engineers at IMVU have.


How confident are you in your test coverage?
Read more →

V8 and FastCGI, Exploring an Idea

Over the past couple years I've talked a lot of trash about JavaScript (really, a lot) but I've slowly started to come around to a more neutral stance: it turns out I actually hate browsers; I like JavaScript just fine by itself! While the prototype-based object system is a little weird at first coming from a more classical object-oriented background, the concept grows on you the more you use it.

Since I hate browsers so much (I really do), I was pleased as punch to hear that Google's V8 JavaScript Engine was embeddable. While WebKit's JavaScriptCore is quite a nice JavaScript engine, it doesn't lend itself to being embedded the same way that V8 does. The only immediate downside to V8 is that it's written entirely in C++, which does provide some hurdles to embedding (for example, I'm likely never going to be able to embed it into a Mono application), but for the majority of cases embedding the engine into a project shouldn't be all that difficult.

A few weekends ago I started exploring the possibility of running server-side JavaScript courtesy of V8; after reading about mod_v8 I felt confident enough to try my own project: FastJS.

In a nutshell, FastJS is a FastCGI server that processes server-side JavaScript, which means FastJS can hook up to Lighttpd, Nginx, or even Apache via mod_fcgi. Currently FastJS is in a state of "extremely unstable and downright difficult"; there's not a lot there yet as I'm still exploring what should be provided by the FastJS server-side software and what should be provided by JavaScript libraries. As it stands now, FastJS preloads the environment with jQuery 1.3.2 and a "fastjs" object which contains some important callbacks like:
fastjs.write()  // write to the output stream
fastjs.log()    // write to the FastCGI error.log
fastjs.source() // include and execute other JavaScript files


On the server side, a typical request looks something like this (for now):
2009-03-09 05:04:06: (response.c.114) Response-Header:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-type: text/html
X-FastJS-Request: 1
X-FastJS-Process: 11515
X-FastJS-Engine: V8
Date: Mon, 09 Mar 2009 09:04:06 GMT
Server: lighttpd/1.4.18


Below is an example of the current test page "index.fjs":


var index = new Object();

index.header = function() {
    fastjs.write("<html><head><title>FastJS</title></head>");
    fastjs.write("<body>");
    fastjs.write("<h1>FastJS Test Page</h1>");
};

index.footer = function() {
    fastjs.write("</body></html>");
};

index.dump_attributes = function(title, obj) {
    fastjs.write("<h2>");
    fastjs.write(title);
    fastjs.write("</h2>");

    for (var k in obj) {
        fastjs.write(k + " = ");

        if (typeof(obj[k]) != "string")
            fastjs.write(typeof(obj[k]));
        else
            fastjs.write(obj[k]);

        fastjs.write("<br/>\n");
    }
};

(function() {
    index.header();

    fastjs.source("pages/test.fjs");

    index.dump_attributes("window", window);
    index.dump_attributes("location", location);
    index.dump_attributes("fastjs.env", fastjs.env);
    index.dump_attributes("fastjs.fcgi_env", fastjs.fcgi_env);

    index.footer();

    fastjs.log("This should go into the error.log");
})();

The code above generates a page that is pretty basic, but informative nonetheless:


Pretty fun in general to play with; I think I'm near the point where I can stop writing more of my terrible C/C++ code and get back into the wonderful land of JavaScript. As it stands now, here's what still needs to be done:
  • Proper handling of erroring scripts via an informative 500 page that reports on the error
  • Templating? Lots of fastjs.write() calls are likely to drive you mad
  • Performance concerns? As of now, the whole stack (jQuery + .fjs) is evaluated on every page request.
  • Tests! I should really get around to writing some level of integration tests to make sure that FastJS is returning expected results for particular chunks of .fjs scripts


The project is hosted on GitHub right now and is under a 2-clause BSD license.
Read more →

Git Protip: Split it in half, understanding the anatomy of a bug (git bisect)

I've been sending "Protip" emails about Git to the rest of engineering here at Slide for a while now, using the "Protips" as a means of introducing more interesting and complex features Git offers.


There are those among us who can look at a reproduction case for a bug and just know what the bug is. For the rest of us mere mortals, finding out what change or set of changes actually introduced a bug is extremely useful for figuring out why a particular bug exists. This is even more true for the more elusive bugs, or the cases where the code "looks" correct and you're stumped as to why the bug exists now when it didn't yesterday/last week/last month. The options available to you in most classical version control systems are to sift through diffs or wade through log message after log message, trying to spot the particular change that introduced the regression you're now tasked with resolving.

Fortunately (of course) Git offers a handy feature to assist you in tracking down regressions as they're introduced, git bisect. Take the following scenario:
Roger has been working on some lower level changes in a project branch lately. When he left work last night, he ran his unit tests (everything passed), committed his code and went home for the day. When he came in the next morning, per his typical routine, he synchronized his project branch with the master branch to ensure his code wasn't stomping on released changes. For some reason however, after synchronizing his branch, his unit tests started to fail indicating that a bug was introduced in one of the changes that was integrated into Roger's project branch.

Before switching to Git, Roger might have spent an hour looking over changes trying to pinpoint what went wrong, but now Roger can use git bisect to figure out exactly where the issue is. Taking the commit hash from his last good commit, Roger can walk through changes and pinpoint the issue as follows:

## Format for use is: git bisect start [<bad> [<good>...]] [--] [<paths>...]
xdev4% git bisect start HEAD 324d2f2235c93769dd97680d80173388dc5c8253
Bisecting: 10 revisions left to test after this

[064443d3164112554600f6da39a36ffb639787d7] Changed the name of an a/b test.
xdev4%

This will start the bisect process, which is interactive, and drop you halfway between the two revisions specified above. Following the scenario above, Roger would then run his unit tests. Upon their success, he'd execute "git bisect good", which moves the tree halfway between that "good" revision and the "bad" revision. Roger continues doing this until he lands on the commit that is responsible for the regression. Knowing this, Roger can either revert that change, or make a subsequent revision that corrects the regression introduced.

A sample of what this sort of transcript might look like is below:

xdev4% git bisect good
Bisecting: -1 revisions left to test after this
[bcf020a6c4ac7cc5df064c66b182b2500470000a] Merge branch 'cjssp' into master
xdev4% git bisect bad
bcf020a6c4ac7cc5df064c66b182b2500470000a is first bad commit
xdev4% git show bcf020a6c4ac7cc5df064c66b182b2500470000a
commit bcf020a6c4ac7cc5df064c66b182b2500470000a
Merge: 62153e2... 064443d...
Author: Chris <chris@foo>

Date: Tue Jan 27 12:57:45 2009 -0800

Merge branch 'cjssp' into master

xdev4% git bisect log
# bad: [7a5d4f3c90b022cb66fd8ea1635c5de6768882d7] Merge branch 'foo' into master
# good: [d1014fd52bebd3c56db37362548e588165b7f299] Merge branch 'bar'
git bisect start 'HEAD' 'd1014fd52bebd3c56db37362548e588165b7f299' '--' 'apps'

# good: [064443d3164112554600f6da39a36ffb639787d7] Changed the name of an a/b test. PLEASE PICK ME UP WITH NEXT PUSH. thx
git bisect good 064443d3164112554600f6da39a36ffb639787d7
# bad: [bcf020a6c4ac7cc5df064c66b182b2500470000a] Merge branch 'cjssp' into master
git bisect bad bcf020a6c4ac7cc5df064c66b182b2500470000a
xdev4% git bisect reset
xdev4%

Instead of spending an hour looking at changes, Roger was able to quickly walk a few revisions and run the unit tests he has to figure out which commit was the one causing trouble, and then get back to work squashing those bugs.

Roger is, like most developers, inherently lazy, and walking through a series of revisions running unit tests sounds like "work" that doesn't need to be done. Fortunately for Roger, git-bisect(1) supports the subcommand "run", which goes hand in hand with unit tests or other tests. In the example above, let's pretend that Roger had a test case exhibiting the bug he was noticing. What he could do is let git bisect automatically run a test script against each candidate revision to find the offending commit, i.e.:

xdev4% git bisect start HEAD 324d2f2235c93769dd97680d80173388dc5c8253
Bisecting: 10 revisions left to test after this

[064443d3164112554600f6da39a36ffb639787d7] Changed the name of an a/b test.
xdev4% git bisect run ./mytest.sh

After executing the run command, git-bisect(1) will binary search the revisions between GOOD and BAD, testing whether "mytest.sh" returns a zero (success) or non-zero (failure) return code, until it finds the commit that causes the test to fail. The end result should be the exact commit that introduced the regression into the tree; after finding it, Roger can either grab his rubber chicken and go slap his fellow developer around, or fix the issue and get back to playing NetHack.
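For completeness, here's a minimal sketch of what a mytest.sh along those lines could look like; the python invocation is a hypothetical stand-in for whatever actually runs your test suite.

#!/bin/sh
# Hypothetical test script for use with "git bisect run".
# Exit code 0 means this revision is good, 125 means "can't test, skip it",
# and any other code from 1 to 127 marks the revision as bad.

python run_tests.py --quiet
exit $?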

All in all git-bisect(1) is extraordinarily useful for pinning down bugs and diagnosing issues as they're introduced into the code base.


For more specific usage of `git bisect` refer to its man page: git-bisect(1)



Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide
Read more →

Head in the clouds

I've spent the entire day thinking about "cloud computing", which is quite a twist for me. Having seen "impressive" conferences centered around "cloud computing", I've ridiculed the concept mercilessly; it has a phenomenally high buzzword/usefulness ratio, which makes it difficult to take seriously. It has the same air of idiocy attached to it that the re-invention of thin clients did a few years back. That said, I think the concept is sound, and useful for a number of companies and uses (once distilled of the buzz).

Take Slide for example, we have a solid amount of hardware, hundreds of powerful machines constantly churning away on a number of tasks: serving web pages, providing back-end services, processing database requests, recording metrics, etc. If I start the work-week needing a new pool of machines either set up or allocated for a particular task, I can usually have hardware provisioned and live by the end of the week (depending on my Scotch offering to the Operations team, I can get it as early as the next day). If I can have the real thing I clearly have no need for cloud computing or virtualization.

That's what I thought, at least, until I started to think more about what would be required to get Slide closer to the lofty goal of continuous deployment. Since I was involved in pushing for and setting up our Hudson CI server, I constantly check on the performance of the system and help make sure jobs are chugging along as they should be; I've become the de facto Hudson janitor.


Our current continuous integration setup involves one four-core machine running three separate instances of our server software as different users, processing jobs throughout the day. One "job" typically consists of a full restart of the server software (Python) and a run of literally every test case in the suite (we walk the entire tree aggregating tests). On average the completion of one job takes close to 15 minutes and executes around 400+ test cases (and growing). Fortunately, and unfortunately, our Hudson machine is no longer able to keep up with this load during the development peak in the middle of the day; this is where the "cloud" comes in.

We have a few options at this point:
  • Set up one or more additional machines
  • Rethink how we provision hardware for continuous integration


The fundamental problem with provisioning resources for continuous integration, at least at Slide, is that the requirements are bursty at best. We typically queue a job for a particular branch when a developer executes a git push (via the Hudson API and a post-receive hook). From around 9 p.m. until 9 a.m. we need maybe two actual "executors" inside Hudson to handle the workload the night-owl developers tend to place on Hudson; from 12 p.m. until 7 p.m., however, our needs fluctuate rapidly between four and ten executors. To exacerbate things further, due to "natural traffic patterns" in how we work, mid-afternoon on Wednesday and Thursday requires even more resources as teams are preparing releases and finishing up milestones.
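For the curious, the post-receive side of that is tiny. A rough sketch, assuming one Hudson job per branch (named "slide-<branch>" here purely for illustration) with remote build triggering enabled, might look like:

#!/bin/sh
# Hypothetical post-receive hook: queue a Hudson build for each pushed branch.
# The Hudson host and "slide-<branch>" job naming are made up for this example.
HUDSON=http://hudson.example.com

while read oldrev newrev refname; do
    branch=`echo "${refname}" | sed -e 's|^refs/heads/||'`
    # hitting a job's build URL queues a new build of that job
    wget -q -O /dev/null "${HUDSON}/job/slide-${branch}/build?delay=0sec" || true
done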

The two possible solutions to the problem are to either build a continuous integration farm with the full knowledge that its capacity will remain unused for large amounts of time, or look into "cloud computing" with service providers like Amazon EC2, which will allow Hudson slaves to be provisioned on demand. The maintainer of Hudson, Kohsuke Kawaguchi, has already started work on "cloud support" for Hudson via the EC2 plugin, which makes this a real possibility. (Note: using EC2 for this at Slide was Dave's idea, not mine :))

Using Amazon EC2 isn't the only way to solve this "bursty" problem however; we could just as easily solve it in house by provisioning Xen guests across a few machines. The downside of doing it yourself is the amount of time between when you know you need more capacity and when you can actually add that capacity to your own little "cloud". Considering Amazon has an API for not only running instances but terminating them as well, it certainly provides a compelling reason to "outsource" the problem to Amazon's cloud.

I recommend following Kohsuke's development of the EC2 plugin for Hudson closely, as continuous integration and "the cloud" seem like a match made in heaven (alright, that pun was unnecessary, it sort of slipped out). At the end of the day the decision comes down to a very fundamental business decision: which is more cost effective, building my own farm of machines, or using somebody else's?

(footnote: I'll post a summary of how and what we eventually do to solve this problem)
Read more →

Old Navy Sucks.

I'm going to go ahead and admit something, something that's difficult for most men to admit in my situation. I shop at Old Navy. I'm sorry, I like their collared shirts. Sue me.

This past weekend I decided to use an oldnavy.com gift card that I was given to buy some new jeans (as my favorite pair now has a hole in the knee). A "cute" side effect of redeeming an oldnavy.com gift card was that I needed to create an oldnavy.com account. "Cute".

After I created my account, with a site-specific password (I generate throw-away passwords for sites that abuse the privilege of my business), I received the following email:


Like I said, "cute". Damn idiots.
Read more →

Amazon Sucks Too

On the topic of online shopping "sucking", I have been sitting on this beautiful screenshot for a while.

A couple of months ago I bought a watch on Amazon. Not a spectacular watch, a very basic Seiko analog watch that I had previously owned but had lost. I went on to Amazon to buy "my watch", and after finding it, I happily ordered the watch.

Shortly after the watch arrived, I noticed a huge influx of quite topical SPAM.



I'm pleased to say that I've not purchased anything from Amazon since I discovered that Amazon, or somebody Amazon deals with, sold my information to everybody.

This still makes my blood boil. Rat bastards.
Read more →

Git Protip: A picture is worth a thousand words (git tag)

I've been sending weekly "Protip" emails about Git to the rest of engineering here at Slide for a while now, using the "Protips" as a means of introducing more interesting and complex features Git offers. Below is the fourth Protip written to date.




While the concept of "tagging" or "labeling" code is not a new or original idea introduced with Git, our use of tags in a regular workflow does not, however, predate the migration to Git. At its most basic level, a "tag" in any version control system takes a "picture" of how the tree looks at a certain point in time so that it can be re-created later. This can be extremely helpful for both local and team development; take the following scenario for local development using tags:

Tim is extremely busy; most of his days working at an exciting, fast-paced start-up seem to fly by. With one particular project Tim is working on, a lot of code is changing at a very fast pace and the branch he's currently working in is stable one minute and destabilized the next. Tim has two basic options for leaving himself "bread-crumbs" so he can step back in time from an unstable state to a stable one. The first, more complicated option is to mark his commit messages with something like "STABLE", etc., so he can git diff or git reset --hard from the current HEAD to the last stable point of the branch.


The second option is to make use of tags. Whenever Tim reaches a stable point in his tumultuous development, he can simply run:
git tag wip-protips_`date "+%s"`
(or something similar; `date` is added to ensure the tag is unique). If Tim finds himself too far down the wrong path, he can roll back his branch to the latest tag (git reset --hard protiptag), create a new stable branch based on that tag (git checkout -b wip-protip-2 protiptag), or diff his current HEAD against the tag to see what he's changed since his branch was stable (git diff protiptag...HEAD).
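Put together, the breadcrumb workflow looks roughly like this (the timestamp in the tag name is made up; it's the same tag the prose above calls protiptag):

# tag the current stable point; the timestamp keeps the tag name unique
git tag wip-protips_`date "+%s"`

# ...hack hack hack, things go sideways...

# roll the branch back to the last stable tag
git reset --hard wip-protips_1236588246
# or start a fresh branch from that stable point
git checkout -b wip-protip-2 wip-protips_1236588246
# or just see everything that changed since the branch was stable
git diff wip-protips_1236588246...HEAD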



This local development scenario can become a team development scenario involving tags if, for example, Tim needed QA to start testing portions of his branch (his changes are just that important). Since the current HEAD of Tim's branch is incredibly unstable, he can push his tag to the central repository so QA can push a stage to the last stable point in the branch's history using the tag, with the command: git push origin tag protiptag

Tags are similar to most other "refs" in Git insofar as they are distributable; if I execute git fetch your-repo --tags, I can pull the tags you've set in "your-repo" and apply them locally to aid development. The distributed nature is primarily how tags in Git differ from Subversion; the rest of the concept is nearly identical.

Currently at Slide, tag usage is dominated by the post-receive hook in the central repository, where every push into the central repository ("origin") on the release branch is tagged. This allows us to quickly "revert" bad live pushes temporarily, by simply pushing the last "good" tagged release, to ensure minimal site destabilization (while we correct live issues outside of the release branch).
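A stripped-down sketch of that kind of hook is below; the branch name and tag format here are illustrative, not the exact ones we use.

#!/bin/sh
# Hypothetical post-receive hook: tag every push that lands on the release branch.
# The branch name and tag format are illustrative only.
while read oldrev newrev refname; do
    if [ "${refname}" = "refs/heads/release" ]; then
        git tag "release-`date '+%Y%m%d-%H%M%S'`" "${newrev}"
    fi
done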

For more specific usage of `git tag` refer to the git-tag(1) man page



Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

Read more →

Proposal: Imuse, an IMAP-capable FUSE filesystem

I've spent the better part of my weekend messing around with mail clients, and once again Evolution comes out on top and once again, I'm not happy about it. I tried: Claws, Thunderbird, Alpine (formerly Pine), Mutt, Balsa, KMail and TkRat. None of them worked as well as I wanted. Is it too much to ask for a mail client that doesn't puke and die on a large (>2GB) pile of IMAP mail? Supports proper jwz mail threading? And caches IMAP mail locally so I can actually access it while disconnected? Turns out it actually is too much to ask.

That's not what this is about though. While hunting around, I started to look at my Slide IMAP mail account and saw something interesting: it looks suspiciously like a filesystem. The general layout I have right now is something like this:
  • /
    • INBOX
    • Sent
    • Drafts
    • Development/
      • Commits
      • Pushes
    • External/
      • Git
      • Hudson
    • Metrics
    • QA/
      • Exceptions
      • Trac


Clearly, it's a very filesystem-esque looking tree of mail (and a couple gigabytes of it). When you start to really dig into e-mail technology, you really get a feeling for how royally screwed up the whole ecosystem is. Between Exchange, IMAP and POP3 (and their SSL counterparts), mbox and Maildir, and of course the venerable SMTP; e-mail technology is a clusterfuck. No wonder barely anybody can implement an e-mail client that doesn't suck.

At a basic level, mail is organized into messages and folders. Messages map very easily to actual files on the filesystem, and folders naturally map onto actual directories on the filesystem. Imagine if you could choose any program you wanted to read and write your email; the only pre-requisite: can it read from the filesystem? You could have any program register to receive filesystem events to notify you when mail "appears" in specific directories, and you could move mail around with a simple drag-and-drop in Nautilus/Thunar/Finder. What about writing mail though? Easy enough: you create a new file in the "Drafts" folder, writes would naturally be propagated to the "Drafts" folder on the IMAP server, and when you were done with the message, you could copy or move it into the "Sent" folder, which would have a hook to recognize the new file and send it. The IMAP tree from above starts to look something like this:
  • ~/Imuse
    • Settings
    • Accounts/
      • Slide/
        • INBOX
        • Sent
        • Drafts
        • Development/
          • Commits
          • Pushes
        • External/
          • Git
          • Hudson
        • Metrics
        • QA/
          • Exceptions
          • Trac


"Accounts" and "Settings" would likely need to be "special" insofar that Imuse would just create them out of thin air, Accounts would need to be a virtual directory to actually contain the appropriate account listings, and in Settings I'd likely want to have a couple of flat configuration "files" that you could edit in order to actually configure Imuse appropriately.

If there are simply lists of files in each of the Accounts' folders, each representing a particular email, then the problem of dealing with all my e-mail becomes a much easier one to handle; it's just a matter of picking my filesystem browser of choice. Even then it's not really limited to filesystem browsers like Nautilus; the scope of programs that I can use to access my mail is opened up to $EDITOR as well. Most editors like Notepad++, Vim, Emacs, Gedit, and TextMate support the ability to view a directory and open its contents up for reading/editing. I'm a big fan of using Vim, so Imuse coupled with vtreeexplorer would be phenomenal to say the least.
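To illustrate the appeal, here's the kind of shell session I imagine being possible once Imuse actually works; every path and filename below is hypothetical, since none of this exists yet:

# browse the inbox like any other directory (filenames are made up)
ls ~/Imuse/Accounts/Slide/INBOX

# search commit mail with plain old grep
grep -l "git bisect" ~/Imuse/Accounts/Slide/Development/Commits/*

# compose in $EDITOR, then "send" by moving the draft into Sent,
# where Imuse's hook would pick it up and hand it off for delivery
vim ~/Imuse/Accounts/Slide/Drafts/reply-to-ops
mv ~/Imuse/Accounts/Slide/Drafts/reply-to-ops ~/Imuse/Accounts/Slide/Sent/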

I've started toying around with building FUSE filesystems and I've pushed my experimenting up to GitHub in my imuse repository. It's currently in C, since I cannot get either of the two FUSE Python bindings to work properly. This presents a certain level of difficulty, since the standard means of accessing IMAP data from C seems to be c-client, which is reasonably well documented but lacks sample code. On the other hand, if I can get the Python bindings to cooperate, then I have access to the wonderful Twisted Mail library (or even the basic imaplib).

Given my obvious time restrictions, I wanted to open the idea up to more eyes and ears to see what others thought and maybe even find somebody else willing to pitch in. For the time being however, Evolution is still sifting through my mail, and I'm still not enjoying it :(

Read more →

But Who Will Write The Tests?

In addition to frothing at the mouth about Git, I've been really getting into the concept of automated unit tests lately (thus my interest in Hudson). Just like code comments, however: tests are good, no tests are bad, and the wrong tests are worse. That means once you give in to the almighty power of unit testing, you are saddled with the curse of knowing that you will have to update them, forever.

Taking up Test-driven Development is like having a child: if you are at a point in your life where you're ready to accept that kind of responsibility, it can be wonderful; a lot of work, but ultimately you will feel satisfied with your new role as a Responsible Developer™. If you're not prepared to take on the burden that TDD will present you with, you will likely regret it or neglect your tests (Deadbeat Developer, I like this metaphor).

In the Top Friends Team at Slide, we practice the more "loose" definition of TDD; tests are not written before functionality is written, but rather functionality is written, and then as part of the QA and release process, the appropriate and accompanying tests are written. Our basic workflow is usually as follows:
  • Tickets are written and assigned to milestones and developers in Trac
  • Branch is created in central Git repository
  • General plan-of-action is discussed between developers
  • Hack-hack-hack
  • Code complete is reached, QA starts to test milestone
  • Developers write tests if needed for functionality
  • Once QA signs off, and tests look solid, code is shipped live


There are two primary flaws with this workflow. The first, and most obvious one, is that it is far too easy to "forget to write the tests." That is, the next project scheduled to start development tends to "flow forward" into the allotted test-writing time. As important as test coverage is, at the end of the day Slide did not raise funding on having solid test coverage, and our priorities lie in shipping software, first and foremost. The flow-forward of scheduled projects into any available space is something that can be worked on but never solved; it really comes down to discipline between those in charge of setting up any given project's particular roadmap.

The second, more subtle flaw in this workflow, and I think all Test-driven Development workflows, revolves around the writer of the tests. The fundamental nature of almost all bugs in software is human error; our natural tendency to make mistakes means that nothing we do will ever be perfect, including our tests. Say Developer A is writing a couple of new methods to handle data validation before that data goes into the database. Chances are that Developer A's life is going to be made far easier by writing some test cases to run some predefined user input through his validation code. Therein lies the problem: if the developer doesn't think of a particular edge case when he's writing the code to handle the data validation, the chances he'll remember and account for that particular edge case while he's working on the unit tests are nil.

How do you really ensure that tests are of high enough quality to actually catch errors and regressions?

I think a certain extent of intra-team test writing and code review, depending on the level of communication between developers, can really help. In this case less developer communication is better. If Developer A tells Developer B how his code works, Developer B is now going to have an unnecessary expectation when he starts to write tests for Developer A's code. If Developer B reviews the code for what it actually is, instead of what Developer A thinks it is, the tests that will ultimately be written will be more thorough than if Developer A had written the whole suite himself.

This still isn't sufficiently fool-proof for me to feel all that confident in test coverage; the tests being written are subject to the availability, thoroughness and understanding that Developer B brings to the table. Inside a small team like this one, one of those is almost always in short supply (usually availability).

One approach I'm anxious to try is the more active involvement of QA engineers in the test writing process, both in the pre-fail and post-fail scenarios. The pre-fail scenario being one like that which I detailed above, where new code is being written. In this case a QA engineer's experience can help guide the developer on what sets of user input have typically caused issues in the past. The second case, post-fail, is actually already occurring at Slide; a live issue, data validity bug, or regression is caught by QA engineers who detail the reproduction case in Trac, and as a result a regression test can be written for that specific issue.

This is still subject to the three things I cited above: the availability, thoroughness and understanding of those involved. I still have a lot of unanswered questions about the ideal QA and Dev workflow, however: how does this scale to a team of tens or hundreds? Who writes the tests for large teams? What about a team of 1 Dev and 1 QA, and what about the lone hacker? How do you write quality code without getting bogged down in the mush of writing thousands of tests for everything you can imagine could go wrong?

Who writes the tests?
Read more →