Howdy!

Welcome to my blog where I write about software development, cycling, and other random nonsense. This is not the only place I write, you can find more words I typed on the Buoyant Data blog, Scribd tech blog, and GitHub.

Why we chose Git, a rebuttal.

24 Nov 2008

slide opinion software development git One thing I learned early on in the internet, when is was more of a cobbling instead of a series, of tubes, was not to feed trolls.

That said, I found that my post "Delightfully Wrong About Git" had found it's way into such silly news aggregation machines as DZone, Reddit and Hacker News. Some of the points raised in the comments were valid and warrant a response, while the majority of them were the standard responses to any discussion about version control "psh, dumb. should have used [Bazaar/Mercurial/Darcs/Subversion/Team Foundation System]"

Why not another (D)VCS?
One of the most resounding criticisms/questions was this one, why not Bazaar? Why not Mercurial! My favorite, albeit childish, retort is "why?" But I can say that I have tried a variety of other version control systems, Git, Bazaar, CVS, Subversion, Perforce and some other proprietary VCSes at previous employers. While both Darcs and Mercurial seem to be very solid DVCSes, they suffer from a problem of momentum, Darcs in particular. They both appear to be victims of Git's success, while there is inherently nothing wrong with either of them, they are competing with Linus' love-child, Git. When chosing to move to a new VCS in a company that is well over 50+ employees, the staying power of the technology you chose is important. I feel confident that Git will not only be supported, but actively developed and improved for years to come.

More importantly than that though, I like Git. Is that not enough right there? Slide makes excessive use of branches, tags and other "complex" VCS concepts that centralized systems like CVS and Subversion have trouble with. Git Branch Madness!

With Subversion creating branches in the volume in which we create branches spiralled out of control with branches becoming "stale" quickly, meaning that if we didn't refresh the branch regularly with updates from trunk it would be nearly impossible to cleanly merge back down into trunk. With my current Git clone of our primary repository, I have 23 branches (roughly 6 personal local branches, 5 old branches, and 12 active branches). Our primary Git repository has been online for about 6 months and currently has 68 branches in it, rougly 55 are active.

Why all the love for Git, but nobody every talks about Bazaar, Mercurial, Darcs, etc? Sure Git is faster, but unless you've got a enormous code base (like the linux kernel), it seems like Bazaar or Mercurial would be a better choice than Git.

One of the better known selling points of Git is that it's fast. My cloned repository of the primary Slide Git repository weighs in at a hefty 7.1GB. The latest revision number in our Subversion repository is in the 103,000 range, tacked onto that our tree is just over 2GB in size, and you've got a lot of history to keep track of. Git handles this without a sweat. despite hitting the disk extremely hard when switching to a very out of date branch. With this fix from Nico, the last of the mmap(2) allocation issues we were experiencing vanished as well.

Stop re-inventing the wheel!
One of the more interesting sentiments I noticed perusing the various comments made regarding my previous post were that we are "re-inventing the wheel" by writing scripts, hooks and other wrappers to use a product like Git. The notion that having scripts and hooks for something you use in daily development is re-inventing the wheel, or gratuituous strikes me as laughable at best. We're developers. We write scripts. Why didn't I ever write a myriad of scripts when I was an avid Subversion user? I did.. There's an enormous different between writing scripts to compensate for a poorly performing product, and writing scripts to further enhance you or your colleagues' workflows, Git's hook support falls into the latter category.

The "religion" aspect of the whole version control debate was never considered in our transition to Git, nor was the buzz. I'm far more interested in what makes other VCSes better or worse than Git, so that Git can be improved instead of a justification to ditch Git for yet-another-dvcs. I like to think of the various tools like version control that we developers use as something more relatable: work pants. A good pair of work pants should be flexible enough to allow you to get your work done, modest enough to stay out of the way and most importantly, a good pair of work pants should keep your junk safe ;)

I'm still happy to answer more specific questions about Git and how/why it works for us as well as it has, but I think most of the questions I've seen thus far have been answered above.

Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

Git Protip: Stash the goods yo (git stash)

24 Nov 2008

slide git For about a month now I've been sending weekly "Protip" emails about Git to the rest of engineering here at Slide. I've been using them to slowly and casually introduce some of the more "interesting" features Git has to offer as we move away from Subversion entirely. Below is the first Protip I sent around, I'll be sure to send the rest in good time.

Given the nature of how Git is structured, in that your "working copy" is also your "repository" you might find yourself switching branches relatively often. While you can actually switch branches with uncommited changes outstanding in your Git directory, it's not advised (as you might forget you're commiting changes in the wrong branch, etc). You have two options if you are halfway through some work, one is to commit a checkpoint revision, but the other is to make use of the git stash command.

A scenario where this becomes especially useful would be: Bill is working in his local branch "wip-funtime" on replacing large swaths of unnecessary useless code and Ted accidentally pushes some of Bill's other changes from another branch live and things break. Bill could commit his code and write a fairly uninformative log message like "checkpoint" which cheapens the value of the revision history of his changes or Bill can use git stash to snapshot his currently working state and context switch. In this case Bill would execute the following commands:

git stash
git checkout master-media
perform hotfixes
git checkout wip-funtime
git stash pop

After performing the git stash pop command, Bill's Git repository will be in the exact same state, all his uncommitted changes in tact, as it was when he originally stashed and context-switched.

For specific usage of `git stash` refer to the git stash man page

Example usage of `git stash`

Stashing changes away




tyler@starfruit:~/source/git/main/bt> git stash


Saved working directory and index state "WIP on master-topfriends: 7b1ce9e... TOS copy fix"


(To restore them type "git stash apply")


HEAD is now at 7b1ce9e TOS copy fix


tyler@starfruit:~/source/git/main/bt>

Looking at the stash




tyler@starfruit:~/source/git/main/bt> git stash list


stash@{0}: WIP on master-topfriends: 7b1ce9e... TOS copy fix


stash@{1}: On master-topfriends: starfruit complete patchset


stash@{2}: On wip-classmethod: starfruit patches


tyler@starfruit:~/source/git/main/bt>

Grabbing the latest from the stash




tyler@starfruit:~/source/git/main/bt> git stash pop


Dropped refs/stash@{0} (94b9722b5a999c32c4361d795ee8f368d8412f9a)


tyler@starfruit:~/source/git/main/bt>

Grabbing a specific stash




tyler@starfruit:~/source/git/main/bt> git stash list


stash@{0}: WIP on master-topfriends: 7b1ce9e... TOS copy fix


stash@{1}: On master-topfriends: starfruit complete patchset


stash@{2}: On wip-classmethod: starfruit patches


tyler@starfruit:~/source/git/main/bt> git stash apply 2


# On branch master-topfriends


# Changed but not updated:


#   (use "git add ..." to update what will be committed)


#


#       modified:   db/dbroot.py


#       modified:   gogreen/coro.py


#       modified:   py/bin/_makepyrelease.py


#       modified:   py/initpkg.py


#       modified:   py/misc/_dist.py


#       modified:   py/misc/testing/test_initpkg.py


#       modified:   py/path/local/local.py


#       modified:   py/test/terminal/terminal.py


tyler@starfruit:~/source/git/main/bt>

Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

Git integration with Hudson and Trac.

23 Nov 2008

slide software development git As I mentioned in my previous post about Git at Slide, I wanted to answer some questions that we had to answer to migrate to Git for our development workflow. One of the major questions that had to be answered, especially for our QA department to sign off on the idea was:

How will Git integrate with Hudson, Trac and our other pieces of development infrastructure?

For us to use any version control system, centralized or decentralized, there had to be a "central" point for changes to integrate into in order for us to properly test releases and then ship them to the live site. With this requirement, we oriented our use of Git around a centralized repository which developers pull from, and push to on a regular basis.

In order for Git to integrate into Trac and Hudson, we opted for baking the functionality we needed into the post-receive hook on the centralized repository instead of relying on GitTrac, or the Hudson Git plugin to do what we needed it do to.

You can find the script below, or in this GitHub repository. The script requires the Trac XML-RPC plugin to be installed in order to properly annotate tickets when changes are pushed into the central repository. The notation syntaxes that the post-receive.py script supports in commit messages are:

re #12345
qa #12345
attn bbum,fspeirs

As one might expect, the first notation: "re #12345" will simply annotate a ticket with the commit message and the branch in which the commit was pushed into. The "qa #12345" notation part of an internal notation of marking tickets in Trac as "Ready for QA", which let's our QA engineers know when tickets are ready to be verified; a "qa" note in a commit message will reference the commit and change the status of the ticket in question. The final notation that the script supports: "attn bbum,fspeirs" is purely for calling attention to a code change, or to ask for a code review. When a commit is pushed to the central repository with "attn" in the commit message, an email with the commit message and diff will be emailed to the specified recipients.

In addition to updating Trac tickets, pushes into any branch that have a Hudson job affiliated will use the Hudson External API to queue a build for that branch. In effect, it you "git push origin master", the post-receive.py script will ping Hudson and ask it to queue a build of the "master" job.

I have included the script inline below for those weary of clicking links like this one to the GitHub repository containing the script. Enjoy :)

'''


Copyright (c) 2008 Slide, Inc





Permission is hereby granted, free of charge, to any person obtaining a copy


of this software and associated documentation files (the "Software"), to deal


in the Software without restriction, including without limitation the rights


to use, copy, modify, merge, publish, distribute, sublicense, and/or sell


copies of the Software, and to permit persons to whom the Software is


furnished to do so, subject to the following conditions:





The above copyright notice and this permission notice shall be included in


all copies or substantial portions of the Software.





THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR


IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,


FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE


AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER


LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,


OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN


THE SOFTWARE.


'''





'''


	For questions, patches, etc contact R. Tyler Ballance 


'''


import getpass


import os


import re


import socket


import smtplib


import sys


import time


import xmlrpclib





from optparse import OptionParser





MAIL_SERVER = 'your_mail_server.com'


MAIL_SUFFIX = '@mycompany.com'


BUILD_HUDSON = True


HUDSON_URL = 'http://hudson'


TRAC_XMLRPC_URL = 'URL_TO_TRAC/projects/MYPROJECT/login/xmlrpc'





def rpcProxy(user='qatracbot', password=None):


	password = password or os.getenv('TRAC_PASS')


	return xmlrpclib.ServerProxy('https://%s:%s@%s' % (user, password, TRAC_XMLRPC_URL))





def _send_commit_mail(user, address, subject, branch, commits, files, diff):


	print 'Sending a GITRECEIVE mail to %s' % address


	message = 'Commits pushed to %s:\n--------------------------------------\n\n%s\n--------------------------------------\n%s\n--------------------------------------\n%s' % (branch, commits, files, diff)


	_send_mail(user, address, subject, message)


def _send_attn_mail(user, destuser, diff):


	print 'Sending a "please review" mail to %s' % destuser


	message = '''Good day my most generous colleague! I would hold you in the highest esteem and toast you over my finest wines if you would kindly review this for me\n\n\t - %(user)s\n\nDiff:\n------------------------------------------------\n%(diff)s''' % {'diff' : diff, 'user' : user}


	addresses = []


	for d in destuser.split(','):


		addresses.append('%s%s' % (d, EMAIL_SUFFIX))


	_send_mail(user, addresses, 'Please review this change', message)





def _send_mail(user, address, subject, contents):


	try:


		if not isinstance(address, list):


			address = [address]


		s = smtplib.SMTP(MAIL_SERVER)


		message = 'From: %s%s\r\nTo: %s\r\nSubject: %s\r\n\r\n%s\n' % (user, MAIL_SUFFIX, ', '.join(address), subject, contents)


		s.sendmail('%s%s' % (user, MAIL_SUFFIX), address, message)


		s.quit()


	except:


		print 'Failed to send the email :('





def _update_ticket(ticket, message, options={}):


	rpc = rpcProxy()


	rpc.ticket.update(ticket, message, options)


	return rpc





def find_re(commit):


	return map(int, re.findall(r'(?i)\s+re\s*#([0-9]+)', commit))


def handle_re(branch, commit, ticket):


	print 'Annotating ticket #%s' % ticket


	message = '''The following was committed in "%(branch)s":


			\{\{\{	


%(commit)s \}\}\}


		''' % {'branch' : branch, 'commit' : commit}


	_update_ticket(ticket, message)





def find_qa(commit):


	return map(int, re.findall(r'(?i)\s+qa\s*#([0-9]+)', commit))


def handle_qa(branch, commit, ticket):


	print 'Marking ticket #%s as "ready for QA"' % ticket


	message = '''The following was committed in "%(branch)s":


			\{\{\{ 


%(commit)s \}\}\}


		''' % {'branch' : branch, 'commit' : commit}


	rpc = _update_ticket(ticket, message, options={'status' : 'qa'})





def find_attn(commit):


	return re.findall(r'(?i)\s+attn\s*([A-Za-z,]+)', commit)


def handle_attn(branch, commit, attn):


	# Unpack commit from this: "commit 5f4c31f3c31347c62d68ecb5f2c9afa3333f4ad0\nAuthor: R. Tyler Ballance \nDate: Wed Nov 12 16:57:32 2008 -0800 \n\n Merge commit 'git-svn' \n\n  \n \n"


	try:


		commit_hash = commit.split('\n')[0].split(' ')[1]


	except:


		return # fuk it


	diff = os.popen('git show --no-color %s --pretty=format:"Author: %%cn <%%ce>%%n%%s%%n%%n%%b%%n%%n%%H"' % commit_hash).read().rstrip()


	_send_attn_mail(getpass.getuser(), attn,  diff)





def mail_push(address, oldrev, newrev, refname):


	user = getpass.getuser()


	machine = socket.gethostname()


	base_git_diff = 'git diff %s %s' % (oldrev, newrev)


	files_diffed = os.popen('%s --name-status' % (base_git_diff)).read().rstrip()


	full_diff = os.popen('%s -p --no-color' % (base_git_diff)).read().rstrip()


	''' git rev-parse --not --branches | grep -v "$new" | git rev-list "$old".."$new" --stdin '''


	commits = os.popen('git rev-parse --not --branches | grep -v "%s" | git rev-list %s..%s --stdin --pretty=format:"Author: %%cn <%%ce>%%nDate: %%cd %%n%%n %%s %%n%%n %%b %%n %%n-------[post-receive marker]------%%n" --first-parent ' % (newrev, oldrev, newrev)).read().rstrip()


	branch = refname.split('/')[-1]


	mail_subject = 'GITRECEIVE [%s/%s] %s files changed' % (machine, branch, len(files_diffed.split('\n')))





	if branch == 'master-release':


		print 'Tagging release branch'


		tagname = 'livepush_%s' % (time.strftime('%Y%m%d%H%M%S', time.localtime()))


		sys.stderr.write('Creating a tag named: %s\n\n' % tagname)


		os.system('git tag %s' % tagname)


		mail_subject = '%s (tagged: %s)' % (mail_subject, tagname)





	if BUILD_HUDSON_JOB:


			print 'Queuing the Hudson job for "%s"' % branch


			os.system('/usr/bin/env wget -q -O /dev/null http://%s/job/%s/build' % (HUDSON_URL, branch))





	_send_commit_mail(user, address, mail_subject, branch, commits, files_diffed, full_diff)





	if branch == 'master':


		return # we don't want to update tickets and such for master/merges





	commits = filter(lambda c: len(c), commits.split('-------[post-receive marker]------'))


	commits.reverse()


	for c in commits:


		if c.find('Squashed commit') >= 0:


			continue # Skip bullshit squashed commit





		for attn in find_attn(c):


			handle_attn(branch, c, attn)





		for ticket in find_re(c):


			handle_re(branch, c, ticket)





		for ticket in find_qa(c):


			handle_qa(branch, c, ticket)








if __name__ == '__main__':


	op = OptionParser()


	op.add_option('-m', '--mail', dest='address', help='Email address to mail git push messages to')


	op.add_option('-o', '--oldrev', dest='oldrev', help='Old revision we\'re pushing from')


	op.add_option('-n', '--newrev', dest='newrev', help='New revision we\'re pushing to')


	op.add_option('-r','--ref', dest='ref', help='Refname that we\'re pushing')


	opts, args = op.parse_args()





	if not opts.address or not opts.oldrev or not opts.newrev or not opts.ref:


		print '*** You left out some needed parameters! ***'


		exit





	mail_push(opts.address, opts.oldrev, opts.newrev, opts.ref)

Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

Delightfully Wrong About Git

22 Nov 2008

slide software development git A very long time ago I mentioned on Twitter that I was looking at Git as a replacement for Subversion and Perforce with my personal projects, but lamented moving to Git at Slide would not be feasible

Like most disagreements I've had with people on technology in the past, immediately after I said it, I actively tried to prove myself wrong. Back in April when I made the statement above, Subversion 1.4 was "good enough" (just barely) for what we wanted to do as far as source control, but I became more and more curious about whether or not we could move to Git.

Back in April, after spending a week with projects like Tailor and git-svn(1) I started to look at the potential of moving just my team over to Git for evaluation purposes. By the end of May I had requested Git to be installed on the machines that we use for development on a day-to-day basis and we moved the team over to Git by the second week of June.

What followed were six months of sloshing uphill, some of the most notable milestones that we had to figure out in this time frame were:

Whereas in the Subversion architecture with a central repository there is a very clear development focal point for sharing code between developers, what is this in the Git workflow?
How do you ensure developers don't forget code was committed "in that one branch, in that one repository" and keep track of code
How will Git integrate with Hudson, Trac and our other pieces of development infrastructure? (answered here)

I'll be answering these questions and share some of the scripts, hooks, and documentation we've written internally to make moving to Git throughout the company a reality. I wish I could say I was responsible for it all, but there were a number of other engineers that were extremely important in defining best practices, and what this shiny new world without Subversion would look like.

At the end of the day, I'm pleased as punch with the transition. I don't hate Subversion, I just love Git; call me "spoiled" but I think we deserve something more than a system that strives to be "a better CVS".

Update: I've posted an addendum: Why we chose Git, a rebuttal

Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

Reliable Locks in Hudson

05 Nov 2008

slide hudson There has been some amount of discussion on the Hudson user's list recently about the status of the "Locks and Latches" plugin. The plugin allows for one to create "locks" for Jobs in a similar manner to how "locks" work in a multithreaded programming environment. The need for such a plugin becomes very clear once you start to run multiple jobs that depend on some set of shared resources, take the following example:

Jobs A,B,C must run unit tests that fetch data from a test site
Slave #1 can only run one instance of Apache at a time

How one would accomplish this with the Locks and Latches plugin would be to create a lock like "Site Lock" in the Hudson configuration, and then bind Jobs A, B, C to that Lock. Making the (large) assumption that the plugin works correctly and locks properly in order to prevent A and B from running concurrently, this would be enough to satisfy the requirements we have for the scenario above. Unfortunately it seems the plugin is largely unmaintained and buggy; in the past couple weeks of experimenting with such a set up on a variety of different slaves we've noticed that the locks aren't always respected, causing some locked jobs to execute in parallel spewing bad test results and build failures (the crux of this issue seems ot have been reported by Sergio Fernandes in #2450).

The Loopback Slave
The easiest way I found to work around the short-comings of the Locks and Latches plugin was to "break up" the Locks. Locks are only really useful if you have more than one "executor" on a Hudson node, in order to allow Hudson to execute jobs simultaneously. In essence, if you only have one executor, the Hudson queueing system will technically perform your "lock" for you by default. And thus the "loopback slave" was born! When explaining this to a co-worker, I likened my workaround to the fork(2) call, whereas the Locks and Latches plugin is much more of a pthread_mutex_lock(2) call. According to the "Distributed Builds" page on the Hudson wiki, you can start slave agent headlessly on any machine, so why not the master node?

Above is the configuration of one such "loopback slave" that took the place of one of the executors on the master node.

After setting up the loopback slave, it's just a matter of tying the Job to that node for building.

In short our set up was before: Jobs A, B, C all use the Lock "Site Job" in order to queue properly. With this change, now there is no lock, and Jobs A, B, C are all bound to the loopback slave in place of the lock on the master node. While certainly not ideal, given the frustrations of the Locks and Latches plugin going unmaintained this is the best short-term solution I've come up with thus far.

Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

I hate it when The Onion is factual.

09 Oct 2008

opinion miscellaneous I saw this referenced by an op-ed piece I read in a restaurant a few weeks back and had to share, as it's become more and more depressing.

This is a piece run by The Onion on January 17, 2001 shortly after George W. Bush came into office netitlted: Bush: 'Our Long National Nightmare Of Peace And Prosperity Is Finally Over

I'm going to go cry in a corner now.

Hudson Build Bookmarklet

04 Oct 2008

miscellaneous software development hudson During the usual friday-frenzy I sat down and wrote a quick 10 minute little bookmarklet to start a Hudson job. Unlike most bookmarklets that "do things" this one actually "does things" but doesn't take you away from your current page. Using the Hudson Remote Access API you can query information from Hudson programmatically, but you can also kick off builds remotely with nothing more than a simple HTTP request to the properly formed URL.

By dragging the link below to your bookmark bar, and updating the URL within ("http://hudson/") to the URL of your Hudson instance, you can queue a Hudson build from any page at any time (without leaving the page).

The Bookmarklet

Build Hudson Job

The Code

Build Hudson Job

How it actually works

After talking the concept of making cross-domain HTTP requests over with Sergio, he suggested just using an "IMG" tag (or "IFRAME") to accomplish the task. The bookmarklet doesn't actually have to send any form parameters or receive any data, Hudson just needs to receive any HTTP request to the right URL. By creating the IMG object in JavaScript, and appending it to the body of the current page the user is on, it'll effectively con the browser into making the HTTP request without needing to pull off any XmlHttpRequest hacks. One of the more interesting things that we found out when playing with the end of the bookmarklet, was that if we returned "false" or tried to wrap the whole thing in a closure, the link would still execute and the browser would change pages. However, if we stuck an "alert()" call into the tail end of the bookmarklet JavaScript, execution would stop and the link wouldn't change the page in the browser (tested this in Firefox 3).

Happy Hudsoning :)

Git back into Subversion, Mostly Automagically (Part 3/3)

01 Oct 2008

slide software development git Thus far I've covered most of the issues and hurdles we've addressed while experimenting with Git at Slide in parts 1 and 2 of this series, the one thing I've not covered that is very important to address is how to work in the "hybrid" environment we currently have at Slide, where as one team works with Git and the rest of the company works in Subversion. Our setup involves having a "Git-to-Subverison" proxy repository such that everything to the "left" of that proxy repository is entirely Subversion without exceptions and everything to the "right" of that repository is entirely Git, without exceptions. Part of my original motivation for putting this post at the end of the series was that, when I originally wrote the first post on "Experimenting with Git at Slide" I actually didn't have this part of the process figured out. That is to say, I was bringing code back and forth between Git and Subversion courtesy of git-svn(1) and some gnarly manual processes.

No Habla Branching

The primary issue when bringing changesets from Git to Subversion is based in the major differences between how the two handle branching and changesets to begin with. In theory, projects like Tailor were created to help solve this issue by first translating both the source and destination repositories into an intermediary changeset format in order to cross-apply changes from one end to the other. Unfortunately after I spent a couple days battling with Tailor, I couldn't get it to properly handle some of the revisions in Slide's three year history.

If you've ever used git-svn(1) you might be familiar with the git-svn dcommit command, which will work for some percentage of users that want to maintain dual repositories between Git and Subversion, things break down however once you introduce branching into the mix.

Up until Subversion 1.5, Subversion had no concept of "merge tracking" (even in 1.5, it requires the server and client to be 1.5, it also makes nasty use of svn props). Without the general support for "merge tracking" the concept of a changeset sourcing from a particular branch or the concept of a "merge commit" are entirely foreign in the land of Subversion. In less mumbo jumbo, this effectively means that the "revisions" that you would want to bring from Git into Subversion need to be "flattened" when being "dcommitted" into Subversion's trunk. Git supports a means of flattening revision history when merging and pulling by way of the "--squash" command line argument, so this flattening for git-svn is possible.

Giant Disclaimer

What I'm about to write I dutifully accept as Git-heresy, a nasty hack and not something I'm proud of.

Flattening into Subversion

First the icky bash script that supports properly flattening revisions into the "master" branch in the git-svn repository and dcommits the results:

#!/bin/bash





MERGE_BRANCH=mergemaster


REPO=$1


BRANCH=$2





if [[ -z "${1}" || -z "${2}" ]]; then


	echo "===> You must provide a \"remote\" and a \"refspec\" for Git to use!"


	echo "===> Exiting :("


	exit 1;


fi





LATEST_COMMIT=`git log --max-count=1 --no-merges --pretty=format:"%H"`





function master


{


	echo "==> Making sure we're on 'master'"


	git checkout master


}





function setup_mergemaster 


{


	master


	echo "==> Killing the old mergemaster branch"


	git branch -D $MERGE_BRANCH





	echo "==> Creating a new mergemaster branch"


	git checkout -b $MERGE_BRANCH


	git checkout master


}





function cleanup


{


	rm -f .git/SVNPULL_MSG


}





function prepare_message


{


	master





	echo "===> Pulling from ${REPO}:${BRANCH}"


	git pull ${REPO} ${BRANCH}


	git checkout ${MERGE_BRANCH}





	echo "==> Merging across change from master to ${MERGE_BRANCH}"


	git pull --no-commit --squash . master





	cp .git/SQUASH_MSG .git/SVNPULL_MSG





	master


}





function merge_to_svn


{


	git reset --hard ${LATEST_COMMIT}


	master


	setup_mergemaster





	echo "===> Pulling from ${REPO}:${BRANCH}"


	git pull ${REPO} ${BRANCH}


	git checkout ${MERGE_BRANCH}





	echo "==> Merging across change from master to ${MERGE_BRANCH}"


	git pull --no-commit --squash . master





	echo "==> Committing..."


	git commit -a -F .git/SVNPULL_MSG && git-svn dcommit --no-rebase





	cleanup


}





setup_mergemaster





prepare_message





merge_to_svn





master





echo "===> All done!"

Gross isn't it? There were some interesting things I learned when experimenting with this script, but first I'll explain how the script is used. As I mentioned above there is the "proxy repository", this script operates on the git-svn driven proxy repository, meaning this script is only invoked when code needs to be propogated from Git-to-Subversion as opposed to Subversion-to-Git which git-svn properly supports by default in all cases. Since this is a proxy repository, that means all the "real" code and goings-on occur in the "primary" Subversion, and "primary" Git repositories, so the code is going along this path: Primary_SVN <-> [proxy] <-> Primary_Git
This setup means when we "pull" (or merge) from Primary_Git/master we are going to be flattening at that point in order to properly merge it into the Primary_SVN. Without further ado, here's the breakdown on the pieces of the script:

function setup_mergemaster 


{


	master


	echo "==> Killing the old mergemaster branch"


	git branch -D $MERGE_BRANCH





	echo "==> Creating a new mergemaster branch"


	git checkout -b $MERGE_BRANCH


	git checkout master


}

What the setup_mergemaster branch is responsible for is deleting any prior branches that have been used for merging into the proxy repository and Primary_SVN. It gives us a "mergemaster" branch in the git-svn repository that is effectively at the same chronological point in time as the master branch before any merging occurs.

function prepare_message


{


	master





	echo "===> Pulling from ${REPO}:${BRANCH}"


	git pull ${REPO} ${BRANCH}


	git checkout ${MERGE_BRANCH}





	echo "==> Merging across change from master to ${MERGE_BRANCH}"


	git pull --no-commit --squash . master





	cp .git/SQUASH_MSG .git/SVNPULL_MSG





	master


}

The prepare_message function is part of the nastiest code in the entire script, in order to get an accurate "squashed commit" commit message when the changesets are pushed into Primary_SVN, we have to generate the commit message separately from the actual merging. Since this function is performing a `git pull` from "master" into "mergemaster" the changesets that are being pulled are going to be the only ones that show up (for reasons I'm about to explain).

function merge_to_svn


{


	git reset --hard ${LATEST_COMMIT}


	master


	setup_mergemaster





	echo "===> Pulling from ${REPO}:${BRANCH}"


	git pull ${REPO} ${BRANCH}


	git checkout ${MERGE_BRANCH}





	echo "==> Merging across change from master to ${MERGE_BRANCH}"


	git pull --no-commit --squash . master





	echo "==> Committing..."


	git commit -a -F .git/SVNPULL_MSG && git-svn dcommit --no-rebase





	cleanup


}

If you noticed above in the full script block, the "LATEST_COMMIT" code, here's where it's used, it is one of the most important pieces of the entire script. Basically the LATEST_COMMIT piece of script grabs the latest non-merge-commit hash from the `git log` output and saves it for later use (here) where it's used to rollback the proxy repository to the point in time just before the last merge commit. This is done to avoid issues with git-svn(1) not understanding how to handle merge commits whatsoever. After rolling back the proxy repository, a new "mergemaster" branch is created. After the mergemaster branch is created, the actual Primary_Git changesets that differ between the proxy repository and Primary_Git are pulled into the proxy repository's master branch, and sqaushed into the mergemaster branch where they are subsequently committed with the commit message that was prepared before. The "prepare_message" part of the script becomes important at that step because the "squashed commit" message that Git generates at this point in time will effectively contain every commit that has ever been proxied across in this manner ever.

After the "merge_to_svn" function has been run the "transaction" is entirely completed and the changesets that once differed between Primary_SVN/trunk and Primary_Git/master are now normalized.

Mostly Automagically

In the near future I intend on incorporating this script into the post-receive hook on Primary_Git in such a way that will truly propogate changesets automatically from Primary_Git into Primary_SVN, but currently I'm utilizing one of my new favorite "hammers', Hudson (see: One-line Automated Testing). Currently there are two jobs set up for proxying changesets across, the first "Subversion-to-Git" simply polls Subversion for changes and executes a series of commands when changes come in: git-svn fetch && git merge git-svn && git push $Primary_Git master. This is fairly straight-forward and fits in line with what git-svn(1) is intended to do. The other job that I created is "Git-to-Subversion" which must be manually invoked by a user, but still will automatically take care of squashing commits into Primary_SVN/trunk (i.e. bash svnproxy.sh $Primary_Git master).

Wrap-up

Admittedly, this sort of setup leaves a lot to be desired. In the ideal world, Tailor would have coped with both our Git and our Subversion repositories in such a way that would have made this script nothing more than a silly idea I had on a bus. Unfortunately that wasn't case and the time budget I had for figuring out a way to force Tailor to work was about 47.5 hours less than it took me to sit down and write the script above. I'd be interested to see other solutions other organizations are utilizing to migrate from one system to the other, but at the time of this writing I can't honestly say I've heard much about people dealing with the "hybrid" scenario that we have currently at Slide.

Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

Team Development with Git (Part 2/3)

29 Sep 2008

slide software development git In my last post on Git, Experimenting with Git at Slide, I discussed most of the technical hurdles that stood in our way with evaluating Git for a Subversion tree that has 90k+ revisions and over 2GB of data held within. As I've learned from any project that involves more than just myself, technology is only half the battle, the other half is the human element. One of the most difficult things to "migrate" when switching to something as critical to a developer's workflow as a VCS, is habits, good ones and bad ones.

The Bad Habits

When moving my team over to Git, I was able to identify some habits that I view as "bad" that could either be blamed on how we have used Subversion here at Slide, or the development workflow that Subversion encourages. For the sake of avoiding flamewars, I'll say it's 51% us, 49% the system.

The Occasional Committer

Chances are that if you're working on "something super important!" you fall into this bad habit. Because of the nature of trunk in Subversion, if you commit half-finished work into a team-branch or trunk itself, you could cause plenty of pain for your fellow developers. As a result, you tend to commit at the end of a long day working on something, or only after something has been completed. The 9 hours of sweat and frustration you've spent pounding away on 300 lines of code is now summed up in one commit message:

Turns out there was a race-condition here, re #52516

Now three months from now when you return to the same 300 lines of code and try to figure out what the hell led to this mess, you're left with the commit message above, and nothing more.

The Less-than-attentive Developer

I've worked on a Mac for the majority of my time at Slide, as do most of my compatriots, and sooner or later one of two things will happen:svn add some/directory/ and/or svn commit. This usually results in a second commit to come into the tree with a commit message like:

Whoops, accidentally checked in resource forks

This isn't that large of a problem, except for the implication of the second command there, svn commit will commit all outstanding changes in your working copy starting in the current working directory, and recursing through children directories. I'm probably more anal-retentive about my commits than most, but I usually do a diff before I commit to make sure I'm aware of what I'm about to commit, but I've seen plenty of developers skip this step.

The Over-Confident Merger

I've fallen into this trap numerous times when merging "old" branches back into trunk, especially with binary files that may have been changed in trunk, or in my branch (hell if I know!). One thing I can speak to anecdotally from our work at Slide, is that the probability of nonsensical conflicts rises with a branch's age. The rate of our repository progresses at about 50 commits to trunk per day (~150 commits across the board), if there is a branch cut from trunk, usually within two weeks it can become extremely difficult to merge back into trunk without constant "refreshes" or merges from trunk into the branch.

If you're not careful when folding that branch back down into trunk, you can inadvertantly revert old binary files or even text files to previous states which will usually cause other individuals in the engineering organization gripe at you and your QA department to pelt you with rocks. For bonus points, you could (as I have done before) accidentally commit conflicting files earning a gold star and a dunce hat for the day. This merging pain led me to originally write my merge-safe.py script so long ago.

The Slide Way to Git

Fortunately for us, I think the decentralized nature of Git has helped us enforce some best practices when it comes to the bad habits above. "The Occassional Committer" is all but done away with thanks to the ability to atomically commit and revert revisions at a whim and have those changes not propogated to other developers until there has been an explicit push or pull.

Unfortunately however, "The Less-than-attentive Developer" isn't solved so easily. To date I've sat next to two engineers that were new to Git, and watched them both execute the same fateful command: git add .
Not realizing their mistake, they accidentally commit a truckload of build and temporary files (.so, .swp, .pyc, etc) interspersed with their regular work that they meant to commit. Git cannot prevent a developer from shooting themselves in the foot, but it does prevent them from shooting everybody else in the foot along with it (unless they commit, and then push their changes upwards).

"The Over-confident Merger" grows more and more confident in the Git-based workflow. Since Git handles changesets atomically, it becomes trivial to merge branch histories together or cherry-pick one revision and apply to an entirely separate branch. I've not yet seen a Git conflict that wasn't a true conflict insofar that it was quite literally one line of code changing in two different ways between branch histories. As an aside, when using git-svn, be prepared for all the merging "fun" that Subversion has to offer when propogating changes between the two systems.

Basic Team Workflow

The manner in which we use Git is more like a centralized-decentralized version control system. We still have a "master" repository, which provides a central synchronization point when pushing stage servers, or when bringing code into Subversion to be pushed to live servers. For any particular project one of the developers will create a branch that will serve as the primary project branch, take the "superpoke-proj" branch as an example. That developer will push this branch to "origin" (or the master repository) such that other developers can "track" that branch and contribute code. For the purposes of this example, let's say Paul and Peter are working in "superpoke-proj", while Paul is working he will incrementally commit his work, but once he has resolved/fixed a ticket, he will perform a git push and then mark the ticket appropriately such that a QA engineer can verify the fix. If Paul and Peter are working on something that "breaks the build" but they need to collaborate on it together, Paul can perform a git pull from Peter and vice versa, and again, once they're done those changes will be pushed to origin. This model allows for developers to work in relative isolation so they're not inadvertantly stepping on each others' toes, but also close enough that they can collaborate in explicit terms, i.e. when they are ready for changes to be propogated to each other or the rest of the team.

Conclusion

Our workflow, like most things at companies under 500 employees is still a "work in progress™". I think we've found the right balance thus far for the team between freedom and process the allow for sufficient mucking around in the codebase in a way that provides the most amount of time actually writing code with as little possible time spent dealing with the overhead of anything else (merging, etc). There's nothing inherently special in the way we use Git, but we've found that it works for the way we work, which is to say in a very tight release schedule that's requires multiple branches per week and plenty of merging from branch-to-branch whether it be from another team or another part of the same team.

Of course, your mileage may vary.

Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

Facebook be riddled with swashbucklers!

19 Sep 2008

facebook I've seen a lot of user-feedback about how confusing and "boring" the new Facebook redesign is, but I'm glad to know they are still having fun down there in Palo Alto, even if it's with subtle changes to their site (click to zoom)

To enable the pirate localization, find the language combo box at the bottom-left portion of the Facebook homepage.

Stay classy Facebook.

Hudson notifications with libnotify

18 Sep 2008

miscellaneous software development linux hudson I've been using a Gnome-based desktop for about the past 8-9 months and one of the things I've come to really appreciate is that most Gnome applications integrate with "libnotify". Libnotify is a simple Windows taskbar-like notification system that presents status messages at the bottom of your screen. Like all great pieces of software, it has a solid Python interface which allows for incorporating it in those little 10-minutes scripts I find myself writing every now and again.

One of the things I wanted to script was the notification of the build status of the numerous jobs that we're running in our Hudson instance here at Slide. Using the Universal Feed Parser and pynotify (listed under "notify-python"), I had a good little Gnome Hudson Notifier running in less than 10 minutes.

Source code after the jump.




import feedparser


import pynotify


import time





BASE_TITLE = 'Hudson Update!'





def success(job):


        n = pynotify.Notification(BASE_TITLE, 


                                        '"%s" successfully built :)' % job,


                                        'file:///usr/share/pixmaps/gnome-suse.png')


        n.set_urgency(pynotify.URGENCY_LOW)


        return n





def unstable(job):


        return pynotify.Notification(BASE_TITLE, 


                                        '"%s" is unstable :-/' % job,


                                        'file:///usr/share/pixmaps/gnome-suse.png')





def failure(job):


        n = pynotify.Notification(BASE_TITLE, 


                                        '"%s" failed!' % job,


                                        'file:///usr/share/pixmaps/gnome-suse.png')


        n.set_urgency(pynotify.URGENCY_CRITICAL)


        return n





def main():


        pynotify.init('Hudson Notify')


        old_items = []


        while True:


                feed = feedparser.parse('http://hudson/rssLatest')


                items = [t['title'] for t in feed['entries']]


                new_items = list(set(old_items).difference(items))





                for i in new_items:


                        i = i.split(' ')


                        job, build, status = (i[0], i[1], i[2])


                        status = status.replace('(', '').replace(')','')


                        if status == 'SUCCESS':


                                success(job).show()


                        elif status == 'UNSTABLE':


                                unstable(job).show()


                        elif status == 'FAILURE':


                                failure(job).show()





                old_items = items


                time.sleep(60)





if __name__ == '__main__':


        main()

It's pretty basic right now, but does everything I really wanted it to do. I may add it into a public Git repository in the near future if I spend any more time on the project. Hope you like it :)

Chase Auto-Finance FAIL.

16 Sep 2008

opinion I've been getting voice-mails from Chase Auto-Finance recently bugging me to pay them some money (turns out they're strapped for cash lately, something silly about irresponsible lending).

All is well and good, I normally call Chase up once a month and navigate through increasingly painful phone menus and give Chase some of my money. As luck would have it, sometime between my last payment, and my current payment, Chase decided that you should really talk to a representative to make a payment. In effect, I have to talk to some poor soul working in a shitty 9-5 call center job to pay a car payment that I've paid for the past two years via an automated system. Hooray progress.

Back to the voice mails, each one I receive I normally receive when I am at work, turns out I am receiving the voice mails because I'm too busy working to answer the phone. Unfortunately the voice mails always contain some poor soul working in a shitty 9-5 call center job asking me to call a Chase representative back to resolve my outstanding payment issue.

Why is my bank making it so damned hard to give them money?

In the future I intend on staying with my other bank for my loans since not only do they have reasonable customer service representatives, but they make it incredibly easy to give them money.

Lazily loading attributes.

15 Sep 2008

mono miscellaneous javascript I found myself talking to Jason today about the virtues of getattr(), setattr(), and hasattr() in Python and "abusing" the dynamic nature of the language which reminded me of some lazy-loading code I wrote a while back. In February I found the need to have portions of the logic behind one of our web applications fetch data once per-request. The nature of the web applications we're building on top of the MySpace, Hi5 and Facebook platforms require some level of network data-access (traditionally via REST-like APIs). This breaks our data access model into the following tiers: Dia FTW

Working with network-centric data resources is difficult in any scenario (desktop, mobile, web) but the particularly difficult thing about network data access in the mod_python-driven request model is that it will be synchronous (mod_python doesn't support "asynchronous pages" like ASP.NET does). This means every REST call to Facebook, for example, is going to block execution of the request handler until the REST request to Facebook's API tier completes.




 def request_handler(self, *args, **kwargs):


    fb_uid = kwargs.get('fb_sig_user')


    print "Fetching the name for %s" % fb_uid


    print time.time()


    name = facebook.users.getInfo(uid=fb_uid) 


    ### WAIT-WAIT-WAIT-WAIT-WAIT


    print time.time()      


    ### Continue generating the page...

There is also a network hit (albeit minor) for accessing cached data or data stored in databases. The general idea is that we'll need to have some level of data resident in memory through-out a request that can differ widely from request-to-request.

Lazy loading in Python

To help avoid unnecessary database access or network access I wrote a bit of class-sugar to make this a bit easier and more fail-proof:




class LazyProgrammer(object):


    ''' 


        LazyProgrammer allows for lazily-loaded attributes on the subclasses 


        of this object. In order to enable lazily-loaded attributes define


        "_X_attr_init()" for the attribute "obj.X"


    '''


    def __getattr__(self, name):


        rc = object.__getattribute__(self, '_%s_attr_init')()


        setattr(self, name, rc) 


        return rc

This makes developing with network-centric web applications a bit easier, for example, if I have a "friends" lazily-loading attribute off the base "FacebookRequest" class, all developers writing code subclassing FacebookRequest can simply refer to self.friends and feel confident they aren't incurring unnecessary bandwidth hits, and the friends-list fetching code is located in once spot. If one-per-request starts to become too resource intensive as well, it'd be trivial to override the _friends_attr_init() method to hit a caching server instead of the REST servers first, without needing to change any code "downstream."

Lazy loading in C#

Since C# is not a dynamically-typed language like Python or JavaScript, you can't implement lazily-loaded attributes in the same fashion (calling something like setattr()) but you can "abuse" properties in a manner similar to the C# singleton pattern, to get the desired effect:




using System;


using System.Collections.Generic;





public class LazySharp


{


	#region "Lazy Members"


	private Dictionary _names = null;


	#endregion





	#region "Lazy Properties"


	public Dictionary Names


	{


		get {


			if (this._names == null)


				this._names = this.SomeExpensiveCall();


			return this._names;


		}


	}


	#endregion


}

Admittedly I don't find myself writing Facebook/MySpace/Hi5 applications these days on top of ASP.NET so I cannot say I actually use the class above in production, but conceptually it makes sense.

Lazy loading attributes I find useful in the more hodge-podge situations, where code and feature-sets have both grown organically over time, they're not for everybody but I figured I'd share anyways.

Resurgange of the shell.

11 Sep 2008

mono javascript Two things happened in such short proximity time-wise that I can't help but thing they're somehow related to the larger shift to interpreters. Earlier this week Miguel introduced csharp shell which forced me to dust off my shoddy Mono 1.9 build and rebuild Mono from Subversion just because this is too interesting to pass up on.

One of my favorite aspects of using IronPyhton, or Python for that matter is the interpreter which allows for prototyping that doesn't involve creating little test apps that I have to build to prove a point. For example, I can work through fetching a web page in the csharp shell really easily, instead of creating a silly little application, compiling, fixing errors, and recompiling:




tyler@pineapple:~/source/mono-project/mono> csharp


Mono C# Shell, type "help;" for help





Enter statements below.


csharp> using System;


csharp> Console.WriteLine("This changes everything.");


This changes everything.


csharp> String url = "http://tycho.usno.navy.mil/cgi-bin/timer.pl";


csharp> using System.Web;


csharp> using System.Net;


csharp> using System.IO;


csharp> using System.Text;


csharp> HttpWebRequest req = HttpWebRequest.Create(url);


(1,17): error CS0266: Cannot implicitly convert type `System.Net.WebRequest' to `System.Net.HttpWebRequest'. An explicit conversion exists (are you missing a cast?)


csharp> HttpWebRequest req = HttpWebRequest.Create(url) as HttpWebRequest;


csharp> HttpWebResponse response = req.GetResponse() as HttpWebResponse;


csharp> StreamReader reader = new StreamReader(req.GetResponseStream() as Stream, Encoding.UTF8);


(1,45): error CS1061: Type `System.Net.HttpWebRequest' does not contain a definition for `GetResponseStream' and no extension method `GetResponseStream' of type `System.Net.HttpWebRequest' could be found (are you missing a using directive or an assembly reference?)


csharp> StreamReader reader = new StreamReader(response.GetResponseStream() as Stream, Encoding.UTF8);


csharp> String result = reader.ReadToEnd();


csharp> Console.WriteLine(result);











What time is it?


 US Naval Observatory Master Clock Time
 



Sep. 11, 07:29:02 UTC



Sep. 11, 03:29:02 AM EDT



Sep. 11, 02:29:02 AM CDT



Sep. 11, 01:29:02 AM MDT



Sep. 11, 12:29:02 AM PDT



Sep. 10, 11:29:02 PM AKDT



Sep. 10, 09:29:02 PM HAST

Time Service Department, US Naval Observatory csharp> reader.Close(); csharp> response.Close(); csharp>

I really think Miguel and Co. have adding something infinitely more useful in this Hackweek project than anything I've seen come out of recent hackweeks at Novell. The only feature request that I'd add along to the csharp shell would be "recording", i.e.:
tyler@pineapple:~/source/mono-project/mono> csharp Mono C# Shell, type "help;" for help Enter statements below. csharp> Shell.record("public void Main(string[] args)"); recording... csharp> using System; csharp> Console.WriteLien("I prototyped this in csharp shell!"); (1,10): error CS0117: `System.Console' does not contain a definition for `WriteLien' /home/tyler/basket/lib/mono/2.0/mscorlib.dll (Location of the symbol related to previous error) csharp> Console.WriteLine("I prototyped this in csharp shell!"); csharp> Shell.save_record("Hello.cs"); recording saved to "Hello.cs"Which could conceptually generate the following file:using System; public class Hello { public void Main(string[] args) { Console.WriteLine("I prototyped this in csharp shell!"); } }

JavaScript Shell

In addition to the C# shell, I've been playing with v8, the JavaScript engine that powers Google Chrome. The V8 engine is capable of being embedded easily, or running standalone, one of the examples they ship with is a JavaScript shell. I've created a little wrapper script to give me the ability to load jQuery into the V8 shell to prototype jQuery code without requiring a browser to be up and running:




tyler@pineapple:~/source/v8> ./shell


V8 version 0.3.0


> load("window-compat.js");


> load("jquery.js");


> $ = window.$


function (selector,context){return new jQuery.fn.init(selector,context);}


> x = [1, 5, 6, 12, 42];


1,5,6,12,42


> $.each(x, function(index) { print("x[" + index + "] = " + this); });


x[0] = 1


x[1] = 5


x[2] = 6


x[3] = 12


x[4] = 42


1,5,6,12,42


>

The contents of "window-compat.js" being:




/*


 * Providing stub "window" objects for jQuery


 */





if (typeof(window) == 'undefined') {


        window = new Object();


        document = window;


        self = window;





        navigator = new Object();


        navigator.userAgent = navigator.userAgent || 'Chrome v8 Shell';





        location = new Object();


        location.href = 'file:///dev/null';


        location.protocol = 'file:';


        location.host = ''; 


};

In general I don't really have anything insightful or especially interesting to add, but I wanted to put out my "+1" in support of both of these projects. Making any language or API more easily accessible through these shells/interpreters can really help developers double-check syntax, expected API behavior etc. Thanks Novell/Google, interpreters rock!

Don Quixote's new side-kick, Hudson

06 Sep 2008

slide software development hudson I recently wrote about "one-line automated testing" by way of Hudson, a Java-based tool that helps to automate building and test processes (akin to Cruise Control and Buildbot). If you were to read this blog regularly, you'd be well aware that I work primarily with Python these days, at a web company no less! What does a web company need with a continuous integration tool? Especially if they're not using a compiled language like Java or C# (heresy!).

As any engineering organization grows, it's bound to happen that you reach a critical mass of developers and either need to hire an equitable critical mass of QA engineers, or start to approach quality assurance from all sides. That is to say, automated unit testing and automated integration testing becomes a requirement for growing both as a engineering organization but as a web application provider (user's don't like broken web applications). With web products like Top Friends, SuperPoke! and Slide FunSpace we have a large amount of ever-changing code, that has been in a constant state of flux for the past 16-18 months. We can accomodate for ever-changing code on the backend for the past year and half with PyUnit and development discipline.

How do you deal with months of ever changing code for the aforementinoned products' front-ends? Your options are pretty slim, you can hire a legion of black-box QA engineers to manually go through regression tests and ensure your products are in tip-top shape, or you can hire a few talented black-box QA engineers to conscript a legion of robots to go through regression tests and ensure your products are in tip-top shape. Enter Windmill. Windmill is a web browser testing framework not entirely unlike Selenium or Watir with two major exceptions: Windmill is written in Python and Windmill has a great recorder (and lots of other features). One of my colleagues at Slide, Adam Christian has been working tirelessly to push Windmill further and prepare it for enterprise adoption, the first enterprise to use it, Slide.

Adam and I have been working on bringing the two ends of the testing world together with Hudson. About half of the jobs currently running inside of our Hudson installation are running PyUnit tests on various Subversion and Git branches. The other half of the jobs are running Windmill tests, and reporting back into Hudson by way of Adam's JUnit-compatible reporting code. Thanks to the innate flexibility of PyUnit and Windmill's reporting infrastructure we were able to tie all these loose ends together with a tool like Hudson that will handle Jabber-notifications or email notifications when test-runs fail and include details in it's reports.

We're still working out the kinks in the system, but to date this set up has helped us fix at least one critical issue a week (with a numerous other minor issues) since we've launched the Hudson system, more often than not before said issues reach the live site and real users. If you've got questions about Windmill or Hudson you can stop by the #windmill or the #hudson channels on Freenode.

Automated testing is like a really good blend of coffee, until you have it, you think "bah! I don't need that!" but after you start with it you can't help but wonder how you could tolerate the swill you used to drink.

Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

One-line Automated Testing

20 Aug 2008

slide opinion software development hudson For about as long as my development team has been a number larger than one, I've been on a relatively steady "unit test" kick. With the product I've worked on for over a year gaining more than one cook in the kitchen, it became time to start both writing tests to prevent basic regressions (and save our QA team tedious hours of blackbox testing), but also to automate those tests in order to quickly spot issues.

While I've been on this pretty steadily lately, I'm proud to say that automated testing was one of my first pet projects at Slide. If you ever crack into the Slide corporate network you can find my workstation under the name "ccnet" which is short for Cruise Control.NET, my first failed attempt at getting automated testing going on our now defunct Windows desktop client. As our development focus shifted away from desktop applications to social applications the ability to reliably test those systems plummeted; accordingly our test suite for these applications became paltry at best. As the organization started to scale, this simply could not stand much longer else we might not be able to efficiently push stable releases on a near-nightly schedule. As we've started to back-fill tests (test-after development?) the need to automate these tests has arisen to which I started digging aronud for something less painful to deal with than Cruise Control, enter Hudson.

Holy Hudson Batman!

I was absolutely astounded that I, nor anybody I knew, was aware of the Hudson project. Hudson is absolutely amazing as far as continuous integration systems go. The only major caveat is that the entire system is written in Java, meaning I had to beg one of our sysadmins to install Java 1.5 on the unit test machine. Once that was sorted out, starting the Hudson instance up was incredibly simple:
java -jar hudson.war
In our case the following to keep the JVM within manageable virtual memory limits:
java -Xmx128m -jar hudson.war --httpPort=8888

Once the Hudson instance was up and rnuning, I simply had to browse to http://unittestbox:8888/ and the entire rest of the configuration was set up from the web UI. Muy easy. Muy bueno.

Plug-it-in, plug-it-in!

One of the most wonderful aspects of Hudson is it's extensible plugin architecture. Adding plugins like "Git", "Trac" and "Jabber" means that our Hudson instance is now properly linking to Trac revisions, sending out Jabber notifications on "build" (read: test run) failures and monitoring both Subversion and Git branches for changes. From what I've seen from their plugin architecture, it would be absolutely trivial to extend Hudson with Slide-specific plugins as the needs arise.

With the integration of the PyUnit XMLTestRunner (found here) and working an XML output plugin into Windmill we can easily automate testing of both our back-end code and our front-end.

Hudson in action

And all with one simple java command :)

Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

Let's Swap iPods.

30 Jul 2008

miscellaneous media Since I've started to spend such an enormous amount of my time with work and settling into a new apartment, I've had literally no time to discover new music. Because of this utter lack of time on my part, I've been pondering this idea for about the past month or two on a daily basis, I want to participate in an iPod Foreign Exchange Program.

I currently own a 30GB Video iPod (black) that has about 28GB of music on it with a few assorted podcasts here and there.

Here's what I'm thinking would constitute a good set of rules for swapping an iPod to "walk a mile in somebody's shoes" (musically).

We can be acquaintances, but not friends. I know what my friends listen to and can steal their iPods myself :)
The period to swap iPods would last one week
Both parties would make sure to un-sync their address book and calendars from the iPod, but not change any of the music (no trying to impress people)
The iPod swap is accompanied with a business card or means to coordinate a swap-back
Both parties must be respectful of the others' tastes, even if it's really weird (you know who you are)

I went ahead and removed my calendars and contacts from my iPod just in case I run into somebody on the train that has read this post and wants to swap right away, but failing that, if you're around San Francisco, let's swap iPods :)

Experimenting with Git at Slide (Part 1/3)

27 Jul 2008

slide opinion software development git For the past two months I've been experimenting with varying levels of success with Git inside of Slide, Inc.. Currently Slide makes use of Subversion and relies heavily on branches in Subversion for everything from project specific branches to release branches (branches that can live anywhere from under 12 hours to three weeks). There are plenty of other blog posts about the pitfalls of branching in Subversion that I won't go into here, suffice to say, it is...sub-par. Below is a rough diagram of our general current workflow with Subversion (I've had some other developers ask me "why don't you just work in trunk?" to which I usually wax poetic about the chaos of trunk when any project gets over 5 active developers (Slide engineering is somewhere between 30-50 engineers)).

There's always a catch
Subversion at Slide
Toying with Git
Git at Slide

There's always a catch

There are three major problems we've run up against with utilizing Subversion as our version control system at Slide:

Subversion's "branches" make context switching difficult
Depending on the age of a branch cut from trunk/, merges and maintainence is between difficult and impossible
Merging Subversion branches into each other causes a near total loss of revision history

Given that branches are a critical part of Slide's development process, we've historically looked at branch-strong version control systems as alternatives, such as Perforce. Before I joined Slide in April of 2007, I was a heavy user of Perforce for my own consulting projects as well as for some of my work with the FreeBSD project as part of the Summer of Code program. In fact, my boss sent out a "Perforce Petition" to our engineering list on my third day at Slide...we still haven't switched away from Perforce.

Up until earlier this year I hadn't given it a second thought until the team I was working with grew and grew such that between me and four other engineers we were pushing a release anywhere from once to three times a week. That meant we were creating a Subversion "branch" multiple times a week, and a significant part of my daily routine became merging to our release branch and refreshing project branches from trunk/. All of a sudden Git was looking prettier and prettier, despite some of its warts. At this point in time I was already using Git for some of my personal projects that I never have time for, so I knew at the bare minimum that it was functional. What I didn't know was how to deploy and use it with a large engineering team that works in very high churn short iterations, like Slide's.

Subversion at Slide

Moving our source tree over into a system other than Subversion, from Subverison, was destined to be painful. The tree at Slide is deceptively large, we have a substantial amount of Python running around (as Slide is built, top-to-bottom, in Python) and an incredible amount of Adobe Flash assets (.swf files), Adobe Illustrator assets (.ai files) and plenty of binary files, like images (.png/gif/jpeg). Currently a full checkout of trunk/ is roughly 2.5GB including artwork, flash, server and web application code. We also have roughly 88k revisions in Subversion, the summation of three years of the company's existence. Fortunately somebody along the line wrote a script (in Perl however) called "git-svn(1)" that is designed to do exactly what I needed, move a giant tree from Subversion to Git, from start to finish (similar to svn2p4 in Perforce parlance).

Toying with Git

When I first ran the command `git-svn init $SVN` I let the the command run for somewhere close to 6-7 hours before completing, I was shocked at the size of the generated repository. If our Git repository were to be left unpacked .git/ alone would be close to 9GB, adding the actual code on top of it, ~11GB. I decided that maybe packing this repository would be a good idea so I ran `git gc` and went to grab a coke from the fridge ... and the machine ran out of memory. One of our quad-core, 8GB RAM, shared development machines ran out of memory?!

After lurking in #git on Freenode for a while I determined two things

Apparently nobody uses Git for projects this large
Git was retaining too much memory, like a memory leak, but just don't call it a memory leak.

To compound this, the rules for memory usage with Git are vastly different between a 32-bit machine and a 64-bit machine, and because we're just that cool, we're using 64-bit machines across the board. The amount of memory Git decides to keep resident while doing repository-intensive operations like the `git gc`, is 256MB on 32-bit machines, and 8GB on 64-bit machines. As these machines are shared between developers we use of ulimit(1), when you limit memory usage with ulimit(1) it restricts address space meaning both virtual and resident memory. When Git tried to mmap(2) gigabytes of address space to do it's operations, the kernel stepped in to intervene and started returning ENOMEM to Git which promptly exited.

After raising this enough times, I finally caught spearce who was able to identify the problem and supply a patch that fixed the memory allocation issues with Git and a repository of Slide's size. First obstacle overcome, now I could actually test a Git workflow inside of Slide.

Git at Slide

Now that I could pack the repository on our development machines, I could get the repository down to a reasonable 3.0GB, i.e. .git/ weighed in at 3GB making a entire tree ~5.5GB (far more managable than 11GB). Despite Git being a decentralized version control system, we still needed some semblance of centralization to ensure a couple basic rules for a sane workflow:

A centralize place to synchronize distributed versions of the repository
Changesets cannot be lost, ever.
QA must not be over-burdened when testing releases

This meant we needed a centralized, secure, repository which left us two options: Git over WebDav (https/http) or Gitosis. After discovering that `git-http-push(1), the executable responsible for doing Git pushes over WebDav has tremendous memory issues, I abandoned that as an option (a `git push` of the repository resulted in memory usage peaking at 11GB virtual, 3.5GB resident memory).

If you are looking to deploy Git for a larger audience in a corporate environment, I highly recommend Gitosis. What Gitosis does is allows for SSH to be used as the transport protocol for Git, and provides authentication by use of limited-shell user accounts and SSH keys; it's not perfect but it's the closest thing to maintainable for larger installations of Git (in my opinion).

So far the experimenting with Git at Slide is pretty localized to just my team, but with a combination of Gitosis, git-svn(1) and some "best practices" defined for handling the new system we've successfully continued development for over the past month without any major issues.

As this post is already quite lengthy, I'll be discussing the following two parts of our experimenting in subsequent posts:

Team Development with Git
Git back to Subversion, mostly automatically.

Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

NAnt and ASP.NET on Mono

04 May 2008

mono miscellaneous software development Most of my personal projects are built on top of ASP.NET, Mono and Lighttpd. One of the benefits of keeping them all running on the same stack (as opposed to mixing Python, Mono and PHP together) is that I don't need to maintain different infrastructure bits to keep them all up and running. Two key pieces that keep it easy to dive back into the the side-project whenever I have some (spurious) free time are my NAnt scripts and my push scripts.

NAnt
I use my NAnt script for a bit more than just building my web projects, more often than not I use it to build, deploy and test everything related to the site. My projects are typically laid out like:

bin/ Built DLLs, not in Subversion
configs/ Web.config files per-development machine
libraries/ External libraries, such as Memcached.Client.dll, etc.
schemas/ Files containing the SQL for rebuilding my database
site/ Fully built web project, including Web.config and .aspx files
sources/ Actual code, .aspx.cs and web folder (htdocs/ containing styles, javascript, etc)

Executing "nant run" will build the entire project and construct the full version of the web application in the site/ and finally fire up xsp2 on localhost for testing. The following NAnt file is what I've been carrying from project to project.

The Push Script
Since I usually build and deploy on the same machine, I use a simple script called "push.sh" to handle rsyncing data from the development part of my machine into the live directories.




#!/bin/bash


###############################


##      Push script variables


export NANT='/usr/bin/nant'


export STAGE=`hostname`


export SOURCE='site/'


export LIVE_TARGET='/serv/www/domains/myproject.com/htdocs/'


export BETA_TARGET='/serv/www/domains/beta.myproject.com/htdocs/'


export TARGET=$BETA_TARGET


###############################





###############################


##      Internal functions


function output {


        echo "===> $1"


}


function build { 


        ${NANT} && ${NANT} site


}


###############################





###############################


##      Build the site first


output "Building the site..."


build


if [ $? -ne 0 ]; then


        output "Looks like there was an error building! abort!"


        exit 1


fi





###############################


##      Start actual pushing


if [ "${1}" = 'live' ]; then 


        output " ** PUSHING THE LIVE SITE ***"


        export TARGET=$LIVE_TARGET


else


        output "Pushing the beta site"


fi





output "Using Web.config-${STAGE}"


output "Pushing to: ${TARGET}"





cp config/Web.config-${STAGE} site/Web.config


rsync --exclude *.swp --exclude .svn/ -av ${SOURCE} ${TARGET}

Depending on the complexity of the web application I might change the scripts up on a case-by-case basis, but for the most part I have about 5-6 projects out "in the ether" that are built and deployed with a derivative of the NAnt script and push.sh listed above. In general though, they provide a good starting point for the tedious bits of non-Visual Studio-based web development (especially if you're in an entirely Linux-based environment).

Hope you find them helpful :)

Parsing HTML with Python

03 May 2008

slide miscellaneous software development A while ago I jotted down about seven or so ideas of stuff that I thought would make good blog posts, somehow "markup parsers in Python" is next on the list, so I might as well spill the beans on how incredibly easy it is to process (X)HTML with Python and a little built in class called HTMLParser.

There have been a few occasions when I needed a quick (and dirty) way to perform transforms on some chunk of HTML or merely "search and replace" parts of it. While it might be cleaner to do something with XSLT or the likes, using them doesn't even begin to match the speed of development of an HTMLParser-based class in Python.

Getting Started
One major thing to keep in mind when working with HTMLParser, especially if you're newer to Python, is that it is what's referred to as an "old styled" object, meaning subclassing it is a bit different than "new styled" classes. Since HTMLParser is an old-styled object, any time you'd want to call a super-class defined method you would need to perform HTMLParser.superMethod(arg) instead of super(SubHTMLParser, self).superMethod(arg)

Creating the HTML parser
For the purposes of this example, I want something simple, so we're just going to take a block of markup and "tweak" all the <a> tags within it to be "sad" (whereas "sad" means they'll be bold, blue, and blinkey). The actual code to do so is only 50 lines long and is as follows:

import HTMLParser





class SadHTML(HTMLParser.HTMLParser):


    '''A simple HTML transform-class based upon HTMLParser.  All links shall be bold, blue and blinky :('''





    def __init__(self, *args, **kwargs):


        HTMLParser.HTMLParser.__init__(self)


        self.stack = []





    def handle_starttag(self, tag, attrs):


        attrs = dict(attrs)


        if tag.lower() == 'a':


            self.stack.append(self.__html_start_tag('blink', None))


            attrs['style'] = '%s%s' % (attrs.get('style', ''), 'color: blue; font-weight: bold;')


        self.stack.append(self.__html_start_tag(tag, attrs))


    


    def handle_endtag(self, tag):


        self.stack.append(self.__html_end_tag(tag))


        if tag.lower() == 'a':


            self.stack.append(self.__html_end_tag('blink'))





    def handle_startendtag(self, tag, attrs):


        self.stack.append(self.__html_startend_tag(tag, attrs))                





    def handle_data(self, data):


        self.stack.append(data)





    def __html_start_tag(self, tag, attrs):


        return '<%s%s>' % (tag, self.__html_attrs(attrs))





    def __html_startend_tag(self, tag, attrs):


        return '<%s%s/>' % (tag, self.__html_attrs(attrs))





    def __html_end_tag(self, tag):


        return '' % (tag)





    def __html_attrs(self, attrs):


        _attrs = ''


        if attrs:


            _attrs = ' %s' % (' '.join([('%s="%s"' % (k,v)) for k,v in attrs.iteritems()]))


        return _attrs





    @classmethod


    def depreshun(cls, markup):


        _p = cls()


        _p.feed(markup)


        _p.close()


        return ''.join(_p.stack)

The actual ins-and-outs of the parser are very simple; markup like "<a href="#">Hello</a><br/>" would execute accordingly:

handle_starttag('a', [('href', '#')])
handle_data('Hello')
handle_endtag('a')
handle_startendtag('br', [])

Since HTMLParser just gives you element tag names, and there attributes, SadHTML simply builds a list of strings out of what data is passed to it via the super class and then when everything is finished, ties the list back together with: ''.join(list_of_tags).
Executing the SadHTML.depreshun method on the contents of my last blog post is a good example, part of the post was:

An informal poll at the Slide offices this past week yielded these interesting results: at Slide.com, nearly 100% of white people seem to like "Stuff White People Like".

After running it through "SadHTML", the following markup is generated instead:

An informal poll at the Slide offices this past week yielded these interesting results: at Slide.com, nearly 100% of white people seem to like "Stuff White People Like".

If you're curious as to how much more you can do with HTMLParser, do check out the documentation. It's far more lenient than using eXpat for parsing HTML, and it's still fast enough to be used on longer documents (there's also htmllib available for Python but I've not used it yet).

← Newer posts Older posts →

Howdy!

Example usage of `git stash`

The Bookmarklet

The Code

How it actually works

No Habla Branching

Giant Disclaimer

Flattening into Subversion

Mostly Automagically

Wrap-up

The Bad Habits

The Slide Way to Git

Basic Team Workflow

Conclusion

Lazy loading in Python

Lazy loading in C#

US Naval Observatory Master Clock Time

Sep. 11, 07:29:02 UTC Sep. 11, 03:29:02 AM EDT Sep. 11, 02:29:02 AM CDT Sep. 11, 01:29:02 AM MDT Sep. 11, 12:29:02 AM PDT Sep. 10, 11:29:02 PM AKDT Sep. 10, 09:29:02 PM HAST

JavaScript Shell

Holy Hudson Batman!

Plug-it-in, plug-it-in!

There's always a catch

Subversion at Slide

Toying with Git

Git at Slide

Sep. 11, 07:29:02 UTC

Sep. 11, 03:29:02 AM EDT

Sep. 11, 02:29:02 AM CDT

Sep. 11, 01:29:02 AM MDT

Sep. 11, 12:29:02 AM PDT

Sep. 10, 11:29:02 PM AKDT

Sep. 10, 09:29:02 PM HAST