Howdy!

Welcome to my blog where I write about software development, cycling, and other random nonsense. This is not the only place I write, you can find more words I typed on the Buoyant Data blog, Scribd tech blog, and GitHub.

Let's Swap iPods.

Since I've started to spend such an enormous amount of my time with work and settling into a new apartment, I've had literally no time to discover new music. Because of this utter lack of time on my part, I've been pondering this idea for about the past month or two on a daily basis, I want to participate in an iPod Foreign Exchange Program.

I currently own a 30GB Video iPod (black) that has about 28GB of music on it with a few assorted podcasts here and there.

Here's what I'm thinking would constitute a good set of rules for swapping an iPod to "walk a mile in somebody's shoes" (musically).
  • We can be acquaintances, but not friends. I know what my friends listen to and can steal their iPods myself :)
  • The period to swap iPods would last one week
  • Both parties would make sure to un-sync their address book and calendars from the iPod, but not change any of the music (no trying to impress people)
  • The iPod swap is accompanied with a business card or means to coordinate a swap-back
  • Both parties must be respectful of the others' tastes, even if it's really weird (you know who you are)


I went ahead and removed my calendars and contacts from my iPod just in case I run into somebody on the train that has read this post and wants to swap right away, but failing that, if you're around San Francisco, let's swap iPods :)
Read more →

Experimenting with Git at Slide (Part 1/3)

For the past two months I've been experimenting with varying levels of success with Git inside of Slide, Inc.. Currently Slide makes use of Subversion and relies heavily on branches in Subversion for everything from project specific branches to release branches (branches that can live anywhere from under 12 hours to three weeks). There are plenty of other blog posts about the pitfalls of branching in Subversion that I won't go into here, suffice to say, it is...sub-par. Below is a rough diagram of our general current workflow with Subversion (I've had some other developers ask me "why don't you just work in trunk?" to which I usually wax poetic about the chaos of trunk when any project gets over 5 active developers (Slide engineering is somewhere between 30-50 engineers)).



There's always a catch

There are three major problems we've run up against with utilizing Subversion as our version control system at Slide:
  • Subversion's "branches" make context switching difficult
  • Depending on the age of a branch cut from trunk/, merges and maintainence is between difficult and impossible
  • Merging Subversion branches into each other causes a near total loss of revision history
Given that branches are a critical part of Slide's development process, we've historically looked at branch-strong version control systems as alternatives, such as
Perforce. Before I joined Slide in April of 2007, I was a heavy user of Perforce for my own consulting projects as well as for some of my work with the FreeBSD project as part of the Summer of Code program. In fact, my boss sent out a "Perforce Petition" to our engineering list on my third day at Slide...we still haven't switched away from Perforce.



Up until earlier this year I hadn't given it a second thought until the team I was working with grew and grew such that between me and four other engineers we were pushing a release anywhere from once to three times a week. That meant we were creating a Subversion "branch" multiple times a week, and a significant part of my daily routine became merging to our release branch and refreshing project branches from trunk/. All of a sudden Git was looking prettier and prettier, despite some of its warts. At this point in time I was already using Git for some of my personal projects that I never have time for, so I knew at the bare minimum that it was functional. What I didn't know was how to deploy and use it with a large engineering team that works in very high churn short iterations, like Slide's.

Subversion at Slide

Moving our source tree over into a system other than Subversion, from Subverison, was destined to be painful. The tree at Slide is deceptively large, we have a substantial amount of
Python running around (as Slide is built, top-to-bottom, in Python) and an incredible amount of Adobe Flash assets (.swf files), Adobe Illustrator assets (.ai files) and plenty of binary files, like images (.png/gif/jpeg). Currently a full checkout of trunk/ is roughly 2.5GB including artwork, flash, server and web application code. We also have roughly 88k revisions in Subversion, the summation of three years of the company's existence. Fortunately somebody along the line wrote a script (in Perl however) called "git-svn(1)" that is designed to do exactly what I needed, move a giant tree from Subversion to Git, from start to finish (similar to svn2p4 in Perforce parlance).


Toying with Git

When I first ran the command `git-svn init $SVN` I let the the command run for somewhere close to 6-7 hours before completing, I was shocked at the size of the generated repository. If our Git repository were to be left unpacked .git/ alone would be close to 9GB, adding the actual code on top of it, ~11GB. I decided that maybe packing this repository would be a good idea so I ran `git gc` and went to grab a coke from the fridge ... and the machine ran out of memory. One of our quad-core, 8GB RAM, shared development machines ran out of memory?!


After lurking in #git on Freenode for a while I determined two things
  1. Apparently nobody uses Git for projects this large
  2. Git was retaining too much memory, like a memory leak, but just don't call it a memory leak.
To compound this, the rules for memory usage with Git are vastly different between a 32-bit machine and a 64-bit machine, and because we're just that cool, we're using 64-bit machines across the board. The amount of memory Git decides to keep resident while doing repository-intensive operations like the `git gc`, is 256MB on 32-bit machines, and 8GB on 64-bit machines. As these machines are shared between developers we use of ulimit(1), when you limit memory usage with ulimit(1) it restricts address space meaning both virtual and resident memory. When Git tried to mmap(2) gigabytes of address space to do it's operations, the kernel stepped in to intervene and started returning ENOMEM to Git which promptly exited.

After raising this enough times, I finally caught
spearce who was able to identify the problem and supply a patch that fixed the memory allocation issues with Git and a repository of Slide's size. First obstacle overcome, now I could actually test a Git workflow inside of Slide.


Git at Slide

Now that I could pack the repository on our development machines, I could get the repository down to a reasonable 3.0GB, i.e. .git/ weighed in at 3GB making a entire tree ~5.5GB (far more managable than 11GB). Despite Git being a decentralized version control system, we still needed some semblance of centralization to ensure a couple basic rules for a sane workflow:
  • A centralize place to synchronize distributed versions of the repository
  • Changesets cannot be lost, ever.
  • QA must not be over-burdened when testing releases
This meant we needed a centralized, secure, repository which left us two options: Git over WebDav (https/http) or
Gitosis. After discovering that `git-http-push(1), the executable responsible for doing Git pushes over WebDav has tremendous memory issues, I abandoned that as an option (a `git push` of the repository resulted in memory usage peaking at 11GB virtual, 3.5GB resident memory).

If you are looking to deploy Git for a larger audience in a corporate environment, I highly recommend Gitosis. What Gitosis does is allows for SSH to be used as the transport protocol for Git, and provides authentication by use of limited-shell user accounts and SSH keys; it's not perfect but it's the closest thing to maintainable for larger installations of Git (in my opinion).

So far the experimenting with Git at Slide is pretty localized to just my team, but with a combination of Gitosis, git-svn(1) and some "best practices" defined for handling the new system we've successfully continued development for over the past month without any major issues.

As this post is already quite lengthy, I'll be discussing the following two parts of our experimenting in subsequent posts:



Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide









Read more →

NAnt and ASP.NET on Mono

Most of my personal projects are built on top of ASP.NET, Mono and Lighttpd. One of the benefits of keeping them all running on the same stack (as opposed to mixing Python, Mono and PHP together) is that I don't need to maintain different infrastructure bits to keep them all up and running. Two key pieces that keep it easy to dive back into the the side-project whenever I have some (spurious) free time are my NAnt scripts and my push scripts.

NAnt
I use my NAnt script for a bit more than just building my web projects, more often than not I use it to build, deploy and test everything related to the site. My projects are typically laid out like:
  • bin/ Built DLLs, not in Subversion
  • configs/ Web.config files per-development machine
  • libraries/ External libraries, such as Memcached.Client.dll, etc.
  • schemas/ Files containing the SQL for rebuilding my database
  • site/ Fully built web project, including Web.config and .aspx files
  • sources/ Actual code, .aspx.cs and web folder (htdocs/ containing styles, javascript, etc)


Executing "nant run" will build the entire project and construct the full version of the web application in the site/ and finally fire up xsp2 on localhost for testing. The following NAnt file is what I've been carrying from project to project.









































































The Push Script
Since I usually build and deploy on the same machine, I use a simple script called "push.sh" to handle rsyncing data from the development part of my machine into the live directories.

#!/bin/bash
###############################
## Push script variables
export NANT='/usr/bin/nant'
export STAGE=`hostname`
export SOURCE='site/'
export LIVE_TARGET='/serv/www/domains/myproject.com/htdocs/'
export BETA_TARGET='/serv/www/domains/beta.myproject.com/htdocs/'
export TARGET=$BETA_TARGET
###############################

###############################
## Internal functions
function output {
echo "===> $1"
}
function build {
${NANT} && ${NANT} site
}
###############################

###############################
## Build the site first
output "Building the site..."
build
if [ $? -ne 0 ]; then
output "Looks like there was an error building! abort!"
exit 1
fi

###############################
## Start actual pushing
if [ "${1}" = 'live' ]; then
output " ** PUSHING THE LIVE SITE ***"
export TARGET=$LIVE_TARGET
else
output "Pushing the beta site"
fi

output "Using Web.config-${STAGE}"
output "Pushing to: ${TARGET}"

cp config/Web.config-${STAGE} site/Web.config
rsync --exclude *.swp --exclude .svn/ -av ${SOURCE} ${TARGET}


Depending on the complexity of the web application I might change the scripts up on a case-by-case basis, but for the most part I have about 5-6 projects out "in the ether" that are built and deployed with a derivative of the NAnt script and push.sh listed above. In general though, they provide a good starting point for the tedious bits of non-Visual Studio-based web development (especially if you're in an entirely Linux-based environment).

Hope you find them helpful :)
Read more →

Parsing HTML with Python

A while ago I jotted down about seven or so ideas of stuff that I thought would make good blog posts, somehow "markup parsers in Python" is next on the list, so I might as well spill the beans on how incredibly easy it is to process (X)HTML with Python and a little built in class called HTMLParser.

There have been a few occasions when I needed a quick (and dirty) way to perform transforms on some chunk of HTML or merely "search and replace" parts of it. While it might be cleaner to do something with XSLT or the likes, using them doesn't even begin to match the speed of development of an HTMLParser-based class in Python.

Getting Started
One major thing to keep in mind when working with HTMLParser, especially if you're newer to Python, is that it is what's referred to as an "old styled" object, meaning subclassing it is a bit different than "new styled" classes. Since HTMLParser is an old-styled object, any time you'd want to call a super-class defined method you would need to perform HTMLParser.superMethod(arg) instead of super(SubHTMLParser, self).superMethod(arg)


Creating the HTML parser
For the purposes of this example, I want something simple, so we're just going to take a block of markup and "tweak" all the <a> tags within it to be "sad" (whereas "sad" means they'll be bold, blue, and blinkey). The actual code to do so is only 50 lines long and is as follows: import HTMLParser

class SadHTML(HTMLParser.HTMLParser):
'''A simple HTML transform-class based upon HTMLParser. All links shall be bold, blue and blinky :('''

def __init__(self, *args, **kwargs):
HTMLParser.HTMLParser.__init__(self)
self.stack = []

def handle_starttag(self, tag, attrs):
attrs = dict(attrs)
if tag.lower() == 'a':
self.stack.append(self.__html_start_tag('blink', None))
attrs['style'] = '%s%s' % (attrs.get('style', ''), 'color: blue; font-weight: bold;')
self.stack.append(self.__html_start_tag(tag, attrs))

def handle_endtag(self, tag):
self.stack.append(self.__html_end_tag(tag))
if tag.lower() == 'a':
self.stack.append(self.__html_end_tag('blink'))

def handle_startendtag(self, tag, attrs):
self.stack.append(self.__html_startend_tag(tag, attrs))

def handle_data(self, data):
self.stack.append(data)

def __html_start_tag(self, tag, attrs):
return '<%s%s>' % (tag, self.__html_attrs(attrs))

def __html_startend_tag(self, tag, attrs):
return '<%s%s/>' % (tag, self.__html_attrs(attrs))

def __html_end_tag(self, tag):
return '' % (tag)

def __html_attrs(self, attrs):
_attrs = ''
if attrs:
_attrs = ' %s' % (' '.join([('%s="%s"' % (k,v)) for k,v in attrs.iteritems()]))
return _attrs

@classmethod
def depreshun(cls, markup):
_p = cls()
_p.feed(markup)
_p.close()
return ''.join(_p.stack)


The actual ins-and-outs of the parser are very simple; markup like "<a href="#">Hello</a><br/>" would execute accordingly:
  • handle_starttag('a', [('href', '#')])
  • handle_data('Hello')
  • handle_endtag('a')
  • handle_startendtag('br', [])


Since HTMLParser just gives you element tag names, and there attributes, SadHTML simply builds a list of strings out of what data is passed to it via the super class and then when everything is finished, ties the list back together with: ''.join(list_of_tags).
Executing the SadHTML.depreshun method on the contents of my last blog post is a good example, part of the post was:
An informal poll at the Slide offices this past week yielded these interesting results: at Slide.com, nearly 100% of white people seem to like "Stuff White People Like".


After running it through "SadHTML", the following markup is generated instead:
An informal poll at the Slide offices this past week yielded these interesting results: at Slide.com, nearly 100% of white people seem to like "Stuff White People Like".


If you're curious as to how much more you can do with HTMLParser, do check out the documentation. It's far more lenient than using eXpat for parsing HTML, and it's still fast enough to be used on longer documents (there's also htmllib available for Python but I've not used it yet).
Read more →

Hi5 goes 100%

Lou Moore and Hi5 haz a platfurm
It's so easy to get caught up in the flurry of things going on here in Silicon Valley (not to mention just at Slide), but I figured that Hi5 deserved being mentioned. I'd like to congratulate Lou, Anil, Paul, Zack and the rest of the Hi5 Platform team on being (from what I can tell) the first social network to turn their OpenSocial-based platform on 100% to users. As of last friday they finally ramped up to 100%, meaning every user on Hi5 can add OpenSocial applications that have been approved and added to the Hi5 applications gallery.

The past couple weeks I've been lurking on the #Hi5dev channel on Freenode, where most of the Hi5 team has been as well, dutifully answering questions and getting general developer feedback. I highly recommend following their developer blog where Lou (pictured here) has been posting regular updates and all the important things that you need to do in order to get your application viral, approved and reaching Hi5's users.

Some of the applications we've launched include: Top Friends, Slide TV and SuperPoke. Of course, if all you want to do on Hi5 is be friends with me, you can find me here :).

Overall the OpenSocial/Hi5 platform has been an interesting experience, moving more of the application into the realm of JavaScript as opposed to what I've become used to on the Facebook platform has made me think harder about the separation of front-end code from back-end code and where you actually draw the line when both are written in the same language. One down, only two to go!

Read more →

A few changes

If you're subscribed to my RSS feed you wouldn't have noticed, but if not you probably already know by now about the change in the look of unethicalblogger.com. I got tired of the old (boring) red theme and dug around on drupal.org until I found one I liked and then customized it to suit my needs.

Other than that, I'm very proud to announce that this site is the 7th hit on Google for the query "unethical" (check it out). Besides the obvious tactics (kicking puppies), I'm wondering how I can reach the #1 result for "unethical".

Any ideas?
Read more →

Data Binding with jQuery

Since I've come from the land of desktop application development, there are a few concepts that I don't think quite "made the voyage" from desktop/thick-client development to web/thin-client development. The concept of "data binding" is completely lost in my opinion in the land of Javascript and HTML (not to mention the concept of "controls" to begin with).A few weeks ago while exploring a couple other concepts for how to improve our overall frontend development at Slide I prototyped a means of "databinding" controls, or at the very least DOM elements to data-providing Javascript functions.

I've posted an example here of some of the data binding code I've written for experimentation purposes. In the example page linked, there is a <ul> tag that is "bound" to a Javascript function, the Javascript function creates an array of associative arrays inline (it could very well be powered by some AJAX-oriented Javascript with minor adjustments). Using the results of the "databind" function specified on the bindable element, it creates a set of child nodes to attach to the parent list. In effect, the following code:



Will generate the following DOM tree, after the "bind()" function has been run on page load:

  • List Item #1 OMG

  • List Item #2

  • List Item #3




Since the code is relatively simple (in my opinion) I figured I would throw it out there in all it's minimalistic glory and get some general feedback on the concept before I go "all out" and create a full-on jQuery extension based off of the linked prototype above. I'm trying to think of ways to make it more powerful as well, built-in support for binding to the results of an asynchrounous call to a URL that returns JSON that would then create the elements is at the top of my TODO list at this point. Feedback, flames and actual useful critiques are all welcome; I'll be sure to post again once I have the time to create the jQuery extension for binding, this however is more experimental quality (i.e. don't use it, i'm not).

What do you think?



Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide

Read more →

Dashboard for Linux users

With the release of Mac OS 10.5 (Leopard) I found myself in a tough spot, I liked certain features added into Leopard, but I couldn't stand some of the stability issues I was having and the other bugs that would interrupt my normal workflow during the day. In an effort to alleviate some of my frustrations with Leopard, I experimented for a week with running Gnome (with Compiz) on my openSUSE workstation. In general all the important bits were there, By this point, I had already switched from using any sort of GUI editor to work, but instead had switched over to using vim on a shared development server here at the office. Pidgin in Gnome Given that Drosera still wasn't fast enough for my normal day-to-day web development debugging, I was also using Firefox and Opera for most of my web browsing as well. Chat was covered by Adium, so using it's Linux/Windows counter-part, Pidgin was no trouble at all.

After switching for a week and falling in love with Compiz and some of the features it offers, I feel like I can accomplish far more now on Linux than I was on Mac OS X. For me Mac OS X became the new Windows, I was fighting the system to work almost as much as I was do actual work (between the IMAP code in Mail.app crashing and Safari leaking, I was not a happy camper). The one missing feature however was Dashboard. I'm not a religious user of Dashboard, but I always used it to keep little chunks of information stored away either in post-it notes, via clocks, or tickers, etc. I'd not found a good solution until recently, by way of Opera and a combination of widgets and one of the default Compiz Fusion plugins.

(click to enlarge)


The concept behind Opera Widgets is the exact same as behind Dashboard widgets, tiny little web applications running on your desktop, by default however they run on your desktop. This wasn't going to work for me, I like to stash widgets away, and access them through the trusty F12 button as per usual with Dashboard.

In enters the "Widget Layer" plugin for Compiz Fusion , which allows you to assign rules for placing regular windows sitting in the window manager, into this special widget layer that appears just like Dashboard does on Mac OS X (with the actual desktop faded out in the background). In order to group all Opera Widgets in the Widget Layer, you can set the "Widget Windows" field to:
role=opera-widget

Which will cause all enabled Opera Widgets to be availabe at a keypress of F12, if you're on Linux, I highly suggest you try it out, it's extremely useful, especially if you grab some of the more developer focused widgets from the widget directory

Of course, there's plenty of reasons to use Compiz. One of my favorite plugins is the "Annotate" plugin that allows you to draw on your screen, which comes in handy for going over interfaces with coworkers.

In general the addition of Compiz to the Linux desktop I feel is an important one, it drastically improves the rendering of windows since it's essentially doing what Quartz Extreme is doing on Mac OS X in terms of offloading some rendering on the graphics card's GPU. Having really bitching eye-candy certainly doesn't hurt either, So far with Compiz I have what equates to "Spaces", "Expose", and "Dashboard" from Leopard, along with a myriad of other goodies like "Wobbly Windows", true transparency and reflections on arbitrary UI elements and of course, a fish tank inside my desktop. (If you're using openSUSE, the one-click packages for Compiz Fusion can be found here on the wiki).
Read more →

Srsly ur doin it wrong.

Via twitter I have been griping a bit about Javascript recently. It's quite possible that I've been complaining about it far more than I complain about other things via twitter, which is a tall order to match.

When addressing something as big and scary as say, a platform built on Javascript, it forces you into looking at Javascript in a way different than how I think most developers (myself included) have looked at Javascript. Most Javascript that I've seen has been hideous. Gobs and gobs of functions and procedural garbage thrown into a series of files that kinda makes sense, but really doesn't. It would seem that most developers charged with writing Javascript don't understand how to write object-oriented Javascript. In fact about two or three months ago when considering topics to discuss in a front-end developers meeting here at Slide, I bit the bullet, raised my hand and said "Can you explain how to do object-oriented Javascript? Because I honestly don't have a fucking clue."

In the past Javascript that I've written has been to compliment existing backend web-application code and front-end code, i.e. I wasn't looking at Javascript as one of the building blocks of my application, I was looking at it as a bit of mortar spread between the cracks to smooth out the surface of the application. The difference in how you start to use Javascript in a web application makes an enormous difference 6 months to a year down the road. How terrible your code (this isn't actually segregated to Javascript) is becomes far more apparent when other developers start to work with your code as well, it's tremendously embarrassing to have to answer questions like "where's the code that generates that one DOM element?" As a general rule, coding all by your lonesome, especially with a tight schedule, will produce less than clean results (unfortunately Javascript is one of the languages I've found where this is more of the norm than the exception).

A lot of what's driven the change from my Javascript being the mortar to being the bricks in my work has been the adoption of jQuery which I highly recommend along with the jQuery.ui library. jQuery makes developing Javascript feel like actual programming, instead of hackish-scripting, which means you'll start to view your Javascript code differently too. Dealing with scoping issues, and prototype-based programming in Javascript isn't all rainbows and butterflies but "doing it right" will help you sleep at night and help reduce the amount of embarrassing questions you'll have to answer to the next poor unfortunate soul that inherits your code.

Some of the resources I've found useful in getting over the barrier to object-oriented Javascript have been:

I'm still not a huge fan of Javascript, but I'm hating it less these days :)

Read more →