Howdy!

Welcome to my blog, where I write about software development, cycling, and other random nonsense. This is not the only place I write; you can find more of my words on the Buoyant Data blog, the Scribd tech blog, and GitHub.

Toying around with ASP.NET MVC and NAnt

I recently found myself toying around with a number of web frameworks (like Seaside) to get a good read on who's doing what in the web world outside of Python and Django, when I stumbled across the ASP.NET MVC Add-in for MonoDevelop. Though the new Vim keybindings are sweet, I still can't effectively get work done in MonoDevelop. What MonoDevelop does do, however, is generate Makefiles for any given project, which let me create Makefiles for an ASP.NET MVC project I had built in MonoDevelop and port them over to fit my NAnt and Vim-based workflow.

Along with building the necessary DLLs, I prefer to use my NAnt scripts to fire up the NUnit console and a development instance of XSP to test my web applications. All said and done, this fairly basic script does the job; I typically run it with:
nant test run


Not much else to say, hope you find it useful.

Investment Strategy for Developers

It seems every time @jasonrubenstein, @ggoss3, @cablelounger and I sit down to have lunch together, we invariably sway back and forth between generic venting about “work stuff” and best practices for doing the aforementioned “work stuff” better. The topic of “reusable code” came up over Mac ’n Cheese and beers this afternoon, and I felt it warranted “wider distribution” so to speak (yet-another-lame-Slide-inside-joke).

We, Slide, are approaching our fourth year in existence as a startup, which means all sorts of interesting things from an investor standpoint: employees’ options are starting to become fully vested, along with other mundane and boring financial matters. Being an engineer, I don’t care too much about the stocks and such, but rather about development; four years is a lot from a code-investment standpoint (my bias towards code instead of financial planning will surely bite me eventually). Projects can experience bitrot, bloating (read: Vista’ing) and a myriad of other illnesses endemic to software that’s starting to grow long in the tooth.

At Slide, we have a number of projects on slightly different trajectories and timelines, meaning we have an intriguing cross-section of development histories. We are no doubt experiencing a phenomenon similar to Facebook, MySpace, Yelp and a number of other “startups” in the same 4-7 year age group. Just like our brethren in the startup community, we have portions of code that fit all the major categories:

  • That which was written extremely fast, without a thought for what would happen when it served tens of millions of users.
  • That which was written slowly, trying to cater to every possible variation, ultimately going over budget and over schedule.
  • That which has been rewritten. And rewritten. And rewritten.
  • Then the exceptionally rare: that which has been written in such a fashion that it could be elegantly extended to support more than it was originally conceived to support.

In all four cases, “we” (where “we” refers to an engineering department) have invested differently in our code portfolio depending on a number of factors and the information available at the time. For example, it’s been a year since Component X was written. Component X is currently used by every single product The Company owns, but over the past year it’s been refactored and partially rewritten each time a new product starts to “use” Component X. In its current state, Component X’s code reads more like an embarrassing submission to The Daily WTF, a hodge-podge passed from team to team, developer to developer, like some expensive game of “Telephone” for software engineers. After the fact, it’s difficult and not altogether helpful to try to lay blame with the mighty sword of hindsight, but it is feasible to identify the reasons for the N developer hours lost fiddling with, extending, and refactoring Component X.

  • Was the developer responsible for implementing Component X originally aware of the potentially far reaching scope of their work?
  • Was the developer given an adequate time frame to implement a proper solution, or was it “this should have shipped yesterday!”?
  • Did somebody pass the project off to an intern or somebody who was on their way out the door?
  • Were other developers in similar areas of responsibility consulted for their questions or opinions?
  • Is (or was) the culture fostered by Engineering Leads and Managers encouraging of best practices that lead to extensible code?

I’ve found, watching Slide Engineering culture evolve, that the majority of libraries or components that go through multiple time- and resource-expensive iterations suffered shortcomings in one of the five areas above. More often than not, a developer is given the task of implementing Some Thing. Simple enough: Some Thing is developed with that specific use-case in mind, and the developer moves on with their life. Three months later, however, somebody asks another developer to add Some Thing to another product.

“Product X has Some Thing, and it works great for them, let’s incorporate Some Thing into Product Y by the end of the week.”

Invariably this leads to heavy developer drinking. And then perhaps some copy-paste, with a dash of re-jiggering, and quite possibly multiple forks of the same code. That is, if Some Thing was not properly planned and designed in the first place.

Working as a developer on products that move at a fast pace but will be around for longer than three months is an exercise in investment strategy (i.e. managing technical debt). What makes great Engineering Managers great is their ability to determine when and where to invest the time to do things right, and where to write some Perl-style write-only code (zing!). What makes a startup a more difficult environment in which to manage your “code portfolio” is that you don’t usually know what will or won’t be a success, and in a lot of cases getting your product out there now is of paramount importance. Unfortunately there isn’t any simple guideline or silver bullet, and there is no bailout: if you invest your time poorly up front, nobody will save you further down the line when you’re staring a resource-devouring refactor in its ugly face.

Where do you invest the time in any given project? What will happen if you shave off a few days by deciding not to write any tests or documentation? Will it cost you a week further down the road if you take shortcuts now?

I wish I knew.

Writing for Stability (or: I hate writing tests)

Since moving to the infrastructure team at Slide, I’ve found that the rate at which my software gets deployed has plummeted, while the quantity of code I deploy to the live site has sky-rocketed. On an application team within Slide, code is typically pushed in small increments a few days a week, if not daily. This allows for compact, exciting milestones that make fine-grained post-push analysis achievable for product management and metrics purposes. On the infrastructure team, however, the requirements are wholly different: the “fail-fast, ship-now” mentality that prevails in user-facing web application development just does not work for infrastructure. The most important aspects of building out infrastructure components are stability and usability; our “customers” are the rest of engineering, and that has a definite effect on your workflow.

Code Review

One thing that @jasonrubenstein and I always did when we worked together was occasional code review. In the majority of cases, our “code review” sessions were more or less rubber duck debugging, but occasionally they would escalate into more complex discussions about the “right way” to do something. When you’re writing infrastructure software for services that handle tens of millions of users, code review goes from being optional to being absolutely required. Discussions cover the correctness or performance characteristics of database indexes, whether some objects should instantiate default attribute values or load them lazily, and how objects get garbage collected while memory consumption is meticulously watched.
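The “default values versus lazy loading” debate is easier to see in code. Below is a minimal Python sketch, assuming a hypothetical load_friends function that stands in for an expensive database query; the two profile classes are likewise made up for illustration. It relies on functools.cached_property, which only exists in Python 3.8 and later.

```python
from functools import cached_property


def load_friends(user_id):
    """Hypothetical stand-in for an expensive database query."""
    load_friends.calls += 1  # count queries so the trade-off is visible
    return [f"friend-{user_id}-{i}" for i in range(3)]


load_friends.calls = 0


class EagerProfile:
    def __init__(self, user_id):
        self.user_id = user_id
        # Default value computed up front: every instance pays for the query,
        # even if .friends is never read.
        self.friends = load_friends(user_id)


class LazyProfile:
    def __init__(self, user_id):
        self.user_id = user_id

    @cached_property
    def friends(self):
        # Runs on first access only; the result is then stored on the
        # instance, so later reads are plain attribute lookups.
        return load_friends(self.user_id)


if __name__ == "__main__":
    profiles = [LazyProfile(i) for i in range(1000)]
    print(load_friends.calls)  # no queries issued yet
    profiles[0].friends
    profiles[0].friends
    print(load_friends.calls)  # one query, despite two reads
```

Neither choice is free: the eager class spends queries and memory on data nobody may read, while the lazy one moves that cost to an unpredictable first access, which is exactly the kind of trade-off worth arguing about in review.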

For one of my most recent projects, I was working on something in C, a rarity at Slide since we work with managed code in Python the majority of the time. As the project neared completion, I counted roughly two or three hours of code review time dedicated by our Chief Architect. The attention to detail paid to this code was extremely high, as the service was going to handle millions of requests from other levels of the Slide infrastructure before getting cycled or restarted.

A particularly frustrating aspect of code review by your peers is that a second set of eyes will not only find problems with your code, but will likely mean refactoring or bug fixes: more work. In my case, whenever a bug or stability issue was discovered, a test had to be written for it to make sure the bug did not resurface, which made the workload larger than if I had just fixed the bug and moved on with my life.
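The “write a test for every bug found” rule looks roughly like this in practice. This is a minimal Python sketch using the standard unittest module; split_host_port and the bug it supposedly once had are hypothetical, invented only to show the shape of a regression test.

```python
import unittest


def split_host_port(address, default_port=80):
    """Split 'host:port' into (host, port), falling back to default_port.

    Hypothetical example: assume an earlier version unpacked
    address.split(":") directly and raised ValueError when the port was
    missing. The fix below handles that case explicitly.
    """
    host, sep, port = address.rpartition(":")
    if not sep:
        # No colon at all, so the whole string is the host.
        return address, default_port
    return host, int(port)


class SplitHostPortTest(unittest.TestCase):
    def test_explicit_port(self):
        self.assertEqual(split_host_port("example.com:8080"), ("example.com", 8080))

    def test_missing_port_regression(self):
        # Regression test: pins the fixed bug so it cannot quietly come back.
        self.assertEqual(split_host_port("example.com"), ("example.com", 80))
```

Run it with python -m unittest against whatever module holds it; the second test exists only because somebody found the bug, and it keeps that bug from ever being found twice.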

Testing, oh the testing

If you expect to write an API, have it stabilize, and then have it be used, you must write test cases for it. I’m not a TDD “nut”; I actually hate writing test cases, I absolutely abhor it. Writing test cases is the responsible, adult thing to do. In my experience, it can also be tedious, and it usually comes as a result of finding flaws in my own software. The majority of tests I find myself writing are admissions of defeat, admitting that I don’t crap roses and, by George, my code isn’t perfect either.

On the flip side, however, I hate debugging even more. Stepping through a call stack is on par with waterboarding in my book: torture. That means I’m more than willing to tolerate writing tests as long as I can be certain they will cut down on the time I spend being tortured with either pdb or gdb. In almost every situation where I’ve written tests properly, like the responsible developer that I am, I find them saving me at some point. It might be getting late, or I might just be feeling a little cavalier, but failing tests almost always indicate that I’ve screwed up something I shouldn’t have.

Additionally, now that the majority of my projects are infrastructure-level projects, the tests I write serve a second, “undocumented” purpose: they provide ready-made examples for other developers on how to use my code. Bonus!
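Python’s standard doctest module makes that second purpose explicit: examples written in a docstring are documentation and tests at the same time. A minimal sketch, where slugify is a hypothetical helper invented for illustration:

```python
import doctest


def slugify(title):
    """Turn a post title into a lowercase, hyphen-separated URL slug.

    The examples below are executable tests and usage documentation at once:

    >>> slugify("Writing for Stability")
    'writing-for-stability'
    >>> slugify("  Template   Theory ")
    'template-theory'
    """
    # split() with no arguments discards leading, trailing and repeated whitespace.
    return "-".join(title.lower().split())


if __name__ == "__main__":
    # Runs every >>> example in this module's docstrings; silent when all pass.
    doctest.testmod()
```

Running the module directly (or python -m doctest on the file) executes the examples, so the usage documentation cannot silently drift away from what the code actually does.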

The more code I write, the more amazed I am at the pushback against testing in general. Decent testing libraries exist for every language imaginable (well, perhaps BrainfuckUnit doesn’t exist), and their sole purpose (in my opinion) is to save development time, particularly when coupled with a good continuous integration server. Further to that effect, if you’re building services for other developers to use and you’re not writing tests for them, you’re not only wasting your time and your employer’s money, but the time of your users as well (read: stop being a jerk).

Sure, there are a lot of articles, books, etc. about writing stable code, but in my opinion, solid code review and testing will stabilize your code far more than any design pattern ever will.

Awesomely Bad

A coworker of mine, @teepark, and I recently fell in love with tiling window managers, Awesome in particular. The project has been interesting to follow, to say the least. When I first installed Awesome from the openSUSE package directory, I had version 2: it was fairly basic, relatively easy to configure, and enough to hook me on the idea of a tiling window manager. After conferring with @teepark, I discovered that he had version 3, which was much better, had some fancy new features, and had an incremented version number; therefore I required it.

In general, I’m a fairly competent open-source contributor and user. Autoconf and Automake, while I despise them, aren’t mean and scary to me, and I’m able to work with them to fit my needs. I run Linux on two laptops and a few workstations, not to mention the myriad of servers I’m either directly or peripherally responsible for. I grok open source. Thusly, I was not put off by the idea of grabbing the latest “stable” tarball of Awesome to build and install it. That began my slow and painful journey to get this software built and installed.

  • Oh, it needs Lua, I’ll install that from the repositories.
  • Hm, what’s this xcb I need that isn’t in the repositories? I guess I’ll have to build that myself. Oh, but wait, there are different subsets of xcb? xcb-util, xcb-proto, libxcb-xlib, xcb-kitchensink, etc.
  • Well, I need xproto as well, which isn’t in the repositories either.
  • CMake? Really guys? Fine.
  • Imlib2, I’ve never even heard of that!
  • libstartup-notification, huh? Fine, I’ll build this too.

After compiling what felt like an eternity of subpackages, I discovered a number of interesting things about the varying versions of Awesome v3. The configuration file format has changed a few times, even from one release candidate to another. I ran across issues other people had hit that effectively required recompiling X11’s libraries to link against the newly built xcb libraries in order to work (/usr/lib/libxcb-xlib.so.0: undefined reference to _xcb_unlock_io). Nothing I tried worked as I expected; if I couldn’t recompile the majority of my system to be “bleeding edge”, I was screwed. The entire affair was absolutely infuriating.

There are a few major things that I think the team behind Awesome failed miserably at, and that every open source developer should consider when releasing software:

  • If you depend on a hodge-podge of libraries, don’t depend on the bleeding edge of each package.
  • Maintain an open dialogue with those who package your software; don’t try to make their job hell.
  • When a user cannot build your packages with the latest stable versions of their distribution without almost rebuilding their entire system, perhaps you’re “doin’ it wrong”.
  • Changing file formats, or anything else major, between two release candidates is idiocy.
  • If you don’t actually care about your users, be sure to state it clearly, so that we don’t bother using or trying to improve your poor-quality software.

In the end, I decided that Haskell isn’t scary enough to stop me from installing XMonad, so I’ve started replacing Awesome with XMonad on my machines, and I’m not looking back. Ever.

Jython, JGit and co. in Hudson

At the Hudson Bay Area Meetup/Hackathon that Slide, Inc. hosted last weekend, I worked on the Jython plugin and released it just days after releasing a strikingly similar plugin, the Python plugin. I felt that an explanation might be warranted as to why I would do such a thing.

For those that don’t know, Hudson is a Java-based continuous integration server, one of the best CI servers developed (in my humblest of opinions). What makes Hudson so great is a very solid plugin architecture allowing developers to extend Hudson to support a wide variety of scripting languages as well as notifiers, source control systems, and so on (related post on the growth of Hudson’s plugin ecosystem). Additionally, Hudson supports slaves on any operating system that Java supports, allowing you to have a central manager (the “master” Hudson server/node) and a vast network of different machines performing tasks and executing jobs. Now that you’re up to speed, back to the topic at hand.

Jython versus Python plugin: why bother with either, as @gboissinot pointed out in this tweet? The interesting thing about the Jython plugin, particularly when you use a large number of slaves, is that once it is installed you suddenly have the ability to execute Python scripts on every single slave, regardless of whether or not they actually have Python installed. The more “third party” tooling that can be moved into Hudson by way of the plugin system, the fewer dependencies you carry and the less difficult it is to set up slaves to help handle load.

Take the “git” versus the “git2” plugin: the git plugin was recently criticized on the #hudson channel because of its use of the JGit library, versus “git2”, which invokes git(1) on the command line. The latter approach is flawed for a number of reasons; in particular, relying on the git command-line executables and scripts to return consistently formatted output is specious at best, even if you aren’t relying on “porcelain” (git community terminology for the front-end-ish scripts and code sitting on top of the “plumbing”; the breakdown is detailed here). The command-line approach also means you now have to ensure that every slave likely to execute builds has the appropriate packages installed. On the flip side, with the JGit-based approach, the Hudson slave agent can transfer the appropriate bytecode to the machine in question and execute it without relying on system dependencies.
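For contrast, here is a minimal Python sketch of the command-line approach done as carefully as it can be: it sticks to a plumbing command (git for-each-ref) and pins the output layout with --format instead of parsing human-oriented output. The list_branches and parse_refs names are hypothetical. Even in this best case, every slave still needs a git binary on its PATH, which is exactly the dependency the JGit-based plugin avoids.

```python
import subprocess


def parse_refs(output):
    """Parse '<objectname> <refname>' lines into a {refname: sha} dict."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Ref names cannot contain spaces, so one split is enough.
        sha, refname = line.split(" ", 1)
        refs[refname] = sha
    return refs


def list_branches(repo_path):
    """Hypothetical helper: list local branches by shelling out to git(1).

    for-each-ref is a plumbing command, and --format pins the exact field
    layout. It still requires git to be installed wherever this runs.
    """
    result = subprocess.run(
        ["git", "for-each-ref", "--format=%(objectname) %(refname)", "refs/heads/"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_refs(result.stdout)
```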

The Hudson Subversion plugin takes a similar approach, being based on SVNKit.

Being a Python developer by trade, I am certainly not in the “Java Fanboy” camp, but the efficiencies gained by incorporating Java-based libraries in Hudson plugins and extensions are a no-brainer: reducing dependencies on the systems in your build farm will save you plenty of time in maintenance and version woes alone. In my opinion, the benefits of JGit, Jython, SVNKit, and the other Java-based libraries powering some of the most heavily used plugins in the Hudson ecosystem continue to outweigh the costs, especially as we find ourselves bringing more and more slaves online.

Template Theory

Since becoming the (de facto) maintainer of the Cheetah project, I’ve been thinking more and more about what a templating engine should do and where the boundary between template engine and language is drawn. At their most basic level, template engines are a means of programmatically generating large strings or otherwise massaging chunks of text. What tends to separate template engines from one another is the language they’re written in and the level of “host-language” access they offer the template author.

Cheetah is special in that, for all intents and purposes, Cheetah is Python, which blurs the line between the controller layer and the view layer, since Cheetah is compiled into literal Python code. In fact, one of the noted strengths of Cheetah is that Cheetah templates can subclass regular Python objects defined in normal Python modules, and vice versa. That being the case, how do you organize your code, and where should particular portions physically reside in the source tree? What qualifies code to be entered into a .py file versus a .tmpl file? If you zoom out from this particular problem to a larger scope, I believe there is a much larger question to be answered here: as a language, what should Cheetah provide?
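To make “compiled into literal Python code” concrete, here is a deliberately tiny toy template compiler in Python. It is only a sketch of the idea, not Cheetah’s real compiler; compile_template, load_template, Greeter and the respond method are all hypothetical names. It turns $name placeholders into the source of a Python class that subclasses an ordinary Python class.

```python
import re

# Matches $identifier placeholders, e.g. "$name".
PLACEHOLDER = re.compile(r"\$(\w+)")


def compile_template(source, class_name="Template"):
    """Toy compiler: turn text with $name placeholders into Python source.

    Purely illustrative; this is not how Cheetah's compiler works internally.
    The generated class subclasses whatever object is bound to `Base`.
    """
    parts = []
    pos = 0
    for match in PLACEHOLDER.finditer(source):
        parts.append(repr(source[pos:match.start()]))  # literal text
        parts.append(f"str(self.{match.group(1)})")     # placeholder lookup
        pos = match.end()
    parts.append(repr(source[pos:]))
    return (
        f"class {class_name}(Base):\n"
        f"    def respond(self):\n"
        f"        return {' + '.join(parts)}\n"
    )


def load_template(source, base):
    """Compile and exec the generated code (only ever for trusted templates)."""
    namespace = {"Base": base}
    exec(compile_template(source), namespace)
    return namespace["Template"]


class Greeter:
    """An ordinary Python class living in a normal .py module."""
    name = "world"


if __name__ == "__main__":
    print(compile_template("Hello, $name!"))
    HelloTemplate = load_template("Hello, $name!", Greeter)
    print(HelloTemplate().respond())  # Hello, world!
```

Because the output is plain Python, a piece of logic could live either in the base class or in the template, and nothing in the language forces the choice, which is why the .py versus .tmpl question needs a convention.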

Since Cheetah compiles down to Python, does it merit introducing all the Python constructs that one has at their disposal within Cheetah, including:

  • Properties
  • Decorated methods
  • Full/multiple inheritance
  • Metaclasses/class factories

Attacked from the other end, what Cheetah-specific language constructs are acceptable to introduce into Cheetah as a Python-based hybrid language? Currently, some of the language constructs that are distinct to Cheetah itself are:

  • #include
  • #filter
  • #stop
  • #shBang
  • #block
  • #indent
  • #transform
  • #silent
  • #slurp
  • #encoding

Some of these unique Cheetah directives are necessary to manipulate template output in ways that aren’t applicable to normal Python (take #slurp, #indent, and #filter, for example), but where does one draw the line?

To add yet another layer of complexity to the problem, Cheetah is not only used in the traditional Model-View-Controller setup (e.g. Django + Cheetah templates), but is also sometimes used to generate source code (bash, C, etc.).

In My Humble Opinion

Cheetah, at least to me, is not a lump of text files that you can perform loops and use variables in; it is a fully functional, object-oriented, Pythonic, text-aware programming language. Whether or not it compiles to Python or is fully interoperable with Python is largely irrelevant (that is not to say that we don’t make use of this feature). As for “what should Cheetah provide?”, I think the best way to answer the question is to think of Cheetah not as Python, or as a “strict” template engine (Mako, Genshi, etc.), but rather as a domain-specific language for complex text generation and templating. When deciding which Python features to expose as directives in Cheetah (the language), the litmus test should be: does this make generating text easier?

Cheetah need not have hash-directives for every feature available in Python; the idea of requiring metaclasses in Cheetah is ridiculous at best. A feature like decorators, however, could prove quite useful in text processing and generation (e.g. function output filters), as could proper full inheritance.
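Python’s own decorators already behave like function output filters, so a minimal sketch of what a decorator-style feature might express looks like this. Everything here (output_filter, strip_blank_lines, render_list) is hypothetical and written in plain Python; it is not an existing Cheetah directive.

```python
import functools


def output_filter(filter_func):
    """Decorator factory: pass a function's returned text through filter_func."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            return filter_func(func(*args, **kwargs))
        return wrapper
    return decorator


def strip_blank_lines(text):
    """Drop whitespace-only lines, a common clean-up for generated text."""
    return "\n".join(line for line in text.splitlines() if line.strip())


@output_filter(strip_blank_lines)
def render_list(items):
    lines = ["<ul>"]
    for item in items:
        lines.append(f"  <li>{item}</li>")
        lines.append("")  # stray blank line, like the ones template loops leave
    lines.append("</ul>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_list(["cheetah", "mako"]))
```

The filter stays separate from the rendering logic, so the same clean-up can be stacked onto any text-producing function; that kind of reuse is what lets decorators pass the “does this make generating text easier?” test.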

Ultimately, my goals with Cheetah are to make our lives easier when developing rich interfaces for our various web properties, and also to make “things” faster, where “things” fall into a few different buckets: development time, execution time, and maintenance time.

Cheetah will likely look largely the same a year from now, and if we (the developers of Cheetah) have done our jobs correctly, it should be just as simple to pick up and learn, but even more powerful and expressive than before.

Slide Open Source

It’s no secret that I’m a big fan of open source software; I would likely not be where I am today had I not started with projects like OpenBSD or FreeBSD way back when. While my passion for open source software and the “bazaar” method of developing software might not be shared by everybody at Slide, Inc., everybody can certainly get on board with the value of incorporating open source into our stack, which is composed almost entirely of Python (and an assortment of other web technologies).

Along those lines, there’s been some discussion about what we can or should open source from what we’ve developed at Slide, but we haven’t really pushed anything out into the ether as of yet. Today, however, I think we finally got our foot in the door as far as contributing back to the open source community as a whole: we’re now on GitHub as “slideinc”, yay! (Coincidentally, we have a slideinc Twitter account too.)

Currently the only project that’s come directly out of Slide, and shared via the slideinc GitHub account is PyVE, a Python Virtual Earth client that I hacked together recently to tinker with some Geocoding (released under a 3-clause BSD license). In the (hopefully) near future we’ll continue to open source some other components we’ve either created or extended internally.

If you’re not a GitHub user, you should definitely check GitHub out; it’s a pretty impressive site. If you are a GitHub user, or a Python developer, you should “follow” the slideinc user on GitHub to catch the cool stuff that we may or may not ever actually release ;)

Breathing life into a dead open source project

Over the past couple years that I have been working at Slide, Inc. I’ve had a love/hate relationship with the Cheetah templating engine. Like almost every templating engine, it allows for abuse by its users, which can result in some templating code that looks quite horrendous, contributing significantly to some negative opinions of the templating engine. At one point, I figured an upgrade of Cheetah would help correct some of these abuses and I distinctly remember pushing to upgrade to the 2.xx series of Cheetah. I then found out that I had unintentionally volunteered myself to oversee the migration and also to update any ancient code that was lying around that depended on “features” (see: bugs) in Cheetah prior to the 2.xx series. We upgraded to Cheetah 2.xx and life was good, but Cheetah was practically dead.

The last official release of Cheetah was in November of 2007; this is not altogether uncommon in the world of open source development. Projects come and go; some reach a point in their growth and development where they’re abandoned, or their community dissipates, etc. As time wore on, I found myself coming up with a patch here and there that corrected some deficiency in Cheetah, but I also noticed that many others were doing the same. There was very clearly a need for the project to continue moving forward, and with my introduction to both Git and GitHub as a way of distributing development, I did what any weekend hacker is prone to do: I forked it.

Meet Community Cheetah

On January 5th, 2009 I started to commit to my local fork of the Cheetah code base (taken from the Cheetah CVS tree), making sure my patches were committed but also taking the patches from a number of others on the mailing list. By mid-March I had collected enough patches to properly announce Cheetah Community Edition v2.1.0 to the mailing list. I was entirely unprepared for the response.

Whereas the previous 6 months of posts to the mailing list averaged about 4 messages a month, March exploded to 88 messages, 20 of them in the thread announcing Cheetah CE (now renamed Community Cheetah, since it had a better ring to it and an available domain name to boot). All of a sudden the slumbering community is awake, and the patches have started to trickle in.

We’ve fixed some issues with running Cheetah on Python 2.6, added support for compiling templates in parallel, fixed issues with import behavior, and added a number of smaller features. In 2008 there were six commits to the Cheetah codebase; thus far in 2009 there have been over seventy (I’m still waiting on a few patches from colleagues at other startups in Silicon Valley as well).

I’m not going to throw up a “Mission Accomplished” banner just yet; Cheetah still needs a large amount of improvement. It was written during a much different era of Python, and the changes in Python 2.6, along with the move to Python 3.0, present new challenges in modernizing a template engine that was introduced in 2001.

Being a maintainer

Starting your own open source project is tremendously easy, especially with the advent of hosts like Google Code or GitHub. What’s terrifying and difficult is when other people depend on your work. By stepping up and becoming the de facto maintainer of Community Cheetah, I’ve opened myself up to a larger collection of expectations than I originally anticipated. I feel as if I have zero credibility with the community at this point, which means I painstakingly check the changes that are committed and review as much code as possible before tagging a release. I’m scared to death of tagging a bad release of Community Cheetah and driving people away from the project. The nightmare scenario I play over in my head when tagging a release in Git is somebody going “this crap doesn’t work at all, I’m going to stick with Cheetah v2.0.1 for now”, after which I can never get them to upgrade to subsequent releases of Community Cheetah. I think the creators of a project have a lot of “built-in street cred” with their users and community of developers, whereas I still have to establish my street cred through bug fixes and features, knowledge of the code base, and generally being available on the mailing list or IRC.

Moving Forward

Currently I’m preparing the third Community Cheetah release, v2.1.1 (which I tagged today), which comes almost a month after the previous one and introduces a number of fixes as well as some newer features like the #transform directive, Markdown support, and 100% Python 2.6 compatibility.

Thanks to an intrepid contributor, Jean-Baptiste Quenot, we have a v2.2 release lined up for the near future, which fixes a large number of Unicode-specific faults that Cheetah currently has (the code can currently be found in the unicode branch) and moves the internal representation of code within the Cheetah compiler/parser to a unicode string object in Python.

I eagerly look forward to more and more usage of Cheetah. Even with other Python templating engines out there like Mako and Genshi, I still feel Cheetah sits far above the others in power and versatility; it has just been neglected for far too long.

If you’re interested in contributing to Cheetah, you can fork it on GitHub, join the mailing list or find us on IRC (#cheetah on Freenode).

This experiment on restarting an open source project is far from over, but we’re off to a promising start.
