Howdy!

Welcome to my blog where I write about software development, cycling, and other random nonsense. This is not the only place I write; you can find more words I typed on the Buoyant Data blog, the Scribd tech blog, and GitHub.

Twenty Eleven

I wanted to wish everybody foolish enough to keep my RSS feed in their news reader a happy twenty eleven from Victoria, Canada. While I won’t do a big 2010 “year in review” style post, I wanted to point out some milestones the year has had for me:

  • In 2010, I became a married man. Hooray new tax status!
  • In 2010, Slide was acquired by Google, giving me the liquidity to use previously purchased stock to buy a nice BLT sandwich on Wed. September 15th.
  • In 2010, my lovely wife finished her paralegal studies, bringing her degree count to two, eclipsing my zero.
  • In 2010, I moved from San Francisco to Berkeley, adding two more modes of transportation to my morning commute.
  • In 2010, I managed to not die in any fashion, comically or otherwise.

Ada? Surely you jest Mr. Pythonman

The past couple weeks I’ve been spending my BART commutes learning the Ada programming language. Prior to starting to research Ada, I sat in my office frustrated with Python for my free time hackery. Don’t get me wrong, I love the Python language: I have enjoyed its ease of use, dynamic model, rapid prototyping and expressiveness. I just fall into slumps occasionally where I find some of Python’s “quirks” utterly infuriating. Quirks such as its loosey-goosey type system (which I admittedly take advantage of often), lack of good concurrency in the language, an import subsystem which has driven lesser men mad, and its difficulty in scaling organically for larger projects (I’ve not yet seen a large Python codebase that hasn’t been borderline “clusterfuck”).

Before you whip out the COBOL and Fortran jokes, I’d like to make it known up front that Ada is a modern language (as I mentioned on reddit, the first Ada specification was in 1983, 11 years after C debuted, and almost 30 years after COBOL and Fortran were designed). It was most recently updated with the “Ada 2005” revision and supports a lot of the concepts one expects from modern programming languages. For me, Ada has two strong points that I find attractive: extra-strong typing and built-in concurrency.

Incredibly strong typing

The typing in Ada is unlike anything I’ve ever worked with before, coming from a background in C-inspired languages. Whereas one might use the plus sign operator in Python to add an int and a float together without an issue, in Ada there’s literally zero auto-casting (as far as I’ve learned) between types. To the inexperienced user (read: me) this might seem annoying at first, but it’s fundamental to Ada’s underlying philosophy of “no assumptions.” If you’re passing an Integer into a procedure that expects a Float, there will be no casting; the statement will fail at compile time.
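To make the contrast concrete, here is a small Python sketch. The `strict_add` helper is hypothetical, made up purely for illustration, and it can only complain at runtime; Ada would reject the equivalent call at compile time.

```python
def strict_add(a, b):
    """Hypothetical helper: add two floats, refusing any implicit conversion."""
    if type(a) is not float or type(b) is not float:
        raise TypeError(
            f"expected two floats, got {type(a).__name__} and {type(b).__name__}"
        )
    return a + b


# Plain Python quietly promotes the int to a float.
promoted = 1 + 2.5

# With the strict helper the caller has to convert explicitly.
explicit = strict_add(float(1), 2.5)

# Mixing types is refused instead of being silently "fixed".
try:
    strict_add(1, 2.5)
    rejected = False
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
```

The explicit `float(1)` is the closest Python analogue to writing the conversion out by hand, which is what the "no assumptions" philosophy asks for.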

Concurrency built-in

Unlike C, Java, Objective-C and Python (languages I’ve used before), Ada has concurrency defined as part of the language, as opposed to an abstraction on top of an OS-level library (pthreads). In Ada this concept is called “tasking”, which makes it easy to build concurrent applications. Unlike OS-level bindings built on top of pthreads, for example, Ada provides built-in mechanisms for communicating between “tasks”, called “rendezvous”, along with scheduling primitives.

Being able to define a “task” as this concurrent execution unit that uses this rendezvous feature to provide “entries” to communicate with it is something I still haven’t wrapped my head around to be honest. The idea of a language where concurrency is a core component is so new to me I’m not sure how much I can do with it.
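As a rough mental model only, the idea of a task with entries can be approximated in Python with a thread and a pair of queues. Everything here (`CounterTask`, `add`, `stop`) is a made-up name for illustration, and this is not how Ada implements tasking. The one property it tries to capture is that the caller of an entry waits until the task has accepted and handled the call.

```python
import queue
import threading


class CounterTask:
    """A thread that owns its state and is reached only through 'entries'."""

    def __init__(self):
        self._calls = queue.Queue()
        self._total = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            # Comparable to an "accept": wait here until a caller shows up.
            amount, reply = self._calls.get()
            if amount is None:  # stop request
                reply.put(self._total)
                return
            self._total += amount
            # Handing back the result releases the waiting caller.
            reply.put(self._total)

    def add(self, amount):
        """Entry call: blocks until the task has handled the request."""
        reply = queue.Queue(maxsize=1)
        self._calls.put((amount, reply))
        return reply.get()

    def stop(self):
        """Ask the task to finish and return its final total."""
        reply = queue.Queue(maxsize=1)
        self._calls.put((None, reply))
        total = reply.get()
        self._thread.join()
        return total


task = CounterTask()
first = task.add(2)
second = task.add(3)
final = task.stop()
print(first, second, final)
```

Because only the task's own thread ever touches `_total`, no lock is needed; the communication itself does the synchronizing.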

For my first “big” project with Ada, I’ve been tinkering with a memcached client, which will give me the opportunity to learn some Ada fundamentals before I step on to bigger projects. Disregarding the condescending jeers from other programmers who one could classify as “leet Django haxxorz”, I’ve been enjoying the experience of learning a new language, vastly different from any that I’ve tried before.

So stop picking on me you big meanies :(


GNU/Parallel changed my life

Over the past month or so I’ve fallen in love with an incredibly simple command line tool: GNU/Parallel. Parallel has more or less replaced my use of xargs when piping data around on the many machines that I use. Unlike xargs however, Parallel lets me make use of the many cores that I have access to, either on my laptop or the many quad and octocore machines we have lying around the Apture office.

Using Parallel is incredibly easy. In fact, the docs enumerate just about every possible incantation of Parallel you might want to use, but starting simple you can just pipe stuff to it:

cat listofthings.txt | parallel --max-procs=8 --group 'echo "Thing: {}"'

The command above will run at most eight concurrent processes and group the output of each process so that it is printed together once that process completes. Simple, and in this case not too different from running with xargs.

With some simple Python scripting, Parallel becomes infinitely more useful:

python generatelist.py | parallel --max-procs=8 --group 'wget "{}" -O - | python processpage.py'
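For comparison, the same "at most eight workers at a time" pattern can be sketched in pure Python with the standard library. The `process` and `run_parallel` names are hypothetical stand-ins, and threads are used here rather than separate processes, so this is an analogy to what Parallel does and not a replacement for it.

```python
from concurrent.futures import ThreadPoolExecutor


def process(thing):
    # Stand-in for the per-item command, like: echo "Thing: {}"
    return f"Thing: {thing}"


def run_parallel(things, max_procs=8):
    # At most max_procs workers run at once. Each result comes back whole,
    # so output from different items is never interleaved.
    with ThreadPoolExecutor(max_workers=max_procs) as pool:
        return list(pool.map(process, things))


results = run_parallel(["apple", "banana", "cherry"])
for line in results:
    print(line)
```

One difference worth noting: `pool.map` returns results in input order, which is closer to running Parallel with `--keep-order` than to its default of printing each job as it finishes.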

There’s not really a whole lot to say about GNU/Parallel other than you should use it. I find myself increasingly impatient when a single process takes longer than a couple minutes to complete, so I’ve been using GNU/Parallel in more and more different ways across almost all the machines that I work on to make things faster and faster. So much so that I’ve started to pine for a quad-core notebook instead of this weak dual core Thinkpad of mine :)


Experimenting with reddit's self-serve ads

A couple weeks ago I decided to try out reddit’s self-serve advertising system for one of our products at Apture: the Apture Highlights browser extension. While I am an Apture employee, I’ve also turned into a rabid user of our browser plugin while browsing the web. I’ve found it to be perfect at answering a number of quick questions like “what does this word mean?” or “who the hell is this?” In a mix of curiosity regarding reddit’s advertising system and advocacy for our browser extension, I decided to run a trial campaign on reddit.

Looking up 'Voyager' with Apture

If you’ve not been exposed to reddit’s self-serve advertising platform, here’s a quick overview. The entire system is bid-based, with minimum bids starting at 20 USD a day. Ads are created by users (like me) and submitted for approval with tentative dates. Once the ad is approved by reddit, it is scheduled to run on a particular day. From my understanding of the system, the number of impressions given to your advertisement is based on your bid and the demand for ad impressions on the given day. On top of this basic structure, you can run advertisements “targeted” to a specific subreddit or reddit-wide.

For the purposes of my campaign, I wanted to try both reddit-wide and targeted ads. For the targeted portion of the campaign I ran my ad for two days on /r/todayilearned, a subreddit with nearly 80,000 subscribers who are all looking to share an interesting nugget of information that they have learned today. In addition to targeting the ad to the specific subreddit, I tried to make the copy of the advertisement as compelling as possible for my potential clickers:

Add more TIL to every thread on reddit with the Apture Highlights browser extension

(note: The acronym “TIL” generally is used as a substitute for “today I learned” in threads on reddit)

This ad ran for two days on /r/todayilearned and for one day reddit-wide, bringing my total campaign expenditure to $60. The breakdown in numbers is as follows:

Impressions (unique -> total): 21,420 -> 141,037
Clicks (unique -> total): 146 -> 157

While the click-through rate is frustratingly low, what I found astonishing was the huge disparity between unique and non-unique impressions. What that indicates to me is that readers have a tendency to refresh a page (such as the subreddit homepage) a number of times during the day.
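For anyone who wants to check the arithmetic, the rates can be worked out directly from the campaign totals above. The variable and function names are just for illustration; the inputs are the impression and click counts and the $60 spend quoted in this post.

```python
# Campaign totals quoted in the post.
unique_impressions, total_impressions = 21_420, 141_037
unique_clicks, total_clicks = 146, 157
spend_usd = 60


def rate(clicks, impressions):
    """Click-through rate as a percentage."""
    return 100.0 * clicks / impressions


unique_ctr = rate(unique_clicks, unique_impressions)
total_ctr = rate(total_clicks, total_impressions)
views_per_visitor = total_impressions / unique_impressions
cost_per_click = spend_usd / total_clicks

print(f"unique CTR: {unique_ctr:.2f}%")                  # unique CTR: 0.68%
print(f"total CTR: {total_ctr:.2f}%")                    # total CTR: 0.11%
print(f"impressions per unique viewer: {views_per_visitor:.2f}")  # 6.58
print(f"cost per click: ${cost_per_click:.2f}")          # $0.38
```

The 6.58 impressions per unique viewer is the disparity in question: on average each person saw the ad between six and seven times.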

What you cannot tell from those numbers above is how many of the clicks came from the targeted placement (/r/todayilearned) versus the reddit-wide run. When the ad ran reddit-wide it received zero clicks. Not only did the targeting to /r/todayilearned garner more repeated (non-unique) impressions, it received all of the clicks received throughout the entire campaign.

The big take-away lesson for me from this brief trial advertising on reddit was: avoid reddit-wide advertising. Finding a subreddit with a large number of passionate users isn’t that difficult, so you should be able to identify a subreddit that overlaps with your target market and advertise to them specifically. Other than that, I don’t have any great “analysis” to offer; it was an interesting experiment but not a rigorously scientific one.

If you’d like to download the CSV with the data from the campaign, you can grab that here. The columns are: date, impression_unique, impression_total, click_unique, click_total, clickrate_unique, clickrate_total.


So. I'm married.

A few weeks ago I finally tied the knot after a rather long engagement, turning my relationship with my then-fiancée into a legally binding one. While a wedding should hold a very special place in the bride and groom’s hearts forever, I feel like it is safe to say that our wedding objectively rocked.

I don’t want to dive too much into the nitty-gritty details of the entire weekend which culminated in a great ceremony and reception at the phenomenal Madrona Manor Restaurant and Inn. The entire atmosphere, from both families having a great time together, to impeccable weather and the fantastically prepared dinner, was damned near perfect. Cue a brief slideshow of pictures taken by my good friends Dave Young and Annika Lindner:


Now that we’re properly married, and no longer engaged, the typical annoying question has changed from “When are you getting married?” to “When are you having children?”

Considering I can barely take care of our big moron of a cat, I don’t think children are in the cards anytime soon. I’m curious what milestone comes after children though, “when are you going to retire” might be next and then perhaps “when are you going to die” after that.

Either way, I think it’s safe to say, it’s all downhill from here.


Being a Croy

The name change that I mentioned in my previous post is now official. This means I now have to update everything. I’m in for a world of hurt between the DMV, banks, brothels and strip-Parcheesi clubs.

The only thing you need to do is update your address book, lucky you! At least one friend of mine already has, and messaged me to say:

I put your old surname in the “Maiden Name” field in Address Book. Just thought you’d want to know.

I spoke to my step-dad George on the phone immediately after the hearing was over and asked if there are “any perks to being a Croy?”

Still haven’t gotten a response to that one yet.


What's in a name?

Tomorrow morning I will be in court, hopefully finalizing a process I started earlier this year. I will be changing my name.

When I was first considering it, I found the entire idea a bit scary. I have worked tremendously hard to make a name for myself, from my work in the open source community to conferences I’ve spoken at and interactions with numerous companies and people who have been instrumental in my whittling out a career in software engineering. I have been very particular about being referred to as “R. Tyler Ballance,” ensuring that my “self-branding” remains consistent, netting me somewhere north of 36,000 results when searching Google.

Tomorrow I intend to throw all that out the window; there are more important things in life than Google results (as shocking as that may sound).

I’m hesitant to go too much into the motivations for the change, knowing full well that everything I publish might as well be set in stone on the internet.

Those close to me know that my parents divorced when I was young. After a particularly nasty divorce, my mother and my three sisters parted ways with my father, with whom I have since had only sporadic contact. After a couple dark years for my sisters and me, my mother married another Navy man, George P. Croy, III. George came into the marriage with his daughter, bringing my sister-count up to four.

Over the past fifteen years or so, I have become George’s son. I have successfully explored his emotional spectrum, from tears of joy to turning him a bright crimson shade of pissed-off, and he has never once treated me as if I were anything less than his kin. I’m convinced my attitudes towards family, women and friends, not to mention my strong opinions on honor and integrity, have all been heavily influenced by him.

Plainly put, I would not be the man I am today without his guiding hand.

Provided everything goes well at the courthouse, I enter as R. Tyler Ballance and leave as R. Tyler Croy.

Might as well update your address books.


Unclog the tubes; blocking detection in Eventlet

Colleagues of mine are all very familiar with my admiration of Eventlet, a Python concurrency library, built on top of greenlet, that provides lightweight “greenthreads” that naturally yield around I/O points. For me, the biggest draw of Eventlet, besides its maturity, is how well it integrates with standard Python code. Any code that uses the built-in socket module can be “monkey-patched” (i.e. modified at runtime) to use the “green” version of the socket module, which allows Eventlet to turn regular ol’ Python into code with asynchronous I/O.

The problem with using libraries like Eventlet is that some Python code just blocks, meaning that code will hit an I/O point and not yield but instead block the entire process until that network operation completes.

In practical terms, imagine you have a web crawler that uses 10 “green threads”, each crawling a different site. The first greenthread (GT1) will send an HTTP request to the first site, then it will yield to GT2 and so on. If each HTTP request blocks for 100ms, that means when crawling the 10 sites, you’re going to block the whole process, preventing anything from running, for a whole second. Doesn’t sound too terrible, but imagine you’ve got 1000 greenthreads: instead of everything smoothly yielding from one thread to another, the process will lock up very often, resulting in painful slowdowns.
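The 10 x 100ms arithmetic is easy to reproduce. The sketch below uses the standard library's asyncio instead of Eventlet, purely so that it runs without installing anything; the cooperative-scheduling behaviour it demonstrates is the same idea, but none of these function names come from Eventlet.

```python
import asyncio
import time


async def blocking_fetch(delay):
    # time.sleep never yields, so every other task is stuck waiting.
    time.sleep(delay)


async def yielding_fetch(delay):
    # asyncio.sleep yields, letting the other tasks wait at the same time.
    await asyncio.sleep(delay)


async def crawl(fetch, sites=10, delay=0.1):
    """Run one fake request per site and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(fetch(delay) for _ in range(sites)))
    return time.perf_counter() - start


blocking = asyncio.run(crawl(blocking_fetch))
yielding = asyncio.run(crawl(yielding_fetch))
print(f"blocking: {blocking:.2f}s, yielding: {yielding:.2f}s")
```

The blocking run should take roughly a full second (ten waits back to back), while the yielding run should take roughly a tenth of that, since all ten waits overlap.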

Starting with Eventlet 0.9.10, “blocking detection” code has been incorporated into Eventlet to make it far easier for developers to find these portions of code that can block the entire process:

import eventlet.debug
eventlet.debug.hub_blocking_detection(True)

While using the blocking detection is fairly simple, its implementation is a bit “magical” in that it’s not entirely obvious how it works. The detector is built around signals: inside of Eventlet a signal handler is set up prior to firing some code, and if said code runs past a certain time threshold, an alarm is raised, dumping a stack trace to the console. I’m not entirely convinced I’m explaining this appropriately so here’s some pseudo-code:

def runloop():
    while True:
        start = time.time()
        signal.alarm(handler, 1)  # pseudo-code: arm a one-second alarm that calls handler
        execute_next_block()
        if (time.time() - start) < resolution:
            # Clear the signal if we're less than a second, otherwise it will alarm
            clear_signal()

The blocking detection is a bit crude and can raise false positives if you have bits of code that churn the CPU for longer than a second but it has been instrumental in incorporating non-blocking DNS support into Eventlet, which was also introduced in 0.9.10 (ported over from Slide’s gogreen package).
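The same arm-then-disarm trick can be tried outside of Eventlet with nothing but the standard `signal` module. This is a minimal sketch of the technique, not Eventlet's actual code: `on_alarm`, `run_blocks` and `detected` are names invented here, it assumes a Unix-like system (there is no `SIGALRM` on Windows), and it has to run in the main thread.

```python
import signal
import time
import traceback

detected = []  # one stack trace per block that overran the threshold


def on_alarm(signum, frame):
    # Only reached if the timer fires before the block has finished.
    detected.append("".join(traceback.format_stack(frame)))


def run_blocks(blocks, resolution=0.05):
    """Run each callable in turn, flagging any that exceed `resolution` seconds."""
    signal.signal(signal.SIGALRM, on_alarm)
    try:
        for block in blocks:
            signal.setitimer(signal.ITIMER_REAL, resolution)  # arm the alarm
            block()
            signal.setitimer(signal.ITIMER_REAL, 0)           # finished in time: disarm
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)


def quick():
    pass


def slow():
    time.sleep(0.2)  # stands in for a call that blocks the whole process


run_blocks([quick, slow])
print(f"{len(detected)} blocking call(s) detected")
```

Only `slow` trips the alarm, and the captured stack trace points at the `time.sleep` line, which is exactly the kind of hint you want when hunting for the call that is holding everything up.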

If you are using Eventlet, I highly recommend running your code periodically with blocking detection enabled; it is an invaluable tool for determining whether you’re running as fast and as asynchronous as possible. In my case, it has been the difference between web services that are fast in development but slow under heavy stress, and web services that are always fast regardless of load.


Paw paw?

I feel like I’m slowly starting to blog like @cansar with just excerpts of other stuff that other people have said on the internet, so this is the last non-technical post for a little bit, promise.

This thread on reddit just about made my morning, well, in addition to that delicious peach I ate.

The mere thought of my own grandfather on reddit or any other online community I frequent is a pretty big stretch, but to have him be a notable member of the community is unfathomable (not to mention, run a part of it like r/mayonnaise).

I suggest you read the whole thread and enjoy a hearty belly laugh, only so long as you’re not doing anything important like driving a bus or performing a colonoscopy.

Updated: As with most things, it was too good to be true, although I must say it was one of the most well-done trolling performances I’ve seen yet. I remain unrepentant in my enjoyment of a good belly laugh, however.
