Howdy!

Welcome to my blog, where I write about software development, cycling, and other random nonsense. This is not the only place I write; you can find more words I typed on the Buoyant Data blog, the Scribd tech blog, and GitHub.

Recovering from disasters with Delta Lake

Entering into the data platform space with a lot of experience in more traditional production operations is a lot of fun, especially when you ask questions like “what if X goes horribly wrong?” My favorite scenario to consider is: “how much damage could one accidentally cause with our existing policies and controls?” At Scribd we have made Delta Lake a cornerstone of our data platform, and as such I’ve spent a lot of time thinking about what could go wrong and how we would defend against it.

Read more →

Understanding big data partitioning

Data partitioning is one of the principles to utilize when developing large data sets, but do you know what that actually means for the storage format? I didn’t! Many “big data” storage systems, such as HDFS, S3, and Azure Data Lake Storage, are effectively file systems. This past year or so, I’ve become much more familiar with Delta Lake and kind of just assumed that data partitioning was something done at the transaction log level. Turns out I guessed wrong.
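
To make the file system angle concrete, here is a minimal sketch, assuming the common Hive-style convention in which each partition column becomes a `column=value` directory in the file path. The `partitioned_path` helper, the bucket name, and the column names are all hypothetical and only for illustration; this is not taken from the post itself.

```rust
/// Hypothetical helper: build a Hive-style partitioned path, where every
/// partition column is encoded as a `column=value` directory segment.
fn partitioned_path(root: &str, partitions: &[(&str, &str)], file: &str) -> String {
    // Start from the table root, without any trailing slash.
    let mut parts = vec![root.trim_end_matches('/').to_string()];
    // Each partition column adds one directory level, in the order given.
    for (column, value) in partitions {
        parts.push(format!("{}={}", column, value));
    }
    // The data file itself sits in the innermost partition directory.
    parts.push(file.to_string());
    parts.join("/")
}
```

Under that assumption, calling `partitioned_path("s3://bucket/events", &[("year", "2021"), ("month", "03")], "part-0000.parquet")` yields `s3://bucket/events/year=2021/month=03/part-0000.parquet`, so a reader can skip whole directories when filtering on `year` or `month`.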

Read more →

Building a goede search engine

This weekend I finally got around to building a little Rust “full text search engine” based on the educational post written by my Scribd colleague Bart, titled Building a full-text search engine in 150 lines of Python code. Bart did a great job writing an accessible post which introduced some common search concepts using Python. My objective wasn’t necessarily to write something faster or better, but to use the exercise as Rust practice. My day job is no longer writing code, so the opportunity for a problem with fixed scope which would work out my Rust muscles was too good to pass up. In this post I want to share some things which I’ve learned in the process of duplicating Bart’s work.

Read more →

Subscribe to my "Podcast Picks"

I have always been a fan of podcasts, but have never really had a good way to share the interesting things I am listening to. A couple weeks ago I struck upon an idea that seems so bafflingly simple in retrospect: I could just host my own podcast feed.

Read more →

Software-defined networks with FreeBSD Jails

As a comprehensive operating system, FreeBSD never ceases to impress me. The recent iterations of FreeBSD Jails, for example, have been an absolute joy to use. The introduction of the vnet(9) network subsystem has completely transformed what I had originally thought about software-defined networking. My previous exposure to the concept of software-defined networking was through both OpenStack and Docker, two very different approaches to the broad domain of “SDN”. FreeBSD’s vnet system has resonated most strongly with me and has allowed me some measure of success in deploying real production-grade virtualized networks.

Read more →

Dynamically adding parameters in sqlx

Bridging data types between the database and a programming language is such a foundational feature of most database-backed applications that many developers overlook it, until it doesn’t work. For many of my Rust-based applications I have been enjoying sqlx which strikes the right balance between “too close to the database”, working with raw cursors and buckets of bytes, and “too close to the programming language”, magic object relational mappings. It reminds me a lot of what I wanted Ruby Object Mapper to be back when it was called “data mapper.” sqlx can do many things, but it’s not a silver bullet and it errs on the side of “less magic” in many cases, which leaves the developer to deal with some trade-offs. Recently I found myself with just such a trade-off: mapping a Uuid such that I could do IN queries.
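
One common way to handle a variable-length IN clause is to generate the numbered placeholders at runtime and then bind each value in order. The following is a minimal sketch of that idea using only the standard library, assuming Postgres-style `$1, $2, ...` placeholders; the `placeholders` and `in_query` helpers are hypothetical names for illustration and not necessarily the approach the post takes.

```rust
/// Hypothetical helper: build a comma-separated list of Postgres-style
/// placeholders, e.g. "$1, $2, $3" for a count of 3.
fn placeholders(count: usize) -> String {
    (1..=count)
        .map(|i| format!("${}", i))
        .collect::<Vec<_>>()
        .join(", ")
}

/// Hypothetical helper: build a SELECT with an IN clause sized to `count`.
/// Only trusted, hard-coded table and column names should be passed in,
/// since they are interpolated directly into the SQL text.
fn in_query(table: &str, column: &str, count: usize) -> Option<String> {
    // `IN ()` is not valid SQL, so refuse to build a query with no values.
    if count == 0 {
        return None;
    }
    Some(format!(
        "SELECT * FROM {} WHERE {} IN ({})",
        table,
        column,
        placeholders(count)
    ))
}
```

With a list of three ids, `in_query("users", "id", 3)` produces `SELECT * FROM users WHERE id IN ($1, $2, $3)`; each Uuid would then be bound to the query in the same order. The values themselves never get spliced into the SQL string, only the placeholder positions do.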

Read more →

Thoughts on WebTorrent

WebTorrent is one of the most novel uses of some modern browser technologies that I have recently learned about. Using WebRTC, it is able to implement a truly peer-to-peer data transport on top of support offered by existing browsers. I came across WebTorrent when I was doing some research on what potential future options might exist for more scalable distribution of free and open source libraries and applications. In this post, I want to share some thoughts and observations I jotted down while considering WebTorrent.

Read more →

Technically I'm microblogging now.

I am a big fan of the open web, and although I have enjoyed Twitter, the platform has regressed dramatically in form and function since I first adopted it. I remember when Twitter actively avoided building a walled garden, with fantastic APIs and RSS feeds open to the public. Much of the popularity of the platform hinged upon the incredible third party applications and integrations developers like me built in the first five-ish years of its existence. Over time the site has strayed from open APIs and standards, and while I still enjoy Twitter, I want some more flexibility, which is why you can now subscribe to my microblog with any RSS-capable client.

Read more →

Synchronizing notes with Nextcloud and Vimwiki

The quantity of things I need to keep track of or be responsible for has exploded in the past few years, so much so that I have had to really focus on organizing my “personal knowledgebase.” When I originally tried to spend some time improving my information management system, I found numerous different services offering to improve my productivity and to help me keep track of everything. Invariably many of these tools were web apps. For working quickly and productively with information, a <textarea/> in a web page is just about the choice of last resort. I recently revisited Vimwiki and have been quite satisfied both by my productivity boost and the benefits that come with having raw text to work with. The best benefit: easy synchronization of notes with Nextcloud.

Read more →

Reverse proxying a Tide application with Nginx

Every now and again I’ll encounter a silly problem, fix it, forget about it, and then later run into the exact same problem again. Today’s example is a confusing error I encountered when reverse-proxying a Tide application with Nginx. In the Tide application, I was greeted with an ever-so-descriptive error:

Read more →