Howdy!

Welcome to my blog, where I write about software development, cycling, and other random nonsense. This is not the only place I write; you can find more words I typed on the Buoyant Data blog, Scribd tech blog, and GitHub.

Dynamically forwarding SSH ports

Working over SSH on a big beefy remote machine is a great way to extend the life of any laptop, but presents challenges when developing network-based services. Fortunately, OpenSSH has a “port forwarding” feature which has been around for a number of years. Port forwarding allows you to tunnel a port from the remote machine back to your local machine, in essence letting you access a remote service bound to port 8000 on your own localhost:8000. When I first learned about this, I would fiddle around with my ssh invocations or hardcode a list of potential ports forwarded in my ~/.ssh/config. If I was working on a new service that needed a port not yet forwarded, I would disconnect, add it to the list of ports in my config file, and then reconnect. That was until my pal nibz (nibalizer) showed me how to dynamically add port forwards to an already existing session.

Read more →

Generating pre-signed S3 URLs in Rust

Creating pre-signed S3 URLs in Rust took me a little more brainpower than I had anticipated, so I thought I would share how to generate them using Rusoto. Pre-signed URLs allow the creation of purpose-built URLs for fetching or uploading objects to S3, and can be especially useful when granting access to S3 objects to mobile or web clients. In my use case, I wanted the clients of my web service to be able to access some specific objects from a bucket.

Read more →

Why build a native interface to Delta Lake

Investing in the development of delta-rs is one of the longer shots I have taken recently, and with my upcoming Data and AI Summit talk on the subject, I wanted to share some back story on the project. As I have mentioned before, Delta Lake is a key component of Scribd’s data platform. We selected Delta Lake for many reasons, including that it is an open and vendor-neutral project. The power of Delta Lake has opened up countless opportunities for data and for the past year I have seen the potential for many more.

Read more →

Recovering from disasters with Delta Lake

Entering into the data platform space with a lot of experience in more traditional production operations is a lot of fun, especially when you ask questions like “what if X goes horribly wrong?” My favorite scenario to consider is: “how much damage could one accidentally cause with our existing policies and controls?” At Scribd we have made Delta Lake a cornerstone of our data platform, and as such I’ve spent a lot of time thinking about what could go wrong and how we would defend against it.

Read more →

Understanding big data partitioning

Data partitioning is one of the principles to utilize when developing large data sets, but do you know what that actually means for the storage format? I didn’t! Many “big data” storage systems such as HDFS, S3, and Azure Data Lake Storage are all effectively file systems. This past year or so, I’ve become much more familiar with Delta Lake and kind of just assumed that data partitioning was something being done at the transaction log level. Turns out I guessed wrong.

Read more →

Building a goede search engine

This weekend I finally got around to building a little Rust “full text search engine” based on the educational post written by my Scribd colleague Bart, titled Building a full-text search engine in 150 lines of Python code. Bart did a great job writing an accessible post which introduced some common search concepts using Python. My objective wasn’t necessarily to write something faster or better, but to use the exercise as Rust practice. My day job is no longer writing code, so the opportunity for a problem with fixed scope which would work out my Rust muscles was too good to pass up. In this post I want to share some things which I’ve learned in the process of duplicating Bart’s work.
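For a rough idea of what such an exercise involves, here is a minimal sketch of an inverted index, one of the common concepts behind full-text search. It is not the code from the post or from Bart's article; the `Index` type, `tokenize` function, and the AND-only matching are assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// A tiny in-memory inverted index (hypothetical type for illustration):
/// each token maps to the set of document ids that contain it.
#[derive(Default)]
struct Index {
    postings: HashMap<String, HashSet<usize>>,
    documents: Vec<String>,
}

/// Lowercase the text and split it on anything that is not alphanumeric.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

impl Index {
    /// Store the document and record every token it contains.
    /// Returns the id assigned to the document.
    fn add(&mut self, text: &str) -> usize {
        let id = self.documents.len();
        for token in tokenize(text) {
            self.postings.entry(token).or_default().insert(id);
        }
        self.documents.push(text.to_string());
        id
    }

    /// Return the sorted ids of documents containing every query token.
    /// An empty query matches nothing.
    fn search(&self, query: &str) -> Vec<usize> {
        let mut result: Option<HashSet<usize>> = None;
        for token in tokenize(query) {
            let ids = self.postings.get(&token).cloned().unwrap_or_default();
            result = Some(match result {
                Some(acc) => acc.intersection(&ids).copied().collect(),
                None => ids,
            });
        }
        let mut ids: Vec<usize> = result.unwrap_or_default().into_iter().collect();
        ids.sort_unstable();
        ids
    }
}

fn main() {
    let mut index = Index::default();
    index.add("Building a search engine in Rust");
    index.add("Cycling in the rain");
    println!("{:?}", index.search("rust search"));
}
```

A real engine would add stemming, stop words, and relevance ranking on top of this; the sketch only shows the lookup structure.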

Read more →

Subscribe to my "Podcast Picks"

I have always been a fan of podcasts, but have never really had a good way to share the interesting things I am listening to. A couple weeks ago I struck upon an idea that seems so bafflingly simple in retrospect: I could just host my own podcast feed.

Read more →

Software-defined networks with FreeBSD Jails

As a comprehensive operating system, FreeBSD never ceases to impress me; the recent iterations of FreeBSD Jails, for example, have been an absolute joy to use. The introduction of the vnet(9) network subsystem has completely transformed what I had originally thought about software-defined networking. My previous exposure to the concept of software-defined networking was through both OpenStack and Docker, two very different approaches to the broad domain of “SDN”. FreeBSD’s vnet system has resonated most strongly with me and has allowed me some measure of success in deploying real production-grade virtualized networks.

Read more →

Dynamically adding parameters in sqlx

Bridging data types between the database and a programming language is such a foundational feature of most database-backed applications that many developers overlook it, until it doesn’t work. For many of my Rust-based applications I have been enjoying sqlx which strikes the right balance between “too close to the database”, working with raw cursors and buckets of bytes, and “too close to the programming language”, magic object relational mappings. It reminds me a lot of what I wanted Ruby Object Mapper to be back when it was called “data mapper.” sqlx can do many things, but it’s not a silver bullet and it errs on the side of “less magic” in many cases, which leaves the developer to deal with some trade-offs. Recently I found myself with just such a trade-off: mapping a Uuid such that I could do IN queries.
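To illustrate the kind of trade-off involved, here is a minimal sketch of one way to build an IN clause with a variable number of parameters: generating the numbered placeholders by hand. This is not necessarily the approach the full post takes. The `placeholders` and `in_query` helpers and the `items` table are hypothetical, the `$1` style assumes a Postgres-like database, and binding the actual values through sqlx is not shown.

```rust
/// Build a Postgres-style placeholder list such as "$1, $2, $3".
/// (Hypothetical helper, not part of sqlx.)
fn placeholders(count: usize) -> String {
    (1..=count)
        .map(|i| format!("${}", i))
        .collect::<Vec<_>>()
        .join(", ")
}

/// Assemble an IN query for a hypothetical `items` table keyed by `id`.
/// Returns None for zero values, since an empty IN list is rejected by
/// many databases.
fn in_query(count: usize) -> Option<String> {
    if count == 0 {
        return None;
    }
    Some(format!(
        "SELECT * FROM items WHERE id IN ({})",
        placeholders(count)
    ))
}

fn main() {
    // With three ids to look up, the query needs three placeholders.
    if let Some(sql) = in_query(3) {
        println!("{}", sql);
    }
}
```

Each value would then be bound to the query in the same order as its placeholder. Only the placeholders are interpolated into the SQL string, never the values themselves, so the query stays parameterized.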

Read more →

Thoughts on WebTorrent

WebTorrent is one of the most novel uses of some modern browser technologies that I have recently learned about. Using WebRTC, it is able to implement a truly peer-to-peer data transport on top of support offered by existing browsers. I came across WebTorrent when I was doing some research on what potential future options might exist for more scalable distribution of free and open source libraries and applications. In this post, I want to share some thoughts and observations I jotted down while considering WebTorrent.

Read more →

Technically I'm microblogging now.

I am a big fan of the open web, and although I have enjoyed Twitter, the platform has regressed dramatically in form and function since I first adopted it. I remember when Twitter actively avoided building a walled garden, with fantastic APIs and RSS feeds open to the public. Much of the popularity of the platform hinged upon the incredible third party applications and integrations developers like me built in the first five-ish years of its existence. Over time the site has strayed from open APIs and standards, and while I still enjoy Twitter, I want some more flexibility, which is why you can now subscribe to my microblog with any RSS-capable client.

Read more →

Synchronizing notes with Nextcloud and Vimwiki

The quantity of things I need to keep track of or be responsible for has exploded in the past few years, so much so that I have had to really focus on organizing my “personal knowledgebase.” When I originally tried to spend some time improving my information management system, I found numerous different services offering to improve my productivity and to help me keep track of everything. Invariably many of these tools were web apps. In order to quickly and productively work with information, a <textarea/> in a web page is the choice of just about last resort. I recently revisited Vimwiki and have been quite satisfied both by my productivity boost and the benefits that come with having raw text to work with. The best benefit: easy synchronization of notes with Nextcloud.

Read more →

Reverse proxying a Tide application with Nginx

Every now and again I’ll encounter a silly problem, fix it, forget about it, and then later run into the exact same problem again. Today’s example is a confusing error I encountered when reverse-proxying a Tide application with Nginx. In the Tide application, I was greeted with an ever-so-descriptive error:

Read more →

Multiple Let's Encrypt domains in a single Nginx server block

Nginx is a fantastic web server and reverse proxy to use with Let’s Encrypt, but when dealing with multiple domains it can be a bit tedious to configure. I have been moving services into more FreeBSD jails as I alluded to in my previous post, among them the general Nginx proxy jail which I have serving my HTTP-based services. Using Let’s Encrypt for TLS, I found myself declaring multiple server blocks inside my virtual host configurations to handle the apex domain (e.g. dotdotvote.com), the www subdomain, and vanity domains (e.g. dotdot.vote). With the help of Membear and MTecknology in the #nginx channel on Freenode, I was able to refactor multiple largely redundant server blocks into one.

Read more →

Using FreeBSD's pkg(1) with an 'offline' jail

In the modern era of highly connected software, I have been trying to “offline” as many of my personal services as I can. The ideal scenario is a service running in an environment where it cannot reach other nodes on the network, or in some cases even route back to the public internet. To accomplish this I have been working with FreeBSD jails quite a bit, creating a service per jail in hopes of achieving high levels of isolation between them. This approach has a pretty notable problem at first glance: if you need to install software from remote sources in the jail, how do you keep it “offline”?

Read more →

Loving the PinePower

My current available working space is at an all-time low, which has made the dimensions of everything around me much more important. While I can never become one of those extreme minimalists who works with only their laptop on a park bench, next to their camper van (or whatever), I have been pushing myself to become more space-efficient with my electronics. This includes how they are all powered, so when I learned about the PinePower device, I ordered it immediately.

Read more →

Intentionally leaking AWS keys

“Never check secrets into source control” is one of those rules that is 100% correct, until it’s not. There are no universal laws in software, and recently I had a reason to break this one. I checked AWS keys into a Git repository. I then pushed those commits to a public repository on GitHub. I did this intentionally, and lived to tell the tale. You almost certainly should never do this, so I thought I would share what happens when you do.

Read more →

Corporate dependence in free and open source projects

The relationship between most open source developers and corporations engaging in open source work is rife with paradoxes. Developers want to be paid for their work, but when a company hires too many developers for a project, others clutch their pearls and grow concerned that the company is “taking over the project.” Large projects have significant expenses, but when companies join foundations established to help secure those funds, they may also be admonished for “not really contributing to the project.” If a company creates and opens up a new technology, users and developers inevitably come to assume that the company should be perpetually responsible for the on-going development, improvement, and maintenance of the project; to do otherwise would be “betraying the open source userbase.”

Read more →

Finally a successful winter garden

Of all the bizarre things to have happened in 2020, my winter garden may be one of the more benign occurrences. I started gardening seven or eight years ago in Berkeley. The long backyard with excellent sunlight rewarded me with incredible tomato harvests summer after summer. Autumn became the time when everything would get thrashed or covered up to lie fallow through the wet winter months in Northern California. After moving to Santa Rosa, my gardening became much more serious, but I still packed it all in around October/November. The last few winter seasons I have tried a winter garden with little success, but this year the winter garden is astounding.

Read more →

Parsing Jenkins Pipeline without Jenkins

Writing and locally verifying a CI/CD pipeline is a challenge thousands of developers face, which I’m hoping to make a little bit easier with a new tool named Jenkins Declarative Parser (jdp). Jenkins Pipeline is one of the most important advancements made in the last 10 years for Jenkins; it can, however, behave like a frustrating black box for many new and experienced Jenkins users. The goal with jdp is to provide a lightweight and easy-to-run utility and library for validating declarative Jenkinsfiles.

Read more →