Working over SSH on a big beefy remote machine is a great way to extend the
life of any laptop, but it presents challenges when developing network-based
services. Fortunately, OpenSSH has a “port forwarding” feature that has been
around for years. Port forwarding lets you tunnel a port from the remote
machine back to your local machine, so a remote service bound to port 8000 can
be reached on your own localhost:8000. When I first learned about this, I would
fiddle around with my ssh invocations or hardcode a list of potential ports to
forward in my ~/.ssh/config. If I was working on a new service that needed a
port not yet forwarded, I would disconnect, add it to the list of ports in my
config file, and then reconnect. That was until my pal nibz (nibalizer) showed
me how to dynamically add port forwards to an existing session.
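The trick relies on OpenSSH’s escape sequences: inside a running session, pressing Enter and then ~C opens an ssh> prompt where new forwards can be added without reconnecting. A rough sketch, with a made-up hostname and example ports:

```
# Make the remote machine's port 8000 reachable on your local port 8000
ssh -L 8000:localhost:8000 bigbeefy.example.com

# Later, inside the already running session: press Enter, then ~C, to get
# the ssh> prompt and add another forward on the fly
ssh> -L 8001:localhost:8001
Forwarding port.
```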
Generating pre-signed S3 URLs in Rust
Creating pre-signed S3 URLs in Rust took me a little more brainpower than I had anticipated, so I thought I would share how to generate them using Rusoto. Pre-signed URLs are purpose-built URLs for fetching or uploading specific objects in S3, and they are especially useful when granting mobile or web clients access to those objects. In my use case, I wanted the clients of my web service to be able to access a few specific objects from a bucket.
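As a rough sketch of what that can look like with Rusoto (the bucket, key, and region below are placeholders, and error handling is waved away with unwrap), a GET URL is signed entirely locally, with no call out to S3:

```rust
use rusoto_core::credential::{DefaultCredentialsProvider, ProvideAwsCredentials};
use rusoto_core::Region;
use rusoto_s3::util::{PreSignedRequest, PreSignedRequestOption};
use rusoto_s3::GetObjectRequest;
use std::time::Duration;

#[tokio::main]
async fn main() {
    // Resolve credentials from the usual provider chain (env vars, profile, etc.)
    let credentials = DefaultCredentialsProvider::new()
        .unwrap()
        .credentials()
        .await
        .unwrap();

    let request = GetObjectRequest {
        bucket: "my-example-bucket".into(),      // placeholder bucket
        key: "reports/2021/summary.pdf".into(),  // placeholder object key
        ..Default::default()
    };

    // Signing happens locally; the resulting URL only works until it expires
    let url = request.get_presigned_url(
        &Region::UsWest2,
        &credentials,
        &PreSignedRequestOption {
            expires_in: Duration::from_secs(300),
        },
    );
    println!("{}", url);
}
```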
Why build a native interface to Delta Lake
Investing in the development of delta-rs is one of the longer shots I have taken recently, and with my upcoming Data and AI Summit talk on the subject, I wanted to share some back story on the project. As I have mentioned before, Delta Lake is a key component of Scribd’s data platform. We selected Delta Lake for many reasons, including that it is an open and vendor-neutral project. Delta Lake has opened up countless opportunities for our data, and over the past year I have seen the potential for many more.
Recovering from disasters with Delta Lake
Entering the data platform space with a background in more traditional
production operations is a lot of fun, especially when you ask
questions like “what if X goes horribly wrong?” My favorite scenario to
consider is: “how much damage could one accidentally cause with our existing
policies and controls?” At Scribd we have made
Delta Lake a cornerstone of our data platform, and as such
I’ve spent a lot of time thinking about what could go wrong and how we would
defend against it.
Understanding big data partitioning
Data partitioning is one of the core principles to apply when working with large data sets, but do you know what it actually means for the storage format? I didn’t! Many “big data” storage systems such as HDFS, S3, and Azure Data Lake Storage are all effectively file systems. This past year or so, I’ve become much more familiar with Delta Lake and had just assumed that data partitioning was something done at the transaction log level. Turns out I guessed wrong.
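In practice the partitioning shows up as nothing fancier than a directory layout. A hypothetical Delta table partitioned by date looks roughly like this in storage, with the partition value encoded in the path and the transaction log sitting alongside the data files:

```
my_table/
├── _delta_log/
│   └── 00000000000000000000.json
├── date=2021-01-01/
│   └── part-00000-....snappy.parquet
└── date=2021-01-02/
    └── part-00000-....snappy.parquet
```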
Building a goede search engine
This weekend I finally got around to building a little Rust “full text search engine” based on the educational post written by my Scribd colleague Bart, titled Building a full-text search engine in 150 lines of Python code. Bart did a great job writing an accessible post that introduces some common search concepts using Python; my objective wasn’t necessarily to write something faster or better, but to use the exercise as Rust practice. My day job is no longer writing code, so the opportunity to tackle a problem of fixed scope that would work out my Rust muscles was too good to pass up. In this post I want to share some things I learned in the process of duplicating Bart’s work.
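At the heart of the exercise is the same inverted index Bart builds in Python: map each token to the set of documents containing it. A stripped-down sketch of that idea in Rust (nowhere near the real thing, but it shows the shape):

```rust
use std::collections::{HashMap, HashSet};

/// A toy inverted index: each token maps to the ids of documents containing it.
#[derive(Default)]
struct Index {
    postings: HashMap<String, HashSet<usize>>,
}

impl Index {
    fn add_document(&mut self, id: usize, text: &str) {
        for token in text.to_lowercase().split_whitespace() {
            // Crude normalization: keep only alphanumeric characters
            let token: String = token.chars().filter(|c| c.is_alphanumeric()).collect();
            if !token.is_empty() {
                self.postings.entry(token).or_default().insert(id);
            }
        }
    }

    /// Return the documents containing *every* term in the query (an AND search).
    fn search(&self, query: &str) -> HashSet<usize> {
        query
            .to_lowercase()
            .split_whitespace()
            .map(|term| self.postings.get(term).cloned().unwrap_or_default())
            .reduce(|acc, ids| acc.intersection(&ids).cloned().collect())
            .unwrap_or_default()
    }
}

fn main() {
    let mut index = Index::default();
    index.add_document(1, "A wiki about abstract wikis");
    index.add_document(2, "Abstract algebra for the working programmer");
    println!("{:?}", index.search("abstract wiki")); // prints {1}
}
```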
Subscribe to my "Podcast Picks"
I have always been a fan of podcasts, but have never really had a good way to share the interesting things I am listening to. A couple of weeks ago I struck upon an idea that seems bafflingly simple in retrospect: I could just host my own podcast feed.
Software-defined networks with FreeBSD Jails
As a comprehensive operating system, FreeBSD never ceases to impress me; the
recent iterations of FreeBSD
Jails, for example, have been an
absolute joy to use. The introduction of the
vnet(9)
network subsystem has completely transformed what I had originally thought
about software-defined networking. My previous exposure to the concept of
software-defined
networking was
through both OpenStack and Docker, two very
different approaches to the broad domain of “SDN”. FreeBSD’s vnet system has
resonated most strongly with me and has allowed me some measure of success in
deploying real production-grade virtualized networks.
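To give a flavor of what that looks like, here is a minimal and entirely hypothetical jail.conf entry: the jail gets its own network stack via vnet, along with the jail-side half of an epair(4) interface created on the host:

```
# /etc/jail.conf -- names, paths, and interfaces here are made up
www {
    path = "/usr/local/jails/www";
    host.hostname = "www.example.org";

    vnet;                            # give the jail its own virtual network stack
    vnet.interface = "epair0b";      # move the jail side of the epair into it

    exec.prestart = "ifconfig epair0 create up";  # epair0a stays on the host
    exec.start    = "/bin/sh /etc/rc";
    exec.stop     = "/bin/sh /etc/rc.shutdown";
    exec.poststop = "ifconfig epair0a destroy";
}
```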
Dynamically adding parameters in sqlx
Bridging data types between the database and a programming language is such a
foundational feature of most database-backed applications that many developers
overlook it, until it doesn’t work. For many of my Rust-based applications I
have been enjoying sqlx, which strikes
the right balance between “too close to the database” (raw cursors
and buckets of bytes) and “too close to the programming language” (magic
object-relational mappings). It reminds me a lot of what I wanted Ruby Object
Mapper to be back when it was called “data mapper.” sqlx
can do many things, but it’s not a silver bullet and it errs on the side of
“less magic” in many cases, which leaves the developer to deal with some
trade-offs. Recently I found myself with just such a trade-off: mapping a Uuid such that I could do IN queries.
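To give a sense of the shape of that trade-off, here is one way to handle it, though not necessarily the one the post settles on: build the IN clause with one placeholder per value, then bind each Uuid in a loop. The sketch assumes PostgreSQL, sqlx’s uuid feature, and a made-up users table:

```rust
use sqlx::{PgPool, Row};
use uuid::Uuid;

// Build "SELECT ... WHERE id IN ($1, $2, ...)" with one placeholder per id,
// then bind every Uuid individually.
async fn names_for(pool: &PgPool, ids: &[Uuid]) -> Result<Vec<String>, sqlx::Error> {
    if ids.is_empty() {
        // "IN ()" is not valid SQL, so bail out early
        return Ok(Vec::new());
    }

    let placeholders: Vec<String> = (1..=ids.len()).map(|i| format!("${}", i)).collect();
    let sql = format!(
        "SELECT name FROM users WHERE id IN ({})",
        placeholders.join(", ")
    );

    let mut query = sqlx::query(&sql);
    for id in ids {
        query = query.bind(id);
    }

    let rows = query.fetch_all(pool).await?;
    Ok(rows.iter().map(|row| row.get("name")).collect())
}
```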
Thoughts on WebTorrent
WebTorrent is one of the most novel uses of modern browser technologies that I have recently learned about. Using WebRTC, it is able to implement a truly peer-to-peer data transport on top of what existing browsers already support. I came across WebTorrent while researching what future options might exist for more scalable distribution of free and open source libraries and applications. In this post, I want to share some thoughts and observations I jotted down while considering WebTorrent.
Technically I'm microblogging now.
I am a big fan of the open web, and although I have enjoyed Twitter, the platform has regressed dramatically in form and function since I first adopted it. I remember when Twitter actively avoided building a walled garden, with fantastic APIs and RSS feeds open to the public. Much of the platform’s popularity hinged upon the incredible third-party applications and integrations developers like me built in the first five-ish years of its existence. Over time the site has strayed from open APIs and standards, and while I still enjoy Twitter, I want some more flexibility, which is why you can now subscribe to my microblog with any RSS-capable client.
Synchronizing notes with Nextcloud and Vimwiki
The quantity of things I need to keep track of or be responsible for has
exploded in the past few years, so much so that I have had to really focus on
organizing my “personal knowledgebase.” When I originally tried to spend some
time improving my information management system, I found numerous different
services offering to improve my productivity and to help me keep track of
everything. Invariably many of these tools were web apps, and for working with
information quickly and productively, a <textarea/> in a web page is just about
the last resort. I recently revisited
Vimwiki and have been quite satisfied both by
my productivity boost and the benefits that come with having raw
text to work with. The best benefit: easy synchronization of notes with Nextcloud.
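The setup itself is pleasantly boring: point Vimwiki at a directory the Nextcloud desktop client already synchronizes, and the plain-text notes follow you everywhere. Something along these lines in a .vimrc does the job (the path and the choice of Markdown syntax here are my own assumptions):

```vim
" Keep the wiki inside a folder the Nextcloud client is already syncing
let g:vimwiki_list = [{'path': '~/Nextcloud/wiki/', 'syntax': 'markdown', 'ext': '.md'}]
```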
Reverse proxying a Tide application with Nginx
Every now and again I’ll encounter a silly problem, fix it, forget about it, and then later run into the exact same problem again. Today’s example is a confusing error I encountered when reverse-proxying a Tide application with Nginx. In the Tide application, I was greeted with an ever-so-descriptive error:
Multiple Let's Encrypt domains in a single Nginx server block
Nginx is a fantastic web server and reverse proxy to use
with Let’s Encrypt, but when dealing with multiple
domains it can be a bit tedious to configure. I have been moving services into
more FreeBSD jails as I alluded to in my previous
post, among them the
general Nginx proxy jail which I have serving my HTTP-based services. Using
Let’s Encrypt for TLS, I found myself declaring multiple server blocks inside
my virtual host configurations to handle the apex domain (e.g.
dotdotvote.com), the www subdomain, and vanity domains (e.g.
dotdot.vote). With the help of Membear and MTecknology in the #nginx
channel on Freenode, I was able to refactor multiple
largely redundant server blocks into one.
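The end result looks roughly like the sketch below: a single server block answering for every name, backed by one certificate that carries all of them as subject alternative names. The upstream address is made up, and the real configuration in the post may well differ:

```nginx
server {
    listen 443 ssl;
    # Apex, www, and vanity domains all handled by the same block
    server_name dotdotvote.com www.dotdotvote.com dotdot.vote;

    # One certificate issued with every name above as a SAN, e.g.
    #   certbot certonly -d dotdotvote.com -d www.dotdotvote.com -d dotdot.vote
    ssl_certificate     /usr/local/etc/letsencrypt/live/dotdotvote.com/fullchain.pem;
    ssl_certificate_key /usr/local/etc/letsencrypt/live/dotdotvote.com/privkey.pem;

    location / {
        proxy_pass http://10.0.0.10:8000;  # hypothetical upstream jail
    }
}
```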
Using FreeBSD's pkg(1) with an 'offline' jail
In the modern era of highly connected software, I have been trying to “offline” as many of my personal services as I can. The ideal scenario is a service running in an environment where it cannot reach other nodes on the network, or in some cases even route back to the public internet. To accomplish this I have been working with FreeBSD jails quite a bit, running one service per jail in hopes of achieving a high level of isolation between them. This approach has a pretty notable problem at first glance: if you need to install software from remote sources in the jail, how do you keep it “offline”?
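One approach, and not necessarily the one this post lands on, is to never give the jail any network access at all and instead drive pkg(8) from the host, pointing it at the jail’s filesystem so that the host does the fetching:

```
# Run on the FreeBSD host: fetch packages with the host's network, but
# install them into the jail's root directory (path is hypothetical)
pkg -r /usr/local/jails/webjail install -y nginx
```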
Loving the PinePower
My available working space is at an all-time low, which has made the dimensions of everything around me much more important. While I can never become one of those extreme minimalists who works with only their laptop on a park bench, next to their camper van (or whatever), I have been pushing myself to become more space-efficient with my electronics. This includes how everything is powered, so when I learned about the PinePower device, I ordered it immediately.
Intentionally leaking AWS keys
“Never check secrets into source control” is one of those rules that is 100% correct, until it isn’t. There are no universal laws in software, and recently I had a reason to break this one. I checked AWS keys into a Git repository. I then pushed those commits to a public repository on GitHub. I did this intentionally, and lived to tell the tale. You almost certainly should never do this, so I thought I would share what happens when you do.
Corporate dependence in free and open source projects
The relationship between most open source developers and corporations engaging in open source work is rife with paradoxes. Developers want to be paid for their work, but when a company hires too many developers for a project, others clutch their pearls and grow concerned that the company is “taking over the project.” Large projects have significant expenses, but when companies join foundations established to help secure funding, they may also be admonished for “not really contributing to the project.” If a company creates and opens up a new technology, users and developers inevitably come to assume that the company should be perpetually responsible for the ongoing development, improvement, and maintenance of the project; to do otherwise would be “betraying the open source userbase.”
Finally a successful winter garden
Of all the bizarre things to have happened in 2020, my winter garden may be one of the more benign occurrences. I started gardening seven or eight years ago in Berkeley. The long backyard with excellent sunlight rewarded me with incredible tomato harvests summer after summer. Autumn became the time when everything would get thrashed or covered up to lie fallow through the wet winter months in Northern California. After moving to Santa Rosa, my gardening became much more serious, but I still packed everything in around October/November. For the last few winter seasons I have tried a winter garden with little success, but this year the winter garden is astounding.
Parsing Jenkins Pipeline without Jenkins
Writing and locally verifying a CI/CD pipeline is a challenge thousands of
developers face, and one I’m hoping to make a little bit easier with a new tool
named Jenkins Declarative Parser (jdp).
Jenkins Pipeline is one of the most important advancements made for Jenkins in
the last 10 years, but it can behave like a frustrating black box for new and
experienced Jenkins users alike. The goal with jdp is to provide a lightweight,
easy-to-run utility and library for validating declarative Jenkinsfiles.
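For reference, the kind of file jdp aims to validate is a declarative pipeline like the minimal example below; the point is to catch structural mistakes in files like this without round-tripping through a live Jenkins instance:

```groovy
// A minimal declarative Jenkinsfile of the sort jdp is meant to validate
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                sh 'make'
            }
        }
    }
}
```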