Howdy!

Welcome to my blog where I write about software development, cycling, and other random nonsense. This is not the only place I write; you can find more words I typed on the Buoyant Data blog, the Scribd tech blog, and GitHub.

The y-axis is important

Automated anomaly detection has become invaluable in the observability tools I rely on for production systems. More advanced vendors like Datadog have introduced “AI” functionality, which is really just conventional machine-learning models for proactively identifying suspicious behavior.

Read more →

Abstraction cannot hide complexity

One of the first advanced concepts taught to people learning to code is abstraction: the practice of taking logic and hiding it. Articles and books on object-oriented programming frequently encourage the reader to take logic and complexity and build simpler APIs around them. Abstracting and encapsulating that logic helps us offload cognitive burden and allows us to focus on other aspects of the program. Abstraction is useful. Abstraction is good.

Read more →

Screaming in the Cloud

One of the reasons I work where I work is the fascinating data-at-scale problems that they have. This has led me deep into the world of Delta Lake and AWS S3. Not one to take anything too seriously, I have been cooking up absolutely bonkers solutions to some of these billions-scale challenges I am tasked with solving.

Read more →

Decentralized

The wisdom that experience can bring means I now look at statements like “organizing the world’s information” through a very different lens. Google’s mission statement was something I did not view as coercive, because I didn’t understand the power exerted by those who collect and organize data.

Read more →

Supporting the Blue Tailed Skinks

Cycling in northern California is something special. Over the past few decades Sonoma County has produced numerous professional mountain, road, and cyclocross athletes. Cycling is an institution; everybody is riding the roads or the mountains around here. It is not uncommon to see beat-up work trucks, Rivians, sensible sedans, and everything in between parked near some of our mountain biking trails.

Read more →

Managing buffer overflows

Working in data storage and services, it can seem like everything revolves around capacity and throughput. We don’t think of throughput until it is lacking: a traffic jam, a flipped breaker, or an overflowing drain. There are architectural changes we make to improve throughput and there are tactical fixes. This post is about the tactical fixes.

Read more →

Multimodal with Delta Lake

The rate of change for data storage systems has accelerated to a frenzied pace and most storage architectures I have seen simply cannot keep up. Much of my time is spent thinking about large-scale tabular data stored in Delta Lake, which is one of the “lakehouse” storage systems along with Apache Iceberg and others. These storage architectures were developed 5-10 years ago to solve problems many organizations faced when moving from data warehouse architectures to massive-scale structured data needs. The storage changes we need today must support “multimodal data”, which is a dramatic departure in many ways from the traditional query and usage patterns our existing infrastructure supports.

Read more →

The challenges facing Delta Kernel

The Delta Kernel is one of the most technically challenging and ambitious open source projects I have worked on. Kernel is fundamentally about unifying all of our needs and wants from a Delta Lake implementation into a single cohesive yet pluggable API surface. Towards the end of 2025 TD asked me to jot down some of the issues which have been frustrating me and/or slowing down the adoption of Kernel in projects like delta-rs. At the outset of the project we all discussed concerns about what could actually be possible as we set out into uncharted territory. In many ways we have succeeded; in others we have failed.

Read more →

Parallelism is a little tricky

In theory many developers understand concurrency and parallelism; in practice I think almost none of us do. At least not all the time. Building a mental model of highly parallel interdependent software is incredibly time-consuming, difficult, and error-prone. I have recently been doing a lot of performance analysis with both delta-rs and delta-kernel-rs. In the process I have had to check some of my own assumptions of how things should work compared to how they do work.

Read more →

Things you should know about Url in Rust

I would guess most developers think of URLs as a string with an https:// at the beginning. In many cases there are assumptions made about these URL-shaped strings which may be confusing, misleading, or flat out incorrect. The url crate is compliant with the RFCs about URLs, but while being technically correct is the best kind of correct, that doesn’t mean it isn’t still confusing.

Read more →

The end of the road for kafka-delta-ingest

After five years in production, kafka-delta-ingest at Scribd has been shut off and removed from our infrastructure. kafka-delta-ingest was the motivation behind my team creating delta-rs, the most successful open source project I have started to date. With kafka-delta-ingest we achieved our original stated goals and reduced streaming data ingestion costs by 95%. In the time since, however, we have further reduced that cost with even more efficient infrastructure.

Read more →

R.I.P. S3 Object Lambda

Did you know that AWS S3 is almost 20 years old? The “cloud” as a concept is fairly recent but in the time-distortion that has occurred since the rise of the internet, I think many of us have lost track of how old some of these public cloud providers are, and as a side-effect, how old their technology offerings can become. Periodically you need to clean out the attic, and this week AWS did just that with their “AWS Service Availability Updates.”

Read more →

Sacrifice to AI

What a wild time to be alive. It’s really quite something. How wonderful it is to have a phrase like “what a wild time to be alive” that could mean a dozen different moderately positive or extremely negative things depending on where in your news or social feed you find this article.

Read more →