Nothing says “I made it!” quite like being published, which is why I am thrilled to share that Delta Lake: The Definitive Guide is available for purchase, and I kind of helped! I wanted to share a little bit about how my contributions (Chapter 6!) came about, because my entrance into the Delta Lake ecosystem was about as unplanned as my authorship of part of this wonderful book.
Howdy!
Welcome to my blog where I write about software development, cycling, and other random nonsense. This is not the only place I write; you can find more words I typed on the Buoyant Data blog, the Scribd tech blog, and GitHub.
Data and AI Summit 2024 presentations
This year has been so jam-packed with activities that I forgot to share some videos from Data and AI Summit 2024 this past summer! The annual conference hosted by Databricks has become one of my favorites for meeting other Delta Lake users and developers to discuss the future of large-scale data ingestion and processing. This year, however, I overdid it a little bit.
Always dig deeper into the error
The staggering complexity of modern software makes it impossible for us to truly understand what is happening while our code runs, but when it fails, there is always something we can learn. At the beginning of my career we, the industry, generally understood that programs were getting complex. Without hesitation we made things more complex, more distributed, and somehow more coupled. Failure is a “learning opportunity”, and those opportunities are in abundance.
Who is "R Tyler Croy"
I asked a large language model this question:
Mr. Sas, here to save the day.
I have always been a technology scavenger, picking up cheap or disused computers for parts or tinkering. Last year when I picked up a full-height server cabinet, a new world of rack-mountable junk finally became possible! One lucky Craigslist find ended up being an older 2U IBM xSeries server with 8 drive sleds that was described as “sorta working” by the owner, who was shedding some extra stuff for his move across the country. I accepted the challenge, forked over a few Jacksons, and brought the machine home.
A large language model is not a good co-pilot
Large language models (LLMs) seem to be good at only two things: summarizing text and making up bullshit. The idea that a general-purpose LLM is going to herald a new age of software development efficiency is misleading in most cases, bordering on malicious. While there are a number of other recommendation or predictive machine learning models which can improve software development efficiency, LLMs’ propensity to generate bullshit undermines trust in a way that makes me question their baseline validity as a software development tool.
Improving lock performance for delta-rs
I have had the good fortune this year to help a number of organizations develop and deploy native data applications in Python and Rust using a project I helped found: delta-rs. At a high level, delta-rs is a Rust implementation of the Delta Lake protocol which offers ACID-like transactions for data lake use-cases. One of my big areas of focus has been evaluating and improving performance in highly concurrent runtime environments on AWS.
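To give a flavor of what a native delta-rs application looks like, here is a minimal sketch of opening a Delta table with the `deltalake` crate. The crate and method names reflect my reading of delta-rs and may differ between releases, and the table path is purely hypothetical.

```rust
// Minimal sketch, assuming the `deltalake` and `tokio` crates are in
// Cargo.toml; method names may vary between delta-rs releases.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A local path keeps the example simple; S3 URIs work once the crate's
    // S3 support is enabled, which is where concurrent-writer locking
    // performance becomes interesting.
    let table = deltalake::open_table("./data/example_table").await?;
    println!("loaded table at version {:?}", table.version());
    Ok(())
}
```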
Solving a FreeBSD Jails issue: interface already exists
For a long time after I rebuilt my jails host, I could not restart a certain number of jails due to an “interface already exists” error. For the life of me I could not make sense of it. The services running in the jails were useful but not required, so I put off tinkering with it. I thought that I would magically stumble into the solution in my sleep, or something equally silly.
Hashicorp Nomad, almost but not quite good
My home office has grown in size, and for the first time in decades I believe I have a surplus of compute power at my disposal. These computational resources are not in the form of some big beefy machine but a number of smaller machines all tied together by a gigabit network, hiding away in a server cabinet. The big problem has become how to effectively utilize all that computational power, so I turned to Nomad to orchestrate arbitrary workloads on static and ephemeral (netboot) machines. As the title would suggest, it’s almost good but still falls frustratingly short for my use-cases.
Why we re-export symbols from other libraries in Rust
Dependency management in the Rust ecosystem is fairly mature from my perspective: with crates.io, Cargo, and some cultural norms around semantic versioning, I feel safer with dependencies in Rust than I have in previous toolchains. It’s far from perfect, however, and this question helps highlight one of the quirks of how Rust dependency management does or does not work, depending on your perspective:
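As a minimal sketch of the re-export pattern in question, the modules below stand in for separate crates (the names are hypothetical, not from the post): a library re-exports a type from its dependency so that downstream code never has to depend on that crate directly, or worry about ending up with a mismatched version of the same type.

```rust
// Single-file sketch; `dependency` and `library` are hypothetical modules
// standing in for separate crates.
mod dependency {
    // A type that shows up in the library's public API.
    pub struct StorageOptions {
        pub region: String,
    }
}

mod library {
    // Re-export the symbol so callers can name it through `library`
    // without adding `dependency` to their own Cargo.toml, and without
    // risking a second, incompatible version of the same type.
    pub use super::dependency::StorageOptions;

    pub fn connect(opts: &StorageOptions) -> String {
        format!("connected to {}", opts.region)
    }
}

fn main() {
    // The caller only ever imports from `library`.
    let opts = library::StorageOptions {
        region: "us-west-2".into(),
    };
    println!("{}", library::connect(&opts));
}
```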