Screaming in the Cloud

2026-02-13T00:00:00+00:00

One of the reasons I work where I work is the fascinating data-at-scale problems they have. This has led me deep into the world of Delta Lake and AWS S3. Not one to take anything too seriously, I have been cooking up absolutely bonkers solutions to some of the billions-scale challenges I am tasked with solving.

Recently I was fortunate enough to discuss some of these objectively insane ideas with an old PuppetConf pal, Corey Quinn.

In this post I wrote about the design of Content Crush and how Scribd is consolidating objects in S3 to minimize our costs.

Checking if files are damaged? $100K. Using newer S3 tools? Way too expensive. Normal solutions don’t work anymore. Tyler shares how with this much data, you can’t just throw money at the problem, but rather you have to engineer your way out.

For better or worse, I have been having so much fun coming up with crazy data solutions during the day that I am also doing it on nights and weekends with my consultancy, Buoyant Data.

In the coming months I’m expecting to have some more time free up, so I’m hoping to find a couple more clients who need some AWS and data expertise to spice up their infrastructure! You can find me at rtyler@buoyantdata.com for that type of thing, but if you just want to share your own crazy ideas with me, or commiserate with me about S3, you can find me at rtyler@brokenco.de.

On Data Engineering Central

2026-02-04T00:00:00+00:00

I was lucky enough to record a podcast episode with Daniel Beach of Data Engineering Central. Daniel and I have known each other for a couple of years, sharing notes and ideas on the state of the ecosystem, where it falls down, and where things are getting interesting.

In my opinion, Data Engineering Central has been one of the most useful broad-ranging surveys of the ecosystem, curated by one crazy Midwesterner: Daniel. He pulls no punches, and while we share criticisms of AI in the industry and commercial tools, Daniel’s honesty has also put some of my work on blast, such as this post about some terrible user experience and lopsided Delta Lake support in delta-rs.

In his post Daniel highlights some of the topics we got into during our time chatting:

What the Lakehouse architecture gets right—and where it still falls short

Why multimodal data (text, images, audio, video, embeddings) changes everything

How open table formats like Delta Lake fit into the next generation of data platforms

The growing gap between data tooling hype and day-to-day data engineering reality

What skills and architectural thinking will matter most for data engineers over the next decade

I encourage you to subscribe to his newsletter or, if that’s not your jam, to the RSS feed.