<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://brokenco.de//feed/by_tag/podcast.xml" rel="self" type="application/atom+xml" /><link href="https://brokenco.de//" rel="alternate" type="text/html" /><updated>2026-04-12T21:39:52+00:00</updated><id>https://brokenco.de//feed/by_tag/podcast.xml</id><title type="html">rtyler</title><subtitle>a moderately technical blog</subtitle><author><name>R. Tyler Croy</name></author><entry><title type="html">Screaming in the Cloud</title><link href="https://brokenco.de//2026/02/13/screaming-in-the-cloud.html" rel="alternate" type="text/html" title="Screaming in the Cloud" /><published>2026-02-13T00:00:00+00:00</published><updated>2026-02-13T00:00:00+00:00</updated><id>https://brokenco.de//2026/02/13/screaming-in-the-cloud</id><content type="html" xml:base="https://brokenco.de//2026/02/13/screaming-in-the-cloud.html"><![CDATA[<p>One of the reasons I work where I work is because of the fascinating
data-at-scale problems that they have. This has led me deep into the world of
<a href="https://delta.io">Delta Lake</a> and AWS S3.  Not one to take anything too
seriously, I have been cooking up absolutely bonkers solutions to some of these
<em>billions-scale</em> challenges I am tasked with solving.</p>

<p>Recently I was fortunate enough to discuss some of these objectively insane ideas
with an old PuppetConf pal, <a href="https://www.linkedin.com/in/coquinn/">Corey Quinn</a>.</p>

<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/TZj38Bm1DC4?si=m_jo0HOFPHqPC--2" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<p>In <a href="https://tech.scribd.com/blog/2026/content-crush.html">this post</a> I wrote
about the design of Content Crush and how Scribd is consolidating objects in S3
to minimize our costs.</p>

<p><em>Checking if files are damaged? $100K. Using newer S3 tools? Way too expensive.
Normal solutions don’t work anymore. Tyler shares how with this much data, you
can’t just throw money at the problem, but rather you have to engineer your way
out.</em></p>

<p>For better or worse, I have been having so much fun coming up with crazy data solutions
during the day that I am also doing it on nights and weekends with my
consultancy <a href="https://www.buoyantdata.com">Buoyant Data</a>.</p>

<p>In the coming months I’m expecting to have some more time free up, so I’m
hoping to find another couple of clients who need some AWS and data expertise to
spice up their infrastructure! You can find me at
<a href="mailto:rtyler@buoyantdata.com">rtyler@buoyantdata.com</a> for that type of thing,
but if you just want to share your own crazy ideas with me, or commiserate with
me about S3, you can find me at
<a href="mailto:rtyler@brokenco.de">rtyler@brokenco.de</a>.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="opinion" /><category term="aws" /><category term="podcast" /><summary type="html"><![CDATA[One of the reasons I work where I work is because of the fascinating data-at-scale problems that they have. This has led me deep into the world of Delta Lake and AWS S3. Not one to take anything too seriously, I have been cooking up absolutely bonkers solutions to some of these billions-scale challenges I am tasked with solving.]]></summary></entry><entry><title type="html">On Data Engineering Central</title><link href="https://brokenco.de//2026/02/04/data-engineering-central.html" rel="alternate" type="text/html" title="On Data Engineering Central" /><published>2026-02-04T00:00:00+00:00</published><updated>2026-02-04T00:00:00+00:00</updated><id>https://brokenco.de//2026/02/04/data-engineering-central</id><content type="html" xml:base="https://brokenco.de//2026/02/04/data-engineering-central.html"><![CDATA[<p>I was lucky enough to <a href="https://dataengineeringcentral.substack.com/p/the-lakehouse-architecture-multimodal">record a podcast
episode</a>
with Daniel Beach of Data Engineering Central. Daniel and I have known each
other for a couple of years, sharing notes and ideas on the state of the ecosystem,
where it falls down, and where things are getting interesting.</p>

<p>In my opinion <a href="https://dataengineeringcentral.substack.com">Data Engineering
Central</a> has been one of the most
useful broad-ranging surveys of the ecosystem, curated by one crazy
Midwesterner: Daniel. He pulls no punches, and while we share criticisms of AI
in the industry and commercial tools, Daniel’s honesty has also put some of my
work on blast, such as <a href="https://dataengineeringcentral.substack.com/p/_internaldeltaprotocolerror">this
post</a>
about some terrible user experience and lopsided Delta Lake support in
<a href="https://github.com/delta-io/delta-rs">delta-rs</a>.</p>

<p>In his post Daniel highlights some of the topics we got into during our time chatting:</p>

<blockquote>
  <ul>
    <li>What the Lakehouse architecture gets right—and where it still falls short</li>
    <li>Why multimodal data (text, images, audio, video, embeddings) changes everything</li>
    <li>How open table formats like Delta Lake fit into the next generation of data platforms</li>
    <li>The growing gap between data tooling hype and day-to-day data engineering reality</li>
    <li>What skills and architectural thinking will matter most for data engineers over the next decade</li>
  </ul>
</blockquote>

<center><iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/WLlko-liHMg?si=9aGp1v-6nm2kbya0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe></center>

<p>I encourage you to <a href="https://dataengineeringcentral.substack.com/">subscribe</a> to
his newsletter or if that’s not your jam, you can <a href="https://dataengineeringcentral.substack.com/feed">subscribe to the RSS
feed</a> too.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="software" /><category term="dataeng" /><category term="buoyantdata" /><category term="databricks" /><category term="podcast" /><summary type="html"><![CDATA[I was lucky enough to record a podcast episode with Daniel Beach of Data Engineering Central. Daniel and I have known each other for a couple years sharing notes and ideas on the state of the ecosystem, where it falls down, and where things are getting interesting.]]></summary></entry></feed>