<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://brokenco.de//feed/by_tag/dataengineering.xml" rel="self" type="application/atom+xml" /><link href="https://brokenco.de//" rel="alternate" type="text/html" /><updated>2026-04-30T12:21:39+00:00</updated><id>https://brokenco.de//feed/by_tag/dataengineering.xml</id><title type="html">rtyler</title><subtitle>a moderately technical blog</subtitle><author><name>R. Tyler Croy</name></author><entry><title type="html">2026 April: Recently Studied Stuff</title><link href="https://brokenco.de//2026/04/30/fresh-from-rss.html" rel="alternate" type="text/html" title="2026 April: Recently Studied Stuff" /><published>2026-04-30T00:00:00+00:00</published><updated>2026-04-30T00:00:00+00:00</updated><id>https://brokenco.de//2026/04/30/fresh-from-rss</id><content type="html" xml:base="https://brokenco.de//2026/04/30/fresh-from-rss.html"><![CDATA[<p>Similar to last month I have given more intention to some of the interesting
things that I have stumbled across in my feed reader or the fediverse. Rather
than just a quip, boost, or reply, I have wanted to consolidate these thoughts
with more permanence here on my blog.</p>

<p>Chris’ talk below at <a href="https://northbaypython.org/">North Bay Python</a> was, as
his always are, well-delivered and worth consideration.</p>

<center><iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/d7AeWFbOTHg?si=zW0bHhRpj--dsrdW" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe></center>

<p>The conclusion that he
draws towards the end is similar to something I was <a href="/2025/09/20/sacrificing-the-understanding.html">noodling on last
year</a>:</p>

<blockquote>
  <p>At some point somebody, somewhere, is going to have to actually understand
how things work.</p>
</blockquote>

<p>Chris makes the point, as he typically does, much more thoughtfully and with a
stronger philosophical base.</p>

<hr />

<p>Had some discussions with the <a href="https://github.com/delta-io/delta-kernel-rs">delta-kernel-rs</a> developers after they mistakenly added a <em>ton</em> of new files to <code class="language-plaintext highlighter-rouge">tests/</code>, blowing up test cycle times. Another community member shared <a href="https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html">this great overview</a> about <strong>not</strong> using Cargo integration tests.</p>

<hr />

<p>Catching up on <a href="https://open.substack.com/pub/dataengineeringcentral/p/revisiting-data-quality?utm_source=share&amp;utm_medium=android&amp;r=cxg56">Daniel’s thoughts on Data
Quality</a>
and reconsidering the domain. The generation of slop has resulted in renewed
discussions of “but how do we ensure correctness?” which is a great question to
be trying to answer, but I am still rather disappointed with the state of the
art for data quality tooling.</p>

<hr />

<p>I recommend <a href="https://etbe.coker.com.au/2026/03/29/communication-hostile-ais/">this blog
post</a> which
has some good citations for negative AI behaviors affecting free and open
source communities.</p>

<blockquote>
  <p>This is going to be a difficult problem to solve, more difficult than the
email spam problem we have been unable to solve after 30
years of working on it.</p>

  <p>This is also a very important problem, we are currently in an age where we have
access to information that most people couldn’t even dream of 30 years ago. We
also have disinformation that combines some of the worst aspects of
authoritarian regimes throughout history combined with the worst aspects of
cult brainwashing. If we lose access to the information but the disinformation
remains (or get worse) then the result will be terrible.</p>
</blockquote>

<hr />

<p>I really enjoy <a href="https://planet.debian.org">Planet Debian</a> as an aggregator of an international set of voices from the Debian community. I get exposed to so many different viewpoints from around the free software ecosystem, which I really value. This past week I read
<a href="https://blog.bofh.it/debian/id_473">this blog post</a> by a Debian maintainer, which flummoxed me so much that I <a href="/2026/03/25/do-not-comply.html">wrote out my thoughts on the topic here</a>.</p>

<hr />

<p>Streaming tar over SSH is one of the more novel Unix tricks I don’t get to use
much anymore. <a href="https://drewdevault.com/2026/03/28/2026-03-28-rsync-without-rsync.html">Drew
DeVault</a>
shared some helpful tips for using it without needing to use incantations of
<code class="language-plaintext highlighter-rouge">rsync(1)</code>.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="rss" /><category term="deltalake" /><category term="data" /><category term="dataengineering" /><category term="opensource" /><summary type="html"><![CDATA[Similar to last month I have given more intention to some of the interesting things that I have stumbled across in my feed reader or the fediverse. Rather than just a quip, boost, or reply, I have wanted to consolidate these thoughts with more permanence here on my blog.]]></summary></entry></feed>