<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://brokenco.de//feed/by_tag/rss.xml" rel="self" type="application/atom+xml" /><link href="https://brokenco.de//" rel="alternate" type="text/html" /><updated>2026-05-03T00:12:50+00:00</updated><id>https://brokenco.de//feed/by_tag/rss.xml</id><title type="html">rtyler</title><subtitle>a moderately technical blog</subtitle><author><name>R. Tyler Croy</name></author><entry><title type="html">2026 April: Recently Studied Stuff</title><link href="https://brokenco.de//2026/04/30/fresh-from-rss.html" rel="alternate" type="text/html" title="2026 April: Recently Studied Stuff" /><published>2026-04-30T00:00:00+00:00</published><updated>2026-04-30T00:00:00+00:00</updated><id>https://brokenco.de//2026/04/30/fresh-from-rss</id><content type="html" xml:base="https://brokenco.de//2026/04/30/fresh-from-rss.html"><![CDATA[<p>Similar to last month I have given more intention to some of the interesting
things that I have stumbled across in my feed reader or the fediverse. Rather
than just a quip, boost, or reply, I have wanted to consolidate these thoughts
with more permanence here on my blog.</p>

<p>Chris’ talk below at <a href="https://northbaypython.org/">North Bay Python</a> was, as
his always are, well-delivered and worth consideration.</p>

<center><iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/d7AeWFbOTHg?si=zW0bHhRpj--dsrdW" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe></center>

<p>The conclusion that he
draws towards the end is similar to something I was <a href="/2025/09/20/sacrificing-the-understanding.html">noodling last
year</a>:</p>

<blockquote>
  <p>At some point somebody, somewhere, is going to have to actually understand
how things work.</p>
</blockquote>

<p>Chris makes the point, as he typically does, much more thoughtfully and with a
stronger philosophical base.</p>

<hr />

<p>I had some discussions with the <a href="https://github.com/delta-io/delta-kernel-rs">delta-kernel-rs</a> developers after they mistakenly added a <em>ton</em> of new files to <code class="language-plaintext highlighter-rouge">tests/</code>, blowing up test cycle times, since Cargo builds each top-level file in that directory as its own separately linked test binary. Another community member shared <a href="https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html">this great overview</a> about <strong>not</strong> using Cargo integration tests.</p>

<hr />

<p>Catching up on <a href="https://open.substack.com/pub/dataengineeringcentral/p/revisiting-data-quality?utm_source=share&amp;utm_medium=android&amp;r=cxg56">Daniel’s thoughts on Data
Quality</a>
and reconsidering the domain. The generation of slop has resulted in renewed
discussions of “but how do we ensure correctness?” which is a great question to
be trying to answer, but I am still rather disappointed with the state of the
art for data quality tooling.</p>

<hr />

<p>I recommend <a href="https://etbe.coker.com.au/2026/03/29/communication-hostile-ais/">this blog
post</a> which
has some good citations for negative AI behaviors affecting free and open
source communities.</p>

<blockquote>
  <p>This is going to be a difficult problem to solve, more difficult than the
email spam problem we have been unable to solve after 30
years of working on it.</p>

  <p>This is also a very important problem, we are currently in an age where we have
access to information that most people couldn’t even dream of 30 years ago. We
also have disinformation that combines some of the worst aspects of
authoritarian regimes throughout history combined with the worst aspects of
cult brainwashing. If we lose access to the information but the disinformation
remains (or get worse) then the result will be terrible.</p>
</blockquote>

<hr />

<p>I really enjoy <a href="https://planet.debian.org">Planet Debian</a> as an aggregator of an international set of voices from the Debian community. I get exposed to so many different viewpoints from around the free software ecosystem, which I really value. This past week I read
<a href="https://blog.bofh.it/debian/id_473">this blog post</a> by a Debian maintainer that flummoxed me so much that I <a href="/2026/03/25/do-not-comply.html">wrote out my thoughts on the topic here</a>.</p>

<hr />

<p>Streaming tar over SSH is one of the more novel Unix tricks I don’t get to use
much anymore. <a href="https://drewdevault.com/2026/03/28/2026-03-28-rsync-without-rsync.html">Drew
DeVault</a>
shared some helpful tips for using it without needing to use incantations of
<code class="language-plaintext highlighter-rouge">rsync(1)</code>.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="rss" /><category term="deltalake" /><category term="data" /><category term="dataengineering" /><category term="opensource" /><summary type="html"><![CDATA[Similar to last month I have given more intention to some of the interesting things that I have stumbled across in my feed reader or the fediverse. Rather than just a quip, boost, or reply, I have wanted to consolidate these thoughts with more permanence here on my blog.]]></summary></entry><entry><title type="html">2026 March: Recently Studied Stuff</title><link href="https://brokenco.de//2026/03/21/fresh-from-rss.html" rel="alternate" type="text/html" title="2026 March: Recently Studied Stuff" /><published>2026-03-21T00:00:00+00:00</published><updated>2026-03-21T00:00:00+00:00</updated><id>https://brokenco.de//2026/03/21/fresh-from-rss</id><content type="html" xml:base="https://brokenco.de//2026/03/21/fresh-from-rss.html"><![CDATA[<p>Over the past week I have made a more conscious effort to keep track of some
really interesting articles that came through my feed reader. I am a big fan of
the open web and the power of RSS for disseminating interesting information
from actual people. Below are some really interesting posts I have read recently!</p>

<p><strong><a href="https://felipe.rs/2024/10/23/arrow-over-http/">Compressed Apache Arrow tables over HTTP</a></strong></p>

<p>When discussing transport protocols for sending data between services at work
recently, a colleague asked “why can’t we just yeet Arrow over HTTP?” It turns out, you <a href="https://github.com/apache/arrow-experiments/tree/main/http/get_simple/python">absolutely can</a> and Arrow IPC streams even have a registered MIME type:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Content-Type: application/vnd.apache.arrow.stream
</code></pre></div></div>
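<p>At the HTTP level, the main requirement is that header. Below is a minimal, standard-library-only Rust sketch of framing such a response by hand; the function name is hypothetical, and it assumes the body bytes are already a serialized Arrow IPC stream (in Rust, the <code>arrow</code> crate's IPC <code>StreamWriter</code> is one way to produce them).</p>

```rust
use std::io::Write;

/// Media type for Arrow IPC streams, as quoted above.
const ARROW_STREAM_MIME: &str = "application/vnd.apache.arrow.stream";

/// Build a minimal HTTP/1.1 200 response whose body is an Arrow IPC stream.
/// `ipc_bytes` is assumed to already be a serialized IPC stream; this
/// hypothetical helper only frames it with headers.
fn arrow_http_response(ipc_bytes: &[u8]) -> Vec<u8> {
    let mut response = Vec::with_capacity(ipc_bytes.len() + 128);
    // Status line and headers: HTTP uses CRLF line endings and a blank
    // line to separate the headers from the body.
    write!(
        response,
        "HTTP/1.1 200 OK\r\nContent-Type: {}\r\nContent-Length: {}\r\n\r\n",
        ARROW_STREAM_MIME,
        ipc_bytes.len()
    )
    .expect("writing into a Vec<u8> cannot fail");
    // The IPC stream bytes are sent as-is after the headers.
    response.extend_from_slice(ipc_bytes);
    response
}
```

<p>A real service would more likely lean on an HTTP framework than hand-written headers, and HTTP-level compression of the body would additionally be signaled with a <code>Content-Encoding</code> header.</p>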

<p><strong><a href="https://blog.dataexpert.io/p/parquet-can-shrink-your-data-100x">Understanding Parquet format for beginners</a></strong></p>

<p>A great introduction to the <a href="https://parquet.apache.org">Apache Parquet</a> format
and why it makes so many things better with large data storage systems like
<a href="https://delta.io">Delta Lake</a>. I have written on this
<a href="/tag/parquet.html">topic</a> before and encourage you to take another read
through <a href="https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/">this blog
post</a>
by some maintainers of the <a href="https://crates.io/crates/parquet">parquet</a> crate.</p>

<p><strong><a href="https://apenwarr.ca/log/20260316">Every layer of review makes you 10x slower</a></strong></p>

<blockquote>
  <p>Every layer of approval makes a process 10x slower [..]</p>

  <p>Just to be clear, we’re counting “wall clock time” here rather than effort. Almost all the extra time is spent sitting and waiting.</p>

  <ul>
    <li>Code a simple bug fix: 30 minutes</li>
    <li>Get it code reviewed by the peer next to you: 300 minutes → 5 hours → half a day</li>
    <li>Get a design doc approved by your architects team first: 50 hours → about a week</li>
    <li>Get it on some other team’s calendar to do all that (for example, if a customer requests a feature): 500 hours → 12 weeks → one fiscal quarter</li>
  </ul>
</blockquote>
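<p>The arithmetic in the quote is a simple power of ten. Here is a small Rust sketch that reproduces it, assuming (as the quote does) a 30 minute fix and a 10x multiplier per layer, plus the assumption that a "week" means a 40-hour work week, which is what makes 50 hours come out to "about a week". The function names are hypothetical.</p>

```rust
/// Wall-clock minutes after `layers` approval steps, using the quote's
/// rule of thumb that each layer multiplies elapsed time by 10.
fn wall_clock_minutes(base_minutes: u64, layers: u32) -> u64 {
    base_minutes * 10u64.pow(layers)
}

/// Convert minutes to work weeks, assuming a 40-hour work week.
fn work_weeks(minutes: u64) -> f64 {
    minutes as f64 / 60.0 / 40.0
}

/// One human-readable line per number of review layers, 0..=max_layers.
fn report(base_minutes: u64, max_layers: u32) -> Vec<String> {
    (0..=max_layers)
        .map(|layers| {
            let minutes = wall_clock_minutes(base_minutes, layers);
            format!(
                "{layers} layer(s): {minutes} min = {:.1} h = {:.2} work weeks",
                minutes as f64 / 60.0,
                work_weeks(minutes)
            )
        })
        .collect()
}
```

<p>With those assumptions, three layers give 30,000 minutes, or 500 hours, which is 12.5 work weeks: roughly the fiscal quarter in the quote.</p>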

<p>This inspired these thoughts which I shared with the <a href="https://github.com/delta-io/delta-rs">delta-rs</a> community:</p>

<p>“what if we didn’t require code review for merging into main”</p>

<p>I’m exploring what we might need to make that happen. “Why would you do such
a thing? Code review is so valuable!” I do find code reviews valuable, but we
do seem to lose a lot of flow time due to timezones, differing work schedules,
and a number of other things. For small changes, especially bug fixes that
come with tests, I would be much more comfortable with maintainers merging
once CI goes green.</p>

<p>Some pieces of the puzzle that I think would be needed:</p>

<ul>
  <li>Soft caps on pull requests. I saw this mentioned somewhere else, but implementing a soft cap of &lt;500 lines per pull request can nudge people away from massive, unreviewable changes and toward smaller ones that are simpler to integrate.</li>
  <li>Incorporating into CI some of the benchmarking work that has already been explored. If performance of key operations is not affected and the build is green, go for it.</li>
  <li>Stronger semantic version checks: if our APIs have not changed and all tests pass, I’m generally comfortable with landing stuff by maintainers.</li>
  <li>Implementing Apache Software Foundation style release candidates and voting: this is where we would put a mandatory bottleneck. Rather than relying on the jokey Slack emojis I tend to use, we would implement a true release candidate process that requires review and a vote before we push something to users.</li>
</ul>
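<p>The soft cap in particular could be checked mechanically in CI. A minimal Rust sketch, assuming the input is the output of something like <code>git diff --numstat main...HEAD</code> (one <code>added&lt;TAB&gt;deleted&lt;TAB&gt;path</code> line per file, with <code>-</code> in place of the counts for binary files); the function names and the warn-rather-than-fail policy are hypothetical.</p>

```rust
/// Total changed lines (added + deleted) from `git diff --numstat` output.
/// Binary files appear with "-" instead of counts; "-" fails to parse as
/// a number, so those lines are skipped rather than counted.
fn changed_lines(numstat: &str) -> u64 {
    numstat
        .lines()
        .filter_map(|line| {
            let mut cols = line.split('\t');
            let added: u64 = cols.next()?.parse().ok()?;
            let deleted: u64 = cols.next()?.parse().ok()?;
            Some(added + deleted)
        })
        .sum()
}

/// Outcome of the (hypothetical) soft cap check.
#[derive(Debug, PartialEq)]
enum CapVerdict {
    WithinCap(u64),
    OverSoftCap(u64),
}

/// Compare the total against a soft cap such as 500 lines. A verdict is
/// returned instead of an error so CI can warn without blocking a merge.
fn check_soft_cap(numstat: &str, cap: u64) -> CapVerdict {
    let total = changed_lines(numstat);
    if total < cap {
        CapVerdict::WithinCap(total)
    } else {
        CapVerdict::OverSoftCap(total)
    }
}
```

<p>Returning a verdict instead of failing the build is what keeps the cap "soft": CI could flag an oversized pull request while still letting maintainers merge it.</p>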

<p>All of this is to say that reviews can still be requested, but I would love to
see us land more improvements faster. Our many different schedules can make
pushing each change through a review queue a lot slower than necessary.</p>

<p><strong><a href="https://www.possiblerust.com/pattern/conditional-impls">Conditional Impls in Rust</a></strong></p>

<blockquote>
  <p>It’s possible in Rust to conditionally implement methods and traits based on
the traits implemented by a type’s own type parameters. While this is used
extensively in Rust’s standard library, it’s not necessarily obvious that
this is possible.</p>
</blockquote>
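<p>As a small illustration of the quoted idea, here is a sketch modeled loosely on the classic <code>Pair&lt;T&gt;</code> example (the <code>Pair</code> type and its methods are hypothetical, not taken from the linked post): <code>new</code> exists for every <code>T</code>, <code>describe_largest</code> only exists when <code>T: PartialOrd + Display</code>, and <code>Clone</code> is implemented only when <code>T: Clone</code>.</p>

```rust
use std::fmt::Display;

/// A generic wrapper; the struct itself places no bounds on `T`.
struct Pair<T> {
    a: T,
    b: T,
}

// Unconditional impl: available for every `T`.
impl<T> Pair<T> {
    fn new(a: T, b: T) -> Self {
        Pair { a, b }
    }
}

// Conditional impl: these methods only exist when `T` can be both
// compared and printed.
impl<T: PartialOrd + Display> Pair<T> {
    fn describe_largest(&self) -> String {
        if self.a >= self.b {
            format!("largest is a = {}", self.a)
        } else {
            format!("largest is b = {}", self.b)
        }
    }
}

// Conditional trait impl: `Pair<T>` is `Clone` only when `T` is `Clone`.
impl<T: Clone> Clone for Pair<T> {
    fn clone(&self) -> Self {
        Pair {
            a: self.a.clone(),
            b: self.b.clone(),
        }
    }
}
```

<p>Calling <code>Pair::new(3, 7).describe_largest()</code> returns <code>"largest is b = 7"</code>, while a <code>Pair</code> of a type with no <code>PartialOrd</code> implementation can still be constructed but simply has no <code>describe_largest</code> method. For the <code>Clone</code> case, <code>#[derive(Clone)]</code> would generate an equivalent bounded impl.</p>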

<p>I have been vaguely aware of this functionality but haven’t really taken the
time to consider it, so I really appreciated this post walking through the
conditional impl functionality in Rust.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="rss" /><category term="arrow" /><category term="parquet" /><category term="rust" /><summary type="html"><![CDATA[Over the past week I have made a more conscious effort to keep track of some really interesting articles that came through my feed reader. I am a big fan of the open web and the power of RSS for disseminating interesting information from actual people. Below are some really interesting posts I have read recently!]]></summary></entry><entry><title type="html">Technically I’m microblogging now.</title><link href="https://brokenco.de//2021/02/21/technically-microblogging.html" rel="alternate" type="text/html" title="Technically I’m microblogging now." /><published>2021-02-21T00:00:00+00:00</published><updated>2021-02-21T00:00:00+00:00</updated><id>https://brokenco.de//2021/02/21/technically-microblogging</id><content type="html" xml:base="https://brokenco.de//2021/02/21/technically-microblogging.html"><![CDATA[<p>I am a <em>big</em> fan of the open web and although I have enjoyed
<a href="https://twitter.com/agentdero">Twitter</a>, the platform has regressed dramatically
in form and function since I first adopted it. I remember when Twitter actively
<em>avoided</em> building a walled garden, with fantastic APIs and RSS feeds open to
the public. Much of the popularity of the platform hinged upon the incredible
third party applications and integrations developers like me built in the first
five-ish years of its existence. Over time the site has strayed from open APIs
and standards, and while I still enjoy Twitter, I want some more flexibility
which is why you can now subscribe to my <a href="/microblog.xml">microblog</a> with any
RSS-capable client.</p>

<p><a href="https://en.wikipedia.org/wiki/Microblogging">Microblogging</a> is basically RSS
with a slight change in conventions to support more Twitter-style postings,
with a somehow sillier name. When I was exploring the concept, I came across
numerous posts of folks trying out microblogging only to find that their feeds
had gone <em>very</em> stale. They couldn’t bridge the gap between their existing
community on sites like Twitter and their microblog setup. The exception is some
users I have seen on <a href="https://micro.blog">micro.blog</a>, but I’m not about to pay
$5/month for something that primitive.</p>

<p>My setup is built around my existing blog (you’re reading it!) with
<a href="https://github.com/rtyler/brokenco.de/blob/4e1513b75cab88ed4f098a0f905c33c9860f9d39/_config.yml#L15-L18">some</a>
<a href="https://github.com/rtyler/brokenco.de/blob/4e1513b75cab88ed4f098a0f905c33c9860f9d39/_scripts/new-microblog">tweaks</a>
to make it easier to author microblog entries. I then integrated
<a href="https://github.com/gr2m/twitter-together/">twitter-together</a> to ensure that my
microblog posts are duplicated automatically into my Twitter account. Not
everything I post to Twitter goes through my microblog, however: replies and
retweets don’t make much sense to me outside of the context of Twitter. But
practically anything I would typically share via my desktop I can now broadcast
via both channels!</p>

<p>Subscribe to my <a href="/microblog.xml">microblog</a>, or don’t, it’s a free internet after all. :)</p>]]></content><author><name>R. Tyler Croy</name></author><category term="rss" /><summary type="html"><![CDATA[I am a big fan of the open web and although I have enjoyed Twitter, the platform has regressed dramatically in form and function since I first adopted it. I remember when Twitter actively avoided building a walled garden, with fantastic APIs and RSS feeds open to the public. Much of the popularity of the platform hinged upon the incredible third party applications and integrations developers like me built in the first five-ish years of its existence. Over time the site has strayed from open APIs and standards, and while I still enjoy Twitter, I want some more flexibility which is why you can now subscribe to my microblog with any RSS-capable client.]]></summary></entry><entry><title type="html">Reading RSS feeds from wacky protocols with newsboat</title><link href="https://brokenco.de//2020/07/07/newsboat-wacky-feeds.html" rel="alternate" type="text/html" title="Reading RSS feeds from wacky protocols with newsboat" /><published>2020-07-07T00:00:00+00:00</published><updated>2020-07-07T00:00:00+00:00</updated><id>https://brokenco.de//2020/07/07/newsboat-wacky-feeds</id><content type="html" xml:base="https://brokenco.de//2020/07/07/newsboat-wacky-feeds.html"><![CDATA[<p>Much of the information I read during the day, not counting e-mail, comes from
my RSS reader: <a href="https://newsboat.org">Newsboat</a>. Whenever I see an interesting
blog post on Twitter or elsewhere, I habitually subscribe to the author’s RSS
feed. I recently stumbled across an interesting RSS feed which wasn’t served
over HTTP, leading me to wonder: how can I subscribe?</p>

<p>After trying to find some way to make newsboat read a different protocol,
racking my brains thinking of different ways to set up a stub HTTP proxy, I
finally succumbed and read the manpage.</p>

<p>As luck would have it, the <code class="language-plaintext highlighter-rouge">urls</code> file where newsboat stores its feed URLs
supports a special <code class="language-plaintext highlighter-rouge">exec:</code> syntax for shelling out to a command that fetches the feed,
for example:</p>

<p><code class="language-plaintext highlighter-rouge">~/.newsboat/urls</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"exec:ssh shellhost 'cat /srv/www/rss.xml'"
"exec:/usr/bin/torify curl ftp://someftp/rss.xml"
"exec:/usr/bin/torify curl gopher://example.com/0/news.atom.xml"
</code></pre></div></div>

<p>(<em>Side note:</em> do you have any idea how many protocols <code class="language-plaintext highlighter-rouge">curl</code> supports? <strong>Lots</strong>! On my machine: <code class="language-plaintext highlighter-rouge">dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp</code>)</p>

<p>The <code class="language-plaintext highlighter-rouge">exec</code> syntax is certainly a novel feature. As I have pondered it
more, I have been thinking about using it to run
<em>arbitrary shell scripts</em> which would generate reports for review. Some ideas that have come to mind:</p>

<ul>
  <li>Reading root’s mbox on my local and remote machines to get better visibility into the status of cron jobs.</li>
  <li>Executing some <code class="language-plaintext highlighter-rouge">aws-cli</code> and <code class="language-plaintext highlighter-rouge">az</code> scripts to generate some daily cost reports.</li>
  <li>Retrieving error logs from remote machines to tabulate a daily error report.</li>
</ul>
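<p>Since newsboat uses whatever the <code>exec:</code> command prints to stdout as the feed, each of these report generators only needs to emit valid RSS. Here is a minimal Rust sketch of that last step; the function names, channel fields, and sample report text are hypothetical.</p>

```rust
/// Escape XML special characters so report text cannot break the feed.
fn xml_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Render a minimal RSS 2.0 document: a channel with the required
/// title/link/description, plus one item per (title, description) pair.
fn render_rss(channel_title: &str, link: &str, items: &[(&str, &str)]) -> String {
    let mut xml = String::from("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");
    xml.push_str("<rss version=\"2.0\"><channel>\n");
    xml.push_str(&format!(
        "<title>{0}</title><link>{1}</link><description>{0}</description>\n",
        xml_escape(channel_title),
        xml_escape(link)
    ));
    for (title, description) in items {
        xml.push_str(&format!(
            "<item><title>{}</title><description>{}</description></item>\n",
            xml_escape(title),
            xml_escape(description)
        ));
    }
    xml.push_str("</channel></rss>\n");
    xml
}
```

<p>A small <code>main</code> that gathers the report lines and prints <code>render_rss(...)</code> could then be listed in the <code>urls</code> file with a hypothetical entry such as <code>"exec:/usr/local/bin/cron-report"</code>.</p>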

<p>There are other possibilities that come to mind, but it all basically boils
down to generating information dashboards which will help me keep tabs on more
and more things, all from within my feed reader.</p>

<p>I have only just started to experiment with this idea, but I’m looking forward
to poking around with this more.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="rss" /><category term="newsboat" /><summary type="html"><![CDATA[Much of the information I read during the day, not counting e-mail, comes from my RSS reader: Newsboat. Whenever I see an interesting blog post on Twitter or elsewhere, I habitually subscribe to the author’s RSS feed. I recently stumbled across an interesting RSS feed which wasn’t served over HTTP, leading me to wonder: how can I subscribe?]]></summary></entry></feed>