rtyler

2026 March: Recently Studied Stuff

2026-03-21T00:00:00+00:00

Over the past week I have made a more conscious effort to keep track of interesting articles that came through my feed reader. I am a big fan of the open web and the power of RSS for disseminating information from actual people. Below are some of the posts I have read recently!

Compressed Apache Arrow tables over HTTP

When discussing transport protocols for sending data between services at work recently, a colleague asked “why can’t we just yeet Arrow over HTTP?” It turns out you absolutely can, and Arrow IPC streams even have a registered MIME type:

Content-Type: application/vnd.apache.arrow.stream
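
As a rough illustration of how little is involved, here is a minimal sketch using only the Rust standard library. It assumes the Arrow IPC stream bytes have already been produced elsewhere (for example by an Arrow IPC stream writer); the function name arrow_http_response and the placeholder payload are hypothetical and not taken from the linked article. It does not cover compression.

```rust
/// Registered media type for Arrow IPC streams.
const ARROW_STREAM_MIME: &str = "application/vnd.apache.arrow.stream";

/// Frame already-encoded Arrow IPC stream bytes as a raw HTTP/1.1 response.
/// (Hypothetical helper: a real service would use an HTTP server library.)
fn arrow_http_response(ipc_stream: &[u8]) -> Vec<u8> {
    let head = format!(
        "HTTP/1.1 200 OK\r\nContent-Type: {}\r\nContent-Length: {}\r\n\r\n",
        ARROW_STREAM_MIME,
        ipc_stream.len()
    );
    // Headers first, then the body bytes untouched.
    let mut response = head.into_bytes();
    response.extend_from_slice(ipc_stream);
    response
}

fn main() {
    // Placeholder bytes only: a real payload would be an Arrow IPC stream.
    let payload = [0u8; 8];
    let response = arrow_http_response(&payload);
    println!("{} bytes in total", response.len());
}
```

The receiving side can use the Content-Type header to decide that the body should be handed to an Arrow IPC stream reader rather than parsed as JSON or text.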

Understanding Parquet format for beginners

A great introduction to the Apache Parquet format and why it makes so many things better with large data storage systems like Delta Lake. I have written on this topic before and encourage you to take another read through this blog post by some maintainers of the parquet crate.

Every layer of review makes you 10x slower

Every layer of approval makes a process 10x slower [..]

Just to be clear, we’re counting “wall clock time” here rather than effort. Almost all the extra time is spent sitting and waiting.

Code a simple bug fix: 30 minutes

Get it code reviewed by the peer next to you: 300 minutes → 5 hours → half a day

Get a design doc approved by your architects team first: 50 hours → about a week

Get it on some other team’s calendar to do all that (for example, if a customer requests a feature): 500 hours → 12 weeks → one fiscal quarter

This inspired these thoughts which I shared with the delta-rs community:

“what if we didn’t require code review for merging into main”

I’m exploring further what we might need to make that happen. “Why would you do such a thing, code review is so valuable!” I do find code reviews valuable, but we do seem to lose a lot of flow time due to timezones, differing work schedules, and a number of other things. For something without a lot of changes, especially bug fixes that come with tests, I would be much more comfortable with maintainers merging once CI goes green.

Some pieces of the puzzle that I think would be needed:

Soft caps on pull requests. I saw this mentioned somewhere else, but implementing a soft cap of <500 lines per pull request can help people avoid massive unreviewable changes and instead produce smaller ones that are simpler to integrate.
Incorporating some of the benchmarking work into CI that has already been explored. If performance of key operations is not affected and the build is green, go for it.
Stronger semantic version checks: if our APIs have not changed and all tests pass, I’m generally comfortable with landing stuff by maintainers.
Implementing Apache Software Foundation style release candidates and voting: this is where we would put a mandatory bottleneck. Rather than some jokey Slack emojis like I tend to do, we would implement a true release candidate process that requires review and a vote before we push something to users.

All of this is to say that reviews can still be requested, but I would love to see us land more improvements faster and I think we have a bunch of different schedules that can make pushing each change through a review queue a lot slower than necessary.

Conditional Impls in Rust

It’s possible in Rust to conditionally implement methods and traits based on the traits implemented by a type’s own type parameters. While this is used extensively in Rust’s standard library, it’s not necessarily obvious that this is possible.

I have been vaguely aware of this functionality but haven’t really taken the time to consider it, so I really appreciated this post walking through the conditional impl functionality in Rust.
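
To make the idea concrete, here is a small sketch of my own (not code from the linked post). The Wrapper and NotDisplay types are hypothetical names invented for illustration. Wrapper<T> always has new, only gains describe when T implements Display, and is only Clone when T is Clone.

```rust
use std::fmt::Display;

/// Hypothetical wrapper type used only for illustration.
struct Wrapper<T> {
    inner: T,
}

// Available for every T, with no bounds at all.
impl<T> Wrapper<T> {
    fn new(inner: T) -> Self {
        Wrapper { inner }
    }
}

// Conditional method: only exists when T implements Display.
impl<T: Display> Wrapper<T> {
    fn describe(&self) -> String {
        format!("Wrapper({})", self.inner)
    }
}

// Conditional trait impl: Wrapper<T> is Clone only when T is Clone.
impl<T: Clone> Clone for Wrapper<T> {
    fn clone(&self) -> Self {
        Wrapper { inner: self.inner.clone() }
    }
}

/// Hypothetical type that implements neither Display nor Clone.
struct NotDisplay;

fn main() {
    let number = Wrapper::new(42);
    println!("{}", number.describe());
    println!("{}", number.clone().describe());

    // Constructing this is fine, but calling opaque.describe() or
    // opaque.clone() would be rejected at compile time.
    let opaque = Wrapper::new(NotDisplay);
    let _ = &opaque.inner;
}
```

This is the same shape the standard library uses in places such as Vec<T>, which implements Clone only when T does.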

Technically I’m microblogging now.

2021-02-21T00:00:00+00:00

I am a big fan of the open web, and although I have enjoyed Twitter, the platform has regressed dramatically in form and function since I first adopted it. I remember when Twitter actively avoided building a walled garden, with fantastic APIs and RSS feeds open to the public. Much of the popularity of the platform hinged upon the incredible third party applications and integrations developers like me built in the first five-ish years of its existence. Over time the site has strayed from open APIs and standards, and while I still enjoy Twitter, I want some more flexibility, which is why you can now subscribe to my microblog with any RSS-capable client.

Microblogging is basically RSS with a slight change in conventions to support more Twitter-style postings, with a somehow sillier name. When I was exploring the concept, I came across numerous posts from folks trying out microblogging, only to find that their feeds had gone very stale. They couldn’t bridge the gap between their existing community on sites like Twitter and their microblog setup. This excludes some users I have seen on micro.blog, but I’m not about to pay $5/month for something that primitive.

My setup is built around my existing blog (you’re reading it!) with some tweaks to make it easier to author microblog entries. I then integrated twitter-together to ensure that my microblog posts are duplicated automatically into my Twitter account. Not everything I post to Twitter goes through my microblog, however: replies and retweets don’t make much sense to me outside of the context of Twitter. But practically anything I would typically share via my desktop I can now broadcast via both channels!

Subscribe to my microblog, or don’t, it’s a free internet after all. :)

Reading RSS feeds from wacky protocols with newsboat

2020-07-07T00:00:00+00:00

Much of the information I read during the day, not counting e-mail, comes from my RSS reader: Newsboat. Whenever I see an interesting blog post on Twitter or elsewhere, I habitually subscribe to the author’s RSS feed. I recently stumbled across an interesting RSS feed which wasn’t served over HTTP, leading me to wonder: how can I subscribe?

After trying to find some way to make newsboat read a different protocol, racking my brains thinking of different ways to set up a stub HTTP proxy, I finally succumbed and read the manpage.

As my luck would have it, the urls file in which newsboat stores its feed URLs supports a special exec: syntax for shelling out to run a command that fetches the feed, for example:

~/.newsboat/urls

"exec:ssh shellhost 'cat /srv/www/rss.xml'"
"exec:/usr/bin/torify curl ftp://someftp/rss.xml"
"exec:/usr/bin/torify curl gopher://example.com/0/news.atom.xml"

(Side note: do you have any idea how many protocols curl supports? Lots! On my machine: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp)

The exec syntax is certainly a novel feature. As I have been pondering it more, I have been thinking about using it to run arbitrary shell scripts which would generate reports for review. Some ideas that have come to mind:

Reading the root’s mbox on my local and remote machines to get better visibility into the status of cron jobs.
Executing some aws-cli and az scripts to generate some daily cost reports.
Retrieving error logs from remote machines to tabulate a daily error report.

There are other possibilities that come to mind, but it all basically boils down to generating information dashboards which will help me keep tabs on more and more things, all from within my feed reader.
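
Any of these ideas needs something that turns plain command output into a feed. Below is a minimal sketch, assuming that one feed item per non-empty input line is good enough. The function names escape_xml and lines_to_rss, the "Daily error report" title, and the localhost link are all hypothetical choices of mine, and the output is a bare-bones RSS 2.0 document without dates or GUIDs.

```rust
use std::io::{self, BufRead};

/// Escape the five characters that are special in XML text.
fn escape_xml(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            other => out.push(other),
        }
    }
    out
}

/// Build a minimal RSS 2.0 document with one <item> per non-empty line.
fn lines_to_rss(title: &str, lines: &[String]) -> String {
    let mut feed = String::new();
    feed.push_str("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
    feed.push_str("<rss version=\"2.0\">\n<channel>\n");
    feed.push_str(&format!("<title>{}</title>\n", escape_xml(title)));
    // RSS 2.0 expects a link and description on the channel; placeholders here.
    feed.push_str("<link>http://localhost/</link>\n");
    feed.push_str(&format!("<description>{}</description>\n", escape_xml(title)));
    for line in lines.iter().filter(|l| !l.trim().is_empty()) {
        feed.push_str(&format!(
            "<item><title>{}</title></item>\n",
            escape_xml(line.trim())
        ));
    }
    feed.push_str("</channel>\n</rss>\n");
    feed
}

fn main() {
    // Read report lines from stdin and print the feed to stdout.
    let stdin = io::stdin();
    let lines: Vec<String> = stdin.lock().lines().map_while(Result::ok).collect();
    print!("{}", lines_to_rss("Daily error report", &lines));
}
```

A wrapper script that pipes a log or report into this program could then be listed in the urls file with the exec: prefix, in the same way as the examples above.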

I have only just started to experiment with this idea, but I’m looking forward to poking around with this more.