<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://brokenco.de//feed/by_tag/llm.xml" rel="self" type="application/atom+xml" /><link href="https://brokenco.de//" rel="alternate" type="text/html" /><updated>2026-04-12T21:39:52+00:00</updated><id>https://brokenco.de//feed/by_tag/llm.xml</id><title type="html">rtyler</title><subtitle>a moderately technical blog</subtitle><author><name>R. Tyler Croy</name></author><entry><title type="html">A Few Good Humans</title><link href="https://brokenco.de//2026/03/05/a-few-good-humans.html" rel="alternate" type="text/html" title="A Few Good Humans" /><published>2026-03-05T00:00:00+00:00</published><updated>2026-03-05T00:00:00+00:00</updated><id>https://brokenco.de//2026/03/05/a-few-good-humans</id><content type="html" xml:base="https://brokenco.de//2026/03/05/a-few-good-humans.html"><![CDATA[<p>The tool is only as good as its training data. A developer I work with was
expressing some frustrations with the strong encouragement <em>but definitely not
a mandate</em> to use AI-assisted coding tools. They were feeling gaslit because
they were told this was going to 10x their productivity and instead it had led
them <em>significantly</em> astray and ended up wasting much more time than it saved!</p>

<p>The developer was trying to write some brand-new Terraform for the project they
were working on; not their area of expertise, but it needed to be done. They had
an experience which I recognized from my explorations with earlier models where
I <em>also</em> just wanted the model to write some miserable Terraform resources
because I didn’t want to write them myself. Except when prompted, it was as if I asked
“<em>write me some Terraform to provision this resource, wrong answers only.</em>”</p>

<p>I <em>also</em> thought I must be insane and just using the magic numbers box
incorrectly. Seeing the exact same situation play out for another developer,
with a different model, years later, I feel like I understand the problem!</p>

<p><strong>There is not enough open source <em>use</em> of Terraform to copy</strong>.</p>

<p>Plenty of great <a href="https://github.com/terraform-aws-modules/">open source terraform modules
exist</a> (Putin khuylo!) but perilously
few open source examples exist of <em>using</em> Terraform. I believe this to be
because the vast majority of Terraform is <em>closed source</em> and therefore not
scraped and ingested into these models.</p>
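
<p>To make the distinction concrete, here is a hypothetical sketch (the module
name is real, but the version and values are illustrative): the module itself
is open source, while the <em>call site</em> below is the kind of code that
typically lives in a private repository and never reaches a training set.</p>

<pre><code>module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~&gt; 5.0"

  name = "example"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
}
</code></pre>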

<p>If code-generating AI tools don’t want to suffer from the <a href="https://en.wikipedia.org/wiki/Dead_Internet_theory">dead internet
theory</a>, the data has to
come from somewhere.</p>

<p>The machine relies on code being <em>open sourced</em>.</p>

<hr />

<p><strong>LTJG Kaffee</strong>: Colonel Jessep! Did you author that code?</p>

<p><strong>Judge Randolph</strong>: You don’t have to answer that question!</p>

<p><strong>Col Jessup</strong>: I’ll answer the question. You want answers?</p>

<p><strong>LTJG Kaffee</strong>: I think I’m entitled to them.</p>

<p><strong>Col Jessup</strong>: You want answers?!</p>

<p><strong>LTJG Kaffee</strong>: I want the truth!</p>

<p><strong>Col Jessup</strong>: You can’t handle the truth!</p>

<p>Son, we live in a world that has models, and those models have to be seeded by
humans who code.</p>

<p>Who’s gonna do it? You? You, Lieutenant Weinberg?</p>

<p>I have a greater responsibility than you can possibly fathom.</p>

<p>You weep for the roadmap, and you curse the upstream. You have that luxury. You
have the luxury of not knowing what I know – that the launch delay, while
tragic, probably saved time; and my existence, while grotesque and
incomprehensible to you, saves time.</p>

<p>You don’t want the truth because deep down in places you don’t talk about at
parties, you want me feeding that model – you need me feeding that model.</p>

<p>We use words like “fork,” “code,” “libre.” We use these words as the backbone of a life spent supporting something. You use them as a punch line.</p>

<p>I have neither the time nor the inclination to explain myself to a man who
rises and sleeps under the blanket of the productivity that I provide and then
questions the manner in which I provide it.</p>

<p>I would rather that you just said “thank you” and went on your way. Otherwise,
I suggest you pick up an editor and submit a PR.</p>

<p>Either way, I don’t give a DAMN what you think you’re entitled to!</p>

<p><strong>LTJG Kaffee</strong>: Did you author that code?</p>

<p><strong>Col Jessup</strong>: I did the job –</p>

<p><strong>LTJG Kaffee</strong>: – Did you author the code?</p>

<p><strong>Col Jessup</strong>: YOU’RE GOD DAMN RIGHT I DID!</p>

<hr />]]></content><author><name>R. Tyler Croy</name></author><category term="software" /><category term="ai" /><category term="llm" /><summary type="html"><![CDATA[The tool is only as good as its training data. A developer I work with was expressing some frustrations with the strong encouragement but definitely not a mandate to use AI-assisted coding tools. They were feeling gaslit because they were told this was going to 10x their productivity and instead it had led them significantly astray and ended up wasting much more time than it saved!]]></summary></entry><entry><title type="html">DCO and AI is a no-go.</title><link href="https://brokenco.de//2026/03/02/copyright-ai.html" rel="alternate" type="text/html" title="DCO and AI is a no-go." /><published>2026-03-02T00:00:00+00:00</published><updated>2026-03-02T00:00:00+00:00</updated><id>https://brokenco.de//2026/03/02/copyright-ai</id><content type="html" xml:base="https://brokenco.de//2026/03/02/copyright-ai.html"><![CDATA[<p>The phrases “generative AI” and “copyright” evoke a multitude of stories about
unauthorized training, scraping, and violation of norms. The thought that
somebody could then turn around and try to copyright works <em>generated</em> by
these large language models is absurd, but in 2026 anything goes,
doesn’t it?</p>

<p>One of the big arguments <em>against</em> generative AI-based coding
tools is that they were trained on billions of lines of <em>copyrighted</em> and
licensed works in the open source ecosystem, and they strip those works of all
attribution and violate the terms of the licenses.</p>

<p>Yesterday there was some fervor in my timeline about <a href="https://www.theverge.com/policy/887678/supreme-court-ai-art-copyright">the Supreme Court
allowing a lower court decision to
stand</a>.
I have been following this topic for a few weeks after reading this
<a href="https://www.congress.gov/crs_external_products/LSB/PDF/LSB10922/LSB10922.8.pdf">Congressional Research Service report: Generative Artificial Intelligence and
Copyright
Law</a>
(PDF) and considering how some of the guidance might affect the use of
generative AI in open source projects.</p>

<p><a href="https://toot.cat/@zkat/116162089501237946">kat nailed it with their toot</a></p>

<blockquote>
  <p>So uh</p>

  <p>does this mean that there is now precedent that at least “agentic” dev systems,
potentially any genAI dev system, now leaves companies open to their code no
longer being considered copyrightable if they use these systems?</p>
</blockquote>

<p>kat is right that <strong>this is huge</strong>.</p>

<p>In the <a href="https://github.com/delta-io">Delta Lake project</a> we rely on <a href="https://en.wikipedia.org/wiki/Developer_Certificate_of_Origin">Developer
Certificate of
Origin</a> (DCO) with
guidance from the Linux Foundation. Yes, <a href="https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation">that Linux Foundation</a>.</p>

<p>From the DCO:</p>

<blockquote>
  <p>The contribution was created in whole or in part by me and I have the right
to submit it under the open source license indicated in the file; or</p>
</blockquote>

<p>From the <a href="https://www.apache.org/licenses/LICENSE-2.0.html">Apache License</a>:</p>

<blockquote>
  <p>“Licensor” shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.</p>

  <p>[..]</p>

  <p>“Contributor” shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.</p>
</blockquote>

<p>I am no lawyer but I do have at least a 12th grade reading comprehension level.</p>

<p>If AI-generated works are not copyrightable, then it is not possible for
somebody to <em>license</em> them under any open source license, much less assert via the DCO
that they are able to do so.</p>

<hr />
<p><strong>2026-03-04 update:</strong> Software Freedom Conservancy has <a href="https://sfconservancy.org/blog/2026/mar/04/scotus-deny-cert-dc-circuit-thaler-appeal-llm-ai/">a blog
post</a>
on the topic which is worth a read. I understand the narrow scope of the
judgment being referred to, but my opinion on the situation takes into
consideration the other guidance from the Congressional Research report
and the US Copyright Office policies as they stand today.</p>

<hr />

<p>Again, this is a big deal.</p>

<p>From the Congressional Research report:</p>

<blockquote>
  <p>The AI Guidance states that authors may claim copyright protection only “for
their own contributions” to such works, and they must identify and disclaim
AI-generated parts of the works when applying to register their copyright.</p>
</blockquote>

<p>The only viable solution I can imagine is that all AI-generated code
contributions in open source projects are considered public domain and commented
appropriately. Otherwise I don’t see a sensible path forward for
AI-generated code in open source projects.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="opinion" /><category term="opensource" /><category term="ai" /><category term="llm" /><summary type="html"><![CDATA[The phrases “generative AI” and “copyright” evoke a multitude of stories about unauthorized training, scraping, and violation of norms. The thought that somebody could then turn around and then try to copyright works generated by these large language models is absurd, but in 2026 anything kind of goes doesn’t it?]]></summary></entry><entry><title type="html">The AI Coding Margin Squeeze</title><link href="https://brokenco.de//2025/08/07/margin-squeeze.html" rel="alternate" type="text/html" title="The AI Coding Margin Squeeze" /><published>2025-08-07T00:00:00+00:00</published><updated>2025-08-07T00:00:00+00:00</updated><id>https://brokenco.de//2025/08/07/margin-squeeze</id><content type="html" xml:base="https://brokenco.de//2025/08/07/margin-squeeze.html"><![CDATA[<p>Words cannot express how excited I am for the coming margin squeeze on <em>every</em>
“AI company” that isn’t Anthropic, OpenAI, Microsoft, or Google. The <em>entire</em>
industry is built on an unethical foundation, having illegitimately acquired
<a href="https://duckduckgo.com/?t=ffab&amp;q=openai+copyright+infringement&amp;ia=news&amp;iar=news">massive
amounts</a>
of
<a href="https://arstechnica.com/tech-policy/2025/07/meta-pirated-and-seeded-porn-for-years-to-train-ai-lawsuit-says/">content</a>
from practically <em>everybody</em>. The companies selling “AI Coding Assistants” I am particularly excited to see implode.</p>

<p>Every “AI Coding Assistant” uses Large Language Models (LLMs) trained on the
open source ecosystem and every one of those models <strong>is</strong> violating the
licenses of every piece of code they were trained on.</p>

<p>My code has been slurped up into these models and is being used to
create derived works without attribution, thereby violating the terms of the
licenses I have personally applied to my software: MIT, Apache Software License 2.0, LGPL, AGPL, or GPLv3.</p>

<p>I saw a fun quote in this post <a href="https://techcrunch.com/2025/08/07/the-high-costs-and-thin-margins-threatening-ai-coding-startups/?utm_source=dlvr.it&amp;utm_medium=mastodon">from TechCrunch</a>:</p>

<blockquote>
  <p>“That’s what everyone’s banking on,” said Eric Nordlander, a general partner at
Google Ventures. “The inference cost today, that’s the most expensive it’s ever
going to be.”</p>

  <p>It’s not entirely clear how true that is. Rather than falling as expected, the
cost of some of the latest AI models has risen, as they use more time and
computational resources to handle complicated, multi-step tasks.</p>
</blockquote>

<p>Kudos to the author <a href="https://techcrunch.com/author/marina-temkin/">Marina
Temkin</a> for calling bullshit on
the delusion that “inference will get cheaper.”</p>

<p>It will only get more expensive. It is a matter of <em>when</em>, not <em>if</em>. Investors
have poured billions into OpenAI, for example, and they are expecting a massive
return. The only way that happens is by OpenAI successfully executing the
“YouTube Playbook”: run at a loss as fast as you can, get massive scale, corner
the market, then pivot to monetization.</p>

<p>The companies built entirely with OpenAI API calls and little other “moat” are
destined to have their margins squeezed <em>hard</em> and then wink out of existence.</p>

<p>I wish them all the best of luck in their future endeavors.</p>

<hr />

<p>Think I’m salty? Read <a href="https://malwaretech.com/2025/08/every-reason-why-i-hate-ai.html">this great post from Marcus
Hutchins</a>
which has a much longer and thoughtful take on the entire ecosystem.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="opinion" /><category term="llm" /><category term="ai" /><summary type="html"><![CDATA[Words cannot express how excited I am for the coming margin squeeze on every “AI company” that isn’t Anthropic, OpenAI, Microsoft, or Google. The entire industry is built on an unethical foundation, having illegitimately acquired massive amounts of content from practically everybody. The companies selling “AI Coding Assistants” I am particularly excited to see implode.]]></summary></entry></feed>