<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://brokenco.de//feed/by_tag/antlr.xml" rel="self" type="application/atom+xml" /><link href="https://brokenco.de//" rel="alternate" type="text/html" /><updated>2026-05-03T00:12:50+00:00</updated><id>https://brokenco.de//feed/by_tag/antlr.xml</id><title type="html">rtyler</title><subtitle>a moderately technical blog</subtitle><author><name>R. Tyler Croy</name></author><entry><title type="html">Parsing in Rust</title><link href="https://brokenco.de//2020/12/21/parsers-are-fun.html" rel="alternate" type="text/html" title="Parsing in Rust" /><published>2020-12-21T00:00:00+00:00</published><updated>2020-12-21T00:00:00+00:00</updated><id>https://brokenco.de//2020/12/21/parsers-are-fun</id><content type="html" xml:base="https://brokenco.de//2020/12/21/parsers-are-fun.html"><![CDATA[<p>In a world where everything is increasingly YAML, you might find yourself
wondering: “why bother to write a parser?” For starters, I recommend reading
the <a href="https://yaml.org/spec/1.2/spec.html">YAML specification</a> if you
haven’t already, but more importantly: there are so many domains which can be better
modeled with domain-specific semantics and syntax. When I was younger, parsing
was typically done with lex/yacc/bison/whatever and was complete drudgery, but
there are a few great modern tools in the Rust ecosystem that make writing
parsers <em>fun</em>.</p>

<p>I first dabbled in writing parsers with <a href="https://github.com/antlr">ANTLRv4</a>
which is an absolutely <strong>fantastic</strong> toolset for writing parsers. The primary
author <a href="https://github.com/parrt">Terence Parr</a> has written a number of good
books such as “The Definitive ANTLR 4 Reference” and “Language Implementation
Patterns”, both of which I recommend even if you’re not setting out to write
that next great programming language.</p>

<p>In <a href="https://rust-lang.org">Rust</a> our options are also pretty decent. When I
first ventured into writing Rust I discovered
<a href="https://github.com/rrevenantt/antlr4rust">antlr4rust</a> which I promptly
bookmarked and then set aside until I had a parsing project. Once I finally had
one, I revisited antlr4rust and found that I didn’t like how the
ANTLR-style semantics felt in Rust. It didn’t feel idiomatic
enough for me to be comfortable.</p>

<p>More recently I have discovered <strong><a href="https://pest.rs/">Pest</a></strong> which I have now
used within <a href="https://github.com/rtyler/otto">Otto</a> and in my most recent
experiment, the <a href="https://github.com/rtyler/jdp">Jenkins Declarative Parser</a>.</p>

<p>The grammar is similar enough to ANTLR that I was able to get started and express my ideas quite quickly. Still, I haven’t become clever enough to use parser-level stack manipulations, so I think that means I remain a parser-simpleton.</p>

<p>Below is an example of the grammar necessary to parse the <code class="language-plaintext highlighter-rouge">script { }</code> step in
Declarative Jenkins Pipelines. The step allows arbitrary Groovy code inside its
braces, and I didn’t want to parse the Groovy too.</p>

<pre><code class="language-peg">scriptStep = { "script" ~ opening_brace ~ groovy ~ closing_brace }
groovy = {
            (
            // Handle nested structures
            (opening_brace ~ groovy ~ closing_brace)
            | (!closing_brace ~ ANY)
            )*
         }

stagesDecl = { "stages" ~
                opening_brace ~
                stage+ ~
                closing_brace
              }
</code></pre>
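
<p>To make the recursion in the <code class="language-plaintext highlighter-rouge">groovy</code> rule a little more concrete, here is a
rough hand-written equivalent using only the Rust standard library. It is a
sketch of the same idea (swallow everything up to the closing brace that
balances the opening one), not what Pest generates. The function name
<code class="language-plaintext highlighter-rouge">script_body</code> is made up for illustration, and skipping whitespace between
tokens is an assumption about how the rest of the grammar is set up.</p>

<pre><code class="language-rust">/// Return the text between the braces of a `script { ... }` step that
/// starts at the beginning of `source`.
///
/// Hypothetical helper, not part of pest or jdp: a hand-rolled stand-in
/// for the `scriptStep` and `groovy` rules above.
fn script_body(source: &amp;str) -&gt; Option&lt;&amp;str&gt; {
    // Mirrors `"script" ~ opening_brace`, assuming whitespace is skipped.
    let after_keyword = source.trim_start().strip_prefix("script")?;
    let body = after_keyword.trim_start().strip_prefix('{')?;

    // Mirrors `groovy`: accept any character, track nested braces, and
    // stop at the closing brace that balances the opening one.
    let mut depth = 0usize;
    for (index, character) in body.char_indices() {
        match character {
            '{' =&gt; depth += 1,
            '}' if depth == 0 =&gt; return Some(&amp;body[..index]),
            '}' =&gt; depth -= 1,
            _ =&gt; {}
        }
    }
    None // the opening brace was never closed
}
</code></pre>

<p>Called on <code class="language-plaintext highlighter-rouge">script { if (ready) { sh 'make' } }</code>, this should hand back
everything between the outermost braces, nested block included. Because the
sketch only counts braces, a stray brace inside a Groovy string or comment
would throw it off.</p>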

<p>The qualifiers and details on the grammar can be found in the <a href="https://docs.rs/pest_derive/">pest_derive
crate’s documentation</a>.</p>

<p>Once compiled into the Rust program, using the generated parser is a <em>little</em>
goofy but still very workable. A snippet:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">mut</span> <span class="n">parser</span> <span class="o">=</span> <span class="nn">PipelineParser</span><span class="p">::</span><span class="nf">parse</span><span class="p">(</span><span class="nn">Rule</span><span class="p">::</span><span class="n">pipeline</span><span class="p">,</span> <span class="n">buffer</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

<span class="k">while</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">parsed</span><span class="p">)</span> <span class="o">=</span> <span class="n">parser</span><span class="nf">.next</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">match</span> <span class="n">parsed</span><span class="nf">.as_rule</span><span class="p">()</span> <span class="p">{</span>
        <span class="nn">Rule</span><span class="p">::</span><span class="n">agentDecl</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">// parse the agent {} declaration</span>
        <span class="p">}</span>
        <span class="nn">Rule</span><span class="p">::</span><span class="n">stagesDecl</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="nf">parse_stages</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">parsed</span><span class="nf">.into_inner</span><span class="p">())</span><span class="o">?</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="n">_</span> <span class="k">=&gt;</span> <span class="p">{}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The parsers I am writing tend to be relatively simplistic, taking user-friendly
models and turning them into internal data structures for further use. While
basic, it reminds me of the domain-specific language (DSL) “fad” among Rubyists.
I once joked “for loving Ruby so much, Rubyists sure do spend a lot of time
building tools to avoid writing Ruby.” Once you have a simple and easy approach
to create syntax and tooling that better models the domain you’re working in,
it’s hard to avoid!</p>

<p>YAML, XML, and JSON have their place as data serialization formats, but far too
frequently they’re used for configuration or other descriptive purposes. Many
developers will cite “everybody knows YAML” to justify its use, thereby overlooking
that “syntax” and “semantics” are two very distinct pieces of the puzzle. Yes,
most everybody grasps the basics of YAML syntax; however, whichever keys a
program treats as semantically significant for its configuration (see:
Kubernetes) are a <em>very</em> different story.</p>

<p>The next time you find yourself needing to describe or model complex concepts
for your program, consider creating a language to describe it! Writing the
parser will be easier than you might think!</p>]]></content><author><name>R. Tyler Croy</name></author><category term="rust" /><category term="pest" /><category term="antlr" /><summary type="html"><![CDATA[In a world where everything is increasingly YAML, you might find yourself wondering: “why bother to write a parser?” For starters, I recommend reading the YAML specification before if you haven’t, but more importantly: there are so many domains which can be better modeled with domain-specific semantics and syntax. When I was younger parsing was typically done with lexx/yacc/bison/whatever and was complete drudgery, but there are a few great modern tools in the Rust ecosystem that make writing parsers fun.]]></summary></entry></feed>