<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://brokenco.de//feed/by_tag/cicd.xml" rel="self" type="application/atom+xml" /><link href="https://brokenco.de//" rel="alternate" type="text/html" /><updated>2026-05-03T00:12:50+00:00</updated><id>https://brokenco.de//feed/by_tag/cicd.xml</id><title type="html">rtyler</title><subtitle>a moderately technical blog</subtitle><author><name>R. Tyler Croy</name></author><entry><title type="html">Noodling on Otto’s pipeline state machine</title><link href="https://brokenco.de//2020/11/21/otto-pipeline-state-machine.html" rel="alternate" type="text/html" title="Noodling on Otto’s pipeline state machine" /><published>2020-11-21T00:00:00+00:00</published><updated>2020-11-21T00:00:00+00:00</updated><id>https://brokenco.de//2020/11/21/otto-pipeline-state-machine</id><content type="html" xml:base="https://brokenco.de//2020/11/21/otto-pipeline-state-machine.html"><![CDATA[<p>Recently I have been making good progress with
<a href="https://github.com/rtyler/otto">Otto</a> such that I seem to be unearthing one
challenging design problem per week. The <a href="/2020/11/06/pipeline-syntax-for-otto.html">sketches of Otto pipeline
syntax</a> necessitated some internal
data structure
<a href="https://github.com/rtyler/otto/commit/d92a72ec7dd78968df863d9d90f553c98871c625#diff-0118f6d77d9de58413a5a5c50e6eaa7015c344e025a1d16d86cc54f661713d0f">changes</a>
to ensure that the right level of flexibility was present for execution.  Otto
is designed as a services-oriented architecture, and I have the parser service
and the agent daemon which will execute steps from a pipeline.  I must now
implement the service(s) between the parsing of a pipeline and the execution of
said pipeline. My current thinking is that two services are needed: the
Orchestrator and the Pipeline State Machine.</p>

<p>For this blog post I won’t discuss much of what the Orchestrator should do, other than
to mention that I intend Orchestrators to exist to provision resources and
launch agents for executing pipelines.</p>

<p>The Pipeline State Machine (PSM) is where the real fun starts. Somewhere inside
Otto, something <strong>must</strong> keep track of the progression of a pipeline from one
state to another, ensuring that the right actions are being triggered when
certain state transitions occur.</p>

<h2 id="states">States</h2>

<p>The current structure of the internal pipeline model informs the potential
states in the state machine:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">uuid</span><span class="pi">:</span> <span class="s1">'</span><span class="s">some'</span>
<span class="na">batches</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">mode</span><span class="pi">:</span> <span class="s">Linear</span>
    <span class="na">contexts</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">uuid</span><span class="pi">:</span> <span class="s1">'</span><span class="s">uuid-context'</span>
        <span class="na">properties</span><span class="pi">:</span>
          <span class="na">name</span><span class="pi">:</span> <span class="s1">'</span><span class="s">Build'</span>
        <span class="na">environment</span><span class="pi">:</span> <span class="pi">{}</span>
        <span class="na">steps</span><span class="pi">:</span>
          <span class="pi">-</span> <span class="na">symbol</span><span class="pi">:</span> <span class="s1">'</span><span class="s">sh'</span>
            <span class="na">uuid</span><span class="pi">:</span> <span class="s1">'</span><span class="s">uuid-step'</span>
            <span class="na">context</span><span class="pi">:</span> <span class="s1">'</span><span class="s">uuid-context'</span>
            <span class="na">parameters</span><span class="pi">:</span>
              <span class="pi">-</span> <span class="s1">'</span><span class="s">pwd'</span>
  <span class="pi">-</span> <span class="na">mode</span><span class="pi">:</span> <span class="s">Linear</span>
    <span class="na">contexts</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">uuid</span><span class="pi">:</span> <span class="s1">'</span><span class="s">uuid2-context'</span>
        <span class="na">properties</span><span class="pi">:</span>
          <span class="na">name</span><span class="pi">:</span> <span class="s1">'</span><span class="s">Test'</span>
        <span class="na">environment</span><span class="pi">:</span> <span class="pi">{}</span>
        <span class="na">steps</span><span class="pi">:</span>
          <span class="pi">-</span> <span class="na">symbol</span><span class="pi">:</span> <span class="s1">'</span><span class="s">sh'</span>
            <span class="na">uuid</span><span class="pi">:</span> <span class="s1">'</span><span class="s">uuid2-step'</span>
            <span class="na">context</span><span class="pi">:</span> <span class="s1">'</span><span class="s">uuid2-context'</span>
            <span class="na">parameters</span><span class="pi">:</span>
              <span class="pi">-</span> <span class="s1">'</span><span class="s">make</span><span class="nv"> </span><span class="s">test'</span>
</code></pre></div></div>

<p>“Batches” are a concept which will exist
internally to Otto to help handle parallel stages and other novel groupings of
steps. Referring back to the “sketches of syntax” post, a <code class="language-plaintext highlighter-rouge">parallel</code> or
<code class="language-plaintext highlighter-rouge">fanout</code> block would result in a single Batch. Inside that Batch would be a
Context for each <code class="language-plaintext highlighter-rouge">stage</code> declared, allowing some flexibility between the
internal representation of a Pipeline and the user-visible declaration.</p>
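
<p>A minimal sketch of that grouping, assuming the parser hands over a list of blocks where each block is either a single stage or a <code class="language-plaintext highlighter-rouge">parallel</code> group. Python is used purely for illustration and is not Otto’s implementation. The <code class="language-plaintext highlighter-rouge">to_batches</code> and <code class="language-plaintext highlighter-rouge">make_context</code> helpers, the <code class="language-plaintext highlighter-rouge">Parallel</code> mode name, and the <code class="language-plaintext highlighter-rouge">make lint</code> step are all hypothetical; the model above only shows <code class="language-plaintext highlighter-rouge">Linear</code>.</p>

```python
import uuid

def make_context(stage_name, steps):
    """Build one Context, mirroring the fields in the YAML model above."""
    context_id = str(uuid.uuid4())
    return {
        "uuid": context_id,
        "properties": {"name": stage_name},
        "environment": {},
        "steps": [
            {"symbol": symbol, "uuid": str(uuid.uuid4()),
             "context": context_id, "parameters": list(parameters)}
            for symbol, parameters in steps
        ],
    }

def to_batches(blocks):
    """Turn parsed blocks into Batches, one Batch per block.

    Each block is a (kind, stages) pair, where kind is "stage" or "parallel"
    and stages is a list of (name, steps) pairs. The "Parallel" mode name is
    an assumption made for this sketch.
    """
    batches = []
    for kind, stages in blocks:
        mode = "Parallel" if kind == "parallel" else "Linear"
        contexts = [make_context(name, steps) for name, steps in stages]
        batches.append({"mode": mode, "contexts": contexts})
    return batches

blocks = [
    ("stage", [("Build", [("sh", ["pwd"])])]),
    ("parallel", [
        ("Unit", [("sh", ["make test"])]),
        ("Lint", [("sh", ["make lint"])]),
    ]),
]
batches = to_batches(blocks)
```

<p>Here the two stages inside the parallel group land in one Batch as two Contexts, while the lone Build stage becomes its own Batch with a single Context.</p>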

<p>I believe that each Pipeline will largely
need to progress through the states defined below.</p>

<ul>
  <li><strong>Pending</strong>: requires full model and uuid, basically the output from the Parser.</li>
  <li><code class="language-plaintext highlighter-rouge">for batch in batches</code>
    <ul>
      <li><strong>Auction Started</strong>: with list of contexts that have been auctioned</li>
      <li><strong>Auction Completed</strong>: with list of contexts and the winning Orchestrator for each context.</li>
      <li><strong>Provisioning</strong>: mapping each context to an Orchestrator who should be provisioning the resource(s) necessary to execute that context.</li>
      <li><strong>Execution(context)</strong>: each context has its own state of: Pending/Running/Failed/Aborted/Unstable/Success</li>
      <li><strong>Batch Complete</strong></li>
    </ul>
  </li>
  <li><strong>Pipeline Complete(status)</strong></li>
</ul>

<p>“Auctions” refer to the planned <a href="https://github.com/rtyler/otto/blob/main/rfc/0003-resource-auctioning.adoc">resource
auctioning</a>
work I wish to explore at a later date; the first version of PSM will likely
omit these states.</p>
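
<p>The progression listed above can be sketched as an ordered list of per-batch states wrapped by the two pipeline-level states. This is only an illustration in Python under stated assumptions: the names <code class="language-plaintext highlighter-rouge">BatchState</code>, <code class="language-plaintext highlighter-rouge">batch_order</code>, <code class="language-plaintext highlighter-rouge">next_state</code>, and <code class="language-plaintext highlighter-rouge">pipeline_states</code> are hypothetical, and the per-context execution status is left out.</p>

```python
from enum import Enum

class BatchState(Enum):
    AUCTION_STARTED = "Auction Started"
    AUCTION_COMPLETED = "Auction Completed"
    PROVISIONING = "Provisioning"
    EXECUTION = "Execution"
    BATCH_COMPLETE = "Batch Complete"

# The order each batch walks through, as listed in the post.
FULL_ORDER = [
    BatchState.AUCTION_STARTED,
    BatchState.AUCTION_COMPLETED,
    BatchState.PROVISIONING,
    BatchState.EXECUTION,
    BatchState.BATCH_COMPLETE,
]
AUCTION_STATES = {BatchState.AUCTION_STARTED, BatchState.AUCTION_COMPLETED}

def batch_order(with_auctions=True):
    """Return the per-batch states, optionally dropping the auction states."""
    if with_auctions:
        return list(FULL_ORDER)
    return [state for state in FULL_ORDER if state not in AUCTION_STATES]

def next_state(current, with_auctions=True):
    """Return the state after `current`, or None once the batch is complete."""
    order = batch_order(with_auctions)
    position = order.index(current)
    if position + 1 == len(order):
        return None
    return order[position + 1]

def pipeline_states(batch_count, with_auctions=True):
    """List every state a pipeline passes through, in order."""
    states = ["Pending"]
    for number in range(batch_count):
        states.extend((number, state.value) for state in batch_order(with_auctions))
    states.append("Pipeline Complete")
    return states
```

<p>Calling <code class="language-plaintext highlighter-rouge">batch_order(with_auctions=False)</code> gives the shorter progression a first version without auctions might use, starting each batch at Provisioning.</p>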

<p>The requirements for PSM I have in mind are:</p>

<ul>
  <li>It should receive and store the entire pipeline model (YAML above). I am not
yet sure what the exact interplay between source control and PSM should be.
I have <a href="https://github.com/rtyler/otto/issues/11">an issue</a> which mentions
the service which will ingest GitHub Webhook payloads. My current thinking is
that this service should perhaps be responsible for handling the webhook
payload <em>and</em> fetching the <code class="language-plaintext highlighter-rouge">Ottofile</code> in order to send a request to the
Parser and then PSM.</li>
  <li>It should hold the mapping between a given pipeline <code class="language-plaintext highlighter-rouge">uuid</code> and the states listed above.</li>
  <li>It should fire events for each state transition.</li>
</ul>
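
<p>Those three requirements can be sketched as a small in-memory object. This is a rough Python illustration, not a proposed design: the <code class="language-plaintext highlighter-rouge">PipelineStateMachine</code> class and its <code class="language-plaintext highlighter-rouge">submit</code>, <code class="language-plaintext highlighter-rouge">transition</code>, and <code class="language-plaintext highlighter-rouge">subscribe</code> methods are hypothetical names, and a real service would need durable storage and a real event transport.</p>

```python
class PipelineStateMachine:
    """In-memory stand-in for the PSM: stores models, tracks state, fires events."""

    def __init__(self):
        self.models = {}      # pipeline uuid to the full pipeline model
        self.states = {}      # pipeline uuid to its current state
        self.listeners = []   # callables invoked on every transition

    def subscribe(self, listener):
        """Register a callable taking (pipeline_id, old_state, new_state)."""
        self.listeners.append(listener)

    def submit(self, model):
        """Store a parsed pipeline model and place it in the Pending state."""
        pipeline_id = model["uuid"]
        self.models[pipeline_id] = model
        self.transition(pipeline_id, "Pending")
        return pipeline_id

    def transition(self, pipeline_id, new_state):
        """Record the new state and notify every listener of the change."""
        if pipeline_id not in self.models:
            raise KeyError("unknown pipeline: " + pipeline_id)
        old_state = self.states.get(pipeline_id)
        self.states[pipeline_id] = new_state
        for listener in self.listeners:
            listener(pipeline_id, old_state, new_state)

events = []
psm = PipelineStateMachine()
psm.subscribe(lambda pipeline_id, old, new: events.append((pipeline_id, old, new)))
psm.submit({"uuid": "some", "batches": []})
psm.transition("some", "Provisioning")
```

<p>After these calls <code class="language-plaintext highlighter-rouge">events</code> holds one entry per transition, which is the hook other services would presumably listen on.</p>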

<p>Some requirements I am not yet certain of are:</p>

<ul>
  <li>Does it need to know which orchestrator or who is actually executing on a
context or batch?</li>
  <li>Should PSM contain a mapping of a commit revision to pipeline uuid to help
with de-duplication of pipelines for identical commits?</li>
</ul>

<p>Looking at the shape of PSM is like inspecting a building in the distance. I
have a general idea of its dimensions and key characteristics, but the details
remain blurry no matter how hard I squint.</p>

<hr />

<p>This phase of Otto’s development has certainly been the most frustrating in
months. I’m pushing towards enough service integration to allow Otto to
perform basic self-hosted CI. To accomplish this I will need:</p>

<ul>
  <li>A service ingesting webhooks and fishing the <code class="language-plaintext highlighter-rouge">Ottofile</code> out of the latest commit on a given branch.</li>
  <li>☑ The Parser service, which turns an <code class="language-plaintext highlighter-rouge">Ottofile</code> into a usable internal model.</li>
  <li>A Pipeline State Machine to manage the execution of the pipeline.</li>
  <li>A basic Orchestrator which can dispatch a local <code class="language-plaintext highlighter-rouge">otto-agent</code> with the appropriate arguments.</li>
  <li>☑ An object store service to contain logs, artifacts, and step libraries.</li>
  <li>☑ An agent capable of executing steps.</li>
  <li>☑ Defined steps to check out a source repo.</li>
</ul>

<p>For the most basic self-hosted implementation, I don’t even think I need a
GUI/dashboard or an eventbus, both of which are in the “grand vision.”</p>

<p>Much of what remains requires “big think” time however, which is in short
supply. Every time I sit down to the problem, I spend a non-trivial amount of
time debating whether I am over-complicating this before I am able to
re-convince myself of the approach I am taking here.</p>

<p>The curse of working in <a href="https://jenkins.io">Jenkins</a> for so long is that I
know how so many CI system design decisions ultimately run into limitations for
certain use-cases.</p>

<p>Regardless of how challenging the path ahead appears, I will inch along, slowly
but surely. :)</p>

<hr />

<p>As always, if you’re curious to learn more, you’re welcome to join <code class="language-plaintext highlighter-rouge">#otto</code> on the
<a href="https://freenode.net">Freenode</a> IRC network, or follow along on
<a href="https://github.com/rtyler/otto">GitHub</a></p>]]></content><author><name>R. Tyler Croy</name></author><category term="otto" /><category term="cicd" /><category term="rust" /><summary type="html"><![CDATA[Recently I have been making good progress with Otto such that I seem to be unearthing one challenging design problem per week. The sketches of Otto pipeline syntax necessitated some internal data structure changes to ensure that to right level of flexibility was present for execution. Otto is designed as a services-oriented architecture, and I have the parser service and the agent daemon which will execute steps from a pipeline. I must now implement the service(s) between the parsing of a pipeline and the execution of said pipeline. My current thinking is that two services are needed: the Orchestrator and the Pipeline State Machine.]]></summary></entry><entry><title type="html">Sketches of syntax, a pipeline for Otto</title><link href="https://brokenco.de//2020/11/06/pipeline-syntax-for-otto.html" rel="alternate" type="text/html" title="Sketches of syntax, a pipeline for Otto" /><published>2020-11-06T00:00:00+00:00</published><updated>2020-11-06T00:00:00+00:00</updated><id>https://brokenco.de//2020/11/06/pipeline-syntax-for-otto</id><content type="html" xml:base="https://brokenco.de//2020/11/06/pipeline-syntax-for-otto.html"><![CDATA[<p>Defining a good continuous integration and delivery pipeline syntax for
<a href="https://github.com/rtyler/otto">Otto</a> is one of the most important
challenges in the entire project. It is one which I struggled with early in
the project almost a year and a half ago. It is a challenge I continue to
struggle with today, even as the puzzle pieces start to interlock for the
multi-service system I originally imagined Otto to be. Now that I have started
writing the parser, the pressure to make some design decisions and play them
out to their logical ends is growing. The following snippet compiles to the
<em>current</em> Otto intermediate representation and will execute on the <em>current</em>
prototype agent implementation:</p>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pipeline</span> <span class="o">{</span>
  <span class="n">stages</span> <span class="o">{</span>
    <span class="n">stage</span> <span class="o">{</span>
      <span class="n">name</span> <span class="o">=</span> <span class="s1">'Build'</span>
      <span class="n">steps</span> <span class="o">{</span>
        <span class="n">echo</span> <span class="s1">'&gt;&gt; Building project'</span>
        <span class="n">sh</span> <span class="s1">'make all'</span>
      <span class="o">}</span>
    <span class="o">}</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>“It works!”</p>

<p>The reaction upon sharing this with friends and colleagues on Twitter was
largely: “looks like Declarative [Pipeline].” The syntax above is just a simple
shim to get the basics working, but the similarities are no accident. <a href="https://jenkins.io/doc/book/pipeline/syntax/">Jenkins
Pipeline</a> is the best publicly
available syntax for describing a CI/CD pipeline (in my not-so-humble opinion).
If I didn’t believe that I wouldn’t continue to advocate for its use at every
possible turn.</p>

<p>For Otto however, the goal is not to create a Jenkins Pipeline knock-off. In
this post I wanted to share some sketches of what I think Otto Pipelines should
look like and why. For starters, the README <a href="https://github.com/rtyler/otto/tree/3c2daa412ab091d827040c56f891ad0fdcd7cd2c#modeling-continuous-delivery">has an incomplete
list</a>
covering some high-level goals that I have for modeling continuous delivery. The <a href="https://github.com/rtyler/otto/tree/3c2daa412ab091d827040c56f891ad0fdcd7cd2c/examples">examples/
directory</a>
also has a few sample <code class="language-plaintext highlighter-rouge">.otto</code> files which I will use for reference throughout
this post.</p>

<p>It may also be worth reading my previous post describing <a href="/2020/10/18/otto-steps.html">Step
Libraries</a> if you haven’t already, since they play
an integral part in making this syntax “go.”</p>

<p>With the pre-requisites out of the way, let’s walk through some syntax!</p>

<h2 id="defining-the-available-tools">Defining the available tools</h2>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">use</span> <span class="o">{</span>
  <span class="n">stdlib</span>
  <span class="s1">'./common/slack'</span>
<span class="o">}</span>
</code></pre></div></div>

<p>Sprinkled at the top of Otto Pipelines is the <code class="language-plaintext highlighter-rouge">use</code> block. A common problem
with Jenkins Pipeline is that the steps available in the <code class="language-plaintext highlighter-rouge">Jenkinsfile</code> are
completely dependent on what plugins have been installed on the controller <em>or</em>
what <a href="https://jenkins.io/doc/book/pipeline/shared-libraries/">Shared Libraries</a>
have been configured. The <code class="language-plaintext highlighter-rouge">use</code> block effectively brings Otto Step Libraries
into scope for the given pipeline. Because Step Libraries require no
“controller-side” execution in Otto, each Otto Pipeline can use a completely
different set of steps for users to leverage in their workflow.</p>

<p><strong>Open questions</strong>:</p>

<ul>
  <li>Versioning for step libraries seems like it is worth doing, but what’s the right syntax for expressing it?</li>
  <li>Referring to step libraries by URL could be incredibly useful, but is it worth the complexity?</li>
</ul>

<h2 id="defining-the-execution-environment">Defining the execution environment</h2>

<p>Next is declaring the execution environment for a stage/stages:</p>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* snip */</span>
<span class="n">stage</span> <span class="o">{</span>
  <span class="n">name</span> <span class="o">=</span> <span class="s1">'Build'</span>

  <span class="n">runtime</span> <span class="o">{</span>
    <span class="n">docker</span> <span class="o">{</span>
      <span class="n">image</span> <span class="o">=</span> <span class="s1">'ruby:2.6'</span>
    <span class="o">}</span>
  <span class="o">}</span>

  <span class="n">steps</span> <span class="o">{</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>Resource allocation is one of the areas I am most excited to explore with Otto,
both from a modeling standpoint and an execution standpoint. Jenkins was built
before “cloud” was a thing, and arguably before “containers”, depending on
whether or not any rabid Solaris users are within earshot. As such it has some
pitfalls when mapping pipeline execution to these much more dynamic
environments. On the flip side, newer CI/CD systems seem to have all gravitated
towards container-all-the-things and typically don’t consider non-container
workloads in any form, and will also usually require a Kubernetes cluster just
to get started.</p>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">runtime</span> <span class="o">{</span>
  <span class="n">arch</span> <span class="o">=</span> <span class="s1">'amd64'</span>
  <span class="n">linux</span> <span class="o">{</span>
    <span class="n">pkgconfig</span> <span class="o">=</span> <span class="o">[</span><span class="s1">'openssl'</span><span class="o">,</span> <span class="s1">'libxml-2.0'</span><span class="o">]</span>
  <span class="o">}</span>
  <span class="n">python</span> <span class="o">{</span>
    <span class="n">version</span> <span class="o">=</span> <span class="s1">'~&gt; 3.8.5'</span>
    <span class="n">virtualenv</span> <span class="o">=</span> <span class="kc">true</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>For Otto I want to do <em>better</em> and started thinking about <em>capabilities</em> rather
than fixed labels or names. In many cases, I don’t particularly care where my
Rust project builds, just so long as it has <code class="language-plaintext highlighter-rouge">cargo</code> and an up-to-date stable
<code class="language-plaintext highlighter-rouge">rustc</code>. Similarly for Python projects, I might need an execution environment
with Python 3.x, <code class="language-plaintext highlighter-rouge">virtualenv</code>, and <code class="language-plaintext highlighter-rouge">libxml2</code> installed. In most systems that
precede Otto, administrators end up defining complex labels which users must
know. Another way to paper over this complexity is to say “just bring your own
container!” which pushes a lot of work back onto developers, typically leading
to one-off <code class="language-plaintext highlighter-rouge">Dockerfile</code>s which just take an upstream image and add one or two
dependencies.</p>

<p>With a capabilities-oriented model, the pipeline orchestration layer is no
longer looking for machines labeled “linux-python” and then hoping one is
available. Instead the orchestrator can be smarter and find any available
capacity to meet the capabilities request.  I believe this approach can improve
on overall system performance and scheduling. An idea which I have floating
around as <a href="https://github.com/rtyler/otto/blob/main/rfc/0003-resource-auctioning.adoc">a draft
RFC</a>
right now is basically to “auction” pipeline tasks to the lowest bidder. When I
first started considering this idea, I found this paper titled <a href="https://www.scribd.com/document/439692166/Efficient-Nash-Equilibrium-Resource-Allocation-Based-on-Game-Theory-Mechanism-in-Cloud-Computing-by-Using-Auction">Efficient Nash
Equilibrium Resource Allocation Based on Game Theory Mechanism in Cloud
Computing by Using
Auction</a>,
which will likely guide the implementation of <code class="language-plaintext highlighter-rouge">auctioneer</code> quite a bit.</p>
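
<p>A minimal sketch of capability matching combined with a lowest-bidder pick, written in Python for illustration only. The agent records, capability names, and the numeric <code class="language-plaintext highlighter-rouge">bid</code> field are invented for this example; how bids would actually be computed is left open by the draft RFC, and version constraints are ignored here.</p>

```python
def can_run(agent, request):
    """An agent qualifies when it offers every requested capability."""
    return set(request).issubset(agent["capabilities"])

def auction(agents, request):
    """Pick the qualifying agent with the lowest bid, or None if nobody qualifies.

    The "bid" field is a made-up cost figure used only by this sketch.
    """
    bidders = [agent for agent in agents if can_run(agent, request)]
    if not bidders:
        return None
    return min(bidders, key=lambda agent: agent["bid"])

agents = [
    {"name": "vm-1", "bid": 5,
     "capabilities": {"linux", "amd64", "python3", "virtualenv"}},
    {"name": "vm-2", "bid": 8,
     "capabilities": {"linux", "amd64", "python3", "virtualenv", "libxml2"}},
    {"name": "vm-3", "bid": 13,
     "capabilities": {"linux", "amd64", "python3", "virtualenv", "libxml2", "cargo"}},
]
winner = auction(agents, ["python3", "virtualenv", "libxml2"])
```

<p>The cheapest machine overall (<code class="language-plaintext highlighter-rouge">vm-1</code>) is passed over because it lacks <code class="language-plaintext highlighter-rouge">libxml2</code>, so the task goes to <code class="language-plaintext highlighter-rouge">vm-2</code>, the cheapest agent that meets the whole request. No label such as “linux-python” is involved.</p>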

<p>What remains to be seen is whether users are actually interested in
<em>expressing</em> the capabilities that would be necessary to make a highly
efficient resource auction practical.</p>

<p><strong>Open questions:</strong></p>

<ul>
  <li>Do most developers think about what their pipeline needs in the same way I think about capabilities?</li>
  <li>How would an administrator define capabilities of a cloud-based VM template?</li>
</ul>

<h2 id="caching">Caching</h2>

<p>From an operational standpoint, I think the most common problem of <em>any</em> CI
system is overuse of remote resources by pipelines. <a href="https://twitter.com/technosophos/status/1324037674588463105">This is not a niche
problem</a>, but
rather something that affects practically everybody. Some will say “you should
be caching and proxying all your remote resources!” which is simply not a
practical solution for the vast majority of the users in the ecosystem. Many
users won’t be at organizations large enough to deploy such caching proxies.</p>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stage</span> <span class="o">{</span>
  <span class="n">name</span> <span class="o">=</span> <span class="s1">'Build'</span>

  <span class="n">cache</span> <span class="o">{</span>
    <span class="c1">// Create a cache named "gems", immutable for the rest of the Pipeline</span>
    <span class="n">gems</span> <span class="o">=</span> <span class="o">[</span><span class="s1">'vendor/'</span><span class="o">]</span>
    <span class="n">assets</span> <span class="o">=</span> <span class="o">[</span><span class="s1">'dist/css/'</span><span class="o">,</span> <span class="s1">'dist/js/'</span><span class="o">]</span>
  <span class="o">}</span>

  <span class="cm">/* snip */</span>
<span class="o">}</span>
<span class="n">stage</span> <span class="o">{</span>
  <span class="n">name</span> <span class="o">=</span> <span class="s1">'Test'</span>
  <span class="n">cache</span> <span class="o">{</span>
    <span class="n">use</span> <span class="n">gems</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">cache</code> block is intended to provide pipeline authors with a way to cache
arbitrary sections of the workspace for later re-use in the pipeline across
multiple agents. This is a pretty simple syntax addition, but something built
into the Otto infrastructure from the beginning.</p>

<p>On the implementation side, this requires that archiving and retrieving these
artifacts is relatively quick, which I don’t believe will be a major challenge.</p>
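
<p>One way to picture the archive-and-restore cycle is a tarball of workspace-relative paths that is unpacked into the same relative locations on another agent. The Python below is a sketch under that assumption; <code class="language-plaintext highlighter-rouge">save_cache</code>, <code class="language-plaintext highlighter-rouge">restore_cache</code>, and the file names are hypothetical, and a local temporary directory stands in for the object store.</p>

```python
import os
import tarfile
import tempfile

def save_cache(workspace, paths, archive_path):
    """Archive the given workspace-relative paths into one tarball."""
    with tarfile.open(archive_path, "w:gz") as archive:
        for relative in paths:
            name = relative.rstrip("/")
            archive.add(os.path.join(workspace, name), arcname=name)

def restore_cache(archive_path, workspace):
    """Unpack a cache into the same relative locations in another workspace."""
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(workspace)

with tempfile.TemporaryDirectory() as root:
    # Two directories stand in for the workspaces of two different agents.
    build_ws = os.path.join(root, "build-agent")
    test_ws = os.path.join(root, "test-agent")
    os.makedirs(os.path.join(build_ws, "vendor"))
    os.makedirs(test_ws)
    with open(os.path.join(build_ws, "vendor", "rake.gem"), "w") as handle:
        handle.write("gem contents")

    gems = os.path.join(root, "gems.tar.gz")
    save_cache(build_ws, ["vendor/"], gems)
    restore_cache(gems, test_ws)
    restored = os.path.exists(os.path.join(test_ws, "vendor", "rake.gem"))
```

<p>Keeping the archive paths relative to the workspace is what lets the second agent restore <code class="language-plaintext highlighter-rouge">vendor/</code> without knowing where the first agent’s workspace lived.</p>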

<p><strong>Open questions:</strong></p>

<ul>
  <li>Is it sufficient to cache a file subtree and simply restore it into the same
location in another agent’s workspace?</li>
  <li>Would this syntax accommodate the caching of Docker image layers?</li>
</ul>

<h2 id="composition-and-re-use">Composition and Re-use</h2>

<p>Inevitably developers try to abstract common functionality and behaviors into
re-usable components. Step Libraries can provide one flavor of this
re-usability, but I don’t believe that it is sufficient. The ubiquitous adoption
of YAML by newer CI/CD tools led me to joke about <a href="/2018/08/15/five-stages-of-yaml.html">the five stages of
YAML</a> wherein developers end up turning a
declarative syntax into templates and then into just another Turing-complete
language.</p>

<p>In Jenkins we have seen numerous tools for templatizing jobs, pipelines, or
other aspects of Jenkins configuration. Suffice it to say, there is a need to
compose and re-use various aspects of pipelines.</p>

<p>For Otto, I have been playing around with a context-aware <code class="language-plaintext highlighter-rouge">from</code> keyword, such as below:</p>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stage</span> <span class="o">{</span>
  <span class="n">name</span> <span class="o">=</span> <span class="s1">'Test'</span>
  <span class="n">runtime</span> <span class="o">{</span>
    <span class="n">from</span> <span class="s1">'Build'</span>
  <span class="o">}</span>
  <span class="n">steps</span> <span class="o">{</span>
    <span class="n">sh</span> <span class="s1">'bundle exec rake spec'</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>In the above example, <code class="language-plaintext highlighter-rouge">from</code> instructs the pipeline to re-use the contents of
the runtime block from the <code class="language-plaintext highlighter-rouge">Build</code> stage. My current thinking is that this
simple use of <code class="language-plaintext highlighter-rouge">from</code> allows for pipeline-internal re-use of pieces without the
need to set variables or turn this into a scripting language.</p>
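
<p>A sketch of how that resolution might work once the stages are parsed into plain data structures. It is written in Python for illustration; the <code class="language-plaintext highlighter-rouge">resolve_from</code> function and the dictionary layout are hypothetical, and limiting references to earlier stages is an assumption made here to avoid cycles.</p>

```python
import copy

def resolve_from(stages):
    """Replace {"from": name} runtime blocks with a copy of the named stage's runtime.

    Only stages declared earlier can be referenced; that restriction is an
    assumption of this sketch rather than a rule from the post.
    """
    resolved = {}
    output = []
    for stage in stages:
        stage = copy.deepcopy(stage)
        runtime = stage.get("runtime", {})
        if "from" in runtime:
            source = runtime["from"]
            if source not in resolved:
                raise ValueError("unknown stage in from: " + source)
            stage["runtime"] = copy.deepcopy(resolved[source])
        resolved[stage["name"]] = stage.get("runtime", {})
        output.append(stage)
    return output

stages = resolve_from([
    {"name": "Build", "runtime": {"docker": {"image": "ruby:2.6"}}, "steps": []},
    {"name": "Test", "runtime": {"from": "Build"},
     "steps": [("sh", "bundle exec rake spec")]},
])
```

<p>After resolution the Test stage carries its own copy of the Docker runtime, so later changes to one stage’s runtime cannot leak into the other.</p>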

<p>That said, re-usability within the pipeline isn’t where the main interest in
“templates” lies.</p>

<p>I have been exploring the concept of a “blueprint” which can act as a
re-usable unit of Otto Pipeline. I am imagining that these would be published
and managed similarly to Step Libraries. In order to provide maximum
flexibility, I think blueprints should be able to capture just about any
snippet of the Otto Pipeline syntax for re-use. Consider the following example,
which helps make common Ruby gem build/test/publish pipelines cleaner:</p>

<p><strong>rubygem.blueprint</strong></p>
<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">use</span> <span class="o">{</span>
  <span class="n">stdlib</span>
<span class="o">}</span>

<span class="n">blueprint</span> <span class="o">{</span>
  <span class="n">parameters</span> <span class="o">{</span>
    <span class="n">rubyVersion</span> <span class="o">{</span>
      <span class="n">description</span> <span class="o">=</span> <span class="s1">'Specify the Ruby container'</span>
      <span class="k">default</span> <span class="o">=</span> <span class="s1">'ruby'</span>
      <span class="n">type</span> <span class="o">=</span> <span class="n">string</span>
    <span class="o">}</span>

    <span class="n">deploy</span> <span class="o">{</span>
      <span class="n">description</span> <span class="o">=</span> <span class="s1">'Push to rubygems.org'</span>
      <span class="k">default</span> <span class="o">=</span> <span class="kc">true</span>
      <span class="n">type</span> <span class="o">=</span> <span class="kt">boolean</span>
    <span class="o">}</span>
  <span class="o">}</span>

  <span class="n">plan</span> <span class="o">{</span>
    <span class="n">stages</span> <span class="o">{</span>
      <span class="n">stage</span><span class="o">(</span><span class="s1">'Build'</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">runtime</span> <span class="o">{</span>
          <span class="n">docker</span> <span class="o">{</span>
            <span class="n">image</span> <span class="o">=</span> <span class="n">vars</span><span class="o">.</span><span class="na">rubyVersion</span>
          <span class="o">}</span>
        <span class="o">}</span>

        <span class="n">steps</span> <span class="o">{</span>
          <span class="n">sh</span> <span class="s1">'bundle install'</span>
          <span class="n">sh</span> <span class="s1">'bundle exec rake build'</span>
        <span class="o">}</span>
      <span class="o">}</span>

      <span class="n">stage</span><span class="o">(</span><span class="s1">'Test'</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">runtime</span> <span class="o">{</span> <span class="n">from</span> <span class="s1">'Build'</span> <span class="o">}</span>
        <span class="n">steps</span> <span class="o">{</span>
          <span class="n">sh</span> <span class="s1">'bundle exec rake test'</span>
        <span class="o">}</span>
      <span class="o">}</span>

      <span class="n">stage</span><span class="o">(</span><span class="s1">'Deploy'</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">gates</span> <span class="o">{</span> <span class="n">enter</span> <span class="o">{</span> <span class="n">vars</span><span class="o">.</span><span class="na">deploy</span> <span class="o">}</span> <span class="o">}</span>
        <span class="n">runtime</span> <span class="o">{</span> <span class="n">from</span> <span class="s1">'Build'</span> <span class="o">}</span>
        <span class="n">steps</span> <span class="o">{</span>
          <span class="n">sh</span> <span class="s1">'bundle exec rake push'</span>
        <span class="o">}</span>
      <span class="o">}</span>
    <span class="o">}</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>This would then be later re-used within an Otto Pipeline by using the same <code class="language-plaintext highlighter-rouge">from</code> syntax as before:</p>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pipeline</span> <span class="o">{</span>
  <span class="n">from</span> <span class="s1">'blueprints/rubygem'</span>

  <span class="cm">/*
   * Optionally I could add additional post-deployment configuration here,
   * which would be ordered after the blueprint's stages have completed
   */</span>
<span class="o">}</span>
</code></pre></div></div>

<p>Since <code class="language-plaintext highlighter-rouge">from</code> would be somewhat context-aware, it would be able to pull all the
right stages “into place” within the pipeline. I’m optimistic that this
approach would also allow a blueprint that defines just one stage, for example,
or other blocks which can be defined within the <code class="language-plaintext highlighter-rouge">pipeline { }</code>.</p>

<p>I am not yet sure what the right mechanism for passing parameters into the
blueprint should be. Right now I am leaning towards keyword arguments on the
<code class="language-plaintext highlighter-rouge">from</code> directive: <code class="language-plaintext highlighter-rouge">from blueprint: 'blueprints/rubygem', rubyVersion: '2.6',
deploy: false</code>. I am not really sure how much implementation complexity this
approach will bring, however.</p>
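
<p>To get a feel for the keyword-argument idea, here is a small Python sketch that merges caller-supplied values over the defaults declared in the blueprint’s <code class="language-plaintext highlighter-rouge">parameters</code> block. The <code class="language-plaintext highlighter-rouge">bind_parameters</code> function and the dictionary form of the declarations are hypothetical, and the type check is only one possible policy.</p>

```python
# Parameter declarations taken from the rubygem.blueprint example above.
PARAMETERS = {
    "rubyVersion": {"default": "ruby", "type": str},
    "deploy": {"default": True, "type": bool},
}

def bind_parameters(declared, **overrides):
    """Merge keyword arguments over the blueprint defaults, checking names and types."""
    unknown = set(overrides) - set(declared)
    if unknown:
        raise TypeError("unknown parameters: " + ", ".join(sorted(unknown)))
    values = {}
    for name, spec in declared.items():
        value = overrides.get(name, spec["default"])
        if not isinstance(value, spec["type"]):
            raise TypeError(name + " must be of type " + spec["type"].__name__)
        values[name] = value
    return values

# Equivalent of: from blueprint: 'blueprints/rubygem', rubyVersion: '2.6', deploy: false
variables = bind_parameters(PARAMETERS, rubyVersion="2.6", deploy=False)
```

<p>The resulting dictionary is what <code class="language-plaintext highlighter-rouge">vars.rubyVersion</code> and <code class="language-plaintext highlighter-rouge">vars.deploy</code> would read from inside the blueprint’s <code class="language-plaintext highlighter-rouge">plan</code>.</p>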

<p><strong>Open Questions:</strong></p>

<ul>
  <li>Will treating <code class="language-plaintext highlighter-rouge">from</code> almost like a preprocessor directive allow the parser to successfully handle blueprints for arbitrary blocks of pipeline?</li>
  <li>Does this amount of composition alleviate the pressure that templates tend to solve for other systems?</li>
</ul>

<h2 id="gates">Gates</h2>

<p>The final bit of syntax I wish to discuss at the moment is “gates.” One of the
least appreciated parts of just about every CD pipeline, gates define how the
pipeline should behave differently under certain conditions, including pausing
for user input or an external event.</p>

<p>From one of the modeling goals I had set:</p>

<blockquote>
  <p><strong>External interactions must be model-able.</strong> Deferring control to an external
system must be accounted for in a user-defined model. For example, submitting
a deployment request, and then waiting for some external condition to be made
to indicate that the deployment has completed and the service is now online.
This should support both an evented model, wherein the external service
“calls back” and a polling model, where the process waits until some external
condition can be verified.</p>
</blockquote>

<p>A contrived example of what this might look like for a pipeline which prepares
a deployment whenever changes land in the <code class="language-plaintext highlighter-rouge">main</code> branch:</p>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">gates</span> <span class="o">{</span>
  <span class="n">enter</span> <span class="o">{</span> <span class="n">branch</span> <span class="o">==</span> <span class="s1">'main'</span> <span class="o">}</span>

  <span class="cm">/*
  * The exit block is where external stimuli back into the system
  * should be modeled, providing some means of holding back the pipeline
  * until the condition has been met
  */</span>
  <span class="n">exit</span> <span class="o">{</span>
    <span class="n">input</span> <span class="s1">'Does staging look good to you?'</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>Of anything discussed thus far, gates have the <em>most</em> runtime implementation requirements. In the primitive example above we have:</p>

<ul>
  <li>A Git branch being referenced, which needs to be pulled into scope somehow/somewhere.</li>
  <li>An expression that needs to be evaluated in the service mesh before this stage of the pipeline is dispatched.</li>
  <li>An <code class="language-plaintext highlighter-rouge">input</code> step which should allow the agent which executed the stage to
deallocate <strong>and</strong> pause further execution of the pipeline until some
external event is provided.</li>
</ul>

<p>The last item is the most challenging for me to think about from an
implementation and modeling standpoint. Somewhere within Otto a state machine
for each pipeline must be maintained, and once an <code class="language-plaintext highlighter-rouge">input</code>, <code class="language-plaintext highlighter-rouge">webhook</code>, or some
other step is encountered, the state machine must pause for external actions.
How those external actions should be wired in? Not sure! How those steps should
be defined? Not sure!</p>
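<p>As a way of thinking out loud, the pausing behavior can be sketched as a tiny state machine in Python. This is only a sketch under stated assumptions: the state names, the <code class="language-plaintext highlighter-rouge">PipelineStateMachine</code> class and the idea of keeping external steps in a set are all hypothetical, and dispatching to an agent is replaced by appending to a list.</p>

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    WAITING = "waiting"      # paused on an external event; no agent is held
    COMPLETED = "completed"
    SKIPPED = "skipped"      # the enter gate evaluated to false


# Steps that hand control to something outside the pipeline. `input` and
# `webhook` come from the post; keeping them in a set is an assumption.
EXTERNAL_STEPS = {"input", "webhook"}


class PipelineStateMachine:
    def __init__(self, steps, enter=lambda context: True):
        self.steps = list(steps)
        self.enter = enter
        self.cursor = 0
        self.state = State.PENDING
        self.executed = []

    def start(self, context):
        # The enter gate is evaluated before anything is dispatched.
        if not self.enter(context):
            self.state = State.SKIPPED
            return self.state
        self.state = State.RUNNING
        return self._advance()

    def resume(self, event):
        # Only a waiting pipeline may be resumed by an external event.
        if self.state is not State.WAITING:
            raise RuntimeError(f"cannot resume from {self.state.value}")
        self.executed.append((self.steps[self.cursor], event))
        self.cursor += 1
        self.state = State.RUNNING
        return self._advance()

    def _advance(self):
        while self.cursor != len(self.steps):
            step = self.steps[self.cursor]
            if step in EXTERNAL_STEPS:
                self.state = State.WAITING
                return self.state
            self.executed.append((step, None))  # stand-in for dispatching to an agent
            self.cursor += 1
        self.state = State.COMPLETED
        return self.state


machine = PipelineStateMachine(
    steps=["sh", "input", "sh"],
    enter=lambda context: context["branch"] == "main",
)
print(machine.start({"branch": "main"}))
print(machine.resume("approved"))
```

<p>The first call stops in the waiting state at the <code class="language-plaintext highlighter-rouge">input</code> step, and the second call carries the pipeline through to completion. The sketch says nothing about how the external event actually reaches the state machine, which remains the hard part.</p>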

<p>There are so many open questions at this point.</p>

<p>Gates leave me with the most discomfort of any of my ideas for Otto. Done well,
gates could provide a key component missing from many existing tools. The
challenge is going to be finding the space between the pipeline modeling
language and the execution engine which will accommodate them.</p>

<hr />

<p>I still probably have more questions than answers at this point about how the
pipeline modeling syntax should be defined and how it should execute. The one
major lesson which I have learned from my time in the Jenkins project is that
the pipeline syntax cannot be improved in isolation from the execution
environment. There are many key design decisions which need to be made in both
domains which will have major repercussions in the other.</p>

<p>I think back to the word used by a developer who read my thoughts on what I
want to do with Otto:</p>

<p>“<strong>Ambitious</strong>.”</p>

<hr />

<p>As always, if you’re curious to learn more, you’re welcome to join <code class="language-plaintext highlighter-rouge">#otto</code> on the
<a href="https://freenode.net">Freenode</a> IRC network, or follow along on
<a href="https://github.com/rtyler/otto">GitHub</a></p>]]></content><author><name>R. Tyler Croy</name></author><category term="otto" /><category term="jenkins" /><category term="pipeline" /><category term="cicd" /><category term="ci" /><summary type="html"><![CDATA[Defining a good continuous integration and delivery pipeline syntax for Otto is one of the most important challenges in the entire project. It is one which I struggled with early in the project almost a year and a half ago. It is a challenge I continue to struggle with today, even as the puzzles pieces start to interlock for the multi-service system I originally imagined Otto to be. Now that I have started writing the parser, the pressure to make some design decisions and play them out to their logical ends is growing. The following snippet compiles to the current Otto intermediate representation and will execute on the current prototype agent implementation:]]></summary></entry><entry><title type="html">Moving again with Otto: Step Libraries</title><link href="https://brokenco.de//2020/10/18/otto-steps.html" rel="alternate" type="text/html" title="Moving again with Otto: Step Libraries" /><published>2020-10-18T00:00:00+00:00</published><updated>2020-10-18T00:00:00+00:00</updated><id>https://brokenco.de//2020/10/18/otto-steps</id><content type="html" xml:base="https://brokenco.de//2020/10/18/otto-steps.html"><![CDATA[<p>I have finally started to come back to <a href="https://github.com/rtyler/otto">Otto</a>,
an experimental playground for some of my thoughts on what an improved CI/CD
tool might look like. After setting the project aside for a number of months
and letting ideas marinate, I wanted to share some of my preliminary thoughts
on managing the trade-offs of extensibility. From my time in the <a href="https://jenkins.io">Jenkins</a> project,
I can vouch for the merits of a robust extensibility model. For Otto however, I wanted to implement something that I would call “safer” or “more scalable”, from the original goals of Otto:</p>

<blockquote>
  <p><em>Extensibility must not come at the expense of system integrity.</em> Systems which allow for administrator, or user-injected code at runtime cannot avoid system reliability and security problems. Extensibility is an important characteristic to support, but secondary to system integrity.</p>

  <p><em>Usage cannot grow across an organization without user-defined extension.</em> The operators of the system will not be able to provide for every eventual requirement from users. Some mechanism for extending or consolidating aspects of a continuous delivery process must exist.</p>
</blockquote>

<p>I will start with Jenkins and Jenkins Pipeline as a frame of reference. I do this
not only because I am intimately familiar with how it works, but also because
Jenkins Pipeline is the most successful and widely adopted pipeline modeling
language. Key to its success are “steps.” There are a number of default steps
provided by the system and new plugins introduced on the controller provide new
steps for users. The “execution environment” for steps in Jenkins Pipeline is
however incredibly confusing. If I were to interview a Jenkins developer or
administrator, I would give them a sample <code class="language-plaintext highlighter-rouge">Jenkinsfile</code> and ask them to explain
to me what is executing <em>where</em> as the pipeline progresses. In essence, steps can
execute code on <em>both</em> the controller and the agents, hopefully with users
never knowing about the quirks of the runtime dance between the two.</p>

<p>For Otto’s pipeline language, I wanted steps to have a perfectly clear
execution environment: <strong>agent only</strong>. Along with this are a number of other requirements that I have in mind:</p>

<ul>
  <li>Language-independent: I want steps to be implemented in whatever language a
developer sees fit. Therefore the tooling needs to remain flexible enough to
distribute and execute Python-based steps as well as native compiled steps.</li>
  <li>Statically verifiable: A step invocation in a pipeline should be verifiable
<em>without</em> actually executing the step. That is to say, it should be known
<em>before</em> execution whether parameters and types are correct.</li>
  <li>Lowest necessary privilege: Steps shouldn’t be able to “know” anything about
the system, credentials, configuration, etc, without an administrator or user
being aware. If a step needs to access a shared configuration variable, it
must self-declare that requirement. Steps should never be allowed to simply
poke around in global variables or configuration of the environment.</li>
</ul>

<p>The approach I’m settling on with “step libraries” is that each step is a
package (<code class="language-plaintext highlighter-rouge">.tar.gz</code>) containing a <a href="https://github.com/rtyler/otto/blob/d820a75ed5be8b1a400652ae518eae22db32d5d7/rfc/0011-step-library-format.adoc#manifest-file">manifest
file</a>
and whatever other assets it requires to execute. The manifest file contains
the description of the parameters, the entrypoint, and configuration values the
step may require.</p>

<p>At runtime, the step’s <code class="language-plaintext highlighter-rouge">entrypoint</code> will always be invoked with a single
<a href="https://github.com/rtyler/otto/blob/d820a75ed5be8b1a400652ae518eae22db32d5d7/rfc/0011-step-library-format.adoc#invocation-file">invocation
file</a>
that contains all the information necessary to execute the step correctly. For
this I debated a couple different approaches: setting environment variables,
piping JSON data into the process, or even having the processes request a JSON
payload of data from a central server. I ultimately decided on the invocation
file approach since that requires the least system knowledge for the step to
actually be executed by an agent.</p>

<p>The role of the agent in this process remains fairly simple, regardless of which steps are being executed:</p>

<ul>
  <li>Consider the steps which it should execute. (e.g. <code class="language-plaintext highlighter-rouge">echo</code>, <code class="language-plaintext highlighter-rouge">sh</code>, <code class="language-plaintext highlighter-rouge">junit</code>)</li>
  <li>Retrieve the appropriate step library artifacts. Initially this is going to be from a centralized store, but I can easily imagine an agent retrieving “remote step libraries” in a distant future.</li>
  <li>Unpack the step libraries</li>
  <li>Validate that the step libraries support the parameters specified by the user’s pipeline.</li>
  <li>Iterate through the steps and execute the <code class="language-plaintext highlighter-rouge">entrypoint</code>.</li>
</ul>
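<p>The validation and invocation steps above can be sketched in a few lines of Python. This is a rough illustration rather than the real format: the manifest field names, the <code class="language-plaintext highlighter-rouge">TYPES</code> table, the helper functions and the choice of JSON for the invocation file are all assumptions made for this example and are not taken from the RFC.</p>

```python
import json
import tempfile

# Hypothetical manifest for an `sh` step. The field names are assumptions for
# illustration and are not taken from the step library RFC.
MANIFEST = {
    "symbol": "sh",
    "entrypoint": "sh-step",
    "parameters": [
        {"name": "script", "type": "string", "required": True},
        {"name": "timeout", "type": "int", "required": False},
    ],
}

TYPES = {"string": str, "int": int, "bool": bool}


def validate(manifest, arguments):
    """Return a list of problems found without ever executing the step."""
    problems = []
    declared = {param["name"]: param for param in manifest["parameters"]}
    for name in arguments:
        if name not in declared:
            problems.append(f"unknown parameter: {name}")
    for name, spec in declared.items():
        if name not in arguments:
            if spec["required"]:
                problems.append(f"missing parameter: {name}")
            continue
        if not isinstance(arguments[name], TYPES[spec["type"]]):
            problems.append(f"{name} must be of type {spec['type']}")
    return problems


def write_invocation_file(manifest, arguments):
    """Write the single file the entrypoint would receive and return its path."""
    problems = validate(manifest, arguments)
    if problems:
        raise ValueError("; ".join(problems))
    # JSON is assumed here purely for illustration.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
        json.dump({"step": manifest["symbol"], "parameters": arguments}, handle)
        return handle.name


print(validate(MANIFEST, {"script": "make test"}))
print(validate(MANIFEST, {"timeout": "30", "verbose": True}))
```

<p>The point of the sketch is that a bad invocation is rejected from the manifest alone, before any step code runs, and that the entrypoint only ever needs to know the path of one file.</p>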

<p>In <a href="https://github.com/rtyler/otto/commit/a5de9294aa4cbd75d8ea1cc6be6c4471786c7eb4">this
commit</a>
I managed to get something dumb and primitive working with this model. Excusing
the <code class="language-plaintext highlighter-rouge">STEPS_DIR</code> hack to avoid needing to reach out to fetch steps, the basic
test pipeline referenced in the commit contains the essence of how I believe
step libraries can provide a powerful and <em>safe</em> extensibility model for Otto.</p>

<hr />

<p>There are still a number of open questions I need to answer:</p>

<ul>
  <li>How will credentials be accessed by a step in a secure manner?</li>
  <li>How will I balance the trade-off of “bring your own step libraries” with
“don’t leak credentials”? Right now I’m thinking about “trusted” versus
“untrusted” step libraries, and everything user-defined would be untrusted
unless added to an “allow” list by an administrator.</li>
  <li>For more complex step parameters, like files, how well will the invocation file format hold up?</li>
  <li>How should steps affect the flow control of a pipeline? Conventionally a
non-zero exit of a step will halt the pipeline in Jenkins, but is there a
more granular flow control system that can be extended to steps which are
defined in a step library?</li>
</ul>

<hr />

<p>Despite sparingly little free time, I am enjoying getting back into this part
of Otto. I had let myself fall into a tar pit of distributed systems problems
and stalled any progress with Otto. Bringing the focus back to the pipeline
model and extensibility has allowed me to re-focus on some of the challenges
unique to the CI/CD space.</p>

<p>If you’re curious to learn more, you’re welcome to join <code class="language-plaintext highlighter-rouge">#otto</code> on the
<a href="https://freenode.net">Freenode</a> IRC network, or follow along on
<a href="https://github.com/rtyler/otto">GitHub</a></p>]]></content><author><name>R. Tyler Croy</name></author><category term="rust" /><category term="otto" /><category term="cicd" /><summary type="html"><![CDATA[I have finally started to come back to Otto, an experimental playground for some of my thoughts on what an improved CI/CD tool might look like. After setting the project aside for a number of months and letting ideas marinate, I wanted to share some of my preliminary thoughts on managing the trade-offs of extensibility. From my time in the Jenkins project, I can vouch for the merits of a robust extensibility model. For Otto however, I wanted to implement something that I would call “safer” or “more scalable”, from the original goals of Otto:]]></summary></entry><entry><title type="html">Jenkins should not be the only line of defense</title><link href="https://brokenco.de//2019/04/15/trust-and-jenkins.html" rel="alternate" type="text/html" title="Jenkins should not be the only line of defense" /><published>2019-04-15T00:00:00+00:00</published><updated>2019-04-15T00:00:00+00:00</updated><id>https://brokenco.de//2019/04/15/trust-and-jenkins</id><content type="html" xml:base="https://brokenco.de//2019/04/15/trust-and-jenkins.html"><![CDATA[<p>This past week a missed security update contributed to a <a href="https://matrix.org/blog/2019/04/11/security-incident/">compromise at
Matrix.org</a>. As I have
<a href="/2017/08/07/jenkins-pipeline-shell.html">said</a>
<a href="/2019/02/14/untrusted-docker-workloads.html">before</a>, for purposes of
infrastructure design, it is prudent to consider CI/CD tools like Jenkins as
“remote code execution as a service.” In the <a href="https://cd.foundation">Continuous Delivery
world</a>, I think we have a serious problem with user
education around securely running CI/CD tools; anything which can touch
production represents a potential liability.</p>

<p>While these thoughts were bubbling around in my mind, I saw <a href="https://twitter.com/mukherjee_mk/status/1117585756095012864">this
tweet</a> from
<a href="https://mobile.twitter.com/mukherjee_mk">Mrinal Mukherjee</a>:</p>

<blockquote>
  <p><em>Many organisations tend to have separate non-production and production
instances of a deployment orchestrator
(<a href="https://twitter.com/jenkinsci">@jenkinsci</a>) to manage non-production
and production deployments respectively.  This, as opposed to a single instance
which handles both use-cases. Thoughts?</em></p>
</blockquote>

<p>In this post, I wanted to expand on <a href="https://twitter.com/agentdero/status/1117602907052888064">my
response</a>:</p>

<blockquote>
  <p><em>I tend to prefer two systems because it is rather difficult to totally and
completely secure credentials for production systems, when you give
developers “Pipeline as code” :)</em></p>

  <p><em>The “production” instance of Jenkins would typically just handle the last mile
of delivery.</em></p>
</blockquote>

<p>The unending trade-off infrastructure and tools developers must make is one of
flexibility versus reliability. While it would be nice to live in a world where
our automated systems allow code from individuals to fail in ways which do not
adversely impact customers, for the most part we have to draw the line in the
sand somewhere, whether by restricting access to networks, reducing the
scopes of credentials, or segmenting systems entirely. I do not view this as
a problem, but a realistic approach to systems of safety.</p>

<p>My approach to this when structuring Jenkins infrastructure is to segment along
“non-production” and “production” systems. The non-production system has
non-production credentials, which have a low consequence if disclosed or
misused by developers who author a <code class="language-plaintext highlighter-rouge">Jenkinsfile</code>. The production system
however maintains production credentials, which are scoped to specific Folders
or Pipelines in Jenkins, and does not process pull requests or any code not
deemed fit for production, such as that in the <code class="language-plaintext highlighter-rouge">master</code> branch.</p>

<p>If you step back from Jenkins itself and consider an application which stores
highly valuable secrets, what would your <a href="https://en.wikipedia.org/wiki/Defence_in_depth">defense in
depth</a> strategy look like?
Running any app on a hostile network requires this kind of thinking. A
critical credential or bit of data living in an application which is a single
bug away from being exposed is simply bad design.  We take this approach
seriously in the Jenkins project, because we run a Jenkins environment on a
hostile network, also known as “the internet.”</p>

<p>In our case, there are Jenkins environments on the public internet, but the
Jenkins environments which hold deployment or production credentials are simply
<em>unroutable</em> on the public internet. By requiring a jump host or a VPN to
access the environment, it is simply impossible for an attacker who might be
scanning a cloud provider’s address space to find and compromise the environment.
There are certainly other problematic avenues, but that’s where the “defense in
depth” comes in again. I’ve written some more tips on managing credentials in
Jenkins specifically in a previous blog post:
<a href="/2019/02/22/its-not-credentials-stealing.html">It’s not stealing when you’re giving them
away</a>. One of my favorite approaches is using tools like Hashicorp Vault which can
<a href="https://learn.hashicorp.com/vault/secrets-management/sm-dynamic-secrets">generate secrets
dynamically</a>,
making the leakage of credentials less impactful.</p>

<p>Regardless, it is absolutely critical to put services which have production
credentials, or keys which can lead to secondary levels of compromise behind
VPNs or other encrypted gateways. The public internet is a scary place, and if
you launch a Jenkins instance into AWS, Google Cloud, or Azure, I guarantee it
will be scanned within 10-15 minutes by script kiddies.</p>

<p>CI/CD tools represent an ideal attack vector not only for credentials, but for
other supply-chain attacks that could further compromise your end users.
Designing a layered and secure approach to running any CI/CD tool is incredibly
important for everybody shipping software today.  But generally, please don’t
let any single application be the sole line of defense between credentials or
user data, and the goblins running around on public networks.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="opinion" /><category term="jenkins" /><category term="cicd" /><category term="otto" /><summary type="html"><![CDATA[This past week a missed security update contributed to a compromise at Matrix.org. As I have said before, for purposes of infrastructure design, it is prudent to consider CI/CD tools like Jenkins as “remote code execution as a service.” In the Continuous Delivery world, I think we have a serious problem with user education around securely running CI/CD tools; anything which can touch production represents a potential liability.]]></summary></entry><entry><title type="html">Securely running Docker workloads in your CI/CD environment</title><link href="https://brokenco.de//2019/02/14/untrusted-docker-workloads.html" rel="alternate" type="text/html" title="Securely running Docker workloads in your CI/CD environment" /><published>2019-02-14T00:00:00+00:00</published><updated>2019-02-14T00:00:00+00:00</updated><id>https://brokenco.de//2019/02/14/untrusted-docker-workloads</id><content type="html" xml:base="https://brokenco.de//2019/02/14/untrusted-docker-workloads.html"><![CDATA[<p>Over the past few years, the topic of architecture and security for CI/CD
environments has become among my favorite things to discuss with Jenkins users
and administrators. While security is an important consideration to include in
the design of <em>any</em> application architecture, with an automation server like Jenkins,
security is crucial in a much more fundamental way than a traditional
<a href="https://en.wikipedia.org/wiki/Create,_read,_update_and_delete">CRUD</a> app.
Walking that fine line between enabling arbitrary use-cases from developers and
preserving the integrity of the system is a particularly acute problem for
CI/CD servers like Jenkins.</p>

<p>In one of my previous “<a href="/2017/08/07/jenkins-pipeline-shell.html">old man yells at cloud</a>”
posts I concluded with:</p>

<blockquote>
  <p>People sometimes joke that Jenkins is “cron with a web UI”, but I will
typically refer to it as “remote code execution as a service.” A statement
which garners some uncomfortable laughs. If you’re not thinking of CI/CD
systems like Jenkins, GoCD, Bamboo, GitLab, or buildbot as such, you might be
sticking your head in the proverbial sand, and not adequately addressing some
important security ramifications of the tool.</p>
</blockquote>

<p>In this post I would like to outline some of the architectural and
security-oriented decisions I made for Docker-based workloads when rebuilding
<a href="https://ci.jenkins.io/blue/">ci.jenkins.io</a>, the
<a href="https://jenkins.io/">Jenkins</a> project’s own Jenkins environment, in 2016.</p>

<h3 id="requirements">Requirements</h3>

<p>For the vast majority of users, I think a Jenkins environment that doesn’t
support Docker is a glaring omission. Supporting container-based workloads in a
CI/CD environment, even if a production environment does <em>not</em> utilize Docker,
allows such a tremendous amount of flexibility for developers to <em>own</em> their
build and test environment.</p>

<p>The Docker horse has been beaten to death at this point; I don’t have much
interest in convincing people to adopt it, any more than I have a desire to
convince people to adopt writing tests, use source control, or any other
sensible development practices circa 2018.</p>

<p>Within the Jenkins project, our CI infrastructure requirements were/are
loosely:</p>

<ul>
  <li>Must be able to support elastic workloads to handle the periodic “thundering
herds” of re-testing Pull Requests. Some repositories, such as the <a href="https://github.com/jenkinsci/git-plugin">git plugin</a>
have a number of outstanding Pull Requests which must be re-tested when
commits are merged to the master branch, in order to ensure the commit status
(green checkmark) is still valid, and master is always passing tests. In
practice this means that a single merged Pull Request could create upwards of
50 Pipeline Runs at once.</li>
  <li>Should reduce, or eliminate, the potential for Pipeline Runs to contaminate
each other’s workspaces, or adversely affect the Docker environment for a
subsequent Pipeline Run using that daemon.</li>
  <li>Must allow developers to specify their own execution environment, in effect,
a developer must be able to “bring their own container” without prior
approval by an administrator.</li>
  <li>Potential “container escapes” must not seriously impact the performance,
security, or stability of other parts of the environment. While these are
rare, they <a href="https://www.openwall.com/lists/oss-security/2019/02/11/2">do
happen</a> as was the
case with this year’s CVE-2019-5736.</li>
</ul>

<p>I don’t believe these to be necessarily unique requirements to the Jenkins
project, but rather general purpose requirements for any sizable organization.
That is to say, once a team or organization grows past the phase of “everybody
is admin” trust, these requirements likely apply.</p>

<p>For purposes of discussion, imagine the following Pipeline is our typical
workload, one which specifies its Docker environment, and then runs scripts
inside of that environment.</p>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pipeline</span> <span class="o">{</span>
    <span class="n">agent</span> <span class="o">{</span> <span class="n">docker</span> <span class="s1">'maven:3'</span> <span class="o">}</span>
    <span class="n">stages</span> <span class="o">{</span>
        <span class="n">stage</span><span class="o">(</span><span class="s1">'Build'</span><span class="o">)</span> <span class="o">{</span>
            <span class="n">steps</span> <span class="o">{</span> <span class="n">sh</span> <span class="s1">'mvn'</span> <span class="o">}</span>
        <span class="o">}</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<h3 id="options">Options</h3>

<p>The learning curve around the options in the container ecosystem can be quite
steep; there is a plethora of options and not all of them are safe, secure, or
reliable for “untrusted” workload requirements. The inventory in this post is
<strong>not comprehensive</strong> but rather a listing of options which I have personally
evaluated.</p>

<h4 id="docker-the-easy-but-not-the-smartest-way">Docker: the easy, but not the smartest way</h4>

<p>The most common pattern I have seen from Jenkins users in the wild has been to
use the Docker daemon on the Jenkins master instance to run their workloads.
For untrusted workloads this is a <strong>bad idea</strong>. Setting aside the potential
performance impacts of running workloads on the same machine as the Jenkins
master, let’s focus on the security aspect.</p>

<p>Jenkins stores all of its configuration, logs, and secrets on disk, usually in
<code class="language-plaintext highlighter-rouge">/var/lib/jenkins</code>. While secrets are encrypted on disk, elsewhere on the
file system, the key for decrypting those secrets is stored. In essence, this
means that once an untrusted user has access to the Jenkins master’s file
system, it’s as good as compromised.</p>

<p>When the Docker daemon (<code class="language-plaintext highlighter-rouge">dockerd</code>) runs, it is effectively running as root. If
a user can launch a Docker container, that is functionally equivalent to
granting them root access to the machine. I do not consider this a bug in
Docker however, replicating the entire access control subsystem from Linux in
<code class="language-plaintext highlighter-rouge">dockerd</code> would be impractical.</p>

<p>Plainly put, it is not safe to allow untrusted workloads, Docker or otherwise,
to execute on the Jenkins master’s instance. We regularly advise people to set
the number of executors for the master node to zero to help avoid this security
pothole.</p>

<p>It <em>is</em> possible to configure Docker-based agents in Jenkins, which run on the
master, but are not user-defined in Jenkins Pipeline. These <em>can</em> be safer, but
are still susceptible to container escape vulnerabilities, and <em>will</em> result in
performance problems as workloads and the Jenkins master compete for memory and
compute time.</p>

<h4 id="docker-swarm">Docker Swarm</h4>

<p>Another option considered was using a scalable <a href="https://docs.docker.com/engine/swarm/">Docker Swarm</a>
cluster for running the untrusted workload containers. What is interesting
about Docker Swarm is that it can be relatively easy to enable a cluster of
machines which have the Docker engine installed. At the time when our
environment was built out, it was however not mature enough for me to trust it.
In addition, it didn’t quite match our infrastructure model. At no point have
we had latent capacity waiting to be enabled, but rather we have had a
strongly managed environment between Puppet and Jenkins.</p>

<p>Docker Swarm, and Kubernetes for that matter, both have a usability flaw in
Jenkins Pipeline. In order to use the <code class="language-plaintext highlighter-rouge">agent { docker 'maven:3' }</code> syntax,
Jenkins needs to be able to execute <code class="language-plaintext highlighter-rouge">docker run</code> <em>somewhere</em>. But that
somewhere must already be running a Jenkins agent. Unfortunately Jenkins is not
smart enough at the moment to see that the Pipeline wants to run an image and
use a configured orchestration engine, without the user needing to consider
what Docker-in-Docker, or JNLP agent hacks might be necessary. This problem
gets even more hairy if your workloads need to <em>build</em> Docker containers at any
point. This topic is one I have devoted substantial effort to, and am happy to
discuss separately, but suffice it to say for this blog post: the pairing of
Jenkins and orchestrators is suboptimal at best right now.</p>

<h4 id="kubernetes">Kubernetes</h4>

<p>Kubernetes is another option considered at the time. The Jenkins project
currently runs a non-trivial amount of infrastructure in Kubernetes in
production, and I am quite pleased with it. I still do not believe it is the
appropriate basis for a CI infrastructure like ours, wherein we must run
untrusted workloads.</p>

<p>First considering the performance: Kubernetes itself is relatively
low-overhead, but tends to operate with a fixed-size cluster. While there are
options in some public clouds to auto-scale Kubernetes, I don’t frequently see
that enabled. From my experience, CI workloads are incredibly compute heavy. At
one point our cloud provider contacted us to let us know that they believed
some of our dynamically provisioned VMs were compromised by cryptominers. From
their perspective, the behavior of a high-intensity Jenkins build looked
similar to cryptomining! The manner in which Kubernetes schedules containers
works very well for different types of workloads, packing one compute heavy
container on a node with other containers which do not have the same
requirements allows for an ideal and efficient use of compute resources. When
everything will heavily utilize one dimension, such as the CPU of the
underlying computer, the benefits of Kubernetes’ resource allocation dwindle.</p>

<p>From the security standpoint, I believe Kubernetes can be used safely for CI
workloads. The mistake that I most frequently see is mixing the “management
plane” (Jenkins master) with user-defined workloads (agent pods). Running both
on the same Kubernetes infrastructure is a
fundamental failure of isolation and <strong>will</strong> result in compromise. Any
eventual bypass may allow access to the underlying Kubernetes API; from there
it would be trivial to schedule new workloads, or attach the persistent volume
from the Jenkins master. I do not consider this to be a theoretical problem, as
my understanding of Kubernetes is that it was never designed to be a multi-tenant
orchestrator. <a href="https://blog.jessfraz.com/post/secret-design-docs-multi-tenant-orchestrator/">Jess Frazelle has an interesting design for one
however!</a></p>

<p>Another security wrinkle arises if the cluster needs to support <em>building</em> of
Docker containers. To the best of my knowledge, this requires either
Docker-in-Docker hacks, or more commonly, pass-through access to the Kubernetes node’s
Docker socket. Once that socket has been passed through from the node to an
untrusted container, it’s a relatively trivial exercise to use that socket to
access and peek at any other workload on that specific Kubernetes node. As
alluded to above in the case of using Docker on the Jenkins master: <strong>never allow untrusted workloads access to a trusted Docker
socket</strong>.</p>

<p>This is not to say that you should never use Kubernetes with Jenkins. For
internal deployments, with different threat models and trust characteristics,
Jenkins and Kubernetes can work quite successfully together. As is usually the
case with security and infrastructure design, the devil is in the details.</p>

<h4 id="actually-docker-in-docker">Actually Docker-in-Docker</h4>

<p>Running Docker inside of Docker on top of the orchestrators described above was
something I considered as well. At the time of the design of ci.jenkins.io, the
stability of Docker-in-Docker approaches was highly questionable. <a href="https://medium.com/hootsuite-engineering/building-docker-images-inside-kubernetes-42c6af855f25">This may be
different
nowadays</a>,
and might be worth reconsidering for newer system designs.</p>

<h4 id="docker-the-hard-but-perhaps-the-most-reliable-way">Docker: the hard, but perhaps the most reliable way</h4>

<p>The design that I ultimately chose, which is still in place today, I think of
as “Docker the hard way.” Jenkins dynamically provisions fresh VMs in the
cloud, installs Docker on them, and then launches its agent. This has numerous
benefits from a security, isolation, and performance standpoint. Workloads get
dedicated high-performance compute capacity, and if any of those workloads
tries to do something nefarious, the impact is isolated to that single machine
which is usually deprovisioned shortly after the workload has finished
executing.</p>

<p>This isolation does come at a cost however. The time-to-available can be
multiple minutes, meaning the cluster cannot rapidly grow when that “thundering
herd” problem occurs. The actual infrastructure cost is also non-trivial. Our
Jenkins infrastructure is the most costly part of our infrastructure right now.
While anything “big and beefy” is going to be expensive in the cloud, the
time-overhead to request, provision, and de-provision has a real financial
impact.</p>

<hr />

<p>Running untrusted workloads in a CI environment is not a requirement isolated
to large environments like the Jenkins project. Most organizations really
should treat their CI environment as if it were “untrusted”, not because there
are malicious actors internally, but because the same design considerations that
minimize the impact of malice also have the beneficial effect of preventing errors
or incompetence from destabilizing the CI system. If a new developer in the
organization can accidentally brick the CI/CD environment, that will most
certainly be disruptive and costly for the org.</p>

<p>There are other concerns which are not accounted for in this post, which I
would like to make special mention of as they’re worth considering:</p>

<ul>
  <li>Runaway resource utilization: presently in Jenkins it is rather difficult to
globally restrict how much time, or resource, a Jenkins Pipeline is able to
allocate. We have strived to make it easy for developers to do the right thing,
but must remain vigilant, keeping an eye out for Pipelines which have locked
up or are stuck in infinite loops. While rare, these still can tie up
resources, and time is money when operating in the public cloud!</li>
  <li>Secrets management with Pipelines: inevitably some Pipelines will need an API
token, or credential in order to access or push to a given system. Jenkins
has some support for separating credentials but the audit and access control
functionality is currently lacking, making it difficult to delegate trust in a
mixed trust environment. An easy workaround is to put trusted credentials in
another Jenkins environment, which is exactly what we do in the Jenkins
project, but that is a worthy subject for another post entirely.</li>
</ul>

<p>Future iterations on our environment will likely incorporate a mixture of VMs
and container services to balance speed and security more effectively. Not all
workloads need Docker, some just need Maven, Node, etc. More efficiently
balancing the disparate requirements of the hundreds of Jenkins project
repositories which rely on ci.jenkins.io is slated for “version 2” of this
infrastructure. :)</p>

<p>Overall, at this point I would consider using containers in any CI/CD
environment an absolute must. The challenge for system administrators, as it
usually ends up, is balancing cost, security, and flexibility for users.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="cicd" /><category term="jenkins" /><category term="docker" /><category term="opinion" /><category term="security" /><summary type="html"><![CDATA[Over the past few years, the topic of architecture and security for CI/CD environments has become among my favorite things to discuss with Jenkins users and administrators. While security is an important consideration to include in the design of any application architecture, with an automation server like Jenkins, security is crucial in a much more fundamental way than a traditional CRUD app. Walking that fine line between enabling arbitrary use-cases from developers and preserving the integrity of the system is a particularly acute problem for CI/CD servers like Jenkins.]]></summary></entry><entry><title type="html">Get excited for the Continuous Delivery Foundation</title><link href="https://brokenco.de//2019/01/31/lets-go-cdf.html" rel="alternate" type="text/html" title="Get excited for the Continuous Delivery Foundation" /><published>2019-01-31T00:00:00+00:00</published><updated>2019-01-31T00:00:00+00:00</updated><id>https://brokenco.de//2019/01/31/lets-go-cdf</id><content type="html" xml:base="https://brokenco.de//2019/01/31/lets-go-cdf.html"><![CDATA[<p>Not knowing what I was getting myself into, about eleven years ago I started
contributing to what became known as the Jenkins project. What followed has
been nothing short of incredible; hundreds of new contributors, tens of
thousands of new users, and millions of executed pipelines. Growth is
challenging. Growth means new problems which demand new solutions.  Two and a
half years ago I stood in front of a large group of contributors at the 2017
Jenkins World Contributor Summit and made a pitch for what I called a “Jenkins
Software Foundation”, never shy to pilfer ideas from the
<a href="https://python.org">Python</a> community. With help from my pal <a href="https://twitter.com/cra">Chris
Aniszczyk</a> and the Linux Foundation, the concept
morphed into something far more comprehensive: the <strong>Continuous Delivery
Foundation</strong> (CDF), for which my colleague <a href="https://github.com/tracymiranda">Tracy
Miranda</a> has been leading the charge, helping
drive the founding of the CDF.</p>

<p><a href="https://github.com/kohsuke">Kohsuke</a> wrote up a good <a href="https://groups.google.com/d/msgid/jenkinsci-dev/CAN4CQ4z%2BQzaBc1pDtciKXH%3DMhB3vUR%3DCShiFbwy__2W6eEH_EQ%40mail.gmail.com">overview post for the
jenkinsci-dev@ mailing
list</a>
which spells out the reasons why the Jenkins project should join the Continuous
Delivery Foundation once it has been established. For those interested in the
Jenkins project, I encourage you to take the time to read Kohsuke’s mail if you
have not already. In <em>this</em> post, I wanted to share some of the reasons that <em>I</em>
am excited to help establish the Continuous Delivery Foundation (CDF).</p>

<p>Continuous Delivery (CD) has been an integral part of my career, something which I
learned early and became passionate about, even before it was so clearly
characterized by Jez Humble. I view it to be so fundamental to the practice of
software development, that I have started to react like a puzzled puppy when
somebody says they don’t practice CI or CD. Imagine if somebody said “eh,
we’ve got a project to adopt Source Control here, but the executives aren’t
really convinced yet.” Your eye would twitch and your jaw would drop. “How can
any organization not use Source Control in this day and age?!” I believe CD is
<em>that</em> fundamental to modern software development.</p>

<p>Continuous Delivery is also <strong>not</strong> the domain of a single tool like Jenkins, but
rather relies on many tools working together in concert.  While I might put
Jenkins at the center of it all, it is by no means the only pretty face in the
picture. Unfortunately, many open source communities like Jenkins tend to have
a necessarily narrower view of their world. They focus on <em>their</em> thing, which
makes sense, but this can result in missed opportunities for incredibly
valuable cross-over episodes.</p>

<p>Many of the tools we rely on for CD are supported, wholly or in part, by
different vendors as well. Jenkins receives substantial investment from
CloudBees, as well as Microsoft and Red Hat to name a few. In the last five
years, I have come to understand how and why foundations such as the CDF can
act as neutral territory for these different companies. By providing corporate
contributors a set of guidelines, rules, and expectations, open source
projects stand a much greater chance of eliciting support from them. Whether
it’s advocacy, code, or cash, helping bring corporate contributors under the
same neutral tent as the rest of us helps ensure the longevity of open source
efforts. The added benefit of the rules set forth by the foundation is that
corporate actors cannot overrun one another or individual contributors,
intentionally or otherwise.</p>

<p>In the earlier days of free and open source projects, we deluded ourselves into
thinking that everybody would read our licenses, subscribe to our “open source
ethos”, file and fix issues, and contribute code back upstream. The reality is
that it takes a lot more to <em>operate</em> large open source communities. It takes
<em>people</em>, it takes <em>infrastructure</em>, and it takes <em>money</em>. Foundations like the
CDF provide a means for organizations which depend on, or are otherwise
invested in projects, to participate in a meaningful way. The Jenkins
project runs on a shoe-string budget. We spend no more than $10-15k annually.
If we were to tabulate the value of our donated assets, free services, or any
of the other things I have managed to beg for over the past eleven years, that
number would be closer to $60-80k annually. Kohsuke can attest to my ability to
beg for free stuff for the Jenkins project, but free stuff is not guaranteed
year to year. In order to grow, Jenkins needs a stable budget which we can
invest in services and <strong>people,</strong> similar to larger foundations like the
<a href="https://www.freebsdfoundation.org/what-we-do/grants/">FreeBSD Foundation</a>.</p>

<p>If you find yourself worried about the sustainability of open source,
looking at different community homes, crowd-funding, or other ideological tools such
as licensing changes, let me help you out. What makes large open source projects
sustainable is a consistent budget. Because underneath it all, what makes open source projects “go”
is <em>people</em>. Ensuring talented writers, developers, marketers, testers, and
designers continue to contribute means that their employers have to invest time
on their behalf, or they need to be paid through other means. I strongly
believe that open source foundations provide a path for larger free and open
source projects to solve that fundamental problem of <em>budget</em>.</p>

<p>The Continuous Delivery Foundation is not yet launched, but I’m already excited
for its potential. Not only for the Jenkins project, but for the entire domain
of continuous delivery.</p>

<p>It’s about time.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="cdf" /><category term="cicd" /><category term="jenkins" /><category term="opensource" /><summary type="html"><![CDATA[Not knowing what I was getting myself into, about eleven years ago I started contributing to what became known as the Jenkins project. What followed has been nothing short of incredible; hundreds of new contributors, tens of thousands of new users, and millions of executed pipelines. Growth is challenging. Growth means new problems which demand new solutions. Two and a half years ago I stood in front of a large group of contributors at the 2017 Jenkins World Contributor Summit and made a pitch for what I called a “Jenkins Software Foundation”, never shy to pilfer ideas from the Python community. With help from my pal Chris Aniszczyk and the Linux Foundation, the concept morphed into something far more comprehensive the Continuous Delivery Foundation (CDF), for which my colleague Tracy Miranda has been leading the charge, helping drive the founding of the CDF.]]></summary></entry><entry><title type="html">Crawling towards continuous delivery for Jenkins</title><link href="https://brokenco.de//2018/08/30/cd-for-the-jenkins-project.html" rel="alternate" type="text/html" title="Crawling towards continuous delivery for Jenkins" /><published>2018-08-30T00:00:00+00:00</published><updated>2018-08-30T00:00:00+00:00</updated><id>https://brokenco.de//2018/08/30/cd-for-the-jenkins-project</id><content type="html" xml:base="https://brokenco.de//2018/08/30/cd-for-the-jenkins-project.html"><![CDATA[<p>This year I’ve been working on an ambitious new project referred to as Jenkins
Evergreen. It is ambitious in that we’re aiming to significantly alter the way
in which <a href="https://jenkins.io/">Jenkins</a> is downloaded, updated, and used.
In most visible ways Evergreen is the same as a traditional Jenkins
installation, but the way it is assembled into a package and delivered is
radically different. Among the many challenges which the Evergreen project must tackle,
there is one problem in common with most other organizations:
<strong>how do you take a big, complex system, and make it continuously deliverable</strong>.</p>

<p>Long story short: very carefully.</p>

<h2 id="the-old-way">The Old Way</h2>

<p>Jenkins follows a pretty typical development and release process:
we chat on mailing lists, open up loads of
pull requests, merge some of them, and then release binary packages at
prescribed intervals.
Users are then expected to know an update has occurred, run some program to check for
updates (<code class="language-plaintext highlighter-rouge">apt-get update</code>) and install the updates. From the user’s
perspective, each release might contain a lot of relevant or important changes,
or it might contain completely trivial ones. Depending on the release line, a
bug identified and fixed may take anywhere from one to a few weeks before
it’s made available. Then of course, the user must go through the update song
and dance once more. This common <strong>release train model</strong> sucks for users and, I
would argue, for developers too.</p>

<p>Jenkins has an additional complication: it is plugin-based, and all those
plugins are developed and released largely independently from one another.
Jenkins “core” by itself isn’t very useful at all. It is those <em>plugins</em> which
make Jenkins a joy (and sometimes a pain) to use. For all intents and purposes,
plugins <em>also</em> follow the <strong>release train model</strong> (which sucks for users), but
with the bonus feature of requiring users to check for updates through the
built-in Update Center rather than through the same distribution mechanism as
Jenkins core.</p>

<p>Altogether, this leads to large numbers of Jenkins users never updating their
systems.</p>

<p>Nonetheless, the release train model has helped Jenkins grow to where it is
today. The times, however, have changed.</p>

<p>Expectations around how we consume and operate our software have changed
radically in the past decade. It is my steadfast opinion that the <strong>release
train model</strong> is now a legacy which we should all be leaving behind.</p>

<h2 id="the-new-way">The New Way</h2>

<p>The model for Jenkins Evergreen is completely different. Rather than a
time-based pull model (also known as the release train model), it provides an on-demand <em>push</em> model. As I described in the
design overview document,
<a href="https://github.com/jenkinsci/jep/tree/master/jep/300">JEP-300</a>:</p>

<blockquote>
  <p>Jenkins Evergreen will be distributed as an automatically
self-updating distribution, containing Jenkins core and a version-locked set
of plugins considered “essential.” Rather than attempting to mirror the
existing Weekly and LTS release lines for core, plus some plugin version
matrix, Jenkins Evergreen will update in a manner similar to Google Chrome.</p>

  <p>For Jenkins end users, this automatically updating distribution will mean
that Jenkins Evergreen will require significantly less overhead to manage,
receiving improvements and bug fixes without any user involvement.</p>
</blockquote>

<p>Fundamentally, Jenkins Evergreen is about building the machinery to practice
Continuous Delivery with Jenkins itself. The argument for
Continuous Delivery is that smaller releases are <strong>safer</strong> than
big-bang releases. Risk is amortized, and the tooling and habits of
releasing often result in higher-quality software.</p>

<p><strong>Jenkins needs Continuous Delivery</strong>.</p>

<hr />

<p>How on earth do we get from the release train model (which sucks for
users), to something more continuously delivered?</p>

<p>Very carefully!</p>

<p>Like most transitions to continuous delivery, Jenkins Evergreen requires a
significant amount of ground work in our existing code bases before new code
adopts the Evergreen distribution model.</p>

<h3 id="incremental-releases">Incremental Releases</h3>

<p>My colleague Jesse wrote a pretty in-depth article on a new pattern we’ve
introduced into the Jenkins project, generally referred to as <a href="https://jenkins.io/blog/2018/05/15/incremental-deployment/">incremental
releases</a>.
Jenkins core and plugins are all Java projects which have rich
<a href="https://maven.apache.org">Maven</a> metadata describing their interdependencies.</p>

<p>In the release train model the velocity of changes, and version bumps,
required for any given plugin will be fairly minimal. In the release train
model, it is <em>okay</em> to create a pull request to Plugin B, wait for that to be
released, then update Plugin A to depend on that change, and then wait for
that to be released. In the release train model it is <em>okay</em> to wait for weeks
on end before users see the effects of changes.</p>

<p>In 2018 however, that long cycle time is <strong>not okay</strong>.</p>

<p>Incremental releases allow plugins to produce artifacts built from pull
requests, or branches, and for those artifacts to be published to a special
<code class="language-plaintext highlighter-rouge">incrementals</code> Maven repository. From that repository, incremental releases of
artifacts can be subsequently consumed by other tooling.</p>

<p>In the case of Jenkins Evergreen, this allows us to craft a distribution with
changes that are hot off the presses, using another foundational component: the
Bill of Materials.</p>

<p>If you’re curious about the design of incremental releases, consult
<a href="https://github.com/jenkinsci/jep/tree/master/jep/305">JEP-305</a> which outlines
their design.</p>

<h3 id="the-bill-of-materials">The Bill of Materials</h3>

<p>Curation is a key component of any continuous delivery system. We do not
necessarily want any old commit to be released all the way through to
“production.” Instead we want a means to describe what versions of which
components are safe to proceed through the pipeline.</p>

<p>As described in
<a href="https://github.com/jenkinsci/jep/blob/master/jep/309">JEP-309</a>, the Bill of
Materials gives us a means of describing a combination of Jenkins core and plugins, which should
be delivered together. This specification is currently being used by multiple
parts of the Jenkins project where we have a similar need to test across
multiple components and repositories. In Evergreen it is taken much further.</p>

<p>The Bill of Materials describes what code will be delivered to a Jenkins
Evergreen instance, and the Evergreen distribution system will attempt to
ensure that <em>all</em> instances are at the same exact version of that Bill of
Materials.  The Evergreen distribution system treats all instances as if they
were part of a single fleet, similar to how SaaS applications are deployed.</p>

<p>This homogeneity addresses a fundamental problem with plugin-based ecosystems
like Jenkins’s: an explosion of possible installed combinations of software
across all user installations. The large variety of plugin combinations
possible “in the wild” makes bug reporting and reproduction difficult, and
serious pre-release acceptance testing <em>practically impossible</em>. In many cases,
the first time certain combinations will ever be executed together will be on
the user’s installation.</p>

<h3 id="feedback">Feedback</h3>

<p>The final logical piece of the puzzle which any continuous delivery pipeline
requires is <em>feedback</em>. Much as it pains me to say this, current releases of
Jenkins provide no automated feedback to the Jenkins project on whether they
are operating successfully. No automated crash reports. No error logs. No
analytics. Nothing. The <em>only</em> two ways that a Jenkins contributor will ever
learn about a bug in their plugin or core is if:</p>

<ol>
  <li>They see it themselves.</li>
  <li>A user actually takes the time to manually report it.</li>
</ol>

<p>Regrettably, this is also the case for tons of free and open source software,
and it’s an absolute shame.</p>

<p>With Jenkins Evergreen, basic error reporting is built in by default. We have
integrated with <a href="https://sentry.io">Sentry</a> for collecting errors automatically
from Jenkins Evergreen installations without any required user involvement. In
the future I’m sure we’ll add more advanced feedback mechanisms, but at the
moment a blurry picture of how Jenkins is running “in the real world” is tons
better than flying blind.</p>

<hr />

<p>Jenkins, like any large piece of software which has grown over a long period of
time, has its flaws. After a couple beers, I could tell you about some of the
skeletons in its closet, but on the whole I don’t believe Jenkins is inherently
broken, or a lost cause. In fact, I believe that Jenkins is likely now more
important than ever. With the practices of continuous integration and
continuous delivery becoming a core part of every software project, a flexible
and customizable open source tool like Jenkins is increasingly important.</p>

<p><a href="https://github.com/jenkins-infra/evergreen">Jenkins Evergreen</a> is my vision of
how we get to a better future with Jenkins. By continuously delivering Jenkins,
I believe we will be able to improve the user experience, alleviate troublesome
bugs, and make Jenkins even more accessible to new developers.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="jenkins" /><category term="maven" /><category term="continuousdelivery" /><category term="cicd" /><summary type="html"><![CDATA[This year I’ve been working on an ambitious new project referred to as Jenkins Evergreen. It is ambitious in that we’re aiming to significantly alter the way in which Jenkins is downloaded, updated, and used. In most visible ways Evergreen is the same as a traditional Jenkins installation, but the way it is assembled into a package and delivered is radically different. Among the many challenges which the Evergreen project must tackle, there is one problem in common with most other organizations: how do you take a big, complex system, and make it continuously deliverable.]]></summary></entry><entry><title type="html">Enforcing administrative policy in Jenkins, the hard way</title><link href="https://brokenco.de//2018/01/05/jenkins-policy-enforcement.html" rel="alternate" type="text/html" title="Enforcing administrative policy in Jenkins, the hard way" /><published>2018-01-05T00:00:00+00:00</published><updated>2018-01-05T00:00:00+00:00</updated><id>https://brokenco.de//2018/01/05/jenkins-policy-enforcement</id><content type="html" xml:base="https://brokenco.de//2018/01/05/jenkins-policy-enforcement.html"><![CDATA[<p>One foggy morning a few weeks ago, I received a disk usage alert courtesy of
the Jenkins project’s infrastructure on-call rotation. In every infrastructure
ever, disk usage alerts seem to be the most common alert to crop up: something
<em>somewhere</em> is not properly cleaning up after itself. This time, the alert was
from our own <a href="https://ci.jenkins.io/">Jenkins environment</a>. The logging
filesystem wasn’t the problem; the filesystem hosting <code class="language-plaintext highlighter-rouge">JENKINS_HOME</code> was
perilously close to running out of space. The local time was about 6:20 in the
morning, and yours truly was quietly furious at the back of a bus headed into
San Francisco for the day.</p>

<p>To put it delicately, Jenkins has always been a pain for Systems
Administrators. What was originally a huge selling point, the WYSIWYG
configuration screens, over time, and thanks to the healthy adoption of
“infrastructure as code” tooling such as Puppet, has become a weakness. With the
introduction of “Pipeline as Code” as a core concept in Jenkins 2,
circa 2016, the problem was even further exacerbated.  Empowering developers
with some level of code-driven autonomy is now a key aspect of any modern
development tool, but without corresponding tooling and controls for
administrators, such autonomy rapidly leads to chaos.</p>

<p>Back on the bus ride, the usage of <code class="language-plaintext highlighter-rouge">JENKINS_HOME</code> slowly inched towards 100%. A
quick analysis indicated that most of the disk space was being occupied by
what any capable Jenkins admin would expect:</p>

<ul>
  <li>Old archived artifacts.</li>
  <li>Old test reports.</li>
  <li>Old console logs.</li>
</ul>

<p>With Jenkins Pipeline, developers have control. To the detriment of
administrators like me, who have no (<em>simple</em>) means to systematically enforce
things like log rotation.</p>

<p>That doesn’t mean administrators are left entirely out in the cold, but rather
we have to enforce administrative policy <strong>the hard way</strong>.</p>
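
<p>For reference, the developer-side control mentioned above usually lives in
the <code class="language-plaintext highlighter-rouge">Jenkinsfile</code> itself. The following is a minimal sketch of a Declarative
Pipeline that opts in to log rotation; the stage name and the <code class="language-plaintext highlighter-rouge">make</code> command
are placeholders of my own, not taken from any real repository. Nothing forces
a given repository to include the <code class="language-plaintext highlighter-rouge">options</code> block, which is exactly the
gap an administrator has to cover.</p>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pipeline {
    agent any
    options {
        /* Keep only the five most recent runs of this Pipeline; older runs,
         * along with their logs and archived artifacts, are discarded */
        buildDiscarder(logRotator(numToKeepStr: '5'))
    }
    stages {
        /* Placeholder stage, purely for illustration */
        stage('Build') {
            steps {
                sh 'make'
            }
        }
    }
}
</code></pre></div></div>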

<h3 id="scripting-jenkins">Scripting Jenkins</h3>

<p>Jenkins has support for built-in <a href="http://groovy-lang.org">Groovy</a> scripting,
which is the usual solution for enforcing administrative policy in Jenkins.
In order to rectify the disk usage situation, I wrote a little snippet of
Groovy which will forcefully purge <strong>all but the last 5 runs</strong> of every
Pipeline in the “Plugins” folder on the system:</p>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Jenkins</span><span class="o">.</span><span class="na">instance</span><span class="o">.</span><span class="na">items</span><span class="o">.</span><span class="na">each</span> <span class="o">{</span> <span class="n">f</span> <span class="o">-&gt;</span>
    <span class="k">if</span> <span class="o">(</span><span class="n">f</span><span class="o">.</span><span class="na">name</span> <span class="o">==</span> <span class="s1">'Plugins'</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">f</span><span class="o">.</span><span class="na">items</span><span class="o">.</span><span class="na">each</span> <span class="o">{</span> <span class="n">p</span> <span class="o">-&gt;</span>
            <span class="cm">/* each  p is really a Multibranch Pipeline, which looks like a
             * folder, so need to iterate over its items */</span>
            <span class="n">p</span><span class="o">.</span><span class="na">items</span><span class="o">.</span><span class="na">each</span> <span class="o">{</span> <span class="n">pipeline</span> <span class="o">-&gt;</span>
                <span class="k">if</span> <span class="o">(</span><span class="n">pipeline</span><span class="o">.</span><span class="na">builds</span><span class="o">.</span><span class="na">size</span><span class="o">()</span> <span class="o">&gt;</span> <span class="mi">5</span><span class="o">)</span> <span class="o">{</span>
                    <span class="n">println</span> <span class="s2">"Deleting from ${p}"</span>
                    <span class="cm">/* Delete runs older than the last five */</span>
                    <span class="n">pipeline</span><span class="o">.</span><span class="na">builds</span><span class="o">[</span><span class="mi">5</span> <span class="o">..</span> <span class="o">-</span><span class="mi">1</span><span class="o">].</span><span class="na">each</span> <span class="o">{</span> <span class="n">it</span><span class="o">.</span><span class="na">delete</span><span class="o">()</span> <span class="o">}</span>
                <span class="o">}</span>
            <span class="o">}</span>
        <span class="o">}</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>Scary! Right now I have only added this little Groovy script to the
infrastructure team’s runbooks. If I wanted to enforce this more
systematically, I would add a file to the <code class="language-plaintext highlighter-rouge">init.groovy.d/</code> directory on the
Jenkins master.</p>

<h4 id="initgroovyd">init.groovy.d</h4>

<p>Many administrators aren’t aware of the <code class="language-plaintext highlighter-rouge">init.groovy.d/</code> directory, which can
be added to <code class="language-plaintext highlighter-rouge">JENKINS_HOME</code>. The <em>really really</em> useful characteristic of Groovy
scripts added to <code class="language-plaintext highlighter-rouge">init.groovy.d/</code> is that they are executed after Jenkins
plugins are loaded, but before Jenkins is “ready” and starts accepting web
requests or executing workloads. These qualities make <code class="language-plaintext highlighter-rouge">init.groovy.d/</code> an ideal
place to insert scripts which:</p>

<ul>
  <li><strong>Clean up the filesystem</strong>, such as with my forceful log rotation script
referenced above.</li>
  <li><strong>Enforce security policy</strong>, like my Groovy scripts which <a href="https://github.com/CodeValet/master/blob/master/init.groovy.d/disable-cli.groovy">disable the
Jenkins CLI</a>, or <a href="https://github.com/CodeValet/master/blob/master/init.groovy.d/setup-github-oauth.groovy">configure GitHub OAuth-based authentication and authorization</a>.</li>
  <li><strong>Configure monitoring tooling</strong>, such as <a href="https://github.com/CodeValet/master/blob/master/init.groovy.d/configure-datadog.groovy">the Datadog
plugin</a></li>
  <li><strong>Pre-configure Pipeline Libraries</strong>, like those which should be <a href="https://github.com/CodeValet/master/blob/master/init.groovy.d/pipeline-global-configuration.groovy">enabled
globally for all Pipelines</a></li>
</ul>
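
<p>To give a sense of scale, such a script can be only a few lines long. The
sketch below is a hypothetical example rather than one of the scripts linked
above: it assumes a policy of never running workloads on the master itself, and
the file name is made up for illustration.</p>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* Hypothetical file: $JENKINS_HOME/init.groovy.d/disable-master-executors.groovy */
import jenkins.model.Jenkins

Jenkins jenkins = Jenkins.instance

/* Assumed policy: the master should never execute workloads itself */
if (jenkins.numExecutors != 0) {
    println "Setting master executors to 0 (was ${jenkins.numExecutors})"
    jenkins.numExecutors = 0
    /* Persist the change to the Jenkins configuration on disk */
    jenkins.save()
}
</code></pre></div></div>

<p>Because the script only makes a change when the current value differs, it is
safe to run on every restart.</p>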

<p>As I mentioned in my previous post <a href="/2017/07/24/groovy-automation-for-jenkins.html">Developing Groovy Scripts to Automate
Jenkins</a>, creating these
scripts requires a <strong>lot</strong> of knowledge about how Jenkins works on the inside.
While this is definitely “the hard way,” the end result is a much more
automated and manageable Jenkins environment.</p>

<p>To learn more about scripting Jenkins, I highly recommend the talk embedded
below, given by my pal Sam Gleske at Jenkins World 2017.</p>

<center><iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/qaUPESDcsGg" frameborder="0" gesture="media" allow="encrypted-media" allowfullscreen=""></iframe><br /></center>

<h3 id="scripting-pipeline">Scripting Pipeline</h3>

<p>In my previous post <a href="/2017/08/03/overriding-builtin-steps-pipeline.html">Overriding steps in Pipeline with Shared Library sleight
of hand</a>, I discussed another
option for enforcing administrative policy: overriding Pipeline steps. While I
won’t repeat too much, I do wish to point out a very useful pattern to
consider: enforcing timeouts on built-in steps. Take the <code class="language-plaintext highlighter-rouge">sh</code> step as an
example: by default in Jenkins there is no built-in way, configurable or otherwise,
to constrain the time spent by a step. This means a malicious or
incompetent developer can run a script which performs an infinite loop,
wastefully tying up resources in the Jenkins environment.</p>

<p>By overriding the <code class="language-plaintext highlighter-rouge">sh</code> step, I can wrap it with a 2 hour timeout safeguard, as
implemented below. Once the Shared Library is set to load implicitly in the
Global Pipeline Libraries configuration, developers won’t notice any changes,
but the beleaguered administrator will sleep a bit easier at night.</p>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">def</span> <span class="nf">call</span><span class="o">(</span><span class="n">Map</span> <span class="n">params</span> <span class="o">=</span> <span class="o">[:])</span> <span class="o">{</span>
    <span class="n">String</span> <span class="n">script</span> <span class="o">=</span> <span class="n">params</span><span class="o">.</span><span class="na">script</span>
    <span class="n">Boolean</span> <span class="n">returnStatus</span> <span class="o">=</span> <span class="n">params</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="s1">'returnStatus'</span><span class="o">,</span> <span class="kc">false</span><span class="o">)</span>
    <span class="n">Boolean</span> <span class="n">returnStdout</span> <span class="o">=</span> <span class="n">params</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="s1">'returnStdout'</span><span class="o">,</span> <span class="kc">false</span><span class="o">)</span>
    <span class="n">String</span> <span class="n">encoding</span> <span class="o">=</span> <span class="n">params</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="s1">'encoding'</span><span class="o">,</span> <span class="kc">null</span><span class="o">)</span>

    <span class="n">timeout</span><span class="o">(</span><span class="nl">time:</span> <span class="mi">2</span><span class="o">,</span> <span class="nl">unit:</span> <span class="s1">'HOURS'</span><span class="o">)</span> <span class="o">{</span>
        <span class="cm">/* invoke the built-in sh step */</span>
        <span class="k">return</span> <span class="n">steps</span><span class="o">.</span><span class="na">sh</span><span class="o">(</span><span class="nl">script:</span> <span class="n">script</span><span class="o">,</span>
                    <span class="nl">returnStatus:</span> <span class="n">returnStatus</span><span class="o">,</span>
                    <span class="nl">returnStdout:</span> <span class="n">returnStdout</span><span class="o">,</span>
                        <span class="nl">encoding:</span> <span class="n">encoding</span><span class="o">)</span>
    <span class="o">}</span>
<span class="o">}</span>
<span class="cm">/* Convenience overload */</span>
<span class="kt">def</span> <span class="nf">call</span><span class="o">(</span><span class="n">String</span> <span class="n">script</span><span class="o">)</span> <span class="o">{</span>
    <span class="k">return</span> <span class="nf">call</span><span class="o">(</span><span class="nl">script:</span> <span class="n">script</span><span class="o">)</span>
<span class="o">}</span>
</code></pre></div></div>
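
<p>To illustrate how transparent the override is to developers, here is a
minimal sketch of a Scripted Pipeline which uses it. It assumes the override
above is saved as <code class="language-plaintext highlighter-rouge">vars/sh.groovy</code> in the implicitly loaded Shared Library; the
stage name and shell commands are hypothetical placeholders.</p>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* Jenkinsfile: assumes the Shared Library providing vars/sh.groovy is loaded implicitly */
node {
    stage('Build') {
        /* Resolves to the library's call(String) overload */
        sh 'make'

        /* Resolves to call(Map); the parameters are passed through to the built-in step */
        def revision = sh(script: 'git rev-parse HEAD', returnStdout: true).trim()
        echo "Built revision ${revision}"
    }
}
</code></pre></div></div>

<p>Both invocations read exactly as they would against the built-in step, yet
each one now runs inside the 2 hour timeout.</p>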

<h3 id="an-easier-way">An easier way?</h3>

<p>Work is currently under way, spearheaded by <a href="https://github.com/ewelinawilkosz2">Ewelina
Wilkosz</a> at Praqma, on
<a href="https://github.com/jenkinsci/jep/tree/master/jep/201">JEP-201</a>, titled
“Configuration as Code.”</p>

<blockquote>
  <p>We want to introduce a simple way to define Jenkins configuration from a
declarative document that would be accessible even to newcomers. Such a
document should replicate the web UI user experience so the resulting structure
looks natural to end user. Jenkins components have to be identified by
convention or user-friendly names rather than by actual implementation class
name.</p>
</blockquote>

<p>While I haven’t had the time to really dive deeper into what Ewelina and her
crew are proposing, they are certainly in the right ballpark for making Jenkins
easier to administer, and policies easier to enforce.</p>

<hr />

<p>Once you come to terms with scripting Jenkins, there are a number of ways in
which policy can be enforced using those scripts. My current preferred method
is to place scripts in <code class="language-plaintext highlighter-rouge">init.groovy.d/</code>, but those only run when Jenkins boots or
restarts. It’s also possible to execute those very same scripts via the Jenkins CLI, which I
have done in the past. Through a clever combination of shell, Groovy, and
Puppet scripting, it’s possible to write idempotent scripts which Puppet can
run every time the Puppet Agent runs, ensuring on-going compliance.</p>
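
<p>As a rough sketch of what an idempotent script might look like, the example
below only changes and saves configuration when the current value differs from
the desired one. The policy itself (zero executors on the master) and the
variable names are assumptions chosen for illustration, not taken from any of
the scripts linked above.</p>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import jenkins.model.Jenkins

/* Hypothetical policy: no builds should run on the master itself */
int desiredExecutors = 0

Jenkins jenkins = Jenkins.instance

/* Only touch the configuration when it has drifted, so repeated runs are no-ops */
if (jenkins.numExecutors != desiredExecutors) {
    jenkins.setNumExecutors(desiredExecutors)
    jenkins.save()
    println "Set master executors to ${desiredExecutors}"
}
else {
    println 'Master executors already compliant, nothing to do'
}
</code></pre></div></div>

<p>Because the script checks before it writes, it should behave the same
whether it runs once at boot from <code class="language-plaintext highlighter-rouge">init.groovy.d/</code> or repeatedly through the
Jenkins CLI’s <code class="language-plaintext highlighter-rouge">groovy</code> command.</p>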

<p>Just because it isn’t easy doesn’t mean it’s impossible.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="cicd" /><category term="jenkins" /><category term="opinion" /><category term="pipeline" /><summary type="html"><![CDATA[One foggy morning a few weeks ago, I received a disk usage alert courtesy of the Jenkins project’s infrastructure on-call rotation. In every infrastructure ever, disk usage alerts seem to be the most common alert to crop up, something somewhere is not properly cleaning up after itself. This time, the alert was from our own Jenkins environment. The logging filesystem wasn’t the problem, the filesystem hosting JENKINS_HOME was perilously close to running out of space. The local time, about 6:20 in the morning, and yours truly was quietly furious at the back of a bus headed into San Francisco for the day.]]></summary></entry></feed>