I have a love/hate relationship with containers. We have used containers for production services in the Jenkins project’s infrastructure for six or seven years, where they have been very useful. I run some desktop applications in containers. There are even a few Kubernetes clusters which show the tell-tale signs of my usage. Containers are great. Not a week goes by, however, without some oddity in containers, or the tools around them, throwing a wrench into the gears and causing me great frustration. This week was one of those weeks: we suddenly had problems building our Docker containers in one of our Kubernetes environments.

I’m a strong supporter of running Jenkins workloads in Kubernetes for a myriad of reasons, which I won’t go into here. Like most organizations, however, we don’t just need containers for testing our applications; we need to package our applications into containers as well. As such, we need to build Docker containers atop Kubernetes, which isn’t as straightforward as you might hope.

For years I have followed the same approach that Hootsuite describes in this post, utilizing Docker’s own “Docker in Docker” container (docker:dind). By using a pod with a Docker-in-Docker container and a Docker client container, the Jenkinsfile can be fairly simple for building a container, but certainly not as simple as a plain sh 'docker build -t rofl:copter .'. With the linked configuration above, our pipelines would typically have an explicit stage which would build Docker containers:

pipeline {
    agent none
    stages {
        stage('Buildo Roboto') {
            agent {
                kubernetes {
                    label 'docker'
                    defaultContainer 'docker'
                }
            }
            steps {
                sh 'docker build -t roboto:latest .'
            }
        }
    }
}
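
For completeness, the pod behind that 'docker' label looked roughly like the following. This is a sketch rather than our exact manifest: the image tags, the DOCKER_HOST wiring, and the disabled TLS are illustrative of the general Docker-in-Docker pattern, not necessarily what you should run as-is.

---
apiVersion: v1
kind: Pod
metadata:
  name: docker
spec:
  containers:
  # The Docker CLI container that the Pipeline's sh steps run in
  - name: docker
    image: docker:latest
    command:
    - cat
    tty: true
    env:
    # Point the client at the dind daemon running in the same pod
    - name: DOCKER_HOST
      value: tcp://localhost:2375
  # The actual Docker daemon; dind cannot run without privileges
  - name: dind
    image: docker:dind
    securityContext:
      privileged: true
    env:
    # Disable TLS so the plain tcp://localhost:2375 endpoint is exposed
    - name: DOCKER_TLS_CERTDIR
      value: ""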

In one of our environments, this recently stopped working. What’s worse is that we still aren’t entirely sure why. We migrated the Jenkins workloads from an older Kubernetes cluster to a newer one, and afterwards this “dind” approach to building containers started throwing incredibly confusing network and filesystem errors. Smart money is on some host kernel or filesystem configuration issue which causes the “dind” container, which must run “privileged”, to function incorrectly. After an hour or two of debugging, I said “forget this” (I may have used slightly different words) and started looking at other options.

Kaniko

Kaniko is a curious tool from Google which allows the building of containers on top of Kubernetes. By curious I mean that it works fairly differently from a “stock” docker build invocation and required some tweaking on our end to get things working comfortably. That said, our initial work is promising and we think we’re going to be switching fully over to it.

The biggest oddity is the need for the intermediate layers of the container build, and the resultant image, to be published to a repository. My colleague hypothesized that this is likely a pattern from Google Cloud Platform, where local VM disks might not be as fast as the container registry affiliated with a cluster. While there are local filesystem caching options, we found them too unreliable to be useful.
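
To make that concrete, the layer caching behavior is driven by flags on the executor. A rough example of the registry-backed approach follows; the registry and image names are made up for illustration:

# With --cache enabled, kaniko pushes intermediate layers to a cache
# repository in a registry so that later builds can reuse them
/kaniko/executor \
    --context "$(pwd)" \
    --cache=true \
    --cache-repo our-awesome-registry/rtyler/roboto/cache \
    --destination our-awesome-registry/rtyler/roboto:latest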

For our configuration of Kaniko, we riffed on the Scripted Pipeline examples shared by my former colleagues at CloudBees, but made some fairly significant modifications along the way. Most notably, we decided to stand up an ephemeral Docker registry inside the Kaniko pod rather than rely on an external registry for intermediate layers. The end product is pushed to a well-supported network-based registry, but the intermediate layers are perfectly fine to keep locally, as we have very fast disk I/O on our Kubernetes nodes.

Kaniko’s invocation is much different, and the way it treats its build context is also a little odd. In our testing we found that the --cleanup flag is not enabled by default; without it, successive calls to Kaniko would mash all the files from different build contexts on top of one another in the temp directory Kaniko uses for builds, leading to frustrating build failures. It should also be noted that the Kaniko containers use BusyBox for their shell, but it’s on a fun non-standard path (/busybox/sh), so shell scripts expecting /bin/sh or /bin/bash will definitely fail!

We use Declarative Pipeline very heavily and also utilize our own custom JNLP agent image in Jenkins (custom root certificates!), so the snippet below should be largely portable to your environment but may need some tweaks:

pipeline {
    agent none
    stages {
        stage('Buildo Roboto') {
            agent {
                kubernetes {
                    defaultContainer 'kaniko'
                    yamlFile 'kaniko.yaml'
                }
            }
            steps {
                /*
                 * Since we're in a different pod than the rest of the
                 * stages, we'll need to grab our source tree since we don't
                 * have a shared workspace with the other pod(s).
                 */
                checkout scm
                sh 'sh -c ./scripts/build-kaniko.sh'
            }
        }
    }
}

kaniko.yaml

# This pod specification is intended to be used within the Jenkinsfile for
# building the Docker containers
#
# E.g. /kaniko/executor --context `pwd` --destination localhost:5000/roboto:latest --insecure-registry localhost:5000 --cleanup
---
apiVersion: v1
kind: Pod
metadata:
  name: kaniko
spec:
  containers:
  - name: jnlp
    # Overwriting the jnlp container's default "image" parameter, this will be
    # merged automatically with the Kubernetes plugin's built-in jnlp container
    # configuration, ensuring that the pod comes up and is accessible
    image: 'our-awesome-registry/rtyler/jenkins-agent:latest'
  - name: kaniko
    image: gcr.io/kaniko-project/executor:debug
    imagePullPolicy: Always
    # Command and args are important to set in this manner such that the
    # Jenkins Pipeline can send commands to be executed from the Jenkinsfile via
    # stdin (that's how it really works!)
    command:
    - /busybox/sh
    - "-c"
    args:
    - /busybox/cat
    tty: true
  #  Kaniko requires a registry, so we're just bringing one online in the pod
  #  for the intermediate caching of layers
  - name: registry
    image: 'registry'
    command:
    - /bin/registry
    - serve
    - /etc/docker/registry/config.yml

Our experience with Kaniko thus far is that it has been slower, and less verbose in some of its output, than docker build. Fortunately though it’s been quite reliable, and that’s the key factor here!

Hopefully with the snippets of code above you won’t need to spend nearly as much time tinkering as my colleague and I did. But in the process of switching over to Kaniko we needed to do a lot of interactive debugging in Jenkins, so I was glad to have something like an interactive shell in my bag of Jenkins Pipeline tricks.

While I liked the “dind” solution, the Kaniko-based solution works just as well. The next step for us is to hide some of this complexity with shared libraries, but that’s a project for another day!