Enforcing administrative policy in Jenkins, the hard way

One foggy morning a few weeks ago, I received a disk usage alert courtesy of the Jenkins project’s infrastructure on-call rotation. In every infrastructure ever, disk usage alerts seem to be the most common alert to crop up, something somewhere is not properly cleaning up after itself. This time, the alert was from our own Jenkins environment. The logging filesystem wasn’t the problem, the filesystem hosting JENKINS_HOME was perilously close to running out of space. The local time, about 6:20 in the morning, and yours truly was quietly furious at the back of a bus headed into San Francisco for the day.

To put it delicately, Jenkins has always been a pain for Systems Administrators. What was originally a huge selling point, the WYSIWYG configuration screens, over time, and thanks to the healthy adoption of “infrastructure as code” tooling such as Puppet, has become a weakness. With the introduction of “Pipeline as Code” as a core concept in Jenkins 2, circa 2016, the problem was even further exacerbated. Empowering developers with some level of code-driven autonomy is now a key aspect of any modern development tool, but without corresponding tooling and controls for administrators, such autonomy rapidly leads to chaos.

Back on the bus ride, the usage of JENKINS_HOME slowly inched towards 100%. A quick analysis indicated that most of the disk space was being occupied by what any capable Jenkins admin would expect:

Old archived artifacts.
Old test reports.
Old console logs.

With Jenkins Pipeline, developers have control. To the detriment of administrators like me, who have no (simple) means to systematically enforce things like log rotation.

That doesn’t mean administrators are left entirely out in the cold, but rather we have to enforce administrative policy the hard way.

Scripting Jenkins

Jenkins has support for built-in Groovy scripting, which is the usual solution for enforcing administrative policy in Jenkins. In order to rectify the disk usage situation, I wrote a little snippet of Groovy which will forcefully purge all but the last 5 runs of every Pipeline in the “Plugins” folder on the system:

Jenkins.instance.items.each { f ->
    if (f.name == 'Plugins') {
        f.items.each { p ->
            /* each  p is really a Multibranch Pipeline, which looks like a
             * folder, so need to iterate over its items */
            p.items.each { pipeline ->
                if (pipeline.builds.size() > 5) {
                    println "Deleting from ${p}"
                    /* Delete runs older than the last five */
                    pipeline.builds[5 .. -1].each { it.delete() }
                }
            }
        }
    }
}

Scary! Right now I have only added this little Groovy script to the infrastructure team’s runbooks. If I wanted to enforce this more systematically, I would add file to the init.groovy.d/ directory on the Jenkins master.

init.groovy.d

Many administrators aren’t aware of the init.groovy.d/ directory, which can be added to JENKINS_HOME. The really really useful characteristic of Groovy scripts added to init.groovy.d/ is that they are executed after Jenkins plugins are loaded, but before Jenkins is “ready” and starts accepting web requests or executing workloads. These qualities make init.groovy.d/ an ideal place to insert scripts which:

Clean up the filesystem, such as with my forceful log rotation script referenced above.
Enforce security policy, like my Groovy scripts which disable the Jenkins CLI, or configure GitHub OAuth-based authentication and authorization.
Configure monitoring tooling, such as the Datadog plugin
Pre-configure Pipeline Libraries, like those which should be enabled globally for all Pipelines

As I mentioned in my previous post Developing Groovy Scripts to Automate Jenkins, creating these scripts requires a lot of knowledge about how Jenkins works on the inside. While this is definitely “the hard way,” the end result is a much more automated and manageable Jenkins environment.

To learn more about scripting Jenkins, I highly recommend the talk embedded below, given by my pal Sam Gleske at Jenkins World 2017.

Scripting Pipeline

In my previous post Overriding steps in Pipeline with Shared Library sleight of hand, I discussed another option for enforcing administrative policy: overriding Pipeline steps. While I won’t repeat too much, I do wish to point out a very useful pattern to consider: enforcing timeouts on built-in steps. Take the sh step as an example, by default in Jenkins there is no built-in, configurable or otherwise, way to constrain the time spent by a step. This means a malicious or incompetent developer can run script which performs an infinite loop, wastefully tying up resources in the Jenkins environment.

By overriding the sh step, I can wrap it with a 2 hour timeout safe-guard as is implemented below. Once the Shared Library has been implicitly loaded in the Global Pipeline Libraries configuration, developers won’t notice any changes, but the beleaguered administrator will sleep a bit easier at night.

def call(Map params = [:]) {
    String script = params.script
    Boolean returnStatus = params.get('returnStatus', false)
    Boolean returnStdout = params.get('returnStdout', false)
    String encoding = params.get('encoding', null)

    timeout(time: 2, unit: HOURS) {
        /* invoke the built-in sh step */
        return steps.sh(script: script,
                    returnStatus: returnStatus,
                    returnStdout: returnStdout,
                        encoding: encoding)
    }
}
/* Convenience overload */
def call(String script) {
    return call(script: script)
}

An easier way?

Work is currently being undertaken, spear-headed by Ewelina Wilkosz at Praqma under JEP-201 titled “Configuration as Code.”

We want to introduce a simple way to define Jenkins configuration from a declarative document that would be accessible even to newcomers. Such a document should replicate the web UI user experience so the resulting structure looks natural to end user. Jenkins components have to be identified by convention or user-friendly names rather than by actual implementation class name.

While I haven’t had the time to really dive deeper into what Ewelina and her crew are proposing, they are certainly in the right ballpark for making Jenkins easier to administer, and policies easier to enforce.

Once you come to terms with scripting Jenkins, there are a number of ways in which policy can be enforced using those scripts. My current preferred method is to use init.groovy.d/, but those only apply during boot/restarts. It’s also possible to execute those very same scripts via the Jenkins CLI, which I have done in the past. Through a clever combination of shell, Groovy, and Puppet scripting, it’s possible to write idempotent scripts which Puppet can run every time the Puppet Agent runs, ensuring on-going compliance.

Just because it isn’t easy, doesn’t mean it’s impossible,