I have recently been spending more time thinking about how
Otto should handle “steps” in a CI/CD
pipeline. As I mentioned in my previous post on the step libraries
concept, one of the big unanswered questions with
the prototype has been managing flow-control of the pipeline from a step. To
recap, a “step” is currently defined as an artifact (`.tar.gz`) which
self-describes its parameters and entrypoint, and contains all the code/assets
necessary to execute the step. The execution flow is fairly linear in this
concept: an agent iterates through a sequence of steps, executing each along
the way, and then it is done. In order for a step to change the state of the pipeline, this
direction of flow control must be reversed. Allowing steps to communicate changes
to the agent which spawned them requires a control socket.
The agent control socket will allow steps to send a fixed number of message types back to the agent during their execution. My current thinking is that the control socket should speak Nanomsg, which puts slightly heavier system-level requirements on steps in order to communicate with that protocol. My first thought was to send lines of JSON encoded over the wire, but there are a number of practical problems with trying to send raw JSON over a unix socket (framing the messages, for example).
Update 2020-10-30: I have since decided to shy away from Nanomsg for the control socket and instead have opted for a small HTTP server listening on a unix socket. I chose this approach to make debugging and client interactions much easier: even a totally bash-based shell step would be able to interact with it!
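To make that a little more concrete, here is a minimal sketch of the step side of that conversation, written against the same `/tmp/agent-5171.ipc` path from the invocation file below. The `POST /control` route and the JSON body are purely placeholder assumptions on my part; none of the actual routes or message schema are settled yet:

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;

fn main() -> std::io::Result<()> {
    // Socket path from the `ipc` field of the invocation file below,
    // with the `ipc://` scheme prefix stripped off.
    let mut stream = UnixStream::connect("/tmp/agent-5171.ipc")?;

    // Hypothetical route and message body; neither the HTTP routes nor
    // the JSON schema are settled yet.
    let body = r#"{"type":"status","status":"unstable"}"#;
    let request = format!(
        "POST /control HTTP/1.1\r\n\
         Host: localhost\r\n\
         Content-Type: application/json\r\n\
         Content-Length: {}\r\n\
         Connection: close\r\n\
         \r\n\
         {}",
        body.len(),
        body
    );
    stream.write_all(request.as_bytes())?;

    // `Connection: close` means the agent closes the socket when done,
    // which lets us read the whole response in one go.
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    println!("{}", response);
    Ok(())
}
```

A bash-based step could have the same conversation with curl’s `--unix-socket` flag, which is a big part of the appeal over Nanomsg.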
For the first implementation, I am planning to have a single long-lived socket
for the duration of a pipeline’s execution by the agent. By adding the `ipc`
field to the invocation file (below), I should have the flexibility to later allow an
agent to create a dedicated IPC socket for each step, avoiding any accidental
overlap.
```yaml
---
configuration:
  ipc: 'ipc:///tmp/agent-5171.ipc'
parameters:
  script: 'ls -lah'
```
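As a rough sketch of the per-step variant, the agent could derive a unique socket path from its own pid and the step’s index before writing the invocation file. The naming scheme here is just an assumption for illustration:

```rust
use std::os::unix::net::UnixListener;
use std::path::PathBuf;

/// Hypothetical helper: bind a dedicated control socket for one step.
/// The pid-plus-index naming scheme is an assumption, not a settled design.
fn bind_step_socket(step_index: usize) -> std::io::Result<(PathBuf, UnixListener)> {
    let path = PathBuf::from(format!(
        "/tmp/agent-{}-step-{}.ipc",
        std::process::id(),
        step_index
    ));
    let listener = UnixListener::bind(&path)?;
    // The returned path would be written into the step's `ipc` field.
    Ok((path, listener))
}
```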
The types of messages that come to mind are:

- Terminate the pipeline
- Change the pipeline’s running status (e.g. to `unstable`)
- Capture a variable
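As a sketch of how the agent might model these on the receiving end, here is one possible shape using serde for the JSON bodies; the `type` tag and all the variant/field names are my guesses, not a settled schema:

```rust
use serde::Deserialize;

/// Hypothetical wire format for control socket messages; the tag and
/// field names are assumptions based on the three message types above.
#[derive(Debug, Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
enum ControlMessage {
    /// Terminate the pipeline.
    Terminate,
    /// Change the pipeline's running status, e.g. to "unstable".
    Status { status: String },
    /// Capture a variable for later steps to use.
    Capture { name: String, value: String },
}
```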
The last item really struck me as necessary but I have been struggling with
it quite a bit. In a declarative `Jenkinsfile` there is no provision for
setting variables; it wouldn’t be very declarative otherwise! This
restriction leads to some confusing hacks in real-world pipelines. The most
common hack is to use the `script {}` block as an escape hatch, such as:
```groovy
stage('Build') {
  steps {
    sh 'make'
    script {
      def output = readYaml file: 'output.yml'
      sh "./deploy.sh ${output.stage}"
    }
  }
}
```
There are numerous legitimate reasons to capture and utilize variables inside of a CI/CD pipeline. I want to support variables in some fashion without building a full-on interpreter or sacrificing clarity in the pipeline modeling language.
As I wrestled with the concept, I noticed that the pseudo-code I was writing for how variables might be used looked familiar:
```
prompt msg: 'What is the best color for a bike shed?', into: 'color'
```
To me, this looks a lot like Smalltalk. Mmmm Smalltalk. If you have some spare time, and haven’t yet experienced Smalltalk, you should go download Pharo and explore! It’s a wonderful language and development environment, and well-worth experimenting with in your career.
Anyways, back to Otto.
The syntax above would be the `prompt` step saving some user-provided string
(hand-waving for now on how that would manifest in a GUI) and storing it
in the `color` variable.
With variables, storing is one part of the problem, but using is the other
much more interesting part. I knew I didn’t want `if color == 'red' { }` type
blocks littering the code, lest a user think that this pipeline language is a
programming language for them to build applications in! (This is a very real
problem with Scripted Jenkins Pipelines.)
A related problem I had set aside the day prior was how to handle “block-scoped steps”, such as the following in Jenkins:
```groovy
stage('Build') {
  steps {
    sh 'make'
    dir('deploy') {
      echo 'Deploying from the deploy/ directory'
      sh './shipit.sh'
    }
  }
}
```
All steps executed within the `dir` block are executed with a current working
directory of `deploy/`.
Variable use and block-scoped steps both led me to a very Smalltalk-like syntax,
which honestly has me quite excited to explore further! In Smalltalk there are no control
structures in the traditional sense: no `if`, no `for`, etc. Instead one can
send the `ifTrue:`/`ifFalse:` message to a `Boolean`:
```smalltalk
color = 'red'
  ifTrue: [
    "Great choice!"
  ]
  ifFalse: [
    "Why did you choose wrong?!"
  ]
```
Fully embracing this Smalltalk-inspired concept would also be convenient
to implement. Anything that isn’t a defined step can be treated as a
variable, using an approach similar to `#method_missing` in Ruby (which is
actually just Smalltalk striking again! It’s called the `doesNotUnderstand`
message in Smalltalk).
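A sketch of what that fallback could look like in the evaluator, assuming a hypothetical registry of known steps (none of these names exist in the code yet):

```rust
use std::collections::HashMap;

/// Hypothetical metadata for a known step artifact.
struct StepMeta {
    symbol: String,
}

/// What a symbol in the pipeline resolves to.
enum Resolved<'a> {
    /// A defined step, found in the registry.
    Step(&'a StepMeta),
    /// Anything unrecognized falls through to a variable reference,
    /// in the spirit of Ruby's method_missing / Smalltalk's doesNotUnderstand.
    Variable(String),
}

fn resolve<'a>(registry: &'a HashMap<String, StepMeta>, symbol: &str) -> Resolved<'a> {
    match registry.get(symbol) {
        Some(step) => Resolved::Step(step),
        None => Resolved::Variable(symbol.to_string()),
    }
}
```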
Exploring what this would look like in a more concrete pipeline snippet:
```
sh 'ls -lah'
prompt msg: 'Which file should I dump?', into: 'filename'

filename equals: '/etc/password',
  then: [
    echo 'Stop trying to pwn me!'
  ],
  else: [
    # Not sure on this yet, I _think_ I want to avoid raw string interpolation syntax
    format pattern: 'cat {}', with: [filename], into: 'dumpcmd'
    sh script: dumpcmd
  ]

dir 'deploy' [
  echo 'Deploying from the deploy/ directory'
  sh './shipit.sh'
]

# Intentionally drop the `filename` variable, which would go out of scope
# at the end of the stage anyways
drop 'filename'
```
A couple of notes on the above pseudo-code:

- I’m not yet sold on the syntax. The benefit of this approach, rather than
  copying Smalltalk directly, is that it will make it easier to support more
  robust string operations in the future. The other benefit is that it makes
  everything behave step-like, insofar as a string variable could be backed by an
  internal/hidden step which takes the parameters, including the two blocks, and
  just executes the block-scoped steps like any other step.
- The block syntax is intentionally different from the directive syntax (to use
  Jenkins terminology) of curly braces, which I think will help make the code
  more readable.
- I don’t want to actually implement a full Smalltalk interpreter here, but I am
  liking that the syntax keeps things (subjectively) simple.
In order to implement block-scoped steps, I am planning to refactor some of the
step execution code into an `agent` crate which will allow steps to re-use the
logic for executing steps. From a data structure standpoint, the invocation file
for the `dir` step in the example might look like:
```yaml
---
configuration:
  ipc: 'ipc:///tmp/foo.ipc'
parameters:
  directory: 'deploy'
  block:
    - symbol: echo
      parameters:
        msg: 'Deploying from the deploy/ directory'
    - symbol: sh
      parameters:
        script: './shipit.sh'
```
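For what it’s worth, deserializing that file in the `agent` crate could be as simple as a few serde structs. The struct and field names below just mirror the YAML example and are assumptions, using the `serde_yaml` crate:

```rust
use serde::Deserialize;

/// Hypothetical Rust mirror of the invocation file above.
#[derive(Debug, Deserialize)]
struct Invocation {
    configuration: Configuration,
    parameters: DirParameters,
}

#[derive(Debug, Deserialize)]
struct Configuration {
    ipc: String,
}

#[derive(Debug, Deserialize)]
struct DirParameters {
    directory: String,
    /// The nested steps to execute with the working directory applied.
    block: Vec<BlockStep>,
}

#[derive(Debug, Deserialize)]
struct BlockStep {
    symbol: String,
    parameters: serde_yaml::Value,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let text = std::fs::read_to_string("invocation.yml")?;
    let invocation: Invocation = serde_yaml::from_str(&text)?;
    // The dir step would set the working directory, then hand each entry
    // in `block` back to the shared step-execution logic in the agent crate.
    println!("{:#?}", invocation);
    Ok(())
}
```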
At runtime the process tree on the agent machine would look something like:
```
.
└── agent
    └── dir
        └── echo
```
Despite the state of these ideas right now, I haven’t actually implemented any of them! I typically like to sketch out syntax and run through use-cases before I go running into Rust code.
Part of why I am sharing these early thoughts is that I want to make sure my
love of Smalltalk is not blinding me to usability issues with this approach. I
think this pattern will allow some non-declarative functionality in the
pipeline without requiring an actual interpreted language, but these
thoughts are still fresh. If you’ve got some thoughts on what could be
improved, or pitfalls to be aware of, feel free to join `#otto` on Freenode, or
email me!