This week we launched the Scribd tech blog, on which I published today’s article: We’re building the largest library in history. I frequently have to remind myself that I have been here less than a year, and we have undergone incredible positive change, with more coming in 2020.
Howdy!
Welcome to my blog, where I write about software
development, cycling, and other random nonsense. This is not
the only place I write; you can find more words I have typed on the Buoyant Data blog, the Scribd tech blog, and GitHub.
Building containers in Jenkins with Kaniko
I have a love/hate relationship with containers. We have used containers for production services in the Jenkins project’s infrastructure for six or seven years, where they have been very useful. I run some desktop applications in containers. There are even a few Kubernetes clusters which show the tell-tale signs of my usage. Containers are great. Not a week goes by, however, without some oddity in containers, or the tools around them, throwing a wrench into the gears and causing me great frustration. This week was one of those weeks: we suddenly had problems building our Docker containers in one of our Kubernetes environments.
Broam Chomsky
A number of years ago I was building out a product with a small team and, like
most teams I’ve worked with, we developed an irreverent sense of humor. One of my
colleagues quite enjoyed using the term “bro” ironically; he certainly was the
type of person who wouldn’t come within earshot of any group of people who
might use the term with any level of seriousness. As the product started to
take shape, we found ourselves in need of fake users in our test system. I’m
not sure who created this first user, but the user’s fullName was set to
“Test Bro.” Shortly thereafter another user was added: “Broam Chomsky.”
JKS? jfc. Adding a root certificate
TLS certificates have the largest “complexity/importance” scores imaginable. Everything about them is error-prone and seemingly over-engineered from top to bottom, yet they are one of the most important pieces of security and authentication in our software architectures. From an engineering management standpoint, I find myself adopting the rule that estimates for any project involving certificates should be multiplied tenfold. If the project involves the Java Virtual Machine (JVM) and the Java Key Store (JKS), multiply by another ten, I suppose. For my own future convenience, in this blog post I would like to outline how to add a root certificate to a Java Key Store in Red Hat-derived environments.
Tell your executives to sit down
Over the course of my professional career I have witnessed the transition of free and open source software from something useful engineers do to a multi-billion-dollar industry, with companies jumping into the frenzy. During this time I have also gone from open source user, to contributor, to board member, helping to steward a few small projects but mainly focusing on the Jenkins project. Along the way I have interacted with businesses in each role and formed opinions of them, getting a sense of their cultural values by watching and listening as their employees interact with the project, or as their executives make public statements about Jenkins or open source software in general. By night I am an open source contributor, but by day I am now what enterprise salespeople refer to as the “buyer”: one with opinions formed by years of interactions with the companies whose products we evaluate.
Jenkins with agents on a separate Kubernetes cluster
Running untrusted CI/CD workloads in Jenkins is perhaps my favorite security discussion. Throwing Docker into the mix makes things even more interesting, and in some cases less secure. Today I implemented a pattern which I have discussed with colleagues but hadn’t yet had the opportunity to try: a multi-cluster Kubernetes setup for Jenkins. In short, the Jenkins master runs in a cluster which acts as the control plane for it and many other services, while all of its workloads run in an entirely separate Kubernetes cluster. For those who know the joy of managing Kubernetes this may seem like madness, but it does offer a number of security benefits which I would like to outline.
Ruby Infrastructure Engineering
My favorite part of the stack is the netherworld between the underlying infrastructure and the app: that fuzzy grey area where data goes from databases to object-relational mappers (ORMs), web servers to request libraries (e.g. Rack/WSGI), and so on. In many cases a technology roadmap which considers the infrastructure but not the application, or vice versa, is doomed from the start. At Scribd, I have been given permission to hire more people who love this layer of the stack, and I have taken to calling it “Ruby Infrastructure,” a fairly unique phrase that I want to define in greater detail.
Defining the Real-time Data Platform
One of the harder parts about building new platform infrastructure at a company which has been around a while is figuring out exactly where to begin. At Scribd the company has built a good product and curated a large corpus of written content, but where next? As I alluded to in my previous post about the Platform Engineering organization, our “platform” components should help scale out, accelerate, or open up entirely new avenues of development. In this article, I want to describe one such project we have been working on and share some of the thought process behind its inception and prioritization: the Real-time Data Platform.
Zooming out to Platform Engineering at Scribd
The team that I joined Scribd to build, Core Platform, is now up and running with five incredibly talented people. I could not be more pleased with the friendly and highly functional group we have been able to assemble. With that team’s projects underway, my focus has been shifting, zooming out to “Platform Engineering” as a comprehensive part of the engineering group. In this post, I want to expand on what Platform Engineering is planned to be and discuss some of its teams and their responsibilities.
The Configuration as Code plugin and "id must be specified" errors
Yesterday we rebuilt and redeployed one of the Jenkins containers we use at work and, much to my chagrin, the Jenkins environment no longer wanted to boot. We use Jenkins on top of Kubernetes, integrated with HashiCorp Vault, configured with the Configuration as Code plugin and the Job DSL plugin. While I am pleased with this stack of tools, it is not a “simple” setup. It had been three weeks since the last rebuild and redeploy, so the name of the game was figuring out which of the dozen changes made across these tools in those three weeks was the culprit.
I hate the made up word 'performant'
The tech industry is filled with all sorts of silly jargon and acronyms. Our overuse of jargon not only makes us very easy to identify in a crowded restaurant, but also makes things confusing for newcomers and veterans alike. In my current role, I find myself spending a lot of time with vendors who also seem to delight in barraging prospects with unpleasant jargon. My least favorite word of them all is performant.
Modeling continuous delivery
I spend more time than I wish to admit thinking about how continuous delivery (CD) processes should be modeled. The problem domain affects every single organization which distributes software, yet the approach each organization takes is almost as unique as the software it develops. From my perspective Jenkins Pipeline, especially its declarative syntax, is the best available option for most organizations to model their continuous delivery processes. That does not mean, however, that I believe Jenkins Pipeline is the best possible option.
545 miles in slow motion
San Francisco, Santa Cruz, King City, Paso Robles, Santa Maria, Lompoc, Ventura, Los Angeles. For the better part of seven days, I sat on a bicycle alongside over 2,200 cyclists and 650 volunteers, riding from one part of California to another to raise money for HIV/AIDS services as part of AIDS/LifeCycle. For perspective, 545 miles is farther than the distance from Boston to Washington D.C., farther than Brussels to Berlin, farther than Tokyo to Hiroshima. It is countless hills, steep descents, farm fields, supportive onlookers, packets of chamois butter, potholes, water bottles, and sliced bananas. Based on this, my first year’s experience, it is also six inner tubes, one bike tire, and an entire bike frame long.
Solving CERTIFICATE_VERIFY_FAILED with Gmail and Offlineimap
Offlineimap has been a major part of my desktop computing environment for many years, indulging my use of mutt for all work and personal email. My work email has unfortunately been stored in Gmail, which does support IMAP but tends to do a few wacky things with files and folders.
Austria capturing the far-right zeitgeist
For a myriad of reasons the only video news I consume tends to be German-language news out of Germany. Local or national American news is usually lower quality; setting aside the abhorrent monopolies, it always trends toward an insular world view, missing many major international events. One such event slipping under the radar of American media has been the disintegration of the Austrian parliament after the deputy chancellor, a member of a far-right party, was caught on video soliciting bribes from a woman posing as a relative of a Russian oligarch.
Marching towards JRuby/Gradle 2.0
JRuby/Gradle is one of the few open source projects I have created that actually resonates with people, and one that I find myself continuing to work on despite not using it in my day-to-day work. JRuby/Gradle is a collection of Gradle plugins which make it easy to build, test, manage, and package Ruby applications. By combining the portability of JRuby with Gradle’s excellent task and dependency management, JRuby/Gradle provides high-quality build tooling for Ruby and Java developers alike. With my fellow maintainer, Schalk Cronjé, I started working towards the 2.0 milestone.
How Jenkins usage statistics work
For years the Jenkins project has published anonymous usage statistics to stats.jenkins.io. Despite its warts, the system has ultimately proven useful for determining which plugins are most frequently installed, spotting big, coarse-grained changes in growth, and providing various marketing departments with the validation they so desperately crave. As with many of the tucked-away corners of the Jenkins project, being an infrastructure maintainer affords me an understanding of how the system works, and sometimes doesn’t. As I promised the CDF Technical Oversight Committee many weeks ago, in this post I will attempt to describe how this system works.
What's Uplink
Making changes safely to an application like Jenkins is incredibly tricky. Jenkins is distributed to hundreds of thousands of independently owned and operated servers and is used in a myriad of ways. Even changes made with the best intentions can result in confounding bugs and errors for users with different configurations or different combinations of plugins. Over on the Jenkins project blog, Daniel wrote about the first use of “telemetry” by Jenkins core, a project on which we collaborated. I ended up building Uplink, the backend service for receiving this telemetry, and I hope it paves the way for making smarter changes across Jenkins core in the future.
Oh shit. One month until AIDS/LifeCycle 2019!
Today marks one month until the beginning of AIDS/LifeCycle 2019 (ALC)! That means I am one month away from starting a bicycle journey with thousands of other riders from San Francisco to Los Angeles as part of our effort to raise money for HIV/AIDS-related services. As of this writing, my fundraising is at $3,377, which is still short of my goal of $5,000. If you appreciate my work in the Jenkins project or the JRuby/Gradle project, or if you have enjoyed my sass on Twitter, please convert your appreciation into a donation to AIDS/LifeCycle. :)
Thoughts about a secure enclave for Jenkins Pipeline
Continuous integration and continuous delivery (CI/CD) systems might just be among the hardest to lock down and secure. As system designers and implementers we must enable developers to automate their builds, tests, and deployments. And yet, in doing so, we also give those same developers the ability to bypass many of the boundaries we may have set up to secure our environments. If you give me the ability to automate my deployment with a script, I can think of a number of ways in which that ability can lead to information disclosure or other types of breaches. Jenkins Pipeline is filled with problematic examples where the same feature can be viewed as either empowering or compromising. I believe the immense flexibility of Jenkins Pipeline also gives us a path to provide automation which is inherently more secure than that of some competitors. In this post, I’ll outline one such idea: a pipeline secure enclave.