brokenco.de (R. Tyler Croy, rtyler@brokenco.de) 2023-11-29T16:16:59+00:00 https://brokenco.de/

Improving lock performance for delta-rs (2023-11-29) https://brokenco.de//2023/11/29/locking-with-deltalake

<p>I have had the good fortune this year to help a number of organizations develop
and deploy native data applications in Python and Rust using a project I helped
found: <a href="https://github.com/delta-io/delta-rs">delta-rs</a>. At a high level
delta-rs is a Rust implementation of the <a href="https://github.com/delta-io/delta/blob/master/PROTOCOL.md">Delta Lake
protocol</a> which
offers ACID-like transactions for data lake use-cases. One of the big areas of
my focus has been in evaluating and improving performance in highly concurrent
runtime environments on AWS.</p>
<p>To help others understand the problem domain I spent some time earlier in the
week documenting the challenges in AWS on the Buoyant Data blog: <a href="https://www.buoyantdata.com/blog/2023-11-27-concurrency-limitations-with-deltalake-on-aws.html">Concurrency
limitations for Delta Lake on
AWS</a></p>
<blockquote>
<p>In the case of AWS S3’s consistency model many operations are strongly
consistent, but concurrent operations on the same key are not. AWS encourages
application-level object locking, which delta-rs implements using AWS
DynamoDB.</p>
</blockquote>
<p>AWS S3 is an incredible piece of technology that washes away a myriad of common
storage problems, and has been jokingly referred to as “the 8th wonder of the
world” by <a href="https://www.lastweekinaws.com/">Corey Quinn</a>. The lack of a
“putIfAbsent”-like semantic is however <em>very</em> annoying for the Delta Lake
protocol, adding the need for an application-wide <em>lock</em> for Delta users:</p>
<blockquote>
<p>The dynamodb-lock approach allows for some sensible cooperation between
concurrent writers but the key limitation is that all concurrent operations
must synchronize on the table itself. There is no smaller division of
concurrency than a table operation</p>
</blockquote>
<p>In the blog post I offer some potential approaches to mitigate the weakness of
needing a table-level lock for concurrent Delta Lake writers on AWS, but the
problem will unfortunately remain in some form or fashion until S3
introduces a “putIfAbsent” semantic which allows writers to “put” a file only
if it doesn’t exist in an atomic way.</p>
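<p>To make the missing primitive concrete, here is a minimal Rust sketch of the “putIfAbsent” semantic, modeled with an in-memory map standing in for the object store. The key name is hypothetical, and the real operation would need S3 to perform this check-and-write atomically server-side; this is what the DynamoDB lock emulates for delta-rs:</p>

```rust
use std::collections::HashMap;

/// Put `value` at `key` only if nothing is there yet; report whether we won.
/// This is the atomic primitive Delta writers need when racing to create
/// the next commit file in the transaction log.
fn put_if_absent(store: &mut HashMap<String, Vec<u8>>, key: &str, value: Vec<u8>) -> bool {
    use std::collections::hash_map::Entry;
    match store.entry(key.to_string()) {
        Entry::Vacant(slot) => {
            slot.insert(value);
            true // we created the object; our commit wins
        }
        // someone else committed this version first; retry at the next version
        Entry::Occupied(_) => false,
    }
}

fn main() {
    let mut store = HashMap::new();
    assert!(put_if_absent(&mut store, "_delta_log/1.json", b"commit A".to_vec()));
    // A concurrent writer racing for the same version must lose:
    assert!(!put_if_absent(&mut store, "_delta_log/1.json", b"commit B".to_vec()));
}
```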
<p>For concurrent Delta writers I can offer some advice, but unfortunately
effective cooperative distributed concurrency at scale remains a challenging
problem! :)</p>
Solving a FreeBSD Jails issue: interface already exists (2023-11-12) https://brokenco.de//2023/11/12/interface-already-exists

<p>For a long time after I rebuilt my jails host, I could not restart a certain
number of jails due to an “interface already exists” error. For the life of me
I could not make sense of it. The services running in the jails were useful but
not <em>required</em> so I put off tinkering with it. I thought that I would magically
stumble into the solution in my sleep or something equally silly.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>watermelon# service jail start gitea
Starting jails: cannot start jail "gitea":
ifconfig: interface epair14 already exists
jail: gitea: ifconfig epair14 create up: failed
.
watermelon# service jail stop gitea
Stopping jails:.
</code></pre></div></div>
<p>What perplexed me about this issue is that I would run <code class="language-plaintext highlighter-rouge">ifconfig epair14a</code>
after the failure to start the jail, and the interface would be there. “Surely
this must be a FreeBSD bug!”</p>
<p>The “eureka!” moment happened earlier today, not while I was sleeping, but rather
while I was solving other problems. “I bet there’s something fishy in the
configuration, I should just rewrite it” I thought to myself. Most esoteric
bugs are not bugs with the compiler, libraries, or operating systems. Usually
they’re the user doing something slightly stupid and not realizing it.</p>
<p>My jail configuration (<code class="language-plaintext highlighter-rouge">/etc/jail.conf</code>) resembled the following:</p>
<div class="language-conf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">gitea</span> {
$<span class="n">id</span> = <span class="s2">"14"</span>;
$<span class="n">ip_addr</span> = <span class="s2">"10.10.10.${id}"</span>;
<span class="n">vnet</span>.<span class="n">interface</span> = <span class="s2">"epair${id}b"</span>;
<span class="n">exec</span>.<span class="n">prestart</span> = <span class="s2">"ifconfig epair${id} create up"</span>;
<span class="n">exec</span>.<span class="n">prestart</span> += <span class="s2">"ifconfig epair${id}a up descr vnet-${name}"</span>;
<span class="n">exec</span>.<span class="n">prestart</span> += <span class="s2">"ifconfig $public_bridge addm epair${id}a up"</span>;
<span class="n">exec</span>.<span class="n">start</span> = <span class="s2">"/sbin/ifconfig epair${id}b ${ip_addr}"</span>;
<span class="n">exec</span>.<span class="n">start</span> += <span class="s2">"/sbin/route add default ${public_gw}"</span>;
<span class="n">exec</span>.<span class="n">start</span> += <span class="s2">"/bin/sh /etc/rc"</span>;
<span class="n">exec</span>.<span class="n">prestop</span> = <span class="s2">"ifconfig epair${id}b -vnet ${name}"</span>;
<span class="n">exec</span>.<span class="n">poststop</span> = <span class="s2">"ifconfig ${public_bridge} deletem epair${id}a"</span>;
<span class="n">exec</span>.<span class="n">poststop</span> += <span class="s2">"ifconfig epair${id}a destroy"</span>;
}
</code></pre></div></div>
<p>Looking at the block and comparing it to other <em>functional</em> jails, I saw something missing: a <code class="language-plaintext highlighter-rouge">vnet;</code> declaration:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">--- jail.conf 2023-11-12 20:09:03.028010000 -0800
</span><span class="gi">+++ /etc/jail.conf 2023-11-12 19:59:02.867271000 -0800
</span><span class="p">@@ -230,6 +230,7 @@</span>
$id = "14";
$ip_addr = "10.10.10.${id}";
+ vnet;
vnet.interface = "epair${id}b";
exec.prestart = "ifconfig epair${id} create up";
</code></pre></div></div>
<p>Sometimes you have to just walk away from a problem for a bit, but yeesh was that a silly one!</p>
Hashicorp Nomad, almost but not quite good (2023-10-20) https://brokenco.de//2023/10/20/frustrations-with-nomad

<p>My home office has grown in size and for the first time in decades I believe I
have a <em>surplus</em> of compute power at my disposal. These computational resources
are not in the form of some big beefy machine but a number of smaller machines
all tied together by a gigabit network hiding away in a server cabinet. The big
problem has become how to effectively utilize all that computational power, so I
turned to <a href="https://developer.hashicorp.com/nomad">Nomad</a> to orchestrate
arbitrary workloads on static and ephemeral (netboot) machines. As the title
would suggest, it’s almost good but it still falls frustratingly short for my
use-cases.</p>
<p>I started investigating Nomad because Hashicorp pulled out a big licensing
foot-gun and pulled the trigger, changing to a non-open source license for all
of their projects, Nomad included. Unlike its friend Terraform, whose
community rightfully revolted and created <a href="https://opentofu.org/">OpenTofu</a>, no
such community seems to exist for Nomad. Extension and integration points are
the raw materials necessary to build a blossoming third-party community, and
without something akin to Terraform’s providers and modules, there simply isn’t
a common way for Nomad users to share patterns. Nomad has no equivalent to <a href="https://helm.sh">Helm
charts</a>, and the user community is worse off for it.</p>
<p>While Nomad does technically have a plugin architecture, it is poorly
documented and seems to only exist for task drivers (e.g. <code class="language-plaintext highlighter-rouge">docker</code>, <code class="language-plaintext highlighter-rouge">exec</code>,
<code class="language-plaintext highlighter-rouge">pot</code>). The vast majority of users are not going to need to write new task
drivers, but I can imagine a ripe opportunity for something akin to Terraform
modules for shared workload definitions in Nomad. It just doesn’t seem to have
ever materialized.</p>
<p>The rough edges are many, but some of the ones bugging me this week are:</p>
<ul>
<li>A glitchy web UI that rivals old <a href="https://jenkins.io">Jenkins</a> in its ability
to hide the common user flows behind too many clicks.</li>
<li>A description language that doesn’t “cascade” properly. Some blocks can be
configured at the <code class="language-plaintext highlighter-rouge">job</code>, <code class="language-plaintext highlighter-rouge">group</code>, and <code class="language-plaintext highlighter-rouge">task</code> level. Others, like <code class="language-plaintext highlighter-rouge">env</code>, can only be configured
at the task level, leading to redundant definitions across every
<code class="language-plaintext highlighter-rouge">task</code> in a job.</li>
<li>Secrets integration is through Hashicorp Vault or … nothing. Which means I
guess I’ll just shove things into environment variables and hope nobody notices.</li>
</ul>
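<p>A minimal jobspec sketch of that <code class="language-plaintext highlighter-rouge">env</code> complaint (names are illustrative and the task <code class="language-plaintext highlighter-rouge">config</code> blocks are omitted for brevity): because <code class="language-plaintext highlighter-rouge">env</code> only exists at the task level, every task must repeat it:</p>

```hcl
job "example" {
  group "web" {
    task "app" {
      driver = "docker"
      env {
        LOG_LEVEL = "info" # repeated in every task...
      }
    }
    task "sidecar" {
      driver = "docker"
      env {
        LOG_LEVEL = "info" # ...because env cannot be set once at the job or group level
      }
    }
  }
}
```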
<p>I do <em>kind of</em> like Nomad though, which makes this all the more frustrating.
Most of what I need to do are ad-hoc on-premise compute workloads, some of
those workloads fit “cleanly” into Docker containers, others do not. Nomad does
meet that lovely middle ground of allowing me to orchestrate both. The support
for <code class="language-plaintext highlighter-rouge">service</code> (run a web server), <code class="language-plaintext highlighter-rouge">batch</code> (run a nightly job), and <code class="language-plaintext highlighter-rouge">sysbatch</code> (run a
management task on a slice of nodes) task types also covers a very useful
spectrum of my needs.</p>
<p>Despite all the really interesting qualities of Nomad, it is perhaps an overly
complex piece of software which never lent itself to strong open source
contributions or community engagement. With the change in its
license I fear it’s going to fall
further behind and ultimately be forgotten in a sea of ambitious but ultimately
mismanaged software projects.</p>
<p>Returning to the needs that led me to adopt Nomad in the first place, they’re
still not entirely met but I’m a bit lost on options to orchestrate workloads that <em>could</em> fit in Nomad really well.</p>
<p>Yes, the rough edges of Nomad are frustrating. What is much more frustrating is
that I can see how Nomad could be a <em>great</em> piece of software, but because of
social factors rather than technical ones, will never actually get there.</p>
Why we re-export symbols from other libraries in Rust (2023-07-26) https://brokenco.de//2023/07/26/rust-re-export

<p>Dependency management in the Rust ecosystem is <em>fairly</em> mature from my perspective: with <a href="https://crates.io">crates.io</a>, Cargo, and some cultural norms around semantic versioning, I feel safer with dependencies in Rust than I have in previous toolchains. It’s far from perfect however, and <a href="https://mastodon.social/@davidpdrsn/110780897434598935">this question</a> helps highlight one of the quirks of how Rust dependency management does or does not work, depending on your perspective:</p>
<blockquote>
<p>What is it that makes Rust users want libraries to re-export stuff from other
libraries?</p>
<p>I often get requests for axum to re-export stuff from hyper, time, or other
common crates. Why? Just “cargo add hyper” and you’re good to go. Hyper is in
your crate graph regardless.</p>
<p>I also often get feature requests for the few types axum does re-export so it
does confuse some. That’s why I’m reluctant to just re-export everything.</p>
</blockquote>
<p>I started writing up a reply in Mastodon but then I noticed that my words were
approaching the 500 character limit and perhaps this topic wasn’t
microbloggable! I help maintain the <a href="https://crates.io/crates/deltalake">deltalake</a>
package for Rust and we <strong>do</strong> re-export a number of libraries, such as
<a href="https://crates.io/crates/arrow">arrow</a> of which I am a strong supporter.</p>
<p>The biggest motivation for re-exporting is to preserve ABI compatibility in our
interfaces. For some crates your transitive dependencies may be masked entirely
from the end-user, for example if I pull in the <code class="language-plaintext highlighter-rouge">regex</code> crate I’m typically
just using it for regular expressions inside my crate and not exposing an
interface which takes a <code class="language-plaintext highlighter-rouge">regex::Regex</code>. The ABI is safe from transitive version
changes of that crate. If however my crate exposes an API which is dependent on
a transitive dependency then I can have problems with version mismatches. Such
is the case with <code class="language-plaintext highlighter-rouge">arrow</code> in <a href="https://github.com/delta-io/delta-rs">delta-rs</a>,
which exposes <code class="language-plaintext highlighter-rouge">arrow_array::RecordBatch</code>. There is a much larger chance of ABI
incompatibilities between a transitive version of arrow needed by the
<code class="language-plaintext highlighter-rouge">deltalake</code> crate and what the consuming project may specify. This is
exacerbated in our case because <em>another</em> transitive dependency of <code class="language-plaintext highlighter-rouge">deltalake</code>
specifies a dependency on <code class="language-plaintext highlighter-rouge">arrow</code>: <a href="https://crates.io/crates/datafusion">datafusion</a>.</p>
<p>That means that the user, <code class="language-plaintext highlighter-rouge">deltalake</code>, and <code class="language-plaintext highlighter-rouge">datafusion</code> all have to agree on
the same version of <code class="language-plaintext highlighter-rouge">arrow</code> for types to properly interoperate between API
calls.</p>
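<p>Concretely, the version agreement problem shows up in the downstream application’s manifest. The version numbers in this sketch are purely illustrative:</p>

```toml
[dependencies]
deltalake = "0.15"
# If deltalake (and datafusion underneath it) were built against arrow 45,
# this line produces a second, incompatible arrow in the crate graph, and
# rustc treats the two RecordBatch types as entirely distinct:
arrow = "46"
```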
<p>But it gets worse!</p>
<p>The Rust community generally seems to follow semantic versioning, but that
doesn’t mean anything about the releases, just the version numbers used for
them. I can make major breaking API changes every one of my 0.x.x releases, or
in the case of <code class="language-plaintext highlighter-rouge">arrow</code> and <code class="language-plaintext highlighter-rouge">datafusion</code> I can just increment the major version
every release.</p>
<p>By re-exporting symbols from those two crates, downstream users of the
<code class="language-plaintext highlighter-rouge">deltalake</code> package will have a stable <code class="language-plaintext highlighter-rouge">RecordBatch</code> type ABI to work with for
every release, and can <em>largely</em> ignore non-API breaking changes such as
struct layout changes, etc.</p>
<p>I am still mixed on whether <em>all</em> types from other crates exposed in my APIs
should be exported. I think there is benefit to doing so for faster moving
dependencies. In essence:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">arrow</code>, moving fast, better user experience to re-export</li>
<li><code class="language-plaintext highlighter-rouge">url</code>, moves slow, very mature, not really needed to re-export.</li>
</ul>
<p>The judgement call I am typically making is whether this would make my life
easier as a downstream consumer of the crate. It’s not that much of a
maintenance burden to <code class="language-plaintext highlighter-rouge">pub use</code> something in a crate if that’s convenient.</p>
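<p>A minimal sketch of the re-export pattern, with a nested module standing in for a fast-moving dependency like <code class="language-plaintext highlighter-rouge">arrow</code> (all names here are hypothetical):</p>

```rust
mod mylib {
    // Stands in for an external, fast-moving crate dependency.
    mod vendored_dep {
        #[derive(Debug, PartialEq)]
        pub struct RecordBatch(pub usize);
    }

    // The re-export: downstream code names the type through mylib's path,
    // so a version bump of the dependency doesn't change their imports.
    pub use self::vendored_dep::RecordBatch;

    pub fn make() -> RecordBatch {
        RecordBatch(3)
    }
}

fn main() {
    // Downstream only depends on mylib's path; the dependency never
    // appears in this code, so its version is mylib's concern alone.
    let batch = mylib::make();
    assert_eq!(batch, mylib::RecordBatch(3));
}
```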
Dynamically forwarding SSH ports with "commandline disabled" (2023-07-10) https://brokenco.de//2023/07/10/dynamically-forwarding-ports-again

<p>I frequently use SSH for accessing one of the many development workstations I
use for work, which includes developing network services among other things. A
couple of years ago I wrote about this hidden gem in <code class="language-plaintext highlighter-rouge">ssh</code> which allows
<a href="/2021/05/16/dynamically-forward-ssh-ports.html">dynamically forwarding ports</a>.
This handy little feature allows dynamically adding local port forwards from within an already running SSH session. Recently however this feature has stopped working properly, emitting <code class="language-plaintext highlighter-rouge">commandline disabled</code>.</p>
<p>It turns out that this is due to a backwards incompatible change which <a href="https://www.openssh.com/txt/release-9.2">OpenSSH released in 9.2</a> earlier this year:</p>
<blockquote>
<p>ssh(1): add a new EnableEscapeCommandline ssh_config(5) option that controls
whether the client-side ~C escape sequence that provides a command-line is
available. Among other things, the ~C command-line could be used to add
additional port-forwards at runtime.</p>
</blockquote>
<p>The reason for this change is to support some sandboxing use-case which I don’t entirely understand but also don’t need, so I needed to add the following option to my host entries in <code class="language-plaintext highlighter-rouge">~/.ssh/config</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Host foobar
Hostname 172.16.1.1
EnableEscapeCommandline yes
</code></pre></div></div>
<p>This can also be configured on the command line with <code class="language-plaintext highlighter-rouge">-o EnableEscapeCommandline=yes</code>. Happy port forwarding!</p>
Requiring non-default features to be set in Rust (2023-05-26) https://brokenco.de//2023/05/26/ensuring-features-for-cargo

<p>I found myself refactoring a Rust crate in which I had two non-default features
but <em>at least one</em> would need to be set in order for <code class="language-plaintext highlighter-rouge">cargo build</code> to function.
Cargo allows a <code class="language-plaintext highlighter-rouge">default</code> feature set, or allows different targets to have
<a href="https://doc.rust-lang.org/cargo/reference/cargo-targets.html#the-required-features-field">required-features</a>
defined. My use-case is different, unfortunately: I wanted slightly different
semantics to support <em>either</em> <code class="language-plaintext highlighter-rouge">s3</code> or <code class="language-plaintext highlighter-rouge">azure</code> features. I stopped by <code class="language-plaintext highlighter-rouge">##rust</code>
on <a href="https://libera.chat">libera.chat</a> and as usually happens, got a nudge in
the right direction: <code class="language-plaintext highlighter-rouge">build.rs</code>:</p>
<p>By adding the following to <code class="language-plaintext highlighter-rouge">build.rs</code> I was able to forcefully halt the build
operation before it even really got started.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[cfg(not(any(feature</span> <span class="nd">=</span> <span class="s">"s3"</span><span class="nd">,</span> <span class="nd">feature</span> <span class="nd">=</span> <span class="s">"azure"</span><span class="nd">)))]</span>
<span class="nd">compile_error!</span><span class="p">(</span>
<span class="s">"Either the </span><span class="se">\"</span><span class="s">s3</span><span class="se">\"</span><span class="s"> or the </span><span class="se">\"</span><span class="s">azure</span><span class="se">\"</span><span class="s"> feature must be enabled to compile"</span>
<span class="p">);</span>
<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{}</span>
</code></pre></div></div>
<p>Using the <code class="language-plaintext highlighter-rouge">compile_error!</code> macro in <code class="language-plaintext highlighter-rouge">build.rs</code> ensures that users will <em>only</em>
see that single, clear compilation error message, rather than a long list of other
errors which may come from missing feature definitions.</p>
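<p>For completeness, the matching declarations in <code class="language-plaintext highlighter-rouge">Cargo.toml</code> would resemble this sketch (the feature names come from the post; the empty dependency lists are illustrative):</p>

```toml
[features]
# Neither backend is in "default", which is exactly why build.rs
# must enforce that at least one of them gets enabled:
default = []
s3 = []
azure = []
```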
<p>Quick and easy trick to get required non-default features enabled!</p>
AIDS/LifeCycle 2023 is a go! (2023-05-17) https://brokenco.de//2023/05/17/alc-2023-is-a-go

<p>I am really excited to be officially <strong>in</strong> for <a href="https://giving.aidslifecycle.org/index.cfm?fuseaction=donorDrive.participant&participantID=2304">AIDS/LifeCycle
2023</a>!
This will be my third year supporting the life-saving services offered by San
Francisco AIDS Foundation and the Los Angeles LGBT Center by riding from SF to
LA with AIDS/LifeCycle. The past 12 months have been among the most stressful
and rewarding of my adult life, so I’m <em>doubly</em> excited to have the support of
so many friends and family. In the next month I’ll continue fundraising to try
to meet my goal, and would appreciate your help too!</p>
<p><strong><a href="https://giving.aidslifecycle.org/index.cfm?fuseaction=donorDrive.participant&participantID=2304">Please donate
now!</a></strong></p>
<p>I originally started riding with a friend of mine impacted by HIV and have
since come to appreciate the importance of our fundraising to support:
counseling, HIV/STD screenings, linking youth experiencing homelessness and
people living with HIV to housing, and so much more.</p>
<p>Riding with AIDS/LifeCycle has rekindled my love of cycling and since I began
training again in 2021, I <em>haven’t stopped</em>. Riding with purpose has done
wonders for my mental and physical health. Like the thousands of people
our fundraising supports, I can also credit AIDS/LifeCycle for helping me live
a happier and fuller life.</p>
<p>As in years past I will try to share as much of the ride as I can on my blog.
You can read about last year with <a href="/tag/alc2022.html">the alc2022 tag</a>. You can
also follow my training <a href="https://www.strava.com/athletes/91218993">on Strava</a>!</p>
<p>On behalf of all the people AIDS/LifeCycle helps I want to thank you all for
your continued support this year!</p>
Invalid signature in boot block on FreeBSD (2023-03-13) https://brokenco.de//2023/03/13/freebsd-efi-boot-problems

<p>I don’t have a lot of opinions about
<a href="https://en.wikipedia.org/wiki/Extensible_Firmware_Interface">UEFI</a>, but it
seems that building something as critical as booting around the FAT32
filesystem is not a great idea. FAT32 is a simple but archaic filesystem which
has the resiliency of a paper boat. While moving machines around in my homelab
this weekend I was bitten by that fragility as halfway through booting my FreeBSD
NAS it complained that it could not complete <code class="language-plaintext highlighter-rouge">fsck</code> operations: <code class="language-plaintext highlighter-rouge">Invalid
signature in boot block: 0000</code>.</p>
<p>This FreeBSD machine uses UEFI and boots directly to ZFS. Imagine my surprise
that the operating system had complaints about my boot partitions…after it
had already booted. This machine had recently been rebuilt with new disks after
I discovered that the previous disks I had been sold were “SMR” (Shingled
Magnetic Recording), which have such abhorrent performance that it’s a wonder
they’re even marketed at all. Suffice it to say, disk issues on this machine
<em>terrify me</em>. I don’t want to deal with another rebuild!</p>
<p>The boot process failed half-way through, which means that FreeBSD drops you
into a single-user mode in the console. With that I could poke around a little
bit:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">zfs list</code> showed all data sets I expected</li>
<li><code class="language-plaintext highlighter-rouge">zpool status</code> showed that each disk in the pool was healthy.</li>
<li><code class="language-plaintext highlighter-rouge">zpool scrub</code> for good measure to make sure the pool was legitimately healthy.</li>
<li><code class="language-plaintext highlighter-rouge">gpart</code> showed that the partitions on all the disks were intact as well.</li>
<li><code class="language-plaintext highlighter-rouge">fsck</code> reported errors on the EFI partitions for <em>three</em> of the <em>four</em> disks.</li>
</ul>
<p>For whatever reason, the <code class="language-plaintext highlighter-rouge">efi</code> partitions were all hosed in the same way on
three of the four disks: <code class="language-plaintext highlighter-rouge">Invalid signature in boot block</code>.</p>
<p>I am still not entirely sure how this corruption occurred but getting the
machine back online to do more disk diagnostics was a key step forward.
Fortunately with one valid <code class="language-plaintext highlighter-rouge">efi</code> partition, I was able to <code class="language-plaintext highlighter-rouge">dd</code> its contents
onto every other disk, since they’re all supposed to be identical anyways:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dd if=/dev/ada0p1 of=/dev/ada1p1 bs=4M
</code></pre></div></div>
<p>After a round of copying bytes around, I was able to reboot and everything came
up perfectly fine!</p>
<p>Since there are no other indications of disk failure or problems, I may never
know what originally caused the corruption. The consensus on IRC however is
that building a foundational part of the boot process on an unreliable
filesystem was perhaps a bad idea.</p>
Considering object-orientedness from the Rust perspective (2023-02-22) https://brokenco.de//2023/02/22/object-oriented-compulsions

<p>A very simple question in a community channel earlier this week sent me deep
into reflection on software design. I started writing software in what is
classically understood as Object Oriented Programming (OOP), with Java, Python, Ruby,
and Smalltalk. Design has been mostly about creating those little boxes that
encapsulate behavior and state: the object. <a href="https://rust-lang.org">Rust</a>, in
contrast, I wouldn’t describe as an object-oriented programming language; to be
honest I’m not sure what to call it. It’s not functional programming and it’s
not object-oriented programming as I understand it. It’s something else, which
is the key to why Rust is so enjoyable.</p>
<p>The simple question from <a href="https://github.com/mrpowers">Mr Powers</a> was:</p>
<blockquote>
<p>Just noticed an interesting delta-rs / delta-spark difference. Delta Spark doesn’t let you instantiate a Delta Table with a specific table version, but delta-rs does.</p>
<ul>
<li>delta-rs: <code class="language-plaintext highlighter-rouge">DeltaTable("../rust/tests/data/simple_table", version=2)</code></li>
<li>delta-spark: <code class="language-plaintext highlighter-rouge">DeltaTable.forPath(spark, "/path/to/table")</code> - no version argument available</li>
</ul>
<p>Are there any implications of this difference we should think about?</p>
</blockquote>
<p>The difference may seem trivial, one appears to have an optional “constructor”
parameter and the other does not, who cares? But that’s <em>not it</em>.</p>
<p><a href="https://github.com/wjones127">Will</a> responded correctly with:</p>
<blockquote>
<p><em>I think the distinction to make is that <code class="language-plaintext highlighter-rouge">DeltaTable</code> represents a table at
some particular time, and not the table in general</em></p>
</blockquote>
<p>The thing is, I was there when the first API was written. I remember the design
discussions and considerations we evaluated. I didn’t catch the subtle change
of thinking that was happening at the time.</p>
<p>When I am working in Ruby or Python, I find myself thinking about how to
represent state and behavior as this black box. “How would I represent this in
a diagram with boxes and arrows?”</p>
<p>Take a filter for example, a filter is almost always just <em>behavior</em> but when I
might design something like that in Ruby or Python, <code class="language-plaintext highlighter-rouge">Filter</code> becomes a base
class which may or may not end up having state too. The base class becomes the
means for describing “things which behave like this” but the very nature of
defining a class implies state.</p>
<p>Most object-oriented languages follow my beloved Smalltalk where everything is
an object which contains both behavior and state, even when that doesn’t quite
make metaphorical sense.</p>
<p>Coming back to the question posed.</p>
<p>The reason this simple design difference seems so impactful to me is when I
consider the Spark (Scala) implementation, its design <em>bugs me</em>. It bugs me in
a way that it wouldn’t have prior to starting to use Rust. Delta tables are
constantly evolving: as new writes occur and new transactions are written,
the idea of what the table <em>is</em> also changes with the underlying data. This
is especially the case when a metadata change is committed to the transaction
log. Therefore making an <em>object</em> encapsulate the concept of an ever-changing
Table itself presents this jarring conflict: if I have this object, what is the
actual nature of the object? How does (or does not) this object change over
time?</p>
<p>Writing and reasoning about this, I think I have a better sense of what makes
Rust so pleasing to work with. The ownership model and borrow checker <em>do</em> make
things much easier, but the nature of a program is <strong>not</strong> object-oriented, nor
is it functional, but something else. It accommodates the current reality of
software development which is inherently multi-modal.</p>
<p>At our disposal we have:</p>
<ul>
<li><em>Functions</em> which do things, and can be grouped into modules, etc.</li>
<li><em>Structs</em> which contain state, but like objects in other languages can have
associated behaviors. Unlike in those object-oriented languages these cannot be
<em>extended</em>. This forces the Rust developer to design structs around the state
first and foremost. We are encouraged to take this data first approach and when
combined with mutability and ownership rules, Rust programs tend to have fewer
large evolving objects or object hierarchies.</li>
<li><em>Traits</em> which allow defining behaviors and grouping them in a hierarchy
separated from data and state. This separation allows us to consider behaviors
which might have slight variations but otherwise present a similar interface
such as the filters example that I mentioned above.</li>
</ul>
<p>I could wax on and on about how important traits are from a design standpoint.
Being able to group and “inherit” behaviors separate from data is liberating.</p>
<p>The mental contortions I found myself doing in a more object-oriented world are
no more. Nor am I going down the “functional programming all the things” rabbit
hole. Rust has a lot of both to offer but I find that its structure has led me
to <em>better</em> designs because it has just the right amount of multiple different
programming models thoughtfully mixed together.</p>
Ditching the cloud is most likely a bad idea (2023-02-21) https://brokenco.de//2023/02/21/ditching-the-cloud-is-complicated

<p>I have the dubious honor of leading a migration from an on-premise
managed colocation facility into AWS. It was necessary to help the business
succeed, but frankly I would rather not have needed to do it. Earlier this morning I saw <a href="https://world.hey.com/dhh/we-stand-to-save-7m-over-five-years-from-our-cloud-exit-53996caa">a
post</a>
about “leaving the cloud” by that attention-seeking guy who keeps trying to
keynote RailsConf, and I had some opinions. I was hopped up on caffeine and free
office snacks, and just could not help but share my thoughts in the fediverse.</p>
<p>Long story short, I think the original author’s analysis is nonsense and will
most likely result in him Musking his own company. Either way, here are some thoughts saved for posterity:</p>
<hr />
<p>I have always disliked this dude’s simpleton analyses but <em>IF</em> you are
considering leaving AWS (or other cloud providers) you <em>must</em> include:</p>
<ul>
<li>Operational cost: which is all that the original author’s analysis includes.</li>
<li>Labor cost: migrations use people’s time, which is typically the biggest
portion of a company’s budget.</li>
<li>Opportunity cost: managing infrastructure or migrating it means you’re not
investing in growing the business. If your business isn’t about running
infrastructure (e.g. CloudFlare, Fastly, etc), this typically means you’re
actively harming your business by focusing elsewhere.</li>
</ul>
<p>But there’s so much more!</p>
<p><em>IF</em> the business’ workloads are CPU intensive and consistent, buying metal
<em>might</em> be cheaper.</p>
<p>Otherwise, if your math shows that on-premise is cheaper, then I would have
<em>questions</em> about the current infrastructure. Are you using:</p>
<ul>
<li>ECS/Fargate is crazy cheap and works great for almost all web apps you can
shove into a container.</li>
<li>AWS Aurora is crazy good and makes a <em>lot</em> of RDMS work and scaling easy.</li>
<li>AWS Savings Plans help further reduce costs for predictable compute.</li>
</ul>
<p><em>IF</em> the business already has a big investment into AWS S3, I hope you’re
planning to get punished with S3 egress costs.</p>
<p>S3 is a modern marvel as <a href="https://awscommunity.social/@Quinnypig">Corey Quinn</a>
has said. You literally cannot make faster, cheaper, or more resilient storage.
But AWS uses cost to <em>encourage</em> you not to walk away from S3.</p>
<p>Depending on the relation of the application to the S3 storage, transit fees
can eat you alive.</p>
<p><em>IF</em> the business’ SLAs allow for the risk of a single-site on-premise
deployment, that’s cool.</p>
<p>AWS can have downtimes but it can be enlightening to ask the ops old guard
about the time suck of configuration management, rack management, or dealing
with RMAs with shitty hardware vendors.</p>
<p>I don’t relish funding Jeff Bezos’ next super yacht any more than you do, but
the stack you can get on AWS is unrivaled in its cost, reliability, and ease
of use.</p>
<p>Nobody gives AWS enough credit for their security work.</p>
<p>Building secure infrastructure is really challenging. There’s patch management,
role-based access control systems, data encryption needs, certificates, all
sorts of things.</p>
<p>Not all clouds do it well (lol azure).</p>
<p>But walking away from VPCs, Security Groups (Network Isolation), IAM
(Role-based access controls), CloudTrail (audit logging), GuardDuty (intrusion
detection), and automated upgrades for managed services would have me very
seriously questioning what security posture the org may or may not have.</p>
<p>Anyways, I don’t love AWS. It’s a monoculture and it makes an ugly
anti-competitive business viable.</p>
<p>It’s still the right choice in my opinion for the vast majority of businesses.</p>
Scheduling work with market dynamics2023-02-03T00:00:00+00:00https://brokenco.de//2023/02/03/scheduling-the-work<p>I had a lucky break in the day and was able to read <a href="https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/">this blog
post</a> which popped up in my social feed. In essence it talks about what Fly.io did to rebuild
their scheduler to better match what they’re trying to accomplish.
Orchestration and scheduling are topics I like to geek out on, going back many
years as part of the <a href="https://jenkins.io">Jenkins project</a>. But this quote in
particular caught my eye:</p>
<blockquote>
<p><code class="language-plaintext highlighter-rouge">flyd</code> has a radically different model from Kubernetes and Nomad. Mainstream
orchestrators are like sophisticated memory allocators, operating from a
reliable global picture of all capacity everywhere in the cluster. Not <code class="language-plaintext highlighter-rouge">flyd</code>.</p>
<p>Instead, <code class="language-plaintext highlighter-rouge">flyd</code> operates like a market. Requests to schedule jobs are bids for
resources; workers are suppliers. Our orchestrator sits in the middle like an
exchange. ratemysandwich.com asks for a Fly Machine with 4 dedicated CPU cores
in Chennai (sandwich: bun kebab?). Some worker in MAA offers room; a match is
made, the order is filled.</p>
</blockquote>
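<p>The exchange model in that quote can be sketched in a few lines: bids ask for capacity in a region, workers supply it, and the exchange matches the first supplier that can fill the order. These types and the matching rule are my own toy illustration, not <code>flyd</code>’s actual implementation:</p>

```rust
// A toy "exchange": bids are demand, workers are supply.
struct Bid {
    job: &'static str,
    cpu_cores: u32,
    region: &'static str,
}

struct Worker {
    id: &'static str,
    region: &'static str,
    free_cores: u32,
}

// Match a bid with the first worker in-region that has spare capacity.
fn fill(bid: &Bid, workers: &mut [Worker]) -> Option<&'static str> {
    for w in workers.iter_mut() {
        if w.region == bid.region && w.free_cores >= bid.cpu_cores {
            w.free_cores -= bid.cpu_cores; // supply is consumed by the match
            return Some(w.id);
        }
    }
    None // no supplier could fill the order
}

fn main() {
    let mut workers = vec![
        Worker { id: "maa-1", region: "maa", free_cores: 8 },
        Worker { id: "ord-1", region: "ord", free_cores: 16 },
    ];
    let bid = Bid { job: "ratemysandwich", cpu_cores: 4, region: "maa" };
    println!("{} scheduled on {:?}", bid.job, fill(&bid, &mut workers));
}
```

<p>A real scheduler would add pricing, queueing, and failure handling, but even this degenerate “market” shows the appeal: each match needs only local supply information rather than a reliable global picture of the whole cluster.</p>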
<p>I <em>love</em> this idea for a lot of reasons, not the least of which is that it’s a
real-world incarnation of an unoriginal idea that I had for
<a href="https://github.com/rtyler/otto">Otto</a>, an overly ambitious CI/CD side-project.</p>
<p>In my work I referred to it as <a href="https://github.com/rtyler/otto/blob/main/rfc/0003-resource-auctioning.adoc">resource allocation by
auction</a>
and had only just begun to experiment with the concept. I once read a computer
science paper which described this concept more in detail, but I cannot seem to
find it again.</p>
<p>Suffice it to say, there’s a lot of good efficiency to be gained by resource
auctioning in this manner, especially in a multi-tenant system. The Fly.io blog
post is an interesting read either way, but efficient resource scheduling in
this way I hope makes it into a <em>lot</em> of other systems.</p>
I think coding interviews categorically suck2023-01-27T00:00:00+00:00https://brokenco.de//2023/01/27/coding-interviews-are-bad<p>I recently had a good discussion with another engineering leader about the
merits of coding interviews. They have long been a trusted part of the tech
company interview process, but I have been mostly hiring <em>without</em> them over
the last 5 years. Below I wanted to share some of the thoughts that I sent my
colleague:</p>
<hr />
<p>(<em>in response to a concern about hiring somebody that can’t actually build software</em>)</p>
<p>I have also made one or two hires who didn’t end up being able to really build
and implement things. No interview process is going to be 100%, sometimes a dud
gets through. :)</p>
<p>Many coding interviews necessarily need to fit in the time allotted and
therefore are merely puzzles or computer science questions. The internet is
littered with tools on how to practice your way into passing a coding
interview, in fact, I have even seen a book or two at my library on the
subject. For the most part, a coding interview tells you how well somebody can
pass a coding interview, it doesn’t actually tell you that they can build
software. [SOME VENDOR] claims to alleviate some of that, but software
development is a team sport and there’s a lot <em>around</em> the programming that is
expected of software engineers, especially more senior ones.</p>
<p>My second main concern is that it has always come across to me as almost
disrespectful of people’s time. FAANG companies are awful about this. Many
interview processes are already requesting substantial time commitment from
people, and to see companies then ask people to do a “take home assignment” or
a test boggles the mind. <a href="https://artiss.blog/2019/03/the-automattic-hiring-process/">Automattic</a> does an interesting twist on this in that
they basically pay people up front and take them on in a contracting capacity
before hiring.</p>
<p>As a hiring manager, my objective is to determine whether somebody can build
software. I will typically try to find a way without some form of coding
exercise that’s tailored to each candidate, for example:</p>
<ul>
<li>
<p>If they’re on GitHub and have activity, I’ll look at open source
contributions. In some cases that’s sufficient, because I can see how they
respond to code review, interact with others, and structure their code in a
real world scenario (commit messages too!). I enjoy discussing pull request
reviews with these candidates too.</p>
</li>
<li>
<p>If they don’t have public activity, I will look at their resume for items
which mention “design and implementation” and then we’ll do more of a “code
architecture” interview where I discuss that system with them and ask
questions about how they structure their code, create modules, test, etc.</p>
</li>
<li>
<p>If they are simply too junior or for whatever reason they don’t have anything
above, then what I’ll do is a “debugging interview” rather than a coding
interview, where we start with something pre-existing and debug it to make it
work, refactoring along the way. In these interviews I’m typically using a
bit of our production code, rather than something that’s contrived.</p>
</li>
</ul>
<hr />
<p>Interviewing is <em>hard</em> and imprecise to say the least. Writing code is an
important part of a software engineering role, but we rarely do it as
performance art, making the coding interview an awkward and flawed means of
assessing skill.</p>
<p>An HR leader I once worked with told my team and I to “find reasons to hire
[the candidate]” rather than finding reasons they weren’t good enough. That
dramatically changed my approach to hiring. Coding interviews, like any “tests”
during the interview process are finding reasons to bounce the candidate from
the funnel. By taking a more personalized approach to each candidate, I believe
an organization can still make really strong hires with a more respectful and
collaborative interview process that results in better outcomes for everybody
involved.</p>
A lot of engineering management is actually information management2023-01-19T00:00:00+00:00https://brokenco.de//2023/01/19/eng-leadership-info-management<p>Are you an organized person? Do you understand information flow in your
organization? The importance of categorization and taxonomy? You might be a good
fit for Engineering Management! Having now spent a number of years in management
and leadership positions, I have noticed a number of successful patterns, and
unsuccessful patterns. In this post I want to focus on one of the more
successful patterns: good information management.</p>
<p>Engineering managers are expected to have loads of information ready
at all times. The architecture of the systems their team is responsible for,
current project priorities, cross-team points of dependence or collaboration,
and a myriad of other snippets of information. It’s a <em>lot</em>, but I don’t think
it’s reasonable to expect a person to maintain so much information in their
active memory. That’s why information management is <em>very</em> important for a
management role: I don’t need to <em>remember</em> everything, but I do want to
remember where everything is <em>documented</em>.</p>
<p>Some of the productive patterns that I have seen and utilized:</p>
<ul>
<li><strong>Decision Log</strong>: it’s great when a team can make decisions quickly, but an
inventory of decisions made is increasingly important as the team grows or
evolves over time. This should include a synopsis of the decision being made,
the alternatives considered, the trade-offs discussed between options, and
the reasoning behind the decision ultimately made.</li>
<li><strong>Link everything</strong>: <a href="https://en.wikipedia.org/wiki/Tim_Berners-Lee">Tim
Berners-Lee</a> wants you to
hyperlink all your hypertext! Creating a meeting invite? Link to the meeting
notes page in the agenda. Creating a meeting notes page to discuss a project?
Link to the project in the issue tracker. Creating a ticket in the issue
tracker? Link to the decision made to implement that solution, or the
customer support ticket(s) it relates to, or the other projects that this
ticket blocks. Creating a commit to complete a ticket? Link to the ticket in
the commit and pull request. Every link created is a breadcrumb for the
manager and the team to tap into this web of useful and related information.</li>
<li><strong>Research must produce documentation</strong>: frequently a manager or engineer needs
to answer a question, that’s it: “Can this technology be used to solve this
type of problem?” That research work doesn’t usually result in a direct code
or systems change to a production application, but the <em>output</em> of that
research should be documentation in the wiki. In essence <strong>every bit of work
in engineering should produce an artifact</strong>. Most tasks will produce a pull
request, but research tasks should produce a document which outlines what was
learned, or create a new decision in the decision log. This allows the
manager to benefit and reference back to knowledge gained during a project
that did not lead to tangible code changes.</li>
<li><strong>Metadata is crucial</strong>: At least in the Atlassian suite of tools there are a
myriad of ways to categorize pages and tickets. <em>Use them</em>. A good taxonomy
of labels can go a long way. In the case of documentation in the wiki, this
allows for creating aggregations of pages around a particular topic. These
aggregation pages can provide a quick overview for all resources relating to
a specific technology or project. In the issue tracker labels can provide a
useful point to query tickets relating to a point in the ticket lifecycle, a
project, or even a specific customer’s needs.</li>
</ul>
<p>From my perspective it is not the project manager’s job to add the necessary
links or information hierarchy; it is not even really the engineering manager’s
job. It is, however, the manager’s job to build the culture of information
management that allows them and the team to quickly recall or re-discover
critical information about the projects being worked on.</p>
<p>Some managers I know use running Google Docs or Spreadsheets to manage their
workload, which may work for personal task tracking, but I typically discourage
their use. They’re not linkable and discoverable enough! Many spreadsheets are
write-once and read-once. By building and collaborating with a shared
information management scheme, the team and the managers can benefit from the
on-going “gardening” of information.</p>
<p>Regardless of the system you use or consider, if you are a manager, please
consider that a large part of your job relies on managing <em>information</em>, and
institute the practices and systems necessary!</p>
ChatGPT and your intellectual property2023-01-09T00:00:00+00:00https://brokenco.de//2023/01/09/chatgpt-and-your-ip<p>There is an excessive number of <a href="https://en.wikipedia.org/wiki/ChatGPT">ChatGPT</a>
screenshots littering social media right now, and not nearly enough critical
thinking about feeding data into this novel new chatbot. An anecdotal survey of
my timeline includes people asking ChatGPT to solve math equations, write
emails for them, create short story prompts, identify bugs in code, or even
generate code for them. Behold, the power of AI!</p>
<p>ChatGPT is created by <a href="https://openai.com/blog/chatgpt/">OpenAI</a>, which despite
the name is <em>not</em> any form of “open” organization, but rather a startup which
has been <a href="https://siliconangle.com/2023/01/05/openai-startup-behind-chatgpt-discusses-tender-offer-value-29b">considering funding at a pretty monstrous
valuation</a>.
In essence, ChatGPT is an AI tool trained on a large corpus of public and
proprietary information, packaged up as a kooky chatbot.</p>
<p>Fine. Setting aside my own annoyance with ML developers co-opting data from
“the commons”, fine.</p>
<p>The zeal with which most people are dumping information into ChatGPT really
concerns me however. I have seen a number of people feeding their own source
code into ChatGPT to ask it to find bugs or security holes. It would be
foolish to assume that the inputs into ChatGPT are not <em>also used to train
ChatGPT</em>, or at least the next generations of the model.</p>
<p>I am certainly no lawyer, but the two primary problems here are:</p>
<ul>
<li>Most developers are not authorized to disclose proprietary information of
their employers. Pasting source code into <em>any</em> browser window creates a
liability, but a browser window with ChatGPT increases the likelihood that
the source code disclosed will be <em>reproduced</em> in the future, for some other
user of the system. Uh oh!</li>
<li>Can the code <em>generated</em> by ChatGPT be considered <em>yours</em>? Who actually
owns the copyright to machine generated code, or machine generated anything
for that matter? Do the architects of the system own it, or the users
supplying the inputs? This particular wrinkle isn’t unique to ChatGPT, but
any ML tool generating data which occupies a space adjacent to human created,
and copyrighted works.</li>
</ul>
<p>My concerns with what OpenAI is doing with this data are not tin-foil paranoia.
<a href="https://news.yahoo.com/adobe-using-photos-train-ai-001413408.html">Adobe is catching
grief</a> for
opting Lightroom users <em>in</em> to train their AI with those users copyrighted or
proprietary works.</p>
<p>I am sure the legal system will catch up to the rapid evolution of these ML
robber barons, but until then I think we should all be <em>very</em> wary of feeding
intellectual property to these systems.</p>
The problem with ML2023-01-04T00:00:00+00:00https://brokenco.de//2023/01/04/the-problem-with-ml<p>The holidays are the time of year when I typically field a lot of questions
from relatives about technology or the tech industry, and this year my favorite
questions were around <strong>AI</strong>. (<em>insert your own scary music</em>) Machine-learning
(ML) or Artificial Intelligence (AI) are being widely deployed and I have some
<strong>Problems™</strong> with that. Machine learning is not necessarily a new
domain, the practices commonly accepted as “ML” have been used for quite a
while to support search and recommendations use-cases. In fact, my day job
includes supporting data scientists and those who are actively creating models
and deploying them to production. <em>However</em>, many of my relatives outside of the tech industry believe that “AI” is going to replace people, their jobs, and/or run the future. I genuinely hope AI/ML comes nowhere close to this future imagined by members of my family.</p>
<p>Like many pieces of technology, it is not inherently good or bad, but the
problem with ML as it is applied today is that <strong>its application is far
outpacing our understanding of its consequences</strong>.</p>
<p>Brian Kernighan, co-author of <em>The C Programming Language</em> and an early contributor to UNIX, said:</p>
<blockquote>
<p>Everyone knows that debugging is twice as hard as writing a program in the
first place. So if you’re as clever as you can be when you write it, how will
you ever debug it?</p>
</blockquote>
<p>Setting aside the <em>mountain</em> of ethical concerns around the application of ML
which have and should continue to be discussed in the technology industry,
there’s a fundamental challenge with ML-based systems: I don’t think their
creators understand how they work, how their conclusions are determined, or how
to consistently improve them over time. Imagine you are a data scientist or ML
developer, how confident are you in what your models will predict between
experiments or evolutions of the model? Would you be willing to testify in a
court of law about the veracity of your model’s output?</p>
<p>Imagine you are a developer working on the models that Tesla’s “full
self-driving” (FSD) mode relies upon. Your model has been implicated in a Tesla
killing the driver and/or pedestrians (which <a href="https://www.reuters.com/business/autos-transportation/us-probing-fatal-tesla-crash-that-killed-pedestrian-2021-09-03/">has
happened</a>).
Do you think it would be possible to convince a judge and jury that your model
is <em>not</em> programmed to mow down pedestrians outside of a crosswalk? How do you
prove what a model is or is not supposed to do given never before seen inputs?</p>
<p>Traditional software <em>does</em> have a variation of this problem, but source code
lends itself to scrutiny far better than ML models, many of which have come
from successive evolutions of public training data, proprietary model changes,
and integrations with new data sources.</p>
<p>These problems may be solvable in the ML ecosystem, but the problem is that the
application of ML is outpacing our ability to understand, monitor, and diagnose
models when they do harm.</p>
<p>That model your startup is working on to help accelerate home loan approvals
based on historical mortgages: how do you assert that your models are not
re-introducing racist policies like
<a href="https://en.wikipedia.org/wiki/Redlining">redlining</a>? (Forms of this <a href="https://fortune.com/2020/02/11/a-i-fairness-eye-on-a-i/">have happened</a>.)</p>
<p>How about that fun image generation (AI art!) project you have been tinkering
with? It uses a publicly available model that was trained on millions of images
from the internet, and as a result in some cases unintentionally outputs
explicit images, or even what some jurisdictions might consider bordering on
child pornography. (Forms of this <a href="https://www.wired.com/story/lensa-artificial-intelligence-csem/">have
happened</a>.)</p>
<p>Really anything you teach based on the data “from the internet” is asking for
racist, pornographic, or otherwise offensive results, as the <a href="https://www.cbsnews.com/news/microsoft-shuts-down-ai-chatbot-after-it-turned-into-racist-nazi/">Microsoft
Tay</a>
example should have taught us.</p>
<p>Can you imagine the human-rights nightmare that could ensue from shoddy ML
models being brought into a healthcare setting? Law-enforcement? Or even
military settings?</p>
<hr />
<p>Machine-learning encompasses a very powerful set of tools and patterns, but our
ability to predict how those models will be used, what they will output, or how
to prevent negative outcomes are <em>dangerously</em> insufficient for the use outside
of search and recommendation systems.</p>
<p>I understand how models are developed, how they are utilized, and what I
<em>think</em> they’re supposed to do.</p>
<p>Fundamentally the challenge with AI/ML is that we understand how to “make it
work”, but we don’t understand <em>why</em> it works.</p>
<p>Nonetheless we keep deploying “AI” anywhere there’s funding, consequences be
damned.</p>
<p>And that’s a problem.</p>
Meet Buoyant Data, and let me reduce your data platform costs2023-01-02T00:00:00+00:00https://brokenco.de//2023/01/02/introducing-buoyant-data<p>One of the many things I learned in 2022 is that I have a particular knack for
understanding, analyzing, and optimizing the costs of data platform
infrastructure. These skills were born out of both curiosity and necessity in
the current economic climate, and have led me to start a small consultancy on
the side: <a href="https://www.buoyantdata.com/">Buoyant Data</a>. Big data infrastructure
can be hugely valuable to lots of businesses, but unfortunately it’s also an
area of the cloud bills that is frequently misunderstood, that’s something that
I can help with!</p>
<p><a href="https://www.duckbillgroup.com/about/">Mike Julian</a> from <a href="https://www.duckbillgroup.com/">The Duckbill
Group</a> once made the proclamation that the way
to <em>actually</em> save money in AWS is to design your infrastructure to be
cost-effective. “Optimization” techniques can only take you so far, and once
you’ve burned through all the optimizations, you may find yourself needing to
further reduce the cost of your infrastructure and have no more “fat” to trim! In the <a href="https://www.buoyantdata.com/blog/2022-12-18-initial-commit.html">first blog post</a> I outline a “reference architecture” for a data platform which I <strong>know</strong> is cost-effective, easy to manage, and lends itself well to growth.</p>
<p>Planning for sensible, cost-conscious growth is <em>very</em> important. As most data
platforms start to prove their value, the organization will bring even
<em>more</em> workloads to them. <a href="https://en.wikipedia.org/wiki/If_You_Give_a_Mouse_a_Cookie">If you give a data scientist a good
platform</a>, they
will find themselves wanting ever more from that data platform, and Buoyant
Data can help make sure that growth is sustainable <strong>and</strong> the value to the
business is easy to identify as well.</p>
<p>Please add the Buoyant Data <a href="https://www.buoyantdata.com/rss.xml">RSS feed</a> to your reader, as I have a number of blog posts queued up already with some gratis tips and tricks for understanding the cost of your data platform! 😄</p>
<hr />
<p>The technology stack for Buoyant Data is something I cannot wait to write more
about. After funding the creation of
<a href="https://github.com/delta-io/delta-rs">delta-rs</a> as part of my day job, I am
utilizing the library in a <strong>big</strong> way to build extremely lightweight and
cost-efficient data ingestion pipelines with Rust and AWS Lambda. There’s still
plenty of space for <a href="https://spark.apache.org">Apache Spark</a> on the querying
and processing side, but as
<a href="https://github.com/apache/arrow-datafusion">DataFusion</a> matures, I’m looking
forward to exploring where that can fit into the picture.</p>
<p>There’s a lot of evolution happening right now in the data and ML platform
space, I’m really looking forward to growing <a href="https://buoyantdata.com">Buoyant
Data</a> in my spare time!</p>
The fastest way to make Rust Strings2022-10-28T00:00:00+00:00https://brokenco.de//2022/10/28/rust-strings<p>A friend of mine learning how to code with Python was complaining about the
myth that “there’s a Pythonic way” to do things. The “one true way” concept
wasn’t ever taken seriously in Python, not even by the standard library.
Practically speaking, it’s impossible <em>not</em> to have multiple ways to accomplish
the same outcome in a robust programming language’s standard library. This
<em>flexibility</em> jumped out at me while hacking on some Rust code lately: how many
ways can you turn <code class="language-plaintext highlighter-rouge">str</code>
into <code class="language-plaintext highlighter-rouge">String</code>?</p>
<p>In Rust <code class="language-plaintext highlighter-rouge">"this thing"</code> is a <a href="https://doc.rust-lang.org/std/primitive.str.html#">primitive <code class="language-plaintext highlighter-rouge">str</code>
type</a> and will have the
<code class="language-plaintext highlighter-rouge">&'static</code> lifetime. Without diving into lifetimes and how Rust ownership
works, this is basically read-only memory that exists for the duration of the
program. They’re <em>static</em> and you can’t do much with them. In <em>most</em> APIs you’ll
need the <a href="https://doc.rust-lang.org/std/string/struct.String.html"><code class="language-plaintext highlighter-rouge">String</code>
type</a>, which will give
you an allocated bit of data you can play around with.</p>
<p>Without much effort I came up with five different ways that I have written Rust
code to perform this conversion:</p>
<ol>
<li><code class="language-plaintext highlighter-rouge">String::from("The boring way")</code></li>
<li><code class="language-plaintext highlighter-rouge">"Using a trait".into()</code></li>
<li><code class="language-plaintext highlighter-rouge">"This is actually a trait too".to_string()</code></li>
<li><code class="language-plaintext highlighter-rouge">"Lol, this is also a trait".to_owned()</code></li>
<li><code class="language-plaintext highlighter-rouge">format!("Wake up and choose violence")</code></li>
</ol>
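<p>To be clear that these really are interchangeable, here is the whole list in one runnable snippet; every variant yields the same owned <code>String</code>:</p>

```rust
fn main() {
    // Five ways to turn a &'static str into an owned String.
    let a = String::from("Rust is cool!");
    let b: String = "Rust is cool!".into();
    let c = "Rust is cool!".to_string();
    let d = "Rust is cool!".to_owned();
    let e = format!("Rust is cool!");

    // All five produce equal values.
    assert!(a == b && b == c && c == d && d == e);
    println!("all equal: {a}");
}
```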
<hr />
<p>If you have some other nifty ways to create <code class="language-plaintext highlighter-rouge">String</code>s, let me know on
<a href="https://twitter.com">Twitter</a> or via email (<code class="language-plaintext highlighter-rouge">rtyler@</code> this domain)!</p>
<hr />
<p>But which is the most fastest?! I wrote the following very important, and very serious microbenchmarking code:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">microbench</span><span class="p">::{</span><span class="k">self</span><span class="p">,</span> <span class="n">Options</span><span class="p">};</span>
<span class="k">fn</span> <span class="nf">into_trait</span><span class="p">()</span> <span class="p">{</span>
<span class="k">let</span> <span class="mi">_</span><span class="n">s</span><span class="p">:</span> <span class="nb">String</span> <span class="o">=</span> <span class="s">"Rust is cool!"</span><span class="nf">.into</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="nf">to_string</span><span class="p">()</span> <span class="p">{</span>
<span class="k">let</span> <span class="mi">_</span><span class="n">s</span><span class="p">:</span> <span class="nb">String</span> <span class="o">=</span> <span class="s">"Rust is cool!"</span><span class="nf">.to_string</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="nf">format</span><span class="p">()</span> <span class="p">{</span>
<span class="k">let</span> <span class="mi">_</span><span class="n">s</span><span class="p">:</span> <span class="nb">String</span> <span class="o">=</span> <span class="nd">format!</span><span class="p">(</span><span class="s">"Rust is cool!"</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="nf">owned</span><span class="p">()</span> <span class="p">{</span>
<span class="k">let</span> <span class="mi">_</span><span class="n">s</span><span class="p">:</span> <span class="nb">String</span> <span class="o">=</span> <span class="s">"Rust is cool!"</span><span class="nf">.to_owned</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="nf">string_from</span><span class="p">()</span> <span class="p">{</span>
<span class="k">let</span> <span class="mi">_</span><span class="n">s</span><span class="p">:</span> <span class="nb">String</span> <span class="o">=</span> <span class="nn">String</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="s">"Rust is cool!"</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">options</span> <span class="o">=</span> <span class="nn">Options</span><span class="p">::</span><span class="nf">default</span><span class="p">();</span>
<span class="nn">microbench</span><span class="p">::</span><span class="nf">bench</span><span class="p">(</span><span class="o">&</span><span class="n">options</span><span class="p">,</span> <span class="s">"String::from!"</span><span class="p">,</span> <span class="p">||</span> <span class="nf">string_from</span><span class="p">());</span>
<span class="nn">microbench</span><span class="p">::</span><span class="nf">bench</span><span class="p">(</span><span class="o">&</span><span class="n">options</span><span class="p">,</span> <span class="s">"Into<String>"</span><span class="p">,</span> <span class="p">||</span> <span class="nf">into_trait</span><span class="p">());</span>
<span class="nn">microbench</span><span class="p">::</span><span class="nf">bench</span><span class="p">(</span><span class="o">&</span><span class="n">options</span><span class="p">,</span> <span class="s">"ToString<str>"</span><span class="p">,</span> <span class="p">||</span> <span class="nf">to_string</span><span class="p">());</span>
<span class="nn">microbench</span><span class="p">::</span><span class="nf">bench</span><span class="p">(</span><span class="o">&</span><span class="n">options</span><span class="p">,</span> <span class="s">"ToOwned<str>"</span><span class="p">,</span> <span class="p">||</span> <span class="nf">owned</span><span class="p">());</span>
<span class="nn">microbench</span><span class="p">::</span><span class="nf">bench</span><span class="p">(</span><span class="o">&</span><span class="n">options</span><span class="p">,</span> <span class="s">"format!"</span><span class="p">,</span> <span class="p">||</span> <span class="nf">format</span><span class="p">());</span>
<span class="p">}</span>
</code></pre></div></div>
<p>I compiled the program with <code class="language-plaintext highlighter-rouge">rustc</code> version 1.63.0 and after running some truly
rigorous and scientific tests on my workstation, I am thrilled to share the results:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>❯ cargo run
Compiling rust-strings-are-silly v0.1.0 (/home/tyler/source/github/rtyler/rust-strings-are-silly)
Finished dev [unoptimized + debuginfo] target(s) in 0.25s
Running `target/debug/rust-strings-are-silly`
String::from! (5.0s) ... 278.552 ns/iter (0.991 R²)
Into<String> (5.0s) ... 286.293 ns/iter (0.983 R²)
ToString<str> (5.0s) ... 292.736 ns/iter (0.987 R²)
ToOwned<str> (5.0s) ... 290.276 ns/iter (0.985 R²)
format! (5.0s) ... 300.144 ns/iter (0.995 R²)
</code></pre></div></div>
<p><strong>HOW INTERESTING!</strong></p>
<p>Well, not really.</p>
<p>Microbenchmarking like this has <strong>lots</strong> of flaws,
especially when sampling on a single machine running many other concurrent
processes. After executing the tool a few times, one common pattern that I did see was that
the <code class="language-plaintext highlighter-rouge">format!</code> macro is consistently the slowest way to create <code class="language-plaintext highlighter-rouge">String</code>s. In
fact <code class="language-plaintext highlighter-rouge">cargo clippy</code> will complain about using it this way, not because it’s
slow, but because it’s a “useless use of <code class="language-plaintext highlighter-rouge">format!</code>”, which I can agree with! :)</p>
<p>Choosing between the rest of them is probably nothing more than a style choice
for the developers working on any given Rust project. With these types of things
it’s typically best to adopt one consistent way of doing things <em>within the
codebase</em> to improve readability, but they’re all functionally equivalent.</p>
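<p>To make that equivalence concrete, here is a minimal sketch (separate from the benchmark program above) showing that each idiom yields an equal <code class="language-plaintext highlighter-rouge">String</code>:</p>

```rust
fn main() {
    let s: &str = "hello world";

    // Five idioms for building an owned String from a &str
    let a = String::from(s);
    let b: String = s.into(); // via Into<String>
    let c = s.to_string(); // via ToString
    let d: String = s.to_owned(); // via ToOwned
    let e = format!("{}", s); // works, but clippy flags it as a "useless use of format!"

    // They all produce the same value
    assert_eq!(a, b);
    assert_eq!(b, c);
    assert_eq!(c, d);
    assert_eq!(d, e);
}
```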
<p>In Rust there’s no “one true way” to create a <code class="language-plaintext highlighter-rouge">String</code>, but my personal
preference is <code class="language-plaintext highlighter-rouge">.into()</code> for no other reason than it is the fewest
characters to type!</p>
The Death Ride2022-08-09T00:00:00+00:00https://brokenco.de//2022/08/09/death-ride<p>Endurance athletes have a misconfiguration in their brain, one that compels them
to pursue increasingly foolish goals. For me, the <a href="https://deathride.com/">Death
Ride</a> was as foolish as it was ambitious. The
<a href="https://www.strava.com/segments/25280359">course</a> is 103mi, starting at ~5k
feet elevation, with a total of about 14k feet of elevation gain. It is not a
<em>race</em> per se, though I’m sure somebody is “first” back to the finish line.
What is celebrated are <em>completions</em>. If you can survive all six passes, you’re
a winner! The mountains are steep, the road largely exposed, and the heat is
oppressive, but hey! Good luck! Have a great ride!</p>
<p>I managed to <a href="https://www.strava.com/activities/7481018521">complete all six passes</a> in 7:58:50.</p>
<p>Enough time has passed for me to reflect on the event, almost a month now, and
both my brain and legs have forgotten enough that doing it again doesn’t seem
so ridiculous.</p>
<hr />
<p>Around 5am I rolled up in my car to the starting point outside of
Markleeville. A CHP officer was directing cars to park on the side of the road.
Cyclists were already passing by, having ridden from their nearby campgrounds.
Aside from <a href="https://aidslifecycle.org">ALC</a> I had never seen this many cyclists
in one spot. “If these old geezers can do this, so can I!” ran through my head
as I put my shoes on, topped up my tires, and ate the last of my food in the
car.</p>
<p>The Death Ride is very well supported, there are aid and water stations along
the way but with a new event I trend towards more self-sufficiency; better to
have too much food instead of too little.</p>
<p>As I pick up my number, the dawn’s light is starting to creep over the mountains.
The air is cool and the feeling is electric. I am <strong>excited</strong>! What an
adventure! Look at all these old geezers, I’ll be fine!</p>
<p>The first mile is a coasting downhill through the town of Markleeville. The
makeup of the course means that the <em>last</em> mile will then be an uphill slog to
the finish line. Something to worry about later!</p>
<h2 id="monitor-pass">Monitor Pass</h2>
<p>As I turn to start the ascent of Monitor Pass I find myself passing cyclists
and have to intentionally slow myself down. I know that my adrenaline is making
me all antsy in my pantsy. I don’t want to use up my legs on the first climb.
At this stage of the ride the mental effort expended is about <strong>discipline</strong>.
Don’t be stupid, pace.</p>
<p>The sun streaks over the mountains as I grind up to Monitor Pass and some of
the views are simply spectacular! Despite the wildfire that had recently burned
through the area, the landscape is still something to behold.</p>
<p>As I crest the climb I see the first aid station and remember: “oh right, I
have to go down the other side and <em>then</em> back up this bastard!” I pass by the
aid station, I’ll hit it on the way back, I will need it then.</p>
<p>Coming down the southeast side of Monitor Pass is genuinely <strong>awesome</strong>, the
view opens up in a <em>big</em> way and the massive valley is on full display in the
morning sun. There is precious little time to enjoy the view because I am
<em>accelerating</em> and the descent is fucking insane. 40+ mph rocketing down a
mountain with certain death should you be stupid or unlucky and go off the
side. I have to remind myself a couple times to relax my grip on the
handlebars. At one point I exceeded 49mph, which was <em>not</em> the fastest I would
go during the ride.</p>
<p>Approaching the Topaz Lake rest stop the descent slows through a rock walled
canyon, which gives me the opportunity to see the slog being endured by
cyclists heading <em>back up</em> to Monitor Pass.</p>
<p><img src="/images/post-images/deathride-2022/monitor-descent.png" alt="Descending towards Topaz" /></p>
<p>I don’t take much nutrition in at Topaz because I intend to stop at the rest
stop back up topside. I drop some gear in a drop bag and start my ascent.
Falling in with a couple of doctors I intentionally chat them up a bit. If I’m
talking, I won’t be tempted to pass people on the climb as much. Eventually
they fall back because my pace is too aggressive for them. Climbing solo my
pace picks up as I constantly find new people to chase. My legs feel good, it’s
not too hot, the view is gorgeous, what a wonderful ride!</p>
<p>Stopping topside at the Monitor Pass rest stop again I stuff myself full of
food. It’s basically all downhill from here until the lunch stop. My neighbor
gave me the advice to not fill up at lunch since that’s at the base of the
Ebbett’s Pass climb. As I finish chewing and drinking a pepsi (sugar water!) and prepare to leave the rest stop, somebody knocks over a rack of bikes. Oops!</p>
<p>The descent down from Monitor Pass to the fork was <strong>fucking fast</strong>. I chase a
couple people down the hill, hug my top tube, and enjoy the big straightaways
and gradual sweeping turns. My top speed for this segment is the fastest I will
go all day: 55.4mph. According to Strava, the <a href="https://www.strava.com/activities/7565854108#2989258323047473166">fastest person on this
segment</a>
topped out at 70.4mph which is absolutely insane.</p>
<p>At lunch somebody who was descending with me mentions that they saw me narrowly
miss a rock on the road and were anxious that I wasn’t going to see it in time.
Fortunately I did see the rock coming, which could have been disastrous, but at
high speeds it’s important not to make sudden corrections!</p>
<p>I nibble a bit and pack a sandwich in my back pocket from lunch for later. Time
for Ebbett’s Pass, the biggest bastard climb of them all.</p>
<h2 id="ebbetts-pass">Ebbett’s Pass</h2>
<p>The top of Ebbett’s Pass is at 8,703ft, with a variable gradient: around
6-7% at the outset, steepening to 10-15% towards the summit.</p>
<p>To be honest I don’t remember much of this part of the ride. It was simply a
slog, but if these geezers can do it, so can I! Honestly, much of the ride is
really just a mental test of how much you can grind it out. All said and done,
it was about an hour of sitting in and mashing pedals.</p>
<p>The rest stop is perched right at the top and a welcome reprieve. They were
serving instant ramen, sprite, pepsis, and all manner of snacks with salt and
sugar in them to replenish the tired muscles. As I sat in one of the graciously
provided camp chairs eating my ramen I overheard a couple other cyclists
talking about how many passes they were going to do. One geezer said “nope,
this was it, I’m just doing this one.”</p>
<p>I vaguely recalled registration where you selected the number of passes. I was
signing up for the Death Ride, so I said “six”. I’m going to do them all
damnit! The nuance of that registration form was lost on me. A <em>lot</em> of
cyclists do shortened versions of the ride, picking and choosing which passes
they’re going to do, enjoying their ride, and going home! A lot of these
geezers were going to do six passes, but not all of them. I had to re-orient my
motivational tactic slightly 😄</p>
<p><img src="/images/post-images/deathride-2022/ebbetts.jpg" alt="Ebbett's Pass" /></p>
<p>Either way, I had summitted Ebbett’s Pass, that was the “hard one” in my head.
Three of six passes completed. “I’m practically done!”</p>
<h2 id="pacific-grade">Pacific Grade</h2>
<p>Cycling is a constant lesson in humility. The distance between the Ebbett’s
Pass rest stop and the turnaround point was only 14 miles, but four of those
miles were painfully steep. After 50 miles of work already, the steep climbs up Pacific Grade were brutal; for the first time that day I started to see cyclists stopped, taking a breather.</p>
<p>One of the punchier sections of the climb is a brief stint at 32%.</p>
<p>My bottles were full as was my stomach so I passed some water stops and decided
to keep my momentum pressing onwards to the turnaround at 69 miles.</p>
<p>Upon arrival I found some shade where other cyclists were sitting on rocks
hiding from the sun. I took my spot and started eating my warm sandwich.
Despite those climbs there was a <em>lot</em> of downhill that was about to turn into
uphill on the return.</p>
<p>The sun was in full effect, it was only going to get hotter. I filled my
bottles, saddled up, and started to climb back up the backside of Pacific
Grade.</p>
<h2 id="long-road-home">Long road home</h2>
<p>Ebbett’s Pass is a mother fucker.</p>
<p>The rapid descent from Pacific Grade is followed by 5-6 miles of 8-10%
gradient, exposed in the full afternoon sun, with little wind, and nothing to
do but look at the road in front of your handlebars. Letting your eyes drift
any further ahead and you’ll be reminded of just how hopeless it all is.</p>
<p>I slowly crank by cyclist after cyclist hiding from the sun under the few trees
providing some shade near the narrow mountain road. The previous climbs had
conversation and sometimes even laughter. The climb back up to Ebbett’s Pass is
silent. Nobody is talking, nobody is following, nobody is happy, we’re all just
surviving. I have difficulty deciding whether it’s better to drink the hot water
in my bottles or douse myself with it.</p>
<p>Thinking about the geezers doesn’t help.</p>
<p>My legs feel fried, it’s hot as shit, the view doesn’t matter, what a miserable
ride.</p>
<p>Getting closer to the top I hear echoes of what I think are cowbell and
shouting, the rest stop must be just up ahead! I fooled myself more times than
I can remember with that mirage. By the time I finally arrived at the rest stop
I was almost surprised it actually existed this time.</p>
<p>Give me water, give me electrolytes, give me a couple of these sprites, I’ll
take some of that watermelon too. I need to sit in one of those alluring camp
chairs and reconsider the erroneous decisions which led me here.</p>
<p>As I sit and contemplate whether I’m hot enough for cartoon steam to shoot from
my ears, I see people finishing the <em>first</em> ascent of Ebbett’s. Those poor
souls, it’s just going to get hotter, and the climb back up from the turnaround is
already a bastard.</p>
<p>Once my core temperature lowers a bit, I pull myself up and back into the
saddle for the “easy” descent to the finish line. My plans change slightly, I’m
confident I will finish, I now want to get off this route as quickly as
possible.</p>
<p>The descent off Ebbett’s back towards the fork has some hairpin turns which
slow me down quite a bit. I’ve come too far to eat shit on some mountain road
just before the finish line. But as the road straightens out, I speed up,
hitting my top speed for this segment: 44.9mph. I also fall in with a couple
other guys and we start a paceline towards the finish. Teamwork always makes
for fun cycling and high speeds, both of which I’m glad to have at this point
in the afternoon.</p>
<p>Climbing into Markleeville I somehow fumble my water bottle when trying to
return it to its cage. While I’m fatigued, I’m not about to leave my water
bottle! We’ve come so far together! Of course, the problem with a cylindrical
bottle on a <em>hill</em> is that as I dismount it starts to roll away from me. Water
bottle no! Come back!</p>
<p>Clickety-clack go the bike cleats as I jog downhill 15 yards to capture the
bottle. I cannot help but laugh at how ridiculous the scene must have been as I
sprint back to try to catch my group.</p>
<p>The last three miles are uphill. Only a 5% grade, but fully exposed with a
headwind, and after 100mi of absolutely mind-warping riding. I don’t think I
have ever hated a stretch of road like I hated that one.</p>
<h2 id="completion">Completion</h2>
<p>The relief of crossing the finish line was delayed. My core temperature was
high, my heart rate was high, and I felt dehydrated. There was live music, beer,
ice cream, and food. That would all have to wait. I sat on a bench shirtless
for probably 30 minutes slowly taking in water and electrolytes before I
started to become functional again.</p>
<p><img src="/images/post-images/deathride-2022/finish.png" alt="Finished" /></p>
<p>At a rational level I understand that the Death Ride was a brutal slog which
was more of a mental challenge than a physical one. Did I enjoy it? I think so.</p>
<p>The brain of an endurance athlete seems to have a misconfiguration, one which makes
it difficult to distinguish between a challenge, punishment, and fun. The Death
Ride was all three, so who knows, maybe I will be back next year.</p>
Cycling through calories2022-08-08T00:00:00+00:00https://brokenco.de//2022/08/08/cycling-calories<p>I never really paid attention to the calories burned during cycling until
recently, and it’s still somewhat shocking when I look at it. With my love of
cycling rekindled by <a href="https://aidslifecycle.org">AIDS/LifeCycle</a> I have spent a
lot more time in the saddle this year. Between short criterium races, my
longest at 140mi, or the most elevation with the <a href="https://deathride.com/">Death
Ride</a>, I have needed to be very mindful of my nutrition
before, during, and after these rides. In short, cycling can burn a <strong>lot</strong> of calories.</p>
<p>The “nutrition facts” panel on commercially sold food typically accounts for a
2,000 calorie daily allocation. This is a rough approximation of what the
average American should eat. Reasonable I suppose, but let me share some of the
calorie <em>expenditures</em> estimated on my recent rides:</p>
<ul>
<li><a href="https://www.strava.com/activities/7599724946">Patterson Pass Road Race</a>, 43mi, 4,400ft elevation: <strong>2,400</strong> calories</li>
<li><a href="https://www.strava.com/activities/7599724946">Sonoma Parks tour</a>, 140mi, 6,700ft elevation: <strong>5,122</strong> calories</li>
<li><a href="https://www.strava.com/activities/7481018521">Death ride</a>, 103mi, 14,000ft elevation: <strong>7,557</strong> calories</li>
</ul>
<p>The numbers are insane! I expect that I need almost 3,000 calories a day just
to keep my weight and activity levels normal. That means for these more
significant rides my body requires 3-4x the average daily suggested intake.</p>
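<p>For a back-of-the-envelope check, here is a small sketch of that arithmetic, using the ride figures listed above against the 2,000 calorie label reference:</p>

```rust
fn main() {
    // Standard "nutrition facts" daily reference
    let daily_reference: f64 = 2_000.0;

    // Estimated expenditures from the rides listed above
    let rides = [
        ("Patterson Pass Road Race", 2_400.0),
        ("Sonoma Parks tour", 5_122.0),
        ("Death Ride", 7_557.0),
    ];

    for (name, burned) in rides {
        let ratio = burned / daily_reference;
        println!("{name}: {burned:.0} calories burned, {ratio:.1}x the daily reference");
    }
}
```

The Death Ride alone works out to roughly 3.8x the label reference, before even counting baseline daily needs.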
<p>“I wish I could eat like you!”</p>
<p>I will frequently get comments about my appetite. Eating 3-5k calories a day is
quite the challenge! Are you sure you’re up to it? 😄</p>
<p>Because I have no idea what a thousand calories look like, I have had to enlist
the help of a calorie tracker. In doing so I have learned a few things:</p>
<ul>
<li>Making each meal ~1k calories is <em>hard</em>, and especially challenging when eating vegetarian.</li>
<li>The day needs four meals, not three.</li>
<li>Feeling hungry during the day is a sign that I’m behind.</li>
<li>“Palate Fatigue” is a thing.</li>
</ul>
<p>Nutrition science is something I am learning more serious athletes
spend a <em>lot</em> of time thinking about and experimenting with. Logically it makes
sense: if your body is the engine, food is the fuel and something you should be
optimizing to improve performance. As a lay person it is still surprising to me
how rudimentary my own nutrition education was, remember the food pyramid?</p>
<p>There’s still a lot to learn and tune with my own nutrition as it relates to my
weight and performance. I wish I had useful tips to share, but the experience
is so individualized that I think you may be best suited exploring what works
best for you. Keeping track of calories, macronutrients, and expenditures is a
start, but there’s a <em>lot</em> worth exploring!</p>
Finishing: AIDS/LifeCycle Day Seven2022-06-14T00:00:00+00:00https://brokenco.de//2022/06/14/alc-day-seven<p>Waking up on the last day of big gay summer camp is always a downer. In the
warm and muggy air of Ventura, the love bubble starts to pop and you’re left
with one last bike ride before returning to the real world. This year was my
second AIDS/LifeCycle, and I was <em>not</em> excited to wake up for day seven. Once
the tent and gear were dropped off, my breakfast consumed, there was nothing
but a measly 70 miles remaining for ALC 2022.</p>
<hr />
<p>I also posted a <a href="https://twitter.com/agentdero/status/1535594559471685633">thread to
Twitter</a>
for today with more pictures</p>
<hr />
<p>Camp closes up <em>early</em> on day seven, so everybody is awake early. The alarm
rang at 4:15 and there was already a flurry of activity to hear outside. People
rustling in their tents, zippers zippering, flip-flops slapping against heels,
the deepened morning voices of tired cyclists and roadies. I followed my usual
protocol of going straight to the porta-potties before heading over to
breakfast, but since everybody was waking up, there was quite the queue for
number two. I decided I could wait, scurried back to my tent to get dressed,
tear down, swung by the gear trucks, and then found a block of line-less
porta-potties en route to the food tent.</p>
<p>In the food line I did not grab “The Daily Spin”, the little camp newspaper
that’s printed every day, like I normally do, and therefore missed a key
instruction: gear trucks would not arrive at the finish line until 1pm.</p>
<p>Methodically chewing each bite of my breakfast, I planned my day: my knee was
doing okay, but this is the last day and the last chance to go fast with some
of these other riders. I figured that either way I was going to sit around at
lunch to wait for the finish line to open at 11am, so why not try to get to
lunch as fast as I can!</p>
<p>I have some short-circuit in my wiring that prevents me from “calming the fuck
down” as Ride Director Tracy puts it. Riding fast with a group of other
lunatics really is quite a lot of fun, and getting away from the main pack of
cyclists has allowed me to enjoy the scenery much more than I had in 2019.
Either way, this is the last chance to pedal hard with these folks until 2023,
so I’m going to make every mile count.</p>
<p>Bike parking opens early and I roll out with the first 40-50 riders. We cruise
along the boardwalk and into the city of Ventura for a little bit before
meandering through some fields and suburban sprawl. I do a lot of the usual “on
your left” routine before I get separated from some folks due to my speed and
some red lights. As we ride by some naval base a bunch of fast riders come
up, including the Triathlete, and I catch their wheel.</p>
<p>Bike friends!</p>
<p>The group is probably 9 people large and it includes some of the fast riders
I’ve been chasing all week, plus a couple of new faces. We all cruise along
together towards Rest Stop One, each keeping the pace and trading off pulls.
After a while of keeping up at the number 3 or 4 position, I figure it’s my
turn to pull for a bit, pop out to the left and throw down some power. My back
wheel pops up a little bit as I do so, a bad habit I’m trying to break myself
of, since a wheel in the air is not transferring power to the road.</p>
<p>The way I have found myself passing people has been to basically do a
mini-sprint, something I’ve found useful in criteriums. The downside of this
approach is that if the group is chugging along at 22mph or so, and I’m all of
a sudden pushing 26mph, I’m going to push <em>too far</em> out in front. I
accidentally turned “my turn to pull” into a breakaway. <em>Oops</em>.</p>
<p>The fun thing about this group of cyclists is that somebody <em>follows</em> my
breakaway, and that just makes the whole effort feel very much like a normal
crit or road race. I can feel the lactic acid building in my quads, thighs, and
glutes. 545 miles of cycling has given me a lot of time to focus on getting
every watt of power out of my legs, and leading out this group I’m acutely
aware of each muscle involved. After a mile or so we all bunch back up and
rocket onwards to Rest Stop One.</p>
<p>The Triathlete comments at the rest stop that he really enjoys following
behind me. I’m able to push a strong pace, and I’m tall, so at his shorter
stature he can tuck in behind me for a free ride. Somebody else comments how
fun that bit of teamwork was, and that we’re all <em>maybe</em> a little competitive.</p>
<p>Once my routine is done, I leave the rest stop alone and push through the wind
between the Santa Monica mountains and the Pacific.</p>
<p>At some point a cyclist I will come to know as Nils passes me, and as is my
customary response, I sprint to catch his wheel and start to work together with
him to keep a strong pace towards Rest Stop Two.</p>
<p>Nils is Dutch, is about as tall as I am, has been cycling seriously since
sometime last year, and is <strong>fast</strong>. He is inexperienced though, and I learn as
we cruise along working together that he hasn’t really had much of this
teamwork experience on ALC thus far. We trade off and on into Rest Stop Two,
and then depart together to continue flying towards lunch.</p>
<p>Between Rest Stop Two and Lunch is Malibu. I hate Malibu. The Pacific Coast
Highway is flanked on the east side by mountains, and on the west side by
expensive homes and cars parked ever-so-slightly off the road. Everybody in
Malibu drives like they’re the only ones on the road, and cyclists can get
squeezed between aggressive drivers, and the door-zone from parked cars. The
city is basically 20+ miles of coastline, and it <em>sucks</em>.</p>
<p>Fortunately the Flying Dutchman and I are making insane time. We spot a number
of large cycling groups riding together on the PCH, which is genuinely cool to
see. It seems like every cyclist north of LA has come to engage in battle with
motorists for who should really get to own this stretch of beautiful highway.</p>
<p>At a stoplight some local cyclist with some aero kit, a fast looking carbon
bike, and stacked legs pulls up next to us. When the light turns green, Nils
takes
off, followed by me, followed by the local. No more than a quarter mile down
the road, the local flies by Nils and I.</p>
<p>Rabbit!</p>
<p>We have probably ridden 45 strong miles at this point, but I’ll be damned if
I’m not going to give chase. I pop out of the saddle and put in the best
sprint I can muster to chase him down. I get within a few bike lengths but
cannot get into his draft. Nils later told me that I had left him in the dust
on that sprint too!</p>
<p>Disheartened I settle into cranking at my 21-22mph pace, which is meager
compared to the local. Nils comes flying by me and says “why don’t I give it a
shot!” So of course now I have to keep up with Nils in his sprint. His effort
falls short as well, but we fall into a tight rotation and chase this local,
less than couple hundred yards away, for the remainder of the PCH until we pull
off for lunch.</p>
<p>I haven’t been smoked like that all week. Good lord was that dude fast.</p>
<p>Reviewing my app over lunch, I had put down 55 miles at a 20mph average speed.
That’s not a straight 55 either, there were a lot of little rollers, headwinds,
and stoplights in between mile 0 and lunch.</p>
<p>We talk a lot about racing, triathlons, and what motivated us to get into
cycling while killing time at lunch. From here there are about 15 miles to the
finish line, and we roll out at about 10:15.</p>
<p>The pace is slowed due to traffic, more climbing, and the general mayhem that
comes with riding through Beverly Hills and West Hollywood. At one point a car
almost turned right into me, leading me to loudly share some profanities.</p>
<p>The last couple miles of ALC are some of the more dangerous ones in my opinion,
a very hectic urban environment with tired cyclists and weekend drivers.</p>
<p>I crossed the finish line at almost exactly 11:00am and ALC is over.</p>
<hr />
<p>As luck would have it, I forgot to pre-arrange shipping for my bike. I just
kind of forgot that I had to register ahead of time for it to be put on a truck
and driven back to San Francisco. Instead I had to pay a bunch of money so my
bike could be packed and I could safely take it home on the plane with me.</p>
<p>I also didn’t realize that gear wouldn’t be there until 1pm, so I had to sit
around in the shade chatting and napping until gear trucks arrived.</p>
<p>Once I had everything collected, my gear, my giant bike box, my sweaty ass,
trying to get a giant car to carry all of my stuff to a hotel proved to be
equal parts annoying and time-consuming. I ended up leaving Fairfax High School
at about 3pm, and didn’t find a shower until after 4pm.</p>
<p>The beauty of ALC as a cyclist is that you kind of just have to wake up and ride
your bike. Life on the ride is simple: eat, pedal, eat, sleep, repeat. Once ALC
is over however, you are quickly reminded at how much <em>other shit</em> there is to
do other than cycling.</p>
<p>From a cycling perspective, day seven might have been the most “put together”
of the days on ALC. Great teamwork, good legs, and high speeds. I felt
challenged and like I left nothing “out on the road” when I was done. The
change in skill and perspective from 2019 to 2022 was significant; I can only
hope that I continue to improve and make 2023 that much better!</p>