In part one, we talked about release anxiety and the twin pressures that create it: the pressure to improve and the pressure to protect. By the end of the post, you likely had a solution in mind, or at least you thought you did: continuous deployment and/or delivery.
Continuous deployment is when engineering teams deploy code changes frequently via automated tools; continuous delivery, something a little different, is when engineering teams build software so that it’s always in a releasable state.
Continuity is the dream: if deadlines cause anxiety, then why not get rid of deadlines? By releasing software continuously, you both ensure each code change can make it in and ensure each code change will have the time to get tested and inspected.
And yet, there are limitations.
Continuous deployment is not a new idea. In a 2009 essay, one of the first treatments of the concept, Timothy Fitz introduced the idea with an all-too-recognizable narrative:
Alex has just written a refactoring of some website backend code. Since it was a small task, it’s committed and Alex moves on to the next feature.
When the code is deployed in production two weeks later it causes the entire site to go down. A one-character typo which was missed by automated tests caused a failure cascade reminiscent of the bad-old-days at twitter. It takes eight hours of downtime to isolate the problem, produce a one character fix, deploy it and bring production back up.
Alex curses luck, blames human infallibility, inevitable cost of software engineering and moves on to the next task.
This story is the day-to-day of most startups I know. It sucks. Alex has a problem and she doesn’t even know it. Her development practices are unsustainable. “Stupid mistakes” like the one she made happen more frequently as the product grows more complex and as the team gets larger. Alex needs to switch to a scalable solution. (Emphasis added).
The purpose of continuous deployment, then, in Fitz’s definition and the current definition, is to make deployment scalable.
Fitz was perhaps more right than he realized. Traditional deployment practices have only gotten less practical and less scalable.
In our research (see the Big Code report), 51% of respondents reported having more than 100 times the volume of code they had 10 years ago, and 18% said they have 500 times more code.
As Fitz pointed out 12 years ago, so-called “stupid mistakes […] happen more frequently as the product grows more complex.” But they’re not stupid mistakes, we need to point out; if something is inevitable and consistent, it’s not an individual mistake–it’s a flaw in the system.
That’s why Fitz proposed continuous deployment as a new system. If you’re deploying continuously, he writes, then you can fail fast. He writes that “the closer a failure is to the point where it was introduced, the more data you have to correct for that failure.”
If Alex had deployed as soon as she developed the change, the failure would have been less severe, the rollback faster, and the diagnosis easier to make. And in a puff of smoke, theoretically, goes release anxiety.
The problem with continuous deployment is that it requires a SaaS environment. If you deploy your product as a cloud-based service, then you can push instantly available changes all day long.
But that’s not the reality for most companies. Despite the seeming dominance of SaaS, on-prem and self-hosted environments are not only still alive but thriving.
According to a Replicated and Dimensional Research study, 92% of companies indicate on-premises software sales are growing. That’s right, not declining or stabilizing–growing. In fact, their research shows customer demand for on-premises software actually equals that for public cloud. Not only is on-premises growing, but 91% of customers in the same study report being satisfied with their on-premises software.
This narrative violation isn’t the result of split camps, either. It’s not as though we have one camp using the public cloud and one camp sticking with on-prem. (Sourcegraph alone disproves this: our products are split between self-hosted and cloud environments.)
The same Replicated and Dimensional Research report shows that over 50% of respondents’ revenue came from on-premises software deployments and that 54% reported that half or more of their software sales were deployed on-premises.
In other words, take Twitter discourse with a grain of salt. SaaS might be a hotter topic, as the Google Trends screenshot below suggests, but on-premises is very much still alive.
And if on-premises is alive, then fully continuous deployment cannot be a universal solution.
When you control the platform and the environment, as we do with Search (or as all SaaS products do), then you can ship to production multiple times a day. As much as your team might be prepared to ship that often to your self-hosted product, however, the strategy just isn’t practical.
Upgrading enterprise software isn’t like upgrading your laptop’s OS–it’s much more tedious and much more involved. Apple works in a closed ecosystem, whereas companies like Microsoft and Sourcegraph work in an open ecosystem. In an open ecosystem, people deploy their software in specific ways with specific configurations. We, unlike Apple, don’t control the environment our software runs in.
In an on-premises or self-hosted environment, continuous deployment would mean asking customers to pull and test a commit from an arbitrary point in time. The result would be different customers running many different variations of your product, making it hard to replicate versions in which customers might be encountering a bug (and hard to warn other customers about a particular version).
Your customers have, at times, an uncomfortable amount of agency. That release you’re particularly excited for? The one you rushed those changes into? Your customers might not actually upgrade to it for months. If you rushed in a bug along with a fix, that also means that bug might remain in their codebase long past your next release.
You’re on their schedule. Released does not equal fixed.
That doubly applies when you consider how many different customers have customized their deployments and environments. While you can make it somewhat easier by building those variations into your test matrix, as a general rule, the more variations and customizations a customer has, the harder the upgrade process will be.
Keep in mind that release anxiety doesn’t only apply to you–it applies to your customers. If your releases cause anxiety among your team–the people who know the product best–then the release and upgrade process probably causes anxiety amongst your customers, too. They might have to rely on your customer support team and otherwise take a lot of time to upgrade.
If it’s too costly, both in terms of anxiety and effort, they might think twice about upgrading to the next release. One solution to this is a managed instance plan.
So, does that mean we should let the limitations of an on-premises environment control how we release software? No. Here are three reasons, though we’re sure there are more:
There has to be a better way. And there is! At Sourcegraph, we’ve been honing what we call a continuous delivery mindset. We’re not the first to do this, but results from our experience indicate it’s a strategy worthy of more iteration and uptake.
Even if you can’t deploy continuously, you can still embrace the idea that code should always be in a secure and deployable state. Automated tests run in production-like environments can help assure you that a given code change and its environment will operate like they’re supposed to.
Main should always be releasable, but that doesn’t mean every “releasable” PR is polished. Break feature work into multiple PRs that your team can then review and merge over days or weeks.
The goal is to deliver continuous pieces of improvement rather than big chunks of functionality. Regularly ask yourself: “Can this work be broken down into smaller pieces?” The smaller the iteration, the better.
At Sourcegraph, one way we make this easier is by using feature flags. Feature flags determine whether a piece of functionality is active or not. You can put work behind a feature flag that hides it until it’s ready for release. That way, it goes through your continuous integration pipeline that runs all your tests. You can continuously ship value without risking unnecessary exposure.
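To make the idea concrete, here is a minimal sketch of a feature flag check. It is not how Sourcegraph implements flags; the `FeatureFlags` class, the `render_search_page` function, and the `new_insights_panel` flag name are all hypothetical, and a real system would likely load flags from configuration or a flag service rather than an in-memory set.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlags:
    """Hypothetical in-memory flag store, for illustration only."""

    enabled: set = field(default_factory=set)

    def is_enabled(self, name: str) -> bool:
        return name in self.enabled


def render_search_page(flags: FeatureFlags) -> list:
    """Return the page sections to show.

    The unfinished panel is merged and tested with everything else,
    but stays hidden until its flag is switched on.
    """
    sections = ["search_box", "results"]
    if flags.is_enabled("new_insights_panel"):  # hypothetical flag name
        sections.append("insights_panel")
    return sections


if __name__ == "__main__":
    print(render_search_page(FeatureFlags()))
    print(render_search_page(FeatureFlags({"new_insights_panel"})))
```

With the flag off, the page renders only `search_box` and `results`; with it on, `insights_panel` is appended. Both code paths live on main and run through the same tests, which is the point.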
The name of the game is compactness: you want a short feedback loop and a small scope of work. If something goes wrong after you ship a small change, then it’s easier to diagnose and fix. If you only ship big chunks of code, which could contain hundreds of files, then repair will be harder.
We’ve had the privilege of being able to model this on Sourcegraph Cloud. We release continuously to Cloud, meaning the engineers working on the self-hosted version can see and use Cloud’s tight feedback loop. They can get the benefits of continuous delivery before deploying those changes to the self-hosted version.
Not every company will be able to replicate this system–cloud services are costly after all, and your business model might not facilitate a cloud version. It’s worth considering, however, because engineers expect and benefit from a fast feedback loop.
If you can’t, you can at least build a dogfood environment; we have one that’s a staging environment for Sourcegraph Cloud. We deploy to it constantly, and give engineers permission to let it be unstable and offer no guarantees for uptime. Our engineers basically have free rein over this environment and can test anything they want.
A lot of release anxiety comes from the fact that release day is an event. Slack notifications are firing and PR reviews are flying–all with the tone of a harried parent juggling the elements of a Thanksgiving dinner before their in-laws arrive.
The goal is to forget about the release date. Continuously deliver value such that release day is (more or less) like any other day. That way, even if something small slips, you always have something to show your customers. This strategy is especially effective if you prioritize the features you most want to get into the next release early in the release cycle.
Your release schedule is your primary way to reinforce this principle.
Recognize that you’re not chained to a monthly release cadence. Continuous deployment might be impossible, but weekly releases might be within your sights. The more frequent your releases, the less eventful they’ll be.
That said, weekly releases might still be too burdensome for your customers. The solution? Don’t tell them. Instead, cut and announce a public release every month and, two weeks later, cut an informal release.
Some teams even go so far as to randomize the cut date. While the release date is regular for the sake of their customers, the cut date is announced to the team that week and the actual weekday is unknown until then. This creates a forcing function that actively blocks people from treating release day as an event. What, after all, is an event without a date?
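A randomized cut date can be as simple as the sketch below. We are assuming here that the cut lands on a weekday (Monday through Friday) of a known release week; the `pick_cut_date` function and its parameters are hypothetical names, not a tool any particular team uses.

```python
import datetime
import random


def pick_cut_date(week_of: datetime.date, rng: random.Random) -> datetime.date:
    """Pick a random weekday (Mon-Fri) in the same week as `week_of`."""
    # date.weekday() returns 0 for Monday, so this rewinds to that week's Monday.
    monday = week_of - datetime.timedelta(days=week_of.weekday())
    # randrange(5) yields 0..4, i.e. Monday..Friday.
    return monday + datetime.timedelta(days=rng.randrange(5))


if __name__ == "__main__":
    # Seed from the system so nobody can predict the day in advance.
    cut = pick_cut_date(datetime.date(2021, 11, 17), random.Random())
    print("Release branch will be cut on", cut.strftime("%A, %B %d"))
```

Passing the random generator in as a parameter keeps the function easy to test with a fixed seed while staying unpredictable in real use.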
A faster release schedule will also help you maintain your release infrastructure. If you’re only turning the infrastructure code and pipelines on once a quarter, or even once a month, you’re setting yourself up to run into issues. These problems are likely to compound the longer the gaps are between using the infrastructure to deliver a release.
One mental model through which you can look at this is one that both GitLab and Ubuntu embrace: shift objectives, not due dates. GitLab co-founder Dmitriy Zaporozhets sums it up like this: “Release what you have and do it on a schedule.” Instead of cramming in changes or delaying releases so you can fit in those changes, delay changes to the next release.
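In code, "shift objectives, not due dates" amounts to treating the cut date as fixed and the contents as variable. The sketch below is one way to picture it; the `Change` type, the `cut_release` function, and the sample titles are hypothetical.

```python
import datetime
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Change:
    title: str
    merged_on: Optional[datetime.date]  # None means not merged yet


def cut_release(
    changes: List[Change], cut_date: datetime.date
) -> Tuple[List[Change], List[Change]]:
    """Split changes into what ships now and what waits for the next release.

    The cut date never moves: anything not merged by then is deferred.
    """
    shipped = [
        c for c in changes if c.merged_on is not None and c.merged_on <= cut_date
    ]
    deferred = [c for c in changes if c not in shipped]
    return shipped, deferred


if __name__ == "__main__":
    changes = [
        Change("Fix login redirect", datetime.date(2021, 11, 10)),
        Change("New onboarding flow", None),  # still in review
        Change("Faster indexing", datetime.date(2021, 11, 22)),  # merged after the cut
    ]
    shipped, deferred = cut_release(changes, datetime.date(2021, 11, 19))
    print("Ships now:", [c.title for c in shipped])
    print("Next release:", [c.title for c in deferred])
```

Nothing in this function can move the date, and that is deliberate: the only question it answers is "Is X merged?"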
Releases should be a true snapshot of your current work. Shift from “Will X be merged?” to “Is X merged?” As Jez Humble, SRE at Google, tweeted, continuous delivery is not about “taking whatever crap you have in version control & shipping it into prod as fast as possible so you can test in prod” but is about “making it SAFE to ship your code into prod quickly.”
“It’s come to my attention that many people think continuous delivery/deployment is about taking whatever crap you have in version control & shipping it into prod as fast as possible so you can test in prod” (Jez Humble, @jezhumble, October 13, 2021)
CD is about making it SAFE to ship your code into prod quickly by:
Your ultimate metric of success won’t be a poll of emotions among your teammates but will be how often your customers upgrade. An uneventful release for you is one that’s uneventful for your customers, too. If your customers are regularly upgrading and consistently keeping up with new releases, then you can trust you’ve reached a point where your releases are uneventful.
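One rough way to track this is the share of customers running a recent release. The sketch below assumes you know which version each customer runs and have an ordered list of releases; the function name, the `max_behind` parameter, and the version numbers are hypothetical.

```python
from typing import Dict, List


def share_up_to_date(
    customer_versions: Dict[str, str],
    releases: List[str],
    max_behind: int = 1,
) -> float:
    """Fraction of customers on one of the newest `max_behind + 1` releases.

    `releases` must be ordered oldest to newest.
    """
    if not customer_versions:
        return 0.0
    recent = set(releases[-(max_behind + 1):])
    current = sum(1 for version in customer_versions.values() if version in recent)
    return current / len(customer_versions)


if __name__ == "__main__":
    releases = ["3.30", "3.31", "3.32", "3.33"]  # hypothetical version numbers
    customers = {"acme": "3.33", "globex": "3.32", "initech": "3.30", "umbrella": "3.33"}
    print(share_up_to_date(customers, releases))                # 0.75
    print(share_up_to_date(customers, releases, max_behind=0))  # 0.5
```

A number like this trending upward over several releases is a sign that upgrades are getting less painful for customers.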
A key aspect of continuous deployment and delivery that’s just as available to on-prem and self-hosted environments as SaaS is automation and testing. Even if you release once a month, manual work should still be seen as a nuisance to reduce or eliminate. Without automation, you’re reliant on the heroics of individual engineers, which isn’t scalable.
Introducing automation is an investment with a high upfront cost and an even higher total return. At Sourcegraph, in 2019, it took a person an entire week or two to handle a release; thanks to what was then called the Distribution team (and has since split into Delivery and Developer Experience), automation takes care of it in just a few hours.
Automation, however, has its limitations. As Fitz writes in his essay on continuous deployment, “no automated tests are as brutal, random, malicious, ignorant or aggressive as the sum of all your users will be.”
With that in mind, don’t push your software to all of your users in one fell swoop. Instead, use blue-green deployment, a progressive rollout, or canary instances.
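As an illustration of a progressive rollout, the sketch below assigns each user to a stable bucket and exposes a feature to a growing percentage of buckets. This is one common approach, not a description of any particular product's rollout system; `in_rollout` and its parameters are hypothetical names.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Return True if this user falls inside the first `percent` of 100 buckets.

    Hashing the feature name together with the user ID keeps each user's
    bucket stable across requests, so raising `percent` only ever adds users.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    for percent in (1, 10, 50, 100):
        exposed = sum(in_rollout(u, "new-search-ui", percent) for u in users)
        print(f"{percent:>3}% rollout reaches {exposed} of {len(users)} users")
```

Because the bucket is deterministic, a user who sees the change at 10% still sees it at 50%, and rolling back is just a matter of lowering the percentage.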
Automation gives you the confidence you need to build a continuous delivery mindset and in so doing, alleviate release anxiety.
Continuous deployment and delivery are hot topics, and Twitter feeds and Hacker News comment sections light up with discussions of the most cutting-edge startups deploying every day, every minute, every second. But the reality for most companies, and most engineers, is more complex.
In most cases, real-world engineering practices will always lag behind their ideal state. Back on Earth, we still have to deal with on-prem environments, upgrade cycles, and release anxiety. Reality can be harsh, and even disappointing, but that doesn’t mean those on the frontier have nothing to teach us. Even if you can’t pursue true continuous deployment and delivery, you can still apply ideas those working on the frontier helped forge.
Even if continuous deployment and delivery aren’t within your grasp, a continuous delivery mindset is.