Personal Runbooks

Dec 8, 2022

Staying productive as a software engineer is a challenge made easier with effective tools. Day to day, we’re engaging with a daunting myriad of contexts, ideas, problems, and solutions. Urgency and criticality vary - is this a drop-everything issue, or is it a candidate for pause and analysis before creating a solution? Then a coworker asks for help on something you haven’t thought about for 3 months (which in my experience feels like an eternity as a software engineer). Which tool will ensure our productivity stays high while we encounter so many new situations?

Iteration zero of this tool is one of the most basic productivity tools: the humble notebook. Whether physical or digital, a simple notebook hardened through years of attention becomes a personal knowledgebase, allowing you to apply your software engineering mindset effectively and efficiently. A notebook starts as an empty page. The void on the paper begs the question: what should we put in this notebook?

The practice I’ve used over the years and tried to hone is what I am now calling a Personal Runbook. For those unaware of this concept from the sysadmin and SRE (Site Reliability Engineering) domains, a Runbook is a set of actions outlined in a workflow as a response to some incident or trigger. Folks in operations maintain these runbooks to improve their efficiency when responding to time-sensitive issues such as service outages or other catastrophes. An example trigger in the operations domain could be a threshold being reached on an observed metric (e.g. the rate of a given HTTP response status). When the trigger is presented to a human through alerting, they choose a runbook that suits the scenario based on the nature of the trigger.

Identifying triggers

How can we apply the concept of an operations runbook to our daily lives as software engineers? There are many repeatable actions we take throughout the day in response to some trigger or input we receive. We first need to identify common triggers and take the necessary steps to document them. As an example, here are some triggers that are common enough that I have created personal runbooks for them:

Submit a pull request
Create an API endpoint that accepts a request body and validates properties on the JSON
Prepare a demo of a feature

Considering these triggers, the quicker answer is to simply copy what I did last time. Submitting a PR? That’s simple, and I’ll even use a PR description template and fill it out. The difference between the runbook approach and the “what I did last time” approach is that the runbook exercises your skills, which allows you to gain experience, the thing that every engineer should be seeking as often as possible. It also guides you with the collective experience you’ve gained previously, something we hope happens innately but that is not always the case.

It may be better to think of triggers in the case of a product-focused software engineer as outcomes. I have a desired outcome in mind, and I outline the workflow that makes the outcome a reality.

Writing a runbook

Now that we have an idea of the triggers that would prompt us to reach for a runbook, let’s get into writing the runbook itself. The runbook is simply a document, and from a technical perspective, Markdown is the tried-and-true approach, though I won’t stop you from using DocBook. Where you store the runbooks is up to you, but a private repository on GitHub with folders of Markdown files is a simple approach that will be second nature to manage (if you already use git). However, you may want to consider other options if your runbooks will contain information confidential to your employer.

Note on confidential information

Confidential information such as project names or private IP may be alright to store in personal runbooks used for employment, if you keep it securely on your machine. You should never store any secret information such as API keys or passwords. Using the right tooling and scripts, you could securely integrate secrets into a personal runbook, but you should make sure you’re aware of what you’re doing.

For the runbook contents, the most effective place to start is a template. Searching for “operations runbook template” will give you some starting points, and surprisingly, many of the operations templates can be used in the personal context with some small modifications.

Here’s a template I use for new personal runbooks:

# runbook-title

## Outcome

> Define an outcome. In 1-2 sentences, what should be the result
> of exercising this runbook?

## Prereqs

> Outline anything that must be done before executing this runbook.

## Stakeholders

> Note people or roles that are commonly tied to the process in this runbook.

## Manifest

> The manifest is the set of workflow steps that produce the desired outcome.

- [ ] Concrete step 1
- [ ] Concrete step 2
- [ ] Concrete step 3

## Completion

> Describe the output of the runbook. When using this runbook as a template,
> this is where you can take notes of anything that didn't go according to plan.
> Perhaps a step was missed in the manifest that should be there, or a new output
> from a command or action is possible that wasn't previously.

Based on this template, we can create a personal runbook we can reach for when submitting a pull request:

The contents of this personal runbook are a redacted version of what I actually use. While aspects of it were carried over from previous jobs and roles, I’ve changed it over time to include the parts that matter to me right now. For example, as a new golang engineer, I was owned enough times by the for _, p := range sliceOfPointers { ... } issue that I added it to the manifest as its own step.
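The post doesn’t spell out which range-loop issue is meant. One common reading, assumed here, is loop-variable aliasing: in Go versions before 1.22 the range variable was reused across iterations, so taking its address inside the loop produced the same pointer every time. The sketch below shows a version-independent way to sidestep that by indexing into the slice instead. `Point` and `collectPointers` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// Point is a hypothetical type used only for this illustration.
type Point struct{ X int }

// collectPointers returns a pointer to each element of pts.
// It takes the address of the slice element itself rather than the
// address of a range variable, so the result does not depend on how
// the Go version in use scopes loop variables.
func collectPointers(pts []Point) []*Point {
	out := make([]*Point, 0, len(pts))
	for i := range pts {
		out = append(out, &pts[i]) // address of the element, not of a loop copy
	}
	return out
}

func main() {
	pts := []Point{{X: 1}, {X: 2}, {X: 3}}
	for _, p := range collectPointers(pts) {
		fmt.Println(p.X) // prints 1, 2, 3 on separate lines
	}
}
```

A manifest step for this could be as short as “check every range loop that stores a pointer or starts a goroutine”, which is the kind of hard-won detail a runbook is meant to preserve.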

Runbook executions

Any runbook can be used in multiple ways. It could first be a reference guide for a process. If you simply need to refer to the process and be able to execute it later, but are not concerned with the output, then how you store the runbooks depends on how you tag or categorize your work. Save the files into your notebook and you’re ready to go.

Operations teams also use runbooks as templates themselves. First the runbook is defined ahead of time. Then, when it’s time to execute the runbook, it’s copied and used as a log entry for that execution. This is immensely useful in the area of operational improvement since your historical log of successes and failures is reflected in the notes you took as you executed the runbook.

The chart below shows how this can be organized.

Diagram showing a hierarchy between a runbook template as a parent of individual runbooks, with children of a runbook being the executions of that runbook.

Runbooks as templates with executions as artifacts

With the goal of storing the executions, we now have a link between the runbook itself and the executions of that runbook. We want our runbooks to improve through the experience we gain running them, so it’s important to include an additional step of reflection in the workflow to modify the runbook when improvements are available.
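The copy-then-annotate workflow described above could be automated with a few lines of Go. This is a minimal sketch, not a prescribed tool: the `executions/<runbook>/<date>.md` layout, the `runbooks/prepare-demo.md` path, and the function names `executionPath` and `startExecution` are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// executionPath builds the destination file for one execution of a runbook.
// The executions/<runbook>/<date>.md layout is an assumption of this sketch.
func executionPath(runbookFile, date string) string {
	name := strings.TrimSuffix(filepath.Base(runbookFile), filepath.Ext(runbookFile))
	return filepath.Join("executions", name, date+".md")
}

// startExecution copies the runbook into a new execution file that can be
// annotated while the manifest steps are carried out.
func startExecution(runbookFile string, started time.Time) (string, error) {
	contents, err := os.ReadFile(runbookFile)
	if err != nil {
		return "", err
	}
	dest := executionPath(runbookFile, started.Format("2006-01-02"))
	if err := os.MkdirAll(filepath.Dir(dest), 0o755); err != nil {
		return "", err
	}
	if err := os.WriteFile(dest, contents, 0o644); err != nil {
		return "", err
	}
	return dest, nil
}

func main() {
	// Hypothetical runbook path; replace with a file from your own notebook.
	dest, err := startExecution("runbooks/prepare-demo.md", time.Now())
	if err != nil {
		fmt.Println("could not start execution:", err)
		return
	}
	fmt.Println("execution log created at", dest)
}
```

Keying the file name on the date keeps executions sorted chronologically next to their runbook, at the cost that a second execution on the same day would overwrite the first. Adding a time or a short label to the name would avoid that.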

It may not make sense for every type of runbook to have persistence of executions. With the pull request runbook, for example, I do not record every PR I make using the runbook as a template. That would not only be exhausting but a poor use of time. For runbooks that are more targeted at making things happen, whether that’s building a certain kind of feature or writing a new chapter of a book, recording the execution can lead to continuous and compounding productivity gains.

Long-lived executions

Just like the processes that start and stop on your OS, executions of personal runbooks are generally either short-lived or long-lived. If you’re running one to assess the state of a customer’s data, it’s likely going to have a short-lived execution cycle. Running an execution for development of a new feature could take some time, and you may want to use the runbook execution as a sort of diary as well, including possible solutions you tried and abandoned.

Organization

Maintaining your personal notebook is its own topic entirely, but the personal runbooks should fit into your notebook’s organization model. Again, the most effective tools will have the greatest effect on your productivity. We want to ensure that we have the resources we need, when we need them.

Distinguish between what resides in a personal notebook and what resides in an employer-specific notebook. Runbooks that only apply to a specific project will generally live in an employer-specific notebook, but it’s also important to determine if the contents of the runbook are too specific. A runbook that, at first glance, only applies to a specific work project might be useful elsewhere with a few modifications. Think critically about the layers of abstraction your runbook operates at.

Diagram of layers of abstraction presented as small to large boxes, the smallest being "Low abstraction" and the largest being "Highly generalized"

Considering layers of abstraction will optimize your personal runbook organization.

The cost of making every runbook highly generalized is that it can become unclear which problem each one addresses. Sometimes the outcome of a personal runbook should be more specific, so that future you can find it through the context that is shared each time you execute it. However, a highly generalized runbook is broadly applicable and can serve as a foundational component in a larger set of runbooks.

Becoming reliable

As mentioned previously, runbooks are essential in the sysadmin and SRE domains where repeatable procedures are the backbone of daily tasks. Consider your contribution as an engineer to your team or company: how reliable are you?

I believe reliability is one of the paramount traits of incredible engineers. Consistently delivering correct and effective results is often more productive and impactful than simply “having the best ideas” or being a thought leader. Having both of these traits is of course preferred, but engineers who are consistent in their efforts and, more critically, in the outcomes of those efforts will be important assets to their organizations.
