Tammy Butow (@tammybutow) is an Engineering Manager in Developer Infrastructure at Dropbox. Tammy manages code workflows—the entire experience of using Go at Dropbox, from coding to shipping. She talked at GopherCon 2017 about how Dropbox engineers build and run large-scale services in Go.
Note: This post was best-effort live-blogged at the conference. Let me know on Twitter (@sqs) if I missed anything. Any mistakes or misrepresentations are my fault; Tammy’s talk was excellent!
Tammy cited Rob Pike’s Go at Google: Language Design in the Service of Software Engineering from 2012 as being a great summary of why Go works well at Dropbox:
“Go is efficient, scalable, and productive. Some programmers find it fun to work in; others find it unimaginative, even boring. In this article we will explain why those are not contradictory positions. Go was designed to address the problems faced in software development at Google, which led to a language that is not a breakthrough research language but is nonetheless an excellent tool for engineering large software projects.” — Rob Pike, 2012
Dropbox’s scale is massive:
As a result, Dropbox demands a lot from their systems, languages, and engineers. The guiding principles and requirements are:
Today, most Dropbox infrastructure is written in Go. Specifically:
Some of the key systems written in Go are:
Many of these are successors to previous non-Go systems.
Tammy shared some stories of how Go usage grew organically at Dropbox.
Before hack week one year, Dropbox engineers implemented rate limiting and throttling separately in each service that needed them. For hack week, a Dropbox engineer decided to build a single shared service that handled both. Thus RAT (Rate limiting And Throttling) was born.
The initial RAT prototype was built in 4 days and demo’d on day 5. Within a few weeks, word had spread. Another Dropbox engineer emailed Tammy’s team to ask how they could use RAT from a Python project. This integration went smoothly, adoption grew organically, and RAT quickly became useful. Now several teams across Dropbox use RAT.
Dropbox has 6,000+ databases, and such a large system requires automation and monitoring. At any given moment, some databases are replicating, failing over, or being promoted from replica to primary.
To manage this, a Dropbox engineer built DBmanager, a web UI that shows at a glance what’s happening across all 6,000+ databases. It also publishes this status information to other systems.
With hundreds of engineers, Dropbox coordinates Go major version upgrades carefully. Tammy didn’t mention any particular pains from upgrading, which suggests the process is smooth!
Some interesting facts:
Every Dropbox engineer goes through the same rigorous Go onboarding process, consisting of:
This onboarding process takes about a week for experienced engineers.
Overall, Dropbox’s usage of Go has been very successful.
One key data point is that there’s no effort at Dropbox to rewrite services from Go to another language, which is a sign that people are generally happy. (Tammy did toss out an intriguing piece of information: there is a little bit of Rust in use at Dropbox. But it’s not being considered as a replacement for Go.)
The biggest pain with Go that Tammy identified was in dealing with race conditions.
Dropbox hires engineers who care about reliability and durability, so guarding against race conditions comes naturally to them (even though concurrency is perennially tough everywhere).
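To make the pain point concrete, here is a small sketch of the classic shared-counter case. The talk summary does not say which tools or patterns Dropbox uses, so the `Counter` type and `countConcurrently` helper below are hypothetical. Without the mutex, the concurrent `c.n++` would be a data race; Go's built-in race detector (`go run -race` or `go test -race`) is designed to flag that kind of bug at runtime.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter is a hypothetical type for illustration only.
// The mutex serializes access to n; removing it makes Inc a data race.
type Counter struct {
	mu sync.Mutex
	n  int
}

// Inc adds one to the counter under the lock.
func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// Value returns the current count under the lock.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// countConcurrently starts `workers` goroutines that each call Inc
// `perWorker` times, waits for all of them, and returns the total.
func countConcurrently(workers, perWorker int) int {
	var c Counter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(countConcurrently(100, 1000)) // 100000
}
```

With the lock in place the total is always workers × perWorker. Without it, some increments can be lost and the result varies from run to run, which is why such bugs are hard to catch by inspection alone.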
Thanks to Tammy Butow (@tammybutow) for a fantastic talk. In addition to being a hacker and engineering manager, Tammy also founded Girl Geek Academy, a movement to teach 1 million technical women by 2025. Check it out!