Distribution team

The distribution team is responsible for making Sourcegraph easy to deploy, scale, monitor, and debug. We solve the most challenging problems our customers face deploying Sourcegraph on-premise in a variety of environments, and on Sourcegraph.com, the largest Sourcegraph instance in the world.

Contact

Team communication

Our team has two Slack channels, one public (#distributioneers) and one private (#distribution-team). Our default is to use the public channel. The #distribution-team channel will only be used for discussing internal team topics, like scheduling, retrospectives, how we do updates, etc.

Goals

See Goals

Details

Resources

FAQ

Members

Other:

  • Stephen Gutekanst is in an experimental role acting as an internal contributor, focused on high-value, low-cost wins across the board. His work is not isolated from the Distribution team: it will be tracked on a project board, with regular status updates to his manager and any other relevant teams. He will work closely with (and sometimes be embedded in) other teams, and will get alignment with the product team (and any other team) on his work. This experiment runs from 2020-11-23 to 2021-01-23, and we will evaluate the outcome around the end of that period. Gonzalo Peci will continue to be his manager during this experiment.

On-call rotation

Who is on call? Slack: /genie whoisoncall Distribution

We have an OpsGenie rotation to respond to incidents (events that cause service disruption). You can find more information about being on call in the handbook.

Incidents on-call

Support rotation

Who is on call? Slack: /genie whoisoncall distribution-support

We maintain a support rotation to deal with day-to-day requests (ad-hoc, time-sensitive questions from customers or other teams, RFC review requests, etc.). This makes it easier for other teams to reach an engineer on the team whose priority is answering these questions, and it helps the other engineers on the team remain focused. You can read more about this in the references below. When you are the active support engineer, your focus and priority is responding to requests and questions in #distributioneers. Sometimes this might require creating an issue, re-assigning, looping in other engineers, etc.

If for any reason you are unavailable for a rotation, please coordinate with the team for someone to take over.

References:

Contacting the support engineer

How you contact the support engineer depends on the priority or urgency of your request. Here are some guidelines that will help ensure we can reply to your request appropriately.

General questions

Send us a message in #distributioneers and notify the active engineer by @-mentioning them in the message. Alternatively, @-mention the engineer in the thread where their input is needed. If the request is time-sensitive, please say so in the message.

Customer incidents

You can send an urgent notification to the engineer directly from Slack using the following command: /genie alert "ce/p1: Customer XYZ systems are impaired #12345" for distribution-support. If the engineer is unable to respond, a message will be sent to the rest of the team.

Please provide a link to an issue in the message.

Growth plan

Updated 2020-11-18

We are looking to grow this team to 8 engineers so we can split into two teams.

Distribution

This team owns how we develop, build, release, deploy, and upgrade Sourcegraph, both on Sourcegraph Cloud and at on-premise customers.

Examples:

  • Expand and maintain our customer deployment options (for example: single Docker image, Docker Compose, Kubernetes, AMI).
  • Ensure that we have a fast and automated release/deployment process to Cloud.
  • Maintain Buildkite infrastructure.
  • Provide automated upgrades for on-premise customers.
  • Build general dev tools that aren’t specific to frontend or backend (for example: Codenotify, merge/review bots).

Operations platform

This team is directly responsible for Sourcegraph Cloud uptime and reliability, and for the operational infrastructure that enables other engineering teams to observe and monitor the parts of the product they own.

Examples:

  • Provide logging and metrics infrastructure that enables developers to observe/search current and historical data as well as define alerts.
  • Provision and scale our Cloud resources (for example: GCP).
  • Measure and sustain 99.99% uptime and reliability of Sourcegraph Cloud.
  • Enforce access controls and network security policies for our Cloud resources.
  • Make it easy for customers to share relevant data when submitting bug reports.
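
To make the 99.99% uptime target above concrete, here is a minimal sketch of the downtime budget it implies. The measurement windows (a 30-day month and a 365-day year) and the function name are assumptions made for illustration; this page does not say over which period uptime is measured.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int) -> float:
    """Return the minutes of downtime an uptime target permits over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # The windows below are assumed for illustration, not taken from this page.
    for label, days in (("30-day month", 30), ("365-day year", 365)):
        minutes = allowed_downtime_minutes(99.99, days)
        print(f"{label}: {minutes:.2f} minutes of downtime allowed")
```

Under these assumptions, the budget is roughly 4.32 minutes per 30-day month, or about 52.56 minutes per 365-day year.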