The distribution team is responsible for making Sourcegraph easy to deploy, scale, monitor, and debug. We solve the most challenging problems our customers face when deploying Sourcegraph on-premise in a variety of environments, and the problems we face running Sourcegraph.com, the largest Sourcegraph instance in the world.
- Slack: #distributioneers channel or @distribution
- Support rotation
- File issues: team/distribution label
Our team has two Slack channels, one public (#distributioneers) and one private (#distribution-team). Our default is to use the public channel. The #distribution-team channel will only be used for discussing internal team topics, like scheduling, retrospectives, how we do updates, etc.
- Observability at Sourcegraph
- Observability developer guide
- Managed instances
- Collecting and inspecting metrics dumps
- How to set up a separate website maintained by Sourcegraph
- How to simulate k8s admin security restrictions
- How to test the GitLab native integration locally
- How to make updates to global settings and configuration on sourcegraph.com
- Create GCP commitments
- Update various service tokens for sourcegraph.com
- We’re hiring a Product Manager for this role. The engineering team is owning PM responsibilities in the meantime.
- Gonzalo Peci (Engineering Manager)
- Stephen Gutekanst is in an experimental role acting as an internal contributor, focused on high-value, low-cost wins across the board. His work is not isolated from the Distribution team: it will be tracked on a project board, with regular status updates to his manager and any other relevant teams. He will work closely with (and sometimes be embedded in) other teams, and will get alignment with the product team (and any other team) on his work. This is an experiment running from 2020-11-23 to 2021-01-23, and we will evaluate the outcome around that time. Gonzalo Peci will continue to be his manager during this experiment.
Who is on call? Slack:
/genie whoisoncall Distribution
Who is on call? Slack:
/genie whoisoncall distribution-support
We maintain a support rotation to deal with day-to-day requests (ad-hoc, time-sensitive questions from customers or other teams, RFC review requests, etc.). This makes it easier for other teams to reach an engineer on the team whose priority is answering these questions, and it helps the other engineers on the team remain focused. You can read more about this in the references below. When you are the active support engineer, your focus and priority is responding to requests and questions in #distributioneers. Sometimes this might require creating an issue, re-assigning it, looping in other engineers, etc.
If for any reason you are unavailable for a rotation, please coordinate with the team for someone to take over.
Contacting the support engineer
There are many reasons to contact the support engineer depending on the priority or urgency of a request. Here are some guidelines that will help ensure we can reply to your request appropriately.
Send us a message in #distributioneers and notify the active engineer by either:
- @-mentioning them in the message.
- @-mentioning them in the thread where they are needed.
If this is a time-sensitive request, please say so in the message.
You can send an urgent notification to the engineer directly from Slack using the following command:
/genie alert "ce/p1: Customer XYZ systems are impaired #12345" for distribution-support.
If the engineer is unavailable, a message will be sent to the rest of the team.
Please provide a link to an issue in the message.
We are looking to grow this team to 8 engineers so we can split into two teams.
This team owns how we develop, build, release, deploy, and upgrade Sourcegraph, both on Sourcegraph Cloud and in customers' on-premise environments.
- Expand and maintain our customer deployment options (for example: single Docker image, Docker Compose, Kubernetes, AMI)
- Ensure that we have a fast and automated release/deployment process to Cloud.
- Maintain Buildkite infrastructure.
- Provide automated upgrades for on-premise customers.
- Build general dev tools that aren’t specific to frontend or backend (for example: Codenotify, merge/review bots).
This team is directly responsible for Sourcegraph Cloud uptime and reliability, as well as being responsible for operational infrastructure that enables other engineering teams to observe and monitor the parts of the product they own.
- Provide logging and metrics infrastructure that enables developers to observe/search current and historical data as well as define alerts.
- Provision and scale our Cloud resources (for example: GCP).
- Measure and sustain 99.99% uptime and reliability of Sourcegraph Cloud.
- Enforce access controls and network security policies for our Cloud resources.
- Make it easy for customers to share relevant data when submitting bug reports.
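To make the 99.99% uptime goal in the list above concrete, the following is a minimal sketch of the downtime budget that target implies. The function name and the choice of 30-day and 365-day windows are assumptions made for illustration. They are not taken from how the team actually measures uptime.

```python
def downtime_budget_minutes(target: float, window_days: float) -> float:
    """Return the downtime, in minutes, allowed by an availability target.

    `target` is a fraction (0.9999 for 99.99%); `window_days` is the length
    of the measurement window. Both names are hypothetical.
    """
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be a fraction between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - target) * total_minutes


if __name__ == "__main__":
    # Assumed windows: a 30-day month and a 365-day year.
    for label, days in (("30-day month", 30), ("365-day year", 365)):
        budget = downtime_budget_minutes(0.9999, days)
        print(f"{label}: {budget:.2f} minutes of allowed downtime")
```

Under these assumptions, a 99.99% target leaves roughly 4.32 minutes of downtime per 30-day month and about 52.56 minutes per 365-day year.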