Sourcegraph monitoring guide

This page documents Sourcegraph-specific guides on developing monitoring for Sourcegraph. For general observability development, refer to the observability developer documentation.

For more context on monitoring at Sourcegraph, you should refer to:

Note: Looking for how to monitor Sourcegraph? See the observability documentation.

image

Finding monitoring

This section outlines how to leverage existing Sourcegraph monitoring for scenarios specific to engineers at Sourcegraph.

For general documentation on finding monitoring, refer to how to find monitoring.

For documentation on how site administrators find monitoring, refer to the Sourcegraph observability documentation.

Alerts

This section describes where Sourcegraph employees can find active alerts for Sourcegraph instances.

Sourcegraph instances

Instances managed by Sourcegraph (Sourcegraph Cloud, k8s.sgdev.org, etc.) have alerts redirected to Slack and Opsgenie as documented in the instances page. Additional details can be found in each instance’s Grafana dashboards (/-/debug/grafana).

If you wish, you can set up Slack alerts for your own team on various instances by adding something like the following to the site configuration (site-admin/configuration) of that instance:

  "observability.alerts": [
    {
      "level": "critical",
      "notifier": {
        "type": "slack",
        "username": "$TEAM - Sourcegraph Cloud",
        "url": "https://hooks.slack.com/services/..."
      },
      "owners": [
        "$TEAM"
      ]
    },
  ]

To silence an alert on a Sourcegraph instance you need to edit the deployed ConfigMap. For example, for Sourcegraph Cloud you need to edit this file and push to the release branch.

Customer instances

The bug report page (/site-admin/report-bug) for each Sourcegraph instance has a page that provides useful information about an instance’s configuration. In this page, there is a field "alerts": that can be used to request recent alert data from customer instances:

  "alerts": [
    {
      "serviceName": "executor-queue",
      "name": "warning: executor_queue_growth_rate",
      "timestamp": "2020-11-28T14:00:00Z",
      "average": 0.6504517025712306, // % of last 12 hours during which this alert was firing
      "owner": "code-intel"
    },
    // ...
  ]

Data for recent alerts and metrics can be requested from customers from their Grafana dashboards (/-/debug/grafana).

Adding monitoring

See the how to add monitoring guide for most use cases. This section describes guides specific to engineers at Sourcegraph.

Creating Cloud-only Grafana dashboards

While all dashboards required to troubleshoot our product should be shipped to customers, our Cloud deployment might require additional dashboards to the ones we ship to customers, for example:

  • When the additional dashboard is not ready yet to graduate to customers
  • When the additional dashboard applies only to our Cloud deployment

Dashboards can be deployed to our Cloud deployment by adding them in json format to dashboards/files in deploy-sourcegraph-dot-com. To learn more, reference the dashboard generator documentation.

Once the dashboard is ready to be shipped to customers, we will need to port it to the monitoring generator to be included in our next Sourcegraph release. Custom dashboards cannot be added to the sourcegraph/grafana except through the generator.

You can use a local Grafana or the Cloud Grafana to create a new dashboard and once its ready, export it by following these steps:

  • Open “Dashboard Settings” (top right cog).
  • Select “JSON Model”.
  • Select the JSON content and save to a .json file in sourcegraph/deploy-sourcegraph-dot-com/dashboards/files.
  • Create a new Pull Request with your changes.

Once deployed, you should be able to see your changes in sourcegraph.com.

Warning: Sourcegraph’s Grafana UI does not allow direct changes due to a CSRF issue (sourcegraph#6075).

Grafana

Sourcegraph uses a custom Grafana image, sourcegraph/grafana, which contains minor changes from the vanilla Grafana image. Learn more about its role in our overall monitoring architecture here, and also see the dev documentation here.

Prometheus and Alertmanager

Sourcegraph uses a custom Prometheus image, sourcegraph/prometheus, that bundles Alertmanager and a wrapper program for managing configuration changes. Learn more about its role in our overall monitoring architecture here, and also see the dev documentation here.

Additional reading

Next steps