Amplitude

How do I get started?

  1. Log in to our workspace (named Sourcegraph). If you don’t have an account, message @ericbm or request one during login.
  2. Take a look at the data overview to get a sense of what’s in Amplitude.
  3. Read the Amplitude documentation for building analyses and/or check out some of our tutorials.
  4. Check out the Product team space for existing dashboards.
  5. Ask BizOps for help if you have any questions!

Why are we using Amplitude?

Amplitude is a product analytics tool that specializes in turning event data into actionable insights and dashboards. Any question about how our product is used is probably easiest to answer in Amplitude because it’s built specifically to answer these questions.

Why aren’t we using Looker for this?

Looker is very flexible: we can set it up for any purpose we’d like. The downside is that providing a great self-service experience for questions about product usage would take a LOT of work. We get these capabilities out of the box with Amplitude.

What is in Looker vs. Amplitude?

Anything not based directly on analyzing Sourcegraph Cloud events is in Looker. This includes pings from on-prem instances, anything we get from the Cloud Postgres database, and any data from third-party tools (such as Google Analytics and Salesforce).

Type of analysis           Tool        Example
Retention/engagement       Amplitude   Link
Adoption of public code    Looker      Link
Progress towards OKRs      Looker      Link
On-prem instances/pings    Looker      Link
Signup funnel on Cloud     Amplitude   Link

Any analysis that was conducted in Looker before Amplitude was implemented can still be done in Looker, and existing Looker dashboards and visualizations will still be maintained. Amplitude will help us conduct new and different product analytics analyses.

Data

Overview

Most Sourcegraph Cloud events are sent to Amplitude. The events not sent are extremely low-traffic events, non-UI events (e.g. backend events for a search that we capture), and ones we have explicitly decided to exclude (such as code insights events: we’re focused on enterprise there, so Cloud data won’t inform any decisions). The full data map is in Drive.

The data is currently backfilled from .

It’s sent through a script that runs every hour and pulls from sourcegraph_analytics.amplitude_events_v5, which is loaded by this scheduled query.

Adding event properties

Event properties are attributes of a particular event. They are added to the amplitude_events_v5 table as individual columns by this scheduled query. For example, the snippet at the end of this section extracts event properties from both the argument field of an event and the event name itself.

Steps for adding event properties to the query/table:

  1. Create a PR to update the amplitude_events_v5 scheduled query with the new event properties.
  2. Once another member of the team approves, open a blank BigQuery query editor, run the query, and then save the results to a new test table (e.g. amplitude_test_20210812).
  3. Create an issue for Data Engineering to upload a sample of the new test table to a test project in Amplitude.
  4. If the data shows up as expected in Amplitude, go ahead and update the amplitude_events_v5 query with your PR.
  5. Update the INSERT statement with the new event properties. You can’t backfill event properties onto events that already exist in Amplitude; however, you can backfill event properties if you’re also backfilling the events themselves.

If you’re adding a new event property, please add it to the data map.

-- Properties pulled from the JSON in the event's argument field
JSON_EXTRACT(argument,
  '$.code_search.results.results_count') AS results_count,
JSON_EXTRACT(argument,
  '$.code_search.results.alert') AS alert,
-- Property derived from the event name itself
CASE
  WHEN name = 'ViewRepogroup:python' THEN 'python'
  WHEN name = 'ViewRepogroup:golang' THEN 'golang'
  WHEN name = 'ViewRepogroup:android' THEN 'android'
  WHEN name = 'ViewRepogroup:kubernetes' THEN 'kubernetes'
  WHEN name = 'ViewRepogroup:stanford' THEN 'stanford'
  WHEN name = 'ViewRepogroup:cncf' THEN 'cncf'
  ELSE NULL
END AS repogroup_name,
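
For readers who want to sanity-check new event properties outside BigQuery, the logic of the snippet above can be mirrored in a few lines of Python. This is only an illustrative sketch: the function and helper names are hypothetical, the sample event name in the usage is made up, and the scheduled query remains the source of truth. Unlike JSON_EXTRACT, which returns JSON-formatted strings, this sketch returns parsed Python values, and it treats malformed JSON as having no properties.

```python
import json
from typing import Any, Dict, Optional

# Repogroups named in the CASE expression of the SQL snippet.
KNOWN_REPOGROUPS = {"python", "golang", "android", "kubernetes", "stanford", "cncf"}


def _dig(obj: Any, *keys: str) -> Any:
    """Walk nested dicts, returning None as soon as a key is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def extract_event_properties(name: str, argument: Optional[str]) -> Dict[str, Any]:
    """Derive event properties from an event's name and its JSON argument string."""
    try:
        parsed = json.loads(argument) if argument else {}
    except json.JSONDecodeError:
        parsed = {}  # assumption: malformed JSON yields no properties

    # 'ViewRepogroup:golang' -> 'golang', but only for the known repogroups.
    prefix, _, suffix = name.partition(":")
    repogroup = suffix if prefix == "ViewRepogroup" and suffix in KNOWN_REPOGROUPS else None

    return {
        "results_count": _dig(parsed, "code_search", "results", "results_count"),
        "alert": _dig(parsed, "code_search", "results", "alert"),
        "repogroup_name": repogroup,
    }
```

For example, calling extract_event_properties("ViewRepogroup:golang", None) yields a repogroup_name of "golang" and None for the two argument-based properties.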

Adding user properties

User properties are the attributes of individual users. The amplitude_user_characteristics table defines all user properties, and this table is joined in with every event that is passed to Amplitude. These user properties include every A/B test from ab_test_users. See more in the A/B testing doc (coming soon).

If an additional user property is added to this table, it will only be applied to events and users going forward. User properties are sent to Amplitude along with events, so a new property is only added once a user triggers an event that is sent to Amplitude. When adding a user property, please add it to the data map.
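
The "going forward" behavior can be illustrated with a small Python sketch. Everything here is hypothetical (the function name, the dictionary shapes, and the property names); it only models the idea that properties are attached at the moment an event is sent, so events sent earlier never pick up a property added later.

```python
from typing import Any, Dict


def attach_user_properties(
    event: Dict[str, Any],
    user_characteristics: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Return a copy of the event with the user's current properties attached."""
    props = user_characteristics.get(event["user_id"], {})
    # Copy the properties so later table changes don't alter events already built.
    return {**event, "user_properties": dict(props)}


# Hypothetical data: one user, with a property added between two events.
characteristics = {"u1": {"is_signed_in": True}}
before = attach_user_properties({"user_id": "u1", "name": "Search Submitted"}, characteristics)

characteristics["u1"]["ab_test_example"] = True  # new user property added to the table
after = attach_user_properties({"user_id": "u1", "name": "Search Submitted"}, characteristics)
```

Here the first event carries only is_signed_in, while the second also carries ab_test_example, which matches the note above that existing events are not updated retroactively.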

Adding events to Amplitude

Amplitude is built on top of our existing eventLogger infrastructure, so we consider an event to be anything it logs. We name events using the object-action framework in Proper Case. For example, an event that shows up in eventLogger as ‘SearchSubmitted’ is converted to ‘Search Submitted’ in Amplitude.
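
The naming conversion can be sketched in Python as follows. This is an assumption about how the conversion behaves, based only on the ‘SearchSubmitted’ example; the real conversion lives in the data pipeline and may handle edge cases (such as acronyms) differently. The function name is hypothetical.

```python
import re


def to_amplitude_name(event_logger_name: str) -> str:
    """Insert a space at each lowercase-or-digit to uppercase boundary."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", event_logger_name)
```

With this sketch, to_amplitude_name("SearchSubmitted") gives "Search Submitted".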

  1. A product/engineering team member should create an issue with the events they would like added.

For BizOps to do:

  1. Create a PR to update the amplitude_events_v5 scheduled query with the new events. Once this is approved by another member of the team, you can update the scheduled query itself. DO NOT change any configuration of the scheduled query (e.g. the schedule and the write preference).
  2. If we want to backfill events to Amplitude that are more than one day old, we need to run this INSERT statement, which adds the specified events to sourcegraph_analytics.amplitude_events_v5. When it is run, all prior occurrences of the new event (from the start date up until the day before yesterday) are added to the table.
  3. Add the event to the data map.
  4. Within an hour or two of the INSERT statement, the automated script will load these events into Amplitude.
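
The backfill window described in step 2 can be made concrete with a short Python sketch. The function name is hypothetical, and treating both ends of the window as inclusive is an assumption; check the INSERT statement itself for the exact bounds it uses.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def backfill_window(start: date, today: Optional[date] = None) -> Tuple[date, date]:
    """Return (first_day, last_day) of a backfill ending the day before yesterday."""
    today = today or date.today()
    end = today - timedelta(days=2)  # the day before yesterday
    if start > end:
        raise ValueError("start date is too recent; there is nothing to backfill")
    return start, end
```

For a backfill run on 2021-08-12 with a start date of 2021-08-01, this gives a window of 2021-08-01 through 2021-08-10.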

A/B testing in Amplitude

Each A/B test has a user property where the A/B test is true or false (true = they saw the variant, false = they saw the original). See the page on A/B testing for more information about experimentation at Sourcegraph.
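
As a rough illustration of how the true/false user property separates the two groups, here is a minimal Python sketch. The function name, the record shape, and the property name in the usage are all hypothetical; in practice this segmentation is done in the Amplitude UI rather than in code.

```python
from typing import Any, Dict, Iterable, List, Mapping


def split_by_variant(
    users: Iterable[Mapping[str, Any]],
    test_property: str,
) -> Dict[str, List[str]]:
    """Group user IDs by whether they saw the variant (True) or the original (False)."""
    groups: Dict[str, List[str]] = {"variant": [], "original": []}
    for user in users:
        flag = user.get("user_properties", {}).get(test_property)
        if flag is True:
            groups["variant"].append(user["user_id"])
        elif flag is False:
            groups["original"].append(user["user_id"])
        # Users without the property were not part of the test and are skipped.
    return groups


# Hypothetical usage with a made-up property name.
sample_users = [
    {"user_id": "u1", "user_properties": {"ab_test_example": True}},
    {"user_id": "u2", "user_properties": {"ab_test_example": False}},
    {"user_id": "u3", "user_properties": {}},
]
groups = split_by_variant(sample_users, "ab_test_example")
```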