Code intelligence team

This is your brain on code

The Code Intelligence team builds tools and services that provide contextual information around code, taking into account its lexical, syntactic, and semantic structure. This includes:

  • An API to provide fast, comprehensive, and accurate answers to important code navigation queries such as Go to Definition and Find References

  • A powerful and flexible language-agnostic model of dependency relationships across projects, repositories, and languages

  • Robust, extensible, and scalable infrastructure to index code across all languages, keep those indexes up to date, and efficiently resolve code intelligence queries against all indexed code.

History

Our work originates in the core of Sourcegraph, first implemented as srclib, a code analysis tool developed by our co-founders Beyang and Quinn in 2015.

2016 - We evolved our strategy to use LSP (Language Server Protocol) to power the new Sourcegraph code search engine. LSP is a user-oriented protocol and requires a language server implementation for each programming language.

2017 - Over time we began to experience issues with language servers. They can be difficult to deploy, slow at runtime, slow to be adopted by members of their respective language communities, and slow to develop.

2018 - We added basic code intelligence built on search-based heuristics, which lets us provide quick, good-enough (i.e., “imprecise”) support for the most popular programming languages.

2019 - While looking for more efficient alternatives to language servers, we found LSIF (Language Server Index Format), which provided an index-based implementation of precise code navigation (similar to srclib) with the advantage of building on the work done in the LSP community (LSIF and LSP are sister protocols).

2020 and beyond - The Code Intelligence team is responsible for how Sourcegraph understands code and for providing data that powers semantic search, campaigns and code insights. Our API is the point of discovery for the knowledge graph for all source code.

Vision

To be the point of discovery for the knowledge graph for all source code.

Mission

Build a code intelligence platform that understands code in all languages and provides rich metadata to power code search, insights and campaigns.

Goals

This list will be updated regularly with links to the technical implementation details in RFCs and/or GitHub issues.

Provide precise code intel for the primary programming languages that, combined, cover > 50% of usage across customer projects.

  • Metrics:
    • Language coverage (%)
  • Planned work:
    1. Resolve outstanding issues for lsif-clang, ship to 3 customers, tracking issue 12349.
    2. Resolve outstanding issues for lsif-java, ship to 3 customers, tracking issue 13017.
    3. Resolve outstanding issues for lsif-go, ship to 3 customers, tracking issue 13015.
    4. Provide best effort auto-indexing for supported languages, RFC-199, RFC-201.
  • Success Outcome: We have accurate and comprehensive precise code intel for C++, Go, and Java. We will work to ship each indexer to at least 3 customers and gather feedback for future iteration. The precision level for each indexer tool is > 80%.
  • Progress: See the Code Intel Looker dashboard.

Create a backend that can perform and scale across organizations of various sizes up to the upper boundaries of our system.

  • Metrics:
    • Request Latency
    • System Throughput
    • Max number of repositories (N1)
    • Max size of project (N2)
    • Max commit rate of repository (N3)
  • Planned work:
    1. Gather current values for the Request Latency and System Throughput metrics, and create benchmarks for the existing system.
    2. Source and track the current performance of tools across the following dimensions: number of repositories (N1), size of project (N2), and commit rate of repositories (N3).
    3. Determine the upper boundary of the code intel backend across scale dimensions: max of N1, N2, N3.
    4. Create a testbed environment that lets the team instantiate and run instances against various use cases in a controlled way, so that stress and volume testing can be automated, RFC-218.
    5. Develop a prototype for incremental indexing, RFC-170.
    6. Shard bundle manager, RFC-200.
  • Success Outcome: We understand the upper boundaries N1-N3. The code intel backend achieves its latency and throughput targets when operating within the upper bounds of inputs N1-N3.

LSIF indexers can operate in a monorepo architecture.

  • Metrics:
    • Time to Intelligence (TTI)
  • Planned work:
    1. Create definition for Time to Intelligence metric and determine how this metric is measured and tracked.
    2. Create and send a survey to customers to gather information on monorepo use cases and stats.
    3. Identify the common challenges our customers experience when operating Sourcegraph on monorepos by working directly with 3 customers to index a monorepo, gather feedback, and identify problems we need to address.
    4. Update lsif-* tools to enable indexing subsets of a project.
    5. Integrate with monorepo build systems: Bazel, with a stretch goal of one of Buck, Pants, or OAO.
  • Success Outcome: Users are able to process monorepos with the same TTI as any other type of repository. Our tools provide configuration options that accommodate complex use cases (cross-repository and sub-project code intel).

Provide clarity on capabilities of code intel tools at each release state

  • Planned work:
    1. Define release states and their respective criteria.
    2. Assess each code intel tool against criteria and determine release state.
    3. Communicate changes by updating documentation on lsif.dev and /sourcegraph/lsif-* repos.
  • Success Outcome: We are able to consistently apply release states to our tools as they move through the development lifecycle. We have clearly communicated the capabilities of each tool to users, internal teams, and stakeholders.

Contact

Tech stack

Services specific to precise code intelligence are written in Go, and code intelligence features are added to the existing frontend service where possible. Sourcegraph extensions that provide code intelligence are written in TypeScript. We use SQLite databases to store data specific to a particular LSIF index, and Postgres to store all other data.

Team communication

Here are some key ways to contact us:

Team process

The code intel team has defined a set of rituals that keep the team connected around our common goals. As a team we also believe in good meeting etiquette: to ensure effective meetings for everyone, every meeting must have a meaningful topic, an agenda, and a stated goal or purpose statement defined ahead of time.

On a weekly basis:

  • Team Sync Meeting: Every Monday, the code intel team meets to stay in sync and hear from stakeholders. We use a Google doc for agenda and meeting notes. Prior to the weekly sync meeting, each team member should:

    1. Add any agenda items that should be discussed.
    2. Review any existing agenda items and be prepared to discuss them.
    3. Update the current release tracking issue with a summary of progress for the previous week and plans for the next week.
  • Team Updates: Every Friday, we update the tracking issue with our weekly progress to inform external stakeholders of the state of milestone goals. The engineers working on a goal are responsible for its update.

For each iteration (currently one month long), we follow this process:

  • Planning Meeting(s): Prior to each iteration, the team holds planning meeting(s) to discuss which items should be prioritized for the upcoming release. We revisit outstanding work that tracks against our team goals and go over our backlog of bugs and feature requests. Once the team agrees on a task, we add it to the tracking issue and to our team goals for the milestone.

  • Retrospective Meeting: After the release branch has been cut for the iteration, the team holds a retro to discuss how the iteration went and what changes we might make to continuously improve our teamwork.

We track most of our work using issues on the main Sourcegraph repository. If you have an issue that needs our attention, mention the @sourcegraph/code-intel team or tag your issue with the team/code-intelligence label.

New Engineer Onboarding

Our team is growing, and to give our new teammates the best onboarding experience we have created additional material that builds on Sourcegraph’s general engineering onboarding guide. We think it will help you ramp up on all things code intel.

Weeks 1 - 4

Architecture Presentation: An experienced code intel engineer will give a live presentation primarily to the new hire (and any other Sourcegraph teammate who’s interested) explaining the architecture of our systems at a high level.

Pair Programming: In keeping with engineering pairing sessions, we support our new teammates by setting aside dedicated time every week for mentoring. This not only delivers guided learning but is also great for team and relationship building.

First Tasks: The team has marked issues in GitHub that we believe are good first issues. Following the starter task process, work with the code intel engineering manager to select three issues that you think are good ones to work on in your first 4 weeks.

Technical Design Docs & Diagrams: We know that a picture is worth a thousand words, so we create visual diagrams to supplement our written documentation for the most critical or complex components and processes. This material is great for async learning and gives a better understanding of the parts that make up the big picture. Check the code intel user documentation and our journey with LSIF post to understand the overall view. Browse the available documentation and diagrams in the code intel developer documentation and the code intel Google Drive to learn how the Bundle Manager works or to view the Inter-service request diagrams.

Second Month

Inverted Presentation: Our new teammate has now had some time to explore the source code and will use that knowledge to present to the team. You can choose to either explain the architecture of an existing component or present a solution to a problem you just solved. The team will give feedback, clarify where needed, and add detail to high-level designs.

Feedback: We want to hear from our new teammate how helpful each of these onboarding experiences was, and we will continuously update our onboarding guide based on new insights. As per the engineering guide, feedback will be gathered in week 6.

Members

Temporary responsibilities while we search for an engineering manager:

  • María
    • Send weekly update
    • Run retrospectives
    • Drive milestone planning (invite Nick to help facilitate)
  • Eric
    • Organize and run code intel team meetings

Ownership

On-call

Growth plan

Updated 2020-09-22

We are hiring an Engineering Manager to lead and grow this team.

We are also interested in hiring an engineer who can work on our web application as well as our backend systems. Apply here: Software Engineer - Full Stack