Code intelligence with LSIF

Chris Wendt

Want to use LSIF for precise code intelligence on Sourcegraph? See "LSIF: Fast and precise code intelligence (Sourcegraph documentation)".

Since the last code intelligence update, we have shifted our efforts away from improving language servers. We found that language servers were hard to develop due to the number of build and dependency systems per language, hard to deploy and connect to a Sourcegraph instance, and hard to make fast enough to meet our users' expectations. The Sourcegraph 3.0 release in February included zero-configuration code intelligence for the 20+ most popular languages based on ctags and text search. This has provided a much better default experience for Sourcegraph customers, but there is one limitation: results are imprecise unless a customer configures a language server.

Now, we are working on a new way of providing code intelligence that’s both fast and precise. The idea is to use compiler frontends to precompute code intelligence data in a project-specific build environment and then upload that data to Sourcegraph. This approach has several benefits:

  • Precision and correctness: code analysis is performed in the proper build environment for the project using the same compiler frontend as your normal build.
  • Speed: because the code intelligence data is precomputed, go to definition and find references are essentially table lookups.
  • Precedent: this is how the largest software companies provide code intelligence to their employees.
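
To make the "table lookup" point concrete, here is a minimal sketch in Python. The names (`Location`, `DEFINITIONS`, `go_to_definition`, `find_references`) and the index layout are hypothetical and are not Sourcegraph's actual storage format. The sketch only shows that once the analysis has been done ahead of time, answering a query needs no parsing or type checking.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Location(NamedTuple):
    path: str
    line: int
    character: int


# Hypothetical precomputed index: reference position -> definition position.
# In practice an indexer running in the build environment would produce this.
DEFINITIONS: Dict[Location, Location] = {
    Location("main.go", 12, 5): Location("util.go", 3, 5),
    Location("main.go", 20, 9): Location("util.go", 3, 5),
}

# Invert the table once so that find references is also a single lookup.
REFERENCES: Dict[Location, List[Location]] = defaultdict(list)
for ref, definition in DEFINITIONS.items():
    REFERENCES[definition].append(ref)


def go_to_definition(ref: Location) -> Optional[Location]:
    """Dictionary lookup; no compiler work happens at query time."""
    return DEFINITIONS.get(ref)


def find_references(definition: Location) -> List[Location]:
    """Return every known reference to the given definition."""
    return list(REFERENCES.get(definition, []))
```

A language server, by contrast, has to load and analyze the project before it can answer the same question.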

Here’s what the setup process will look like:

  • Add a step to your CI that runs on each commit or daily (depending on the repository size)
  • In that step, generate code intelligence data and upload it to Sourcegraph (this is similar to other build artifacts such as code coverage information)

What format is that code intelligence data in? We are using LSIF (Language Server Index Format), which is a graph of code intelligence information such as definitions, references, and hover documentation, similar in spirit to Kythe. The graph is composed of vertices for each definition, reference, and hover, and edges that connect references to definitions.
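
As a rough illustration of the graph idea, the sketch below loads a tiny hand-written dump with one JSON object per line and follows an edge from a reference to its definition. It is a simplification: the real LSIF format has more vertex and edge kinds than shown here, and the `definition` edge label and the `load` and `definition_of` helpers are assumptions made for this example.

```python
import json

# Simplified, hand-written dump loosely modeled on a JSON-lines graph.
# Vertex 2 is a definition range, vertex 3 is a reference range, and
# edge 4 connects the reference to the definition.
DUMP = """\
{"id": 1, "type": "vertex", "label": "document", "uri": "file:///util.go"}
{"id": 2, "type": "vertex", "label": "range", "start": {"line": 3, "character": 5}, "end": {"line": 3, "character": 8}}
{"id": 3, "type": "vertex", "label": "range", "start": {"line": 12, "character": 5}, "end": {"line": 12, "character": 8}}
{"id": 4, "type": "edge", "label": "definition", "outV": 3, "inV": 2}
"""


def load(dump):
    """Split the dump into a vertex table keyed by id and a list of edges."""
    vertices, edges = {}, []
    for line in dump.splitlines():
        if not line.strip():
            continue
        element = json.loads(line)
        if element["type"] == "vertex":
            vertices[element["id"]] = element
        else:
            edges.append(element)
    return vertices, edges


def definition_of(reference_id, vertices, edges):
    """Follow the (hypothetical) 'definition' edge out of a reference vertex."""
    for edge in edges:
        if edge["label"] == "definition" and edge["outV"] == reference_id:
            return vertices[edge["inV"]]
    return None
```

Calling `definition_of(3, *load(DUMP))` returns the range vertex with id 2, which is the definition the reference points to.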

To learn more, check out our lightning talk about LSIF from GopherCon 2019.

We have found that developing LSIF indexers is much easier than developing language servers. A one-shot command line tool that runs in the proper build environment and writes data to a file is simpler than implementing a long-lived remote LSP server. In the few weeks that we’ve been working on LSIF support, we have been able to create new LSIF exporters for Go, Python, and C/C++. Microsoft has also already created LSIF exporters for TypeScript and Java.
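
To give a feel for how small a one-shot indexer can be, here is a toy sketch in Python built on the standard `ast` module. It is not one of the exporters mentioned above and does not emit real LSIF: the record layout and the `index_source` and `write_index` names are invented for illustration, and it only handles calls to functions defined in the same file.

```python
import ast
import json


def index_source(source):
    """Return definition and reference records for functions in one file."""
    tree = ast.parse(source)
    records = []
    definitions = {}

    # First pass: record where each function is defined.
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            definitions[node.name] = node.lineno
            records.append(
                {"kind": "definition", "name": node.name, "line": node.lineno}
            )

    # Second pass: link direct calls back to the definitions found above.
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in definitions
        ):
            records.append(
                {
                    "kind": "reference",
                    "name": node.func.id,
                    "line": node.lineno,
                    "definition_line": definitions[node.func.id],
                }
            )
    return records


def write_index(source_path, output_path):
    """Run once, write one JSON record per line, and finish."""
    with open(source_path) as source_file:
        records = index_source(source_file.read())
    with open(output_path, "w") as output_file:
        for record in records:
            output_file.write(json.dumps(record) + "\n")
```

The tool reads its input, writes a file, and exits, so there is no connection handling, request routing, or long-running state to manage.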

We are looking forward to working with customers to start producing LSIF data in their build pipelines so that they can enjoy the benefits of fast, precise code intelligence.

If you are interested in working on code intelligence at Sourcegraph, we're hiring!
