Open sourcing Cody

Beyang Liu

We've open-sourced Cody, Sourcegraph's AI-enabled editor assistant, under the Apache 2.0 license. You can view the code and join our livestreams, where we'll show you around the codebase and build new features in public. Or just install it and try it out.

What is Cody? Cody is like ChatGPT in your editor, but it knows about your code. Like other AI coding assistants, Cody makes use of Large Language Models (LLMs) under the hood. What sets Cody apart is its ability to fetch context from your broader codebase and Sourcegraph's code graph. This enables Cody to ground its answers factually and generate code that mirrors the patterns of your codebase. Cody's not perfect. It can fetch the wrong context and hallucinate, but in our experience it performs far better than other tools that rely solely on LLMs and local context.

If you're a just-give-me-the-bottom-line type of person, you can stop reading now and just install the extension. But for those who want the "why" rather than just the "what", keep reading. Because we are in the middle of an AI gold rush, or as Steve Yegge puts it, "the trillion dollar money volcano." And if someone walks up to you in such a volcano while $100 bills are raining down all around and tells you they'd like to give you something for free that they spent many nights and weekends working on, rather than just trying to grab as many $100 bills as they can, it's only natural to ask, "But why?"

[Image: "Wait, but why?" meme]

Well there are business reasons, of course, but to get to those, we need to start with the needs of our users. Our users are developers. We, ourselves, are developers, and we understand that devs have a preference for open tools. We have nothing against proprietary software—most of our customers' code is closed source—but as a developer, it feels better to use tools that are open. It's not so much about the money—software engineering is one of the best-compensated professions—as it is about transparency and independence. Devs are professional automators, and you can only automate so much before you start looking at ways to automate the more tedious parts of your own work. And automating yourself means customizing or taking apart your tools and putting them back together in a way that makes them—and you—more effective. Many devs love programming so much that it becomes almost a way of life, and it can feel weird to have a tool you use and rely on day-to-day be completely opaque and closed off from you.

Making a tool open addresses those concerns. And in turn, you get the benefit of users who can contribute back great new features and fixes.

Now I want to be upfront and say that Cody is part of Sourcegraph, and Sourcegraph is not 100% open source, but open core. Open core allows us to make all our source code public while still preserving our pricing power with enterprises. There are differing opinions here, but we think this strikes a good tradeoff: it provides transparency and mitigates workflow dependency risk, while letting us make enough money to grow as a business and make Sourcegraph even better. That being said, Cody itself is entirely open source and does not require the rest of Sourcegraph to run, though it will get "smarter" when connected to Sourcegraph's search and code intelligence APIs.

So now the selfish business reasons are clear. Given what we know about developers, it's obvious that our user base prefers open tools. There would be tension here if our business were making money off of individual devs. But Sourcegraph The Business sells enterprise software, not individual licenses. We actually made the decision early on to never focus on monetizing individual devs, because we believed the bulk of our economic opportunity was in enterprise sales. Our investors agree, and indeed a large reason we took venture capital was so we could skip the part where we had to worry about charging individual devs and go straight to selling to companies. From our business's perspective, each individual dev that uses Cody for free is a potential opportunity to demonstrate the value of Sourcegraph to a company that's willing to pay for it.

So there is this virtuous cycle of improving the lives of individual developers, letting devs contribute back to an open-source tool they use daily, and generating more enterprise sales, which can then be fed back into improving the lives of individual developers.

There is one other big reason why we've open sourced Cody. I've saved this one for last, because if I mentioned it first, you might think I was being disingenuous. But now that I've explained why open sourcing Cody is good for both our users' self-interest and our own, I'd like to talk a bit about a more general interest. This is to say, open-sourcing Cody feels like the right thing to do. Cody's magic arises from combining Sourcegraph's code graph (the "source graph", if you will; see what we did there?) with the power of Large Language Models. Large Language Models owe a significant debt to open source code, and it's a debt that is deeper than it seems at first glance. You see, it's not just that LLMs used for code generation were trained on code. There's actually a growing body of evidence that shows the emergent ability of LLMs to reason (the so-called "chain of thought" ability) arises only when LLMs are trained on huge amounts of code, not just natural language. Natural language training data provides the ability to sound human, but it is the programming language training data that provides LLMs with the ability to be logical. So the apparent intelligence and quasi-sentience of state-of-the-art LLMs like GPT-4 and Claude is actually an encapsulation of the collective wisdom of the open-source universe. Crazy, right?

Now listen, I get why OpenAI, Anthropic, and others haven't open-sourced these models. ChatGPT isn't just the product of open source. It's also the product of many millions of dollars of GPU compute cycles and hiring the best ML engineers in the world, and I respect their need to recoup that cost and generate a return on their investment. Sourcegraph is more than happy to pay those who spent the money to train these models. But for our part, it feels right that the AI coding sidekick that everyone ends up using should contribute its own source code back to the amazing ecosystem from which its reasoning abilities sprang.

Anyway, that's why Cody is now open source. We're making lots of improvements, and we've only just begun to scratch the surface of its potential. Try it out (community, enterprise) and let us know what you think. The best AI coding assistant available today is now open source. Help us keep it that way with your feedback, pull requests, and word-of-mouth support!
