strings: the Sourcegraph blog

A collection of characters, stories, and other elements

How we analyzed hundreds of repositories to ensure they had open source licenses

Justin Dorfman

We’re excited to announce that our Open Source Program Office (OSPO) is now up and running! One of my first tasks was making sure our open source software is in compliance with the Open Source Definition (OSD).

As of June 2022, we had 428 public repositories, and 75 active repos among them (over 17%) were missing a license.

This wasn’t a great state of affairs: without a license, other developers have no clear permission to use our software or incorporate it into their workflows.

License not Found

I kept seeing “license not specified,” which alarmed me.

I started to review the repo list manually and realized, “Hey, wait a minute, I think we have a tool that does this!” What a perfect opportunity to use our Batch Changes feature.

To add the missing licenses to all the repos, I created a YAML file called a batch spec. Here are its key parts.

 

The magic starts with the repositoriesMatchingQuery entry, which uses our code search engine to find our GitHub repos without a file named “LICENSE”.

  - repositoriesMatchingQuery: repo:^github\.com/sourcegraph/ -repohasfile:^LICENSE

From there, a step copies the Apache license text into a new LICENSE file in each repo.

  - run: cat /tmp/apache2.txt > LICENSE

Once the file is in the repo, the changesetTemplate section commits the change to the “add-license” branch with a prefilled commit message and sets the title and body of the resulting pull request.

changesetTemplate:
  title: Add Apache 2.0 License
  body: This is a batch change to make this open source project OSI/OSD compliant.
  branch: add-license # Push the commit to this branch.
  commit:
    message: Add Apache 2.0 License
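Here is a minimal sketch of how the fragments above might fit together in a single batch spec. Only the query, the run command, and the changesetTemplate come from this post. The name, description, alpine:3 container image, and the files mount that places the license text at /tmp/apache2.txt are assumptions for illustration, and the license body is abbreviated. This is not the exact file used for this batch change.

```yaml
# Hypothetical reconstruction: name, description, container image, and the
# files mount are assumptions; the query, run command, and changesetTemplate
# are the fragments shown above.
name: add-apache-license
description: Add an Apache 2.0 LICENSE file to repos that lack one.

on:
  # Repos under github.com/sourcegraph that have no file matching ^LICENSE
  - repositoriesMatchingQuery: repo:^github\.com/sourcegraph/ -repohasfile:^LICENSE

steps:
  # Write the mounted license text into a new LICENSE file in the repo
  - run: cat /tmp/apache2.txt > LICENSE
    container: alpine:3
    files:
      # Abbreviated: paste the full Apache 2.0 license text here
      /tmp/apache2.txt: |
        Apache License
        Version 2.0, January 2004
        http://www.apache.org/licenses/

changesetTemplate:
  title: Add Apache 2.0 License
  body: This is a batch change to make this open source project OSI/OSD compliant.
  branch: add-license # Push the commit to this branch.
  commit:
    message: Add Apache 2.0 License
```

If the license text is mounted this way, the spec stays self-contained. Every workspace gets the same text, and nothing has to be fetched over the network at execution time.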

Creating and testing the batch spec took a few minutes, but it saved me hours of manual effort.

Demo

There are two ways to use Batch Changes:

  1. Use src (our CLI tool)
  2. Use our web app

For this demo, we’ll use the web app.

After adding the YAML to the batch spec, you can kick off a workspaces preview and see the repos that match the query (i.e., those without a LICENSE file):


Batch spec / Workspaces preview

We can click “Run Batch Spec”, which takes us to the execution step.

Run Batch Spec

This shows the change that will be made to each repo. Clicking a specific entry shows its changeset template:

Changeset template

The last step is publishing the changesets, which opens the automated pull requests. Conveniently, you don’t need to publish all the changesets at once: you can pick and choose.

For example, suppose you had 130 repos to change but needed to prioritize the LSIF repos. You’d simply search for “lsif”, select the repos you want, and click “Publish changesets”.
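Picking changesets in the UI is the approach used here. If you would rather encode the same priority in the spec itself, the sketch below shows one way to do it. It assumes your Sourcegraph version accepts a list of repository glob patterns in the changesetTemplate published field. The lsif-* pattern is hypothetical. Before relying on this, check the Batch Changes docs for two things: how repos that match no pattern are treated, and how a spec-level published setting interacts with publishing from the UI.

```yaml
changesetTemplate:
  title: Add Apache 2.0 License
  body: This is a batch change to make this open source project OSI/OSD compliant.
  branch: add-license # Push the commit to this branch.
  commit:
    message: Add Apache 2.0 License
  # Assumption: published accepts a list of "repo glob: value" entries.
  # Hypothetical pattern: publish changesets for repos named lsif-* first.
  published:
    - github.com/sourcegraph/lsif-*: true
```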


Select and Publish changesets


Automated pull request statuses for selected LSIF repos

Once the changesets are published, your inbox will start blowing up with pull request updates.


Success!

Conclusion

This is just one of many things you can do with Batch Changes. Check out https://github.com/sourcegraph/batch-change-examples for more examples. Over time, we’ll highlight the most popular use cases.

If you have any questions, I’ll be hanging out in our Discord.


Thanks to the following people for helping with this post: Erik Seliger, Daniel Marques, Malo Marrec, Kalid Azad, Marcos Placona, Fabiana Castellanos, and Tammy Zhu.

About the author

Justin Dorfman is Sourcegraph’s Open Source Program Manager and is responsible for fostering the adoption of code intelligence in the open source community. You can chat with Justin on Twitter @jdorfman or in our community Discord.
