Creating one CI to rule them all, with Fedor Korotkov, founder and CTO of Cirrus Labs

Why can’t one CI scale alongside a company–from startup to enterprise? In this episode, Fedor Korotkov, founder and CTO of Cirrus Labs, joins Beyang Liu, co-founder and CTO of Sourcegraph, to talk about how, as a student back in 2009, he developed a photo app that earned him almost $2,000 a month, share the time he applied to be an intern at Twitter but ended up with a full-time job, and explain how six months of “funemployment” led to the building and founding of Cirrus CI–the one CI to rule them all. Along the way, Fedor explains how Cirrus CI, with Kubernetes, can spin up a new container in two seconds.


Show Notes

Cirrus CI: https://cirrus-ci.org/

Introducing Cirrus CI: https://medium.com/cirruslabs/introducing-cirrus-ci-a75cd1f49af0

Introducing Cirrus Terminal: a simple way to get SSH-like access to your tasks: https://medium.com/cirruslabs/introducing-cirrus-terminal-a-simple-way-to-get-ssh-like-access-to-your-tasks-1def0449065d

IntelliJ: https://www.jetbrains.com/idea/

Pants: https://www.pantsbuild.org/

Travis CI: https://travis-ci.org/

Jenkins: https://www.jenkins.io/

FreeBSD: https://www.freebsd.org/

GitHub Actions: https://github.com/features/actions

Flutter: https://flutter.dev/

Transcript

Beyang Liu:

All right, everyone. Welcome back to another edition of the Sourcegraph Podcast. Today, I'm here with Fedor Korotkov, the founder and creator of Cirrus CI. Cirrus is a new—I shouldn't say new; it's been around for a couple years now, but a kind of newish compared to the other offerings in the market—continuous integration service that is growing quite rapidly. I think it's still kind of a best-kept secret among the developer community at large. It's not as large as some of the other older CI services, but it has around … Fedor, was it 70,000 users or so?

Fedor Korotkov:

It's used by … it's installed in like 75,000 GitHub repositories, and we run millions of CI tasks a month. So it's not as big as GitHub Actions and other CIs that are very popular, but we're slowly but steadily growing towards that.

Beyang Liu:

Got it. So, if you're listening to this, 70,000 repositories have Cirrus installed, and I feel like it's up and coming. And one of the reasons I wanted to have Fedor on the show is because I think Cirrus has made some interesting architectural decisions that make it especially appealing to a wide range of applications.

We're going to get into all of this later in the show, as well as … Fedor has a background prior to starting Cirrus. He worked on dev tools at a variety of great companies: JetBrains, Twitter, Airbnb. So he's coming at this problem from a wealth of experience. But, like I said, all that is to come.

Before we get into that, I want to go way back, Fedor, to your beginnings as a programmer. So, what was it in the beginning, when you first got your hands on that first programming language or computer, what was it that got you into the world of code?

Fedor Korotkov:

Oh, that's a fun story actually. I grew up in Russia and I grew up in a family. So my father is mostly like physics. So he is working with different appliances with … he worked—he's now retired. And my mom was working on maps that are used in … like military maps, special ones.

In the beginning I was really into physics. So I was like, "Yeah, physics is great. Math is great." And by probably 10th grade—so it was two years before university—I was like, "You know, physics doesn't make much money. So for my future family..." And I was like, "Yeah, maybe I should try something else."

One of the things was computers and computer programming. So I went to regular courses for basically … it's not programming courses; it was like … I cannot say it's like an introduction into computing, but basically we started with Excel and programming, with … what was it? The basic script in Excel to do fancy graphs. And there was a little bit of C++ and so on. And I really got into it, and I ended up getting into university on the major, I think it will be translated as “applicable basic computer science” … like applicable math.

Beyang Liu:

Applied math?

Fedor Korotkov:

Yeah, applied math. And yeah, it has been … I was very lucky to get into one of the best universities in Russia, in computing. We had very good professors. They all … if you’ve ever heard of competitive programming among universities, you might have heard of AMC … ACM?

Beyang Liu:

ACM, yeah.

Fedor Korotkov:

Yeah. AMC is like …

Beyang Liu:

It's the meme stock.

Fedor Korotkov:

Yeah, yeah. Meme stock, definitely. So yeah … our university, the ITMO University, won the most medals. And I was lucky enough to have the same professors that taught these teams to teach me all the fancy computer science. I cannot say that it was useful during work, but it was definitely useful as a brain exercise. So I developed all of that: like problem-solving skills and that kind of applicable things to work.

Beyang Liu:

Yeah. That's impressive that you got into those competitions. I feel like I tried doing that a bit in college, but then …

Fedor Korotkov:

Oh, no, no. No, no, no. It was too hard for me. It was too fast.

Beyang Liu:

Oh, okay. Same here.

Fedor Korotkov:

Yes. So we had a very interesting process during studying. These competitions, they are usually like five hours and you have eight or ten problems to solve. In our case, we were doing something like, for example, directional graphs. We got the theory, and then for the next two weeks we have like 40 problems to solve. It's the same automated testing infrastructure that is used during the competition. So it's a similar format, but there is no time pressure. You can sit at home, use Google, your notebook, and even discuss with other people.

Of course, there's some fancy infrastructure to see that you are not just copying someone else's solution, but it was a very interesting method to do teaching. But it helped me a lot during all the interviews at Twitter and Airbnb, because I was trained to do these weird problems.

Beyang Liu:

Awesome. Okay. So you had a very theoretical and fundamentals heavy, it sounds like, education. What was the path from there, getting into the professional world?

Fedor Korotkov:

After the first year of university, we had holidays and I was like, "What should I do on holidays? I need to learn something." And I was like, okay, we were studying Java; we were studying C++ … all the concurrency programming, primitives and theory. And I was like, "Mm, I want something easier. Let's do maybe some web work, maybe like HTML and JavaScript." I looked at it and I was like, "Nope, I am not doing this. This is horrible." It was like … what was it? It was like 2008. I was like, "No, that's not what I want."

I ended up learning ActionScript and Flash. So it was still web technology, but the language was a bit more to my taste. I learned that, but, in my opinion, learning something new is always better when you have some target, like you want to do something. So I decided to do some games for a Russian social network. We basically have an analog of Facebook called VK in Russia, and they have also a platform for games. So I did a couple of easy games, simple games, that people started playing. Then I learned Flex. It's a framework for …

Beyang Liu:

I remember Flex.

Fedor Korotkov:

Yeah. I think it was …

Beyang Liu:

It was like released with Adobe AIR back in the day too.

Fedor Korotkov:

Yeah, this is the one. And I think it was one of the greatest things ever. They're like creating the UIs in XML with layout and all. It was ahead of the time, that's for sure.

And, yeah, I learned that and as I was learning it, I decided to do a photo editing app because I was really into doing photography. I had a film camera and all of the fancy stuff. So I decided to do an online editor for … I was like, "Okay, I'll do the Photoshop." And then I'm like, "No, that's too hard. That's too hard."

So I ended up just having an online editor with a couple filters. Back in the day, you could have like red eyes from your camera. And there was red eye removal, so you can remove red eyes. And looking at this right now, I think, oh, it was basically like Instagram. You open your photo, and you can apply some filter, some fancy one. And yeah, I should have got a newsfeed or something, like a mobile app. It was before Instagram.

Beyang Liu:

How popular was this app that you made?

Fedor Korotkov:

So it was the first big photo editing app on the VK platform. In the end, it had like 2.5 million installs.

Beyang Liu:

2.5 million?

Fedor Korotkov:

Million, yes. 2.5 million. And there were advertisements as well. As a student, I was earning money on the advertisements like this. It was crazy times. It was like Wild West before all the big companies. So I was just a student, like second course, created an app. It was earning me $1,000 to $2,000 a month back in Russia in 2009. I'm like, "I got this." But then at some point I realized that my code was just shitty. I don't know if you can say shitty on the podcast?

Beyang Liu:

Yeah, you can.

Fedor Korotkov:

But yeah, it was like spaghetti code. There was like no good object, like area. I think I didn't really use version control. But I realized that, thank God. And I was like, "Okay, I need to get some professional experience and go to a company."

So I went to a company nearby the university as a Flex developer. So it's like working, like doing this, like Flash. And at this company I learned about JetBrains because shortly after I joined the company, we started using IntelliJ, and I was like, "Wow, IntelliJ is so great." The tool was great, and I learned more about the company. I learned that it's one of the best companies in St. Petersburg, in my hometown, back in Russia. So I was like, "Yeah, I need to apply for a job there."

So I ended up at JetBrains and then spent there a couple of years. And just right after my master's graduation, I moved to San Francisco to work at Twitter. Yeah, the Twitter story is also very funny.

Beyang Liu:

Yeah. How did you land at Twitter from JetBrains in St. Petersburg?

Fedor Korotkov:

So, again, I don't think it's possible right now, but back in 2000 … what was it? 2012?

Beyang Liu:

Yeah.

Fedor Korotkov:

I learned … no, it was 2011. I learned that you can apply for an internship to work in a Bay Area company, and they will pay for the tickets. And you will spend three months in the Bay Area and work. And I was like, "Yes, I want to do this." I just didn't realize that it's possible that you, as a student in Russia, can go to do an internship in the Bay Area, in Google, Facebook, Twitter, companies that you only see in the news.

So I applied for internships at Google, Facebook and Twitter—the only companies I knew basically and I used. And only Twitter responded. So I did a couple rounds of phone interviews, and then they were like, "Okay, come to an onsite interview." And at that point I realized that they were not interviewing me for the internship position, but they just interviewed me for the full-time position.

Beyang Liu:

What happened? Did they just like mix things up or …

Fedor Korotkov:

I've never asked.

Beyang Liu:

You were so good that they're like, "We’ve got to get this guy."

Fedor Korotkov:

I've never asked. So hopefully, yeah. Hopefully they thought so. But I didn't want to jinx it, so I just didn't ask. I was like, "Yeah, sure. I'll come for an interview. You are paying for the tickets, right?" "Yeah." And, yeah, I came for the interview, and then … not shortly after it—took a little while to get all the visas and stuff—but then I joined Twitter. And after a couple years … it was a very interesting experience.

So, back in Russia, I worked mostly on client-side applications. So it was Flex, which is an application that you run in the browser. Then JetBrains is like IntelliJ. It's desktop applications, very complex, of course. There is a lot of different things, but it's still not the server side. It's not like distributed systems. It's not eventual consistency. You don't do much load testing of the server. It's a completely different beast.

So, when I joined Twitter, I worked in a couple of teams that were heavily backend concentrated. But one of my probably best experiences there was working on direct messages. So our team was rewriting direct messages that Twitter was using from an old Ruby and MySQL setup to a fancy Scala and key value storage type of thing. I learned a lot about like high load, about these problems where you had several data centers and so on and so forth.

But when I was working, I had a background in developer tools, like IntelliJ. I was obsessed with the … not obsessed, but I was into productivity. So at Twitter, I was teaching new hires how to use IntelliJ effectively at Twitter. I was collaborating with the engineering effectiveness department. And, at some point, I just …

Beyang Liu:

And engineering effectiveness was a developer productivity team inside Twitter?

Fedor Korotkov:

There were so many names. It was developer productivity. Then it was engineering effectiveness. So, yeah, developer productivity, effectiveness, awesomeness team department.

So I joined the Pants team. Pants is a build system that is used by Twitter. I think they moved to Bazel now, but I'm not 100% sure. It's been too many years ago.

And, yeah, I worked on developer tools. I worked on this build system, which was building the huge, humongous Twitter monorepo. And I was also part of working on the IntelliJ integration with it, because of my previous experience.

Beyang Liu:

What did the IntelliJ integration do?

Fedor Korotkov:

So, IntelliJ integration … just that you can open a Pants project in IntelliJ. It will get all the dependencies correctly. It will be like …

Beyang Liu:

Oh, I see. So just the auto builds, that sort of thing?

Fedor Korotkov:

Yeah, like internal intel. It will basically convert Pants' representation of the project into IntelliJ's representation of the project.

Beyang Liu:

Got it.

Fedor Korotkov:

So you'll get all the fancy completions, refactorings. So you won't have red code. So IntelliJ will know where to get the sources from, and so on and so forth.

I spent two years at Twitter. And then I joined Airbnb, because I wanted to have more impact. And Airbnb, at the moment, they'd just started the developer tools, productivity. I don't remember the name. But, basically, a developer tools team.

Beyang Liu:

Got it.

Fedor Korotkov:

There were like only two people.

Beyang Liu:

Got it.

Fedor Korotkov:

And they were also migrating from Ruby to JVM, to not Scala but Java. And I was like, "Yeah, I have expertise, and I want to help do it the right way, without spending too much time on the mistakes that everyone does." And at Airbnb, when I joined, there were less than 300 engineers. It's still a lot, but compared to Twitter where I think it was almost 2,000 when I left.

Beyang Liu:

Wow.

Fedor Korotkov:

It was a big difference. It was a… spirit of a startup. Because at the developer tools team, we had more projects than we had people.

Beyang Liu:

Wow.

Fedor Korotkov:

And you need to support your internal GitHub enterprise. You need to support your internal deployment system. Then you need to support some other tool. And it was a constant battle between getting time to do more work towards a better future, and the battle with the time that you need to spend to keep things not in complete fire.

Then it was all in San Francisco. At some point during that time, I met my girlfriend, now wife. And we …

Beyang Liu:

Nice.

Fedor Korotkov:

We decided to move to New York, because she lived in New York prior to the Bay Area. And I also spent a lot of time in New York, and we liked the atmosphere here. And we moved here. We decided to move. And I decided to take a fun-employment. Because I realized that I don't want to be doing remote work at Airbnb.

Right now, the situation is different. With COVID, now everyone is remote. But back then, Airbnb, as an engineering culture, was very centralized on the West Coast. It was Bay Area, and Portland, and Seattle. And I was definitely thinking that it's not going to work. It's hard. There's a lot of offline meetings that people are not used to calling in to.

Also, as I just told you, I was basically working all the way from the university. So I never had more than one or two weeks’ vacation. And I decided to do six months of fun-employment, back in the Bay Area, because I didn't want to move to New York and then have six months when I will just go to all the museums, go to all the stuff. I wanted to basically just spend time working on some things that I want to work on.

Beyang Liu:

There's some ideas in the back of your mind that you wanted to work on.

Fedor Korotkov:

Yeah, ideas. And, in the end, I had Cirrus CI. We moved to New York in December, and then January 2nd, I launched Cirrus CI publicly.

Beyang Liu:

Oh, wow. So you did not waste any time at all. I mean, you said it was six months of fun-employment, but it sounds like you dove right into … Okay.

Fedor Korotkov:

Yeah. I mean, yeah, in like three months, I had a prototype.

Beyang Liu:

Nice.

Fedor Korotkov:

And then I spent another three months to add all the security and tools, to make it not an MVP, but a real product that is useful.

Beyang Liu:

What was the motivation for you in wanting to work on a CI system? Were you dissatisfied with the CI systems that you had used and observed in the past? Or was this just you wanted a fun project to sink your teeth into?

Fedor Korotkov:

For sure. I was definitely unsatisfied with the different CIs. I won't name names. But at Airbnb, our team was responsible for supporting … what was it? Like four different CIs, because of different needs.

Beyang Liu:

Whoa.

Fedor Korotkov:

The mobile team used one CI. The Monorail team—that were deploying and running the humongous Airbnb Monorail application—they were using another CI. And there were two more CIs, I don't know why, but for smaller projects.

And the reason is, when your company grows … at the time, there was no single CI that can grow from a small team to a big engineering …

Beyang Liu:

I see. So you had the small individual … I remember Travis was hugely popular in the open source world, and for side projects. But then if you wanted to build it on a larger project, there was Jenkins, you had to set that up, or that sort of stuff.

Fedor Korotkov:

Yeah. So basically there was no single story. Because you will start as a startup, maybe using Travis or Circle. And then at some point you outgrow the capabilities, and you have enough resources to support or even build your own CI, or customize some existing CI to your needs.

That's why I was on two teams that were doing exactly this. We were customizing, working on making sure that this continuous integration and deployment process at the company was smooth and fast. But there was always migration. You start with something that works for small, then it would probably migrate to something that works for a medium size company. Then, if you're lucky enough to grow the company beyond some point, you just develop your own CI.

Beyang Liu:

So you wanted one CI to rule them all, that you could use when you're a single person or a small team, and scale all the way through into enterprise, vast scale.

Fedor Korotkov:

Right. So there were two motivations. One is exactly this: one CI that can seamlessly work for a team of 4 or a team of 4,000. It doesn't matter. It scales with your company.

There is a great example with Cirrus: there is a company called SmartTrend. When they started using Cirrus three years ago, they were just a handful of people. Now they are like 50 engineers. And they recently sold the company for like $2.8 billion.

Beyang Liu:

Wow. And they used Cirrus throughout? They didn't have ...

Fedor Korotkov:

Yeah, throughout. And they had no issues with scale, and just scaled with their needs.

Beyang Liu:

What gave you the confidence that you'd be able to build such a CI? Because, normally, when people say … when they hear you're starting a project, the number one important thing is focus. You’ve got to focus on a specific problem. You’ve got to focus on a specific market segment.

And so if I were following the conventional wisdom, I would say, "Fedor, if you're going to build this CI from the ground up, you don't want to tackle both the large enterprise and the individual use case at the same time. You've got to pick one and make it work well for that first." But here you're saying, "No, I wanted something that could work for everyone." What gave you the conviction that you had an idea that would tackle the entire market?

Fedor Korotkov:

I think it comes to my … when I think about a problem, I don't think about necessarily how to solve it, but I think about what is the ideal solution for this problem? And, at that time, I was thinking, "What is the ideal solution for … what is an ideal CI that I want to see?"

And, well, first thing was it should scale from a small company, from a handful of people, to thousands. Then I was like, "It should also be … I cannot say cloud native, but it definitely should utilize all the fancy cloud technologies that were available already at the moment."

Because the architecture of traditional CIs that people were using, like Jenkins, hasn't changed much in probably 25 years. There are, of course, new things, like Jenkins X, that is more modern, but it's a way to stay relevant kind of a thing. And it's better than the old Jenkins, that's for sure. Then, from this standpoint, I was like, "Okay, if I want it to have this and this, how should it look?"

About the confidence that I can build it, I don't know. I didn't have the confidence. I just had six months to try things. And it appeared to somewhat work out. Then the next six months was just working on the feedback from the users. And after the first six months, there were the first paying customers. And, after that, it's just steadily grown, and been like that for three years now.

Beyang Liu:

That's so interesting. The more I see, the more I start to believe that oftentimes the best way to solve a problem … it doesn't take a lot of strategic thinking or anything like that. All it takes is a single bright brain, and time and space to just think deeply about a problem, and iterate on it for a while, and get that solution to the liking of that single individual. And then, oftentimes, that yields a product, that it turns out a lot of other people like as well.

Fedor Korotkov:

Oh, that's for sure. I mean, my best ideas about CI were usually during showering or during driving. When I started working on Cirrus, I was thinking about it 24/7. I'm glad my wife understands it. So when we're in public, and I sometimes zone out and think about something, she's like, "Come on, stop it."

Just at some point, I can be sitting at the park, and then I'm just like, "Oof," and then I'm freezing. And then I'm like, "Oh, oh. Yeah, yeah." And then… so I'm analyzing. Yeah, that's a little bit weird.

Beyang Liu:

That is so true.

Fedor Korotkov:

Yeah. But then you end up with great ideas. And hopefully it's somewhat popular. I'm always very modest about saying Cirrus is popular, Cirrus is the best and so on. But it's somewhat popular, so people really enjoy it. There is really good retention. So people, when they start using it, they don't stop using it. This is the best compliment that you can do, to just continue using it.

Beyang Liu:

That's awesome. So I want to talk a bit more about the—you touched upon this a bit—about the different approach that Cirrus takes to CI, and specifically the design decisions and the architecture. Can you talk a bit about, from your point of view, what has been the evolution of CI systems? And what is the "old way" or the conventional way of doing things? And what is the way that Cirrus is approaching it that's different?

Fedor Korotkov:

Yeah, sure. I have a really good blog post about it. I think maybe we can include it in the show notes to the podcast.

Beyang Liu:

Cool. Yeah.

Fedor Korotkov:

But, essentially, I'm not that old. So when the CI system started, I only did historical research. So my research ended up suggesting that the first CI system was basically introduced back at Microsoft, for either Windows 95 or Windows 97. And so it was back in the '90s.

Beyang Liu:

And they forgot to include it for Windows ME, which had a lot of bugs.

Fedor Korotkov:

Probably. Yeah. They just outgrew the CI system. And they had scaling issues, as everyone has.

Beyang Liu:

Got it.

Fedor Korotkov:

And if you think about CI, what does it do? You have your source code somewhere. You probably should use version control, so you have your source code somewhere. And there is a system that, when you do a change, automatically runs some scripts. So you can think of it as: back in the '90s, you have your servers. They are waiting for new changes in the repository, and then they just run scripts, as you could run it locally on your machine. But because it's a server, it has more memory, CPU, and so on.

And then if you think about a generic CI system that people are using right now, most likely it's pretty much the same, with the exception that this CI will run the same scripts inside of a Docker container. But the idea is having these so-called workers—machines that wait for you, for new jobs, for new builds that they need to run, and build them.

So this is like a traditional idea. They were basically the only thing that … there are two things that were changed from the '90s. First, people used Docker containers and, second, they are probably—but not all of them—using autoscaling for these workers because these workers are not like physical workers in the data center, but virtual machines.

And sometimes it's not autoscaling. Sometimes it may be some person just changes the number from like 10 to 15, because someone told this person that there is queuing in the CI system. And they're like, "Okay. I just move it from 10 to 15," then this person forgets about it. And you spend on CI one and a half times more than you used to.

This was a big concern for me at Airbnb, because we were running hundreds of the beefiest CI machines for CI. And because it was very West Coast-centric, people were working maybe 40, 50 hours a week and for the rest, these machines were mostly idling.

Beyang Liu:

They didn't spin down. They were just on all …

Fedor Korotkov:

No, no. They didn't spin down. And I proposed we should fix it. We spent—I probably can tell now—we were spending six or seven million dollars a year, only on the machines for the CI, which were only loaded a third of the time. So you can easily … but at the time it was not a concern for management.
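As a rough illustration of the arithmetic behind this point (not something worked through on the show), the sketch below estimates how much of a fixed, always-on CI bill goes to idle machines. It assumes an idle machine costs the same as a busy one and takes the 40 to 50 working hours, the six to seven million dollars and the one-third load figure from the conversation; the function names are made up for the example.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def busy_fraction(working_hours_per_week: float) -> float:
    """Share of the week an always-on worker pool is actually in use."""
    return working_hours_per_week / HOURS_PER_WEEK


def idle_spend(annual_spend: float, loaded_fraction: float) -> float:
    """Part of a fixed annual bill paid for machines while they sit idle.

    Assumption: an idle machine is billed at the same rate as a busy one.
    """
    return annual_spend * (1.0 - loaded_fraction)


if __name__ == "__main__":
    for hours in (40, 50):
        print(f"{hours} h/week -> busy {busy_fraction(hours):.0%} of the time")
    for spend in (6_000_000, 7_000_000):
        idle = idle_spend(spend, 1 / 3)
        print(f"${spend:,} a year at one-third load -> ${idle:,.0f} spent on idle time")
```

Under these assumptions, a pool used 40 to 50 hours a week is busy roughly 24% to 30% of the time, and a bill of six to seven million dollars at one-third load leaves about four to 4.7 million dollars a year paying for idle machines.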

So, as I mentioned, the only difference is that it's probably scaling somehow and you run things in Docker. So the main idea behind Cirrus, from the architecture standpoint, was to utilize these cloud services. And if you think about it, this traditional CI worker, the only thing this machine does, is quickly provide you resources to execute your scripts, like in the Docker container or just …

Beyang Liu:

Yeah. So real quick, this worker model. So it was basically created at Microsoft for their original CI system. And you're saying that even in the age of … Sorry, was that?

Fedor Korotkov:

Yeah, yeah. Go ahead. Sorry.

Beyang Liu:

Okay. So the idea is that I have these worker machines that are just always running and then they accept jobs submitted to them, where the jobs run all or part of my testing suite, and that's been the existing model. And it really hasn't changed, even in the age of Docker and Kubernetes, where now we're running things inside Docker containers that may provide a more stable or consistent environment in which to run these things, but it's still the old worker model, where you have these machines waiting to accept requests. You submit the request to that, it spins off a worker and the workers execute the test, right?

Fedor Korotkov:

Yeah, that's right. So if we previously had multiple pools of these workers—for example, if you're a Java person, you can have a pool with Java version 6, and then you have a pool of workers with Java version 10, and then you have a pool of machines with Java 10 and SSDs, for example. And then each team in your company can specify which pool to execute their tasks on.

Then now you will probably have just a single pool of machines with Docker, and then people can specify the Docker image. But in a sense, it's pretty much the same. It's a bit fancier, but the same idea that you have these workers, first of all, that you need to maintain somehow. You need to make sure that there is this CI agent that can run these things. Then you need to make sure that you have the proper version of Docker installed if you are running Docker, and so on and so forth.

Beyang Liu:

And this is painful because, first of all, it's expensive to keep these workers lying around, especially when a lot of the time they're not doing anything. And then, two, if you want a machine with different specs, or you want to test it in a different environment, you have to fit it into one of the existing pools that have been provided for you by, say, the CI team or whoever spun up this infrastructure.

Fedor Korotkov:

That's right. And if you think about some exotic cases where you need, for example, for your CI you need a different version of kernels, like a Linux kernel to test when you need GPUs or SSDs. Then some exotic case that doesn't fit the regular Docker container and run things. Then you want to run FreeBSD, for example, then how to do it? I need to … you either maintain yourself—your team should maintain this pool of FreeBSD machines—or ask someone and wait for things.

So this is very inconvenient when you think about upgrades and how to maintain it and especially about how to do it effectively. And this is a lot of boilerplate, which is … small teams, they just cannot afford to do this. Of course, when they're a big enterprise, a big company, you have expertise in maintaining your own infrastructure. When you start, you use some CI as a service, you use something else as a service, everything as a service.

So the main idea behind Cirrus was not to use these workers at all, but to use … because the worker just finds you available resources to run your scripts. And back in 2017 when I was thinking about this problem, I realized that Google Cloud, at the time, could start a new virtual machine of your own configuration with your own image that you want, in less than a minute. And a minute delay for CI back in the day, was totally acceptable.

Then you think about Kubernetes. Kubernetes is a system … specifically, one of the main responsibilities of Kubernetes is to run your containers effectively. To effectively find a place, put it on a Kubernetes node and run it.

So at the time I realized there was no need to have these workers at all. You can just use the API of, for example, Google Cloud to spin up virtual machines if you need virtual machines, or use Kubernetes API to spin up containers, if your CI only needs containers.

This is the main architecture difference that allowed Cirrus to be the first CI as a service on GitHub to provide Linux, Windows and macOS CI all at once. Because back in 2017, there were no GitHub Actions; Travis ran only on Linux and macOS, the same as Circle, and then there was AppVeyor for Windows.

Because of this architecture, you just changed the image that you want to spin up your virtual machine. So you can use a Linux image, you can use a Windows image and that's all, and it just spins up and works. And because it's on demand, it's as efficient as it could be. There's no more efficient way to run things.
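The workerless design described above can be sketched roughly as follows. This is a toy model and not Cirrus CI's actual code: the class names, the `needs_vm` flag and the two fake backends are all hypothetical stand-ins for a cloud compute API and a Kubernetes-style API, and nothing here talks to a real cloud provider.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Task:
    name: str
    image: str      # VM image or container image, chosen per task
    needs_vm: bool  # True for a full VM, False when a container is enough
    script: str


class Backend(Protocol):
    def run(self, task: Task) -> str: ...


class FakeVMBackend:
    """Hypothetical stand-in for a cloud API that boots a VM from an image."""

    def run(self, task: Task) -> str:
        return f"vm[{task.image}] ran: {task.script}"


class FakeContainerBackend:
    """Hypothetical stand-in for a Kubernetes-style API that starts a container."""

    def run(self, task: Task) -> str:
        return f"container[{task.image}] ran: {task.script}"


class OnDemandScheduler:
    """Requests fresh compute for every task; no pool of workers waits for jobs."""

    def __init__(self, vm_backend: Backend, container_backend: Backend) -> None:
        self._vm = vm_backend
        self._containers = container_backend

    def execute(self, task: Task) -> str:
        # Pick the backend from the task itself; switching operating systems
        # is just a different image name, not a different worker pool.
        backend = self._vm if task.needs_vm else self._containers
        return backend.run(task)


if __name__ == "__main__":
    scheduler = OnDemandScheduler(FakeVMBackend(), FakeContainerBackend())
    tasks = [
        Task("linux-tests", "linux-container-image", needs_vm=False, script="run tests"),
        Task("windows-tests", "windows-vm-image", needs_vm=True, script="run tests"),
    ]
    for task in tasks:
        print(scheduler.execute(task))
```

The point of the sketch is that the scheduler holds no long-lived machines: the only per-platform detail is the image each task names, which is why adding another operating system does not require another pool.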

Beyang Liu:

What about startup times, and also caching? Because I could see the argument being made: if you have a long-running worker, it can start up faster than a cold start VM. And then maybe there's a local cache that it keeps around that could help make things more efficient too.

Fedor Korotkov:

Yeah. That's for sure. There is a trade-off for virtual machines. So if you want to run a virtual machine, then there is probably 20 to 40 seconds for the spin-up time, which is mostly acceptable if you need a VM.

For containers, the situation is different. For containers you can use Kubernetes, and Kubernetes, in our testing, can spin up a new container in 2 seconds, if all the things are aligned together. But even if not, I constantly measure P99 for the startup time of the containers and on Cirrus it's 7 seconds. So 99 tasks out of 100 will start in less than 7 seconds.
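
For readers unfamiliar with the P99 figure mentioned above, here is a minimal sketch of how such a number could be computed, assuming the nearest-rank definition of a percentile. The function name and the sample startup times are made up for illustration and are not Cirrus data.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Return the smallest sample such that at least p% of samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up container startup times in seconds: most are fast, one is an outlier.
startup_seconds = [2.0] * 80 + [4.0] * 18 + [7.0] + [30.0]

p99 = percentile_nearest_rank(startup_seconds, 99)
print(p99)
```

With these invented samples, 99 of the 100 startups take 7 seconds or less, so the single 30-second outlier does not move the P99.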

Beyang Liu:

Wow.

Fedor Korotkov:

And there is no work from our side. We just utilize Kubernetes. And, in our case, we utilize GKE, Google's Kubernetes, but Cirrus also supports AWS EKS, Amazon's Kubernetes, and even Oracle Cloud Kubernetes. So you can connect your own infrastructure and run your CI on your own cluster within your own infrastructure, which is also super beneficial. You can get all the discounts you have from the cloud provider. You have more control over configurations, like some edge cases and so on.

Beyang Liu:

Yeah. It sounds like a super versatile approach. Because essentially any computing environment that you can spin up via a cloud API—and there's so many offerings nowadays—basically, you can use as a CI or a testing environment via Cirrus.

Fedor Korotkov:

For sure. So for example, back in the day, the FreeBSD people came and told me, "We would like to have FreeBSD CI, and we can maintain the Google Cloud images for FreeBSD, and Cirrus can use them."

And over a couple of days we added FreeBSD support, and Cirrus is the one that provides it. It literally took us recompiling a Go binary, so cross-compiling it for FreeBSD, and just testing that everything works on FreeBSD. It was a huge win, and all because of this architectural approach from the very beginning.

Beyang Liu:

That's awesome.

Now, how does it compare to something like GitHub Actions, which … you mentioned Cirrus predates GitHub Actions. GitHub Actions came onto the market and it feels like with Actions they've tried to, to a certain extent, make it feel like it's not running in a worker somewhere. I think what GitHub is trying to go for is it's just this thing that runs in response to a pull request that you open and you don't have to worry too much about where it's running. You just get the readout. Has that changed the positioning of Cirrus at all? Or the differentiation of Cirrus with respect to the rest of the market?

Fedor Korotkov:

I don't think … of course GitHub Actions right now is the major CI on the GitHub platform. That's for sure. There were actually two versions of GitHub Actions. There were GitHub Actions before the Microsoft acquisition and GitHub Actions after. So I was very worried about the GitHub Actions prior to the Microsoft acquisition; back then there was an Actions beta that I was part of. But then the acquisition happened and basically GitHub Actions right now is just a fork of Azure Pipelines.

Beyang Liu:

Interesting.

Fedor Korotkov:

Yeah. So ...

Beyang Liu:

So they kept the brand, but it's completely different from the old Actions?

Fedor Korotkov:

In the very beginning, if you ran uname -a, you would see this … it's like “Azure blah, blah, blah, kernel.” So it's like the Azure pipeline.

So GitHub Actions, they are great. And they perfectly fit the needs of 99% of the people. But there are always some caveats, and one of the caveats of GitHub Actions, and there is a similar approach with Travis, is that you have a standard VM that is used for GitHub Actions. So there is a two-core, four-gig-of-RAM VM built by GitHub, with certain packages preinstalled. And then you have GitHub Actions, which are written in JavaScript or TypeScript; you can choose your own. You can do Docker containers as well, but most of them are written in TypeScript and they run on this VM. It's a similar approach because they have a pool of VMs. They get a VM from the pool, execute the build and kill it.

But first of all, they are one size. I think they're introducing more sizes, so more than two CPUs, but that is a niche. My main problem, and it's not only GitHub Actions but these kinds of CIs, is that they have a base virtual machine that people use, and it can change. And it's the 21st century: we're always talking about infrastructure as code, reproducible infrastructure and so on and so forth. And I think many people have been in a situation where they had some side project, or some project at work, that they hadn't touched in a couple of months. They come, they do some pull request, and it just fails. And you just don't know why. It just randomly fails because some third-party package got updated.

Beyang Liu:

In theory, it shouldn't fail because it's all containerized and blah, blah, blah. But if you're changing out the underlying thing, at the end of the day, the abstraction boundary's not perfect, right?

Fedor Korotkov:

It's not perfect. Basically, you can think of it as having a moving base. It just moved away from under you. I had so many issues when I worked at Twitter. We were working on the build system and there were so many issues with reproducibility of compilation. And I got fed up with all these issues.

So that's why I was like, "No, it should be as isolated as possible." This notion of reproducible builds from build systems should also apply here: your builds should be reproducible, your CI environment should be reproducible. You should easily be able to go back a couple of months, even years, and run CI, and it should be executed the same way as it was before, not with some random failures.

Of course, Docker containers, they help with it. But if you go through, for example, GitHub Actions documentation, they don't use—by default—they don't use containers even though you can.

Beyang Liu:

Got it.

Fedor Korotkov:

Then you can have these weird problems with underlying things changing.

Beyang Liu:

So despite GitHub Actions being right there and free to use for most open source projects, and Cirrus, by the way, is also free to use for open source, right?

Fedor Korotkov:

Yes.

Beyang Liu:

Okay. So, despite GitHub Actions just being almost embedded in GitHub at this point, there's still 70,000 repositories that have installed Cirrus. They're using Cirrus as their build system.

Fedor Korotkov:

Yes.

Beyang Liu:

Okay.

Fedor Korotkov:

Yep. And it's actively growing. It's active enough. Also, the idea in the beginning was to be very efficient. So Cirrus is using all the fancy spot instances, like preemptible VMs, to be as efficient as possible. It's also… we're a very small team, so we don't have a marketing team. We don't spend on advertisements. It's all organic growth. We're trying to be as tight as possible, in terms of money.

Beyang Liu:

Yeah. That's incredible. Can you talk more about that 70,000 user set? What are some of the more interesting projects or companies that are using you today?

Fedor Korotkov:

So, first of all, there is a variety of use cases. So this 70,000 is not only open source, but a vast majority of it is open source. But there are teams. As we were discussing, Cirrus is very versatile, so some people … with Cirrus, you can use it for free for open source projects. Then, as a company, you can use it for your private repositories with per-second billing on infrastructure that we manage.

For your personal, private projects, we have a very affordable plan of $10 a month. And then the ultimate thing to do is to bring your own infrastructure. So there you can connect your cloud. You can also connect any hardware that you have to Cirrus, to run CI on it.

And we have different companies that use it for different use cases. For example, one of the first breakthroughs was Flutter. It's Google's mobile framework for writing mobile applications at the time.

Beyang Liu:

The Flutter team at Google?

Fedor Korotkov:

Yeah. The Flutter team.

Beyang Liu:

The entire team?

Fedor Korotkov:

Yeah, the entire team.

Beyang Liu:

Wow. That's a huge project, isn't it?

Fedor Korotkov:

Yes. There are hundreds of engineers, I think, right now. And they started exactly as I described. They started just using it for free, for their open source. Then they outgrew the free limits. So they started paying for using infrastructure managed by us. Now, there are probably hundreds of engineers. So they now have dedicated engineers working on CI infrastructure. So they're now running on their own Google Cloud project. And this was one of the first breakthroughs.

Then, I mentioned Smart RAM. I'm super excited for them. I remember when they just started, they had some issues. And I personally was helping them to navigate and improve their CI scripts, which also improved Cirrus for future users. They recently got acquired.

Then there is also an interesting recent use case with PostgreSQL. So, for them, they needed to test a specific version of the Linux kernel. And you cannot do it on GitHub Actions, or other CI as a service. Because, first of all, sometimes they just want to do some patch to the kernel then build it, and run their own integration tests. So they're utilizing Cirrus to run it. So they basically build their own VM images, and run their tests on these images.

And then I mentioned FreeBSD. So a lot of Rust projects are using Cirrus to test on FreeBSD. Then Notable. Who else? Red Hat is using it as well. So they're using …

Beyang Liu:

Red Hat?

Fedor Korotkov:

Yes. Red Hat. They're using it also to run … so Red Hat has Podman. It's an alternative for running OCI images in a rootless way. So it's a bit of a fancier thing than Docker to run things, because you don't need root access. And they started using Cirrus a while back, because of a similar reason.

They wanted to first test Podman on specific versions of operating systems, with a specific kernel version. Because it's highly integrated and they just enjoyed it. And I'm really glad that they provided a lot of good feedback about their use case for running CI.

Who else? There is SonarSource. SonarSource is a … you're probably familiar; it's …

Beyang Liu:

Yeah, yeah, super familiar. So these are all great companies and logos. It's pretty incredible that you have such a diverse set of teams using Cirrus. Can you talk a little bit about what's coming up in the Cirrus roadmap? Anything exciting that you're able to share?

Fedor Korotkov:

Yeah. I don't want to jinx it too much, but there is definitely something coming … we are pretty late to the game, but we are launching the capability to basically SSH into running instances. Because Cirrus supports so many environments, you can have Azure, AWS, Google Cloud, or Oracle Cloud. You can have your own hardware behind a firewall. So we had to be a little bit tricky about this terminal support.

And, yeah, it's coming in the next few days. There will be a blog post we can probably also include. And we implemented it as an open-source Go package. So basically this package will allow anyone to integrate SSH access to their Go application. So, yeah, we can include it.

And one other thing, I think after Apple released their Apple Silicon …

Beyang Liu:

M1?

Fedor Korotkov:

Nowadays, everyone’s like, "We need to have ARM. We need to have ARM." And there were a lot of requests about ARM instances. And there is definitely something coming very soon.

Beyang Liu:

Cool.

Fedor Korotkov:

So I need to mention that Cirrus already supports ARM. So you can run on your own hardware or your own cloud. So you can have Oracle Cloud running ARM or AWS running ARM.

And we are very soon introducing managed instances. It will be, of course, free for open source. And open source will be able to make sure that their projects are very optimized for ARM processors.

Beyang Liu:

Cool. Would it be safe to say that the two biggest benefits of Cirrus are … there's kind of two, and they're separate, at least in my mind. One is just developer experience, the fact that all this is configured by hitting cloud APIs. And you don't have to manage a bunch of infrastructure, in order to stand up CI and scale it as your team grows. So developer experience, one.

And then, two, it's just the versatility of compute environments that you're able to support, because of this model. Is that a good encapsulation of the core value prop?

Fedor Korotkov:

You're on point. You're on point. Yeah, that's pretty much the two biggest things that I'm very proud of. And maybe one other thing I need to mention: we recently developed a tool called Cirrus CLI, which is, basically, it's open source. It doesn't know anything about Cirrus CI as a cloud thing. And it can locally run your Cirrus CI.

So even if people decide to move away from Cirrus, there is no need to convert existing Cirrus configuration into some other CI system, because Cirrus CLI can run it locally.

Beyang Liu:

Oh, wow. So I can use this to run my build locally?

Fedor Korotkov:

Yes.

Beyang Liu:

I feel like that's useful even if I'm not concerned about the switching thing. Because there's so many times where the thing failed in the build environment, but when I run the test locally, it seems to succeed. Now you're saying you could use Cirrus CLI to reproduce the CI environment on your local machine.

Fedor Korotkov:

Yeah. Only Docker images, and this …

Beyang Liu:

Yeah, yeah, of course.

Fedor Korotkov:

… and the most popular instances that Cirrus runs. That's why we were very late with SSH access, because you always could use Cirrus CLI to run your CI locally to reproduce things. And in 99.9% of cases, you will be able to reproduce it.

But then there are, of course, problems when you don't have, for example, FreeBSD access, or you don't have macOS, and your tests are only failing on one particular virtual machine, for example. Then it's more convenient to just rerun things. With the terminal access in the browser, you can figure things out and then fix it.

Beyang Liu:

Cool. Anything else that you want to call out? We're coming up on the end of the hour here. So are there any final calls to action, or things you want to highlight about Cirrus, for people listening?

Fedor Korotkov:

No, I was just really glad to join you on the podcast, and tell you and all the listeners more about Cirrus. I'm very excited. I'm biased, of course. But I think there were very interesting design choices, and just overall architectural decisions we made, that make Cirrus very unique. And it would be great if some people check it out. Maybe they will share my excitement about it.

Beyang Liu:

If they want to try it out, maybe install it on one of their open-source repositories, how do they go about doing that?

Fedor Korotkov:

They can just Google Cirrus CI, and it's as easy as installing a Cirrus CI application from GitHub marketplace. And we have a very nice integration with GitHub. We support annotation. So all the failing things will be highlighted right on the PR.

We support the fancy GitHub Checks API, where you … in most cases, you don't even need to go to the Cirrus UI, because everything is available on your pull request, in the UI inside GitHub. So, yeah, it's super easy. And then we have a lot of examples of how to use it, and we support, as we discussed, a lot of use cases and scenarios to test things.

Beyang Liu:

Awesome. Well, thanks so much, Fedor, for taking the time to be on the show. Real pleasure chatting with you.

Fedor Korotkov:

Thank you, Beyang.

This transcript has been lightly edited for clarity and readability.
