Episode 1: David Cramer, creator of Sentry

David Cramer talks about creating Sentry as an open-source side project, maintaining it while working full-time at Dropbox, and ultimately growing it into today's leading application monitoring tool. We chat about the emergence of new computing platforms, his thoughts on what's truly new and what's just marketing-speak for old ideas, and how he sees the landscape of monitoring evolving in the future.

Show Notes

Follow David on Twitter: @zeeg

Sentry: https://sentry.io/, https://twitter.com/getsentry

IRC: https://en.wikipedia.org/wiki/Internet_Relay_Chat

mIRC: https://www.mirc.com/

Logging tools: Splunk (https://www.splunk.com), Kibana (https://www.elastic.co/kibana) from Elastic (https://www.elastic.co/)

Datadog: https://www.datadoghq.com

Application Performance Monitoring (APM): https://www.gartner.com/reviews/market/application-performance-monitoring

Software Development Lifecycle (SDLC): https://www.tutorialspoint.com/sdlc/sdlc_overview.htm

Facebook's Scuba (internal performance monitoring tool): https://research.fb.com/wp-content/uploads/2016/11/scuba-diving-into-data-at-facebook.pdf, https://www.facebook.com/notes/facebook-engineering/under-the-hood-data-diving-with-scuba/10150599692628920/

Observability: https://thenewstack.io/observability-a-3-year-retrospective/

OpenTracing: https://opentracing.io/

OpenTelemetry: https://opentelemetry.io/

Distributed tracing: https://opentracing.io/docs/overview/what-is-tracing/

BDFL: https://en.wikipedia.org/wiki/Benevolent_dictator_for_life

Kubernetes: https://kubernetes.io/

Chef: https://www.chef.io/

Puppet: https://puppet.com/

BSD license: https://opensource.org/licenses/BSD-2-Clause

BSL (Business Source License): https://mariadb.com/bsl11/

Switching Sentry from BSD to BSL: https://blog.sentry.io/2019/11/06/relicensing-sentry

Django web framework: https://www.djangoproject.com/

django-db-log: https://github.com/dcramer/django-db-log

Monoliths, Microservices, Service-Oriented Architecture (SOA), Serverless: https://rubygarage.org/blog/monolith-soa-microservices-serverless

"Test in production" vs. "Don't test in production": https://opensource.com/article/17/8/testing-production

Blue-green deployment: https://martinfowler.com/bliki/BlueGreenDeployment.html

Five Whys: https://en.wikipedia.org/wiki/Five_whys

GitHub Actions: https://github.com/features/actions

Transcript

This transcript was generated using auto-transcription software.

Beyang Liu: Hello, and welcome to the Sourcegraph podcast. Today I'm talking with David Cramer, the creator of Sentry, a popular application error monitoring tool used by companies like Microsoft, Atlassian, Dropbox, and more. David tells me about how Sentry began as just 70 lines of code he wrote to answer a question in an IRC channel, how it caught on and has evolved with the emergence of [00:01:00] new computing platforms and paradigms, and how Sentry thinks about product marketing and its long-term vision today. It's the story of how an open source side project became one of today's most widely used tools for tracking and fixing errors in production. Stay tuned.

Alright, I'm here with David Cramer, former engineer at Dropbox, and now founder and CTO of Sentry, an open source tool for monitoring and resolving errors in production. David, welcome to the show.

David Cramer: Hey, thanks for having me, happy to be here today.

Beyang: So to kinda kick things off, what's your quick life story as an engineer, starting with how you first got into programming?

David: So I'm very much the stereotypical self-taught. I think the first thing that I really did was ... there used to be this thing, and I say used to be because it's not really well known these days, called, uh, IRC. And specifically there's this Windows client for it called mIRC, which had the most terrible scripting language you can possibly imagine. And so I just started writing scripts while I'm [00:02:00] chatting on the internet. Uh, I don't even remember what half of them did these days. The other half is probably not stuff you should, you should do as a professional, I would say. But I don't know. It kind of like, it got me interested in, I could build things, right? And then sort of in tandem, I was, I was a big gamer at that age, I think like a lot of kids, and I found that you could, you know ... or rather, I found you could engage a lot of folks by sort of building gaming. Like, you could basically take your hobby and merge it with this other hobby. And like, what I did was I ended up building, like, gaming databases. So I think the first significant one I did was for World of Warcraft, but basically, like, I data mined a lot of the code, or the, the details of the game, and published them on the internet. And this is right when WoW was coming out and getting popular and stuff. And I think that was like the first real endeavor I had where I'm like, "Oh wow, there's actually something serious here that I can do that's actually really interesting and fun versus I just built this silly script or something." Um, and then, you know, you kinda just continue that path and one thing leads to another.

Beyang: H- how did [00:03:00] you, uh, end up at Dropbox?

David: So, I'm from the Midwest originally, from Nebraska. And when I started doing this gaming stuff, which was when I was like 19 or something, I actually ... I took a job, I don't know if it was legally allowed to be a job, but I took a job with a company that was in like, they were ... it was a bunch of kids, we were all the same age, living in Germany. And we were basically building one of these World of Warcraft websites, one of these databases, and the company's called Curse, it's now part of Twitch, which I guess is part of what? Amazon, I don't know. It's just snake eating the other, but, uh ... And that actually ... so that took me to Germany for a little bit. I worked with that company a couple years and there's only like five of us at the time. And then it took me to San Francisco. And so I did that. Eventually, left that company, went to work with one of my buddies, left San Francisco, realized life was not nearly as interesting outside of SF, especially if you're really into software at the time. And, you know, I came back, worked for another tech company called Disqus, which, you know, isn't really around much these days, but it's, [00:04:00] you know, big internet comments provider, and did that for a while. Eventually, decided ... And that's also where I started Sentry formally, and then decided it was time for something else. And I kind of limited my options of the kind of job I wanted and it was based on what I enjoyed doing. And that was like, I enjoyed doing things that have like high reach, so kinda interesting scale problems. And I just wanted to work with Python, the programming language. And so, I was basically like, "Well, I can join Instagram or I can join Dropbox." And one of them requires me to commute to South Bay. So, so Dropbox [laughs] was a good choice for that.

Beyang: Awesome, awesome. And, you know, I wanna dive into kind of the backstory of Sentry a little bit, but before, uh, we get into that, I wanna first describe, uh, to our audience what Sentry is. So, you know, if you're describing Sentry to an engineer who's never heard of it before, uh, what is it?

David: So I probably have an infinite number of descriptors for it. Like, the way I usually try to reason about it is it's kind of a way to take it away from engineering first off. [00:05:00] But you load up the Uber app, it crashes, what do you do as a, as a user? Well, personally I would just load Lyft 'cause I have no brand loyalty. It doesn't really matter. And that's actually pretty common, right? And so when you take that in mind, that's the problem we're solving for, that's the reality of the world, right? And so we're like, "Okay, like the advantage of a company in that situation is to be able to respond to those customers', like, issues very, very quickly." So what we do is we collect all that information about that crash or other application errors, and we send that to the engineering team that is actually responsible for that application. So it really helps them diagnose and, and hopefully fix errors very quickly. I, I think the classic thing now is, if somebody knows what Sentry is and they experience an error in the world, they're just like calling out the company, it's like, "Hey, are you using Sentry?" Like, "Why aren't you fixing your bugs?" kind of thing. So I think it implicitly suggests that it's a proactive way for software teams to actually, like, resolve their issues, which is true, but that's still up to the software team at the end of the day.

Beyang: Yeah, makes sense. So w- when you say, you know, the, you know, the Uber app, if they're [00:06:00] using, uh, Sentry, they'll kinda send the errors over and their development team is able to kinda d- dig into those. Does that require kind of instrumenting the application with, uh, some sort of Sentry client-side agent? Or does it do smart things, uh, to kinda automatically figure out what to do?

David: Yeah. So we really try for a frictionless process. So I ... I'm the type of builder that is very pragmatic. I think it's interesting to solve interesting problems, just like anybody, but frankly, I'm super lazy and I just wanna like, get to like the cleanest, fastest solution I can. And, and that's the same for like when we build something for our customers. So it's like for Sentry, it's very much like you install our dependency, you tell us the API key, which we give you. And then most of the time it works from there. There's a lot of advanced stuff you can do that requires you to put in effort. But for the most part, 80% of the value is just out of the box, you don't have to do anything. That's actually been one of our big selling points for a very long time. And that, that's kind of the same for every device. It gets a little bit trickier, you know, if you're, say, writing compiled code or something, that's seriously complex, you have to set up some [00:07:00] of the processes. But for the most part, it works the same.

Beyang: Cool. I, I feel like laziness is almost like, uh, a key developer value. It's, it's good to be lazy in a lot of cases.

David: I think it's especially good if you're building productivity, or tooling, or you're sort of a product owner, because-

Beyang: Yeah.

David: ... because it's gonna hopefully get you to like the, the best solution you can with the least amount of cost.

Beyang: Yeah. Just, just the essentials.

David: Yeah.

Beyang: Cool. And kinda t- taking a step back, you know, nowadays there's, there's a whole kinda landscape of production monitoring tools, you know, things that help people diagnose errors in deployment environments. Can you kinda give us your view of that landscape and where Sentry fits in all that?

David: Yup. So I think there's a lot of ways to slice it, but the simplest way is there's sort of historical solutions. I bucket them into a few categories. The first would be logging. So you can think of that, like a Splunk or a Kibana-

Beyang: Mm-hmm [affirmative].

David: ... both pretty well known to folks. Uh, second would be sort of this monitoring persona, which is, [00:08:00] you know, graphs at the end of the day, it's usually graphs monitoring your system metrics. That's where Datadog would be the market leader. And then the third is what we would describe as an APM vendor, which would be New Relic. And the APM vendors are a little fuzzy 'cause they actually try to do everything these days. But the core of what they started as was, "I will tell you why your application is slow," which is the value that they originally gave you, right? That landscape's kinda changed over the years 'cause like you'll note, I didn't say anything about error monitoring in there. So, you know, once upon a time ... And that kind of all worked and people just dealt with the fact that it wasn't ideal. And I think a lot of the changes over the years from how we developed software sort of forced us to think about like, "Are these tools solving problems?" And then, you know, the error monitoring, I wouldn't call it a category, but the error monitoring functionality the way Sentry provides it, you know, which is quite drastically different than logs. That's become more of a necessity when we've had a lot more front ends, so mobile, browser, desktop applications that really need this rich monitoring. And on top of that, like, sort of the, the breadth of the internet and the depth, I guess, of an individual property has gotten much bigger. [00:09:00] So you end up with a lot more data, a lot more errors, a lot more customers. And so it just gets harder to reason about all the data. So, so kind of like, I think there's this new wave of things. So that, that was the old style, right? APM, all this other stuff. And we sorta think the new wave looks somewhat similar to the old wave, but the difference I would say now is there's application monitoring, don't get that confused with APM, and then there's systems monitoring.
And so systems, you can think of it as infrastructure monitoring at the end of the day. It's easiest to say, like, "If I am running the platform layer, that is, I'm building the layer that lets you ship your business logic, I need systems monitoring. I'm dealing with the servers, I have CPUs, I have disks, I need to rotate disks. I need to do all that stuff, right? All that makes sense. If I'm building a business layer, the business logic, and especially when you think about serverless and these kind of paradigms, I don't care at all about that platform layer. I only care about it when you force me to care about it, which should be never at the end of the day." And so we think there's a solution that's needed to really focus in on that, to focus in on this idea that you don't control the machine. And so, what's the relevant information to surface? And so that's really what we, what we say we're doing, what we're [00:10:00] trying to do, but like one version of that is error monitoring. And frankly, I would tell you the first version of that is error monitoring because it goes super deep, and the way we do it, not only does it go really deep, but it's tightly coupled to the commit graph and the source code, which is really all you can control as a developer. And so we think there's a lot more, um, concerns to address there, but that, that's fundamentally how I see the landscape. And it's like, we really just say, you know, we're building something for application developers, which, you know, that's a little bit of a spiel, like, it's not entirely truthful, of course, but you can kinda get the idea. And the cleanest separation is, if you say, "Well, we're building something for front end developers." Because people historically have thought of them as a, as a different persona-

Beyang: Sure.

David: ... and they are. But, like, so much of our business logic is on the front end these days, you can no longer ignore it. So it's a, it's a way that we center the gravity around to kinda what we think is most important. And frankly, a lot of the, like the infrastructure monitoring tools, they're, they're really mature and they're really, really good-

Beyang: Yeah.

David: ... but they don't solve the rest of the problem. So that's kinda how I see it and it's just those two scopes in the landscape.

Beyang: Yeah. Uh, that also makes sense. The focus on front end developers is interesting because, you [00:11:00] know, when you think about traditionally the types of people who are digging into errors in production, it's like the opposite of those folks, right?

David: Yeah.

Beyang: You know, very opsy, backendy folks, right?

David: Yep. And I think it's a ... it was like a necessity. So I don't know if you've ever worked at a company like this, but for sure, Dropbox was like this. If you wanted to know about problems in production, you're certainly not getting access to the server to go look at logs, right?

Beyang: [laughs]. Yeah.

David: And frankly, logs are actually pretty sensitive these days, right?

Beyang: Mm-hmm [affirmative].

David: Especially with things like GDPR. So you're probably just not getting access period. And, you know, Dropbox is a little bit ahead of the curve, just like every, you know, giant tech company has been, they had their own sort of Sentry, like, thing internally. Now, it wasn't nearly as good as Sentry. And I'm not saying that just because I built Sentry, but they'll tell you that too. But they knew that they needed something like that, right? And big companies recognize that problem. And so that, that's kind of how we thought about it, ultimately.

Beyang: Yeah, it makes sense. Yeah. As a front end developer going to Sentry, like what, what kind of data am I looking at? Like what does that interface, uh, look like? Is it ... A- am I just kind of [00:12:00] doing like open-ended data exploration on some, like, SQL table that I have with like pretty graphs? Or is it, is it something else?

David: Yeah. So that's actually ... In our case, we're very specific and intentional that we say, we're trying to give you answers, not, you know, search tools. So we're not, we're not trying to let you go exploring caves. We just wanna say that's where the treasure is and here's the zip line right to it kind of thing. And so what we give you is as much as we possibly can, which again, that's a little bit of a gray area these days, especially with privacy controls. But the baseline is we give you a stack trace, which most engineers should understand, but it's just like sort of the program instructions that were va- invalid. We do that in sort of specialized ways though, which is important. So in JavaScript, yeah, just JavaScript code, it looks like a normal stack trace. In compiled code, it's very different than a normal stack trace. But also, when you go to like cross language, we actually blend those together as well. So for example, if you're using React Native on mobile, we'll actually give you the JavaScript and the iOS stack trace sort of merged together. And so that's w- that's just one minor example of something we do that like makes it [00:13:00] easier for a human to understand what's going on. So we take that and then we actually, not only do we give you sort of that rich view, but we give you surrounding source code, which means you don't have to jump to an editor or to GitHub or something else to understand what's going on. We'll often, in some languages, give you things like context locals, which are basically variables that are defined during, uh, that exception, which is hugely valuable for debugging because that's your inputs at the end of the day. And then we'll try to give you a little bit of sort of user behavior, but PyHo, please don't ban us again. Uh, but we give user behavior in the sense of like, here's sort of the steps that led up to this bug, right?
So it might be the person clicked these three elements or something, or they navigated from this URL to this URL, to this URL and that triggered this bug. And that's actually really useful in JavaScript, but actually isn't useful at all in other languages. So we actually have taken a stance of like, "Let's specialize a little bit language by language 'cause it's, it's ... is actually different in each of those." And then we allow you to send whatever else is helpful. So for Sentry, for example, we send a lot of information about like the kind of customer. [00:14:00] So that might be like we say, "Well, this is a super important customer or it's a free customer, or ..." I'm trying to think, what we else ... what else we even do there. But we also send like diagnostic information, like this is the device it happened on, this is the operating system, this is the browser name, things like that, right? We used to famously tell folks you no longer have to guess if it's Internet Explorer that's broken, now you know it's Internet Explorer, [laughs] which was funnier when people still used Internet Explorer [laughing] and knew that ... what that meant. But, yeah. So it's really about like, "Can we just get you the right answer instead of you having to ask more questions?"
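
The pieces David lists here (stack frames, the local variables in each frame, the steps leading up to the bug, and customer or device tags) can be pictured as one event record. The sketch below is an assumption-laden illustration in Python, not Sentry's event format: `add_breadcrumb`, `build_event`, the field names, and the `apply_discount` bug are all made up for the example.

```python
import traceback

breadcrumbs = []  # hypothetical in-memory trail of recent user actions


def add_breadcrumb(message):
    breadcrumbs.append(message)


def build_event(exc, tags):
    """Turn an exception into a dict with frames, locals, breadcrumbs, and tags."""
    frames = []
    for frame, lineno in traceback.walk_tb(exc.__traceback__):
        frames.append({
            "function": frame.f_code.co_name,
            "filename": frame.f_code.co_filename,
            "lineno": lineno,
            # repr() keeps the snapshot serializable; a real tool would also
            # scrub sensitive values before sending them anywhere.
            "locals": {name: repr(value) for name, value in frame.f_locals.items()},
        })
    return {
        "type": type(exc).__name__,
        "message": str(exc),
        "frames": frames,
        "breadcrumbs": list(breadcrumbs),
        "tags": dict(tags),
    }


def apply_discount(price, percent):
    return price / (100 - percent)  # bug: divides by zero when percent == 100


def checkout():
    add_breadcrumb("navigated to /cart")
    add_breadcrumb("clicked #apply-coupon")
    try:
        return apply_discount(50, 100)
    except ZeroDivisionError as exc:
        return build_event(exc, {"plan": "free", "browser": "Firefox"})


event = checkout()
```

The last entry in `event["frames"]` is the frame where the error was raised, and its `locals` show the inputs (`price` and `percent`) that caused it, which is the "that's your inputs at the end of the day" point.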

Beyang: Yes.

David: Which for the most part we can these days.

Beyang: So i- it sounds like, you know, there's some, uh, kind of instrumentation monitoring tools that take the approach of like, "Hey, we're just gonna give you a bunch of key metrics like, m- m- that are derived from the application layer or the systems layer. And then you're supposed to go and figure out how to piece that together." It sounds like you're not doing that. You're kinda really designing for this, uh, front end developer persona first and foremost. Like, if you were in a front end developer's shoes, what are all the pieces of information that you would want [00:15:00] access to when you go diving into some production issues? Is, is that right?

David: Yeah, that's a good way to think about it. We basically just say, if you can't change the machine, what's the root cause? If you're out of memory, you can't add memory, how do you fix it? Well, you look at the code that caused you to run out of memory or the change in the code that caused that behavior to happen, right? So we actually say our goal ... And we don't do this all the time and it's still ... it's, it's a really hard problem. But our goal is to say root cause is the commit, the commit that created this issue at the end of the day. And that's, uh, that's true for 95% of failures at the end. Like yeah, you'll have incidents where say AWS goes down and that affects your service, but that's, that's not really what we're optimizing around. We're optimizing around velocity of, of, I guess, shipping new software.
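
One naive way to picture "root cause is the commit" is to intersect the files in a stack trace with the files each commit in a release touched. The Python sketch below is a toy heuristic under that assumption; it is not how Sentry identifies commits, and the `Commit` class, `suspect_commits` function, and sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    sha: str
    files: frozenset  # paths changed by the commit


def suspect_commits(stack_files, release_commits):
    """Return commits that touched any file in the stack trace, newest first.

    `release_commits` is assumed to be ordered oldest to newest.
    """
    in_stack = set(stack_files)
    return [c for c in reversed(release_commits) if c.files & in_stack]


release = [
    Commit("a1", frozenset({"README.md"})),
    Commit("b2", frozenset({"billing/discount.py"})),
    Commit("c3", frozenset({"ui/cart.js"})),
]
suspects = suspect_commits(["app/checkout.py", "billing/discount.py"], release)
```

Here only commit `b2` changed a file that appears in the stack trace, so it is the single suspect. A real system would need far more signal than file overlap, which is part of why David calls this a really hard problem.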

Beyang: Yeah, makes sense. Are there other tools that, uh, you see people using Sentry in conjunction with, or that you would recommend, you know, you- deploying alongside Sentry?

David: I think many. So the ecosystem these days, or rather a developer's world is very complex, right? Like you yourselves are sort of what we [00:16:00] described as like an adjacent tool and, and we always thought about it. There, there's a lot of stuff that can play well, I guess in an ecosystem that can make a developer's life better, which usually just translates to efficiency, right? I think the four core pillars of that are, uh-

Beyang: Okay.

David: And we sort of look at it like what's the old style SDLC, Software Development Life Cycle, which I didn't know what that meant by the way before four years ago. [laughs]. But we looked at that-

Beyang: That's-

David: ... and we're like [crosstalk 00:16:20]-

Beyang: ... that's a marketing term. [laughs].

David: Yeah, yeah. It's sort of like, well, okay, what you do as a developer is you write code, so you develop, you test the code, which, you know, is CI most of the time, these days, right? You release the code and then you need to monitor the code. And there's a lot of different paradigms of that. Some people call monitor "learn," which would be more like a product analytics angle. Uh, but we just think about those four, we're like, "Okay, Sentry is the solution to monitor it, but we're not the solution for any of the rest of it, right?" And so there's all these other adjacent tools that I think make a ton of sense there. And, and, you know, things like Sourcegraph certainly help the speed of development, like they help much earlier in the life cycle. We help much later. But we always kind of say like, we, we picture this like slot machine kind of [00:17:00] thing for a visual and you, you spin the wheel and Sentry's always gonna be a solution for monitoring. You don't need anything else, but you can swap out the rest of the components. So we're actually big believers that the ecosystem matters a lot because one company can't solve all the problems.

Beyang: Yeah.

David: But we also believe that we can solve all of the monitoring problems for an application. We [laughs] don't yet, but we will over time.

Beyang: That makes sense. You know, what do you make of, you know, you mentioned log aggregators, there's also things like Prometheus, which give you kinda like time series overview of like key metrics, there's distributed tracers. Are, are those all kinda tools that you eventually want to bring kind of under the umbrella of the Sentry use case? Or are, are any of them kinda like outside of, of your eventual scope?

David: So you might call me an anti-thought leader [laughs]. I am not on the bandwagon of observability. I'm not on the bandwagon of distributed tracing.

Beyang: Yeah.

David: I believe they're all just marketing gimmicks, but I will say that Sentry has implemented distributed tracing for its customers for a new performance product. And it looks wildly [00:18:00] different than anybody else's product. And why that is, is because we're very focused on what the output is. And I will also tell you that we have metrics that you can alert on and we don't collect any metrics. And so what Sentry has done, and this is an impossible problem to solve, so this will certainly not be our full forever solution, but we basically suck in a ton of events. In the most abstract way, just imagine we suck in a lot of events, some of them are errors, some of them are what we call transactions, which are a segment of a trace, uh, but it's a request response life cycle. We pipe those into the system and then we extrapolate from there. We extrapolate, is this a problem? Is it a new error? Is it a unique error? And we extrapolate, is it slow? Which is just the metric that's accumulated from all those together, things like that. And so I'm a big believer in that model. Unfortunately, like, if you can just imagine the amount of data in the world, it doesn't scale, of course, but that's kinda how we thought about it. So we will serve those things. I don't know about logs. I think logs have multiple uses, but I think traces in particular are just a means to an end. Like, the way I describe traces to most folks is Open- [00:19:00] OpenTelemetry, which is what the format's called now, it's just an open source version of New Relic's schema. Ultimately, that, that's the ... that would be my intention behind it. It's not some new kind of problem in the world. And so we've taken that idea because it's really good work. And I think the tracing product, they've got a lot of smart people behind it. We've taken that and we said, the product that people need is New Relic. The data might come from tracing, but the product output looks more like New Relic.
And so that's what we're building. There are some advanced questions you can answer by using like end-to-end traces. But for the most part, you're a mobile developer, you're not looking at what's going on way down the stack in the database, right? Like, y- you have no idea what that even means, you just need to know, is my mobile app performing well? And so that's like a really important lens we put on it, but like to circle back to logging, I think that's an interesting one because logs actually, they amount to a few things. One, they're like this duct tape solution for any problem you have. Like, if you look at Elastic, they're just like, "Put everything in a log," and, or Splunk, probably the same thing. And you can make metrics exist, and you can aggregate things together, do all this crazy stuff, which is not a super effective [00:20:00] solution. But we basically say logs are for audit trails. Like, first and foremost, it's for compliance and it's for security. You must have that stuff. It's not, uh, it's not an opinion, it's not a discussion point, that's like a mandatory requirement for, uh, all organizations. You can't change that, right? But then there's also this idea of logs for debugging. And that's really where it gets interesting because technically speaking Sentry collects logs with errors. We call them breadcrumbs, just like, user clicked this, user clicked this, that's just a log at the end of the day. And so we've thought a lot about, and we frankly will probably build it, just allowing developers to send us logs as well, because it's very helpful in the debugging process. But I think the challenge is, and rightfully so, nobody wants to pay, you know, duplicate fees for, for all of this stuff, right? So if we're collecting logs and they're sending them to say Datadog, and Splunk, and all these other things, they're not gonna be super thrilled about all the money they're just throwing away for no reason.
So I think that's a tricky area, yeah, but I, I do think tracing is just, you know, it's, it's a means to an end more than it is a, you know, different product.
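
The model David describes, where raw events come in and the metrics ("is it a new error?", "is it slow?") are derived afterwards, can be sketched as a small aggregation in Python. Everything here is an illustrative assumption: the event shape, the `fingerprint` rule of grouping by error type and function, the nearest-rank p95, and the threshold are chosen for the example and are not Sentry's pipeline.

```python
import math
from collections import Counter, defaultdict


def fingerprint(event):
    # Assumption: errors with the same type and function count as one issue.
    return (event["type"], event["function"])


def summarize(events, known_fingerprints, slow_ms):
    """Derive error counts, new issues, and slow transactions from raw events."""
    error_counts = Counter()
    durations = defaultdict(list)
    for e in events:
        if e["kind"] == "error":
            error_counts[fingerprint(e)] += 1
        elif e["kind"] == "transaction":
            durations[e["name"]].append(e["duration_ms"])

    new_errors = sorted(fp for fp in error_counts if fp not in known_fingerprints)

    slow = {}
    for name, values in durations.items():
        values.sort()
        p95 = values[math.ceil(0.95 * len(values)) - 1]  # nearest-rank percentile
        if p95 > slow_ms:
            slow[name] = p95

    return {"error_counts": dict(error_counts), "new_errors": new_errors, "slow": slow}


events = [
    {"kind": "error", "type": "KeyError", "function": "load_cart"},
    {"kind": "error", "type": "KeyError", "function": "load_cart"},
    {"kind": "error", "type": "TimeoutError", "function": "charge"},
    {"kind": "transaction", "name": "GET /cart", "duration_ms": 120},
    {"kind": "transaction", "name": "GET /cart", "duration_ms": 90},
    {"kind": "transaction", "name": "GET /cart", "duration_ms": 2500},
    {"kind": "transaction", "name": "GET /health", "duration_ms": 5},
    {"kind": "transaction", "name": "GET /health", "duration_ms": 7},
]
summary = summarize(events, known_fingerprints={("KeyError", "load_cart")}, slow_ms=1000)
```

No metric is ever sent by the application in this sketch; the counts and percentiles are computed from the events themselves, which is also why storing every event becomes the scaling problem David mentions.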

Beyang: [00:21:00] Got it. That, that makes sense. Can I, can I get you to talk more about observability? 'Cause, you know, f- full disclosure, another person we're gonna have on the show is Charity Majors of Honeycomb. And I think they, they basically coined the term. And so, you know, I'd love to get your kinda candid thoughts if you're, if you're willing to share on, on that kind of idea as, uh, as a movement and the issues that you can see with it.

David: I would say you've never had to force somebody to do something that's very valuable for them. M- maybe never is the wrong word there. And this idea that we're changing what monitoring is, I don't think it's true at all. And so I'll give you an example. If you collect traces and you do no work, so say you put them in there by default, right? And that's the idea of observability as it exists. And then you can go ask questions. Well, all your data's useless. You still have to go add new instrumentation to make it more useful. And that completely breaks the idea that we're doing anything different than we ever did before. We're still reacting to problems and then adding additional telemetry at the end of the day, it's just the reality of the world. Now, you could say, [00:22:00] well, one of the things we're enabling, because Honeycomb is based on the concept of Snuba ... or Scuba, sorry, Snuba is our internal version, Scuba from Facebook, which every big company has a Scuba, right? And we now have a Scuba ourselves. And that model is really good. That's ... but that's just an infrastructure service at the end of the day. It's just a very fast log search, is the way I think about it. And that's a really powerful thing, that if you said that was observability, just being able to, like, 100% just ask questions of your data, that's great. I, I truly believe in that. But observability to me, and a lot of these things you see, is just marketing and we, we've never been great at marketing, which is probably why we don't do any of those things, but we've also never needed it. Like, we don't really have to convince developers that error monitoring is valuable. We didn't even have a word for it for a long time. Like, we're just like, I don't know, we, we aggregate exceptions or something, right? And I think it's just, I don't see what people are doing as any different than what they were doing 20 years ago, ultimately. And that's my issue with it.
And it's the same with APM. Like, we're so- we're trying to solve the same problems that New Relic solved [00:23:00] 10, 15 years ago, whenever it was born, right? Now, they don't solve them that well these days, but the solutions were still really good ideas. And we're just trying to like make a better version of that really good idea. And that's where I think people ... Because it's, it's better to be different than it is to be the same or similar. And I think people are too focused on that and I just don't care about that. So I'm just like, "I don't know, we're kinda building APM at the end of the day. It's the same thing it's always been, it's like a category of stuff that sorta does these things and observability is just monitoring at the end of the day," and you know ... And, and that's kinda my like gripe, so that is, uh, I don't care about talking about it, I'd rather just be building a great product for people to use. And whatever you wanna call it, you can call it, but well, that's where I lie.

Beyang: Yeah, that makes sense. And you said something and I just wanna kind of like play it back to you just to kinda see if, if you, if you agree with this. Like I, I kinda got the sense that your, your impression of observability is it's trying to say that like, "Hey, there's, uh ... we have this completely new way of thinking about, uh, telemetry and instrumentation [00:24:00] basically like, you know, send every single event you could ever care about, uh, to our service." And so it's all accessible. You never have to worry about, you know, having forgotten to instrument something, having to go back and add that piece of instrumentation late- later, it's just this giant, you know, infinitely wide data blob, which you can kind of explore at your leisure. And what you're saying is that's kind of a pipe dream, that you're always gonna add, like some amount of instrumentation. Uh, there's always gonna be kind of a shortcoming in, in what you're able to collect. And then the process is just like, you know, add the instrumentation that, you know, you need now. And then at some point, it's gonna fall short and then go back and add more instrumentation later to get that data into, you know, whatever production dashboard it is you're using to debug those issues. Is-

David: Yeah.

Beyang: ... is that about right?

David: That's very much how I feel about it. And I will say the, the improvement we've gotten as an industry is the old way to add instrumentation by default was logs, right? Very unstructured. It was just human text at the end of the day. And we've at least improved upon that where a lot of [00:25:00] people do like JSON-based logging. So just key values, right? Which is valuable, of course. And tracing being built into libraries is actually very valuable because now we actually have real annotations. Now, the problem is if their annotations aren't good, you're still back at the starting point where it's not instrumented well. So I just think it's an unsolvable problem and you have to give people the right tools. And so the ... I think the tech that everybody's building is really good. And, you know, we use ClickHouse, which is open source. So you can think of that as, like, in a very naive way, it's open source Honeycomb, or open source Scuba, or something like that. That tech is super powerful. Like, we've replaced our entire logging infrastructure with ClickHouse at this point. And we can ask real-time questions on it, which is great. But we still have to record the right sort of preemptive answers to those real-time questions in it at the end of the day. And that, that's, that's where that falls down. But it's still a good iteration on technology-

Beyang: Yeah.

David: ... and so I think that's important.
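
To make the contrast between free-text logs and JSON-based key/value logging concrete, here is a minimal sketch using Python's standard logging module. The `JsonFormatter` class, the `render` helper, and the `fields` attribute are hypothetical names chosen for illustration; this is not how Sentry or any particular logging stack does it.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object of key/value pairs."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # `fields` is a hypothetical attribute used by this sketch to carry
        # the structured key/value context for the record.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def render(message, **fields):
    """Format one ERROR record without emitting it, so the result is easy to inspect."""
    record = logging.LogRecord(
        name="app", level=logging.ERROR, pathname="app.py", lineno=0,
        msg=message, args=(), exc_info=None,
    )
    record.fields = fields
    return JsonFormatter().format(record)


# Unstructured: a human can read it, but a query engine has to parse prose.
unstructured = "checkout failed for user 42 on /checkout"

# Structured: the same event as keys and values that can be filtered on.
structured = render("checkout failed", user_id=42, route="/checkout")
```

The structured form only helps with questions about keys that were recorded up front, which is the limitation David describes: a question about, say, the payment provider still means going back and adding that field.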

Beyang: You, you mentioned that the kind of the fundamental problem that you're trying to solve is kinda the same one that the New Relics and the, the Datadogs of the world are going after. Why do you think it is that they've been unable to [00:26:00] effectively solve that problem?

David: So I guess my ... I do wan- I don't wanna say arrogant, but my, like, real view of the situation is a lot of people build stuff for themselves and that's no different than me. Like, Sentry was just me solving my own problem. I think the difference between me and a lot of other folks is, uh, I know infrastructure, but I'm not just an infrastructure person. I'm very much full stack and I like building the end user experience, right? And so we quickly biased towards like, you know, we actually want this to be about the end user, the customer, versus we're building another systems monitoring tool and that's fine that people do that. But what you'll see is a lot of folks that are building new generations of technology, they end up being very biased and they're just solving infrastructure monitoring, like with another flavor of the week kind of thing. And that's probably not giving enough credit, but, but I think that's the problem. So we actually ... So we're launching our performance in July and we said JavaScript first. If we can't solve JavaScript, it doesn't, it doesn't happen because that's the most important language in the world right now. And that's not even subjective. And it's also much harder to solve for [00:27:00] than, say, Python. Like, we solved Python in like one week and JavaScript we've been at like for multiple months trying to make it work well. And I, I think it's just people stick with their comfort zone. I think technology also changes and big companies rarely keep up. So that's part of it, you know, like a lot of these newer sort of systems monitoring companies wouldn't exist if, if these big companies were able to keep a product that worked really well, right? And I think Datadog's on a really, uh ... Datadog's done a really good job of their core, but if you look at say Datadog APM, I'm like, "It doesn't really do anything." It's just like you slapped APM on this thing that actually doesn't do a lot for developers.
And for me, that's really frustrating because I gotta deal with this idea of like, "How is it different than Datadog?" And I'm like, "Oh, it's, it's another one. How's it different than logs?" Like, back to square one. But yeah, so I think a lot of it's like technology changes ... paradigms change a little bit, but not all that much. But you know, I think it's mostly just like scale of data concerns. So-

Beyang: Yeah.

David: So I think we'll keep seeing that over, you know, probably the lifetime of, of software, but-

Beyang: Would it be fair to say that [00:28:00] kind of the evolution of the production monitoring space of tools is kinda dictated by trying to solve the same essential problem, but for like each new computing platform or like paradigm that comes along, it's just kinda updating the, the model to fit to the ... to that specific world?

David: I think some of it. So I'm gonna give you an example of it's very much not that. So I'll get, I'll give you one on the ... So let's take microservices. I still refuse to believe anybody truly operates with microservices because the complexity is just, it's, it's ridiculous like, right? But service architecture has been around forever, right? And microservices, hypothetically in, like, in a macro level are no different than service architectures, especially at like a Google scale, right? And so that problem does not change anything fundamentally. Like, the problem is the same problem has always been, you solve it in the same kinds of way or kind of ways, right? On the counter side, and this is a real issue, so we have an internal service that uses GraphQL, which is Facebook's, um, graph query language that some JavaScript [00:29:00] people love, um, which also introduces a ton of complexity. And one of the first things you see with complexity is all of a sudden, any version of tracing makes no sense 'cause you have no aggregation points. 'Cause if you think about any kind of like performance overview, you need to be able to group things together, to be able to understand a meta h- a higher level characteristic. But the core idea of GraphQL is you can just specify whatever you want in the query at any given point in time. So there is no aggregation point at all. And so then you end up with this question of like, does there have to be a specialized solution to that? Or do we need to modify what we're doing? You just fit with a standard. And I actually don't know the answers to those. Like, there is actually like specialized solutions for GraphQL performance monitoring, which is mind blowing, like, "Why should that have to exist?" And now that's why I like, like certain curr- current generation of the world is like things like OpenTelemetry if they truly remain open, which, you know, that's always, uh, a risky area, of course. 
But, like, if they truly remain open and they get adoption, so they actually become a standard, well, maybe then we [00:30:00] start adapting to standards instead of sort of forcing ourselves to recreate everything. Because like, obviously as engineers, we just like rebuilding things over and over-

Beyang: Of course.

David: ... which sometimes it's good-

Beyang: Yeah.

David: ... but often it's, it's two steps back and one step forward.
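
The GraphQL point above, that performance overviews need something to group by, can be sketched in a few lines. This is an illustrative toy, not how Sentry or any GraphQL monitoring product works; the sample data, the `average_latency` helper, and the two key functions are all made up for the example.

```python
from collections import defaultdict
from statistics import mean


def average_latency(samples, key):
    """Group (request, duration_ms) samples by key(request) and average each group."""
    groups = defaultdict(list)
    for request, duration_ms in samples:
        groups[key(request)].append(duration_ms)
    return {name: mean(durations) for name, durations in groups.items()}


def by_route(request):
    return request["route"]


def by_query(request):
    # Fall back to the route for requests that carry no query body.
    return request.get("query", request["route"])


# REST-style traffic: the route itself is a natural aggregation point.
rest_samples = [
    ({"route": "/users"}, 20),
    ({"route": "/users"}, 40),
    ({"route": "/orders"}, 300),
]

# GraphQL-style traffic: every request hits one route and the work done is
# decided by the query body, so grouping by route blends everything together.
graphql_samples = [
    ({"route": "/graphql", "query": "{ user { name } }"}, 20),
    ({"route": "/graphql", "query": "{ user { name orders { total } } }"}, 300),
    ({"route": "/graphql", "query": "{ user { name } }"}, 40),
]

rest_overview = average_latency(rest_samples, by_route)
graphql_by_route = average_latency(graphql_samples, by_route)
graphql_by_query = average_latency(graphql_samples, by_query)
```

Grouping the GraphQL samples by route yields a single blended number, while grouping by raw query text only works here because the toy queries repeat exactly. With clients free to request any combination of fields, the number of distinct queries can grow without bound, which is one way to read the "no aggregation point" complaint.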

Beyang: Yeah. You know, on, on OpenTelemetry, I think one of the questions is standards kind of evolve to fit the reality that has kind of emerged. And then when that reality changes, it's often like ... standards are kind of a lagging indicator, right? There's always ... It's, it's often like an individual company that, you know, identifies a need that's unmet by the existing standards in, in the existing ecosystem that's tied to that standard and goes after that first. Do you think that OpenTelemetry is general enough kinda like future-proof?

David: Yeah.

Beyang: Or do you think that, you know, it's ... it has evolved for the kinda, like, the current trend in, you know, microservices or service-oriented architectures, and after that wave crests, then it, it might become obsolete?

David: So I think for what it does, [00:31:00] it's hard to predict obviously, but I think it, it does a good job for what it does and I don't see it being replaced. Now, the problem is it doesn't do everything. So we wrote up a spec, which was very similar to open- uh, not similar to OpenTelemetry that's, that's, uh, giving our stuff too much credit, but it was basically a distributed tracing spec. It was a distributed tracing spec based on our way of thinking, which is actually quite different than what OpenTelemetry achieves. And, uh, the way I would describe that is, OpenTelemetry is about small annotations, lots and lots and lots of small annotations. Like, I read something the other day, it was on Hacker News where I think somebody on behalf of Twitter was suggesting that they had traces with a million plus spans. And that's ludicrous to reason about. Like, our stuff won't even support that right now. And when I think about that, I'm like, "Okay, that probably makes it more useful, but the problem is when you think about spans, they're very, very useful 'cause they suggest like function call at the end of the day." But if you try to use spans for actual profiling, like C profiling, like you're gonna have hundreds of millions of these and it won't work at all, like it's too fat of a [00:32:00] data structure. And then on the counter side, if you need really rich information, which is what Sentry collects, it doesn't fit into that model because it's too big. Like, if you had a million of these spans and all the spans are like 10 times the size they are now, no systems are gonna be able to handle that. And so I do think there's still gotta be middle grounds. And I think OpenTelemetry, the core concepts will stick because the core concepts have been around forever and they make a lot of sense, right? At least, the core concepts of distributed tracing and things like described in the Dapper paper. OpenTelemetry has evolved to have a bunch of other things.
And I, I don't know if I have enough of an opinion to say they're good or bad, but it does feel kinda like all specs where it just gets bloated over time, and there's a bunch of stuff tacked on to please other people. So I, I don't know. We'll see. I do think it's good either way. It's good to have better instrumentation built into libraries, but I worry that there's still an adoption curve that's never gonna get past where, yeah, we'll have this pretty good standard, which could solve a lot of problems, but it never will be fully adopted, probably, if nothing else, because to [00:33:00] like implement the other side, the collector side, that's where a lot of opinions come into place and that can completely change how valuable the data is. So I don't know. And then I, I feel like there's not much else out there that's any kind of standard, right? Like even Sentry stuff, like at some point, uh, big company that I can't name suggested like, "Oh, we should open source the, the error schema and then they'll help us collaborate, blah, blah, blah." I'm like, "Why would we do that?" Like, w- what is there for us to gain as a company? We're like the market leader, open source, and it just helps our competitors. And, and I think on the counter side, I was like, uh ... Okay, this is a little bit of a segue, but one thing I always loved about Python was this concept of BDFL that Python had for Guido. And I love it because it said, "I have an opinion, it doesn't matter if I'm right or wrong, but at least we'll keep going in the same direction." I guess Linux was a little bit of this way too, right? And we basically said, that's what Sentry is. Like, it's open source, but it's not community built. It's very much like we decide what the direction is. And I think that's truly, truly important. And I think this is often where things become crippled and they just stagnate, is when there is not just like an overriding force of saying, "No, we're doing [00:34:00] this." Who cares?
Like just deal with it. If you don't like it-

Beyang: Yep.

David: ... that's your problem. And so I would worry that about OpenTelemetry, but we'll see-
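
A rough back-of-envelope version of the span-size argument above. The 500-byte figure is purely an assumption picked for illustration (nothing in the conversation gives a real span size); only the million-span trace and the "10 times the size" multiplier come from what David says.

```python
# Assumed average serialized size of one small span. This number is a
# placeholder for the arithmetic, not a measured or documented value.
ASSUMED_BYTES_PER_SPAN = 500


def trace_size_bytes(span_count, bytes_per_span):
    """Total payload of a single trace, ignoring compression and indexing."""
    return span_count * bytes_per_span


def to_megabytes(size_bytes):
    """Convert bytes to decimal megabytes (1 MB = 1,000,000 bytes)."""
    return size_bytes / 1_000_000


# A trace with a million small spans, as in the Hacker News anecdote.
small_spans = trace_size_bytes(1_000_000, ASSUMED_BYTES_PER_SPAN)

# The same trace if every span carried 10 times as much context.
rich_spans = trace_size_bytes(1_000_000, ASSUMED_BYTES_PER_SPAN * 10)

small_mb = to_megabytes(small_spans)
rich_mb = to_megabytes(rich_spans)
```

Under that assumption a single trace goes from roughly 500 MB to roughly 5 GB, which is the shape of the tradeoff being described: many thin annotations or fewer rich ones, but not both at once.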

Beyang: You, you kind of please everyone. That's the-

David: Yeah.

Beyang: ... that's the fallacy you, you fall into it.

David: And you can see like, Kubernetes, I don't know, like, is Kubernetes gonna exist in five years? It's hard to say. It's like super complex and there's a lot [laughs] of stuff that keep getting crammed into it. Like, um-

Beyang: What's, what's your take on that, prediction? Now, [laughs] will Kubernetes exist in five years? Go.

David: If it does, there's going to be another layer above it that anybody can understand.

Beyang: Yeah.

David: 'Cause like it's gotten so complex and so hard to reason about, and, and at this point I'm like, "Okay, now we're, we're falling back." And it's almost just, uh, a different version of Chef or, or Puppet or something else. It's just gone back into that bucket where it's just so complex and, and it's very case by case and ... I don't know. We'll see. But I, I feel like we create a lot of complexity in a lot of things we do as, as software engineers and it's sort of ... especially on the open source side. And we don't often do a lot to simplify that complexity. We just [00:35:00] find ways [laughs] that add more like, NPM is a good [laughs] example of this. But, uh, yeah, I don't, I don't know. As I, I remember talking to somebody, I was getting coached on doing a press interview one time and they're like, "Well, what do you care about? Like, what do you have an opinion on?" And I'm like, "Honestly, the, the biggest opinion I have is that we've made it really hard to write software." Like, when I started, you were able to FTP a PHP script onto a server and it was live. You didn't have to do anything else, it was just ... You could even edit the script or the, the file on the server and it just updated in real-time. Now, I've gotta like install a bundler, I've gotta have multiple services running, I've gotta figure out how to deploy this to AWS. So like I need all this other stuff, which, you know, for scale, a lot of that's good, but we certainly did create a massive barrier to entry for a lot of folks.

Beyang: Well, why do you think that is? Do you think it's, it's like necessary complexity just 'cause you know, modern applications had to be more robust, they have to reach more people and users? Or do you think it's, you know, there's some- there's something unnecessary about all the, all the stuff that you have to kinda set up on the, the deployment [00:36:00] side these days?

David: I think it's both. Some of it is certainly necessary. Like we've developed much better technology that, like, we have really rich, fast acting user interfaces, for example, right? And like that needs new technology and new hardware and all these other things. So I think some of it's needed, and then we do simplify some of it, but then on the other side of things we're just stubborn and we like reinventing things. And, and you can see infinite versions of that. And so I, I don't know. I, I think open source on one hand makes this better where if sort of predominantly there's a great open source solution to most core problems that are not your business logic, you're probably gonna use that frankly. Like why wouldn't you? And so I think that's valuable. But I s- but the counter side of that is also like, even like Sentry, when we switched our license to BSL, they're like, "Oh, just fork Sentry." And I'm like, "Great, it'll be a bad version of Sentry in the world." Won't be called Sentry, but it'll be a bad version of Sentry. And I'm like, "That, that'll, that'll be the outcome." And all of a sudden now, we fractured again, we went from like solving the problem very well, to now we have, you know, two solutions that may or may not be solving it really well. [00:37:00] And I think that's always where we break off. And I think a lot of that ... I don't, I don't know. So like I'm not formally educated, like maybe part of this is education. It's like academics and like you need to go through the like, like more like computer history, but the problem is it changes so quickly. And the benefit of tech is you don't need that barrier. Like you don't ... you're not required to go through that. So I don't know. It, it's kinda one of those things that I, I don't know there's a right answer.

Beyang: Yeah, yeah. And I also tend to think that like, you know, w- w- the history behind things, a lot of it ends up being narrative to kind of reinforce like present objectives too, I think.

David: Yeah.

Beyang: Like there's always that kind of tendency bias, not necessarily conscious, just to tell the story of the past, uh, in a way that favors what you want the present and future to, to look like.

David: That's definitely true.

Beyang: Tell me about open source. So you, you mentioned that, you know, if, if there's an open source version of, of something that you wanted to deploy in production, you'll probably want to use that. And I, I definitely think that ... like that's something that a lot of developers feel like, I'd much prefer to use something that's open [00:38:00] source than proprietary. And, you know, Sentry itself is open source. How did you kind of arrive at, at the decision to do that? And are there parts of what you sell that are not open source and how do you kind of decide what the, the boundary line between those are?

David: So, so for Sentry, it was ... it just started as open source because I would just open source everything 'cause i- it was fine, I didn't care. Like I never intended to build a business or anything like that, I, I like the open source community. It was, it was really nice to be able to engage with people that were interested in your work, right? And so that's kind of how it all started. And sort of, because of that and because we ... it's self-hostable, right? Like we actually just went down the path of like generally speaking, we can't use proprietary software because we're not going to be able to give that proprietary software to our customers and we want them to be able to use our product. Like we're optimizing around people using our product, like not profit or anything else at the end of the day. And I think over time, the open source ecosystem has shifted a lot in a sense of like when we raised money five years ago, everybody, except ... [00:39:00] I, I'd actually think literally everybody except our, uh, seed investor, uh, which was Dan Levine from Accel. And maybe even him frankly, they're like, "How are you going to make money on this?" And we already ha- we were profitable, we were making a healthy amount of money at that point. And they're still like, "How are you going to make money on this?" And I still truly believe this. My answer was, "It doesn't matter. If we can't make money, nobody else is going to." Like, that was very much our conviction, right? And it was more like, "Just stop asking me this, we'll figure it out. Like you don't ask other people that question, like they don't make any money and you give them a lot [laughs] of money." So it doesn't really matter. But ... So we started out as that. It was like, "No, we just like open source. Like we believe open source is a really good model and we don't see that, that there's a risk to it." And you know, over the years I've changed a little bit.
The next question was like, "Well, why won't AWS just run this?" Like if you build a big business, why won't they just do what they did to Elastic, right? And we're like, "It doesn't matter, we'll solve that problem if it happens." Like we'll just change the code base and say, "If AWS crash," or something like that, right? Like we jus- it's just like, that was not what was important to us. It's [00:40:00] like, we're just trying to build something that like all developers can use. And that was the fundamental goal forever and still is, it's written into our values at this point. And we actually recently, I don't know, sometime last year changed our license. It used to be BSD and we changed it to BSL, which is just a proprietary license. It's proprietary, but it's somewhat known, right? And so if you call it open source, a bunch of people on the internet will yell at you. I frankly think they're ... they should stop giving people a hard time and they should value the fact that free software is still free. But, but so that was our bridge. And that bridge, like we created purely because ... And I guess quick caveat: BSL, you can do anything you want with Sentry, keep the license intact, you cannot sell a competing product to Sentry. And after ... I forget what our timeline is, two or three years, it becomes Apache 2. So it's kind of, uh, like, uh ... In my opinion, that's a good license. And everybody-

Beyang: Got it.

David: ... that is doing open core or something similar should maybe consider BSL. [crosstalk 00:40:52]-

Beyang: So, if I wanted-

David: ... license.

Beyang: ... if I wanted to deploy like Sentry within my organization, I can spin it up for free and not have to worry about being bound by any [00:41:00] proprietary license, as long-

David: Exactly.

Beyang: ... as you don't launch a competing product. Okay, got it.

David: Exactly. And that's important to understand w- or why that is important is because we don't actually have any proprietary features or code as part of the core of Sentry. We have ... we do have proprietary code, but it's our infrastructure and things like that. Like what you see on GitHub is literally what we ship to production multiple times a day, right? We ship some other stuff like billing code and stuff next to it, but that's, that's why we wanted this because like we can still give it away for free, which was our goal, was accessibility. It wasn't, "Oh yeah, you should fork and use our code." And like, steal is legally the wrong word, but like you didn't build it, why should you have all these rights to do whatever the hell you want with it? You shouldn't. It's entitlement, right? And so that's ... We just took a moral stance, we're like, this is what we believe is right and just and we'll do it. And so we made that change. But frankly, we mostly made it because of, I won't name names, but like startups randomly taking our code and not crediting us. And I was just like, "I'm tired of this." Like I don't have the like the mental energy to deal with these annoying companies. It wasn't because we were afraid of AWS or anything like that. I think they'd rather partner than [00:42:00] try to run it, so it's not a big deal. But yeah, it's, it's just, we've always had that conviction. And so like by default, basically everything at Sentry has been open source and every company I've worked at has strongly encouraged or given freedom for open source, both, you know, Dropbox and Disqus were really, really big on that. Which I valued a lot and I think often goes unappreciated. But it would be cool to see more companies ... And I ... everybody hates on Facebook these days. But like, I'd like to see more companies contribute in ways like Facebook has. Like React is phenomenal for the industry, right? And they've done so much for that space.
And so, so hopefully we get to see more and more of that. Because like once you've got corporate sponsorship from that kinda way, right? It's not necessarily a community building React, it's Facebook.

Beyang: Yeah.

David: Like you're actually able to build a really great product. So, so yeah. But I don't know. I think it's a, it's a great industry, it's just open source that's generally hard to monetize. So I, I think-

Beyang: [laughs] Definitely.

David: ... that's the challenge for folks.

Beyang: I think there's, there's a lot of, especially developer tool creators who want to kind of make the right trade off there, you know, like developers love open source for a good [00:43:00] reason. You know, there's nothing worse than having a tool that breaks in some way where you can't dive into the source code and understand what's going on. But at the same time, you know, gotta pay the bills, gotta, you know, build an engine that actually sustains a business so you can continue to invest and, and improve the tool.

David: Yeah.

Beyang: And that's, that's kind of an ongoing challenge. Kind of going back in time to the origin stories of Sentry, so you mentioned that you, you started it while you were at Disqus. Can you talk through, you know, what motivated you to create it and you know, what, what the project was like in, in those early days?

David: So Sentry going back to even mIRC days, 'cause this is how old that is, I was in a ... so Django, the web framework, I was in a Django users channel. So similar to what a mailing list would be, and somebody asks very simple ... Like I would just help users, and somebody asks how to log errors to a database so they can put them on a dashboard. I'm like, "Hmm, why would you do that?" And I'm like, "This is how I would do it." And it was like 70 lines of code and I [00:44:00] published that, I called it django-db-log, very, very clever name. And, and that was that. And then I just like hacking on things, so over the years I added a few features. So I'm like, "Oh, this is kinda cool, I guess." And it, it was also during the time when Django was booming and I became somewhat prolific in those days in Django, in the sense of, I built a lot of things that, uh, Django folks used. And this was one of them. And when I joined Disqus, they were using this. And so I'm like, "Oof." And actually the first month, it might've been the first week, I actually took the Disqus service down and it got even worse because of django-db-log, because of the original Sentry. I just ... It couldn't scale in, in scenarios where you had catastrophic failures. And so it just took us longer to recover, right? So it's like, "Okay, let's fix this. Like let's actually make this work at the scale that Disqus was." 'Cause they were pretty significant in request volume at the time. And so that's when Sentry was born. It still had a different name, it was Django Sentry, but we'll, we'll ignore that detail. And then I was at Disqus for about three years, uh, the company always was pretty small when I was there. I, I think maybe when I left, it was 60 folks. So, you know, you can think of small as [00:45:00] like we had a lot of ... sort of autonomy and freedom, right? And I would just generally build a lot of open source stuff and I would try to improve our tools and things like that.
It's sort of my side projects. And Sentry, I just kept hacking on. And the thing that I think ... And people don't give open source enough credit, they often complain like, "Why should I, I contribute," or something like that. And then you talk about resumes and ... And for me it was like, well, I'm contributing 'cause like there's a bunch of companies that think what I'm doing is super cool. And I value that, that like makes me happier. And in this case it was companies like Eventbrite and Mozilla and Instagram. And, and all of us were like up and comers to some degree at the time, and all on Django. And so that was just exciting to me. So I got to network with a lot of peers that were really smart. And then over the course of three years, I asked my now co-founder to help start like Sentry the business per se. And so we did, I spun it up right before leaving Disqus. And then, you know, ran that I think for ... I don't know, between two and a half, three years, something like that during my transition from Disqus through Dropbox.

Beyang: Got it.

David: And then-

Beyang: So you're running the business on the side as you were a full time engineer at Dropbox?

David: [00:46:00] Yeah. And at first it was great. It was not a ton of work. You know, as you get bigger and success per se comes to you, it gets much trickier, uh, especially with an infrastructure monitoring tool. Like you're collecting a lot of data, there's a lot of services that are running, so ... and especially when you're bootstrapping it. Like I'm actually very proud to say, asterisks of whatever the, the corporate filings say, like I put no money into the company and nor did my co-founder, nor did anybody else. It was bootstrapped from $0. We didn't have to pay our own salaries, which was very fortunate. But it was effectively profitable from day one, day one until, you know, sometime in VC land. And then ... So we ran that for a few years, right? And I think, you know, when I was leaving Dropbox, I was there a little less than two years, Dropbox had grown really fast at that time. It'd become like, to me, what was a big company, it was over 1,000 employees when I left. And it was like 250 ish when I joined. And at, at that stage of the company, and for me at, at my, I'll call it, my maturity level at the time or my professional approach at the time, like I didn't like that kind of company. Like it wasn't empowering for me. I didn't feel like I could get anything done, I had to deal with a lot of politics. [00:47:00] I was like, you know, "This is not for me. I have this other thing that's become, uh, an intense second full time job. I can match my salary now, so I don't even care." And so we just, like myself and my co-founder, who was at GitHub at the time, we just quit our jobs. And we were like, "Cool, we can do this full time." And then like, ultimately as soon as you do that, you take it much more seriously. And we looked around and we saw all the competitors that had sprung up. And I'm a very competitive person. And the first thing I thought is I'm like, "We're certainly not going to let them take over, like take what we built kind of thing." Right?
None of them were open source or anything, and so we were like, "We have this moral high ground. And we've been doing this for so long, it's sort of like, we've earned this opportunity." Right? And so then we went out and we raised money. And when we did that, that was the turning point. That's when we said, "We're not raising money, just like, so we can hire some people right now. We're raising money to build this into the biggest thing that it can possibly become." Like, it was very much like, like we never talked about what an end was. We just said we build it bigger, and that's it. And so that's kinda what we've been doing ever since. And that's been five years now since we raised that seed round.

Beyang: [00:48:00] Yeah, that's awesome. Can, can you talk about ... So, you know, you, you mentioned that Sentry kind of grew out of Django roots and that was kind of, uh, an era of web development that was, you know ... It was like the, the age of the monolith, so to speak. And the landscape of course, has evolved over years, both in the form of, you know, like web development. Now there's more multi-service applications, serverless is a thing. And also, you know, additional computing platforms, like mobile has grown, there's things like IoT now, edge computing, if you consider that a thing that didn't exist a couple of years ago. How has, uh, Sentry evolved and adapted, uh, with the times?

David: So we've always iterated on what we do from, uh ... I think about it from a data collection or instrumentation standpoint. We've always tried to adapt to frameworks and stuff. But what I actually think is super interesting is our core design is still ... honestly, it's what you need still today in all paradigms. And I think that works really well for errors. It, it kind of makes sense, like all errors are kind of shaped the same. We [00:49:00] did have to make some changes along the way. Like, uh, a tangible example is when you are running iOS code, there's a lot of threads running and it may not be the main thread that's crashing or something. So we had to adapt the system to handle those kinds of cases. Or JavaScript, all the compo- all the code is compiled and minified and obfuscated, right? And now it's transferred like from two different languages into like the, the like code you're actually running. So we had to adapt the system to transform everything, right? And that's basically the model we still run today. It's like, we just have transformers that bring it back to the core and it works for every application in the world. And I actually say that like, literally, like we have so many different kinds of applications, to me that's like really exciting, like, like console apps, browser, desktop, IoT, like firmware. Errors are coming up to us from every ... like nearly every web app in the world. Like it's, it's just really cool that, that core works so well. And you know, we're a monolith ish ourselves and we actually even make sure Sentry supports monoliths in a really good way as we try to make sure it supports service oriented in a really good way. [00:50:00] And some of that was just because, you know, we decided to go down this route, so we had to solve our own problem first, right? And so I think that's been, that's been really good for us. 
I think some of the challenges have come down to it, like I mentioned, the GraphQL thing, which is tricky. When you get into performance, it's much, much harder. Like a crash is kind of a crash, no matter how you look at it. With the caveat of like, what's a crash in the browser, right? That's a little fuzzy, but you know, you can reason about it. But what is a ... So we call our performance stuff transactions, so a request response, right? That's a transaction. What's a transaction in a mobile app? I don't know, it's kind of whatever you think it should be, frankly. I mean, it's a view, you know, maybe it's some other action that you take or something. And so we just ... Like, we spent a lot of time trying to reason about how can we create a model, just like our error model is, that works for all of these paradigms. And that's why I went back and I was saying like, you know, everybody solves for the easy problems first, they solve for infrastructure. It's a very s- simple known quantity, JavaScript, desktop and [00:51:00] mobile, IoT is even worse. But like, those are really complicated thin- like scenarios to solve for, especially if you want a truly universal solution. And, and you might argue universal doesn't matter, but it does matter because applications are so complex these days and, and the same argument for tracing is true here. Like your mobile app is communicating with your API. I don't need to understand the API as a mobile developer, but I do need to know that it's the API fault and I need to know it's ... And the only way I can truly know it's the API's fault is because the API changed and the root cause can be, can be diagnosed in the API itself. And so because of that unique connectivity, which is why you actually need that universal platform. And so that's been a lot of our challenges. But I think the core of like ... And going back to, like, I think we're just recreating new solutions for the same old problems. 
And I truly believe in that because Sentry is like, I don't know if it's 11 or 12 years old, but it's still the same as it was, you know, that long ago, fundamentally at its core. So-

Beyang: Yeah. Do you, [00:52:00] do you view performance monitoring as a different problem from error monitoring or, you know, are they one and the same? And if they are one and the same, you know, how, how do you decide what actually is an error versus just, you know, tolerable slowness in, in an application?

David: So I would say they're different, but I would reason about ... So to give you insight, like we set Sentry's goal, if, if what Sentry does, is it says, "Hey, here's an error. I'm telling you about it because you caused it, here's the commit that caused it." Like if that's fundamentally what we're trying to do today. What we're shifting towards is, "Hey, you made a change to your code, here's all of the things that went wrong." Like if you can imagine anything that can go wrong from that, we want to be able to identify any of those problems. I want to call them all errors though at the end of the day. They might be defects or problems or something like that, or in Sentry terms, issues. But they all exhibit very different characteristics. And I think some of it's, frankly, up to the business to decide what it means, right? Like my analogy is always like, if the Uber app [00:53:00] crashes, obviously we know that's bad. There's no questioning that. But if the Uber app sits at a loading indicator for 30 seconds, is that bad? It's probably the same as a crash. For all we know it's going to sit at that loading indicator forever and it's never going to change, right? That might not even be slow behavior though at that point, that's just something is not working and isn't causing an error and isn't slow. And so that's actually another thing we think a lot about. Like if signups go to zero and you can correlate it with a release, that is a software monitoring concern. And that's something that most people don't try to address with their monitoring tools. But it's such a fundamental, like not only is it super critical, but it's such a fundamentally easy thing to, to do too. It's just like, "Okay, requests to the signup endpoint flatlined, or successful requests at the very least. We made a change, it must be related," you know, there's a high correlation, right? Like, and so that's fundamentally how we thought about it. 
So I, I, I think if you live in APM land, you say it's all the same thing 'cause it's performance. But in an abstract way-

Beyang: Yeah.

David: ... if you're a developer, performance generally means latency.

Beyang: Yeah.

David: It doesn't mean [00:54:00] errors. So ...
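
[Editor's note: the signup example David gives above (a metric flatlines shortly after a release) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Sentry implements it. The names `Release`, `find_flatline`, and `suspect_release`, the per-minute bucketing, and the thresholds are all hypothetical.]

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Release:
    """Hypothetical release record; deployed_at is a bucket index (e.g. a minute)."""
    version: str
    deployed_at: int


def find_flatline(counts: Sequence[int], min_quiet: int = 5,
                  min_baseline: float = 1.0) -> Optional[int]:
    """Return the first bucket where the series drops to zero and stays at
    zero for at least `min_quiet` buckets, or None if that never happens.

    A flatline only counts if traffic beforehand averaged at least
    `min_baseline` per bucket, so a metric that was always idle is ignored.
    """
    for start in range(1, len(counts) - min_quiet + 1):
        window = counts[start:start + min_quiet]
        before = counts[:start]
        had_traffic = sum(before) / len(before) >= min_baseline
        if counts[start - 1] > 0 and had_traffic and all(c == 0 for c in window):
            return start
    return None


def suspect_release(counts: Sequence[int], releases: List[Release],
                    max_lag: int = 10, min_quiet: int = 5) -> Optional[Release]:
    """Return the most recent release deployed at most `max_lag` buckets
    before the flatline began. This is correlation, not proof of cause."""
    start = find_flatline(counts, min_quiet=min_quiet)
    if start is None:
        return None
    candidates = [r for r in releases if 0 <= start - r.deployed_at <= max_lag]
    return max(candidates, key=lambda r: r.deployed_at, default=None)


if __name__ == "__main__":
    # Successful signups per minute; a release goes out at minute 4.
    signups = [12, 9, 11, 10, 13, 0, 0, 0, 0, 0, 0]
    releases = [Release("1.4.0", 0), Release("1.4.1", 4)]
    print(suspect_release(signups, releases))
```

[The sketch only flags a suspect. As David says, it is a high correlation that a person still has to confirm against the actual change.]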

Beyang: And so like when, when you're defining, you know, what constitutes an issue or not, is that something that you define in kind of the application code? Like, "Hey, if this condition happens, send, uh, like a thing to the Sentry API," or is it something that is more defined like after the fact, like you're collecting all this data, you know, you're, you're collecting things like request response times and you notice a spike in, in one particular metric?

David: Yeah. So for errors, it's very straight forward. It's like, it's like a try catch statement effectively. So like everybody kind of gets that. For performance, we basically collect it all. We're going to have some sampling algorithms down the road, but right now it's just collect it all, you can client side sample, if you want. And then you can ask questions. So going back to this observability idea, like we have the data so we can, we can monitor well. We might find we don't need to store the data. But there's a lot of value in having access to that data. And ideally, it's cheap enough to store it. So our solution is just like errors are in there, transactions are in there, and we collect some other kinds of data, like some security reports and things like that. [00:55:00] Um, we just call them events internally. So just, we collect a lot of events. And the advantage that we have with events and doing it the way we do it is, it's all structured. There's all ... There's meaning to everything. And that's such a powerful idea. Like never build a platform, just build structured data, like build schema. Schemas are so helpful in everything in life. Like, like abstractions are, are valuable, but like, you know, you can't have everything be an abstract, yeah.
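
[Editor's note: a rough sketch of the model David describes above, with errors captured try/catch style, every transaction collected with optional client-side sampling, and everything stored as structured events. This is an illustration only and is not the Sentry SDK. `Event`, `Collector`, `run_guarded`, and all field names are hypothetical.]

```python
import random
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Event:
    """Hypothetical structured event: every field has a fixed meaning."""
    kind: str  # e.g. "error", "transaction", "security_report"
    name: str
    tags: Dict[str, str] = field(default_factory=dict)
    data: Dict[str, Any] = field(default_factory=dict)


class Collector:
    """Keeps every error; keeps transactions at a client-side sample rate."""

    def __init__(self, transaction_sample_rate: float = 1.0,
                 rng: Optional[random.Random] = None) -> None:
        if not 0.0 <= transaction_sample_rate <= 1.0:
            raise ValueError("transaction_sample_rate must be between 0 and 1")
        self.transaction_sample_rate = transaction_sample_rate
        self.rng = rng or random.Random()
        self.events: List[Event] = []

    def capture(self, event: Event) -> bool:
        """Store the event; return False if it was sampled out."""
        if (event.kind == "transaction"
                and self.rng.random() >= self.transaction_sample_rate):
            return False
        self.events.append(event)
        return True

    def capture_exception(self, exc: BaseException, **tags: str) -> bool:
        """Turn an exception into a structured error event."""
        event = Event(kind="error", name=type(exc).__name__,
                      tags=dict(tags), data={"message": str(exc)})
        return self.capture(event)


def run_guarded(collector: Collector, fn: Callable[..., Any], *args: Any) -> Any:
    """The 'try/catch' part: record the error, then let it propagate."""
    try:
        return fn(*args)
    except Exception as exc:
        collector.capture_exception(exc)
        raise


if __name__ == "__main__":
    collector = Collector(transaction_sample_rate=0.25)
    collector.capture(Event(kind="transaction", name="GET /signup",
                            data={"duration_ms": 182}))
    try:
        run_guarded(collector, lambda: 1 / 0)
    except ZeroDivisionError:
        pass
    print([(e.kind, e.name) for e in collector.events])
```

[Because every event shares one shape, questions like "how many errors named X carry tag Y" become simple filters over fields instead of free-text log searches.]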

Beyang: Yeah.

David: AI doesn't truly exist these days [laughing], so-

Beyang: It's just pattern matching [laughs].

David: Yeah.

Beyang: Yeah. Can I get your like opinionated take on best practices and anti patterns in production monitoring? Like if, if I'm, you know, leading a small team inside a large organization, or, you know, if I'm developing a new application myself, what, what are the common pitfalls I'm going to fall into and what should be kind of the framework that I try to use when I think about like what tools I should use and how I go about having my team debug production issues?

David: [00:56:00] Yeah, I don't know. I, I think there's this concept test in production. I hate that. [Laughs] Like one time I had somebody we had hired, and they're like, "Hey-"

Beyang: Can, can you, can you explain that concept first to just people, yeah.

David: Okay, so test in production. I'm just gonna take it in the literal sense. Like, [laughs] we didn't really test this code well, we shipped it to production and it broke, and that's okay.

Beyang: Yeah.

David: That's not okay, that's never okay. I had to explain to an employee one time that that's never okay. It doesn't matter what our software is for. Like our software is for detecting those bugs and making it so you can quickly remediate them, right? But to quickly remediate, you still have to do a deploy, you have to deal with customer fallout. So it's never okay. And that's why I don't like this idea of test in production. Like it's a really like cheeky way to say that you should be willing to accept risk, right? It's, it's move fast, but don't break anything. Like that's, everybody's objective. And it says [laughs] ... And I, I don't know, that's such an important concept to me, so ... And what I'm alluding to with that is just like CI and CD is super important. Like with, with any new software or any new team, the first thing I would do is [00:57:00] focus on a fast release life cycle. And the unfortunate truth is that's really hard to maintain, like as applications get bigger and more complex, as infrastructure gets bigger and more complex, it's really hard to do accurate testing or fast testing or both. But I think that's the most important thing you can do from a sort of-

Beyang: Okay.

David: Just prevent the, the need for production monitoring. But then I would say you adopt first application monitoring before any kind of systems monitoring. Like, yeah, you might need systems monitoring, you might need metrics and stuff, but you can get away a really long time without any of that. Like, I truly believe if you throw Sentry into your application, you might not need anything more until you're like a pretty significant size of organization, because errors are going to be your number one enemy at that point. Like you're going to catch a lot of things in development, but you're not going to capture, capture everything. And so I think it's important to do something there. I don't know though. I, I frankly don't spend a lot of time. Like I set up Sentry on new projects now and I'm like, "It kinda works." I usually have to tweak a few things and make it fit, I may need a little bit more, but ... [00:58:00] Because everything else it's like the barriers are so high. Like we have this CI service that we run, it's public, but I'm pretty sure we're the only consumers of it. And it runs all these different services. It runs a queue, and it runs, uh, like a database and they're all offloaded to somebody and there's no ops people. It's literally, I just do it in my free time. And it's always breaking. And it's never breaking because of code anymore, it's breaking because like really cryptic stuff is going wrong. And I'm like, it is so-

Beyang: Yeah.

David: ... hard to get the right instrumentation in there.

Beyang: Mm-hmm [affirmative].

David: And so, I, I don't know. And that's why I'm like, that can't be the solution to the problem.

Beyang: Mm-hmm [affirmative].

David: There have to be better ways to solve it.

Beyang: Makes sense. What are your thoughts on deploying on Fridays and kind of, uh, the risk that occurs?

David: I'm of the mindset that avoid incidents, but don't avoid them. But it's like security by obscurity, right? It's the same thing. Don't deploy on Friday 'cause it might break everything. Don't break everything. Like that's a better solution than don't deploy on Friday. Like, [00:59:00] I don't know. 'Cause we actually had this problem a lot as a young company because we actually have three offices, um, one in San Francisco, one in Vienna, which is plus nine, I think, uh, and then one in Toronto, which is, you know, plus three for us or minus, I guess actually. And so they will deploy in the middle of the night and still today at 120 plus people in the company, they will sometimes take down everything. For the most part, they, they're fully staffed and they can handle it. But those les- those situations suck for both the customers and the company, but you, you have to accept them. And, and you accept them because then you say, "How do we let this never happen again?" And that's super important. That's like a super critical part of incident management, right? And I, I think that's, that's what you do. You say, "It's fine to deploy on Fridays." Don't deploy after 5:00 PM because nobody should have to like, you know, get up just because there's some ... You ... I think those rules are kind of okay, but I also think it's fair to empower people and I don't know. I'm a believer in, you know, hopefully we're all adults. You know, clearly with current events in the world, that's not [01:00:00] true. But I would like to believe that that ideal is fine to the sense where like anybody at Sentry can literall- anybody with a corporate email address in Sentry, uh, unless we change it, can deploy Sentry. And actually, I think they can deploy any application in production. Now, they can't necessarily change the application, but they can initiate the deploy lifecycle. And we say, "That's okay," because CI is responsible for saying this is safe to deploy. 
If it goes into master, it better be safe. And so I, I think that's really what you gotta aim for. And, and monitoring is your safety net to the point where we sort of said, our 10 year vision is like, we'd love to get monitoring to be part of testing. Like if you could Canary every change and actively use monitoring, so you, you still write tests obviously. But if you could just use real monitoring, real production monitoring, where you could diagnose those things that aren't tested, like you could do it before it ever happened to your customers. That's the ultimate solution to the problem, right?

Beyang: Interesting. So would that, like in your CI pipeline, you kind of stand up like an instance of your application with all the monitoring stuff, you'd [01:01:00] hit it with kind of, uh, a test workload and then you'd observe the, the metrics. Is that kind of the idea?

David: Yeah, that, that's like, uh, a practical way you could do it right now today, right? Like-

Beyang: Yeah, yeah.

David: ... we built this thing during a Hack Week where all it did was, it was basically a web scraper. It just tried to ... I forget what I called it, inhuman, I think. It just tried to simulate some really poor user behavior, like really dumb AI, right? And we actually ran it. And so it would inject Sentry into websites, and, like, man-in-the-middle the websites, and it injects Sentry, and then it would just click around and fill out forms and stuff, right? And the demo for Hack Week was kind of interesting because like some sites, it was really hard to pull off. Like, I don't know what airlines are doing, but, but for example, there's this pizza company in Canada called Pizza Pizza. And if you loaded this up on their website, it found all the bugs. Now, as a user, you load up the website and you instantly find all the bugs, it's just broken all the time. But like, I'm like, that's, uh, like, uh, that's a visible version of what the future can be. And so I think what you're talking about is, is applicable today. But I think there's a real future with technology, if we can figure out state management and databases where we can actually fully [01:02:00] canary production applications where, you know, it's like blue-green and stuff today, right? But if, if instead of blue-green, it's literally like blue-blue, and all your customers are over here and they're also replicated over here, but that's a different version of your application, but they're not actually experiencing the results. Like that's the ideal world. But that technologically w- or like from a technology standpoint, that's obviously like very, very hard to achieve. And it wouldn't be just us responsible for that at the end of the day.

Beyang: Yeah.

David: But, you know, it might be possible in our lifetime to, to see something like that happen.

Beyang: Yeah, makes sense. You, you mentioned your kinda 10 year vision and I wanted to kind of ask you about how you arrived at that 'cause you know, in our conversations, both, you know, this one and, and earlier, I've picked up a strong ethos from you of being like super down to earth, being driven kind of by like the immediate pains that you understand. How does ... How do you derive kind of a 10 year vision out of that? Doesn't that require you to be a little bit more like head in the clouds, highfalutin a bit. So, so how did you arrive at, at what, what your vision is?

David: Yeah. [01:03:00] So, you know, an exercise you get whenever you pitch VCs is like, what's your mission statement or whatever that, that, that garbage is. Which frankly I think is a waste of time, mostly because it changes every pitch deck. So it doesn't matter. But I, I don't know. Some ... At some point, one of my friends, uh, told me to read this book, which I did not, but it's about root cause analysis. And it's something like the five whys. It's, it's by some manager. And it's like, you ask yourself why like five times to get to the root cause kind of ... that's the idea. I never do it five times, but I'm always like, "Okay, why? Why is this, this? Why is this, this?" And I use that to work backwards. And so we basically use that to like root cause what's our mission statement. And we very simply said, "Okay, what is Sentry? What's monitoring?" Okay, monitoring is part of the application life cycle, the development life cycle. Okay, development is like s- an efficiency thing. And then we're like, "Actually like what's software for?" Well, this is the simplest way to reason about it. Software only exists for productivity, there's nothing else about it. And so we started there. And then we said, "Okay, if that's [01:04:00] true," which we just have to agree, it's true 'cause it's hard to argue against that, "then monitoring must also only exist for productivity because it's only there to enable software development at the end of the day." And so we were like, "Well, how would we get monitoring to make us more productive?" Well, the tactical version is you ship bugs in production, you can never stop that. And so we make it faster to remediate those bugs. And then the better version would be, well, like, "What if we could prevent those bugs from happening?" Because that's a much better like outcome in life. 
And so that's really where we ended up at like, is there a real way we could get there that somebody that's technical and well-educated, well-informed could believe in? And I truly believe that we can get there via what I'm saying. And, and, uh, it's not that much of a reach, it's just impossibly hard these days, But I'm very confident that there's at least, you know, a few companies in the world that could do what I'm talking about right now today. And that's super exciting. So, but that yeah, so we just kind of worked there. And I actually think that was important because it, it helped guide us. Because if you think about your goal is that, well, I want to be able to put monitoring into [01:05:00] CI and then identify the bugs that the tests are not catching. Well, you first need to be able to identify all the problems, and that's fundamental to monitoring, right? And so you can sort of do those goals in parallel, you can do one before the other kind of thing. But that's how we reason about it. So it, it was just like this forced mental exercise, which is frankly really, really good, but it's also a little bit abstract in that it's not like we're actually working towards that right now, even though we could say-

Beyang: Yeah.

David: ... we sort of are, right?

Beyang: Yeah, I was, I was ... My next question was going to be practically speaking, you know, how do you use that vision and, and for like planning purposes? It, it ... Does it just serve as kind of a backdrop or do you actually find yourself saying like, "Well, you know, we should prioritize this over the other thing this quarter, because that brings us a step closer to, to our vision"?

David: I only use it from the point of view to help people understand what we could achieve. Like what is a big picture item, right? When we think about planning, we have much more core, uh, tenets that we follow. Um, so for example, one [01:06:00] is front end first, like we're very hyperfocused on, like we must solve for the front end customers first. You know, market share is one of our core tenets. We call it "every developer," it's part of our values. We actually have a bunch of values that are not specifically about people. We have one that's about people and then everything else is just about how we operate. So one is that Sentry is for every developer, we're here to enable every developer. And that can mean many things, but the easiest way to reason about it is like, we want to solve the problem for every kind of developer in the world. And so that dictates a lot of like, is this guiding that? And, and that's kind of how we've reasoned about it. But it's very easy to say, well, like, "We need to be able to monitor for more problems because we can't monitor for the problems." We can't bring monitoring earlier into the life cycle, it's just useless at that point, right?

Beyang: Yeah.

David: But we never actually talk about the big grand vision other than like, "Hey, you're the new hire, here's a crazy idea," kind of thing.

Beyang: Yeah. You, you mentioned that Sentry is, is for every developer. You know, I talked to a lot of developer tool creators and one recurring problem I [01:07:00] hear about is, you know, let, let's say you sell into an organization and the people who bring you in are, you know, super gung ho about your product. But then there's kind of like the remaining developers at the company who may have never heard of you or, you know, don't know what you're all about. Is that a challenge for Sentry, just like spreading awareness inside your customers? And if so, how do you spread the word?

David: It definitely is a challenge. This is something we're iterating on. So Sentry has always been product led and that's frankly, because I didn't know how to do anything else. And so we're starting to be a little bit more on the sales and marketing efforts. And I say that because our version of, of that story that's successful is like some of it's going to be organic, it's going to be product led. But a lot of it is like bigger companies, and, and those bigger companies value different things. And so for example, we've started trying to like offer our customers sort of more hands on learning, which is like, "Hey, we're happy ..." Well, well, like I mean, I guess not in person anymore, but it's like, "Well, come in. We're happy to sit down. We can talk a little bit about roadmap, we can show you a little bit more how Sentry can work in these different scenarios, train some folks on it." [01:08:00] I think that's been a really good way. You know, that's a very, I don't wanna say enterprisey, but it's more of, uh, a high touch solution to the problem, which is often expensive. On the other side, I don't know, like it's tricky. And I think it's especially tricky with developers because we're all somewhat empowered to choose our own technology these days, uh, especially because we make a lot of it affordable and people can just swipe the card, which is great. But we're also all empowered to build our own stuff half of the time, which also creates some, some dilemmas. And so I, I think there's still a lot of fragmentation. Like we're definitely the, the market leader in our space by any number you would look at, and it's by a large margin. But what you'll find is even in some of our accounts, some of our competitors are in there in like small little silos. And I'm just like, "But why? Like, why would this be possible?" 
And you know, sometimes our competitors do a good job, but there's other competitors that are just like, like below the bar in every way you would measure them from like, uh, product usability, from price point, all this stuff. And I'm like, "Why wouldn't you just consolidate? Like why would you not just put it all on ours?" And, [01:09:00] and so to some degree, I think building a developer, uh, business, you just sort of have to accept that it's probably a war of attrition. Like you just have to accept that over time, you're going to win developers and lose developers. And there's no real change in that because people always want to explore new tools and we let them, and there's always going to be changes and developers are also like always interested in new technology. So, so I don't know. It's a tricky problem. I'll let you know once we've ... we solve it. But, I think it's just-

Beyang: Yeah, definitely. And I, I, I think there's also like a strong culture among developers of kind of preserving that freedom of choice, right? Like any tool that tries to lock you in, if people see you doing that, that's, that's a huge negative, I think.

David: Yeah. No, I, I, I fully agree. And, and even the people like, frankly, most people should not care that Sentry is open source ish. Like, it doesn't really matter to them, they're never going to run it themselves. But we hear like many times it's like, "Well, we picked Sentry because it's open source." I'm like, "Yeah, but you're using our SaaS, why do you care?" And it's like, "Well, like maybe one day, if like your company shut [01:10:00] down, we can then run Sentry ourselves." I'm like, "Well, it's never going to happen, so why do you care?" Like, like it's, it's almost unreasonable the ... why they make the choice. But at the same time, it is like, it's an important part of it all.

Beyang: Yeah. So yeah, most great tools that I'm aware of, especially developer tools, are ones that you kind of build for yourself. Uh, you solve your own use case, scratch your own itch first. As Sentry has grown as a business, do you find yourself having to balance building for yourself with also listening to feedback from users who might not have the same opinions or view things as you? And, uh, if so, how do you do that?

David: I would say yes and no. In the sense that I will outright tell you that we're not going to do something because it's ... I'll probably be nice about it, but generally speaking, it's a bad idea.

Beyang: Mm-hmm [affirmative].

David: I will still do that today. Like if you asked me if we're going to build something, I'll say no, yeah.

Beyang: What, what are ... Can you name some ... Are there any, any examples that come to mind of things that you said no to that people had asked for?

David: Um, I'm trying to think of something in recent history. I mean, I did [01:11:00] just tell everybody we're not building anything to solve China. That's a little bit different of a problem, but it's-

Beyang: Oh, like a Chinese market.

David: Yeah. I'm like, I don't care if [laughs] there's a bunch of customers asking for us to support China, it's not happening kind of thing. It doesn't matter if the customer's like sort of requesting or demanding a feature, that's not where our focus is right now. And we are big believers in that, that idea of focus, right? I would say we bend a little bit. So we have a team that ... you know, we have a ... it's a collective of teams, so one is enterprise and they're just there to solve these sort of edge cases for our largest customers. And that is very much traditional product. But most of Sentry's investment is R&D, which I am hyper opinionated that R&D has to be opinionated. It cannot be like a traditional product management, uh, scheme where you're like, "Customer, tell me what you want? Okay, I'm going to sort of build that for you." It's gotta be like, "I have an opinion on what the right thing to do is, and we're going to build that and we're going to prove that works or we're going to prove that it doesn't." And that's a ve- that's a very particular way of doing things, but it's a way I believe in when you're ... if you're ever going to do something serious and that actually innovates, [01:12:00] right? So, so I'd say we have a mix of both, but, but yeah, generally speaking, we're fairly open with our community. Like one thing I really valued early on at Sentry is we used GitHub for all of this kind of stuff. And unfortunately we moved away from that, but where like people could ask about stuff, we'd publish it on there, we, we would respond to folks' questions on there. Like it was kind of like all out in the open, our roadmap and development. And, and I know GitLab does this, but when it was on a single repo, it just seemed ... 
it was very tangible and easy to reason about to the point where like, if a customer opened a feature request ticket through like Zendesk or a bug report, we would just open, uh, a ticket on GitHub for them and link them that ticket. Or we'd open ... even better, we'd open the pull request and we'd link them to the pull request that fixed it. And that was such a great solution, but it was almost always ... I mean, it's, it's, it's almost always just been like, "No, we're not doing that." Actually, I can remember a bunch of times now, [laughs] like there was a big company that asked us to implement some weird analytics in the UI and I'm like, "No, like why would we do that?" [laughing] And they probably would have paid a [01:13:00] lot of money, but it was okay.

Beyang: Yeah, yeah. It makes sense. So you, you gotta like factor other feed- feedback into account, but at the end of the day, you gotta, you gotta be, be opinionated 'cause that, that kind of ... You, you know, it's hard to be focused without being opinionated, I guess.

David: Yeah.

Beyang: Cool. Are there any other tools or technologies that you're excited about on the horizon?

David: What am I excited about? I, I don't even know. I have a hard time keeping up these days. I've really enjoyed the modern iterations of JavaScript, barring the complexity and stuff. Like I hack on a bunch of things in my spare time. I just recently picked up TypeScript, which I had mixed feelings about. But I haven't really used many tools in recent history. I love GitHub Actions. I also love that GitHub does not charge us when they probably should. So hopefully they don't fix that.

Beyang: Free compute [laughs].

David: Yeah. But, uh, that's probably the only product I've used in recent history that I'm like, "Oh." But it, it's not perfect by any means yet, but-

Beyang: Yeah, yeah.

David: ... but it's also a soft spot for me like CI and CD. Yeah, I don't know. [01:14:00] Like honestly most of my days are spent like around my calendar now and, and, [laughs] and sort of planning and communications and stuff-

Beyang: Yeah, yeah.

David: ... unfortunately. But yeah, I don't know, uh, a lot of my personal time has been like messing around with like smart home type stuff and just toying with technology again, so ...

Beyang: Awesome. So for those listening, who, you know, they're, they're interested and they want to try out Sentry, how would you recommend people get started with it?

David: Go to the website, click sign up, and do what it tells you. Hopefully as long as you can deploy quickly, you'll be up in a few minutes.

Beyang: My guest today has been David Cramer. David, thanks for being on the podcast.

David: Thanks again for having me.