Michael Stapelberg shares with us a multitude of experiences and contributions across the Go and Linux open-source communities. Highlights include creating the popular window manager i3, building Debian Code Search, and researching fast package management for Linux with distri. Thorsten Ball, author of Writing a Compiler in Go and Writing an Interpreter in Go, joins. The three of us talk about the importance of developer experience to open-source communities, how code search changes how you work, and how to decide when to build something new.
i3 window manager: https://i3wm.org
wmii, inspiration for i3: https://wiki.archlinux.org/index.php/Wmii
Wayland vs X11: https://www.secjuice.com/wayland-vs-xorg
Meson build system (vs Autotools): https://mesonbuild.com/Comparisons.html
Debian Code Search: https://codesearch.debian.net
Google Code Search: https://en.wikipedia.org/wiki/Google_Code_Search
Russ Cox blog post on Google Code Search: https://swtch.com/~rsc/regexp/regexp4.html
Planet Debian: https://planet.debian.org/
NVMe SSDs: https://en.wikipedia.org/wiki/M.2
Russ Cox blog post on why SAT solving is hard: https://research.swtch.com/version-sat
pdiffs and why they should be disabled by default: https://debian-administration.org/article/439/Avoiding_slow_package_updates_with_package_diffs, https://people.debian.org/~stapelberg/2013/11/27/pdiffs.html
Project Atomic: https://www.projectatomic.io
Distri mailing list: https://www.freelists.org/list/distri
Linux From Scratch: http://www.linuxfromscratch.org
This transcript was generated using auto-transcription software.
Beyang: All right. I'm here with my colleague Thorsten Ball, and we are joined by Michael Stapelberg, creator of the i3 window manager, Debian Code Search, and many, many more open-source tools in the Go and Linux communities. Michael, welcome to the show.
Michael: Thank you. Thank you for having me here.
Beyang: So, you know, we have a lot to cover in the next hour, just because you're very prolific in terms of your work. But before we get into all of that, I always like to start things off on kind of a personal note and ask people: what was your earliest memory as a programmer, if you think back to the very beginnings?
Michael: Yeah, I'm not sure if I would call it, you know, the earliest memory as a programmer, because I wouldn't consider myself a programmer back then. But the earliest memory that I have that relates to programming is that I was using one of these teaching computers, which were kind of popular in the nineties,
and they were running BASIC, right? Because that was supposed to be approachable and understandable. So yeah, the earliest memory I have of something that is programming is that I read this program in, like, a little booklet. I was typing it into the computer, and what it was doing is it was converting Celsius temperatures to Fahrenheit temperatures.
But at the time I didn't understand what Fahrenheit could possibly be or why temperature would have different units to be expressed in. Right? So, yeah, it was entirely unclear to me what this program could possibly be doing. But still, I was interested enough to type it in and run it with a couple of example values.
So that is my earliest memory.
Beyang: That's awesome. How old were you when you were doing that?
Michael: I think about nine years old or so.
Thorsten: That's usually when you need to convert between Celsius and Fahrenheit, right?
Michael: Yeah, exactly. Yeah. Some people only need to do it as a teenager, but for some it's tough. It's a little early.
Beyang: And from there, did you just continue programming after that first experience? Or was there a, you know...
Michael: No, there was distinctly a gap, which is why I was mentioning that I wouldn't call this the start of my programming career. Right? Because as I mentioned, I didn't know what was going on, neither on the actual, I would say in quotes, business logic level, nor on the actual language level or anything else.
Right? So I typed it in, I was confused, I stopped caring for a couple of years. And then eventually I got back into it, because I was helping out the local youth outreach community. They needed some computers and I could help them out. And then it kind of spiraled from there.
Beyang: That's awesome. So, you know, the thing I want to start with, kind of selfishly, is the i3 window manager, which you created and I use. It's a window manager for Linux. It's a tiling window manager. And I guess to kick things off, for those people who might not know what a tiling window manager is, could you explain how a tiling window manager is different from, I guess, the regular window managers that ship with the standard Ubuntu distribution or macOS or, you know, most kind of consumer-oriented operating systems?
Michael: Yeah, for sure. I think first of all, we need to establish what a window manager is at all before we can get into the nuance of what a tiling window manager is, because many people will not be familiar with the concept. Like Windows users and macOS users: on these platforms, you don't have the option to easily exchange the window manager. But on Linux, things are split up between the X server, which does the actual visual rendering,
you know, what you have on screen, and the window manager, which determines where your windows are actually placed and how big they are and, crucially, what sort of interface you have to interact with them. So typical window manager actions that every one of us is familiar with are maximizing a window
or closing a window using the X button, or pressing a keyboard shortcut so that you either close the window or you switch between windows, like Alt+Tab, or, you know, things like that. That's the window manager on Linux. Now, the tiling window manager is a specific variant of it. And the tiling refers to your windows being arranged like tiles on your screen.
So all of the available space that you have, in terms of available pixels, is divided up. So if you would just open a single window using i3, it will be, like, full screen, right? It will span the full screen. And then as soon as you create another one, the screen will split into two halves, but all of the space that you have is always in use.
So it's, like, very efficient. And in fact, tiling window managers in general are sort of code for, you know, minimal window managers that target a very specific demographic. Right? On the website, we say i3 is targeting advanced users and programmers. We're not trying to convince, like, the everyday user, like a casual user of computers,
that it would be better for them, because it isn't. But for people who do a lot of window manipulation and just want something that gets out of the way and is minimal, i3 is a great window manager choice.
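As a rough illustration of the tiling idea described above (all available pixels are divided among the open windows, so nothing is left unused), here is a minimal Go sketch. The `Rect` type and the `tileHorizontally` function are hypothetical names invented for this example. This is not how i3 is implemented, and it only covers the simplest case of splitting one screen into side-by-side columns.

```go
package main

import "fmt"

// Rect is a hypothetical rectangle in screen pixels (not an i3 type).
type Rect struct {
	X, Y, W, H int
}

// tileHorizontally splits the screen into n side-by-side tiles that
// together cover every pixel column. Leftover pixels from the integer
// division are handed out, one each, to the leftmost tiles.
func tileHorizontally(screen Rect, n int) []Rect {
	if n <= 0 {
		return nil
	}
	tiles := make([]Rect, 0, n)
	base, extra := screen.W/n, screen.W%n
	x := screen.X
	for i := 0; i < n; i++ {
		w := base
		if i < extra {
			w++
		}
		tiles = append(tiles, Rect{X: x, Y: screen.Y, W: w, H: screen.H})
		x += w
	}
	return tiles
}

func main() {
	screen := Rect{X: 0, Y: 0, W: 1920, H: 1080}
	// One window spans the full screen; a second one splits it in half.
	for n := 1; n <= 3; n++ {
		fmt.Println(n, "window(s):", tileHorizontally(screen, n))
	}
}
```

With one window the single tile equals the whole screen, and with two windows each tile gets half the width, which matches the behavior described in the conversation.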
Beyang: Yeah, it makes sense if you're a programmer who's got, you know, a bunch of terminal windows open, editor windows, a bunch of web browser windows and various other things open all the time, and you're tired of moving your mouse, finding the corner to drag something, and you just want something more automated and, in a lot of cases, more keyboard driven.
Michael: I mean, especially, you know, different people have different workflows, but what I sometimes observed, especially when people are not so familiar yet with the ecosystem, is that they just have multiple windows open in their day-to-day development workflow. Right? So when they make a change in their editor, maybe they switch to a terminal window, they run a build command there, and then they switch to a browser window and then they actually reload the page in there.
And then, you know, for every change, for every iteration that they do, they repeat this, they switch between these three windows. Right? If you do three windows in a row already, you can see that a keyboard shortcut might be a good idea.
Beyang: Yeah. I mean, I'm at the point now where, when I do a screen share with someone else and they're using a non-tiling window manager, it's just slightly painful for me to watch. Because it's like, every second spent moving the mouse and dragging, I'm like, ah.
Michael: Yeah. You really do get used to it.
Beyang: Yeah. So, you know, you mentioned Linux has this ecosystem of tiling window managers. There are several others, I think, that predate i3 as well: dwm, the awesome window manager, xmonad. Can you talk about your motivations for creating i3 and what kind of sets it apart from the other window managers out there for Linux?
Michael: Right. I think, as it often is when you are frustrated with a program and you decide to set out and write another program, a replacement program, or, you know, an improvement on this, maybe you don't actually survey the entire space first. Right? So in my case, if you had asked me when I started the i3 project, like, how does it compare to,
let's say, awesome or xmonad, I would be like, I've never tried these, right? I don't know how this compares. But in retrospect it's fairly clear, and there is actually a legitimate niche for it, which is also proven by its popularity. So the specific motivation that we had was that we were actually using a different window manager at the time, which was wmii.
Right? And if you have used or seen wmii, you will instantly recognize that i3 is visually similar to it. For example, the stacking layout, or stack layout, is very, very distinctive visually, right? You can recognize it anywhere. And in wmii, there's a very similar idea, though it's a little bit different in nuances.
And what really was the immediate push over the edge, so to say, the impetus of i3, is that we were unhappy with a nuance of how wmii had changed between versions. And we decided, well, maybe it's time to finally change the code a little bit, send some patches, change all of these things that have been bugging us for a while.
And, like, now enough has changed that it really makes sense. So we tried digging into the code, and we didn't have the best experience. This was partly because, you know, in the X11 space, there are many different concepts. It's a very old space. Documentation might be, you know, from the 1900s,
which, you know, sounds funny if you grew up during that time, but now that's like 20, 30 years ago. So yeah, this is all pretty dated stuff. For example, they have their own string encodings and things that are nowadays unthinkable, right?
Thorsten: I want to tack on and ask a follow-up question to something you said. You said when you get frustrated with a tool and you decide to build a new one, you don't survey the whole landscape. I'm not bad at getting frustrated with tools, but I am bad at finding the right spot to say, now's the right time to build something.
Right? I'm often, like, you know, you get a new, let's say, little script or shell alias, whatever. And then you realize, oh, I should've done this, you know, years ago. How do you approach this? When do you decide, okay, this looks like fun, but here's a legitimate... this is what I need to solve?
Michael: This is a great question. And there's the famous XKCD comic where the comic compares how much time you spend on writing the automation versus how much time you save using the automation. And usually there is a very clear point, and then many people go exactly the wrong way.
Yeah, this is easy to have happen. I think personally there's a mixture of different factors, right? Like, you can say that maybe you've already invested some time surveying the landscape a little bit, which would probably be a good choice to begin with if you have any sort of frustration at all.
Right? Just become aware of the different possibilities. Maybe you've asked people, maybe you've exhausted the easy ways. And then the other factor that plays a huge role in what I want to do is my motivation to actually work in that space. For example, during the development of i3, I actually created my own X11 visualization tool, which was a terrible choice,
right, in terms of time being spent. That is exactly what a friend of mine told me when I started that project. I was talking to him and I was like, so I'm considering doing this visualization project, and he was like, that sounds like a terrible idea. I think you're gonna regret investing the time.
Which is exactly the question you're asking. But it turned out that was actually a great idea, because visualization, I think, inherently has a benefit, which is that you see things in a different way than you have seen them before. And that is just such a huge game changer so often.
So in this specific example of the visualization tool, what I did was I set myself a time limit. And I said, you know, if on this Saturday, and maybe on the Sunday as well, I can get as far as milestone XYZ, I'm gonna keep at least, you know, spending a little bit of time on that tool, because I think then something useful can come out of it.
And by that time I was convinced that it was worthwhile to invest a couple more days, because what I typically observed with visualization tools is exactly what had happened there. Like, I had a whole new way of looking at these exchanges of data between the X server and the client.
Right? You can think of it kind of like Wireshark, but with a lot more X11-specific knowledge of, you know, how events are bundled together and what references what and how you want to navigate between them. So it's really like a domain-specific visualization tool, and these pay off so quickly. Now, circling back to i3, however, that was a different story, right?
Like, for the first month of development, I couldn't even use it myself properly because it was just so unstable and unfinished. And then in the month after that, I did use it myself, but it was still so unstable. It would crash from time to time. So clearly, I have invested so, so, so many years of my life into i3.
Right? Did it pay off overall? I think I had fun doing it and it was a good experience to do it. And for sure, looking back, at the time I was a little bit naive and a little bit too arrogant. Right? And then you make these choices, but it doesn't mean that they're necessarily wrong.
Right? Like, what I think is, as long as you have fun doing it, maybe it's not the worst thing. And then, you know, even if the cost-benefit ratio isn't exactly in your favor, you know, just enjoy the process.
Thorsten: Yeah. I was going to say, you need to dare a little bit and, like, try and see how much you can get done, and, you know, set yourself a limit, a time box, and then say, how much would it take to actually automate this or to improve this? And then, you know, just dare, basically.
Beyang: For me as well, you know, that fun aspect of it, which also kind of ties into, you know, how much am I going to learn from this, and will the knowledge I gain from doing this pay off in the future? I also factor that into the ROI calculation. Oftentimes, if the kind of naive ROI calculation doesn't justify it and it's something
I really want to do anyways, I need something to push it over the edge.
Michael: Yeah, for sure. And there are various ways to, like, fudge it like that. But circling back to the origin story, this was exactly the problem that we had when we wanted to work on wmii. Right? It just wasn't fun. Like, the code was very dense. It was not easy for us to pick up as outsiders. It was not well commented.
We didn't understand many of the abstractions and concepts, et cetera. And we tried improving it for a while and we did get some response, like, some of our patches that added documentation were accepted. But after a couple of weeks, it just showed that this is not the most fun way of tackling this problem.
And what if we could just constructively do something and brainstorm and go from there? And then it was fun. So we kept doing it, and then eventually we had a whole window manager.
Beyang: And when you say "we", how many people were on the project at that point?
Michael: Yeah, I think it was always two of us, though the other person is kind of a secret co-founder, if you will, in that he has mostly just inspired me and given me ideas and questions and was just participating like that in the early phases of the project. And then, at later stages, there were other people who were instrumental to the development.
There was somebody who contributed i3bar, which is the bar at the bottom or top of your screen. That used to be a separate project, which we eventually merged into i3. There were other core developers who have stepped up in the years, you know, after it really took off, so to say. Like, we started in 2009, and then
over the next, like, one, two, three years, it was sort of, you know, word of mouth and it was spreading, but then eventually it really took off. And that's also when we got, I would say, a core team of like three to four people, which obviously changes over time. Some people get busier, some people leave the project for whatever reason.
But yeah, we've had some very good people contribute significantly to the project.
Beyang: Now, you said it was kind of directly inspired by wmii, but you weren't super aware of, like, dwm, the awesome window manager, and xmonad at the time. But later on, the distinctions between i3 and those window managers became clear. What would you say are kind of the big, first-order differences?
Michael: Yeah, for sure. So dwm, awesome, and xmonad all are sort of automated tiling, whereas i3 can be categorized as manual tiling, in the sense that in dwm and awesome, which is a descendant of dwm, you have these layouts, and then you open more windows and they're automatically arranged in, you know, maybe things such as a big window on the left
and then everything else is smaller windows on the right. Or, you know, some layouts go as far as building a Fibonacci spiral out of windows. I don't think you need to go that far. But in i3, nothing like that happens. So you are responsible for moving the windows around, for establishing your own layout, but at the same time, the layout is also more dynamic.
It's not as rigid. Right? So that's the first difference, in sort of the look and feel, which is very important, obviously, if you're talking about a program whose only interface is that particular look and feel. But then also the other interface to the program, obviously, is its configuration file. And the secondary resource for that is the documentation that tells you what you can do in that configuration file,
and also the community, right? And one factor that we had heard so often from people who came from awesome, or especially xmonad, and other window managers... yeah, it strikes me that all the three that you listed are actually programmatically configurable. Right? dwm to the extent that you actually need to modify the C source code and recompile it,
whereas awesome is configured in Lua and xmonad, obviously, in Haskell, right? But people are so tired of the need to program and express your configuration in a syntax that they're not familiar with. Like, it's great for the Haskell community to have a window manager you can configure that way. I'm sure that makes people very happy, but for people outside of that community, it just feels so awkward.
Right? So one of the big differentiators of i3 is that the configuration is plain text, and it is understandable. And in fact, the only reason why I claim that it is understandable is because we have spent conscious effort between version 3, which was the initial release, and the version 4 release, which was the only major release break that we ever did,
to revamp the whole configuration file. And the way we approached it is, I was studying at the time, and at the university I was asking the other students. I was handing them a printout of this and I was like, explain to me what this does. Right? So I was handing them the config file example,
and they were like, oh, so I would guess that this could... and then they would explain what the feature might do. And I was like, okay, so this one is clear enough. This one is obviously not clear. We need a better... and for this, we need a better comment here. We need to add a pointer to more documentation there, et cetera.
And then you end up with a config that is really approachable to many people. And just that initial hurdle of, you know, you have a program but you can't quite figure out how to customize it to make it do what you really want, right, that is a big hurdle. And if you can keep that down by using an approachable config file and understandable documentation, then that brings so many people on board.
Beyang: That is a really good way of kind of testing out the ease of use of your configuration language. I wish more projects took that approach. And maybe we should do that for the Sourcegraph configuration format.
Thorsten: I was going to ask, was there ever explicit user testing done for other programming languages? You know, like this: here's a piece of code, tell me what you think this will do.
Michael: Yeah. I mean, it sounds like such an obvious idea that there must have been people who have done it, but I'm also not aware of any big projects that can say that they have made decisions based on that sort of testing.
Beyang: So, I guess looking forward for i3, the project seems like it's mostly in maintenance mode right now. It's pretty mature. Are there any future features that you're looking forward to?
Michael: Yeah. I would say maintenance mode is a fair characterization, right? Like, for most people, the changes between releases are not going to be significant. Obviously there's still active development going on, but there's not going to be any super weird changes or changes that change the program drastically.
Right? So whenever people come and make a feature request or a request for a change of any sort, one aspect that we consider is, how much will this change the mental model, or how much mental overhead will there be to educate our users about this new thing? And do we deem it so important that we want to add it at all?
Because everything we add makes it harder to understand what the program is about and what it does. It makes it less focused. So there's not going to be, like, new features, but there's still plenty of opportunity to address some sharp edges. Right? Myself, the way I see my role is that, you know, the project
does what I need it to do, right? It does what many other people need it to do. The thing that it now most crucially needs to be is stable over the years, until the situation stabilizes between X11 and Wayland. And if Wayland ever completely replaces X11, then we can totally retire i3.
But before that, there's always going to be somebody who has this weird environment, right? Like, for me, the way I work right now in the working-from-home situation is that I use Emacs over X11 forwarding over SSH. So I just cannot do this using Wayland. And so I know that for the computing environment that I have, it's going to be years before there is going to be any sort of switch.
So I think, you know, there's not going to be big changes in X11 either. So I don't think it makes sense to sort of have big developments there. There have been changes there, but you can see that that's not where the attention is. So that's not where we need to spend a lot of energy either. Myself, what I do is I oversee the project as a whole, obviously.
I'm the person who needs to step in when there are conflicts. But also I'm the person who does things that other people don't necessarily want to do. Right? So for example, this development cycle in i3, we are switching from Autotools to Meson, which is a change that I'm really excited about.
It fits with the theme of this show as well, because it's about developer tools. And yeah, build system changes are work that is very opinionated, so people don't necessarily feel like they should be going in there and making changes. You know, what if I don't like them, or whatever; maybe I would block them. In the initial years of i3, we had explicitly said that, you know, having plain Makefiles is a virtue.
We have since changed our approach. But now I'm the person who just does tasks like that. Right? Like, I have my subject areas in which I'm an expert, and I can help out in these if everything else sort of fails. But in the day to day, I'm busy enough with my other projects.
Beyang: Yeah. Actually, real quick: is it Autotools to Maven? Is that the...
Michael: No, to Meson. Meson is, yeah, it's the GNOME community's mostly preferred build tool, right? That's where it sort of got big. It's kind of Pythonic in how you configure it, but it's much more high level than all of the others. Right? So if you're used to either Autotools or... what else have you, SCons? I don't know, I haven't used SCons as much.
But Meson is really more high level and feels much more like, I don't know, Blaze or Bazel or Facebook's Buck or whatever other high-level build tools you have like that. So yeah, it knows about dependencies, but more crucially, it also understands what is actually C code and what are libraries and how they could fit together. And you no longer need to manually construct command lines that, you know, call a compiler. It's much higher level than that.
And that actually allows it to deliver features much more quickly. Right? That is one of the big distinctions that I see between languages such as C, where the development environment has stagnated for decades, and languages such as Go, where they come with features enabled by default
that really make them stand apart, right? Like profiling by default, debugging symbols by default, cross-compilation by default. All of these sorts of things, they're hard in C, right? And Meson helps us go a step further than we could previously with Autotools.
Beyang: Makes sense. I feel like I could spend an entire hour or more talking about i3 and all this sort of stuff, but I also want to get to your involvement in the Debian community and in the Linux community more broadly. So I guess, you know, you were a major contributor to Debian for a long period of time. Can you talk about how you got involved in Debian?
Michael: Yeah, absolutely. So I was using Debian because a friend was very familiar with it. Like, he knew the answer to every question for a long time, and he introduced me to it. And then eventually I sort of started the process of becoming a Debian developer, which is a longer, formal process.
I think they've shortened it significantly since then. But back in 2012, it was a long process. And then I actually became a Debian developer before the friend who was sort of my mentor. Nowadays he's also a developer, so all is well. But yeah, it was popular in my bubble.
And yeah, I liked what I was seeing back at the time, you know, as it goes, right? I'm not necessarily surveying everything when I get introduced to something. So I don't know, if I had known about, say, Fedora, would I have preferred it? I can't say. Right? It's a different timeline, so to say.
Beyang: Makes sense. And as part of your contributions to that community, you contributed a lot to the developer tools. And I think developer experience has been a high priority of yours. And one of the tools that kind of exemplifies this attitude is Debian Code Search, of which you are the creator.
Can you talk about what Debian Code Search is and, you know, what it lets you do as a Debian developer?
Michael: Right. So one of the things that I noticed at the time was that it was very hard to get an overview that is scoped to the entirety of Debian. It was easy enough if you were working on a specific package. Let's say I was working on the i3 packages, like i3 itself, the screen locker, the status bar, all of these sorts of packages.
If you look at them in isolation, you know, it's easy enough. But, for example, let's say you want to ensure that all of the packages that are maintained by the X11 team are up to a certain standards version of packaging. Right? Or maybe you want to use a feature in a
package that depends on something else. And then how do you do this, right? Like, how do you identify who is actually the X11 packaging group in Debian? Which packages do they own? Where can I get all of the sources? How can I search all of the sources? So Debian Code Search is sort of an all-in-one shop for this sort of question.
Like, if you ever wanted to search through more than just the one package that you already happen to have on your hard disk, you could just go to Debian Code Search and put in your search term, possibly using regular expression syntax, and then you can search all of the source code of all of the Debian packages.
So, you know, Debian being a modern enough Linux distribution, that means a lot of open source software, right? If you want to find an example of, let's say, an implementation of an algorithm, or if you have recently discovered that there is a security issue in your code, and you're wondering, well, if
I made that mistake, who else has made it, and are there any high-profile cases that I should know about? It's very easy to answer these questions once you have a search engine that is always up to date and that covers all of your packages. But even just, you know, finding packages is a thing that in Debian
was not trivial, right? Because Debian values distributed working so much. So packages didn't necessarily need to be in a source code control repository at all. Right? Most developers were using Git, but Debian values that people have choice, right? So some of them were using Git, some were using Subversion, some were not using any version control at all.
And then some of them were hosting it on a somewhat centralized Debian hosting site. Others were hosting it on GitHub. Yet others were hosting it on their personal computer that happened to be down when you wanted to update your repository. All of this could happen. Right? And just getting source code access in order to even, you know, just view the source code of packages was hard, and it was hard like that in every single step of the way in anything that I wanted to do in Debian.
Beyang: Makes sense. And when you created it, had you used a code search tool prior to that? Or what was kind of the inspiration for you?
Michael: Yeah. So there were other code search tools. The one that I was familiar with at the time was Google Code Search, because it was very well known, obviously. But they did shut it down. I think they announced a shutdown in 2010, but I'm not sure on that. And then it was still kind of working for a little while, and I was, like, living dangerously, still using it, but then eventually it went away.
And then luckily, though, in between when it was officially deprecated but not quite deleted, or just after, there was this blog post that I became aware of, written by Russ Cox, who was the original author of Google Code Search. And he talked about the project in terms of, you know, what makes it special, in terms of how it is built, like the trigram index that makes regular expression search possible.
And he was explaining this, and he was sort of adding a teaching implementation, or an example implementation, to go along with his blog post. And I was like, well, that's very interesting, right? We no longer have access to Google Code Search, but we could build something kind of similar if we took this and added a web front end and made it get source code from somewhere.
And then I thought, how could I achieve this? Right? Because it's pretty hard to, like, crawl all of open source. Suddenly you have so many problems, right? You have licensing issues. Like, how do you audit licenses? You need to maintain a crawler. You need to not overload other sites. So many operational burdens.
So I figured, well, maybe actually coupling it to a Linux distribution would be good, because then you get the benefit that you can not only search the upstream source code, you can also search the distribution-specific bits around it. Right? So any sort of Debian packaging metadata or instructions, scripts, anything like that,
you can also search. So suddenly the tool becomes much more valuable. It becomes valuable to one demographic, which is interested in just code search across open source, and then it is potentially even more interesting to Debian developers.
Thorsten: So you could answer the question that you mentioned at the beginning, like, who are the maintainers of these packages? Right? Because that's also indexed.
Michael: Exactly. Yeah, you can just search for, you know, if you know that the maintainer is encoded in a file called debian/control, you can just enter path:debian/control, blank, and then you say Maintainer:.*X11, right? And then you find every package where the Maintainer field contains X11 in there.
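To make the semantics of that query concrete, here is a small Go sketch of "restrict by path, then match a regular expression against the contents". It is only an illustration and not how Debian Code Search is implemented; `searchFiles`, the package names, and the maintainer addresses are all invented for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// searchFiles returns the sorted paths of all files whose path contains
// pathFilter and whose content matches pattern. It mimics a query such as
// "path:debian/control Maintainer:.*X11" on an in-memory file set.
func searchFiles(files map[string]string, pathFilter, pattern string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var hits []string
	for path, content := range files {
		if strings.Contains(path, pathFilter) && re.MatchString(content) {
			hits = append(hits, path)
		}
	}
	sort.Strings(hits)
	return hits, nil
}

func main() {
	// Hypothetical packages and maintainers, for illustration only.
	files := map[string]string{
		"pkg-a/debian/control": "Source: pkg-a\nMaintainer: Example X11 Team <x11@example.org>\n",
		"pkg-b/debian/control": "Source: pkg-b\nMaintainer: Example Go Team <go@example.org>\n",
		"pkg-a/src/main.c":     "/* Maintainer: see the X11 docs */\n",
	}
	hits, err := searchFiles(files, "debian/control", `Maintainer:.*X11`)
	if err != nil {
		panic(err)
	}
	fmt.Println(hits)
}
```

Note that `pkg-a/src/main.c` matches the regular expression but is excluded by the path filter, which is what makes the combination useful for questions about packaging metadata.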
Thorsten: Cool. And you said, you know, here's this problem, Google Code Search is going away, here's a possible solution. How was this accepted in the Debian community? Because I can see multiple versions here, where it's, oh, a centralized server where all of our code is indexed and kept in one place, that's not good, or, this is fantastic.
What was the initial reaction?
Michael: Yeah. So I was writing blog posts about this and publishing them on the Planet Debian aggregator. So, you know, people knew about it, but I never asked, is this a project that we should do? Right. Like, that was never even the discussion for me. The situation is, well, I want to do it, I'm going to do it. Are you interested in having this service? Right. And if so, how can we support each other? Like, how can you give me the resources, for example, a domain, right? That is actually self-service. So that's the reason why it was so easy to get that domain, right, codesearch.debian.net.
debian.net is where any developer can just add their own. And then I was just running it on my own server, right, for the initial start. So I was just happy to do all of that work myself. Right. In the meantime, I have gathered a couple of fans, so to say, in the Debian community who have recognized the value of having Debian Code Search and who are using it regularly.
And I'm grateful for their feedback and support and testimonials, et cetera. And I think nowadays the service is much more anchored in the community than it was back then.
Beyang: Code search is interesting. So, you know, obviously Sourcegraph is a code search engine as well. And what we found from talking to people is there are kind of broadly two sets of people when you talk about code search. There are those who, you know, have used it before. They love it. They can't live without it. And then there are those people who have never used it before, and they're like, why would I need this? You know, like, what can this do that I can't do in my editor or on the command line via grep? So, you know, have you encountered that difference in attitude? And why is it that code search is this thing that, you know, is kind of hard to perceive the value of, but once you have it, it just becomes such a core part of your workflow?
Michael: Yeah. It's a very interesting question. Right. And I think there are some similarities to, if you think back to when you were first introduced to the internet itself, right? Like, you were like, okay, so this is the internet, but where do I navigate to, right? How do I use this? Right. And then at some point search engines came along, and then it became much easier. You just told people, look, there's a search engine field. And then still, it takes a little bit of coming around, but then eventually it clicks. Right. And people are like, oh, if I have this problem, I can just put a question into the field, and maybe helpful answers might be among the results. Right. So maybe it's similar to that.
And to me personally, the value of having code search is easily explained by thinking about it in terms of removing hurdles. Right. Because without, for example, Debian Code Search, I would need to update my local checkout of Debian packages, or I will just not bother. Right. Because it's a task that takes potentially hours, right? And if I only have a couple of minutes of spare time a day and, like, a busy full-time job life, then, you know, there's a hurdle, right? I can no longer do this. Right. So as soon as Debian Code Search is there, suddenly I can. So I think it's that sort of game, right?
It enables a different workflow. And for people who cannot imagine that workflow, or who haven't been introduced to the workflow, it seems strange. But then once they see it, maybe it clicks, right.
Thorsten: Would it even be possible to keep a local checkout of all the Debian packages?
Michael: Yeah. Yeah, certainly. I mean, if you think about it, that's how we do it on the server side. Right.
Thorsten: Yeah, but how big is it? Is
Michael: Oh, sure. Yeah. It's 140 gigabytes in size. So it is totally doable. You know, it wasn't as doable back then. Right? Like, when I started the project, it was a huge contentious point, right, because I wanted to transition the project from my own server to the Debian servers.
And at the time I was sort of just a little bit too used to the Google way of doing things, right, where, you know, if you needed a terabyte of disk space for your work project at Google, you wouldn't need to ask anybody. You could just, like, self-service grab it, right? Like, sure. Why would we spend time arguing about this?
So I was approaching the Debian sysadmin team, and I was asking them, well, for a search index, it needs to be fast. So we need to use flash storage. Right. So how could I get flash storage? And they were, like, laughing at me, right? Because it was such an outlandish request. How could I dare to ask for flash storage? Like, flash storage? This was 2012, right?
Beyang: Oh, okay.
Michael: You know, 2012 was around the time when I bought my own SSDs for my laptop. Right. And they came in sizes such as 128 gigs. So, like, if you bought the cheapest of the cheapest, you probably couldn't have your own code search archive. But these days, you know, with disk sizes that are like 512 gigs plus, even in cheaper laptops, you know, if you care about Debian stuff, you can totally have your own checkout.
Thorsten: Yeah, I'm surprised it's not that big, right? Like,
Michael: Yeah. It always differs in how you scope it, right? Like, if you only track, for example, Debian unstable, which is really only what you need for development, then it's much smaller than if you also track the other suites that are still actively maintained.
Beyang: Yeah. I think people really underestimate the impact that those kinds of friction points have on what they actually choose to do day to day. And it's almost like, you know, when you're developing, you're kind of wandering in this wilderness, and, you know, the step that you take now is going to impact where you end up, like, five, six hours from now, even though you don't think too much about it. It's almost like, you know, when we were talking about Sourcegraph, especially in the early days, to people who had never used code search before, they were like, well, you know, my code base is small. If it's on disk, you know, it's not as big as 120 gigs, it's probably multiple gigs, if even that. You know, if I wanted to search it at a particular revision, I would just, you know, stash my working state and check out that separate branch and then, you know, just use grep or something like that.
But then, you know, you ask them, okay, so how many times do you actually do that on a day-to-day basis? And it's like, well, almost never, because, you know, it's annoying to unload your working state. You kind of have the context switch, and at the time it seems you're making this, like, local calculation. Like, is that local piece of knowledge, you know, worth the couple of minutes plus context switch? Probably not. But you don't realize that it's going to lead you down this path where, you know, five or six hours from now, because you found the answer quickly, you didn't waste several hours of your time writing code that wasn't necessary.
Michael: Absolutely. Yeah. You know, even when you just said, like, you know, you're just doing a git stash, you've already lost so many people, right? Because, you know, many people, when they hear Git, they're just immediately turned off. Or even when they are accustomed to Git, when they hear git stash, they're like, oh my God.
Thorsten: Yeah. And it's also this: you don't know what you don't know. Right? Like, if you have code search, suddenly you can have a link, like a URL that you can share with colleagues or other people, that lists all of the results. Like, here's a URL that tracks all of the TODOs that we have in, I don't know, these five subprojects, whatever.
If you don't have code search, yes, you can do this. Right. You can curate a list of all the TODOs and, you know, write them down in Markdown somewhere. But once you have it, it's so easy to just, you know, get the URL and share that. And suddenly you don't want to go back.
Michael: Absolutely. And I think you raise a great point here, which is that, you know, it's not enough to just have the one-time index or to do a one-time list manually, right? There's also immense value in having code search update quickly. Right. And I've seen this so much when I was working in my current team on sort of an internal cleanup effort, and we were transitioning people to use a new API instead of the older API.
Right. And, you know, when you have code search that lags behind a couple of days, you have an entirely different workflow, right? Because now you need to maintain your own spreadsheets of what is where and who tackles what. Right. Whereas if you have that link that you can just share, and everybody just opens it up in the morning and goes, like, oh yeah, this is the current state, right, then it's much clearer and you don't need to maintain anything.
And if you have that and you take it to the extreme, where, if you make a change, it's immediately indexed... And this is what Debian Code Search tries to do, but there are a couple of hurdles why it isn't quite as good. But I've seen other code searches where, if you submit a change, within seconds it's actually live.
And that just gives you so much more motivation, right? Because you can be like, oh, so I see this problem here. You just make your change. Right. And then you can be like, and now it's actually gone. Right. I can no longer find it. Like, you can be done. It's done and entirely done. Right?
Thorsten: Yeah. And there's even another layer on top, where people recognize that code changes often, and they want to basically be notified when something changes. And, you know, Sourcegraph customers, for example, they want to get notified when code changes, get an email, or, with Sourcegraph campaigns, which is what I'm working on, they want to react to code changes and say, hey, whenever a new thing pops up here, please run this code or please do this.
Michael: Yeah, there are definitely interesting use cases for this sort of stuff. Right? You could say, you know, whenever there's a new user of this API symbol, we can say, oh, this is a new user. Maybe we should see who this is, send them an outreach email. Or maybe you could track how many deprecated usages you still have over time and have a graph of it, right?
Or a monthly summary or monthly progress emails. There's so much secondary stuff that falls out of having this data programmatically available and always up to date.
Thorsten: Yeah. Yeah, that's it. That's a good, good way of phrasing it. Right. It enables a whole different thing. A whole different class of, I dunno, tools basically.
Beyang: What language is Debian Code Search written in?
Michael: It is written in Go, actually. Yeah, it was my first bigger Go project, where it was actually, you know, multiple services running on multiple VMs, intermittently, actually. Like, we started out on a single machine. We went to, like, a cloud deployment at Rackspace, who were, you know, thankfully hosting us for many years.
And now we're back on, like, a big Hetzner box that I just pay for myself, which actually turns out to be much faster than the Rackspace cloud we had access to at the time.
Beyang: Wow. That's crazy.
Michael: Yeah. It really makes a big difference to have, like, two fast SSDs in your machine. Right? Like, these very modern NVMe M.2 SSDs, yeah, they're just a very different ballpark in terms of performance. Like, if you don't have that, yeah, it's a big step up. It's kind of like the original introduction of SSDs, right, when everybody's mind was collectively blown by, you know, there were hard disks, and now there's SSDs, and wow, suddenly random access is fast. It's kind of like that, in that you can do tens of thousands of IOPS easily, and gigabytes of writes and reads per second, with these NVMe M.2 SSDs. So yeah, I'm very glad that we have them.
Beyang: And are you running that at home, or is it in, like, some server rack warehouse somewhere?
Michael: Yeah, the Debian Code Search one is one of the ones that I run on my rented server at Hetzner in Germany. They're, like, known for their, you know, cheap cloud offerings, but also cheap dedicated servers. But, you know, when a company starts out having cheap services, that doesn't mean that they stay cheap forever, because over the years, right, as they get more customers, maybe they can actually improve their services.
And I feel like it has actually become quite decent, and I'm quite happy with the performance. But a couple of other projects of mine I do actually host out of my own home here, which is very nice.
Beyang: Yeah, I was actually looking into this, not seriously, but, like, casually, a while back. Because you can buy really cheap blade servers, like, you know, secondhand blade servers that are still functional. The warranty has expired, so, like, no actual company will buy them, but, like,
Michael: Yeah, but then the question always is, do you really want that old blade center? Right? Because they're loud, they're power-expensive, and it's hard to get replacement hardware for them. So, like, you know, personally, I can see the appeal in this, right? Like, I can see the home lab perspective, and I appreciate that folks are doing that sort of stuff.
But for, like, yeah, the sort of day to day, the only way that I can personally integrate it into my day to day in a reasonable way is to just use standard off-the-shelf hardware, so that if anything breaks, you know, I'll just bike over to the computer shop, I'll just pick up a new part, and I just plug it in.
Yeah. And, you know, I only do it for fun. So only not-so-super-important services are running from my kitchen. Yeah, but it's pretty cool to be able to do that, and that the internet allows for it. Right. So, you know, if you're ever trying out the distri Linux distribution of mine, then your packages will be installing right out of my kitchen.
Beyang: Yeah. Actually, that's a great segue into distri, which I also wanted to talk about. So, you know, distri is this Linux distribution. I believe you wrote it for kind of, like, exploratory research purposes initially, but with the express intent of trying to fix package management. Is that right?
Michael: Yeah. I don't know that I would have the intent to actually fix package management, but I definitely want to understand why it is the way it is and how it could be done better. And I think the research that I've been doing in there so far has been very fruitful, at least for my own personal understanding, but also, I think, in conveying the problem and the problem space to other people.
Yeah, you're spot on. distri is indeed, the way I phrase it is, it is a Linux distribution to research fast package management. And, you know, it was born out of frustration, you know, as is apparently a common theme in today's show. Yeah, I was frustrated with Debian, but also all of the other distributions.
Right. So, you know, in Debian, especially if you're running it on something like a Raspberry Pi, where the computer is a little bit slower, everything's a little bit slower. And then you just say, oh yeah, I just want to install this real quick on my Raspberry Pi. Then five minutes later, you're still waiting for apt.
Right. And you're like, what? Why, right? Is there a reason for this? So actually, I was very curious, but it was also very hard to, like, get deep into a system where you don't have anybody you can ask. Right. So understanding how Debian is really built is really hard, like, even over the years.
So I find it sometimes easier to start out greenfield and just develop something from scratch, where, like, oh, this should probably work like that, and then just plug it together. Right. And then you also know what's there, and you have control over the whole thing. Right? So that's the advantage of distri, right.
In Debian, it takes multi-year projects to change anything, because this is a very entrenched distribution, right? Like, so many computers are relying on Debian changing slowly. So it changes slowly. But in distri, nobody's relying on it, and, like, intentionally so, right? So I have the freedom to change everything between any release, right, or to not make any release at all and just use it privately for myself and see what I learn and see what I can build, see what I can tell other people.
Beyang: And, you know, what have you found so far? Like, why is package management such a seemingly hard problem? I mean, in theory, it's simple, right? You have this dependency graph, you know, one thing depends on another, you just walk that graph, build the transitive closure, and then just fetch everything.
You know, why is that so hard?
Michael: That's exactly what distri does, and it's not so hard after all.
Beyang: It's so hard for others. Like w
Michael: Yes. Yes. See, it gets harder the more you look at the details, right? So for sure you have the dependencies, and you need to fetch the transitive closure of something. Right. But then, like, at which level do you actually do dependency resolution? Right? For example, you could say, and distri does it like that, you build a package, and everything that went into this package as a dependency is going to be persisted. And that's going to be the dependencies that you will get when you install this package on your computer. Other distributions don't necessarily do that. Like, they have this sort of flexibility in there, right.
They just say, oh yeah, this package was built with OpenSSL 1.1, but it only depends on OpenSSL, right? If you have OpenSSL 1.5 or newer, the program, or the package, does not need to be changed. It will just be installed. And then it works, right? Because it's dynamically linked against whichever version you happen to have on your system.
And then the package manager really needs to work. Right. Because now it needs to ensure that the system itself is consistent, that you cannot have conflicting versions of dependencies on your computer, because there is only one /usr/lib/openssl.so, right? Or, I think it's called libssl and libcrypto .so, to be precise, but you know what I mean. Right. And this is where, you know, suddenly in the design space, we haven't steered so far off course, in the main, right? Like, everything is still roughly the same, but now we say, okay, but what if we have conflicts? And then what does that mean?
Well, now it means you need to specify which packages conflict with each other. And that means SAT solving. Right. So Boolean satisfiability solving, right? So now you have this process which takes very long. And, like, on Russ Cox's blog, there are great articles about why SAT solving is complicated and slow and how you would do it properly, but it's still not great. Right. So Go modules, notably, don't do any SAT solving either. So in distri, one of the key insights that I have is that, in fact, on the distribution level, it's not necessary either. I mean, for sure, you get a couple of other different properties out of that.
Right. For example, if you want to push out a security fix, let's say in libssl, you need to actually rebuild the transitive closure of dependencies. Right. So that they all pick up the new version, because they're all, like, pinned, so to say, to it as one set of dependencies otherwise, right? So you have a little bit more effort in there.
You have a little bit more bandwidth cost, but you have an easier mental model overall. If you follow this model, it is much harder for anything to accidentally break, right?
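The two ideas in this exchange, walking a fully pinned dependency graph to get the transitive closure and working out what has to be rebuilt after a security fix, can be sketched in a few lines of Go. This is a toy model and not distri's actual code: `closure`, `needsRebuild`, and the package names and versions are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// closure returns pkg plus everything it transitively depends on, sorted.
// deps maps a pinned package version to its pinned direct dependencies,
// so no version selection (and hence no SAT solving) is needed.
func closure(deps map[string][]string, pkg string) []string {
	seen := map[string]bool{}
	var visit func(p string)
	visit = func(p string) {
		if seen[p] {
			return
		}
		seen[p] = true
		for _, d := range deps[p] {
			visit(d)
		}
	}
	visit(pkg)

	out := make([]string, 0, len(seen))
	for p := range seen {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

// needsRebuild lists every package whose closure contains changed, that is,
// everything that must be rebuilt to pick up a fixed version of changed.
func needsRebuild(deps map[string][]string, changed string) []string {
	var out []string
	for p := range deps {
		if p == changed {
			continue
		}
		for _, c := range closure(deps, p) {
			if c == changed {
				out = append(out, p)
				break
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical pinned dependency graph.
	deps := map[string][]string{
		"curl-1":      {"openssl-1.1", "zlib-1"},
		"openssl-1.1": {"glibc-2"},
		"zlib-1":      {"glibc-2"},
		"glibc-2":     nil,
	}
	fmt.Println(closure(deps, "curl-1"))
	fmt.Println(needsRebuild(deps, "openssl-1.1"))
}
```

Installing is just a graph walk, while the cost of pinning shows up in `needsRebuild`: a fix in a widely used library touches every package above it.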
Thorsten: You mentioned something just now with bandwidth, right, which is a constraint. And I'm wondering, do you think the feasibility of different tradeoffs that package managers and Linux distributions make has changed over the years? And the context from which I'm asking this is, you know, when you distribute binaries or programs, you can, you know, dynamically link in other libraries, or you can statically compile the binary and put everything in, right?
And, you know, 20, 30 years ago, statically compiling a binary was not feasible, because the binary was big, you needed to transfer it, you couldn't save it on a floppy disk or CD, whatever. And now suddenly it's like, we don't care whether the binary's 15 megabytes or, you know, 500 kilobytes, it doesn't matter, but it makes the whole distribution process much easier, you know, to have a single file that we need to copy around.
And I think, you know, the hardware changes in the past decades have kind of changed the game. And I'm wondering, did that also happen in Linux package management?
Michael: Yeah, absolutely. I think you're absolutely right. This is also why we see Docker being so popular. Right. Because, you know, disk space is cheap. Let's pin everything, everything keeps working. That's nice. Yeah, in Debian you can see this as well, but I find it very curious, right? So one example I would like to present here is a mechanism called pdiffs, which I think stands for package list diffs, which means that, you know, when you update your package lists using apt update, instead of downloading the whole package list file, which can be very large, it just downloads diff files.
Now, this is a great mechanism, right? It saves bandwidth. It's good for users who are on a low-bandwidth connection. But at the same time, if you're on a high-bandwidth connection, this is a terrible change for you, because suddenly you need to fetch so many files, and then you need to apply them all.
And it's much better to just fetch a big file and save it than to have a small file and then many other small files and then, like, you know, apply all the diffs, et cetera. So, you know, in, I think, 2017 or so, well, maybe a little earlier, I was complaining about why pdiffs are enabled by default.
Right. I think it doesn't make sense given the hardware that most people are running on, but I think Debian in general is a little bit slow to even pick up changes like that. But yeah, in general, you can definitely see this in the distribution landscape. I think also as other distributions are getting more into containers, for example, Fedora has, like, their, what's it called?
It was Project something. I think Project Atomic, I believe, and Silverblue is, like, the distribution on top of that concept or something like that. It's, like, all container-based and cloudy. You know, that is another sign of the changing landscape, both in terms of hardware and software surroundings.
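The pdiffs trade-off Michael describes can be made concrete with a deliberately crude cost model in Go: one round trip per file, transfer time proportional to size, and a fixed cost for applying each diff. Every number below (file sizes, round-trip time, bandwidths, apply cost) is an invented assumption chosen only to show the shape of the trade-off, not a measurement of apt.

```go
package main

import "fmt"

// fetchMillis estimates how long it takes to fetch `files` files totalling
// `bytes` bytes one after another, paying one round trip per file plus a
// fixed processing cost per file (for example, applying a diff).
func fetchMillis(files, bytes, rttMs, bytesPerMs, applyMsPerFile int) int {
	return files*rttMs + bytes/bytesPerMs + files*applyMsPerFile
}

func main() {
	const (
		rttMs     = 50         // assumed round-trip time per request
		fullBytes = 40_000_000 // assumed size of the full package list
		diffFiles = 30         // assumed number of diff files to fetch
		diffBytes = 3_000_000  // assumed total size of all diffs
		applyMs   = 200        // assumed cost to apply one diff
		slowLink  = 100        // bytes per millisecond (about 100 KB/s)
		fastLink  = 100_000    // bytes per millisecond (about 100 MB/s)
	)

	fmt.Println("slow link, full file:", fetchMillis(1, fullBytes, rttMs, slowLink, 0), "ms")
	fmt.Println("slow link, diffs:    ", fetchMillis(diffFiles, diffBytes, rttMs, slowLink, applyMs), "ms")
	fmt.Println("fast link, full file:", fetchMillis(1, fullBytes, rttMs, fastLink, 0), "ms")
	fmt.Println("fast link, diffs:    ", fetchMillis(diffFiles, diffBytes, rttMs, fastLink, applyMs), "ms")
}
```

Under these assumptions the diffs win clearly on the slow link, where transfer time dominates, and lose on the fast link, where the per-file round trips and apply cost dominate.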
Thorsten: Yeah. Interesting. Cool.
Beyang: In our previous conversations, you mentioned that even though distri is kind of a research project at the moment, you think it can still be used in some situations, like potentially in your development environment for, like, reproducible dev environments, or possibly in CI, where you want to install things very quickly.
Can you talk about, you know, what that use case looks like?
Michael: Yeah, for sure. I think, you know, it's suitable for this. I don't know if I would necessarily be recommending people actually use it like that. Right. But what I find instructive is, if you're feeling adventurous, you could try and change your environment, like your .travis.yml file, from just doing apt commands where you install your dependencies to using distri.
There's a little bit more to it, but if you're curious, you can just send an email to the distri discuss mailing list, and then I can outline in more detail how I did my experiments. But what I've seen is that the installation time, and hence the overall CI runtime, drastically falls if you're using distri, because package installation is just so much lower overhead.
Right. And so much more parallel, and leveraging high-bandwidth connections, which you have on all of these modern CI cloud environments. So yeah, this would totally work, right, if the software that you're interested in is available in distri. I think it's mostly useful to get a feeling of what sort of performance we're leaving on the table in the established distributions. Right. If only they could change a couple of things, like a couple of low-hanging fruit, but maybe also a couple of bigger architectural changes, we could be in a much different position in terms of speed of all of the Linux distributions.
Beyang: What goes into creating a new Linux distribution? Like, how long did it take you to get to kind of a minimal viable product point for distri?
Michael: Yeah. So a lot of this you can learn on the Linux From Scratch site. Linux From Scratch is really a great resource. It essentially walks you through how to build a Linux system if you didn't have a package manager at all, if you only downloaded, like, upstream sources, and then you make and make install them manually.
Right. And it just walks you through this process step by step. So if you wanted to build a new Linux distribution, you could just go through that guide and then see if you could kind of automate it, right? Because that's kind of what a package manager does. You can start very, very small. You can just boot the kernel.
That could be a first milestone. Right? How do you even place the kernel on the hard disk or in your development environment such that you can start it? Then the next milestone might be, you boot the kernel and actually start a small userland, maybe BusyBox, which is, like, an integrated embedded solution, it is just one binary, right, from your initramfs, right? Like, you don't even need to have a kernel driver for your hard disk, file system, or root file system, anything like that. You can just, like, tell the kernel, look, here's the binary, just run that, and then you have a shell. Right. And at that point, that's already a very satisfying milestone.
And then you just kind of build from there, right? Like, there are a bunch of typical GNU tools that you need to get packaged, and a couple of other parts in the typical open source stack. For many of them you can apply shortcuts. Like, maybe you don't need to build them with all of the fancy integrations that they have and all of the extensive documentation that requires more third-party tools in your, like, build closure and stuff like that.
But yeah, that's basically the process, right.
Beyang: It almost sounds like, even if you don't have a specific end use case in mind, it almost seems like a really good educational experience to go and try and do that. Because it will show you how all the Linux internals kind of fit together.
Michael: Absolutely. And I think that is really the approach that, you know, the Linux From Scratch project wants, right? Like, it wants to be that guide that teaches you how Linux works by having you assemble your own distribution from the pieces. Right? The same argument is often cited for distributions that are more involved or more minimalistic, such as Arch Linux, or Gentoo Linux was a more popular one back in the day. And I think there is some truth to it. Yeah. If you have the time and motivation and energy and circumstances to, you know, spend some time building your own Linux distribution, absolutely go for it.
Beyang: You know, Michael, I did not get to even, like, half of what I wanted to cover today. So, you know, if you're up for it, maybe at some point you can come back on the podcast to talk about gokrazy, the pure-Go userland for Raspberry Pi that you've written, the Twitch streaming that you're doing, your desk setup, and much, much more in kind of, like, the Go and Linux communities, as well as, like, I just want to pick your brain on your thoughts on developer experience in general, because that seems like a theme that recurs over and over again in the work that you've done.
Michael: Yeah, for sure. No, I'd be happy to come back. Yeah.
Beyang: Okay, awesome. Yeah, we'll definitely do a part two. But for now, I guess, you know, for those listening, of the things that we talked about today, distri, if people want to try out distri and learn more about it, what would you recommend that they do?
Michael: Yeah, they could just check out the distri website, or just put "distri Linux" into their favorite search engine. You should find it easily. You know, or you can reach out to me on Twitter. If you want, just put my full name into the Twitter search field, and you should find me, or find my website.
I'm sure you'll find it.
Beyang: Well, Michael, thanks so much for being on the show today.
Michael: Yeah, no problem at all. Glad you had me here.