3.3 retrospective

“Regardless of what we discover, we understand and truly believe that everyone did the best job they could, given what they knew at the time, their skills and abilities, the resources available, and the situation at hand.” –Norm Kerth, Project Retrospectives: A Handbook for Team Review

Action items not completed from last retrospective: * Nick document agency, ownership, expectations of engineering owners * Nick create checklist that product delivers to eng. Eng responsibility is to verify all requirements are met. * Stephen update PR template to remind devs to update docs. * Quinn to figure out how to document user flows and owners of those user flows better.

Action items: * Beyang to submit proposal for code ownership * Beyang to turn on “Require review” on ALL repositories * Quinn: When Quinn sends PRs, attach a line clarifying priority (“This should be reviewed as if it came from any other team-member” or “This is a priority because XYZ”), so there is not implicitly assumed incorrect priority. * Felix: to add to iteration plan that supporting experimentation in browser ext. Is a goal.

Retro retro: * Last issue took a lot of time and didn’t leave time for many other things. * Dove too far into specifics of browser extension, should’ve stuck to details that are relevant to broader team. * Vote on topics before the meeting. * Do NOT allow questions to go over 10m, period.

Beyang (Stephen+1, Farhan+2, Beyang+2, ijt+1,Felix+1) == 7 * https://github.com/sourcegraph/zoekt/pull/1 had 3 reviewers across 3 timezones. I think it could have been reviewed by just one person (the code owner). To me, this suggests we need clearer code ownership. We currently have the .github/CODEOWNERS file, but that is very coarse and rarely updated (e.g., if a new file is added, I’d rarely think to update the CODEOWNERS file). * Proposal 1: Move to per-file ownership. Every file should have at least 1 owner. The rule is that >= 1 of the owners must approve the change to the file. * Proposal 2: Use OWNERS (examples: 1, 2, 3) files in directories. This means each directory would have a set of owners. >= 1 of the owners must approve each change to that directory. I believe Google uses this internally.(b)©(d)(e) * Follow-up to the previous point, here is another PR with multiple reviewers: https://github.com/sourcegraph/sourcegraph/pull/3331. Here we might actually want multiple reviewers (one for product, one for being the author for the specific code, one for the owner of the backend code, one for a close collaborator on the feature), but it’s unclear what the division of responsibility is. * Proposal: * Each PR needs the following approvals:(f)(g)(h) * Approval from product if the change is user-facing * Approval from the owner(s) of the code * Everyone else is advisory, unless delegated the approval responsibility by either product or code owner * The person who opens the PR should designate explicitly which type of review each reviewer should do (code quality, architecture, product) * Here’s another PR https://github.com/sourcegraph/sourcegraph/pull/3262 where the person currently listed as code owner (Beyang) isn’t the best person to review, but it’s also unclear who the ultimate approval must come from. * Here’s a PR where I was added as a “code owner” but never actually reviewed the code: https://github.com/sourcegraph/sourcegraph/pull/3366. I feel responsible for the code in question (activation flow) and would describe myself as “owner”. It’s unclear how to review after merge, since the Percy link doesn’t show up anymore.

Felix: * Lots of PRs were merged without code review despite we had established consensus that we need to do more code review (Felix+1, Tomás +1, Nick +1, Stephen+1, Farhan +1, ijt+1) == 6: * https://github.com/pulls?utf8=%E2%9C%93&q=is%3Apr+org%3Asourcegraph+is%3Amerged+-review%3Aapproved+-author%3Aapp%2Frenovate+ * A non-zero amount of these contained bugs that could have been caught in code review * In the case of extensions (including code intel), these are even directly shipped to customers, with no way for them to rollback * A non-zero amount of these didn’t even have unit tests (making matters worse) * In a non-zero amount of these the implementation didn’t implement the feature as specced and resulted in instant negative feedback from our team that caused the entire feature to get removed again, which may not have happened if the implementation had been ensured to match the feature spec before merging * Example: https://github.com/sourcegraph/sourcegraph-go/pull/55 https://github.com/sourcegraph/sourcegraph-typescript/pull/140#discussion_r270624520 https://sourcegraph.slack.com/archives/C07KZF47K/p1553879931025500 * Multiple new features (completion, text decoration) were shipped to the browser extension (with refactoring done to enable them) that caused bugs (broke codeintel) and/or lacked test coverage. (Felix+1, Loic+1, SQS +1, Beyang +1, Stephen+1) == 5 * This undermines the process Loic & I are doing to make the browser extension more robust, because the overall bug count, stability and test coverage of the browser extension does not go down (we fix, refactor and test on one end, and on the other features get added that add more complexity/cause more bugs). * We were then busy fire-fighting the new bugs like a game of whack-a-mole instead of being able to focus on our iteration goal of long-term stability improvements. It also took time to thoroughly review all these PRs (which is needed so Loic and I truly own and understand all the code) * Even if a feature does not end up being the cause for a bug, it slows down debugging because any of the new features could be the potential source for the bug * New features collide with planned refactors (e.g. https://github.com/sourcegraph/sourcegraph/pull/3316#pullrequestreview-224369194) * Example of a bug: https://github.com/sourcegraph/sourcegraph/issues/3458 caused by https://github.com/sourcegraph/sourcegraph/pull/3322, which was done as part of the mentioned features * Note most of these points are valid for feature-flagged experiments


  • Our release cadence (once a month) seems just long enough that missing a release means delaying a feature by quite a while. This brings added pressure every month around release time. I don’t think we are fully realising the benefits of a “release train” due to it. Maybe we should experiment with a shorter release cycle (2 weeks), or a 3.x-alpha which is cut from master 2 weeks into a cycle (with much less QA on our part). (Tomás +2, Farhan +1, Nick +1) == 4

Issac Trotts: * I’d like us to experiment with Google’s approach of not publicly making predictions about what will be released and when announcing(i)(j) things before they are ready. We can ship whatever we have at the time of a release and announce the things we finished. It could reduce the overhead and stress that come from saying publicly what we expect to get done for each release despite uncertainty about how long things will take and what will preempt our work. We can still have accountability without trying to predict the future. (Tomás +1, Farhan+1, ijt+1, Nick+1) == 4

Stephen * We haven’t followed up on one of our last retrospective’s action items yet, which is to assign owners of user flow. I was lucky to catch a few such regressions as release captain, but I fear the regressions in non-core user flows that nobody has run through to catch.

Keegan * Team report slides on the team meeting seem to be status updates at quite fine granularity, and most teams seem to just list per person work. I would like it if the status updates where high level (goal oriented) and cohesive across the whole team. (Felix+1, ijt+1, Nick +1) == 3

Tomás * Documentation ownership. How do we keep the docs consistent? Who owns them? Do we need Dev Rel and/or Product always involved to sign off? (SQS +1) == 5 * Felix: “Docs are not as extensive & consistent as they could be, different owners across code intelligence extension docs” (Stephen+2 SQS+1) (Tomás +1) * This article from CockroachDB made me realise we could benefit from adopting Calendar Versioning instead of the kind-of-semantic-versioning we currently have. * https://www.cockroachlabs.com/blog/calendar-versioning/ * https://calver.org/

Felix * * Test coverage didn’t improve enough. Tiny change overall on https://codecov.io/gh/sourcegraph/sourcegraph, other repos that contain core features are still completely without tests: https://codecov.io/gh/sourcegraph (Stephen+1, Vanesa+1)

Chris * Unsure what level of team consensus is required before moving forward with implementation (code nav UX) (Beyang+1, SQS +1, Nick+2) == 4

(a)[email protected] FYI here is the recording (b)Do we need this much process/structure? I would be opposed to a required review. But the low-friction approach PRs have were the OWNER is added sounds good to me. We should make our github owners file easier to understand and be purely directory based.

Additionally the PR you point out is the first ever PR in our fork of Zoekt. I’m not sure we should make big changes from that example, since going forward it will likely be better.

And one more point :) Search has a bit of a handover period => more reviewers. We should have a review from someone on the search team, and sometimes me due to extra context I have. ©The questions in my mind in this case are: - Was it necessary to get reviews / approval from all three of us upfront? - Was it necessary for all reviewers to review the code as if they were the code owner? Who actually is the owner of this code / is it clear to the reviewers who that is (this affects level of scrutiny)?

Not trying to add more process, but to add more signal so that reviewers and reviewees can spend more time on useful work. E.g., in this case it seems like Isaac could’ve gotten most of the useful feedback from either Keegan or Stephen, at least for the first round of review. I see the point that in this case more reviewers were warranted due to knowledge transfer / handoff, but (1) was it useful to have all of us review upfront and (2) when will we know that the handoff has fully taken place (I.e., when Isaac should become primary reviewer of this code, rather than Keegan + Stephen).

Re: OWNERS, even if we changed the structure of the .github/OWNERS, I would find it easy to forget to update that file. (d)> Not trying to add more process, but to add more signal so that reviewers and reviewees can spend more time on useful work.

That sounds good. I don’t see how required review from OWNERS solves that though. OWNERS being added sounds good to me though.

Re: OWNERS, even if we changed the structure of the .github/OWNERS, I would find it easy to forget to update that file.

Maybe we could add a CI check which ensures all files have an OWNER? (e)> That sounds good. I don’t see how required review from OWNERS solves that though. OWNERS being added sounds good to me though.

Seems reasonable to me. I have seen one PR in recent memory that was merged where I thought that I (code owner) should have reviewed beforehand. I’m fine with tolerating that, though. I think [email protected] might have thoughts here, too?

Maybe we could add a CI check which ensures all files have an OWNER?

We’d have to change the .github/OWNERS file to remove the “**” lines that exist now. But if we did that, this seems like a reasonable check… though it is a bit annoying to remember to have to do this. It would be nice if we could automate this so the OWNERS file gets automatically updated when a new file is created. (f)I’m trying to understand what went wrong here? Is it just slowness of getting things reviewed, or too many people involved? Usually when I see proposals to require reviews by owners I expect it is a reaction to us breaking something. (g)I think I didn’t properly communicate intent here. It’s actually not to add more approval process, but to remove unnecessary review work with goal of increasing throughout. By making it clear who the code owner is, it makes it clear who the primary reviewer is, and that allows other reviewers to not spend as much time on a review while still trusting that the code will still be reviewed thoroughly by someone. It also cuts down on PR response time by eliminating redundant reviews. (h)In particular, when I go through PR notifs, I’ll often see changes that look like they should be reviewed carefully. Then I need to answer the following: - Should I be giving this a thorough review (takes more time)? Or can I trust that another reviewer will do it? - If I don’t give this a thorough review, will I be on the hook down the road for fixing bugs in this code? If not me, who’s responsible? Who does the code quality buck stop with on this code? - If someone else has already reviewed, should I trust their review or should I give it additional review? Again, who’s the code owner here? If it’s me, I’d be more inclined to double check. If it’s not, I’ll trust that the owner will take care of ensuring the code they own is good.

It’d be nice if I could answer these questions at a glance. I believe I could if I knew the answer to, “Who owns this code” for any line of code in our codebase. (i)[email protected] Can you clarify what behavior we are doing today that you consider to be “announcing” (and therefore we should stop doing)? Assigned to Issac Trotts (j)I mean the tracking issues. For example https://github.com/sourcegraph/sourcegraph/issues/2740. Over half the items are crossed off, and that’s not necessarily because we bit off more than we could chew, but because something more urgent came up.