Stacked changes: how FB and Google engineers stay unblocked and ship faster

At my last job, I found our code review process was terribly slow and set about analyzing it and coming up with solutions. I found a lot of developers were blocked on reviews. They either task switched (expensive) or babysat their review (expensive). Even though we were using Phab, few did stacked commits because of how unfamiliar the tools were which made them feel brittle.

Improving the stacked workflow is one part of the equation but I'd recommend being careful of letting that mask other problems

I identified a series of busy-waits including:

- Post, wait, CI failure, fix, post

- Post: wait for reviewer, ping reviewer, wait, find feedback was posted, fix, post

Some of the root causes included:

- Developers weren't being pinged when the ball was in their court (as author or reviewer)

- Well, some notifications happened, but review groups were so noisy no one looked

- The shared responsibility of review groups is everyone assumed someone else would look.

- The tools we were using made it hard to identify active conversations

My ideal solution:

- Auto-assign reviewers from a review group, allowing shared responsibility of the code base while keeping individual accountability of reviews

- Review tools that ping the person the review is blocked on when they are ready to act

- Review tools that give a reminder after a set period of time

- Azure DevOps' tracking of conversation state with filtering (and toggling between original code and the "fixed" version)

- Auto-merge-on-green-checks (including reviewer sign off)

With this, responses are timely without people polling the service, letting machines micro-manage the process rather than humans.

zdw · 4 years ago

Gerrit does many of the "ideal solution" items, as it replicates the internal Google workflow.

It's a code review system with browsing as an afterthought, as opposed to GitHub which is a code browser with review as an afterthought.

bialpio · 4 years ago

Gerrit is nice, but to me feels a bit dated, UX-wise. Before joining Google, I used Azure DevOps and I feel that it was easier to review code over there (less jumping around the page). At first I thought it's a matter of (in)familiarity with the tool, but I still feel like I'm fighting with it after 3 years of use...

tomasreimers · 4 years ago

Totally totally totally agree. I think a lot of the slow down comes from not understanding when the ball is in your court. And Github doesn't prioritize this b/c I think this problem is less prevalent in open source, where there is usually no expectation of a reasonable time until review.

A lot of what you described is exactly what we've built / we're planning for the code review portion of the site - you should sign up and check it out :)

SideburnsOfDoom · 4 years ago

Agreed 100%

Making it easier to stack PRs feels like entirely the wrong direction - "when you're in a hole, stop digging" - stacking is giving the people in the hole better shovels to keep digging with, instead of getting PRs flowing through more easily so that stacking is not needed so much.

epage · 4 years ago

Github has been improving on this. I felt like they were rolling out the features I wanted right as I was enumerating them!

They have

- Auto-assign reviewers

- Pings for people (unsure how sophisticated)

- Auto-merge

The review conversation itself is the big issue.

londons_explore · 4 years ago

To add to your list:

> Reviewer able to make tiny changes to the code without a further round trip to the developer. Things like "missing full stop here" or "Please put a comment warning about X here". Developer should get a heads up that their code has been merged with changes, just so the tiny changes don't go unreviewed.

epage · 4 years ago

I feel the hard part is putting automation around this.

I loved it when I worked on a small, high trust team where we were comfortable with each person making the decision on whether it was safe to do a post-merge review or that it needed attention first with a pre-merge review.

EDIT: Something I have liked at a couple of my company's was a "cowboy commit audit" where we allowed people to directly push but doing so would ping the code owners and they would have to review to clear the state. People understood they shouldn't but also understood when it happened.

fivre · 4 years ago

Github suggestion comments usually do this pretty easily. You can prompt the submitter to accept a diff. It does have two annoying problems though:

- You must use a browser editor, no attaching a diff you generated elsewhere. Anything more than 1 line is hell because Github's editor gives no fucks about running gofmt or whatever. - If you request a change, approve the PR, and then the submitter accepts your change, it dismisses your approval.

chii · 4 years ago

i'm a bit ambivalent about this sort of idea - reviewers adding "tiny" changes without extra reviews could be a place where bugs leak in. what's tiny or insignificant might turn out to be big - that's why code is reviewed by multiple people!

I think for style formatting, making it an auto-format step during commit/push is the "better" way. Codified formatting means you argue and hash it all out once and be done with it forever.

andrekandre · 4 years ago

i've heard it said that pair-programming can alleviate the need for doing code review... or at least the typical in-depth reviews that usually become bottlenecks...

any experience with that?

epage · 4 years ago

My primary experience with pairing was not ideal. It was being used in a mentoring capacity rather than when people are more equal. I also found that the person doing the mentoring usually drove which meant that the mentee just saw things whizzing by on the screen and had no idea what happened and when they did they didn't have the first hand experience to commit it to memory.

I do remember a situation of hot seat programming where we had a critical problem right before release that crossed domains and three of us worked at one desk consulting the others or swapping out as needed. Again, it was worth it because of the urgency and priority but not something worth doing generally.

Gene_Parmesan · 4 years ago

Pair programming can't alleviate the need for code review. Good pair programming may reduce the amount of changes needed, and thus speed up the review phase, as a secondary effect of increased code quality.

Bad pair programming will suck up time by occupying two people at reduced efficiency, and won't lead to better code.

I've had some really great pair programming experiences, but we as a team aren't mandated to use it. We see it as a tool to pull out when cross-domain knowledge is needed, or when facing a particularly thorny problem. If you get the right two people together, they can fill in the gaps of each other's knowledge. The person not driving has more mental capacity available to think through potential alternatives.

However, the success of pair programming depends as much on the personalities of the individuals doing the pairing as anything else. They need to function as teammates, without any ego issues or any other thing that may prevent either party from feeling fully comfortable sharing their knowledge. Trying to pair program with someone who is defensive about their code/abilities is a true nightmare.

apotheon · 4 years ago

I've had great experiences with pair programming (and also bad experiences, but this is assuming a great experience), and I'd still want code review. The benefits can actually stack, if not compound, often enough.

ak217 · 4 years ago

Pair programming doesn't scale for all the purposes of code review. Pair programming can be great for bursts and unblocking when there is good alignment, but code review serves as an enduring artifact for the whole team including future team members.

BurningFrog · 4 years ago

The way I'm used to pair program is that the pairing counts as a much more intense review than the typical corporate review case.

So the whole waiting aspect is completely gone.

Sure, this can have bad results with less skilled and/or serious people. But that's always true.

stef25 · 4 years ago

Once a review is done, it can help to sit with the person to avoid an endless back & forth via whatever app's hosting your repo. In my experience developers usually just put aside 1-2 moments per day to spend on reviews, so if there's 3 back & forths that can mean the changes don't get accepted for several days. Sometimes for just a few lines of code.

Deleted Comment

emodendroket · 4 years ago

Well that's obvious, because the person who did the pairing is already pretty familiar with the code, so even if you do perform a review it will go fast.

nsonha · 4 years ago

that's like saying meetings alleviate the need for emails (documented decisions). Ideally you need both, but pair programming/meeting needs to be scheduled, with all the associated problems, async communication doesn't. If you have to pick one over another, then obviously that's async communication. A meeting without minute is a meeting that didn't happen.

vogusar · 4 years ago

At my company, we do exactly that using Github+Slack integration. Github Codeowners + Github Teams let's you to autoassign teams as reviewers. When the team is assigned then a random person from the team is pinged using round robin. Furthermore, with Github Scheduled Reminders you can get notifications in Slack according to some schedule and/or when some event happens, e.g. your PR was approved, got a comment or a mention, etc.

javajosh · 4 years ago

>Auto-merge-on-green-checks

ADO has that.

Post, wait, CI failure, fix, post

This is the worst, especially when the CI system is homegrown, rickety, and difficult or impossible to reproduce locally.

epage · 4 years ago

When I was doing product evaluation at my last company, I was torn between AzDO and Github. After being burned by the lack of progress on Phab (before it went away), one of our criteria was what I called a "living platform" meaning that it either has a vision that aligns with us and progress is being made or there is a vibrant community where we can collaborate. AzDO has a lot of good features but it doesn't seem like its getting much attention and everyone has been keeping their integrations with it private, killing off collaboration.

I've worked at both FB and at smaller companies. The "stacked diff" workflow is quite good, although it can be replicated via Github's PRs via "logical commits." The basic workflow is:

1. Each commit is a Phabricator-style diff. Put the whole idea into a single commit via `git add -p` and rebasing. 2. Multiple ideas/diffs for the same feature? Split them into multiple commits on the same PR. 3. Review the PR commit-by-commit: you can click on the links in Github to the individual commits and see+comment on the changes there. Comments on commits automatically propagate back to the PR FWIW, so this works pretty well.

The mental model becomes: the PR is the overview of the full feature, but not how reviewers are expected to review. Reviewers review commits.

It works pretty well! Assuming you have buy-in from your team to operate this way. In fact, I actually somewhat prefer it: with "stacked diffs," it can be harder to get the birds-eye view of the entire feature's set of changes. Git and Github are very flexible, so in practice I find you do need some established workflows that everyone agrees to follow, or else it's chaos.

kristjansson · 4 years ago

This should describe commits for a normal PR workflow too though - each commit is the smallest deployable unit with a meaningful title and description. I guess squashing merges by default allows / encourages people to put less thought into individual commits since they aren’t expected to become part of the permanent history?

s4i · 4 years ago

But, when using GitHub, isn't the missing puzzle piece here (which Graphite and other "stacked review tools" mentioned here attempt to solve) that you then cannot just merge those individual PRs one by one – you either have to merge the whole PR or nothing? Which leads us to the second option: to use multiple dependent PRs, which causes its own issues explained in other comments.

reissbaker · 4 years ago

Stacked diffs in practice were often deployed as a unit at FB, in my experience, with the "Ship stacked diffs" button. There's maybe a little value in merging individual commits in a PR, but I think not a lot: if all the commits aren't in master, the feature doesn't work anyway. Technically you get a little bit of an edge from merging individual commits if earlier ones are acceptable and later ones aren't, in that those commits land in master faster and are less likely to have to deal with merge conflicts later — but in practice I didn't find this to be a dealbreaker. Most PRs weren't enormous, and most files weren't subject to many engineers modifying the exact same lines concurrently, so overall review time didn't usually cause too many conflicts — and conflicts usually weren't hard to solve anyway.

tumetab1 · 4 years ago

I think his/her perspective, and mine, is that we don't want to merge individual commits.

My personal feeling is that I need to review each commit and then I need to review the whole PR (list of commits). If both are good then the PR can be merged but each individual commit is not OK to merge unless it's part of a PR.

Liskni_si · 4 years ago

> you either have to merge the whole PR or nothing

That's not true, though. You absolutely can merge a single commit, via git push, there's just no button in the GitHub UI for it.

exclipy · 4 years ago

But you can't amend a commit without destroying the old version of it.

Stacked diffs run in two dimensions. In one dimension is the stack of diffs -- logical incremental changes to the code, which make sense to code historians from the future. But each diff in the stack also has a history as it progressed through code review.

Deleted Comment

xapata · 4 years ago

Second.

Squashing and amending commits is equivalent to the workflow described in the article.

zurawiki · 4 years ago

> This idea isn't new, and if you've worked at a company like Facebook, Google, Uber, Dropbox, or Lyft (among others), you've either had first-class support for stacked changes built into your code review platform (i.e. Phabricator at Facebook...

Phabricator got a lot of stuff about code review right (I may be biased from working at FAANG). I'm happy to see someone trying to improve the code review situation on GitHub. When I did use GitHub at the startups I worked at, we were never fully satisfied with the code review experience.

I hope they surface "Phabricator versions" as first-class objects. I never liked it how GitHub just clobbers the history left behind from force-pushes from to branch.

Denvercoder9 · 4 years ago

Gitlab got this right, Merge Requests are versioned and you can look at the differences between versions of a MR.

It's somewhat amazing and shocking how little the review experience on GitHub has improved over the last 10 years.

masklinn · 4 years ago

Honestly what i want is for github to purchase https://reviewable.io/ and make it native.

Maybe gate sone of the more advanced features behind an opt-in so beginners are not too overwhelmed (and because you probably don’t need them on smaller projects), but the current state of things is just appalling, and more importantly has not at all kept up with workflows on large projects or long / complicated PRs.

The complete lack of support for reviewing over force pushes is insane.

gibolt · 4 years ago

The ability to quickly manage branches and reorder commits within them in Mercurial is super powerful, something that Git and `rebase -i` could only dream of.

Doing the same in Phabricator is awesome. One blocking commit PR on the bottom of a stack can easily be moved to the top or a new stack, unblocking the rest of your changes. Re-writing and entire stack due to one small change is mostly eliminated.

(Good) Tooling is a major gain in efficiency.

tadfisher · 4 years ago

> The ability to quickly manage branches and reorder commits within them in Mercurial is super powerful, something that Git and 'rebase could only dream of.

This is a git CLI wart more than anything. Magit (git interface in Emacs) makes juggling branches/worktrees and rewriting history intuitive to humans. I feel super-powerful in Magit and absolutely feeble using the CLI.

surajrmal · 4 years ago

I completely agree. When Google decided to invest in a mercurial front end to their version control system a few years ago, I was confused, but after using it for a while, I can honestly say it has much better ergonomics for common workflows. Creating a tree of dependent commits is simple and trivial. I sometimes wish I could create a dag instead, but that's probably more trouble than it's worth.

One thing Phabricator did/does worse than GitHub is the interface for browsing the individual commits that make up a PR. I like being able to tell a story with the sequence of commits and their messages.

baby · 4 years ago

I honestly don't feel like Github did a good job there either, I mean at least on the web UI because the VSCode Github extension helps a lot (when it's not buggy).

vp8989 · 4 years ago

The actual observed problem in the article is that the team's SDLC pipeline can't handle the current flow it's being subjected to. Work is getting "blocked" in code review.

Devising a coping mechanism for developers so that they can increase the inbound flow to an already overloaded system is not actually a solution.

The real solution to this rather common problem is to introduce a culture where "done" means "shipped" and not "I created a bunch of PRs for my team mates to read and while they all do that Im gonna create some more" and to use tech where you can to increase the flow the SDLC pipeline can handle.

This involves things like moving as much "code review" into the compiler/linter as you can, do a little more design/consensus building upfront, streamline all acceptance testing and deployment steps to their bare essentials etc...

https://trunkbaseddevelopment.com/

exitheone · 4 years ago

None of which works well if your reviewer sits in another timezone and your next business day is their public holiday. Waiting 3 days or more is not an option when I could just comfortably stack my changes and have the whole stack reviewed whenever my reviewer is available.

lmm · 4 years ago

Why would reviewing have to be done by a specific individual? If reviewing is the blocker, you need to raise the priority and allocate more resources to it.

indymike · 4 years ago

> Waiting 3 days or more is not an option

Waiting for more than 20-30 minutes for a review is ridiculous and means some combination of three problems are present:

* commits are too big to be reviewed quickly

* Tooling/testing is not catching a lot of blatant problems before the pull request happens.

* Review process has turned into some bizarre management ritual

dboreham · 4 years ago

I suspect the parent is saying something like "much code review is not necessary, can be replaced with automated checks". So once you remove the 80% of code review that isn't actually needed, the review capacity is 5x more.

dkhenry · 4 years ago

I really didn't understand what the big deal was with Stacked Commits when I was just working in vanilla github, and it used to cause friction with members of my team who knew the stacked workflow, but didn't know the Github workflow. After moving to a company that uses stacked commits I will never go back to the pull request workflow. I think there are some famous people saying that Pull Requests aren't Code Reviews, and that becomes obvious once you start using stacked commits for code reviews.

This is essentially how cli git is set up to function, and mailing list driven workflows use this concept to have small incremental reviews. I am excited to see this tool, but honestly I have soured so much on github because of the lack of code review, that I would just as soon it didn't try to shoehorn into pull requests, and just did what people do with Phabricator, optionally use GH too host the git repo if you need to, but do everything else in a different tool

notreallyserio · 4 years ago

I still don't understand the big deal so maybe you could help: what is the difference between stacked commits and pull requests targeting branches in other pull requests?

I've used GH for code review so I'm not sure what's missing there, either. Reading this whole comment section leaves me a bit confused, like I'm the only one using a hammer to pound in screws when everyone else has moved on to a screwdriver.

The big win for me is when someone reviews my code I can easily rebase the entire stack and push it back up for review, and thats all done with `git rebase -i HEAD~n`. That means when someone is actually reviewing my code its not uncommon for me to change something somewhere near the base of the stack and push up changes to the entire stack. You in theory can do that with branch based workflows but its a pain. As I understand this is what graphite is automating.

Once you can easily do that, it naturally leads to being able to do review per commit which it makes it so easy to unblock your work because your not fighting against git, and you aren't worried about getting immediate reviews because I know if any changes are requested its trivial to make those changes, and I still have branches, so if I do have unrelated work I can still use branches to isolate them. The last thing is another tool feature, when I rebase and push up changes I have to give a comment on that new commit so reviewers can see the changes that caused the new revision.

Honestly I was a heavy user of Github and GitLab having instituted one of them at three different companies, and built out CI pipelines and development procedures for multiple large engineering teams, and I now feel like I was using a hammer to pound in screws and I will never go back.

imiric · 4 years ago

GH will also automatically retarget the dependent PRs to the `main` branch after the dependency PR is merged. I usually just make a note in each stacked PR, "Depends on #N", mostly so that the review and merge sequence is followed. But even this would be clear on its own by the PR number and target branch.

I agree that the PR code review process leaves a lot to be desired, but this particular workflow doesn't seem like a big problem. Though I've never used Phabricator and similar tooling, so there must be something we're missing.

howinteresting · 4 years ago

It's equivalent but horrible to actually use. Try managing a stack of 40 that way.

didibus · 4 years ago

Even when the tool support it, I'm not sure it's a good practice.

Small code reviews often miss the forest for the tree. When you make the extent of the feature changes even harder to mentally trace back and put together by having a diff over another set of code that is itself an unapproved diff, the code review becomes pretty low quality.

I've also done this where my change that depend on other changes get approved, but then the other set of code gets critiqued so I have to change it and that breaks the code that depends on it so I have to change that as well, and now I still need to get it all reviewed again, so really I just wasted someone's time the first round.

What I do now is that I make commits over it in my branch, so I keep working while they review my first commit, and when that's approved I send the next commit to be reviewed, and so on. If there are comments I do an interactive rebase to fix them.

fosterfriends · 4 years ago

Google had a good paper on the benefits of smaller changes: https://sback.it/publications/icse2018seip.pdf

> A correlation between change size and review quality is acknowledged by Google and developers are strongly encouraged to make small, incremental changes

Heard though about wanting to maintain context within a stack. I think the best balance is tooling that both lets you create a stack of small changes, while also seeing each change in the broader context (forest).

The benefit of breaking out PRs instead of commits, is that each small change gets to pass review and CI atomically. I like your strategy of gating commit pushes until the first is reviewed, but I think the dream is to decouple those processes :)

jacobr1 · 4 years ago

Having worked in startups for most my career, and now recently joining a BigTechCo, I think some of the cross-talk comes from the commingling of concerns within a PR. PRs are used to review architecture, enforce security/code-conventions/style, check for logic bugs, catch potential conflicts with other efforts, and validate the logic of a feature.

In BigTechCo you have more coordination problems, so you want to merge code to trunk as soon as possible to prevent integration risks. But that means you drive discussion of the architecture and features to other venues than than the code review. Code review becomes more of a "will this break things" check, and enabling a feature-flag is more the review stage for the feature itself.

Hum, thanks for the study link:

> Previous studies have found that the number of useful comments decreases [11, 14] and the review latency increases [8, 24] as the size of the change increases. Size also influences developers’ perception of the code review process; a survey of Mozilla contributors found that developers feel that size-related factors have the greatest effect on review latency [26]

I'll have to dig into those. It makes sense that it decreases latency, but I'm curious how they assessed useful comments.

That said, I'm not saying to make massive code reviews, because reviewers are lazy, they won't want to spend more than an hour to review no matter the size of it. But when the reviewed code is made on top of code that itself has not been reviewed, the quality of the review suffers in my opinion.

polishdude20 · 4 years ago

That's what we do as well but we've got merge block labels. If a pr depends on another PR, we put a merge blocked label on it and reference the PR in the comments. We've got a GitHub action which doesn't let a merge happen unless zero labels are present.

barbazoo · 4 years ago

Usually when feature-branch-1 is still under review and feature 2 relies on feature 1, I just open a PR from feature-branch-2 onto feature-branch-1 to get the review process started, knowing that feature-branch-1 might still change which I can deal with by rebasing.

I came from Airbnb (vanilla github) and some friends converted me to a stacked workflow. I think the challenge with vanilla git is that rebasing and re-pushing becomes a brutal process if you stack 3+ changes. (At facebook people regularly stack over 10). Beyond recursive rebases, Github offers no support for navigating your stack, merging in a stack of approved PRs together, adjusting merge bases, etc.

Most of these practices are just inspired by what Phabricator and a few of the big companies are already doing well :)

treis · 4 years ago

>(At facebook people regularly stack over 10).

Meaning branch 1 depends on branch 2 which depends branch 3... all the way to 10?

That just seems crazy to me. What happens if branch 4 changes their implementation and borks everything depending on it?

Are you sure about merge bases? I regularly use GH to specify a branch that is part of a different open pull request.

milkytron · 4 years ago

Yeah this shouldn't be an issue when following standard git branching practices. It's kinda the point

One problem with GitHub is that you can't do this with branches from a fork.

hobofan · 4 years ago

Can you be more specific what you think can't be done?

I just tried it, and I can without problem open a PR from a fork against an arbitrary branch of an arbitrary fork (or main repository).

pas · 4 years ago

but if everyone works at the same company they can create the branches in the same repo

jeblair · 4 years ago

Zuul is an open-source CI/CD/project-gating-system that implements this for Github, GitLab, Gerrit, and pagure. It actually lets you stack dependencies between any of those systems. Just say "Depends-On: <url>" in the PR.

See https://zuul-ci.org/ and https://zuul-ci.org/docs/zuul/discussion/gating.html#cross-p...

y2kenny · 4 years ago

Zuul is great. After using it for about a year in production, I must say Zuul is everything I hoped for and more (coming from a Jenkins deployment previously.) Some of the concepts take a bit of effort to wrap my head around but once I understood them they make a lot of sense. Leveraging of Ansible is great. Making jobs cheap and inheritable is awesome.

delsarto · 4 years ago

As the parent poster well knows (as they invented it) the trick here is that you write your jobs to install from a checkout the CI system does for you.

https://opendev.org/opendev/system-config/src/commit/0a27974...

is really nice practical example. What it does isn't even really that important; but it runs a simulation of deploying production code in OpenDev. This is a "devel" job, we deliberately test any changes against all the latest HEADs of the master/main/development branches of projects we use. This is a non-voting job -- a notice that what you're introducing might be fine now, but there is trouble brewing when our dependencies release their next version. Sometimes that's fine and a known issue, or upstream is broken, and other times it's something totally unique and needs to be fixed (this is why, despite AI being able to write code for you, so far the implications of using that code still need a human in the loop :)

The "required-projects" in the job definition tells Zuul what repositories this test needs.

You can clearly see how it handles various projects having "devel", "main" or "master" branches to pull from to get their latest versions.

In the "vars" section you can see we're setting variables that get passed to the job roles flagging "this is the devel job, don't install the latest release but use the source Zuul will checkout for you from here".

The amazing thing? If I make a change and this job fails, I may debug it and find that it wasn't actually my fault, but something in upstream Ansible committed recently. I can propose the fix upstream. They do all their CI, and that's fine. But I now put in my change comment

Depends-On: https://github.com/ansible/ansible/pull/1234

and magically Zuul will recognise that I want to apply that pull request to the Ansible tree in testing and set it up for me (as noted, Zuul can do this for all sorts of systems, not just github). Additionally, Zuul will not merge the change until the dependency is satisfied -- I can NOT commit broken code!

This is awesome! We should add support for this to the CLI, I took a note