I'm a big fan of the rebase workflow, but not of squashing. I wrote it as several separate commits for a reason: documenting each step, making each step revertible, separating refactors from semantic changes, etc. Squashing makes the diffs much more painful, such as by burying an important change in a sea of mechanical ones when they were originally in separate commits.
(A workflow that preserves those commits requires actually having useful commits, and obviously if you have PRs with commits like "fix it" and "fix it more" then it might as well get squashed.)
Most developers I've worked with think of Git as a CLI front-end to GitHub. They aren't writing commits with the intention of showing their work, they are using commits to save their work before signing off, or to trigger CI. They aren't proficient enough with Git to go back over these and clean them up, as would be expected in a project like the Linux kernel.
If that's the state of developer proficiency with version control then, ideal workflows aside, we need a way to prevent those noise commits from ending up in master, and the easiest way is to squash with every pull request, as part of the "merge" automation.
Why do you need to squash & merge literally every PR? Why not use squash & merge when appropriate and rebase & merge otherwise? That's what I do for my projects. Most contributions coming from others tend to get squashed & merged, whereas I'll rebase & merge my own commits. But I'll sometimes rebase & merge PRs from other contributors who have broken their change down into thoughtful commits.
I'm a fan of the workflow where the PR gets squashed in the upstream git repo, but the individual commits are preserved in the PR in the code review tool. I feel that Phabricator handles this well.
But does that still lose the source commit long term? What I'd love to have is a mechanism that keeps references to the pre-squash commits at blame granularity, allowing one to dig deeper into the commit messages associated with a given line. Kind of like a sourcemap, but for squash instead of transpile.
GitHub and Azure DevOps also do that, you just need to know where to look.
I don’t mind squashing either; unless I’m being really intentional or rewriting my history, my intermediate commits couldn’t be reverted without leaving stuff broken (totally a me problem, of course).
Squashing is nice IMHO, and even a must after a while. For one recent very small project a squash of the commit history reduced the storage from tens of kilobytes to a few hundred bytes total. Orders of magnitude. That was a very small project, so imagine the storage space savings for larger projects.
I find that the commit history tends to grow viciously for anything I've been involved with. And I fail to see the benefit of amassing that amount of detail once you are past the stages where each individual commit is reversible (or even interesting)
So, for a project that runs for, say, three months, the commits of the first few weeks aren't really very interesting or valuable at all at the end of the period. Just hard drive space being eaten up. YMMV.
Real example: I had to mitigate an outage in the middle of the night, and I found the root cause in ten lines of code. I needed very badly to “git blame” that code and find a specific commit message from three years ago and its author (a former colleague), to figure out what he had been trying to do.
Right now I have a full clone of a pretty large monorepo dating back almost nine years, and the .git dir is less than half of the total space. Sparse checkouts and shallow clones can make clown car hardware sort of work, but I do not want to go back to the pre-git days and try to work without full history to conserve 0.008 TB of SSD. We spend more than that on coffee.
There are many reasons for having several commits in the same PR.
PRs often have a lot of overhead. They need a separate branch, CI jobs need to run, there are more notifications for everyone, separate approvals, etc.
Sometimes there's a need for keeping separate commits if they're all related to a single change. Proposing them all as part of the same PR helps maintaining that context while reviewing as well. Reviewers can always choose to focus on individual commits, and see the progression easily.
Sometimes it does make sense to squash a PR if the work is part of the same change, but the golden rule of atomic commits always applies. Never just blindly squash PRs.
In fact, if the PR is messy and contains fixes and changes from the review, and the PR should have more than one commit, I go back and clean up the history by squashing each change to its appropriate commit. `git commit --fixup` helps with this, so I also prefer addressing comments locally rather than via GitHub's UI. Then it's a simple matter of running `git rebase --autosquash`.
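That flow can be sketched end-to-end in a throwaway repo (the commit messages and file names here are illustrative):

```shell
# Throwaway repo so the sketch is self-contained.
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email you@example.com
git config user.name You
echo 'real change' > feature.txt
git add feature.txt
git commit -qm 'Add feature'
# Review asks for a tweak; record it as a fixup of the commit it amends.
echo 'review tweak' >> feature.txt
git add feature.txt
git commit -q --fixup=HEAD
# Autosquash reorders the fixup next to its target and folds it in;
# GIT_SEQUENCE_EDITOR=true accepts the generated todo list unchanged.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash --root
git log --oneline   # a single commit: "Add feature"
```

The nice part is that the fixup commit carries no message of its own to clean up; autosquash discards it when folding.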
If you're doing proper atomic commits, as most people critiquing squashing probably are, the overhead of making separate PRs for each would be ludicrous. It depends heavily on the situation, but in favorable conditions (greenfield development of some simple CRUD app, for instance) you can easily produce dozens of clean, atomic commits a day. As part of the same PR, those take almost no time at all to review. Put into separate PRs, you'd be wasting a lot of time and effort both on the reviewer's and reviewee's side.
You can "just" do anything. The problem is that things don't "just" work that smoothly. What happens when the feature or bug fix I'm working on demands a refactor or a name change or something of that sort. I could put the refactor in a separate PR, but what if I'm not sure of the changes until I get far enough along with implementing the feature or bug fix? I might want to go back and tweak the refactor I did. So if I put up a PR as soon as the refactor was done, I'll then need to put up another PR with tweaks to it and then a third PR with the actual feature or bug fix. Or I could just put them all up in one PR together broken down by commit. Reviewers can review commit-by-commit. Or I could wait until I've finished everything, and then I'm left with submitting a single PR or splitting them into multiple PRs are submitting them simultaneously. (And dealing with stacking them appropriately.)
This is of course a balancing act. Which is my point. Sometimes it makes sense to split things up into multiple PRs. But sometimes it makes sense to fatten a PR a little bit with multiple commits. You can't just say "small PRs are better." Size is but one dimension of what makes a good PR.
This is why I personally use both "squash & merge" and "rebase & merge." If a PR has a bunch of commits but is really just one logical change, then I'll squash it. But if a PR has thoughtful commits, then I'll rebase it and do a fast-forward merge.
My bottom line is that I try to treat the source history as well as I treat the source. The source history is a tool for communicating changes to other humans, both for review and for looking back on. Squash & merge has a place in that worldview, but so does rebase & merge.
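Outside a forge UI, the two strategies correspond roughly to the following (throwaway repo, hypothetical branch names; `git init -b` assumes git >= 2.28):

```shell
# Throwaway repo: main plus a two-commit feature branch.
repo=$(mktemp -d); cd "$repo"
git init -q -b main
git config user.email you@example.com
git config user.name You
git commit -q --allow-empty -m 'Initial commit'
git checkout -qb feature
echo a > a.txt; git add a.txt; git commit -qm 'Step 1'
echo b > b.txt; git add b.txt; git commit -qm 'Step 2'
# "Squash & merge": collapse the branch into a single commit on main.
git checkout -q main
git merge -q --squash feature
git commit -qm 'Feature (squashed from PR)'
# "Rebase & merge" would instead be:
#   git rebase main feature && git checkout main && git merge --ff-only feature
git log --oneline   # two commits: the squash, then the initial commit
```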
The problem there is that toy systems like Github's review tool don't have good flows for stacking dependent changes, and then people end up not bothering.
For this reason I don't bother with Github and the like and just use Gerrit ;)
If you actually have commits which stand alone - i.e. the build succeeds, the tests pass etc - I see no reason not to land them altogether with a rebase. What I object to is commits which do _not_ meet those criteria, and make life much harder for the next person who has to spelunk through the history trying to work out what has happened and why.
We're in full agreement there. If you run an "every commit should compile" project, rebase. If you have "fix" and "another fix" commits stacked atop the original in a PR, by all means squash them all.
Yeah, it can be a bit of a pain. If you have long-running branches or keep having to re-resolve the same conflicts, getting good at git rerere is recommended: "reuse recorded resolution"!
Rebasing generally doesn't require repeated merge conflicts though? Since the formerly conflicting commit disappears and the non-conflicting one is now baked into a linear history.
Regardless, `git rerere` is supposed to solve that problem, but I don't do enough conflicting merges to be intimately familiar with it in practice.
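For reference, enabling it is just two config switches (shown here in a throwaway repo; in practice you'd likely set them with `--global`):

```shell
# Throwaway repo; in practice you would probably use `git config --global`.
repo=$(mktemp -d); cd "$repo"
git init -q
git config rerere.enabled true     # record how each conflict was resolved
git config rerere.autoUpdate true  # re-apply and stage that resolution when it recurs
```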
It sounds like you're more describing a stacked PR workflow. I achieve the same thing using stacked PRs and can still have a bunch of the "fix stuff" intermediate commits that get squashed away, because who can really predict whether everything will pass CI in the first attempt. :)
I don't want to have a stack of PRs where each one depends on the previous, where each PR needs to justify itself separately while at the same time being interdependent and ordered. That adds cognitive overhead and makes it take longer to get merged, if it gets merged at all.
It's possible the tooling could handle that case much better, but until it's sufficiently better that it's as simple as `gh pr create` by the author and one click of a merge button (or equivalent "@somebot merge") by the reviewer, that's still too much.
If you're using GitHub or GitLab and merging through pull requests, I've found that these commits become duplicative and, given GitHub's rich comments-based collaboration UX, somewhat lossy. It is much easier and more valuable for me to view a commit off main that points to the PR that brought it in and the discussion that took place (along with the original commits in that branch) than to see the individual commit rebased into main without context.
Also, a lot of people (myself included) write really crappy commit messages that don't tell the whole story behind a change. This is another reason why falling back on the PR has been valuable for me.
Historically any time I try to rebase a branch with more than a few commits in it, it effectively fails unless I squash it because resolving all the conflicts for every commit would take me over an hour. Maybe this is just a 'many contributors' problem though, since our repo lands many large PRs each day.
(Merging has the same problem, so I squash frequently and then rebase.)
Me too. I think git blame is the ultimate documentation tool - you can have a description for each block of code, tied to who wrote it and when. If you squash, all of a sudden you have a single explanation for hundreds or thousands of lines.
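Sketching what that digging looks like in a throwaway repo (file name and messages are illustrative):

```shell
# Throwaway repo with two commits touching the same line.
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email you@example.com
git config user.name You
printf 'line1\nline2\n' > notes.txt
git add notes.txt
git commit -qm 'Explain why line2 exists'
printf 'line1\nline2 changed\n' > notes.txt
git commit -qam 'Change line2 for reasons documented here'
# Which commit last touched line 2, and what did its message say?
git blame -L 2,2 --line-porcelain notes.txt | grep '^summary'
# The full patch history of just that line:
git log --oneline -L 2,2:notes.txt
```

With a squashed history the blame still resolves, but every line points at one giant commit instead of the specific change that explains it.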
> Squashing makes the diffs much more painful, such as by burying an important change in a sea of mechanical ones when they were originally in separate commits.
Can you give an example? I don't even think I understand what you're saying. You control your overall summary, so it's up to you to make it useful.
I would cite clarity as my reason for squashing! I think most people are just bad at organizing (& naming) their commits, so they make A LOT of vacuous ones, losing the forest for the trees. It's never helpful to see git blame saying things like "Addressed PR comments" or "Resolved merge conflict", etc.
I do prefer a merge commit when the author has done a good job with their own commits, but that's rare. In all other cases, squashing is greatly preferable to me.
> > Squashing makes the diffs much more painful, such as by burying an important change in a sea of mechanical ones when they were originally in separate commits.
> Can you give an example?
Sure. Here's a common pattern I've used and seen others use:
Commit 1: "Introduce a helper for XYZ", +35,-8, changes 1 file. Introduce a helper for a common operation that appears in various places in the codebase.
Commit 2: "Use the XYZ helper in most places", +60,-260, changes 17 files. Use the helper in all the mechanically easy places.
Commit 3: "Rework ABC to fit with the XYZ helper", +15,-20, changes 1 file. Use the helper in a complicated case that could use some extra scrutiny.
I don't want those squashed together; I want to make it easy, both in the PR and later on, to be able to see each step. Otherwise, the mechanical changes in commit 2 will bury the actual implementation from commit 1 in the middle (wherever the file sorts), and will mask the added complexity in the commit 3 case.
> rebasing creates a cleaner, more understandable history & state of the world without the clutter of merge commits
"Cleaner", for some definition of "clean". In this case, pretty, not accurate.
I just can't understand the draw of rebase based workflows. It seems to be an expression of a preference for aesthetics over accuracy. What is the point of source control, other than to reliably capture what actually happened in history? As soon as you start rewriting that, you compromise the main purpose.
Using merge commits preserves what you actually committed. If you ran tests before you committed, rewriting that commit invalidates that testing. If you need to go back and discover where a problem was introduced or what actually happened, with certainty, in a commit history, rebase undermines that, because it retroactively changes your commits.
It's like a huge portion of the industry is collectively engaging in a lie, so that our commit histories look prettier.
> What is the point of source control, other than to reliably capture what actually happened in history?
Unless you're committing every keystroke, you're recording a curated history. You choose when to commit, and by choosing you declare some historical states to be worth keeping and the rest to be merely incidental.
I think usually history "rewriting" (eg, rebasing) is much more about curation - choosing which aspects of the history you care to record - than it is about presenting a false record.
Exactly. To analogize to actual history: OP wants the version control history to look like a collection of primary sources. Here's the president's daily calendar, there's the letter he received on April 24 from a small child in Wisconsin. In this model, it's up to future code historians to piece it all together into a story.
When I go back and look at the git history, I would much rather have had someone do the work of compiling the story for me at the time. Commits are your chance to document what you did for future programmers (including future you). If you insist on them faithfully reflecting every change you made over the course of three days, then future you will have to piece that all back together into a coherent story.
Why not take the chance to tell the story now, so that future you can skip all the false starts and failed experiments and just see the code that actually made it into main?
Okay, but rebasing is changing each point in time of that history–that you curated by choosing when to commit–to be something different from what it ever was, retroactively. It's literally creating an entirely new history that nobody has ever actually examined, introducing the possibility that points along that history are inconsistent with what was intended at the point of each commit.
Commits serve two needs: saving your work and publishing it. Adopting an "early and often, explain what you did" approach is effective for saving, but when it comes to publication a "refine before release, explain why you did it" strategy is more valuable.
The commit history is an artifact of the development process, just like documentation, tickets, or even code. I'm sure you wouldn't complain about people taking the time to write better comments, and a commit message is like a super-comment, because it can apply across multiple files.
Honestly, do a maintenance programmer a favour - fix up your commits before publishing them. A linear history makes tools like bisect easier to work with.
I wonder if the difference here is in what your quality threshold for a commit is. I commit when I reach a point of coherence in the code, and ensure that the code passes tests before I commit. Each commit is thus a checkpoint of coherence, where the points in between may be out of order or failing tests.
Maybe I just don't consider "saving your work" to be a valid use case for commits. Use an IDE or other local tools for that. Commits are points that are worth saving (or "publishing" if you prefer) beyond your local workspace.
I don’t look at anything other than the merge to a trunk or main as part of the history. It’s not an audit log. I often do check point commits to move local state to a central git as a backup, or commit when I simply want to have a rollback option for something I’m not confident in. I always commit at the end of a day, for instance, and push to a remote, as I don’t trust my laptop or whatever, or worse some cloud dev machine.
None of these commits are useful for anyone, not even myself, beyond the immediate utility. I squash intermediate commits between change sets, and try to only reveal atomic change sets on any shared branch.
It’s absolutely the history of what has changed, but it is not some sort of journal log of every event in my development workflow. The shared branch should absolutely be the evolved history of the source code, but without reflecting the work style of any one developer. It should be a comprehensible history of meaningful changes that can be independently reasoned about and cherry-picked or reverted as necessary. Every other commit is noise to everyone, including yourself, once it leaves your own branch. Since it never even ran in production, there’s not even a plausible regulatory reason to keep them.
Why not have both? If you can filter by merges, what's the harm in having intermediate positions? There have been various points I actually wanted to have the vim undo log as well. That's what I'd really like - essentially a way of undoing back to time zero, with commits denoting feature complete positions and merges denoting, well, mergeable positions that have passed review.
This, I really don't mind merge commits, it's nice to see what happened when. Especially if you run into conflicts and issues caused by bad resolution it's much better to have a clear true history.
the point of git is to enable linus or al viro or whoever to review your proposed changes as quickly and efficiently as possible, so they can be confident that what they're merging into their kernel tree is relatively sane, and then to actually do the merge in a reliable way that won't introduce other unintentional changes, and to be able to reproduce their own previous state
in that context it makes sense to use rebase to present linus with the cleanest, most comprehensible patch set possible, not your lab notebook of all the experiments you tried and the obvious bugs you had. you don't want to waste linus's time saying 'you have an obvious bug in commit xyz' followed by 'oh, never mind, you fixed that in commit abc'
but for my own stuff i prefer merge over rebase because i'm both the producer and the consumer of the feature branch, and rebase seems like more work and more risk
If you run tests before commit, then you also run them after rebase, the same way as after merge. If the tests fail, you can force-pull your branch from the remote and have the same state as before the rebase.
> "Cleaner", for some definition of "clean". In this case, pretty, not accurate.
What do you mean "accurate"? The developer decides when to commit and what message to write, rebasing just enables more control over the final artifact that is shared.
Have you ever heard the writing advice: don't write and edit at the same time?
Rebasing allows one to use the full power of git during development, committing frequently, and creating a very fine grained record of progress while working, without committing to leaving every commit on the permanent record. The official record of development history is more useful if it's distilled down to the essence of changes, with a strictly linear history, and no commits that break CI or were not shippable to production (at least in theory). Doing so makes future analysis and git-bisect operations much more efficient, and allows future developers to better understand the long arc of the project without wading through every burp and fart the programmers did during their individual coding process.
To those who say, "don't commit until you have a publishable unit of work," I say, you are depriving yourself of a valuable development tool. To those who say, "don't rebase, just squash", I say, squashing is rebasing, just without curation. To those who say, "rebasing is more error prone than merging", I say, if a merge commit turns out to have a problem you will have a much harder problem debugging it because it could be caused by either branch, or an interaction which no one considered.
The beauty of rebasing is that it forces the developer to think about all the intervening changes commit by commit as if they started their feature development from the current state of the main branch. This is a more healthy mental model and puts more responsibility on the developer to ensure their code reflects the current state of the world, and not just hastily merging without recognition of what has changed since then. After all, production can only have one commit on it at a time, and given many investigations hinge on understanding what SHAs were in production at what point in time, it makes everything a lot easier with a linear history that hews closely to what was actually shipped.
I realize that there's a learning curve for rebasing, but once you understand it, it allows conflicts to be resolved much more precisely with roughly the same level of effort. You can dismiss this as an aesthetic preference, along with good commit messages, changelogs and other points of software craftsmanship, but in my experience there is real value in maintaining a high-quality history on a long-lived project.
Why does it matter what actually happened? Can you give a concrete example of when you care about the exact sequence of experiments, false starts, and refinements that a feature went through before making it into a PR?
Realistically, how much does merging vs rebasing actually matter - do you save days of time over the year, or just a few minutes cumulatively because the commit graph is prettier?
I understand that it makes the history "cleaner," but how frequently do you end up bisecting or manually searching the repo's commit history?
Even on large projects with dozens of feature branches that eventually make it through a dev / main / prod branch, I've never had a problem when merge was the default rule. But maybe we never hit an otherwise common problem.
Staying consistent matters. Once I joined a new team and my first task was to take care of a large code reformat everyone was afraid of taking on. I had done a similar thing many times; it's a super easy task once you know git and are focused. It turned out to be a terrible experience for a combination of reasons, but one important reason was that the team used a merge workflow, while I had always done this with rebase. I don't remember the exact details though.
It's just plain harder to reason about the history of a branch when there are a bunch of merge commits in it. Even if I'm not using bisect (I rarely do), having the 'git log' be polluted by merges makes it harder for me to fit everything into my head.
I also prefer to think about my branches as 'here is a stack of commits on top of a fixed point in time', so having merges in the middle of that flow makes it much harder to reason about that way. Rebasing to choose a new fixed point is much simpler.
> Realistically, how much does merging vs rebasing actually matter - do you save days of time over the year, or just a few minutes cumulatively because the commit graph is prettier?
Excellent question! This is one of those how-to-measure-intangibles sorts of questions, so there's really no good answer to it. The issue is that bisection is really useful when you have a hard-to-find bug, so the question is really "how often do you have such bugs", and the answer is hard to find because few companies record such metadata in their bug reporting systems.
master branch - code currently deployed to production. never used as the base branch except rare hot fixes
staging branch - the base branch for all feature branches. On prod deploy, staging is merged to master with a merge commit (probably could (should?) be a rebase)
feature branches - always use staging as base branch
Most critically: all feature branches are squashed and merged, so that each single commit in master corresponds to a single PR.
Makes it easy to revert a PR but difficult to cherry-pick after squashing. Also keeps the git history extremely clean and condensed. Not sure if this method will scale, but it’s working well at our company with 6 engineers and 20 or so feature branches open at any given time.
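Concretely, with one-PR-one-commit the revert is a plain single-commit revert, whereas a true merge commit would need a mainline parent picked with `-m` (throwaway repo below; the PR number is made up, and `git init -b` assumes git >= 2.28):

```shell
# Throwaway repo: one "PR" landed as a single squashed commit on master.
repo=$(mktemp -d); cd "$repo"
git init -q -b master
git config user.email you@example.com
git config user.name You
git commit -q --allow-empty -m 'Initial'
echo 'risky change' > feature.txt
git add feature.txt
git commit -qm 'PR #42: add feature (squashed)'   # PR number is hypothetical
# Because the whole PR is one commit, undoing it is one ordinary revert:
git revert --no-edit HEAD
# A true merge commit would instead need a mainline parent:
#   git revert -m 1 <merge-sha>
ls feature.txt 2>/dev/null || echo 'feature.txt is gone'
```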
Edit: one reason this works for us is we keep feature branches short-lived (ideally at most 2-3 weeks) and staging gets merged to master twice a week (we do a deploy Mondays and Thursdays)
how often do you cut production from staging in this "gitflow light"?
in my opinion your staging/production parity needs to be really good if you do large iterations in prod, deploying smaller changes constantly will get you little oops moments more often, but you'll be able to fix them immediately since it's clear what caused it, as opposed as 2+ weeks of commits going to prod at once.
we had every feature branch create its own little minimal staging environment and started bugging developers to finish up after a week (the staging environment would tear itself down if not told to stay up explicitly via PR labels). and those feature environments went straight into main/production.
One benefit is you can have different branch restrictions. Staging can be less strict (e.g. allow merging even if the test suite is failing), while master can have stricter requirements, like no merging unless tests are passing.
Or only require code owner approval on staging->master (or only requiring code owner approval on merging to staging)
I’m sure there are ways to accomplish the same sort of thing with tags, I’m not hugely tied to this workflow (other than it seems to work for our team)
I’m confused. Is the workflow to rebase branches and squash merge into main? Because that’s what I do at work and it works quite well. You get atomic PR merges so reverts are easy and you get the clean history for a PR so people can in theory review commit by commit. Although if you want to use merge commits in your own branch, I don’t care because it all gets squashed. I don’t fully get using rebase to merge PRs cause then it’s exposing the commits of the PR, when in fact the PR should be considered atomic code changes. But I suppose for workflows where PRs are not considered atomic code changes, rebasing could make sense.
Really, what this boils down to is a confusion between commits as a save point and commits as an atomic code change. With my aforementioned process, commits inside a PR are save points, I.e. I need to just save my code before leaving work, while commits on main are atomic code changes (and therefore should correspond to a single pull request). In the rebase-everything approach all commits are atomic code changes, which I find a little too obsessive since you need to make sure your code is always working when you commit or rewrite your history so that is true.
The article author appears to weasel-word merge commits and squash-merge together when they are very different things. Squash-merge into main / feature branch is almost equivalent to rebase and is the workflow Github / Gitlab / etc supports well in the UI. The article author might be conflating rebase and squash-merge in order to create clickbait. In particular the author cites lots of “private repos” but gives no evidence because I guess they’re private haha.
I like to use pure merge commits on my solo projects (where I actually do use feature branches), because I practice good commit hygiene (and clean up sloppy commits with interactive rebase). But for collaborating with others who can't be bothered to practice good commit hygiene, blindly squashing every feature branch before merging is definitely the lesser evil.
My commits on a PR are always rebased as I go, into one or two or at most three neat changes. Meanwhile (some) others I work with seem to have no problem creating PRs consisting of a dozen or more changes, most of which with messages like “wip”, “typo”, “fix comment” etc.
I think at scale, merges are too problematic. However, for a long time I worked on a (small) team who took the approach that git history should represent “what you actually did”, and the thought process behind that, rather than the “perfect ideal” of the changes being made.
This brings its own benefits, it is often easier to learn from the commits, it’s often easier to review because you have more granular commits and can follow a dev’s thought process. And if you’re doing post-merge review as we did in the early days, you don’t lose that granularity when squashing. A nice bonus was that because there was no rebasing, no one ever really “broke git”, a classic issue for more junior developers. Ultimately the approach didn’t scale beyond ~8 devs/~500k lines/~15 PRs a day, but it was good for a long time.
The important thing though is: have a git style guide, make decisions for reasons that matter to your team, and stick to the style guide.
Aside from the obvious advantages, `git bisect` with rebase-managed trees works to single patch resolution as you would hope. On merge-damaged histories it only traces it to one giant hairball or another, ie, bisect is made useless.
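As a self-contained illustration (synthetic repo and "bug"), `git bisect run` on a linear history pins the exact offending commit:

```shell
# Throwaway repo: 6 linear commits, the 4th introduces the "bug".
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email you@example.com
git config user.name You
for i in 1 2 3 4 5 6; do
  echo "revision $i" > app.txt
  if [ "$i" -ge 4 ]; then echo 'bug' >> app.txt; fi
  git add app.txt
  git commit -qm "commit $i"
done
# Mark HEAD bad and the root commit good, then let bisect drive the test.
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)" >/dev/null
git bisect run sh -c '! grep -q bug app.txt' >/dev/null 2>&1
git log -1 --format='first bad: %s' refs/bisect/bad   # first bad: commit 4
```

On a history full of merge commits, the bisected "first bad" point is often a merge whose diff spans an entire branch, which is exactly the hairball problem described above.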
Right, squash-and-rebase is not very good. You want meaningful history. So if you have a feature branch that adds some feature consisting of N sub-features, R refactorings, B bug fixes, then you want N+R+B commits. The commits you squash are the "fixup" commits; all others stay. You also want good commit order. Basically, write your code, commit, then refactor your commits so that you get nice history of all the changes you made as if you were starting from the final state of play (because you are).
Oh... well, for the benefit of others since you seem to know well already: there's no need for merge at all if you are just adding patches that have already been rebased to the HEAD of the tree they're going on. They just go on top cleanly and naturally, and all is right with the world. Consumers of the branch fast-forward as normal. There's a pure, linear history with no hairballs that people can bisect to patch granularity. Just say no to merge.
Small PRs are better since they're easier to review.
PRs often have a lot of overhead. They need a separate branch, CI jobs need to run, there are more notifications for everyone, separate approvals, etc.
Sometimes there's a need to keep separate commits even if they're all related to a single change. Proposing them all as part of the same PR helps maintain that context while reviewing as well. Reviewers can always choose to focus on individual commits, and see the progression easily.
Sometimes it does make sense to squash a PR if the work is part of the same change, but the golden rule of atomic commits always applies: never just blindly squash PRs.
In fact, if the PR is messy and contains fixes and changes from the review, and the PR should have more than one commit, I go back and clean up the history by squashing each change to its appropriate commit. `git commit --fixup` helps with this, so I also prefer addressing comments locally rather than via GitHub's UI. Then it's a simple matter of running `git rebase --autosquash`.
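For concreteness, here's a throwaway-repo sketch of that fixup flow (the file names and commit messages are invented for the demo). Setting `GIT_SEQUENCE_EDITOR=true` accepts the autosquash-generated todo list without opening an editor; older Git versions need the interactive form, `git rebase -i --autosquash`:

```shell
# Demo of --fixup / --autosquash in a scratch repository.
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email demo@example.com
git config user.name Demo

echo one > a.txt
git add a.txt
git commit -qm "Add helper"

echo two > b.txt
git add b.txt
git commit -qm "Use helper"

# Review feedback touches the first commit: record it as a fixup of it.
echo one-revised > a.txt
git commit -qa --fixup=':/Add helper'   # message becomes "fixup! Add helper"

# Autosquash folds the fixup back into its target commit.
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash --root >/dev/null 2>&1
count=$(git rev-list --count HEAD)      # back to two clean commits
```

The `:/Add helper` revision syntax finds the youngest commit whose message matches, which saves looking up the SHA by hand.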
This is of course a balancing act. Which is my point. Sometimes it makes sense to split things up into multiple PRs. But sometimes it makes sense to fatten a PR a little bit with multiple commits. You can't just say "small PRs are better." Size is but one dimension of what makes a good PR.
This is why I personally use both "squash & merge" and "rebase & merge." If a PR has a bunch of commits but is really just one logical change, then I'll squash it. But if a PR has thoughtful commits, then I'll rebase it and do a fast-forward merge.
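The two outcomes can be sketched locally in a scratch repo (branch and file names invented): `git merge --ff-only` stands in for "rebase & merge" and `git merge --squash` for "squash & merge":

```shell
# Fast-forward merge preserves thoughtful commits; --squash collapses noisy ones.
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email demo@example.com
git config user.name Demo
echo base > f.txt && git add f.txt && git commit -qm "Initial commit"

# Thoughtful branch: keep both commits via a fast-forward merge.
git checkout -qb thoughtful
echo 1 > step1.txt && git add step1.txt && git commit -qm "Introduce helper"
echo 2 > step2.txt && git add step2.txt && git commit -qm "Use helper"
git checkout -q main
git merge -q --ff-only thoughtful   # in real life: rebase onto main first

# Noisy branch: collapse "wip"-style commits into one squash commit.
git checkout -qb noisy
echo 3 > wip.txt && git add wip.txt && git commit -qm "wip"
echo 4 > wip.txt && git commit -qam "fix it more"
git checkout -q main
git merge -q --squash noisy >/dev/null
git commit -qm "Add feature (one logical change)"
count=$(git rev-list --count HEAD)  # 1 initial + 2 preserved + 1 squashed = 4
```

Note that `--squash` stages the combined diff but does not create the commit itself; the follow-up `git commit` supplies the single summary message.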
My bottom line is that I try to treat the source history as well as I treat the source. The source history is a tool for communicating changes to other humans, both for review and for looking back on. Squash & merge has a place in that worldview, but so does rebase & merge.
For this reason I don't bother with Github and the like and just use Gerrit ;)
Having to fix the same merge conflict for each of your commits is one of the leading causes of developer burnout :D
https://hn.algolia.com/?q=git+rerere
Regardless, `git rerere` is supposed to solve that problem, but I don't do enough conflicting merges to be intimately familiar with it in practice.
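A minimal sketch of what rerere does, in a scratch repo with invented names: once `rerere.enabled` is set, resolving a conflict and committing records the resolution, and redoing the same merge replays it automatically:

```shell
# git rerere recording and replaying a conflict resolution.
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email demo@example.com
git config user.name Demo
git config rerere.enabled true

echo base > f.txt && git add f.txt && git commit -qm "base"
git checkout -qb topic
echo topic > f.txt && git commit -qam "topic change"
git checkout -q main
echo main > f.txt && git commit -qam "main change"

# First merge conflicts; resolve by hand. Committing records the resolution.
git merge topic >/dev/null 2>&1 || true
echo resolved > f.txt
git add f.txt
git commit -qm "merge topic"

# Redo the same merge: rerere replays the recorded resolution automatically.
git reset -q --hard HEAD~1
git merge topic >/dev/null 2>&1 || true
content=$(cat f.txt)   # "resolved", with no conflict markers to fix again
```

By default rerere only updates the working tree; setting `rerere.autoUpdate` additionally stages the replayed resolution.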
It's possible the tooling could handle that case much better, but until it's sufficiently better that it's as simple as `gh pr create` by the author and one click of a merge button (or equivalent "@somebot merge") by the reviewer, that's still too much.
Also, a lot of people (myself included) write really crappy commit messages that don't tell the whole story behind a change. This is another reason why falling back on the PR has been valuable for me.
(Merging has the same problem, so I squash frequently and then rebase.)
Also, having many commits does not mean it's going to be easier to revert or fix than a single big one.
Can you give an example? I don't even think I understand what you're saying. You control your overall summary, so it's up to you to make it useful.
I would cite clarity as my reason for squashing! I think most people are just bad at organizing (& naming) their commits, so they make A LOT of vacuous ones, crowding out the forest for the trees. It's never helpful to see the git blame saying things like "Addressed PR comments" or "Resolved merge conflict", etc.
I do prefer a merge commit when the author has done a good job with their own commits, but that's rare. In all other cases, squashing is greatly preferable to me.
> Can you give an example?
Sure. Here's a common pattern I've used and seen others use:
Commit 1: "Introduce a helper for XYZ", +35,-8, changes 1 file. Introduce a helper for a common operation that appears in various places in the codebase.
Commit 2: "Use the XYZ helper in most places", +60,-260, changes 17 files. Use the helper in all the mechanically easy places.
Commit 3: "Rework ABC to fit with the XYZ helper", +15,-20, changes 1 file. Use the helper in a complicated case that could use some extra scrutiny.
I don't want those squashed together; I want to make it easy, both in the PR and later on, to be able to see each step. Otherwise, the mechanical changes in commit 2 will bury the actual implementation from commit 1 in the middle (wherever the file sorts), and will mask the added complexity in the commit 3 case.
"Cleaner", for some definition of "clean". In this case, pretty, not accurate.
I just can't understand the draw of rebase based workflows. It seems to be an expression of a preference for aesthetics over accuracy. What is the point of source control, other than to reliably capture what actually happened in history? As soon as you start rewriting that, you compromise the main purpose.
Using merge commits preserves what you actually committed. If you ran tests before you committed, rewriting that commit invalidates that testing. If you need to go back and discover where a problem was introduced or what actually happened, with certainty, in a commit history, rebase undermines that, because it retroactively changes your commits.
It's like a huge portion of the industry is collectively engaging in a lie, so that our commit histories look prettier.
Unless you're committing every keystroke, you're recording a curated history. You choose when to commit, and by choosing you declare some historical states to be worth keeping and the rest to be merely incidental.
I think usually history "rewriting" (eg, rebasing) is much more about curation - choosing which aspects of the history you care to record - than it is about presenting a false record.
When I go back and look at the git history, I would much rather have had someone do the work of compiling the story for me at the time. Commits are your chance to document what you did for future programmers (including future you). If you insist on them faithfully reflecting every change you made over the course of three days, then future you will have to piece that all back together into a coherent story.
Why not take the chance to tell the story now, so that future you can skip all the false starts and failed experiments and just see the code that actually made it into main?
Commits serve two needs: saving your work and publishing it. Adopting an "early and often, explain what you did" approach is effective for saving, but when it comes to publication a "refine before release, explain why you did it" strategy is more valuable.
The commit history is an artifact of the development process, just like documentation, tickets, or even code. I'm sure you wouldn't complain about people taking the time to write better comments, and a commit message is like a super-comment, because it can apply across multiple files.
Honestly, do a maintenance programmer a favour - fix up your commits before publishing them. A linear history makes tools like bisect easier to work with.
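To see why linear history pays off here, a scratch-repo sketch (names invented) where "commit 4" introduces a regression and `git bisect run` finds it with no manual checkouts:

```shell
# Automated bisect over a linear history.
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email demo@example.com
git config user.name Demo

# Five commits; the regression lands in commit 4.
for i in 1 2 3 4 5; do
  echo "$i" > n.txt
  if [ "$i" -ge 4 ]; then echo bad > state.txt; else echo good > state.txt; fi
  git add .
  git commit -qm "commit $i"
done

# HEAD is bad, four commits back is good; grep stands in for a test suite
# (exit 0 = good, nonzero = bad).
git bisect start HEAD HEAD~4 >/dev/null
git bisect run grep -qx good state.txt >/dev/null 2>&1
first_bad=$(git log -1 --format=%s refs/bisect/bad)   # "commit 4"
git bisect reset >/dev/null 2>&1
```

With a linear history every step of the bisection is a state someone actually intended to be buildable, which is exactly what makes the automated run trustworthy.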
Maybe I just don't consider "saving your work" to be a valid use case for commits. Use an IDE or other local tools for that. Commits are points that are worth saving (or "publishing" if you prefer) beyond your local workspace.
None of these commits are useful for anyone, not even myself, beyond the immediate utility. I squash intermediate commits between change sets, and try to only reveal atomic change sets on any shared branch.
It’s absolutely the history of what has changed, but it is not some sort of journal log of every event in my development workflow. The shared branch should absolutely be the evolved history of the source code, but without reflecting the work style of any one developer. It should be a comprehensible history of meaningful changes that can be independently reasoned about and cherry-picked or reverted as necessary. Every other commit is noise to everyone, including yourself, once it leaves your own branch. Since it didn’t even run in production, there’s not even a plausible regulatory reason to keep them.
in that context it makes sense to use rebase to present linus with the cleanest, most comprehensible patch set possible, not your lab notebook of all the experiments you tried and the obvious bugs you had. you don't want to waste linus's time saying 'you have an obvious bug in commit xyz' followed by 'oh, never mind, you fixed that in commit abc'
but for my own stuff i prefer merge over rebase because i'm both the producer and the consumer of the feature branch, and rebase seems like more work and more risk
If you run tests before committing, then you also run them after a rebase, the same way as after a merge. If the tests fail, you can force-pull your branch from the remote and have the same state as before the rebase.
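You don't even need the remote for this: Git keeps the pre-rebase tip in `ORIG_HEAD` (and the reflog). A scratch-repo sketch with invented branch names:

```shell
# Undoing a rebase locally via ORIG_HEAD.
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email demo@example.com
git config user.name Demo

echo a > f.txt && git add f.txt && git commit -qm "c1"
git checkout -qb topic
echo b > g.txt && git add g.txt && git commit -qm "c2"
git checkout -q main
echo c > f.txt && git commit -qam "c3"

git checkout -q topic
before=$(git rev-parse HEAD)
git rebase -q main             # rewrites c2 on top of c3

# Tests failed after the rebase? The pre-rebase tip is still reachable:
git reset -q --hard ORIG_HEAD
after=$(git rev-parse HEAD)    # identical to $before
```

`git reflog` offers the same escape hatch even after `ORIG_HEAD` has been overwritten by a later operation.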
What do you mean "accurate"? The developer decides when to commit and what message to write, rebasing just enables more control over the final artifact that is shared.
Have you ever heard the writing advice: don't write and edit at the same time?
Rebasing allows one to use the full power of git during development, committing frequently, and creating a very fine grained record of progress while working, without committing to leaving every commit on the permanent record. The official record of development history is more useful if it's distilled down to the essence of changes, with a strictly linear history, and no commits that break CI or were not shippable to production (at least in theory). Doing so makes future analysis and git-bisect operations much more efficient, and allows future developers to better understand the long arc of the project without wading through every burp and fart the programmers did during their individual coding process.
To those who say, "don't commit until you have a publishable unit of work," I say, you are depriving yourself of a valuable development tool. To those who say, "don't rebase, just squash", I say, squashing is rebasing, just without curation. To those who say, "rebasing is more error prone than merging", I say, if a merge commit turns out to have a problem you will have a much harder problem debugging it because it could be caused by either branch, or an interaction which no one considered.
The beauty of rebasing is that it forces the developer to think about all the intervening changes commit by commit as if they started their feature development from the current state of the main branch. This is a more healthy mental model and puts more responsibility on the developer to ensure their code reflects the current state of the world, and not just hastily merging without recognition of what has changed since then. After all, production can only have one commit on it at a time, and given many investigations hinge on understanding what SHAs were in production at what point in time, it makes everything a lot easier with a linear history that hews closely to what was actually shipped.
I realize that there's a learning curve for rebasing, but once you understand it, it allows conflicts to be resolved much more precisely with roughly the same level of effort. You can dismiss this as an aesthetic preference, along with good commit messages, changelogs, and other points of software craftsmanship, but in my experience there is real value in maintaining a high quality history on a long-lived project.
I understand that it makes the history "cleaner," but how frequently do you end up bisecting or manually searching the repo's commit history?
Even on large projects with dozens of feature branches that eventually make it through a dev / main / prod branch, I've never had a problem when merge was the default rule. But maybe we never hit an otherwise common problem.
I also prefer to think about my branches as 'here is a stack of commits on top of a fixed point in time', so having merges in the middle of that flow makes it much harder to reason about that way. Rebasing to choose a new fixed point is much simpler.
Excellent question! This is one of those how-to-measure-intangibles sorts of questions, so there's really no good answer to it. The issue is that bisection is really useful when you have a hard-to-find bug, and so the question is really "how often do you have such bugs", and the answer is hard to find because few companies require recording of such metadata in their bug reporting systems.
master branch - code currently deployed to production. never used as the base branch except rare hot fixes
staging branch - the base branch for all feature branches. On prod deploy, staging is merged to master with a merge commit (probably could (should?) be a rebase)
feature branches - always use staging as base branch
Most critically: all feature branches are squashed and merged, so that each single commit in master corresponds to a single PR.
Makes it easy to revert a PR, but difficult to cherry-pick after squashing. Also keeps the git history extremely clean and condensed. Not sure if this method will scale, but it’s working well at our company with 6 engineers and 20 or so feature branches open at any given time.
Edit: one reason this works for us is we keep feature branches short-lived (ideally at most 2-3 weeks) and staging gets merged to master twice a week (we do a deploy Mondays and Thursdays)
in my opinion your staging/production parity needs to be really good if you do large iterations in prod. deploying smaller changes constantly will get you little oops moments more often, but you'll be able to fix them immediately since it's clear what caused them, as opposed to 2+ weeks of commits going to prod at once.
we had every feature branch create its own little minimal staging environment and started bugging developers to finish up after a week (the staging environment would tear itself down if not told to stay up explicitly via PR labels). and those feature environments went straight into main/production.
People that do trunk-based development would consider 2-3 weeks as quite long. My definition of short-lived is about 1 day.
I’m guessing with a 1 day PR length, you’re mostly pushing finished code ready for peer review.
Or only require code owner approval on staging->master (or only requiring code owner approval on merging to staging)
I’m sure there are ways to accomplish the same sort of thing with tags, I’m not hugely tied to this workflow (other than it seems to work for our team)
Really, what this boils down to is a confusion between commits as save points and commits as atomic code changes. With my aforementioned process, commits inside a PR are save points, i.e. I need to just save my code before leaving work, while commits on main are atomic code changes (and therefore should correspond to a single pull request). In the rebase-everything approach all commits are atomic code changes, which I find a little too obsessive, since you need to make sure your code is always working when you commit, or rewrite your history so that is true.
My commits on a PR are always rebased as I go, into one or two or at most three neat changes. Meanwhile (some) others I work with seem to have no problem creating PRs consisting of a dozen or more changes, most of which with messages like “wip”, “typo”, “fix comment” etc.
This brings its own benefits: it is often easier to learn from the commits, and it's often easier to review because you have more granular commits and can follow a dev's thought process. And if you're doing post-merge review as we did in the early days, you don't lose that granularity when squashing. A nice bonus was that because there was no rebasing, no one ever really "broke git", a classic issue for more junior developers. Ultimately the approach didn't scale beyond ~8 devs/~500k lines/~15 PRs a day, but it was good for a long time.
The important thing though is: have a git style guide, make decisions for reasons that matter to your team, and stick to the style guide.
The blog post is advocating a "squash, rebase, and merge" flow, which leaves equally giant hairballs.