I'm a big fan of the rebase workflow, but not of squashing. I wrote it as several separate commits for a reason: documenting each step, making each step revertible, separating refactors from semantic changes, etc. Squashing makes the diffs much more painful, such as by burying an important change in a sea of mechanical ones when they were originally in separate commits.
(A workflow that preserves those commits requires actually having useful commits, and obviously if you have PRs with commits like "fix it" and "fix it more" then it might as well get squashed.)
Most developers I've worked with think of Git as a CLI front-end to GitHub. They aren't writing commits with the intention of showing their work, they are using commits to save their work before signing off, or to trigger CI. They aren't proficient enough with Git to go back over these and clean them up, as would be expected in a project like the Linux kernel.
If that's the state of developer proficiency with version control then, ideal workflows aside, we need a way to prevent those noise commits from ending up in master, and the easiest way is to squash with every pull request, as part of the "merge" automation.
Why do you need to squash & merge literally every PR? Why not use squash & merge when appropriate and rebase & merge otherwise? That's what I do for my projects. Most contributions coming from others tend to get squashed & merged, whereas I'll rebase & merge my own commits. But I'll sometimes rebase & merge PRs from other contributors who have broken their change down into thoughtful commits.
I'm a fan of the workflow where the PR gets squashed in the upstream git repo, but the individual commits are preserved in the PR in the code review tool. I feel that Phabricator handles this well.
But does that still lose the source commit long term? What I'd love to have is a mechanism that keeps references to the pre-squash commits at blame granularity, allowing one to dig deeper into the commit messages associated with a given line. Kind of like a sourcemap, but for squash instead of transpile.
GitHub and Azure DevOps also do that, you just need to know where to look.
I don’t mind squashing either; unless I’m being really intentional or rewriting my history, my intermediate commits couldn’t be reverted without leaving stuff broken (totally a me problem, of course).
Squashing is nice IMHO, and even a must after a while. For one recent very small project a squash of the commit history reduced the storage from tens of kilobytes to a few hundred bytes total. Orders of magnitude. That was a very small project, so imagine the storage space savings for larger projects.
I find that the commit history tends to grow viciously for anything I've been involved with. And I fail to see the benefit of amassing that amount of detail once you are past the stages where each individual commit is reversible (or even interesting)
So, for a project that runs for, say, three months, the commits of the first few weeks aren't really very interesting or valuable at all at the end of the period. Just hard drive space being eaten up. YMMV.
Real example: I had to mitigate an outage in the middle of the night, and I found the root cause in ten lines of code. I needed very badly to “git blame” that code and find a specific commit message from three years ago and its author (a former colleague), to figure out what he had been trying to do.
Right now I have a full clone of a pretty large monorepo dating back almost nine years, and the .git dir is less than half of the total space. Sparse checkouts and shallow clones can make clown car hardware sort of work, but I do not want to go back to the pre-git days and try to work without full history to conserve 0.008 TB of SSD. We spend more than that on coffee.
There are many reasons for having several commits in the same PR.
PRs often have a lot of overhead. They need a separate branch, CI jobs need to run, there are more notifications for everyone, separate approvals, etc.
Sometimes there's a need for keeping separate commits if they're all related to a single change. Proposing them all as part of the same PR helps maintaining that context while reviewing as well. Reviewers can always choose to focus on individual commits, and see the progression easily.
Sometimes it does make sense to squash a PR if the work is part of the same change, but the golden rule of atomic commits always applies. Never just blindly squash PRs.
In fact, if the PR is messy and contains fixes and changes from the review, and the PR should have more than one commit, I go back and clean up the history by squashing each change to its appropriate commit. `git commit --fixup` helps with this, so I also prefer addressing comments locally rather than via GitHub's UI. Then it's a simple matter of running `git rebase --autosquash`.
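That flow can be sketched end-to-end in a throwaway repo (the commit messages and file names here are illustrative):

```shell
# Throwaway repo so the sketch is self-contained.
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email you@example.com
git config user.name You
echo 'real change' > feature.txt
git add feature.txt
git commit -qm 'Add feature'
# Review asks for a tweak; record it as a fixup of the commit it amends.
echo 'review tweak' >> feature.txt
git add feature.txt
git commit -q --fixup=HEAD
# Autosquash reorders the fixup next to its target and folds it in;
# GIT_SEQUENCE_EDITOR=true accepts the generated todo list unchanged.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash --root
git log --oneline   # a single commit: "Add feature"
```

The nice part is that the fixup commit carries no message of its own to clean up; autosquash discards it when folding.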
If you're doing proper atomic commits, as most people critiquing squashing probably are, the overhead of making separate PRs for each would be ludicrous. It depends heavily on the situation, but in favorable conditions (greenfield development of some simple CRUD app, for instance) you can easily produce dozens of clean, atomic commits a day. As part of the same PR, those take almost no time at all to review. Put into separate PRs, you'd be wasting a lot of time and effort both on the reviewer's and reviewee's side.
You can "just" do anything. The problem is that things don't "just" work that smoothly. What happens when the feature or bug fix I'm working on demands a refactor or a name change or something of that sort. I could put the refactor in a separate PR, but what if I'm not sure of the changes until I get far enough along with implementing the feature or bug fix? I might want to go back and tweak the refactor I did. So if I put up a PR as soon as the refactor was done, I'll then need to put up another PR with tweaks to it and then a third PR with the actual feature or bug fix. Or I could just put them all up in one PR together broken down by commit. Reviewers can review commit-by-commit. Or I could wait until I've finished everything, and then I'm left with submitting a single PR or splitting them into multiple PRs are submitting them simultaneously. (And dealing with stacking them appropriately.)
This is of course a balancing act. Which is my point. Sometimes it makes sense to split things up into multiple PRs. But sometimes it makes sense to fatten a PR a little bit with multiple commits. You can't just say "small PRs are better." Size is but one dimension of what makes a good PR.
This is why I personally use both "squash & merge" and "rebase & merge." If a PR has a bunch of commits but is really just one logical change, then I'll squash it. But if a PR has thoughtful commits, then I'll rebase it and do a fast-forward merge.
My bottom line is that I try to treat the source history as well as I treat the source. The source history is a tool for communicating changes to other humans, both for review and for looking back on. Squash & merge has a place in that worldview, but so does rebase & merge.
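Outside a forge UI, the two strategies correspond roughly to the following (throwaway repo, hypothetical branch names; `git init -b` assumes git >= 2.28):

```shell
# Throwaway repo: main plus a two-commit feature branch.
repo=$(mktemp -d); cd "$repo"
git init -q -b main
git config user.email you@example.com
git config user.name You
git commit -q --allow-empty -m 'Initial commit'
git checkout -qb feature
echo a > a.txt; git add a.txt; git commit -qm 'Step 1'
echo b > b.txt; git add b.txt; git commit -qm 'Step 2'
# "Squash & merge": collapse the branch into a single commit on main.
git checkout -q main
git merge -q --squash feature
git commit -qm 'Feature (squashed from PR)'
# "Rebase & merge" would instead be:
#   git rebase main feature && git checkout main && git merge --ff-only feature
git log --oneline   # two commits: the squash, then the initial commit
```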
The problem there is that toy systems like Github's review tool don't have good flows for stacking dependent changes, and then people end up not bothering.
For this reason I don't bother with Github and the like and just use Gerrit ;)
If you actually have commits which stand alone - i.e. the build succeeds, the tests pass etc - I see no reason not to land them altogether with a rebase. What I object to is commits which do _not_ meet those criteria, and make life much harder for the next person who has to spelunk through the history trying to work out what has happened and why.
We're in full agreement there. If you run an "every commit should compile" project, rebase. If you have "fix" and "another fix" commits stacked atop the original in a PR, by all means squash them all.
Yeah, it can be a bit of a pain. If you have long-running branches or keep having to re-resolve the same conflicts, getting good at git rerere is recommended: "reuse recorded resolution"!
Rebasing generally doesn't require repeated merge conflicts though? Since the formerly conflicting commit disappears and the non-conflicting one is now baked into a linear history.
Regardless, `git rerere` is supposed to solve that problem, but I don't do enough conflicting merges to be intimately familiar with it in practice.
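For reference, enabling it is just two config switches (shown here in a throwaway repo; in practice you'd likely set them with `--global`):

```shell
# Throwaway repo; in practice you would probably use `git config --global`.
repo=$(mktemp -d); cd "$repo"
git init -q
git config rerere.enabled true     # record how each conflict was resolved
git config rerere.autoUpdate true  # re-apply and stage that resolution when it recurs
```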
It sounds like you're more describing a stacked PR workflow. I achieve the same thing using stacked PRs and can still have a bunch of the "fix stuff" intermediate commits that get squashed away, because who can really predict whether everything will pass CI in the first attempt. :)
I don't want to have a stack of PRs where each one depends on the previous, where each PR needs to justify itself separately while at the same time being interdependent and ordered. That adds cognitive overhead and makes it take longer to get merged, if it gets merged at all.
It's possible the tooling could handle that case much better, but until it's sufficiently better that it's as simple as `gh pr create` by the author and one click of a merge button (or equivalent "@somebot merge") by the reviewer, that's still too much.
If you're using GitHub or GitLab and merging through pull requests, I've found that these commits become duplicative and, given GitHub's rich comments-based collaboration UX, somewhat lossy. It is much easier and more valuable for me to view a commit off main that points to the PR that brought it in and the discussion that took place (along with the original commits in that branch) than to see the individual commit rebased into main without context.
Also, a lot of people (myself included) write really crappy commit messages that don't tell the whole story behind a change. This is another reason why falling back on the PR has been valuable for me.
Historically any time I try to rebase a branch with more than a few commits in it, it effectively fails unless I squash it because resolving all the conflicts for every commit would take me over an hour. Maybe this is just a 'many contributors' problem though, since our repo lands many large PRs each day.
(Merging has the same problem, so I squash frequently and then rebase.)
Me too. I think git blame is the ultimate documentation tool - you can have a description for each block of code, tied to who wrote it and when. If you squash, all of a sudden you have a single explanation for hundreds or thousands of lines.
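Sketching what that digging looks like in a throwaway repo (file name and messages are illustrative):

```shell
# Throwaway repo with two commits touching the same line.
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email you@example.com
git config user.name You
printf 'line1\nline2\n' > notes.txt
git add notes.txt
git commit -qm 'Explain why line2 exists'
printf 'line1\nline2 changed\n' > notes.txt
git commit -qam 'Change line2 for reasons documented here'
# Which commit last touched line 2, and what did its message say?
git blame -L 2,2 --line-porcelain notes.txt | grep '^summary'
# The full patch history of just that line:
git log --oneline -L 2,2:notes.txt
```

With a squashed history the blame still resolves, but every line points at one giant commit instead of the specific change that explains it.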
> Squashing makes the diffs much more painful, such as by burying an important change in a sea of mechanical ones when they were originally in separate commits.
Can you give an example? I don't even think I understand what you're saying. You control your overall summary, so it's up to you to make it useful.
I would cite clarity as my reason for squashing! I think most people are just bad at organizing (& naming) their commits, so they make A LOT of vacuous ones, losing the forest for the trees. It's never helpful to see git blame saying things like "Addressed PR comments" or "Resolved merge conflict", etc.
I do prefer a merge commit when the author has done a good job with their own commits, but that's rare. In all other cases, squashing is greatly preferable to me.
> > Squashing makes the diffs much more painful, such as by burying an important change in a sea of mechanical ones when they were originally in separate commits.
> Can you give an example?
Sure. Here's a common pattern I've used and seen others use:
Commit 1: "Introduce a helper for XYZ", +35,-8, changes 1 file. Introduce a helper for a common operation that appears in various places in the codebase.
Commit 2: "Use the XYZ helper in most places", +60,-260, changes 17 files. Use the helper in all the mechanically easy places.
Commit 3: "Rework ABC to fit with the XYZ helper", +15,-20, changes 1 file. Use the helper in a complicated case that could use some extra scrutiny.
I don't want those squashed together; I want to make it easy, both in the PR and later on, to be able to see each step. Otherwise, the mechanical changes in commit 2 will bury the actual implementation from commit 1 in the middle (wherever the file sorts), and will mask the added complexity in the commit 3 case.
> rebasing creates a cleaner, more understandable history & state of the world without the clutter of merge commits
"Cleaner", for some definition of "clean". In this case, pretty, not accurate.
I just can't understand the draw of rebase based workflows. It seems to be an expression of a preference for aesthetics over accuracy. What is the point of source control, other than to reliably capture what actually happened in history? As soon as you start rewriting that, you compromise the main purpose.
Using merge commits preserves what you actually committed. If you ran tests before you committed, rewriting that commit invalidates that testing. If you need to go back and discover where a problem was introduced or what actually happened, with certainty, in a commit history, rebase undermines that, because it retroactively changes your commits.
It's like a huge portion of the industry is collectively engaging in a lie, so that our commit histories look prettier.
> What is the point of source control, other than to reliably capture what actually happened in history?
Unless you're committing every keystroke, you're recording a curated history. You choose when to commit, and by choosing you declare some historical states to be worth keeping and the rest to be merely incidental.
I think usually history "rewriting" (eg, rebasing) is much more about curation - choosing which aspects of the history you care to record - than it is about presenting a false record.
Exactly. To analogize to actual history: OP wants the version control history to look like a collection of primary sources. Here's the president's daily calendar, there's the letter he received on April 24 from a small child in Wisconsin. In this model, it's up to future code historians to piece it all together into a story.
When I go back and look at the git history, I would much rather have had someone do the work of compiling the story for me at the time. Commits are your chance to document what you did for future programmers (including future you). If you insist on them faithfully reflecting every change you made over the course of three days, then future you will have to piece that all back together into a coherent story.
Why not take the chance to tell the story now, so that future you can skip all the false starts and failed experiments and just see the code that actually made it into main?
Okay, but rebasing is changing each point in time of that history–that you curated by choosing when to commit–to be something different from what it ever was, retroactively. It's literally creating an entirely new history that nobody has ever actually examined, introducing the possibility that points along that history are inconsistent with what was intended at the point of each commit.
Commits serve two needs: saving your work and publishing it. Adopting an "early and often, explain what you did" approach is effective for saving, but when it comes to publication a "refine before release, explain why you did it" strategy is more valuable.
The commit history is an artifact of the development process, just like documentation, tickets, or even code. I'm sure you wouldn't complain about people taking the time to write better comments, and a commit message is like a super-comment, because it can apply across multiple files.
Honestly, do a maintenance programmer a favour - fix up your commits before publishing them. A linear history makes tools like bisect easier to work with.
I wonder if the difference here is in what your quality threshold for a commit is. I commit when I reach a point of coherence in the code, and ensure that the code passes tests before I commit. Each commit is thus a checkpoint of coherence, where the points in between may be out of order or failing tests.
Maybe I just don't consider "saving your work" to be a valid use case for commits. Use an IDE or other local tools for that. Commits are points that are worth saving (or "publishing" if you prefer) beyond your local workspace.
I don’t look at anything other than the merge to a trunk or main as part of the history. It’s not an audit log. I often do check point commits to move local state to a central git as a backup, or commit when I simply want to have a rollback option for something I’m not confident in. I always commit at the end of a day, for instance, and push to a remote, as I don’t trust my laptop or whatever, or worse some cloud dev machine.
None of these commits are useful for anyone, not even myself, beyond the immediate utility. I squash intermediate commits between change sets, and try to only reveal atomic change sets on any shared branch.
It’s absolutely the history of what has changed, but it is not some sort of journal log of every event in my development workflow. The shared branch should absolutely be the evolved history of the source code, but without reflecting the work style of any one developer. It should be a comprehensible history of meaningful changes that can be independently reasoned about and cherry-picked or reverted as necessary. Every other commit is noise to everyone, including yourself, once it leaves your own branch. Since it never even ran in production, there’s not even a plausible regulatory reason to keep them.
Why not have both? If you can filter by merges, what's the harm in having intermediate positions? There have been various points I actually wanted to have the vim undo log as well. That's what I'd really like - essentially a way of undoing back to time zero, with commits denoting feature complete positions and merges denoting, well, mergeable positions that have passed review.
This, I really don't mind merge commits, it's nice to see what happened when. Especially if you run into conflicts and issues caused by bad resolution it's much better to have a clear true history.
the point of git is to enable linus or al viro or whoever to review your proposed changes as quickly and efficiently as possible, so they can be confident that what they're merging into their kernel tree is relatively sane, and then to actually do the merge in a reliable way that won't introduce other unintentional changes, and to be able to reproduce their own previous state
in that context it makes sense to use rebase to present linus with the cleanest, most comprehensible patch set possible, not your lab notebook of all the experiments you tried and the obvious bugs you had. you don't want to waste linus's time saying 'you have an obvious bug in commit xyz' followed by 'oh, never mind, you fixed that in commit abc'
but for my own stuff i prefer merge over rebase because i'm both the producer and the consumer of the feature branch, and rebase seems like more work and more risk
If you run tests before commit, then you also run them after rebase, the same way as after merge. If the tests fail, you can force-pull your branch from the remote and have the same state as before the rebase.
> "Cleaner", for some definition of "clean". In this case, pretty, not accurate.
What do you mean "accurate"? The developer decides when to commit and what message to write, rebasing just enables more control over the final artifact that is shared.
Have you ever heard the writing advice: don't write and edit at the same time?
Rebasing allows one to use the full power of git during development, committing frequently, and creating a very fine grained record of progress while working, without committing to leaving every commit on the permanent record. The official record of development history is more useful if it's distilled down to the essence of changes, with a strictly linear history, and no commits that break CI or were not shippable to production (at least in theory). Doing so makes future analysis and git-bisect operations much more efficient, and allows future developers to better understand the long arc of the project without wading through every burp and fart the programmers did during their individual coding process.
To those who say, "don't commit until you have a publishable unit of work," I say, you are depriving yourself of a valuable development tool. To those who say, "don't rebase, just squash", I say, squashing is rebasing, just without curation. To those who say, "rebasing is more error prone than merging", I say, if a merge commit turns out to have a problem you will have a much harder problem debugging it because it could be caused by either branch, or an interaction which no one considered.
The beauty of rebasing is that it forces the developer to think about all the intervening changes commit by commit as if they started their feature development from the current state of the main branch. This is a more healthy mental model and puts more responsibility on the developer to ensure their code reflects the current state of the world, and not just hastily merging without recognition of what has changed since then. After all, production can only have one commit on it at a time, and given many investigations hinge on understanding what SHAs were in production at what point in time, it makes everything a lot easier with a linear history that hews closely to what was actually shipped.
I realize that there's a learning curve for rebasing, but once you understand it, it allows conflicts to be resolved much more precisely with roughly the same level of effort. You can dismiss this as an aesthetic preference, along with good commit messages, changelogs and other points of software craftsmanship, but in my experience there is real value in maintaining a high-quality history on a long-lived project.
Why does it matter what actually happened? Can you give a concrete example of when you care about the exact sequence of experiments, false starts, and refinements that a feature went through before making it into a PR?
Realistically, how much does merging vs rebasing actually matter - do you save days of time over the year, or just a few minutes cumulatively because the commit graph is prettier?
I understand that it makes the history "cleaner," but how frequently do you end up bisecting or manually searching the repo's commit history?
Even on large projects with dozens of feature branches that eventually make it through a dev / main / prod branch, I've never had a problem when merge was the default rule. But maybe we never hit an otherwise common problem.
Staying consistent matters. Once I joined a new team and my first task was to take care of a large code reformat everyone was afraid of taking on. I had done a similar thing many times; it's a super easy task once you know git and are focused. It turned out to be a terrible experience for a combination of reasons, but one important reason was that the team used a merge workflow, while I had always done this with rebase. I don't remember the exact details though.
It's just plain harder to reason about the history of a branch when there are a bunch of merge commits in it. Even if I'm not using bisect (I rarely do), having the 'git log' be polluted by merges makes it harder for me to fit everything into my head.
I also prefer to think about my branches as 'here is a stack of commits on top of a fixed point in time', so having merges in the middle of that flow makes it much harder to reason about that way. Rebasing to choose a new fixed point is much simpler.
> Realistically, how much does merging vs rebasing actually matter - do you save days of time over the year, or just a few minutes cumulatively because the commit graph is prettier?
Excellent question! This is one of those how-to-measure-intangibles sorts of questions, so there's really no good answer to it. The issue is that bisection is really useful when you have a hard-to-find bug, so the question is really "how often do you have such bugs", and the answer is hard to find because few companies record such metadata in their bug reporting systems.
master branch - code currently deployed to production. never used as the base branch except rare hot fixes
staging branch - the base branch for all feature branches. On prod deploy, staging is merged to master with a merge commit (probably could (should?) be a rebase)
feature branches - always use staging as base branch
Most critically: all feature branches are squashed and merged, so that each single commit in master corresponds to a single PR.
Makes it easy to revert a PR but difficult to cherry-pick after squashing. Also keeps the git history extremely clean and condensed. Not sure if this method will scale, but it’s working well at our company with 6 engineers and 20 or so feature branches open at any given time.
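Concretely, with one-PR-one-commit the revert is a plain single-commit revert, whereas a true merge commit would need a mainline parent picked with `-m` (throwaway repo below; the PR number is made up, and `git init -b` assumes git >= 2.28):

```shell
# Throwaway repo: one "PR" landed as a single squashed commit on master.
repo=$(mktemp -d); cd "$repo"
git init -q -b master
git config user.email you@example.com
git config user.name You
git commit -q --allow-empty -m 'Initial'
echo 'risky change' > feature.txt
git add feature.txt
git commit -qm 'PR #42: add feature (squashed)'   # PR number is hypothetical
# Because the whole PR is one commit, undoing it is one ordinary revert:
git revert --no-edit HEAD
# A true merge commit would instead need a mainline parent:
#   git revert -m 1 <merge-sha>
ls feature.txt 2>/dev/null || echo 'feature.txt is gone'
```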
Edit: one reason this works for us is we keep feature branches short-lived (ideally at most 2-3 weeks) and staging gets merged to master twice a week (we do a deploy Mondays and Thursdays)
how often do you cut production from staging in this "gitflow light"?
in my opinion your staging/production parity needs to be really good if you do large iterations in prod, deploying smaller changes constantly will get you little oops moments more often, but you'll be able to fix them immediately since it's clear what caused it, as opposed as 2+ weeks of commits going to prod at once.
we had every feature branch create its own little minimal staging environment and started bugging developers to finish up after a week (the staging environment would tear itself down if not told to stay up explicitly via PR labels). and those feature environments went straight into main/production.
One benefit is you can have different branch restrictions. Staging can be less strict (e.g. allow merging even if the test suite is failing), while master can have stricter requirements, like no merging unless tests are passing.
Or only require code owner approval on staging->master (or only requiring code owner approval on merging to staging)
I’m sure there are ways to accomplish the same sort of thing with tags, I’m not hugely tied to this workflow (other than it seems to work for our team)
I’m confused. Is the workflow to rebase branches and squash merge into main? Because that’s what I do at work and it works quite well. You get atomic PR merges so reverts are easy and you get the clean history for a PR so people can in theory review commit by commit. Although if you want to use merge commits in your own branch, I don’t care because it all gets squashed. I don’t fully get using rebase to merge PRs cause then it’s exposing the commits of the PR, when in fact the PR should be considered atomic code changes. But I suppose for workflows where PRs are not considered atomic code changes, rebasing could make sense.
Really, what this boils down to is a confusion between commits as a save point and commits as an atomic code change. With my aforementioned process, commits inside a PR are save points, I.e. I need to just save my code before leaving work, while commits on main are atomic code changes (and therefore should correspond to a single pull request). In the rebase-everything approach all commits are atomic code changes, which I find a little too obsessive since you need to make sure your code is always working when you commit or rewrite your history so that is true.
The article author appears to weasel-word merge commits and squash-merge together when they are very different things. Squash-merge into main / feature branch is almost equivalent to rebase and is the workflow Github / Gitlab / etc supports well in the UI. The article author might be conflating rebase and squash-merge in order to create clickbait. In particular the author cites lots of “private repos” but gives no evidence because I guess they’re private haha.
I like to use pure merge commits on my solo projects (where I actually do use feature branches), because I practice good commit hygiene (and clean up sloppy commits with interactive rebase). But for collaborating with others who can't be bothered to practice good commit hygiene, blindly squashing every feature branch before merging is definitely the lesser evil.
My commits on a PR are always rebased as I go, into one or two or at most three neat changes. Meanwhile (some) others I work with seem to have no problem creating PRs consisting of a dozen or more changes, most of which with messages like “wip”, “typo”, “fix comment” etc.
I think at scale, merges are too problematic. However, for a long time I worked on a (small) team who took the approach that git history should represent “what you actually did”, and the thought process behind that, rather than the “perfect ideal” of the changes being made.
This brings its own benefits, it is often easier to learn from the commits, it’s often easier to review because you have more granular commits and can follow a dev’s thought process. And if you’re doing post-merge review as we did in the early days, you don’t lose that granularity when squashing. A nice bonus was that because there was no rebasing, no one ever really “broke git”, a classic issue for more junior developers. Ultimately the approach didn’t scale beyond ~8 devs/~500k lines/~15 PRs a day, but it was good for a long time.
The important thing though is: have a git style guide, make decisions for reasons that matter to your team, and stick to the style guide.
Aside from the obvious advantages, `git bisect` with rebase-managed trees works to single patch resolution as you would hope. On merge-damaged histories it only traces it to one giant hairball or another, ie, bisect is made useless.
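As a self-contained illustration (synthetic repo and "bug"), `git bisect run` on a linear history pins the exact offending commit:

```shell
# Throwaway repo: 6 linear commits, the 4th introduces the "bug".
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email you@example.com
git config user.name You
for i in 1 2 3 4 5 6; do
  echo "revision $i" > app.txt
  if [ "$i" -ge 4 ]; then echo 'bug' >> app.txt; fi
  git add app.txt
  git commit -qm "commit $i"
done
# Mark HEAD bad and the root commit good, then let bisect drive the test.
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)" >/dev/null
git bisect run sh -c '! grep -q bug app.txt' >/dev/null 2>&1
git log -1 --format='first bad: %s' refs/bisect/bad   # first bad: commit 4
```

On a history full of merge commits, the bisected "first bad" point is often a merge whose diff spans an entire branch, which is exactly the hairball problem described above.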
Right, squash-and-rebase is not very good. You want meaningful history. So if you have a feature branch that adds some feature consisting of N sub-features, R refactorings, B bug fixes, then you want N+R+B commits. The commits you squash are the "fixup" commits; all others stay. You also want good commit order. Basically, write your code, commit, then refactor your commits so that you get nice history of all the changes you made as if you were starting from the final state of play (because you are).
Oh... well, for the benefit of others since you seem to know well already: there's no need for merge at all if you are just adding patches that have already been rebased to the HEAD of the tree they're going on. They just go on top cleanly and naturally, and all is right with the world. Consumers of the branch fast-forward as normal. There's a pure, linear history with no hairballs that people can bisect to patch granularity. Just say no to merge.
Small PRs are better since they're easier to review.
PRs often have a lot of overhead. They need a separate branch, CI jobs need to run, there are more notifications for everyone, separate approvals, etc.
Sometimes there's a need to keep separate commits even if they're all related to a single change. Proposing them all as part of the same PR helps maintain that context while reviewing as well. Reviewers can always choose to focus on individual commits, and see the progression easily.
Sometimes it does make sense to squash a PR if the work is part of the same change, but the golden rule of atomic commits always applies: never just blindly squash PRs.
In fact, if the PR is messy and contains fixes and changes from the review, and the PR should have more than one commit, I go back and clean up the history by squashing each change to its appropriate commit. `git commit --fixup` helps with this, so I also prefer addressing comments locally rather than via GitHub's UI. Then it's a simple matter of running `git rebase --autosquash`.
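For concreteness, here's a throwaway-repo sketch of that fixup flow (the file names and commit messages are invented for the demo). Setting `GIT_SEQUENCE_EDITOR=true` accepts the autosquash-generated todo list without opening an editor; older Git versions need the interactive form, `git rebase -i --autosquash`:

```shell
# Demo of --fixup / --autosquash in a scratch repository.
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email demo@example.com
git config user.name Demo

echo one > a.txt
git add a.txt
git commit -qm "Add helper"

echo two > b.txt
git add b.txt
git commit -qm "Use helper"

# Review feedback touches the first commit: record it as a fixup of it.
echo one-revised > a.txt
git commit -qa --fixup=':/Add helper'   # message becomes "fixup! Add helper"

# Autosquash folds the fixup back into its target commit.
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash --root >/dev/null 2>&1
count=$(git rev-list --count HEAD)      # back to two clean commits
```

The `:/Add helper` revision syntax finds the youngest commit whose message matches, which saves looking up the SHA by hand.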
This is of course a balancing act. Which is my point. Sometimes it makes sense to split things up into multiple PRs. But sometimes it makes sense to fatten a PR a little bit with multiple commits. You can't just say "small PRs are better." Size is but one dimension of what makes a good PR.
This is why I personally use both "squash & merge" and "rebase & merge." If a PR has a bunch of commits but is really just one logical change, then I'll squash it. But if a PR has thoughtful commits, then I'll rebase it and do a fast-forward merge.
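The two outcomes can be sketched locally in a scratch repo (branch and file names invented): `git merge --ff-only` stands in for "rebase & merge" and `git merge --squash` for "squash & merge":

```shell
# Fast-forward merge preserves thoughtful commits; --squash collapses noisy ones.
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email demo@example.com
git config user.name Demo
echo base > f.txt && git add f.txt && git commit -qm "Initial commit"

# Thoughtful branch: keep both commits via a fast-forward merge.
git checkout -qb thoughtful
echo 1 > step1.txt && git add step1.txt && git commit -qm "Introduce helper"
echo 2 > step2.txt && git add step2.txt && git commit -qm "Use helper"
git checkout -q main
git merge -q --ff-only thoughtful   # in real life: rebase onto main first

# Noisy branch: collapse "wip"-style commits into one squash commit.
git checkout -qb noisy
echo 3 > wip.txt && git add wip.txt && git commit -qm "wip"
echo 4 > wip.txt && git commit -qam "fix it more"
git checkout -q main
git merge -q --squash noisy >/dev/null
git commit -qm "Add feature (one logical change)"
count=$(git rev-list --count HEAD)  # 1 initial + 2 preserved + 1 squashed = 4
```

Note that `--squash` stages the combined diff but does not create the commit itself; the follow-up `git commit` supplies the single summary message.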
My bottom line is that I try to treat the source history as well as I treat the source. The source history is a tool for communicating changes to other humans, both for review and for looking back on. Squash & merge has a place in that worldview, but so does rebase & merge.
For this reason I don't bother with Github and the like and just use Gerrit ;)
Having to fix the same merge conflict for each of your commits is one of the leading causes of developer burnout :D
https://hn.algolia.com/?q=git+rerere
Regardless, `git rerere` is supposed to solve that problem, but I don't do enough conflicting merges to be intimately familiar with it in practice.
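A minimal sketch of what rerere does, in a scratch repo with invented names: once `rerere.enabled` is set, resolving a conflict and committing records the resolution, and redoing the same merge replays it automatically:

```shell
# git rerere recording and replaying a conflict resolution.
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email demo@example.com
git config user.name Demo
git config rerere.enabled true

echo base > f.txt && git add f.txt && git commit -qm "base"
git checkout -qb topic
echo topic > f.txt && git commit -qam "topic change"
git checkout -q main
echo main > f.txt && git commit -qam "main change"

# First merge conflicts; resolve by hand. Committing records the resolution.
git merge topic >/dev/null 2>&1 || true
echo resolved > f.txt
git add f.txt
git commit -qm "merge topic"

# Redo the same merge: rerere replays the recorded resolution automatically.
git reset -q --hard HEAD~1
git merge topic >/dev/null 2>&1 || true
content=$(cat f.txt)   # "resolved", with no conflict markers to fix again
```

By default rerere only updates the working tree; setting `rerere.autoUpdate` additionally stages the replayed resolution.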
It's possible the tooling could handle that case much better, but until it's sufficiently better that it's as simple as `gh pr create` by the author and one click of a merge button (or equivalent "@somebot merge") by the reviewer, that's still too much.
Also, a lot of people (myself included) write really crappy commit messages that don't tell the whole story behind a change. This is another reason why falling back on the PR has been valuable for me.
(Merging has the same problem, so I squash frequently and then rebase.)
Also, having many commits does not mean it's going to be easier to revert or fix than a single big one.
Can you give an example? I don't even think I understand what you're saying. You control your overall summary, so it's up to you to make it useful.
I would cite clarity as my reason for squashing! I think most people are just bad at organizing (& naming) their commits, so they make A LOT of vacuous ones, crowding out the forest for the trees. It's never helpful to see the git blame saying things like "Addressed PR comments" or "Resolved merge conflict", etc.
I do prefer a merge commit when the author has done a good job with their own commits, but that's rare. In all other cases, squashing is greatly preferable to me.
> Can you give an example?
Sure. Here's a common pattern I've used and seen others use:
Commit 1: "Introduce a helper for XYZ", +35,-8, changes 1 file. Introduce a helper for a common operation that appears in various places in the codebase.
Commit 2: "Use the XYZ helper in most places", +60,-260, changes 17 files. Use the helper in all the mechanically easy places.
Commit 3: "Rework ABC to fit with the XYZ helper", +15,-20, changes 1 file. Use the helper in a complicated case that could use some extra scrutiny.
I don't want those squashed together; I want to make it easy, both in the PR and later on, to be able to see each step. Otherwise, the mechanical changes in commit 2 will bury the actual implementation from commit 1 in the middle (wherever the file sorts), and will mask the added complexity in the commit 3 case.
"Cleaner", for some definition of "clean". In this case, pretty, not accurate.
I just can't understand the draw of rebase based workflows. It seems to be an expression of a preference for aesthetics over accuracy. What is the point of source control, other than to reliably capture what actually happened in history? As soon as you start rewriting that, you compromise the main purpose.
Using merge commits preserves what you actually committed. If you ran tests before you committed, rewriting that commit invalidates that testing. If you need to go back and discover where a problem was introduced or what actually happened, with certainty, in a commit history, rebase undermines that, because it retroactively changes your commits.
It's like a huge portion of the industry is collectively engaging in a lie, so that our commit histories look prettier.
Unless you're committing every keystroke, you're recording a curated history. You choose when to commit, and by choosing you declare some historical states to be worth keeping and the rest to be merely incidental.
I think usually history "rewriting" (eg, rebasing) is much more about curation - choosing which aspects of the history you care to record - than it is about presenting a false record.
When I go back and look at the git history, I would much rather have had someone do the work of compiling the story for me at the time. Commits are your chance to document what you did for future programmers (including future you). If you insist on them faithfully reflecting every change you made over the course of three days, then future you will have to piece that all back together into a coherent story.
Why not take the chance to tell the story now, so that future you can skip all the false starts and failed experiments and just see the code that actually made it into main?
Commits serve two needs: saving your work and publishing it. Adopting an "early and often, explain what you did" approach is effective for saving, but when it comes to publication a "refine before release, explain why you did it" strategy is more valuable.
The commit history is an artifact of the development process, just like documentation, tickets, or even code. I'm sure you wouldn't complain about people taking the time to write better comments, and a commit message is like a super-comment, because it can apply across multiple files.
Honestly, do a maintenance programmer a favour - fix up your commits before publishing them. A linear history makes tools like bisect easier to work with.
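To see why linear history pays off here, a scratch-repo sketch (names invented) where "commit 4" introduces a regression and `git bisect run` finds it with no manual checkouts:

```shell
# Automated bisect over a linear history.
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email demo@example.com
git config user.name Demo

# Five commits; the regression lands in commit 4.
for i in 1 2 3 4 5; do
  echo "$i" > n.txt
  if [ "$i" -ge 4 ]; then echo bad > state.txt; else echo good > state.txt; fi
  git add .
  git commit -qm "commit $i"
done

# HEAD is bad, four commits back is good; grep stands in for a test suite
# (exit 0 = good, nonzero = bad).
git bisect start HEAD HEAD~4 >/dev/null
git bisect run grep -qx good state.txt >/dev/null 2>&1
first_bad=$(git log -1 --format=%s refs/bisect/bad)   # "commit 4"
git bisect reset >/dev/null 2>&1
```

With a linear history every step of the bisection is a state someone actually intended to be buildable, which is exactly what makes the automated run trustworthy.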
Maybe I just don't consider "saving your work" to be a valid use case for commits. Use an IDE or other local tools for that. Commits are points that are worth saving (or "publishing" if you prefer) beyond your local workspace.
None of these commits are useful for anyone, not even myself, beyond the immediate utility. I squash intermediate commits between change sets, and try to only reveal atomic change sets on any shared branch.
It’s absolutely the history of what has changed, but it is not some sort of journal log of every event in my development workflow. The shared branch should absolutely be the evolved history of the source code, but without reflecting the work style of any one developer. It should be a comprehensible history of meaningful changes that can be independently reasoned about and cherry-picked or reverted as necessary. Every other commit is noise to everyone, including yourself, once it leaves your own branch. Since it didn’t even run in production, there’s not even a plausible regulatory reason to keep them.
in that context it makes sense to use rebase to present linus with the cleanest, most comprehensible patch set possible, not your lab notebook of all the experiments you tried and the obvious bugs you had. you don't want to waste linus's time saying 'you have an obvious bug in commit xyz' followed by 'oh, never mind, you fixed that in commit abc'
but for my own stuff i prefer merge over rebase because i'm both the producer and the consumer of the feature branch, and rebase seems like more work and more risk
If you run tests before committing, then you also run them after a rebase, the same way as after a merge. If the tests fail, you can force-pull your branch from the remote and have the same state as before the rebase.
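You don't even need the remote for this: Git keeps the pre-rebase tip in `ORIG_HEAD` (and the reflog). A scratch-repo sketch with invented branch names:

```shell
# Undoing a rebase locally via ORIG_HEAD.
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email demo@example.com
git config user.name Demo

echo a > f.txt && git add f.txt && git commit -qm "c1"
git checkout -qb topic
echo b > g.txt && git add g.txt && git commit -qm "c2"
git checkout -q main
echo c > f.txt && git commit -qam "c3"

git checkout -q topic
before=$(git rev-parse HEAD)
git rebase -q main             # rewrites c2 on top of c3

# Tests failed after the rebase? The pre-rebase tip is still reachable:
git reset -q --hard ORIG_HEAD
after=$(git rev-parse HEAD)    # identical to $before
```

`git reflog` offers the same escape hatch even after `ORIG_HEAD` has been overwritten by a later operation.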
What do you mean "accurate"? The developer decides when to commit and what message to write, rebasing just enables more control over the final artifact that is shared.
Have you ever heard the writing advice: don't write and edit at the same time?
Rebasing allows one to use the full power of git during development, committing frequently, and creating a very fine grained record of progress while working, without committing to leaving every commit on the permanent record. The official record of development history is more useful if it's distilled down to the essence of changes, with a strictly linear history, and no commits that break CI or were not shippable to production (at least in theory). Doing so makes future analysis and git-bisect operations much more efficient, and allows future developers to better understand the long arc of the project without wading through every burp and fart the programmers did during their individual coding process.
To those who say, "don't commit until you have a publishable unit of work," I say, you are depriving yourself of a valuable development tool. To those who say, "don't rebase, just squash", I say, squashing is rebasing, just without curation. To those who say, "rebasing is more error prone than merging", I say, if a merge commit turns out to have a problem you will have a much harder problem debugging it because it could be caused by either branch, or an interaction which no one considered.
The beauty of rebasing is that it forces the developer to think about all the intervening changes commit by commit as if they started their feature development from the current state of the main branch. This is a more healthy mental model and puts more responsibility on the developer to ensure their code reflects the current state of the world, and not just hastily merging without recognition of what has changed since then. After all, production can only have one commit on it at a time, and given many investigations hinge on understanding what SHAs were in production at what point in time, it makes everything a lot easier with a linear history that hews closely to what was actually shipped.
I realize that there's a learning curve for rebasing, but once you understand it, it allows conflicts to be resolved much more precisely with roughly the same level of effort. You can dismiss this as an aesthetic preference, along with good commit messages, changelogs, and other points of software craftsmanship, but in my experience there is real value in maintaining a high quality history on a long-lived project.
I understand that it makes the history "cleaner," but how frequently do you end up bisecting or manually searching the repo's commit history?
Even on large projects with dozens of feature branches that eventually make it through a dev / main / prod branch, I've never had a problem when merge was the default rule. But maybe we never hit an otherwise common problem.
I also prefer to think about my branches as 'here is a stack of commits on top of a fixed point in time', so having merges in the middle of that flow makes it much harder to reason about that way. Rebasing to choose a new fixed point is much simpler.
Excellent question! This is one of those how-to-measure-intangibles sorts of questions, so there's really no good answer to it. The issue is that bisection is really useful when you have a hard-to-find bug, and so the question is really "how often do you have such bugs", and the answer is hard to find because few companies require recording of such metadata in their bug reporting systems.
master branch - code currently deployed to production. never used as the base branch except rare hot fixes
staging branch - the base branch for all feature branches. On prod deploy, staging is merged to master with a merge commit (probably could (should?) be a rebase)
feature branches - always use staging as base branch
Most critically: all feature branches are squashed and merged, so that each single commit in master corresponds to a single PR.
Makes it easy to revert a PR, but difficult to cherry-pick after squashing. Also keeps the git history extremely clean and condensed. Not sure if this method will scale, but it’s working well at our company with 6 engineers and 20 or so feature branches open at any given time.
Edit: one reason this works for us is we keep feature branches short-lived (ideally at most 2-3 weeks) and staging gets merged to master twice a week (we do a deploy Mondays and Thursdays)
in my opinion your staging/production parity needs to be really good if you do large iterations in prod. deploying smaller changes constantly will get you little oops moments more often, but you'll be able to fix them immediately since it's clear what caused them, as opposed to 2+ weeks of commits going to prod at once.
we had every feature branch create its own little minimal staging environment and started bugging developers to finish up after a week (the staging environment would tear itself down if not told to stay up explicitly via PR labels). and those feature environments went straight into main/production.
People that do trunk-based development would consider 2-3 weeks as quite long. My definition of short-lived is about 1 day.
I’m guessing with a 1 day PR length, you’re mostly pushing finished code ready for peer review.
Or only require code owner approval on staging->master (or only requiring code owner approval on merging to staging)
I’m sure there are ways to accomplish the same sort of thing with tags, I’m not hugely tied to this workflow (other than it seems to work for our team)
Really, what this boils down to is a confusion between commits as save points and commits as atomic code changes. With my aforementioned process, commits inside a PR are save points, i.e. I need to just save my code before leaving work, while commits on main are atomic code changes (and therefore should correspond to a single pull request). In the rebase-everything approach all commits are atomic code changes, which I find a little too obsessive, since you need to make sure your code is always working when you commit, or rewrite your history so that is true.
My commits on a PR are always rebased as I go, into one or two or at most three neat changes. Meanwhile (some) others I work with seem to have no problem creating PRs consisting of a dozen or more changes, most of which with messages like “wip”, “typo”, “fix comment” etc.
This brings its own benefits: it is often easier to learn from the commits, and it's often easier to review because you have more granular commits and can follow a dev's thought process. And if you're doing post-merge review as we did in the early days, you don't lose that granularity when squashing. A nice bonus was that because there was no rebasing, no one ever really "broke git", a classic issue for more junior developers. Ultimately the approach didn't scale beyond ~8 devs/~500k lines/~15 PRs a day, but it was good for a long time.
The important thing though is: have a git style guide, make decisions for reasons that matter to your team, and stick to the style guide.
The blog post is advocating a "squash, rebase, and merge" flow, which leaves equally giant hairballs.