> Once you understand what rebasing is, the most important thing to learn is when not to do it. The golden rule of git rebase is to never use it on public branches.
For me, even though rebasing comes with some trappings, I still greatly prefer it to the alternative, which is to have merge commits cluttering up the commit history.
The way I phrase and teach what I consider to be the important rule of git is:
> Don't rewrite history on shared branches without proper communication.
I don't teach "never", I don't teach that `main` is special, I don't teach that force pushing is forbidden, because I don't believe in those things.
I highly prefer a rebase-heavy workflow. In addition to not "cluttering" the history, it's an invaluable tool to keep commits focused on "the right level" of atomic changes.
You can simply pass flags to “git log” to hide merge commits, without needing to rewrite history to “destroy” that information. While they are often noisy, sometimes they can be useful. I usually prefer to hide information rather than destroy it.
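To make the "hide, don't destroy" point concrete, here is a minimal sketch in a throwaway repository. The branch name, commit messages, and `demo` identity are made up for illustration; `--no-merges` is the real `git log` / `git rev-list` flag that filters merge commits out of the view without touching history.

```shell
set -eu
# Throwaway repo in a temp dir; the identity below is a placeholder.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "initial"
git checkout -q -b feature
git commit -q --allow-empty -m "feature work"
git checkout -q main
git merge -q --no-ff -m "Merge branch 'feature'" feature

# Same history, two views: everything vs. merges filtered out.
total=$(git rev-list --count HEAD)
non_merge=$(git rev-list --count --no-merges HEAD)
printf 'all commits: %s, shown with --no-merges: %s\n' "$total" "$non_merge"
git log --no-merges --format='  %s'
```

The merge commit is still in the repository afterwards; only the view changes.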
It’s annoying when someone force pushes to a branch that you just reviewed, but you can no longer see the history so you have to scan through the whole PR you already reviewed looking for the change. Please just commit the fix, let me see it, then squash it.
I think squash merges are a last resort heavy-handed tool for dealing with developers who refuse to clean up their commit history before merging. Most developers can do better by hand.
Git history should tell a simple, understandable story of each change. For example: 1) refactor existing code, 2) add feature. Or 1) add missing tests, 2) refactor existing code, 3) add feature.
But since you're working on the fly with imperfect knowledge, it doesn't happen in such neat steps. Refactorings and behavior changes end up interleaved in your raw git history, so you need to do a little bit of cleanup by hand in order to present a simple story in the commit log.
Of course if you have developers that don't do that and instead merge dozens of commits that just say wip, wip, wip, lol, fml, wip, wip, lol, yolo and you can't fire them or get them to change, then squash merges ftw.
I actually hate squash merge because of all the noise it adds. Sure, the commit graph looks nicer, but it comes with a terrible loss of information when doing git blame.
I'm a big proponent of rebase and squash if it helps to make a commit more coherent, but we use squash merges by default in the current project I'm working on, and I die a little bit each time I try to understand what changes were related to a line when tracking down a bug.
Now you can see your features in a nice history and also have the added benefit of seeing intermediary commits. Pro tip: merge commits aren't required to use the canned "Merge branch into..." message, you can give them any message you want, such as "feat: ..." or whatever your convention is.
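A rough sketch of that pro tip, assuming a hypothetical `search` branch and made-up commit messages: `git merge --no-ff -m` creates a merge commit with your own message, and the branch's intermediate commits stay reachable through the merge's second parent.

```shell
set -eu
# Throwaway repo in a temp dir; the identity below is a placeholder.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "initial"
git checkout -q -b search
git commit -q --allow-empty -m "add index"
git commit -q --allow-empty -m "add query parser"
git checkout -q main

# --no-ff forces a merge commit; -m replaces the canned "Merge branch ..." text.
git merge -q --no-ff -m "feat: add search" search

merge_subject=$(git log -1 --format=%s)
# Commits reachable from the merged branch (second parent)
# but not from the previous main tip (first parent).
branch_commits=$(git rev-list --count HEAD^1..HEAD^2)
printf 'merge commit: "%s" (%s intermediate commits kept)\n' \
    "$merge_subject" "$branch_commits"
```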
I hate that branch squashing has become something of a de facto standard. I actually do rewrite my history and often add context to my commits. `git blame` can be an incredibly useful tool to get context about a given small change. Getting a massive diff for a whole feature is much less so, especially since you can just look at the diff of the merge commit.
What I think I see these days is squash merges being used lazily to avoid having to do anything to build a clean history with clearly semantically delineated commits. Squash merges are good compared to an alternative where people check in super messy noisy branches, but they unfortunately have a big downside because squash merges can make bisecting and history spelunking more difficult, when the branches that are squash merged were big.
My rule of thumb for commits is that they should be of a size and scope suitable for cherry-picking. So, maybe I'm working on a small feature that entails three changes, and each of those three changes is useful in and of itself and could conceivably be cherry-picked by others. I would create three separate commits, generate a PR with all three, and merge in the work. Sure, I could squash merge and end up with one merge commit encompassing all three changes, but now none of those three changes is cherry-pickable.
They do, but they have their own issues, e.g. having to delete local branches with `git branch -D` instead of `git branch -d`, which means losing the protection against deleting unmerged work.
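For anyone who hasn't run into this, a small sketch of the annoyance (branch and file names are hypothetical): after a squash merge the branch tip is not an ancestor of `main`, so git considers the branch unmerged and the safe delete refuses.

```shell
set -eu
# Throwaway repo in a temp dir; the identity below is a placeholder.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "initial"
git checkout -q -b feature
echo hello > file.txt
git add file.txt
git commit -q -m "add file"
git checkout -q main

# Squash merge: stages the branch's changes, then records ONE new commit
# that has no parent link back to the feature branch.
git merge --squash feature >/dev/null 2>&1
git commit -q -m "add file (squashed)"

# -d only deletes branches git can prove are merged; here it cannot.
if git branch -d feature >/dev/null 2>&1; then
    safe_delete=allowed
else
    safe_delete=refused
fi
printf 'git branch -d feature: %s\n' "$safe_delete"

# The force delete works, but it would also work on truly unmerged work.
git branch -q -D feature
remaining=$(git branch --list feature)
```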
I still agree that on balance annoyances like that might still be worth putting up with for larger teams with mixed skill levels.
I don't mind merge commits, it's the 100 tiny individual commits some developers seem to like to do that really clutters things up. Yes, I know, git squash is a thing, but not committing until the feature is working and ready to commit is also a thing.
> not committing until the feature is working and ready to commit is also a thing
That leaves you prone to losing work if you have a false start that you need to back out of. I prefer to commit early and often on my private branches, then before submitting a pull request I clean up the history to where there are a few good commits that form useful, standalone chunks (ideally the test suite fully passes on each commit).
> For me, even though rebasing comes with some trappings, I still greatly prefer it to the alternative, which is to have merge commits cluttering up the commit history.
The purpose of history is to remember. Rewriting history, whether in git or in life, is bad, even outside the context of "don't use it on public repos". Such advice is similar to saying "only point the shotgun away from you when firing". If you have to remember such a rule, it's best to avoid the tool altogether.
> I still greatly prefer it to the alternative, which is to have merge commits cluttering up the commit history.
I've heard this many times before, but haven't been able to figure out why this is a problem. In your workflow is it a problem to have a cluttered commit history? If so, could you explain how?
> I still greatly prefer it to the alternative, which is to have merge commits cluttering up the commit history.
GitHub recently added a feature that prompts people to update their branches via merge. It's frustrating because every PR now has dozens of merge commits polluting the history.
A PR with merges is fine by me, it lets me see how the PR has evolved.
What I want is for GitHub to track changes between sets of commits in a PR so that you can do most of the review with merges and "address review comments" commits, and then rebase into well organized, logical commits and review that those have the same diff as the messy history after a force push.
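Until a forge does this for you, the "same diff" check can be done by hand. A minimal sketch, with made-up file and commit names: record the tip of the messy history, rewrite it, then compare the two tips. `git diff --quiet` exits 0 only when the trees are identical. (`git range-diff` can compare the two series commit by commit, which I'm not showing here.)

```shell
set -eu
# Throwaway repo in a temp dir; the identity below is a placeholder.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "initial"
git checkout -q -b feature
echo one > a.txt
git add a.txt
git commit -q -m "wip"
echo two > a.txt
git commit -q -a -m "address review comments"
old_tip=$(git rev-parse HEAD)      # the messy history that was reviewed

# Rewrite: collapse the branch into one well-named commit.
git reset -q --soft main
git commit -q -m "add a.txt"
new_tip=$(git rev-parse HEAD)      # what would be force pushed

# Different commits, but is the resulting code identical?
if git diff --quiet "$old_tip" "$new_tip"; then
    same_tree=yes
else
    same_tree=no
fi
printf 'rewritten history has the same content: %s\n' "$same_tree"
```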
I find it fascinating that people talk about "Having a history of what people did" in such emotive terms - "Cluttering", "Polluting".
What matters is that you end up with working systems. That a lot of change happened is just, well, what happened. It doesn't need to be prettied up and made to look like your development occurred in a clockwork march of cleanliness. It literally does not matter unless you spend a lot of time doing git-bisect.
Let it go. Accept that coding is not a smooth, robotic, endeavour, where everything is always tidy. And that's just fine.
I accepted this a decade ago.
I put my ego aside, and now I don't care if my git history doesn't look "beautiful" in the commit graph.
I've been working on dozens of projects since, and probably did thousands of commits. Some of the teams of those projects included dozens of developers working concurrently on the same codebases. We always merged the upstream branches into our development branches and never did any rebases.
I have NEVER ended up in a situation where I thought rebases would have been better.
The git tools and IDE integrations of our current age allow me to find any information I need from the history without pain.
Have you ever had to use git bisect? That's really where a 'clean' git history is important. Plenty of people never use git bisect, and that's fine too. That said it's a very useful tool when you do need it, and can drastically simplify finding when and where a regression was introduced.
The point of a clean git history is not to have a clean git history. The point is to make it possible to debug later, via bisect, or show, or even just a diff. The point is to make the workspace clean for the next guy.
Instead of letting it go, maybe we should have more discipline and organization in our lives and not less.
It's hard to tell what side you're on, because both sides refer to their stance as "clean history".
The pro-revisionists (squash, rebase) say they do what they do so the history looks clean (no intermediate commits breaking stuff, a "straight line" graph, etc)
The anti-revisionists say they do what they do so the history looks clean (can see the actual development, can safely diff different commits to see what changed in between, see the log in chronological order, etc).
> Instead of letting it go, maybe we should have more discipline and organization in our lives and not less.
Again, both sides could argue that they're the ones with more discipline.
> The point is to make it possible to debug later, via bisect, or show, or even just a diff.
This sounds anti-revisionist.
> The point is to make the workspace clean for the next guy.
This is one of the most common pro-revisionist arguments.
100% agree, but nobody gives a shit, and I’ve learned to just let it go. I’ve been in so many meetings, seen so many PSAs, and you know what happens every single time? Nothing. Maybe a couple people learn what interactive rebase is for the first time, try it once, say “it lost all my code” and never try it again. Good luck explaining reflog in these cases.
Did you notice, though, that rebase advocates use very "emotive" terminology when talking about git history? Like it's a subject they care about? Seems awfully touchy feely.
> What matters is that you end up with working systems. That a lot of change happened is just, well, what happened. It doesn't need to be prettied up and made to look like your development occurred in a clockwork march of cleanliness. It literally does not matter unless you spend a lot of time doing git-bisect.
And git blame. And git checkout to a past state. It "doesn't matter" only if ease of understanding your project history doesn't matter.
How often is "understanding your project history" something that actually comes up for you? In all my years of working with projects in git, I will occasionally look at my history to help me find a change that may have led to a bug, but it really only comes up for me once or twice a year and even then, it is rarely an extensive deep dive and never very far back in time.
I never use rebase, and I've never once had trouble understanding who did what where and when, even in a large project with 500+ users.
That being said, after reading this stuff, I may start using it on my local branches to clean up multiple commits into one tidy one, but that's about it.
Every time I try to blame or bisect and just end up stuck on an irrelevant megacommit I curse the Git maintainers that don't have the backbone to just get rid of --squash.
Every time I try to review a PR and the bookmark resets because they decided to force push I curse the Git maintainers that don't have the backbone to just get rid of rebase.
I think if the definition of a “good history” is “clean and not messy”, then yes I agree that’s pointless. If the definition is “a clear ability to see what changes were made, by who, and most importantly why” I think that’s incredibly necessary and would even go so far as to say it’s naive at best to not support.
The amount of time that has been saved in my life by someone leaving an explanation in their commit (for some weird edge case or context I’d have no way of gleaning because they’ve since left the company) is SO much more than the extra time I’ve put in to make sure the history has this extra info in it.
What's worse, the desire for cleanliness ends up making things like `git bisect` less useful.
If I had a bad day and introduced something stupid, I want a bisect to point me at the code I wrote on that bad day. If you squash liberally, perhaps because you want each commit to correspond with a release note, you're going to lose that debugging granularity.
The git history of a project is the main source of knowledge on that project, once the people that wrote it are gone. The git history answers questions such as "wtf is that supposed to do?", "what's this code connected to?", and "why did they do it that way?". You can use other kinds of documentation, but the git history is always there, so it makes sense to make it semi-useful.
This is such a strange thing to say. I'd be curious if you feel the same way about cleaning up your code, or cleaning up your room. I think you have an unfair advantage in this argument because it's difficult to defend such intangible benefits. We have to resort to making up logical explanations, or sounding unhinged or emotional as you suggest.
But it's simply intangible. My instinct tells me that it's helpful and that's okay. I don't owe anyone a justification for how I organize things, and there's nothing controversial about this. (Or maybe I could even come up with a logical example of a benefit, but that's a trap I'm not going to fall into) And a lot of people agree, and they know what I mean, so it's not merely an individual preference. If I have to work with someone who has strong preference against it I'll worry at that point about negotiating.
> I'd be curious if you feel the same way about cleaning up your code, or cleaning up your room
Very genuinely: I do not care at all whether you clean your room starting from left and continuing to right. Or, starting from doors and continuing toward window. Or whether you clean it in a random order. I also do not care about whether you clean every Friday or whenever you feel like. That is the equivalent of git history. Because this excessive care about git history is just that - insisting that room is cleaned from left to right as if any other order was an issue.
The reason why it is hard to defend the tangible benefits of this or that git history strategy is that there is very little benefit.
A clean git history on a pull request also makes it easier for the reviewer to understand your code. Small, concise commits will tell the reviewers about your train of thought or what issues you ran into, making it easier to pick up the context. I start every code review by looking at the commit history.
I prefer not to have squash commits in our team for this reason. It makes master look good, but usually nobody ever looks at the master commit history first, they look at the merged pull requests. However, everybody must look at the commits you made in a pull request. If you have squash commits, you are encouraged to have messy commit history in your pull requests, leading to meaningless commit messages and even large commits (causing other problems...).
IMO the only advantage of squashing is that it makes it easy to roll forward when you accidentally deploy something that causes problems.
Yeah we use pull requests for the coarse-grained stuff and leave the small commits, which should also have good comments, intact. Maybe other shops use pull requests differently.
Agree, plus let's avoid having the CI pipeline create commits in the remote repo. I like CI/CD to be stateless with regard to the files in the repository. I tried to plead for this today with my colleagues, with very mixed results.
I’ve never understood the tradeoff of rebasing, squashing or otherwise “keeping a clean history”. It always seemed like tons of sometimes highly error prone work (sometimes you can wipe out a colleague’s work with it! Wtf!), for almost no gain (why does it matter that the git history is “clean”?).
* press the "annotate" button in my IDE and can see which commit introduced each line
* run "git bisect"
* use "tig" to drill down through the history of a file (shortcut "," is "move to commit preceding current line's blame commit")
...every step of the way, I get a meaningful description of why a change was made and what other diffs were necessary to achieve that change. And not just "fix", "bug", "PR comments".
> * press the "annotate" button in my IDE and can see which commit introduced each line
In PyCharm, I can see which commit introduced each line, regardless of branching. Same with drilling down through a file's history. Is this an IDE limitation you're seeing?
> every step of the way, I get a meaningful description of why
Isn't this more about commit messages, than anything else?
> why does it matter that the git history is “clean”?
Makes reviewing a set of changes prior to a merge much easier. It's nice if there's a 1:1 correlation between a commit message and the actual patch contents.
I'm sure you've dealt with the case of reviewing a colleague's changes with a commit message like "Enable logging in foobar module" and the patch is actually enabling foobar logging and a bunch of other stuff.
This makes bisecting your git history to identify and fix bugs much more difficult.
If the git history is clean, you can just read the commit messages and implicitly trust the developer if clean git hygiene is in place (as opposed to actually needing to read the whole diff on a per-commit basis to find out what _actually_ happened at commit XYZ, despite its message).
For me the big gain is at the code review stage. It's much easier to review a set of patches that are a clear and distinct sequence of changes without "oops, fix bug" changes later in the series. It does require extra work by the code author, but it means less work for the code reviewer. Depending on the project and the organisation and the workflow, that can be a worthwhile tradeoff.
Never understood why you wouldn't want it clean. There's no benefit whatsoever to it being messy and it's a liability for a lot of reasons, whereas the clean version is free and easy and makes everything you do that interacts with git history simpler.
There is a giant benefit to it being messy. And that is that the mess is the actual history.
Every time you do a git rebase, you are literally asking your source control system to lie about history. If you mess up, and you eventually will, you're then forced to manually figure out what the history really was despite being lied to. If you mess it up, well, good luck.
I used to work at a company where someone (we never figured out who) in another group would rebase every few weeks. We didn't find out about it until their stuff was pushed then released. The result was that features which we'd written, QAed, and released to production would simply disappear a few weeks later. With no history suggesting that it ever existed.
Have you ever been pulled off of a project to go fix a project from a month ago which has disappeared from source control? You don't know what happened, you no longer have context, you've just got complaints because your stuff no longer works.
Is your desire for a "clean history" worth potentially creating THAT disaster for other developers on your team???
You can have it clean without rebasing. This is simply a matter of properly visualizing the history. Unfortunately, most of the major version control systems have decided to basically just dump a raw graph rather than presenting the history in a more user-friendly fashion.
As your parent already said, (and it matches my experience), we didn't have any of these problems when using systems without history rewriting (mercurial, for example).
I recall when I first switched to git at work and the team was insisting on a "linear history", I was bemused: Could these developers really not handle a merge graph? It was bizarre how something straightforward in other VCS is suddenly "messy" amongst git folks.
I'm a fan of rebase myself, but understand the point made above. For me, the biggest pro of a clean history is when doing `git blame`. If the history is clean and the commits are good, it might solve my issue. On the other hand, if the commit in question is a huge mess of unrelated things it doesn't help me at all. I also find it way easier to review a PR with a clean, well-described history.
You can run `git log --first-parent`, which gives you the same view as squash merging without losing the ability to effectively manage stacked branches/PRs.
But because GitHub and other tools' history views just flatten merge commits into spaghetti, we’re stuck with squash merge. Thanks GitHub.
Another thing: if you keep commits in a clean "state", it is easier to revert a commit. When you squash them or keep them messy, reverting can get harder.
Also sometimes you decide you want to backport some change to other releases, and if commits are in a good state, it is much easier to do this.
Sometimes when working on an old code base built by developers that came and went, one needs to perform what I call "code archeology": going back in time to understand why a feature was implemented the way it was.
Whether this is feasible at all depends largely on the care developers put in structuring their commits.
This has become a large chunk of my job over the past few years, as part of fixing/upgrading systems no one has touched in a decade, and none of those original people are still here. There are some weird things in there I've only been able to figure out because all the svn history still exists.
When an engineer made a change is of no consequence to me. When it got merged into the main branch does matter a whole lot if you're doing trunk-based development.
A badly done merge can indeed ruin code. But you'll always have the versions that went into the merge, and the merge itself. Your history has all of the information to recreate exactly what happened, find what changed, and then figure out how to fix it.
A badly done rebase not only ruins your work, it also removes from the branch any record of your work having been done. Unless you can find the right stray old commit which is not yet cleaned up, there is no choice but to start doing it again from scratch.
It's for humans. You can more easily cycle to a specific point. I find linear history easier to comprehend. But it's not like a game ender. People will do whatever they will.
I find it easier to run git binary search with it like this too.
"Clean history" is not a principal reason for VCS. "Full history so you don't accidentally lose something and can revert to any point" is the principal reason. When "clean history" conflicts with "full history", the choice should always defer to the latter. Rebase clearly breaks the full history principle.
I love rebase (I'm a tip-of-master-only person, no merges ever, squash all your commits with `rebase -i` before pushing and write one good commit message for the group). But there's one really, really irritating thing about them:
You should not be able to use `--amend` during a rebase.
For me editing all my changes onto the commit I'm working on with `git commit -a --amend` (or as I've aliased it, `gcaa`) is automatic; I do it 500 times a day, just to save my work. But I can't count how many times I've been in the middle of squashing commits and accidentally typed `gcaa` and amended someone else's commit after fixing a merge conflict, and it's super annoying to unwind (if you realize after typing `rebase --continue`) so usually I end up just giving up and starting over. I really wish amending to a commit that wasn't one of the ones you're rebasing was just totally disabled.
I guess there are some other small complaints, like the annoying reversing of `--ours` and `--theirs` from what makes sense (yes, it makes sense if you have the internal model of rebase instead of the intuitive one, but that's stupid), rebase's tendency to pick the wrong parent commit if you've accidentally amended someone else's commit (and therefore lag a while and then produce a rebase log of 1000 commits or something), and the utter tedium of editing the rebase log to replace every instance of "pick" with "s" for squash except the first, since almost 100% of the time what I want to do is squash everything (and use the last commit message, not the first, and definitely not all of them munged together which is the default).
I would love a separate command or a flag, like "git rebase --tip" that does all of this automatically for my otherwise extremely elegant workflow (and I'm gonna be really bummed if it turns out it exists and I didn't know about it for the last 5 years...).
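I'm not aware of a built-in flag that does exactly this, but the "squash everything, keep the last message" part can be approximated without touching a todo list. A rough sketch, assuming the branch is called `topic` and forked from `main` (both names invented here), and that you only want to collapse the commits, not replay them onto a newer `main`:

```shell
set -eu
# Throwaway repo in a temp dir; the identity below is a placeholder.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "initial"
git checkout -q -b topic
echo draft > notes.txt
git add notes.txt
git commit -q -m "wip"
echo better > notes.txt
git commit -q -a -m "wip 2"
echo final > notes.txt
git commit -q -a -m "Add notes file"   # the last commit has the good message

base=$(git merge-base main HEAD)
# Move the branch back to the fork point but keep index and working tree.
# git records the old tip in ORIG_HEAD.
git reset -q --soft "$base"
# One commit with all the changes, reusing the old tip's message and author.
git commit -q -C ORIG_HEAD

count=$(git rev-list --count main..HEAD)
subject=$(git log -1 --format=%s)
content=$(cat notes.txt)
printf '%s commit on topic: "%s"\n' "$count" "$subject"
```

Wrapping those two commands in an alias would get close to the wished-for one-liner; picking up new commits from `main` would still need a regular `git rebase main` afterwards.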
Random thought: given you already have the gcaa alias, perhaps you could include a check that .git/REBASE_HEAD doesn't exist in that?
Probably easiest as a little shell function like
    gcaa() {
        local git_dir
        # Bail out if we are not inside a git repository.
        if ! git_dir=$(git rev-parse --git-dir); then
            return 1
        # REBASE_HEAD exists while a rebase is stopped on a commit.
        elif test -f "$git_dir/REBASE_HEAD"; then
            printf 'Rebase in progress: commit --amend is disabled\n' >&2
            return 1
        fi
        git commit -a --amend "$@"
    }
rather than an alias?
[Edit] I forgot about rev-parse --verify, which simplifies this further:
    gcaa() {
        # REBASE_HEAD only resolves while a rebase is stopped on a commit.
        if git rev-parse --verify REBASE_HEAD >/dev/null 2>&1; then
            printf 'Rebase in progress: commit --amend is disabled\n' >&2
            return 1
        fi
        git commit -a --amend "$@"
    }
This also leaves you still able to use commit --amend long-hand if (for example) you want to edit one of your own commits during rebase -i.
That's a great idea. Git problems are sorta in the category of problems I've avoided trying to solve because, can't solve everything, and I've been hacking around it successfully instead.
> accidentally typed gcaa and amended someone else's commit after fixing a merge conflict
You could try reverting the first commit on the HEAD once you finish the rebase. This is of course assuming your branch and the last commit don't touch the same files.
There's a false dichotomy nobody addresses here, which is the notion that there needs to be such a thing as "the" history for you to get the benefits of a clean history.
If all you really want is a linear history, then just do merges, and make sure the "first parent" is the main branch (which you can enforce with tooling). Now you can just traverse solely the (linear!) sequence of first parents, which is exactly the same view squashing would have given you, except without the information loss.
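A small sketch of that first-parent view, using two hypothetical topic branches (`login` and `search`) merged into `main` with `--no-ff`. The full graph has seven commits, while the first-parent walk shows three in a straight line.

```shell
set -eu
# Throwaway repo in a temp dir; the identity below is a placeholder.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "initial"
for topic in login search; do
    git checkout -q -b "$topic" main
    git commit -q --allow-empty -m "$topic: step 1"
    git commit -q --allow-empty -m "$topic: step 2"
    git checkout -q main
    # Merging while on main keeps main as the first parent.
    git merge -q --no-ff -m "feat: $topic" "$topic"
done

# Follow only first parents: one entry per merged feature.
linear=$(git log --first-parent --format=%s main)
total=$(git rev-list --count main)
printf 'first-parent view:\n%s\n(%s commits in the full graph)\n' "$linear" "$total"
```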
If for some reason you can't stand the idea of something branching off your main branch at all, then set up a separate job that automatically squashes everything onto a branch that only it can write to (or branch from). Now you have a truly linear history with nothing branching off it, exactly as you would've had with squashing. And you can always reproduce it on demand.
That way you avoid the information loss, and can always do archeology on the full evolution graph if needed.
If I were to write a blog post on this I’d make a few do’s and don’ts (why make a blog post when you can blog in HN comments?)
Don’t merge the base branch into a feature branch. Rebase to “update”.
Do use rerere and the curse of fixing the same conflict over and over is (almost) gone.
Don’t rebase (or force push for other reasons) a shared branch. Rule of thumb here is you can probably rewrite history if you work with _one_ coworker in a branch but any more than that and you’re more likely than not to upset someone.
Do `git rebase -i HEAD~N` to reorder/reword/squash into easily reviewable sequences of commits.
Don’t force push after review, until the review is complete. This keeps the history of the review process but you can later merge the fixups with the commits they logically belong in right before merging.
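One way to do that last step, sketched with invented file and commit names: record review fixes as `--fixup` commits (visible to the reviewer as separate commits), then fold them in with `--autosquash` once the review is done. Setting `GIT_SEQUENCE_EDITOR=:` just accepts the generated todo list without opening an editor, which is only there to keep the demo non-interactive.

```shell
set -eu
# Throwaway repo in a temp dir; the identity below is a placeholder.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "initial"
git checkout -q -b feature
echo "parser v1" > parser.txt
git add parser.txt
git commit -q -m "add parser"
echo "docs" > README.txt
git add README.txt
git commit -q -m "add docs"

# Review feedback on "add parser" (HEAD~1): commit it as a fixup, no force push.
echo "parser v2" > parser.txt
git commit -q -a --fixup HEAD~1
before=$(git rev-list --count main..HEAD)

# Review complete: fold each fixup into the commit it belongs to.
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash main
after=$(git rev-list --count main..HEAD)
fixed=$(git show HEAD~1:parser.txt)   # "add parser" now contains the fix

printf 'commits before: %s, after: %s\n' "$before" "$after"
```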
Do use Merge, Squash and “Rebase+FF” as appropriate for merging PR. There is no best solution for every scenario so prescribing “always merge” or “never merge” or similar isn’t helpful. A good rule of thumb though is that IF a branch has merged from the parent branch to update (which I suggested was a “don’t”) then avoid merging it back. A branch that was updated that way is better to e.g squash when merging back.
> Don’t force push after review, until the review is complete. This keeps the history of the review process but you can later merge the fixups with the commits they logically belong in right before merging.
I've had to talk to soooo many developers about this. I want to see what changed since my last review, not restart my review.
When I force push to a branch on GitLab with an open merge request, GitLab retains the previous set of commits, and provides an interface within the merge request to compare the current set of commits to previous sets. I love this feature.
This is usually caused by merging an upstream branch (e.g. develop) into your feature branch and then later trying to rebase it.
Effectively the commits you've merged in from develop undo the changes you've made in your feature branch. You fix them but the foreign commits undo the changes again.
The solution is actually pretty easy. Use git rebase --interactive to remove any commits from the rebase that aren't directly part of the feature work.
You may still have an odd merge conflict to fix but you'll only have to do it the once and everything should go smoothly.
I would also recommend never using the same commit message twice. When you have a list of 10 commits all called "Wip" it's hard to tell which are obviously duplicates that can be deleted.
git-rebase is stupid because somebody doesn't know how to use it?
I use it all the time and I really like how I can make garbage commits (wip, test) and then squash them into atomic commits which are easy to review and later on easy to bisect when inevitably mistakes happen. Sure I've fucked up too when I was learning how to use it and those were some painful mistakes but only through using it and making those mistakes have I learned to use the tool to great advantage (clean history).
> git-rebase is stupid because somebody doesn't know how to use it?
The whole purpose of source control is to reliably track code changes so you don't lose anything and can revert to any point or recover from bad merges. Since rebase permits you to violate this core purpose and literally lose the entire history of code changes, then yes, it is stupid.
The trouble with git rebase is that it can create havoc by those who don't understand what it is doing conceptually, and (probably more importantly) how to recover when things go wrong.
When I hear people griping about rebase, I assume that nobody took the time to teach them how to use reflog first. Once I had an understanding of reflog, I could mess up all I wanted (without pushing) and recover. In that environment, rebase can become a very useful tool. Without being able to recover, rebase becomes a tool of confusing irreversible destruction.
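For the "it lost all my code" crowd, a minimal recovery sketch. Everything here is a made-up scenario: the rebase deliberately drops the only commit on a hypothetical `feature` branch (the `sed` one-liner rewrites `pick` to `drop` in the todo list), and the branch's reflog is then used to get it back. This relies on the reflog being enabled, which is the default for ordinary non-bare repositories.

```shell
set -eu
# Throwaway repo in a temp dir; the identity below is a placeholder.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "initial"
git checkout -q -b feature
echo "hours of work" > work.txt
git add work.txt
git commit -q -m "real work"
good_tip=$(git rev-parse HEAD)

# Botched interactive rebase: every "pick" becomes "drop".
GIT_SEQUENCE_EDITOR="sed -i.bak -e 's/^pick /drop /'" git rebase -q -i main
after_rebase=$(git rev-list --count main..HEAD)   # the commit is gone from the branch

# The branch reflog still remembers where feature pointed before the rebase.
lost=$(git rev-parse 'feature@{1}')
git reset -q --hard "$lost"

recovered=$(git rev-parse HEAD)
printf 'commits after bad rebase: %s, recovered tip matches: %s\n' \
    "$after_rebase" "$([ "$recovered" = "$good_tip" ] && echo yes || echo no)"
```

In real life you would read `git reflog` (or `git reflog show feature`) first and pick the entry by eye rather than assuming it is `@{1}`.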
> git-rebase is stupid because somebody doesn't know how to use it?
No, it's stupid because it's really common for people to fuck it up, and because the purported benefits (clean history) are not something which matters.
my case for good-looking history is pride of ownership. Are you proud of this "thing that makes money and ultimately pays your paycheck" or do you leave it polluted and full of crumbs and detritus?
I would have made this part of a root-level comment but I doubt anyone would read it, but: I think what gets lost in all these git debates is what language/context are we talking about? a shop that churns out javascript and releases to prod every 8 hours is very different than a C++ shop that writes safety-critical software. Their git needs are very different, and having an "I make my bed every day at 5:30am before I go for a 5mi run and come back and drink my juice and eat avocado toast" git regimen may be appropriate for some codebases but not for others, where an "I woke up hungover at 10am with a partner whose name I cannot remember, in a bed that is not mine" regimen is good enough. I think countless human-brain cycles are lost to bickering between these 2 camps.
> The golden rule of rebasing
> Once you understand what rebasing is, the most important thing to learn is when not to do it. The golden rule of git rebase is to never use it on public branches.
https://www.atlassian.com/git/tutorials/merging-vs-rebasing#...
For me, even though rebasing comes with some trappings, I still greatly prefer it to the alternative, which is to have merge commits cluttering up the commit history.
> Don't rewrite history on shared branches without proper communication.
I don't teach "never", I don't teach that `main` is special, I don't teach that force pushing is forbidden, because I don't believe in those things.
I highly prefer a rebase-heavy workflow. In addition to not "cluttering" the history, it's an invaluable tool to keep commits focused on "the right level" of atomic changes.
Git history should tell a simple, understandable story of each change. For example: 1) refactor existing code, 2) add feature. Or 1) add missing tests, 2) refactor existing code, 3) add feature.
But since you're working on the fly with imperfect knowledge, it doesn't happen in such neat steps. Refactorings and behavior changes end up interleaved in your raw git history, so you need to do a little bit of cleanup by hand in order to present a simple story in the commit log.
Of course if you have developers that don't do that and instead merge dozens of commits that just say wip, wip, wip, lol, fml, wip, wip, lol, yolo and you can't fire them or get them to change, then squash merges ftw.
I'm a big proponent of rebase and squash if it helps to make a commit more coherent, but we use squash merges by default in the current project I'm working on, and I die a little bit each time I try to understand what changes were related to a line when tracking down a bug.
I hate that branch squashing has become something of a de facto standard. I actually do rewrite my history and often add context to my commits. `git blame` can be an incredibly useful tool to get context about a given small change. Getting a massive diff for a whole feature is much less so, especially since you can just look at the diff of the merge commit.
They do but they have their own issues, e.g. having to delete local branches using git branch -D instead of git branch -d, and so losing the protection against deleting unmerged work.
I still agree that on balance annoyances like that might still be worth putting up with for larger teams with mixed skill levels.
That leaves you prone to losing work if you have a false start that you need to back out of. I prefer to commit early and often on my private branches, then before submitting a pull request I clean up the history to where there are a few good commits that form useful, standalone chunks (ideally the test suite fully passes on each commit).
The purpose of history is to remember. Rewriting history, whether in git or in life, is bad, even setting aside the "don't use it on public repos" caveat. Such advice is similar to saying, only point the shotgun away from you when firing. If you have to remember such a rule, it's best to avoid it.
I've heard this many times before, but haven't been able to figure out why this is a problem. In your workflow is it a problem to have a cluttered commit history? If so, could you explain how?
GitHub recently added a feature that prompts people to update their branches via merge. It's frustrating because every PR now has dozens of merge commits polluting the history.
What I want is for GitHub to track changes between sets of commits in a PR so that you can do most of the review with merges and "address review comments" commits, and then rebase into well organized, logical commits and review that those have the same diff as the messy history after a force push.
What matters is that you end up with working systems. That a lot of change happened is just, well, what happened. It doesn't need to be prettied up and made to look like your development occurred in a clockwork march of cleanliness. It literally does not matter unless you spend a lot of time doing git-bisect.
Let it go. Accept that coding is not a smooth, robotic, endeavour, where everything is always tidy. And that's just fine.
I've been working on dozens of projects since, and probably did thousands of commits. Some of the teams of those projects included dozens of developers working concurrently on the same codebases. We always merged the upstream branches into our development branches and never did any rebases.
I have NEVER ended up in a situation where I thought rebases would have been better. The git tools and IDE integrations of our current age allow me to find any information I need from the history without pain.
Instead of letting it go, maybe we should have more discipline and organization in our lives and not less.
The pro-revisionists (squash, rebase) say they do what they do so the history looks clean (no intermediate commits breaking stuff, a "straight line" graph, etc)
The anti-revisionists say they do what they do so the history looks clean (can see the actual development, can safely diff different commits to see what changed in between, see the log in chronological order, etc).
> Instead of letting it go, maybe we should have more discipline and organization in our lives and not less.
Again, both sides could argue that they're the ones with more discipline.
> The point is to make it possible to debug later, via bisect, or show, or even just a diff.
This sounds anti-revisionist.
> The point is to make the workspace clean for the next guy.
This is one of the most common pro-revisionist arguments.
And git blame. And git checkout to a past state. It "doesn't matter" only if ease of understanding your project history doesn't matter.
That being said, after reading this stuff, I may start using it on my local branches to clean up multiple commits into one tidy one, but that's about it.
Every time I try to review a PR and the bookmark resets because they decided to force push I curse the Git maintainers that don't have the backbone to just get rid of rebase.
The amount of time that has been saved in my life by someone leaving an explanation in their commit (for some weird edge case or context I’d have no way of gleaning because they’ve since left the company) is SO much more than the extra time I’ve put in to make sure the history has this extra info in it.
If I had a bad day and introduced something stupid, I want a bisect to point me at the code I wrote on that bad day. If you squash liberally, perhaps because you want each commit to correspond with a release-note, you're going to lose that debugging granularity.
But it's simply intangible. My instinct tells me that it's helpful and that's okay. I don't owe anyone a justification for how I organize things, and there's nothing controversial about this. (Or maybe I could even come up with a logical example of a benefit, but that's a trap I'm not going to fall into) And a lot of people agree, and they know what I mean, so it's not merely an individual preference. If I have to work with someone who has strong preference against it I'll worry at that point about negotiating.
Very genuinely: I do not care at all whether you clean your room starting from left and continuing to right. Or, starting from doors and continuing toward window. Or whether you clean it in a random order. I also do not care about whether you clean every Friday or whenever you feel like. That is the equivalent of git history. Because this excessive care about git history is just that - insisting that room is cleaned from left to right as if any other order was an issue.
The reason why it is hard to defend the tangible benefits of this or that git history strategy is that there are very few benefits.
I prefer not to have squash commits in our team for this reason. It makes master look good, but usually nobody ever looks at the master commit history first, they look at the merged pull requests. However, everybody must look at the commits you made in a pull request. If you have squash commits, you are encouraged to have messy commit history in your pull requests, leading to meaningless commit messages and even large commits (causing other problems...).
IMO the only advantage of squashing is that it makes it easy to roll forward when you accidentally deploy something that causes problems.
* use filtering commands like "git log -S"
* press the "annotate" button in my IDE and can see which commit introduced each line
* run "git bisect"
* use "tig" to drill down through the history of a file (shortcut "," is "move to commit preceding current line's blame commit")
...every step of the way, I get a meaningful description of why a change was made and what other diffs were necessary to achieve that change. And not just "fix", "bug", "PR comments".
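To make two of those tools concrete, here is a small sketch in a throwaway repository: `git log -S` (the pickaxe) and `git bisect run` both land on the same small commit. The file name, the `BUG` marker and the commit messages are invented for the example.

```shell
# Throwaway repository; file name and commit messages are illustrative.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Six small commits; the fourth one sneaks in a line containing BUG.
for n in 1 2 3 4 5 6; do
  echo "line $n" >> app.txt
  if [ "$n" = 4 ]; then echo "BUG" >> app.txt; fi
  git add app.txt
  git commit -q -m "change $n"
done

# Pickaxe: commits that changed the number of occurrences of "BUG".
culprit=$(git log --format=%H -S BUG -- app.txt)

# Bisect between the root commit (good) and HEAD (bad).
# The test command exits 0 when BUG is absent, 1 when it is present.
first=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first" >/dev/null
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null
found=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

git log -1 --oneline "$found"
```

Because every commit here is small and has its own message, the commit that both tools point at is immediately readable. With one big squashed commit, they could only point at the whole lump.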
* `git blame --first-parent`
* `git bisect start --first-parent`
* At least one "tig-like" with a --first-parent first UI: https://github.com/kalkin/git-log-viewer
In PyCharm, I can see which commit introduced each line, regardless of branching. Same with drilling down through a file's history. Is this an IDE limitation you're seeing?
> every step of the way, I get a meaningful description of why
Isn't this more about commit messages, than anything else?
Makes reviewing a set of changes prior to a merge much easier. It's nice if there's a 1:1 correlation between a commit message and the actual patch contents.
I'm sure you've dealt with the case of reviewing a colleague's changes with a commit message like "Enable logging in foobar module" and the patch is actually enabling foobar logging and a bunch of other stuff.
This makes bisecting your git history to identify and fix bugs much more difficult.
If the git history is clean, you can just read the commit messages and implicitly trust the developer if clean git hygiene is in place (as opposed to actually needing to read the whole diff on a per-commit basis to find out what _actually_ happened at commit XYZ, despite its message).
Every time you do a git rebase, you are literally asking your source control system to lie about history. If you mess up, and you eventually will, you're then forced to manually figure out what the history really was despite being lied to. If you mess it up, well, good luck.
I used to work at a company where someone (we never figured out who) in another group would rebase every few weeks. We didn't find out about it until their stuff was pushed then released. The result was that features which we'd written, QAed, and released to production would simply disappear a few weeks later. With no history suggesting that it ever existed.
Have you ever been pulled off of a project to go fix a project from a month ago which has disappeared from source control? You don't know what happened, you no longer have context, you've just got complaints because your stuff no longer works.
Is your desire for a "clean history" worth potentially creating THAT disaster for other developers on your team???
I recall when I first switched to git at work and the team was insisting on a "linear history", I was bemused: Could these developers really not handle a merge graph? It was bizarre how something straightforward in other VCS is suddenly "messy" amongst git folks.
I'm a fan of rebase myself, but understand the point made above. For me, the biggest pro of a clean history is when doing `git blame`. If the history is clean and the commits are good, it might solve my issue. On the other hand, if the commit in question is a huge mess of unrelated things it doesn't help me at all. I also find it way easier to review a PR with a clean, well-described history.
But because GitHub and other tools' rendering of history just flattens merge commits into spaghetti, we’re stuck with squash merge. Thanks GitHub.
Also sometimes you decide you want to backport some change to other releases, and if commits are in a good state, it is much easier to do this.
Whether this is feasible at all depends largely on the care developers put in structuring their commits.
A free form textual interface to document everything about why you made the changes you just made? Why not maximize the value of this resource!
I’m not necessarily on Team Rebase, but isn't this just as likely with merging gone wrong?
A badly done merge can indeed ruin code. But you'll always have the versions that went into the merge, and the merge itself. Your history has all of the information to recreate exactly what happened, find what changed, and then figure out how to fix it.
A badly done rebase not only ruins your work, it also removes from the branch any record of your work having been done. Unless you can find the right stray old commit which is not yet cleaned up, there is no choice but to start doing it again from scratch.
I find it easier to run git binary search with it like this too.
This being a principal reason for VCS, I very much understand the motivation.
You should not be able to use `--amend` during a rebase.
For me editing all my changes onto the commit I'm working on with `git commit -a --amend` (or as I've aliased it, `gcaa`) is automatic; I do it 500 times a day, just to save my work. But I can't count how many times I've been in the middle of squashing commits and accidentally typed `gcaa` and amended someone else's commit after fixing a merge conflict, and it's super annoying to unwind (if you realize after typing `rebase --continue`) so usually I end up just giving up and starting over. I really wish amending to a commit that wasn't one of the ones you're rebasing was just totally disabled.
I guess there are some other small complaints, like the annoying reversing of `--ours` and `--theirs` from what makes sense (yes, it makes sense if you have the internal model of rebase instead of the intuitive one, but that's stupid), rebase's tendency to pick the wrong parent commit if you've accidentally amended someone else's commit (and therefore lag a while and then produce a rebase log of 1000 commits or something), and the utter tedium of editing the rebase log to replace every instance of "pick" with "s" for squash except the first, since almost 100% of the time what I want to do is squash everything (and use the last commit message, not the first, and definitely not all of them munged together which is the default).
I would love a separate command or a flag, like "git rebase --tip" that does all of this automatically for my otherwise extremely elegant workflow (and I'm gonna be really bummed if it turns out it exists and I didn't know about it for the last 5 years...).
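One way to get roughly that effect without editing a todo list is a soft reset to the fork point followed by a commit that reuses the tip's message. This is a sketch, not a rebase flag: `squash_onto` is a hypothetical helper name, and `main` is assumed to be the branch the work forked from.

```shell
# Hypothetical helper: squash every commit since the fork point with <base>
# into one commit that reuses the message (and author info) of the current tip.
squash_onto() {
  squash_base=${1:-main}
  squash_tip=$(git rev-parse --verify HEAD) || return 1
  squash_fork=$(git merge-base "$squash_base" HEAD) || return 1
  # --soft moves the branch but keeps index and working tree as they were.
  git reset -q --soft "$squash_fork" &&
    git commit -q -C "$squash_tip"
}

# Demo in a throwaway repository; names and messages are illustrative.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > app.txt
git add app.txt
git commit -q -m "base"

git checkout -q -b feature
for msg in "wip" "lol" "Add feature X"; do
  echo "$msg" >> feature.txt
  git add feature.txt
  git commit -q -m "$msg"
done

old_tree=$(git rev-parse 'HEAD^{tree}')
squash_onto main
git log --oneline main..HEAD
```

The file contents are untouched by this; only the commits between the fork point and the tip are collapsed. It does not replay anything onto a newer `main`, so it is a squash only, not a rebase.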
Probably easiest as a little shell function like
rather than an alias?
[Edit] I forgot about rev-parse --verify, which simplifies this further:
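A minimal sketch of such a guard, reusing the `gcaa` name from the comment above. It assumes the ref being checked with `rev-parse --verify` is `REBASE_HEAD`, which Git sets while a rebase is stopped on a commit; that choice is a guess at the approach rather than a known original.

```shell
# Hypothetical replacement for a `gcaa` alias: amend everything onto HEAD,
# but refuse while a rebase is stopped on a commit.
gcaa() {
  # REBASE_HEAD only resolves while a rebase is paused (conflict or `edit`).
  if git rev-parse --verify --quiet REBASE_HEAD >/dev/null; then
    echo "gcaa: rebase in progress, not amending" >&2
    return 1
  fi
  git commit -a --amend "$@"
}
```

Extra arguments are passed through, so `gcaa --no-edit` keeps the existing message without opening an editor.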
This also leaves you still able to use commit --amend long-hand if (for example) you want to edit one of your own commits during rebase -i.
You could try reverting the first commit on the HEAD once you finish the rebase. This is of course assuming your branch and the last commit don't touch the same files.
Then on the final rebase the commits are automatically ordered with s and f as appropriate.
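That sounds like the `git commit --fixup` plus `git rebase -i --autosquash` workflow, so here is a sketch under that assumption. The repository, file names and messages are invented, and `GIT_SEQUENCE_EDITOR=true` is only there to accept the generated todo list unchanged so the demo runs without opening an editor.

```shell
# Throwaway repository; names and messages are illustrative.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > README
git add README
git commit -q -m "base"

git checkout -q -b feature
echo "parse v1" > parser.txt
git add parser.txt
git commit -q -m "add parser"
target=$(git rev-parse HEAD)

echo docs > docs.txt
git add docs.txt
git commit -q -m "add docs"

# A later correction that logically belongs to "add parser".
echo "parse v2" > parser.txt
git add parser.txt
git commit -q --fixup="$target"

# --autosquash moves the fixup next to its target and marks it `fixup`.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

git log --oneline main..HEAD
```

After the rebase the branch has two commits again, and the correction lives inside "add parser" instead of trailing behind as a separate commit.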
Although I do a fair bit of amending, too.
There's a false dichotomy nobody addresses here, which is the notion that there needs to be such a thing as "the" history for you to get the benefits of a clean history.
If all you really want is a linear history, then just do merges, and make sure the "first parent" is the main branch (which you can enforce with tooling). Now you can just traverse solely the (linear!) sequence of first parents, which is exactly the same view squashing would have given you, except without the information loss.
If for some reason you can't stand the idea of something branching off your main branch at all, then set up a separate job that automatically squashes everything onto a branch that only it can write to (or branch from). Now you have a truly linear history with nothing branching off it, exactly as you would've had with squashing. And you can always reproduce it on demand.
That way you avoid the information loss, and can always do archeology on the full evolution graph if needed.
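The first-parent view described above can be sketched in a throwaway repository. Branch names, file names and messages are made up, and the merge is forced with `--no-ff` so that a merge commit exists even though a fast-forward would have been possible.

```shell
# Throwaway repository; names and messages are illustrative.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > app.txt
git add app.txt
git commit -q -m "base"

# Messy feature branch with three small commits.
git checkout -q -b feature
for step in one two three; do
  echo "$step" >> feature.txt
  git add feature.txt
  git commit -q -m "wip: $step"
done

# Merge with main as the first parent of the merge commit.
git checkout -q main
git merge -q --no-ff -m "Merge feature: add feature.txt" feature

full=$(git rev-list --count main)                   # every commit
linear=$(git rev-list --count --first-parent main)  # main's own line only

# Linear view: just "base" and the merge commit.
git log --oneline --first-parent main
```

The three `wip` commits are still in the repository for archeology; they are simply not shown when walking first parents.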
Don’t merge the base branch into a feature branch. Rebase to “update”.
Do use rerere and the curse of fixing the same conflict over and over is (almost) gone.
Don’t rebase (or force push for other reasons) a shared branch. Rule of thumb here is you can probably rewrite history if you work with _one_ coworker in a branch but any more than that and you’re more likely than not to upset someone.
Do rebase -i HEAD~N to reorder/reword/squash into easily reviewable sequences of commits.
Don’t force push after review, until the review is complete. This keeps the history of the review process but you can later merge the fixups with the commits they logically belong in right before merging.
Do use Merge, Squash and “Rebase+FF” as appropriate for merging PRs. There is no best solution for every scenario so prescribing “always merge” or “never merge” or similar isn’t helpful. A good rule of thumb though is that IF a branch has merged from the parent branch to update (which I suggested was a “don’t”) then avoid merging it back. A branch that was updated that way is better to e.g. squash when merging back.
I've had to talk to soooo many developers about this. I want to see what changed since my last review, not restart my review.
This is usually caused by merging an upstream branch (e.g. develop) into your feature branch and then later trying to rebase it.
Effectively the commits you've merged in from develop undo the changes you've made in your feature branch. You fix them but the foreign commits undo the changes again.
The solution is actually pretty easy. Use git rebase --interactive to remove any commits from the rebase that aren't directly part of the feature work.
You may still have an odd merge conflict to fix but you'll only have to do it the once and everything should go smoothly.
I would also recommend never using the same commit message twice. When you have a list of 10 commits all called "Wip" it's hard to tell which are obviously duplicates that can be deleted.
git rerere is the even easier solution.
I use it all the time and I really like how I can make garbage commits (wip, test) and then squash them into atomic commits which are easy to review and later on easy to bisect when inevitably mistakes happen. Sure I've fucked up too when I was learning how to use it and those were some painful mistakes but only through using it and making those mistakes have I learned to use the tool to great advantage (clean history).