The Perfect Commit - Readit News

I am a firm believer that a commit should rarely be more than a couple minutes of work. When I go to git-blame, I want the commit message to be the why of this specific change. When I git blame and find a commit that's thousands of lines I wonder why we're even using git at that point and not just saving zips.

The proponents of PR squashes irk me. You're losing all the value of git. In the future when I'm hunting down a problem I almost never care about what feature or fix this was a part of - e.g. the PR - I care entirely about the context of the developer's mind when he committed that specific line of code.

In the rare case you do actually care what PR a commit is from, GitHub's search will tell you.[1] There's no need to put that information in a commit at all.

I was recently hunting down the reason for a single ! casting a string to a bool. The blame on the file led to a large commit with details about the project, but nothing about why that specific line of code was changed.

Was it an accident? Did it serve an actual purpose? I might never know.

Commits are cheap. Blames are easy. There's zero need to try to keep your tree "looking nice".

1: https://github.com/donatj/CsvToMarkdownTable/search?q=f4c167...

ilammy · 3 years ago

> I care entirely about the context of the developer's mind when he committed that specific line of code.

In squash workflow, the PR is the context and the unit of change. It’s shoehorned onto git only because of GitHub.

In some way it’s a self-fulfilling prophecy: if you disregard individual patches then individual patches will be disregarded. It could also be attributed to git being hard to use, with all these “commits” and “patches”. So some developers treat Ctrl-S as the “commit” hotkey, putting whatever state of the codebase into commits and stringing them together. Then in the end they just ship it for review as is, since it’s the path of least resistance, never even expecting anyone to review individual patches, because as far as they are concerned they are not sending in patches, they are sending the PR. Then maintainers are faced with these PRs, and they have a choice: either enforce the quality of “commits”, or review the PR as a whole and disregard the commits. The choice often falls on the latter, because the former does not provide any practical benefits for the process (rather than theoretical “but we’ll have individual commits when 5 years later someone blames them”), while the latter does provide practical benefits of not alienating developers with subpar commit discipline.

u801e · 3 years ago

> In squash workflow, the PR is the context and the unit of change. It’s shoehorned onto git only because of GitHub.

This could be useful if a PR is essentially changing only one thing like a commit should, but since PRs usually cover a feature or a fix for a ticket, these frequently involve doing multiple things to implement the feature or fix the bug.

> the [option of enforcing the quality of commits in a PR] does not provide any practical benefits for the process (rather than theoretical “but we’ll have individual commits when 5 years later someone blames them”),

Your parenthetical clause is not theoretical. It's basically a thing I have to deal with on a daily basis when I'm fixing bugs or removing/adding features in multiple code bases that were written by people years ago who no longer work there. I can't really make use of git blame because the commit messages are non-informative, the diffs are just work-in-progress saves, and the linked PRs just have "looks good to me" comments and no description.

But for code bases where commit quality was enforced, git blame actually becomes useful and allows me to see what was done and why it was done at the time it was written.

js2 · 3 years ago

I also work this way and advocate against squash merging or putting all the context into the PR.

A big reason for me is that I've built up a lot of context in my head when writing code. When I make the commit, that context is fresh in my mind and flows easily into the commit message.

But say instead I make a half-dozen commits with just single sentence descriptions. At the end of my work day I push the commits to a new branch and open a PR, and start writing the PR description. What was I thinking 5 hours ago when I wrote that first commit? I've forgotten. I've lost context.

So I always try to commit and document as I go. At that point I don't spend a lot of time getting the commit messages perfect, but I make sure they at least have all my thoughts. Before I push up the commits, I'll use "git rebase -i" and "reword" to edit the messages (typos, reformatting the text, maybe adding additional thoughts as I look at the diff). Then I open the PR and at that point, maybe I'll add additional context but usually my PRs are just an abstract and I ask reviewers to look at the commits themselves. It drives me crazy how hostile GitHub is to reviewing individual commits.

Many teams end up with squash-merges though because frankly: (a) developers don't know how to use git beyond committing and pushing, so (b) PR feedback is addressed by adding more commits and pushing them up, as opposed to amending the original commits; combined with (c) GitHub only grudgingly accommodating an amend+force push workflow.

Frankly Gerrit is a better-designed review tool, but it's not widely used.

u801e · 3 years ago

> So I always try to commit and document as I go. At that point I don't spend a lot of time getting the commit messages perfect, but I make sure they at least have all my thoughts. Before I push up the commits, I'll use "git rebase -i" and "reword" to edit the messages (typos, reformatting the text, maybe adding additional thoughts as I look at the diff).

I actually don't bother with committing anything to version control until I'm done with the feature or fix. I then will create a diff against the base branch and stage each individual part and create a commit from it. I then push up those commits to the remote.
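A minimal sketch of that "finish first, then carve into commits" workflow, under stated assumptions: the scratch repository, file names, and commit messages are all hypothetical, and the two parts happen to live in separate files so the script can run unattended. When the parts are hunks within one file, `git add -p` is the interactive equivalent of the `git add <file>` steps here.

```shell
set -e
repo=$(mktemp -d)            # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'v1\n' > parser.txt
printf 'v1\n' > docs.txt
git add . && git commit -q -m "Initial state"

# Finish the whole piece of work first, without committing along the way.
printf 'v2\n' > parser.txt
printf 'v2\n' > docs.txt

# Review everything against the base, then stage and commit one part at a time.
git diff --stat
git add parser.txt
git commit -q -m "Handle empty input in parser"
git add docs.txt
git commit -q -m "Document empty-input behaviour"

git log --oneline
```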

> It drives me crazy how hostile GitHub is to reviewing individual commits.

I actually have an ingrained habit of middle-clicking on the commit sha1 and adding my comments on the commit itself. It doesn't really show up with any context in the main PR view though.

Lately, I've started running `git log -p --reverse origin/master..`, piping the output into vim, prefixing every line with '>' and typing my comments inline like I would with an email message (trimming the parts that I'm not responding to), and then pasting the entire thing in a single comment on GitHub.
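Roughly, the quoting step could be scripted like this. It is only a sketch: `sed` stands in for the vim editing, the scratch repository and the `review.txt` file name are hypothetical, and `origin/master` is faked with a local ref so the example runs without a real remote.

```shell
set -e
repo=$(mktemp -d)            # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'base\n' > app.txt
git add app.txt && git commit -q -m "Base commit"
# Stand-in for the remote base branch; no real remote is involved here.
git update-ref refs/remotes/origin/master HEAD

printf 'base\nfeature\n' > app.txt
git commit -q -a -m "Add feature line"

# Oldest-first patches for everything not yet on origin/master,
# with each line quoted the way an email reply would quote it.
git log -p --reverse origin/master.. | sed 's/^/> /' > review.txt
cat review.txt
```

Comments then go between the quoted lines, and the unquoted text is what stands out as the review.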

> Frankly Gerrit is a better-designed review tool, but it's not widely used.

I actually wish more people would use the email patch review process. At least there, reviewing individual commits is the default and the relation among multiple commits is preserved. I'm not sure how that's accomplished in Gerrit.

simonw · 3 years ago

This is why my workflow is so issue heavy: I use issue comments to constantly capture my context as I'm working, rather than using small commits.

I sometimes even run "git diff" and copy the result into an issue comment to record a potential implementation route that I'm not fully convinced of just yet.

seba_dos1 · 3 years ago

> In the rare case you do actually care what PR a commit is from, GitHub's search will tell you.[1] There's no need to put that information in a commit at all.

Even better - if you work with linear history (like most GitHub projects do I guess), you can always require merges to be fast-forwardable, but create a merge commit anyway. This way you retain all the individual commits and they're being grouped together by pull requests, so you can easily filter the details out with flags like `--first-parent` when you're not interested in them.
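As a sketch of that setup (the branch names and messages are made up, and the commits are empty only to keep the script short): `--no-ff` records a merge commit even though a fast-forward was possible, and `--first-parent` then hides the per-commit detail.

```shell
set -e
repo=$(mktemp -d)            # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b feature
git commit -q --allow-empty -m "Add parser"
git commit -q --allow-empty -m "Add parser tests"

git checkout -q main
# main has not moved, so this could fast-forward; --no-ff records a merge commit anyway.
git merge -q --no-ff -m "Merge hypothetical PR: parser" feature

# Full history: every individual commit, grouped under its merge.
git log --oneline --graph
# Per-PR view: only the commits made directly on main.
git log --oneline --first-parent
```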

Squashing PRs truly is absolutely useless.

GauntletWizard · 3 years ago

The tooling just isn't there, though, especially if you want to do something like signing. I want to have my whole merge/pull request history in one subtree, while having the 1st subtree maintain a clear linear history with MR/PR Numbers in each merge commit, and also have that one be signed by the committer. It's easy to represent. It's a little harder to get the forges to do it right.

howscrewedami · 3 years ago

> Squashing PRs truly is absolutely useless

That is quite an absolute statement. Some people just have a personal preference for having git trees arranged in a certain way. If squashing can help with that, then it serves a purpose.

majormajor · 3 years ago

Do you constantly "rewind"? E.g. getting rid of stuff like "try if this works" type commit messages, after you've finished doing whatever experimentation you need to do?

The idea that I can commit every few minutes with commit messages that will make sense to you months later strikes me as incredibly unlikely, short of "write the code once to figure out how to do it, then throw it away and write it again to have a good chain of commits." And it's not worth that much - especially because the bit that's relevant when debugging may not be the bit that was relevant to me when writing the code.

Non-obvious things like that string/bool thing? I use comments for that. Yes, comments can get outdated, blah blah blah, but commit messages can also be unhelpful. What if that cast wasn't what was on their mind when writing the message anyway?

I would argue that if your PR was a thousand lines it's too damn big anyway. Figure out how to do multiple PRs, your reviewers will thank you. But I want commits that are cohesive, compilable, functional pieces. Not just path-dependent detritus of the dev process.

imaltont · 3 years ago

> Do you constantly "rewind"? E.g. getting rid of stuff like "try if this works" type commit messages, after you've finished doing whatever experimentation you need to do?

Not constantly, but I will clean up the local/personal branch history with a rebase before creating a PR. Change some messages, maybe change up where some of the code is committed or even change the order if that makes more sense than the way it actually happened. It's a pretty quick and easy task to do when you get used to it.

seba_dos1 · 3 years ago

> Do you constantly "rewind"? E.g. getting rid of stuff like "try if this works" type commit messages, after you've finished doing whatever experimentation you need to do?

Yes. Being able to effortlessly do just that is the main value provided by git.

strictfp · 3 years ago

I've been following your school of thought as well as the one proposed by the author.

I would argue that your model is a lot freer and will lead to better code quality in the long run, since you're not feeling so restricted when working.

Meticulously crafting commits feels like one of those "good in theory" approaches that feels good but hurts you more than you realize. It's good for workflows where you move commits between lots of (release) branches, sure, but if you're doing trunk-based development I'm going to call it a bit formalistic.

lamontcg · 3 years ago

> I was recently hunting down the reason for a single ! casting a string to a bool. The blame on the file led to a large commit with details about the project, but nothing about why that specific line of code was changed.

> Was it an accident? Did it serve an actual purpose? I might never know.

You're never going to get that level of detail.

Particularly if it was an accident or done without really thinking about it--the original coder will never document that because they were literally not thinking about it.

New commit rule: always document what you weren't considering when you wrote the commit.

Izkata · 3 years ago

> > Was it an accident? Did it serve an actual purpose? I might never know.

> You're never going to get that level of detail.

I have! I once found a bug that was caused by a typo in a linting commit, something obviously unintentional. If it was squashed it would have taken a lot longer to figure that out.

I also recently found issues in a common python 2 and 3 library where the code was broken in py2to3 and first-run-of-black commits. More places where it was obvious how and why the code was wrong because the formatting was a separate commit from the feature.

gorgoiler · 3 years ago

You wouldn’t publish the chapters of your book with a complete edit history and all of the margin notes and email discussions you had with your editor.

At some point you have to draw a line under your creative process and present a finished piece of work that stands up on its own. That means one diff with a meaningful commit message.

It might seem like a reasonable point that you want the unsquashed commit history in order to see into the mind of the developer. In practice, anything of consequence mentioned in those commits should be in the squashed message. I don’t need to see your thought process as you get something wrong three times then spend another three commits fighting with the linter all before you present your final thesis. I just need to see what you’ve finally settled on.

(See also: changes to a code base that land and are then immediately changed again. Developers like this put up code too soon and too persuasively. Try to get them to slow down.)

PS: Most projects I work on are mature code bases with between tens and thousands of developers. Most of the changes are iterative, adding features on the long march to revenue. There’s an exception to this process — which the OP alludes to — where large new features land as blocks of hundreds or thousands of lines of code.

When that happens, if the author spent a long time in modelling hell trying to find the best way to lay out their code, then it’s going to have a horrible commit history. If there’s anything they can do to land it in stages then they should — including rebuilding their history with pretend commits as if they have one clear idea after another. That is hard to simulate, so it’s often easier to break the code up on the orthogonal axis to time: space. With version controlled code, the dimensions of space are modules, and if a large change is broken into clearly defined modules then that provides the same benefit as a series of changes broken into clearly defined ideas / diffs / patches / PRs.

trashburger · 3 years ago

> I don’t need to see your thought process as you get something wrong three times then spend another three commits fighting with the linter all before you present your final thesis. I just need to see what you’ve finally settled on.

I don't think parent is advocating for a commit to stay the same before a merge. You can simply clean up those WIP commits via rebase + fixups and put them in a coherent order and structure (i.e. make them atomic as the article mentions). This and rebase-merge workflows make git history a delight to work with.
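One way the rebase + fixups cleanup might look, as a sketch with hypothetical files and messages. `GIT_SEQUENCE_EDITOR=true` is only there so the script runs without opening an editor; interactively you would review the todo list that `--autosquash` prepares.

```shell
set -e
repo=$(mktemp -d)            # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'base\n' > app.txt
git add app.txt && git commit -q -m "Base commit"

printf 'parser\n' > parser.txt
git add parser.txt && git commit -q -m "Add parser"
target=$(git rev-parse HEAD)

printf 'docs\n' > docs.txt
git add docs.txt && git commit -q -m "Add docs"

# A later correction that logically belongs to "Add parser".
printf 'parser fixed\n' > parser.txt
git add parser.txt
git commit -q --fixup "$target"

# Fold the fixup into its target; the generated todo list is accepted unedited.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

git log --oneline
```

After the rebase the WIP correction no longer appears as its own commit; it is part of "Add parser".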

DeathArrow · 3 years ago

I see the purpose of code versioning in being able to save the code fast, being able to reverse code changes fast and doing code merges with ease.

If some philosophy gets in the way of someone's productivity I don't believe it is a good thing. I would let developers commit in every state they find easy, provided the code builds and the commit comment describes what is being committed.

I think it is the PR that we can place more demands on, not commits.

I, for example, dislike having to track any small change I have made and commit it. I like committing larger parts which contain a functionality.

imaltont · 3 years ago

> I, for example, dislike having to track any small change I have made and commit it. I like committing larger parts which contain a functionality.

This is pretty much the way suggested by the creator of git too. A commit should contain the necessary changes for one bit of functionality/bug fix. This imo makes the history pretty neat and tidy, while at the same time making it easy to search through with blame/bisect whenever you need to. Giant commits (multiple functionality/whole project/extension squashed into one) make both of those hard to use and in some cases pretty much useless outside of finding who did it and hoping they still work at the company and remember their state of mind when they made the change.
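To illustrate the bisect half of that, here is a small sketch on a throwaway repository. The six commits, the `status.txt` file, and the `grep` check are all hypothetical stand-ins for a real history and a real test command.

```shell
set -e
repo=$(mktemp -d)            # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'ok\n' > status.txt
git add status.txt && git commit -q -m "Commit 1"
good=$(git rev-parse HEAD)

for n in 2 3 4 5 6; do
  # Commit 4 is the hypothetical regression.
  if [ "$n" -eq 4 ]; then printf 'broken\n' > status.txt; fi
  printf '%s\n' "$n" > counter.txt
  git add . && git commit -q -m "Commit $n"
done

# Mark HEAD as bad and the first commit as good, then let git drive the search.
git bisect start HEAD "$good" > /dev/null
# The check exits 0 on good commits and non-zero on bad ones.
git bisect run grep -q '^ok$' status.txt

culprit=$(git rev-parse refs/bisect/bad)
git log -1 --format='%h %s' "$culprit"
git bisect reset > /dev/null 2>&1
```

With one small change per commit, the commit bisect lands on is already most of the diagnosis; with a giant squashed commit it only narrows things down to the whole feature.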

darekkay · 3 years ago

> Blames are easy. There's zero need to try to keep your tree "looking nice".

I don't care about what the tree looks like. But I do care about keeping the number of commits low _because_ it makes blames easier.

> In the future when I'm hunting down a problem I almost never care about what feature or fix this was a part of - e.g. the PR - I care entirely about the context of the developer's mind when he committed that specific line of code.

I've mostly had the opposite experience. Many developers don't have the knowledge or confidence to use "amend" etc. consciously, so it leads to a mass of useless "fix error", "forgot something", "add tests" and "implement PR reviews" commits. This makes blame more difficult. I also agree: sometimes a squashed commit message doesn't help me understand the line of code I'm blaming. But not having squashed the commit mostly wouldn't help me anyway. Instead, I try to avoid this problem beforehand, by making sure that the PR itself contains all the information (via PR description and/or code comments). Recently I've reviewed a "Fix scrollbar" PR. The "what" is clear. The "why" (= "it's broken") _sounds_ sufficient, but I was missing information like "how exactly is it broken" and "why does this change fix it". That's the kind of stuff I might blame in a year from now, and now it's part of the Git history.

b3morales · 3 years ago

> Many developers don't have the knowledge or confidence to use "amend" etc. consciously,

The secondary problem here is that GitHub (just to throw another complaint on the pile about GitHub) makes a dog's breakfast of tracking PR comments when things have changed. Amending and force pushing only seems to make it worse, so even though it should be the better workflow, it causes friction in a different place.

int0x2e · 3 years ago

In my experience, this only works if you have a small team of great engineers. Once you have 50+ people of various skill levels making contributions to the codebase, you'll start seeing your git history littered with variants of "fix", "fix 2", "fix broken CI test" and on and on...

For me, the PR is the context I can use when trying to find an issue. Sifting through many hundreds of commits per day is painful. Sifting through tens of PRs per day is not great, but is much more manageable...

seba_dos1 · 3 years ago

> you'll start seeing your git history littered with variants of "fix", "fix 2", "fix broken CI test" and on and on...

Should such commits even pass through the review in the first place?

MaulingMonkey · 3 years ago

> I am a firm believer that a commit should rarely be more than a couple minutes of work.

While I'm a proponent of small commits, this is overstating things IMO. I work on codebases that frequently take a couple of minutes to perform an incremental link and that put commits through dozens of CI hours of (useful!) integration testing, and I chase down C++ heisenbugs that take weeks to root-cause and end in a 1-line change with a 30-line commit message.

On the right codebase, I'm more than happy to throw a single file "fix whitespace" commit at CI without even locally compiling it, confident that CI will catch any problems, because it'll help untangle the diffs of future impending commits and make them easier to review. On the same codebase, I might make a commit touching 1000 files, mechanically switching code from an old deprecated API to a new one. If individual changes are "high risk" and potentially worthy of bisecting, proper review, etc. then I might split the commit up. If the individual changes are "low risk", and basically guaranteed to work if it compiles, then splitting up the commit is just adding noise - better to make it a complete and atomic commit than a micro commit, even if it might involve hours of work (e.g. adding a categorization enumerand to allocations or logging.)

The real risk would be something like changing allocation patterns breaking some uncaught edge case workflow by going out-of-memory, and the complete commit will be easier to bisect, track, and revert than dozens of scattered micro commits that individually made the crash only slightly more likely, but on the whole made the crash certain.

Unless I have a huge slog of entirely mechanical commits, I probably top out at maybe 20 commits on average over a flowing, coding-focused, 8 hour workday, doing work which is straightforward and easy to carve off completed atoms of work into their own commits. Which is what - 24 minutes per commit? Bit more than a couple. And I'm an outlier compared to coworkers over multiple game companies, who trend towards less frequent, girthier commits, even if they appreciate my approach.

I have asked myself the same questions of plenty of tiny commits, sometimes even my own. Tiny commits don't actually solve this problem. Thorough review, clear code, and proper documentation can help, but if nothing ever slipped through the cracks, we wouldn't have bugs in the first place.

And sometimes you discover it's actually a bug canceling out another bug, and that even the author of the commit didn't actually grok what was going on, even if they tricked themselves into thinking they did at the time.

---

My own rule of thumb: carve off small/atomic/freestanding "complete" changes into their own commits for easier review of both it and your future commit whenever reasonably possible, if only because the change is broken up. But even this has caveats - reviewers traumatized by past coworkers adding "code for the sake of code" might push back on these for lack of concrete use cases, and it may be easier to bend to their whims than to spend the time and political capital to fight 'em and bring them around to your way of thinking. "Reasonably possible" means "stop if it hurts."

bennysonething · 3 years ago

Git squash into main works well for my team. Lots of commits, often undoing then redoing things, really doesn't add anything for me. In fact I find it harder to understand lots of small commits because the overall goal / change is so fragmented. Also it's easier to revert a single squash.

simonw · 3 years ago

Is this example too large or about the right size for you? https://github.com/simonw/sqlite-utils/commit/ab8d4aad0c42f9...

js2 · 3 years ago

Here's what I would have written. This took me reading the code and clicking through to #299 to figure out and could have been in the commit message:

    Add optional tables argument to schema command

    The schema of every table in the DB is shown by default. Add optional
    tables argument to restrict which tables are shown.

    Closes #299

fragmede · 3 years ago

The commit message could stand to be the tiniest bit longer, instead of having to click through to find out what #299 is.

donatj · 3 years ago

That's probably fine, honestly. I'd maybe take a little more context in the message, but I certainly write messages like that myself. It's always later when I'm looking into problems that I want more context. I spend a lot of time hunting down problems. Half the time I caused them.

I know tone/angle of commit messages is an entirely separate bag of cats that people have strong opinions about, but I find the tone strange. I'm generally fond of commits that state what the commit does. Something along the lines of "Adds optional table support".

bmitc · 3 years ago

If you keep your commits so small, what do you put as the commit message?

Also, this showcases why Git is a terrible system. It contains no semantic information in it and is just a dumb line and text change keeper.

raydiatian · 3 years ago

This makes sense for bugs, which every feature secretly is

Perfect commits are a bulwark against technical debt:

https://www.infoq.com/articles/business-impact-code-quality/

https://news.ycombinator.com/item?id=33372016

The extra time spent crafting a perfect commit today will save 10x that time in the future. Your future self will thank you, and if not your future self, then I will, when I inherit your code.

psychoslave · 3 years ago

I very rarely check the repository history, even less to find an answer in how some buggy behavior was introduced.

At most I'll git blame, search the issue repository and see if I can chat with the concerned people.

Understanding the current relevant code and what needs to be done is basically always sufficient to do the job. Digging into history, while potentially very interesting, never helped to reduce the time to resolve anything in my experience. If the code is a mess, the history will be exponentially so. If the code base is clean and well documented, you more likely have a great set of "perfect commits" that you will never need.

js2 · 3 years ago

> see if I can chat with the concerned people

The concerned people quit and now I'm left supporting their untested, undocumented, unreviewed code that doesn't have commit messages more meaningful than "now it works", "fix it", "make it work", "another try".

So now answering every "why?" question takes me much longer than if the damn code was just documented in the first place.

I often go back to my own code, go to make a change, can't remember why I did it a particular way, run "git blame, git show" and thank myself for having the foresight a year ago to spend an extra minute or two when the code was fresh in my mind writing down why I did something a particular way.
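That blame-then-show lookup can be sketched as follows; the `settings.txt` file, its contents, and the commit messages are invented for the example.

```shell
set -e
repo=$(mktemp -d)            # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'timeout = 30\n' > settings.txt
git add settings.txt && git commit -q -m "Add default timeout"

printf 'timeout = 30\nretries = 0\n' > settings.txt
git commit -q -a -m "Disable retries" \
  -m "Retrying masked upstream failures, so fail fast instead."

# Which commit last touched line 2? The porcelain format puts the hash first.
sha=$(git blame -L 2,2 --porcelain settings.txt | head -n 1 | cut -d ' ' -f 1)

# Read the full message the author left at the time, without the patch.
git show -s --format='%h %s%n%n%b' "$sha"
```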

If that doesn't float your boat, fine, but I don't want to work with you.

The git repo itself is the epitome of what I aim for in my commit messages. Here's an example:

https://github.com/git/git/commit/d3775de0745c975e2d13819a63...

Literally a single line change, the addition of:

    BASIC_CFLAGS += -O0

to the top level Makefile, and 9 paragraphs of explanation in the commit message.

grogers · 3 years ago

Writing good code and writing good commits definitely cluster together. It's part of a general pride in quality of work, organization, and planning.

Going back into the commit history can be useful to figure out if something suspicious was introduced intentionally or if it was just a mistake. Beyond about 3-6 months, the author will generally not have any context to remember small code details, so leaving breadcrumbs for your future self and team can be nice.

quickthrower2 · 3 years ago

History is useful because blame often gets it wrong. So it is a manual blame.

keyle · 3 years ago

Until the requirements change 12 times over 12 months, and that time invested writing prose with all the stuff that goes with it goes down the drain.

Don't get me wrong, I'm not saying commit crap. I'm saying "Yes But". Use code as documentation, use tests as defence turrets, forget the poetic muppetry. The code should reflect the now and mark the pitfalls. If the code is well written, I should be able to understand why it's done this way. Don't send me off to some document on Confluence written by the dude that left 2 years ago.

simonw · 3 years ago

That is the exact argument I'm making in this article: if your documentation is on Confluence it will inevitably go out of date.

The solution to that is to keep the documentation in the same repository as the code, and update it in lock step every time you make an implementation change that affects the documentation.

wruza · 3 years ago

For a counter example, most of my work projects usually live up to 3-4 years (with 0.5-1 year of iterative development) and I have never investigated a commit from the past nor had to understand what it does, because that state of a project has nothing to do with today’s and checking it out or blaming is usually completely meaningless, unless it’s from the last week or two.

I believe that most of this high-culture commit advice comes from a specific (maybe common in bigcorps maintenance phase, idk) development process and/or paradigm, which may or may not be present at your workplace (though you may pretend it is, as we all do sometimes to look better). Personally and company-wise we are using RCSs simply to merge parallel work and to have an undo stack a little deeper than an editor could provide. That’s it. Nobody ever goes to the history, unless there is a technical issue with the RCS itself.

I’m not suggesting to write “new task” in a commit message, but am not fond of writing markdown poems there either, or committing every 3 minutes because changes “must” be short. Commit when it’s done, for some reasonable definition of “it” and “done”, and explain it in one line in under a minute, this is my empirical rule. If you have more to tell, tell it in the comments. (That’s what your future self will really be grateful for.)

Pretty sure that many people who nod to all this noble culture actually silently relate more to the above.

Edit: should have read this subthread before being cautious in my statements. Glad that there are people who share these views. Because reading on how to do it The Right Way and then waiting for “the obvious benefit” over decades may be confusing and anxiety inducing af.

js2 · 3 years ago

You've presented a strawman argument. No one is suggesting committing every 3 minutes, nor writing markdown poems.

Commit your work in logical blocks that make sense and explain what the change is doing and why so that a developer other than yourself can understand why the change was made. That's all. Unless you expect 100% code churn, you'd be surprised what really survives in the history. And even in the course of churning code, I often refer back to commit messages for the existing code.

As I've mentioned elsewhere, I've been doing this a long time (more than two decades) at startups and Fortune 500s, on both open and closed source code, on projects of all sizes. I'm surprised, frankly, that 3-4 years on a single code base has not been enough time for you to have ever wished you'd written a better commit message.

I literally lost an hour today working backwards from an error message to what the code was actually complaining about. No comments in the code. No documentation on the API endpoint. No commit message or PR description to explain the change beyond "make it work now with IPA uploads." In my experience, all these things go together and are signs of a conscientious developer.

The code tells the computer what to do. The commit message tells other developers what the code is supposed to do and why it's supposed to do it.

This has nothing to do with nobility. It's about sustainable development that tries to avoid technical debt, about being forward thinking, and about courtesy to future maintainers of the code you've written. Someday that future maintainer may be you.

mianos · 3 years ago

Is this irony? Given the time to write the odd comment, tidy a bit of code, or write a commit message, I know what I'd ask for. People read the code. I am sure some people jump right to the commit log but I have never met one in my 40 years as an active developer. Commit messages matter. A short statement of intent is a good idea. Dangerfiles are the penultimate expression of bikeshedding.