donatj · 3 years ago
I am a firm believer that a commit should rarely be more than a couple minutes of work. When I go to git-blame, I want the commit message to be the why of this specific change. When I git blame and find a commit that's thousands of lines I wonder why we're even using git at that point and not just saving zips.

The proponents of PR squashes irk me. You're losing all the value of git. In the future when I'm hunting down a problem I almost never care about what feature or fix this was a part of - e.g. the PR - I care entirely about the context of the developer's mind when he committed that specific line of code.

In the rare case you do actually care what PR a commit is from, GitHub's search will tell you [1]. There's no need to put that information in a commit at all.

I was recently hunting down the reason for a single ! casting a string to a bool. The blame on the file led to a large commit with details about the project, but nothing about why that specific line of code was changed.

Was it an accident? Did it serve an actual purpose? I might never know.

Commits are cheap. Blames are easy. There's zero need to try to keep your tree "looking nice".

1: https://github.com/donatj/CsvToMarkdownTable/search?q=f4c167...

ilammy · 3 years ago
> I care entirely about the context of the developer's mind when he committed that specific line of code.

In squash workflow, the PR is the context and the unit of change. It’s shoehorned onto git only because of GitHub.

In a way it’s a self-fulfilling prophecy: if you disregard individual patches then individual patches will be disregarded. It could also be attributed to git being hard to use, with all these “commits” and “patches”. So some developers treat Ctrl-S as the “commit” hotkey, putting whatever state the codebase is in into commits and stringing them together. Then in the end they just ship it for review as is, since it’s the path of least resistance, never even expecting anyone to review individual patches, because as far as they are concerned they are not sending in patches, they are sending the PR.

Maintainers are then faced with these PRs, and they have a choice: either enforce the quality of “commits”, or review the PR as a whole and disregard the commits. The choice often falls on the latter, because the former does not provide any practical benefits for the process (other than the theoretical “but we’ll have individual commits when 5 years later someone blames them”), while the latter does provide the practical benefit of not alienating developers with subpar commit discipline.

u801e · 3 years ago
> In squash workflow, the PR is the context and the unit of change. It’s shoehorned onto git only because of GitHub.

This could be useful if a PR is essentially changing only one thing like a commit should, but since PRs usually cover a feature or a fix for a ticket, these frequently involve doing multiple things to implement the feature or fix the bug.

> the [option of enforcing the quality of commits in a PR] does not provide any practical benefits for the process (rather than theoretical “but we’ll have individual commits when 5 years later someone blames them”),

Your parenthetical clause is not theoretical. It's basically a thing I have to deal with on a daily basis when I'm fixing bugs or removing/adding features in multiple code bases that were written by people years ago who no longer work there. I can't really make use of git blame because the commit messages are non-informative, the diffs are just work-in-progress saves, and the linked PRs just have "looks good to me" comments and no description.

But for code bases where commit quality was enforced, git blame actually becomes useful and allows me to see what was done and why it was done at the time it was written.

js2 · 3 years ago
I also work this way and advocate against squash merging or putting all the context into the PR.

A big reason for me is that I've built up a lot of context in my head when writing code. When I make the commit, that context is fresh in my mind and flows easily into the commit message.

But say instead I make a half-dozen commits with just single sentence descriptions. At the end of my work day I push the commits to a new branch and open a PR, and start writing the PR description. What was I thinking 5 hours ago when I wrote that first commit? I've forgotten. I've lost context.

So I always try to commit and document as I go. At that point I don't spend a lot of time getting the commit messages perfect, but I make sure they at least have all my thoughts. Before I push up the commits, I'll use "git rebase -i" and "reword" to edit the messages (typos, reformatting the text, maybe adding additional thoughts as I look at the diff). Then I open the PR and at that point, maybe I'll add additional context but usually my PRs are just an abstract and I ask reviewers to look at the commits themselves. It drives me crazy how hostile GitHub is to reviewing individual commits.

Many teams end up with squash-merges though because frankly: (a) developers don't know how to use git beyond committing and pushing, so (b) PR feedback is addressed by adding more commits and pushing them up, as opposed to amending the original commits; combined with (c) GitHub only grudgingly accommodating an amend+force push workflow.

Frankly Gerrit is a better-designed review tool, but it's not widely used.

u801e · 3 years ago
> So I always try to commit and document as I go. At that point I don't spend a lot of time getting the commit messages perfect, but I make sure they at least have all my thoughts. Before I push up the commits, I'll use "git rebase -i" and "reword" to edit the messages (typos, reformatting the text, maybe adding additional thoughts as I look at the diff).

I actually don't bother with committing anything to version control until I'm done with the feature or fix. I then will create a diff against the base branch and stage each individual part and create a commit from it. I then push up those commits to the remote.

> It drives me crazy how hostile GitHub is to reviewing individual commits.

I actually have an ingrained habit of middle-clicking on the commit sha1 and adding my comments on the commit itself. It doesn't really show up with any context in the main PR view though.

Lately, I've started running git log -p --reverse origin/master.., piping the output into vim, prefixing every line with '>' and typing my comments inline like I would with an email message (trimming the parts that I'm not responding to), and then pasting the entire thing in a single comment on Github.
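The quoting step could also be scripted instead of done by hand in vim. A minimal sketch in Python, assuming the `git log -p --reverse origin/master..` output has already been captured as text; `quote_for_review` is a hypothetical helper name, not anything git provides:

```python
def quote_for_review(patch_text):
    """Prefix every line with '>' so replies can be typed inline, email style."""
    return "\n".join(
        "> " + line if line else ">"  # keep blank lines as a bare '>'
        for line in patch_text.splitlines()
    )


if __name__ == "__main__":
    # Hypothetical sample input standing in for real `git log -p` output.
    sample = "commit abc123\n\n    Fix off-by-one\n"
    print(quote_for_review(sample))
```

The quoted text can then be trimmed and answered inline before pasting it into a single review comment.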

> Frankly Gerrit is a better-designed review tool, but it's not widely used.

I actually wish more people would use the email patch review process. At least there, reviewing individual commits is the default and the relation among multiple commits is preserved. I'm not sure how that's accomplished in Gerrit.

simonw · 3 years ago
This is why my workflow is so issue heavy: I use issue comments to constantly capture my context as I'm working, rather than using small commits.

I sometimes even run "git diff" and copy the result into an issue comment to record a potential implementation route that I'm not fully convinced of just yet.

seba_dos1 · 3 years ago
> In the rare case you do actually care what PR a commit is from, GitHub's search will tell you [1]. There's no need to put that information in a commit at all.

Even better - if you work with linear history (like most GitHub projects do I guess), you can always require merges to be fast-forwardable, but create a merge commit anyway. This way you retain all the individual commits and they're being grouped together by pull requests, so you can easily filter the details out with flags like `--first-parent` when you're not interested in them.
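To make the grouping concrete, here is a small Python model of such a history (a sketch, not git itself). The `Commit` class, the helper names and the commit subjects are all made up for illustration; the only thing taken from git is the idea that a merge commit's first parent is the previous mainline tip and its second parent is the branch tip:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Commit:
    subject: str
    parents: List["Commit"] = field(default_factory=list)


def first_parent_log(head):
    """Walk history following only the first parent, like `--first-parent`."""
    subjects = []
    node = head
    while node is not None:
        subjects.append(node.subject)
        node = node.parents[0] if node.parents else None
    return subjects


def all_subjects(head):
    """Collect every reachable commit subject, for contrast."""
    seen, stack, out = set(), [head], []
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        out.append(node.subject)
        stack.extend(node.parents)
    return out


def build_example():
    root = Commit("Initial commit")
    # Two commits on a topic branch that starts at the mainline tip.
    a = Commit("Parse table names", [root])
    b = Commit("Filter schema output", [a])
    # A merge commit created even though a fast-forward was possible.
    return Commit("Merge topic branch", [root, b])
```

In this model the first-parent walk sees only the merge and the initial commit, while the full walk still reaches both branch commits, so nothing is lost by keeping them.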

Squashing PRs truly is absolutely useless.

GauntletWizard · 3 years ago
The tooling just isn't there, though, especially if you want to do something like signing. I want to have my whole merge/pull request history in one subtree, while having the 1st subtree maintain a clear linear history with MR/PR Numbers in each merge commit, and also have that one be signed by the committer. It's easy to represent. It's a little harder to get the forges to do it right.

howscrewedami · 3 years ago
> Squashing PRs truly is absolutely useless

That is quite an absolute statement. Some people just have a personal preference for having git trees arranged in a certain way. If squashing can help with that, then it serves a purpose.

majormajor · 3 years ago
Do you constantly "rewind"? E.g. getting rid of stuff like "try if this works" type commit messages, after you've finished doing whatever experimentation you need to do?

The idea that I can commit every few minutes with commit messages that will make sense to you months later strikes me as incredibly unlikely, short of "write the code once to figure out how to do it, then throw it away and write it again to have a good chain of commits." And it's not worth that much - especially because the bit that's relevant when debugging may not be the bit that was relevant to me when writing the code.

Non-obvious things like that string/bool thing? I use comments for that. Yes, comments can get outdated, blah blah blah, but commit messages can also be unhelpful. What if that cast wasn't what was on their mind when writing the message anyway?

I would argue that if your PR was a thousand lines it's too damn big anyway. Figure out how to do multiple PRs, your reviewers will thank you. But I want commits that are cohesive, compilable, functional pieces. Not just path-dependent detritus of the dev process.

imaltont · 3 years ago
> Do you constantly "rewind"? E.g. getting rid of stuff like "try if this works" type commit messages, after you've finished doing whatever experimentation you need to do?

Not constantly, but I will clean up the local/personal branch history with a rebase before creating a PR. Change some messages, maybe change up where some of the code is committed or even change the order if that makes more sense than the way it actually happened. It's a pretty quick and easy task to do when you get used to it.

seba_dos1 · 3 years ago
> Do you constantly "rewind"? E.g. getting rid of stuff like "try if this works" type commit messages, after you've finished doing whatever experimentation you need to do?

Yes. Being able to effortlessly do just that is the main value provided by git.

strictfp · 3 years ago
I've been following your school of thought as well as the one proposed by the author.

I would argue that your model is a lot freer and will lead to better code quality in the long run, since you're not feeling so restricted when working.

Meticulously crafting commits feels like one of those "good in theory" approaches that feels good but hurts you more than you realize. They're good for workflows where you move commits between lots of (release) branches, sure, but if you're doing trunk-based development I'm going to call it a bit formalistic.

lamontcg · 3 years ago
> I was recently hunting down the reason for a single ! casting a string to a bool. The blame on the file led to a large commit with details about the project, but nothing about why that specific line of code was changed.

> Was it an accident? Did it serve an actual purpose? I might never know.

You're never going to get that level of detail.

Particularly if it was an accident or done without really thinking about it--the original coder will never document that because they were literally not thinking about it.

New commit rule: always document what you weren't considering when you wrote the commit.

Izkata · 3 years ago
> > Was it an accident? Did it serve an actual purpose? I might never know.

> You're never going to get that level of detail.

I have! I once found a bug that was caused by a typo in a linting commit, something obviously unintentional. If it was squashed it would have taken a lot longer to figure that out.

I also recently found issues in a common python 2 and 3 library where the code was broken in py2to3 and first-run-of-black commits. More places where it was obvious how and why the code was wrong because the formatting was a separate commit from the feature.

gorgoiler · 3 years ago
You wouldn’t publish the chapters of your book with a complete edit history and all of the margin notes and email discussions you had with your editor.

At some point you have to draw a line under your creative process and present a finished piece of work that stands up on its own. That means one diff with a meaningful commit message.

It might seem like a reasonable point that you want the unsquashed commit history in order to see into the mind of the developer. In practice, anything of consequence mentioned in those commits should be in the squashed message. I don’t need to see your thought process as you get something wrong three times then spend another three commits fighting with the linter all before you present your final thesis. I just need to see what you’ve finally settled on.

(See also: changes to a code base that land and are then immediately changed again. Developers like this put up code too soon and too persuasively. Try to get them to slow down.)

PS: Most projects I work on are mature code bases with between tens and thousands of developers. Most of the changes are iterative, adding features on the long march to revenue. There’s an exception to this process — which the OP alludes to — where large new features land as blocks of hundreds or thousands of lines of code.

When that happens, if the author spent a long time in modelling hell trying to find the best way to lay out their code, then it’s going to have a horrible commit history. If there’s anything they can do to land it in stages then they should — including rebuilding their history with pretend commits as if they have one clear idea after another. That is hard to simulate, so it’s often easier to break the code up on the orthogonal axis to time: space. With version controlled code, the dimensions of space are modules, and if a large change is broken into clearly defined modules then that provides the same benefit as a series of changes broken into clearly defined ideas / diffs / patches / PRs.

trashburger · 3 years ago
> I don’t need to see your thought process as you get something wrong three times then spend another three commits fighting with the linter all before you present your final thesis. I just need to see what you’ve finally settled on.

I don't think parent is advocating for a commit to stay the same before a merge. You can simply clean up those WIP commits via rebase + fixups and put them in a coherent order and structure (i.e. make them atomic as the article mentions). This and rebase-merge workflows make git history a delight to work with.
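One common way to do that cleanup is `git commit --fixup` followed by `git rebase -i --autosquash`; whether that is exactly what is meant here is an assumption. The reordering can be sketched as a simplified Python model: it only matches a `fixup! X` commit against an earlier commit whose subject is exactly `X`, which is narrower than what git accepts, and `plan_autosquash` and the subjects in the example are hypothetical:

```python
FIXUP_PREFIX = "fixup! "


def plan_autosquash(subjects):
    """Return (action, subject) pairs in the order a rebase would apply them.

    Simplified model: a commit titled "fixup! X" is moved directly after
    the earlier commit titled exactly "X" and folded into it.
    """
    plan = []   # list of [picked subject, [fixup subjects]]
    index = {}  # picked subject -> position in plan
    for subject in subjects:
        target = None
        if subject.startswith(FIXUP_PREFIX):
            target = subject[len(FIXUP_PREFIX):]
        if target is not None and target in index:
            plan[index[target]][1].append(subject)
        else:
            index[subject] = len(plan)
            plan.append([subject, []])
    result = []
    for pick, fixups in plan:
        result.append(("pick", pick))
        result.extend(("fixup", f) for f in fixups)
    return result


if __name__ == "__main__":
    for action, subject in plan_autosquash(
        ["Add parser", "Add CLI flag", "fixup! Add parser"]
    ):
        print(action, subject)
```

The late correction ends up folded into the commit it belongs to, so the published history reads as two coherent commits rather than three.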

DeathArrow · 3 years ago
I see the purpose of code versioning in being able to save the code fast, being able to reverse code changes fast and doing code merges with ease.

If some philosophy stands against someone's productivity I don't believe it is a good thing. I would let developers commit in every state they find easy, provided the code builds and the commit comment describes what is being committed.

I think it is the PR which we can push more demands on but not commits.

I, for example, dislike having to track any small change I have made and commit it. I like committing larger parts which contain a functionality.

imaltont · 3 years ago
> I, for example, dislike having to track any small change I have made and commit it. I like committing larger parts which contain a functionality.

This is pretty much the way suggested by the creator of git too. A commit should contain the necessary changes for one bit of functionality/bug fix. This imo makes the history pretty neat and tidy, while at the same time making it easy to search through with blame/bisect whenever you need to. Giant commits (multiple functionality/whole project/extension squashed into one) make both of those hard to use and in some cases pretty much useless outside of finding who did it and hoping they still work at the company and remember their state of mind when they made the change.

darekkay · 3 years ago
> Blames are easy. There's zero need to try to keep your tree "looking nice".

I don't care what the tree looks like. But I do care about keeping the number of commits low _because_ it makes blames easier.

> In the future when I'm hunting down a problem I almost never care about what feature or fix this was a part of - e.g. the PR - I care entirely about the context of the developer's mind when he committed that specific line of code.

I've mostly had the opposite experience. Many developers don't have the knowledge or confidence to use "amend" etc. consciously, so it leads to a mass of useless "fix error", "forgot something", "add tests" and "implement PR reviews" commits. This makes blame more difficult. I also agree: sometimes a squashed commit message doesn't help me understand the line of code I'm blaming. But not having squashed the commit mostly wouldn't help me anyway. Instead, I try to avoid this problem beforehand, by making sure that the PR itself contains all the information (via PR description and/or code comments). Recently I've reviewed a "Fix scrollbar" PR. The "what" is clear. The "why" (= "it's broken") _sounds_ sufficient, but I was missing information like "how exactly is it broken" and "why does this change fix it". That's the kind of stuff I might blame a year from now, and now it's part of the Git history.

b3morales · 3 years ago
> Many developers don't have the knowledge or confidence to use "amend" etc. consciously,

The secondary problem here is that GitHub (just to throw another complaint on the pile about GitHub) makes a dog's breakfast of tracking PR comments when things have changed. Amending and force pushing only seems to make it worse, so even though it should be the better workflow, it causes friction in a different place.

int0x2e · 3 years ago
In my experience, this only works if you have a small team of great engineers. Once you have 50+ people of various skill levels making contributions to the codebase, you'll start seeing your git history littered with variants of "fix", "fix 2", "fix broken CI test" and on and on...

For me, the PR is the context I can use when trying to find an issue. Sifting through many hundreds of commits per day is painful. Sifting through tens of PRs per day is not great, but is much more manageable...

seba_dos1 · 3 years ago
> you'll start seeing your git history littered with variants of "fix", "fix 2", "fix broken CI test" and on and on...

Should such commits even pass through the review in the first place?
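A review gate for this could be as simple as flagging subjects that carry no information. The following Python sketch is only an illustration: the pattern list is an assumption assembled from the examples quoted in this thread, and `low_information_subjects` is a hypothetical helper, not a feature of git or of any forge:

```python
import re

# Assumed list of subjects that say nothing about the change itself.
LOW_INFO = re.compile(
    r"^(fix|wip|fix it|make it work|now it works|another try)(\s*\d+)?[.!]*$",
    re.IGNORECASE,
)


def low_information_subjects(subjects):
    """Return the commit subjects that match the low-information patterns."""
    return [s for s in subjects if LOW_INFO.match(s.strip())]


if __name__ == "__main__":
    history = [
        "fix",
        "fix 2",
        "fix broken CI test",
        "Add optional tables argument to schema command",
    ]
    print(low_information_subjects(history))
```

Note that "fix broken CI test" is deliberately not flagged here, since it at least names what was fixed; a stricter team could extend the pattern.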

MaulingMonkey · 3 years ago
> I am a firm believer that a commit should rarely be more than a couple minutes of work.

While I'm a proponent of small commits, this is overstating things IMO. I work on codebases that frequently take a couple of minutes to perform an incremental link and that put commits through dozens of CI hours of (useful!) integration testing, and I chase down C++ heisenbugs whose root cause takes weeks to find, motivating a 1-line change with a 30-line commit message.

On the right codebase, I'm more than happy to throw a single file "fix whitespace" commit at CI without even locally compiling it, confident that CI will catch any problems, because it'll help untangle the diffs of future impending commits and make them easier to review. On the same codebase, I might make a commit touching 1000 files, mechanically switching code from an old deprecated API to a new one. If individual changes are "high risk" and potentially worthy of bisecting, proper review, etc. then I might split the commit up. If the individual changes are "low risk", and basically guaranteed to work if it compiles, then splitting up the commit is just adding noise - better to make it a complete and atomic commit than a micro commit, even if it might involve hours of work (e.g. adding a categorization enumerand to allocations or logging.)

The real risk would be something like changing allocation patterns breaking some uncaught edge case workflow by going out-of-memory, and the complete commit will be easier to bisect, track, and revert than dozens of scattered micro commits that individually made the crash only slightly more likely, but on the whole made the crash certain.

Unless I have a huge slog of entirely mechanical commits, I probably top out at maybe 20 commits on average over a flowing, coding-focused, 8 hour workday, doing work which is straightforward and easy to carve off completed atoms of work into their own commits. Which is what - 24 minutes per commit? Bit more than a couple. And I'm an outlier compared to coworkers over multiple game companies, who trend towards less frequent, girthier commits, even if they appreciate my approach.

> I was recently hunting down the reason for a single ! casting a string to a bool. The blame on the file led to a large commit with details about the project, but nothing about why that specific line of code was changed.

> Was it an accident? Did it serve an actual purpose? I might never know.

I have asked myself the same questions of plenty of tiny commits, sometimes even my own. Tiny commits don't actually solve this problem. Thorough review, clear code, and proper documentation can help, but if nothing ever slipped through the cracks, we wouldn't have bugs in the first place.

And sometimes you discover it's actually a bug canceling out another bug, and that even the author of the commit didn't actually grok what was going on, even if they tricked themselves into thinking they did at the time.

---

My own rule of thumb: carve off small/atomic/freestanding "complete" changes into their own commits for easier review of both it and your future commit whenever reasonably possible, if only because the change is broken up. But even this has caveats - reviewers traumatized by past coworkers adding "code for the sake of code" might push back on these for lack of concrete use cases, and it may be easier to bend to their whims than to spend the time and political capital to fight 'em and bring them around to your way of thinking. "Reasonably possible" means "stop if it hurts."

bennysonething · 3 years ago
Git squash into main works well for my team. Lots of commits, often undoing then redoing things, really doesn't add anything for me. In fact I find it harder to understand lots of small commits because the overall goal / change is so fragmented. Also it's easier to revert a single squash.

simonw · 3 years ago
Is this example too large or about the right size for you? https://github.com/simonw/sqlite-utils/commit/ab8d4aad0c42f9...

js2 · 3 years ago
Here's what I would have written. This took me reading the code and clicking through to #299 to figure out and could have been in the commit message:

    Add optional tables argument to schema command

    The schema of every table in the DB is shown by default. Add optional
    tables argument to restrict which tables are shown.

    Closes #299
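The layout of a message like this (short subject, blank line, wrapped body) can be checked mechanically. A minimal Python sketch follows; the 50 and 72 character limits are a widely used convention that is assumed here, not something git enforces or that this thread prescribes, and `check_commit_message` is a hypothetical helper:

```python
def check_commit_message(message, subject_max=50, body_max=72):
    """Return a list of layout problems; an empty list means none were found."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing subject line"]
    problems = []
    if len(lines[0]) > subject_max:
        problems.append(f"subject longer than {subject_max} characters")
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line between subject and body")
    if len(lines) < 3:
        problems.append("no body text after the subject")
    # Body lines start at line 3 of the message.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > body_max:
            problems.append(f"line {number} longer than {body_max} characters")
    return problems
```

A check like this only covers the shape of the message. Whether the body actually explains why the change was made still needs a human reviewer.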

fragmede · 3 years ago
The commit message could stand to be the tiniest bit longer, instead of having to click through to find out what #299 is.

donatj · 3 years ago
That's probably fine, honestly. I'd maybe take a little more context in the message, but I certainly write messages like that myself. It's always later when I'm looking into problems that I want more context. I spend a lot of time hunting down problems. Half the time I caused them.

I know tone/angle of commit messages is an entirely separate bag of cats that people have strong opinions about, but I find the tone strange. I'm generally fond of commits that state what the commit does. Something along the lines of "Adds optional table support".

bmitc · 3 years ago
If you keep your commits so small, what do you put as the commit message?

Also, this showcases why Git is a terrible system. It contains no semantic information in it and is just a dumb line and text change keeper.

raydiatian · 3 years ago
This makes sense for bugs, which every feature secretly is

wellpast · 3 years ago
Going to express a counterpoint.

Maintainable/malleable software is mostly to do with the architecture of the code at any given point in time, not the sequence of commits that got it there. (I'd rather take all N,000 commits collapsed into one with a message "Stuff" if the code is well-articulated than a series of perfect commits that culminate in a highly coupled volatile system.)

Furthermore, I'll posit that these "perfect commits" have a tendency to trend toward overall bad code architecture. Primarily because the attention flashlight is being shined on the wrong thing (the delta as opposed to the resultant codebase). In the course of working on a system I'll often earmark code decoupling/cleanup as part of a commit. Or move to get cleanup in quickly so that my general feature/value delivery is achieved. But processes that focus attention on commits (like this "perfect commit" and also formal code review processes) discourage this kind of continuous cleanup/refactoring that is the only way to get to good code.

ecshafer · 3 years ago
Commit basically doesn't matter to me, I think in the terms of PR. I can make 100 commits or 1, it doesn't matter because when I merge I squash them into a single commit that encompasses everything. There are some exceptions, as splitting a large merge into several smaller merges can help a lot, for example adding a database column in a pr, then adding in a separate merge usage of that column.

In terms of commits, the commit message should be a good summary of what is there. In terms of a PR there should be several things. One is obviously that tests are present. The next is a what, a why and a how: explanations of what the purpose is, why you are doing it, and how you are doing it. Finally I think there should be a description of how this should be tested manually.

rektide · 3 years ago
It's tragic how either/or git is.

Both are great in different ways. If I'm trying to understand a complex tricky bit of code, a long string of small commits is ideal. If I'm browsing a repo, large squashed commits or PRs are better.

It'd be great to have better hierarchy, to get both.

jimmaswell · 3 years ago
Resorting to sleuthing around the commit history means the codebase has a problem with inadequate commenting or unclear code.

seba_dos1 · 3 years ago
`git log --first-parent`

`git log --graph --oneline`

It's GitHub you want to complain about. git handles that perfectly well.

drdec · 3 years ago
You can have both if you keep the PR branches

dec0dedab0de · 3 years ago
I leave all the commits in without squashing, but filter to only show merge commits

stormbrew · 3 years ago
Git isn't either/or: GitHub is because it has extremely poor support for nonlinear history. It's not alone in that but git itself has the fundamental tools to support it.

bagels · 3 years ago
Having all the commits, including the ones where you change your mind and try something different is not useful. Spending a ton of time to rewrite history crafting a ton of tiny commits is a waste of time over squashing.

simonw · 3 years ago
Totally agree - if I'm working on a larger change I'll usually do that in a branch, which ends up as a PR which I then squash-merge into a "perfect commit" as described in the article.

neon_electro · 3 years ago
This is why I now love GitHub's "squash & merge"; it accomplishes so much of the ideals laid out in this "perfect commit" article.

jacoblambda · 3 years ago
I have to disagree. Mainly because I care what the contents of the branch were after the merge when I have to go back and figure out why a particular error is occurring.

Mainly my workflow is:

1. Commit often with moderately useful commit titles/texts.

2. Rebase -i prior to pushing to remote. During rebase, squash/fixup together relevant commits and write succinct commit titles with useful commit texts. Then push.

3. Rebase -i again at the end of the PR to guarantee everything is coherent and that each commit that makes it in the merge at least builds and passes existing tests (i.e. that functionality before the branch isn't broken in the branch).

4. Merge into master. Merge commit should include the PR info (like PR text contents) in the merge text if at all possible.

5. Forget. Maybe move to a new git host.

Then, years later, when I identify a bug I can just:

1. Put together a reproducible example.

2. Git bisect --first-parent (only top level commits/merges) to identify the merge where the bug was introduced.

3. Git bisect the merge itself to identify the exact commit that introduced it.

4. From there I can determine the context of hows and whys behind the bug being introduced, identify a possible solution, and evaluate any process changes that could have prevented this bug from being introduced elsewhere in the project.
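The two-stage search in steps 2 and 3 can be sketched as a pair of binary searches. This is a Python model rather than a wrapper around git: the `(merge name, commits)` tuples, the commit names and the `is_bad` predicate are all hypothetical, and it assumes, like bisect does, that everything before the culprit is good and everything from the culprit onward is bad:

```python
def first_bad(items, is_bad):
    """Binary search for the first item for which is_bad(item) is True.

    Assumes the last item is known to be bad.
    """
    lo, hi = 0, len(items) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(items[mid]):
            hi = mid
        else:
            lo = mid + 1
    return items[lo]


def two_stage_bisect(merges, is_bad):
    """Find the bad merge on the mainline, then the bad commit inside it.

    `merges` is a list of (merge_name, commits) tuples in mainline order;
    the state of a merge is taken to be the state of its last commit.
    """
    merge_name, commits = first_bad(merges, lambda m: is_bad(m[1][-1]))
    return merge_name, first_bad(commits, is_bad)


if __name__ == "__main__":
    merges = [
        ("merge-1", ["a1", "a2"]),
        ("merge-2", ["b1", "b2", "b3"]),
        ("merge-3", ["c1"]),
    ]
    order = [c for _, commits in merges for c in commits]
    culprit = "b2"  # pretend the reproducible example starts failing here
    print(two_stage_bisect(merges, lambda c: order.index(c) >= order.index(culprit)))
```

The first stage only ever tests merge results, which is why step 3 of the workflow above (every merged commit builds and passes tests) matters mostly for the second stage.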

seba_dos1 · 3 years ago
I believe that "squash & merge" is the most useless possible git forge feature that never makes sense to use except in cases of blatant laziness or incompetence. Its only purpose is to plaster over the mess that people who don't know how to use git well make with their branches, and relying on it is always net negative for the utility of your repository.

The only case where I use "squash & merge" is when I get some trivial merge request from someone who clearly doesn't grok git well and posts tons of fix up commits. When their change would boil down to a single logical commit anyway, then sure, I can click that button instead of asking them to clean up their branch - saving time for both of us. Otherwise, there's absolutely no benefit to it that I can perceive. Every reason that I have ever heard from people using that workflow could be nullified by learning how to use git a bit better, often bringing lots of other benefits to the table.

(there's one exception - "GitHub doesn't handle PR reviews well" is a somewhat valid reason, but that's a perfect excuse to look for alternatives or at least to complain to GitHub to make them finally fix their product; it's not the only forge out there, GitLab deals with branches getting rewritten during review well)

js2 · 3 years ago
Perfect commits are a bulwark against technical debt:

https://www.infoq.com/articles/business-impact-code-quality/

https://news.ycombinator.com/item?id=33372016

The extra time spent crafting a perfect commit today will save 10x that time in the future. Your future self will thank you, and if not your future self, then I will, when I inherit your code.

psychoslave · 3 years ago
I very rarely check the repository history, even less to find an answer in how some buggy behavior was introduced.

At most I'll git blame, search the issue repository and see if I can chat with the concerned people.

Understanding the current relevant code and what needs to be done is basically always sufficient to do the job. Digging into history, while potentially very interesting, never helped to reduce the time to resolve anything in my experience. If the code is a mess, the history will be exponentially so. If the code base is clean and well documented, you might more likely have a great set of "perfect commits" that you will never need.

js2 · 3 years ago
> see if I can chat with the concerned people

The concerned people quit and now I'm left supporting their untested, undocumented, unreviewed code that doesn't have commit messages more meaningful than "now it works", "fix it", "make it work", "another try".

So now answering every "why?" question takes me much longer than if the damn code was just documented in the first place.

I often go back to my own code, go to make a change, can't remember why I did it a particular way, run "git blame, git show" and thank myself for having the foresight a year ago to spend an extra minute or two when the code was fresh in my mind writing down why I did something a particular way.

If that doesn't float your boat, fine, but I don't want to work with you.

The git repo itself is the epitome of what I aim for in my commit messages. Here's an example:

https://github.com/git/git/commit/d3775de0745c975e2d13819a63...

Literally a single line change, the addition of:

    BASIC_CFLAGS += -O0

to the top-level Makefile, and 9 paragraphs of explanation in the commit message.

grogers · 3 years ago
Writing good code and writing good commits definitely cluster together. It's part of a general pride in quality of work, organization, and planning.

Going back into the commit history can be useful to figure out if something suspicious was introduced intentionally or if it was just a mistake. Beyond about 3-6 months, the author will generally not have any context to remember small code details, so leaving breadcrumbs for your future self and team can be nice.

Deleted Comment

quickthrower2 · 3 years ago
History is useful because blame often gets it wrong. So it is a manual blame.
keyle · 3 years ago
Until the requirements change 12 times over 12 months, and that time invested writing prose with all the stuff that goes with it goes down the drain.

Don't get me wrong, I'm not saying commit crap. I'm saying "Yes But". Use code as documentation, use tests as defence turrets, forget the poetic muppetry. The code should reflect the now and mark the pitfalls. If the code is well written, I should be able to understand why it's done this way. Don't send me off to some document on Confluence written by the dude that left 2 years ago.

simonw · 3 years ago
That is the exact argument I'm making in this article: if your documentation is on Confluence it will inevitably go out of date.

The solution to that is to keep the documentation in the same repository as the code, and update it in lockstep every time you make an implementation change that affects the documentation.

wruza · 3 years ago
For a counter example, most of my work projects usually live up to 3-4 years (with 0.5-1 year of iterative development) and I have never investigated a commit from the past nor had to understand what it does, because that state of a project has nothing to do with today’s and checking it out or blaming is usually completely meaningless, unless it’s from the last week or two.

I believe that most of this high-culture commit advice comes from a specific (maybe common in bigcorps maintenance phase, idk) development process and/or paradigm, which may or may not be present at your workplace (though you may pretend it is, as we all do sometimes to look better). Personally and company-wise we are using RCSs simply to merge parallel work and to have an undo stack a little deeper than an editor could provide. That’s it. Nobody ever goes to the history, unless there is a technical issue with RCS itself.

I’m not suggesting to write “new task” in a commit message, but am not fond of writing markdown poems there either, or committing every 3 minutes because changes “must” be short. Commit when it’s done, for some reasonable definition of “it” and “done”, and explain it in one line in under a minute, this is my empirical rule. If you have more to tell, tell it in the comments. (That’s what your future self will really be grateful for.)

Pretty sure that many people who nod to all this noble culture actually silently relate more to the above.

Edit: should have read this subthread before being cautious in my statements. Glad that there are people who share these views. Because reading on how to do it The Right Way and then waiting for “the obvious benefit” over decades may be confusing and anxiety inducing af.

js2 · 3 years ago
You've presented a strawman argument. No one is suggesting committing every 3 minutes, nor writing markdown poems.

Commit your work in logical blocks that make sense and explain what the change is doing and why so that a developer other than yourself can understand why the change was made. That's all. Unless you expect 100% code churn, you'd be surprised what really survives in the history. And even in the course of churning code, I often refer back to commit messages for the existing code.

As I've mentioned elsewhere, I've been doing this a long time (more than two decades) at startups and Fortune 500s, on both open and closed source code, on projects of all sizes. I'm surprised, frankly, that in 3-4 years on a single code base you have never once wished you'd written a better commit message.

I literally lost an hour today working backwards from an error message to what the code was actually complaining about. No comments in the code. No documentation on the API endpoint. No commit message or PR description to explain the change beyond "make it work now with IPA uploads." In my experience, all these things go together and are signs of a conscientious developer.

The code tells the computer what to do. The commit message tells other developers what the code is supposed to do and why it's supposed to do it.

This has nothing to do with nobility. It's about sustainable development that tries to avoid technical debt, about being forward thinking, and about courtesy to future maintainers of the code you've written. Someday that future maintainer may be you.

mianos · 3 years ago
Is this irony? Given the time to write the odd comment, tidy a bit of code, or write a commit message, I know what I'd ask for. People read the code. I am sure some people jump right to the commit log, but I have never met one in my 40 years as an active developer. Commit messages matter. A short statement of intent is a good idea. Dangerfiles are the ultimate expression of bikeshedding.
brian_cunnie · 3 years ago
> An issue is more valuable than a commit message

Watch out! An issue is more ephemeral than a commit message: the issue might not always last, but the commit message will. Say a project is re-hosted from GitHub to an internal GitLab instance, or the original project dies and all the active work moves to a fork. In such cases, the original issues are gone, but the commit messages stay.

So put everything that's important into the git commit message. Don't assume the issue will always be there.

I learned this the hard way when the company I worked at used something similar to GitHub issues (Pivotal Tracker stories) to record the important background to a commit, which was great until the story was purged (because, say, the Tracker project was deleted).

Izkata · 3 years ago
Indeed, we went [something old] -> FogBugz -> Jira and cvs -> svn -> git. All cases from that original system have been lost and some but not all of the FogBugz cases were imported to Jira (and even the ones that were imported are hard to find), while the repos went through an import process so all commits and their messages have been preserved. The git history is a bit wonky because of how branches work differently, but "blame", check commit, "blame commit~1" to iterate backwards in the history to track a change do all still work as expected.

Lately we have had to stop devs from using GitLab as their primary issue tracker instead of Jira, because the non-devs use Jira for everything and we need to keep them in the loop.
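
For anyone who hasn't done the "blame", check commit, "blame commit~1" loop described above, here is a minimal shell sketch of it. The throwaway repository, the file name `settings.conf`, and the commit messages are all made up for the demo; only the `git blame` and `git show` invocations are the point.

```shell
set -eu
repo=$(mktemp -d)                     # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Example Dev"    # hypothetical identity
git config user.email "dev@example.com"

printf 'timeout = 10\n' > settings.conf
git add settings.conf
git commit -q -m "Add default timeout"

printf 'timeout = 30\n' > settings.conf
git commit -q -am "Raise timeout for slow upstream"

# "blame": which commit last touched line 1?
# --porcelain puts the full commit hash first on the first output line.
newest=$(git blame --porcelain -L 1,1 settings.conf | awk 'NR==1 {print $1}')

# "check commit": read its message.
git show -s --format='%h %s' "$newest"

# "blame commit~1": blame the same line as of the parent commit
# to find the change that came before.
older=$(git blame --porcelain -L 1,1 "$newest~1" -- settings.conf | awk 'NR==1 {print $1}')
git show -s --format='%h %s' "$older"
```

Each pass steps one change further back for that line. The porcelain format is used here because the default blame output prefixes boundary commits with `^`, which makes the hash awkward to pass to another command. In a real file the line number usually shifts between revisions, so the `-L` range has to be adjusted by hand on each pass.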

8n4vidtmkvmk · 3 years ago
It's worth porting the issues to the new platform. I ported Bitbucket to GitHub, which was annoying because you can't choose the bug #, so you have to import them in order and get it right on the first try. But at least everything lines up now.
Macha · 3 years ago
In larger companies this is rarely your decision to make. In three different issue tracker migrations, I've seen the team responsible for maintaining the issue tracker instead keep the old tracker in a read-only state until they either had to renew the vendor contract or make security updates, at which point the issues went down the drain. This may all have happened before you joined, so it's not likely you could have done the migration yourself (and the security team may have disabled generating the API keys that automated migration tools need, anyway).
thom · 3 years ago
Somehow I’m really wary of all the energy that goes into source control workflows and commit messages. It’s all meta-information and the only time it really matters is when something has gone catastrophically wrong. If you genuinely have to go looking at the history of a codebase to find out why something weird has happened, that feels like a codebase crying out for a better structure. New features shouldn’t be major surgery, they should be instances of established abstractions. I’m very interested in languages that make that modularity easier but if you can’t delete your entire Git history with no negative consequences that just seems like a disaster to me, on both an individual and industry-wide scale.
cratermoon · 3 years ago
> the only time it really matters is when something has gone catastrophically wrong

Or in other words, it matters most when it's most needed, and the recovery process is glad to have it.

Think about the airline industry and how many times the causes of an accident have been found in the months-old maintenance records of an aircraft, or in an examination of pilot training from last year.

Now imagine throwing away all the information that could have led to finding a cause and putting in place mechanisms and practices to prevent a recurrence. This is how the software industry continues to stumble over the same problems that were known and described in the 70s.

thom · 3 years ago
I don’t think it’s _necessary_ when something goes wrong. I think it’s a sign of a process going wrong (which is just as likely as your commit message process going wrong). If you can’t work out why a piece of code looks the way it does or why it even exists, the fix isn’t better commit messages, it’s better code.
rtpg · 3 years ago
The very obvious place where this is useful is when seeing a random `if` statement and wondering why it's there. Git can pull up the commit it was added in, and the right context.

This can also quickly identify who added it (so you can ask someone "hey, if I changed this to that would that make sense in your opinion?")

Like with many things in life it's not a guarantee of exactitude or rightness, but it can be very helpful information!
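
One way to pull up the commit that introduced a particular `if` is git's pickaxe search, `git log -S`. The sketch below is only an illustration: the repository, the file `check.sh`, the `TARGET` variable, and the commit messages are hypothetical.

```shell
set -eu
repo=$(mktemp -d)                     # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Example Dev"    # hypothetical identity
git config user.email "dev@example.com"

printf 'run() {\n  echo "running"\n}\n' > check.sh
git add check.sh
git commit -q -m "Add run script"

# Second commit adds the mysterious `if`, with the reason in the message body.
printf 'run() {\n  if [ -z "${TARGET:-}" ]; then return 0; fi\n  echo "running"\n}\n' > check.sh
git commit -q -am "Skip run when TARGET is unset" \
  -m "Nightly jobs call run without a target; returning early avoids a spurious failure."

# -S lists commits that changed how many times the string occurs in the file.
found=$(git log -S'TARGET' --format=%H -- check.sh)

# Show who added it and why.
git show -s --format='%h %an%n%s%n%n%b' "$found"
```

Unlike `git blame`, the pickaxe search still finds the introducing commit after later commits have reformatted or moved the line, as long as the searched string itself survives.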

thom · 3 years ago
But we can both accept that the meaning not being clear in the code is bad, right? And the ideal solution would be the code being clearer, not the commit messages being clearer? I just think I’d always put my effort into the code and incentivise that over everything else. I mean, if you have a highly involved Git workflow already, aren’t you doing code reviews and catching this stuff anyway?
glacials · 3 years ago
Author might be interested in Git notes (`git notes --help`). Notes are arbitrary text content added to commits that don’t affect the commit’s identity, so they can be amended without rewriting history.
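
A minimal sketch of what that looks like in practice, assuming a throwaway repository; the file name and note text are invented for the example.

```shell
set -eu
repo=$(mktemp -d)                     # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Example Dev"    # hypothetical identity
git config user.email "dev@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "Add file"
before=$(git rev-parse HEAD)

# Attach a note to the commit, then replace it; -f overwrites an existing note.
git notes add -m "Background: see the tracker discussion" HEAD
git notes add -f -m "Background: copied from the issue thread" HEAD

after=$(git rev-parse HEAD)

# The commit hash is unchanged; the note is shown alongside the commit.
echo "before=$before after=$after"
git notes show HEAD
git log -1
```

One caveat: notes are stored under `refs/notes/commits`, which a plain `git push` or `git fetch` does not transfer by default, so that ref has to be pushed and fetched explicitly if the notes are meant to be shared.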
simonw · 3 years ago
Yeah I've not used those myself yet but I've been wondering if they might provide a path to "back up" my issue comments to the repository itself.
strictfp · 3 years ago
Ooooh, thanks for mentioning that!
andirk · 3 years ago
1) Write commit message. 2) THEN start doing work. If you get up for a spliff break, 1st thing after sitting back down is re-read commit message. Stay on track. 3) Stage files that are kempt, like a mini commit within the commit. 4) Once all files are staged, look at them and look at the message to make sure you have mentioned all that changed, including sometimes _why_. 5) Commit! But don't always push! Only push if you really like what you did. 6) Repeat. Squash similar commits together. 7) Rebase in master or merge in master. 8) Push! 9) Repeat! 10) PR!

Your commits will read like a story of the what and sometimes the why. PRs go smoothly. All is well. Lots of steps but it takes very little extra time.
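
For what it's worth, the local part of those steps (message first, selective staging, commit, squash) might look roughly like this in a shell. Everything here is a hypothetical stand-in: the file names, the message text, and the use of `--fixup` with `--autosquash` as the squash mechanism, which is one option among several.

```shell
set -eu
work=$(mktemp -d)                     # throwaway directory for the demo
repo="$work/repo"
git init -q "$repo"
cd "$repo"
git config user.name "Example Dev"    # hypothetical identity
git config user.email "dev@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"

# 1) Write the commit message before touching the code.
printf '%s\n' "Add greeting helper" "" \
  "Callers need a single place to build the greeting text." > "$work/msg.txt"

# 2) Do the work, 3) stage only the files that are ready.
echo 'greet() { echo "hello $1"; }' > greet.sh
echo "scratch notes" > todo.tmp       # not ready, stays unstaged
git add greet.sh

# 4) Compare what is staged against the message, 5) commit without pushing.
git diff --cached --stat
git commit -q -F "$work/msg.txt"

# 6) A follow-up tweak to the same change is recorded as a fixup and squashed.
echo 'greet() { echo "Hello, $1"; }' > greet.sh
git add greet.sh
git commit -q --fixup=HEAD
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~2

git log --oneline
```

Setting `GIT_SEQUENCE_EDITOR=true` just accepts the todo list that `--autosquash` has already reordered, so the rebase runs without opening an editor. Rebasing onto master, pushing, and opening the PR are left out because they depend on the remote.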