This creates precisely the kind of commit messages that I regularly scold junior developers for :)
In my opinion, commit messages should clarify the intent of WHY you changed things. I can already see WHAT you changed from the diffs.
But of course, any tool can only work with the what; it cannot know that these lines are related to a bug report filed in a technically unrelated system.
Personally, I think that the following is a good approach:
PROJ-2354 add/modify/remove/... WHAT to implement/fix/... WHY
with the code showing the HOW.
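A convention like this can be checked mechanically. Here is a minimal sketch in Python, assuming ticket keys shaped like `PROJ-2354`; the verb lists, the regex and the name `check_subject` are all illustrative assumptions, not part of any existing tool.

```python
import re
from typing import Optional

# Hypothetical pattern for "<TICKET> <verb> WHAT to <verb> WHY".
# The verb lists are assumptions; extend them to match your team's wording.
SUBJECT_RE = re.compile(
    r"^(?P<ticket>[A-Z][A-Z0-9]*-\d+) "
    r"(?P<what_verb>add|modify|remove|rename|update) "
    r"(?P<what>.+?) to "
    r"(?P<why_verb>implement|fix|support|prevent) "
    r"(?P<why>.+)$"
)


def check_subject(subject: str) -> Optional[dict]:
    """Return the parsed parts of a subject line, or None if it does not match."""
    match = SUBJECT_RE.match(subject)
    return match.groupdict() if match else None


parts = check_subject("PROJ-2354 add retry limit to fix endless reconnect loop")
print(parts)
print(check_subject("update file.ext"))  # None: no ticket and no WHY
```

A check like this only enforces the shape. Whether the WHY part is actually meaningful still depends on the person writing it.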
Ideally, with the commit/merge request having a textual description and/or a list summary for the overall changes, alongside some diagrams/images/gifs/videos, as well as further discussion where applicable. Oh and an issue management system of some sort with the original (business) requirements, notes from requirements engineering, as well as information about testing. Something like architecture decision records (ADR), script snippets, Markdown Wiki documentation or install instructions can also live in the repo. Then, with a decent test coverage and CI setup, it can also be pretty safe to merge the changes, because most of the stuff concerning them will be known and understood.
But at the end of the day, there will be as many opinions as there are people.
For some, there is no need for longer commit messages (e.g. with multiple lines, like a separate subject/body with explanation) which is more or less my case because that information will be in the merge/pull request. Others will say that filling out merge/pull requests is unnecessary because the commits should have that information (I disagree, but I've heard that stance). Some other people won't even bother with commit messages because in their eyes working code at the end of the day is all that matters (once again disagreed, but we've all seen "code fixes" in the log before). And some will have way different workflows, like not using a web UI of some sort for discussion but instead relying on commit logs and mailing lists.
Use whatever workflow feels adequate for you and your colleagues.
I would go for the combination: first what, then why.
Links break, so a link to why is not good enough when it comes to long-lived code. A good commit message should however start by completing the sentence "when committed this will ...". This makes reading the one-line summaries of the git log the clearest to interpret what happened.
In my way of working, the 'why' can go into the overall PR and the 'what' into the individual commits. Both are important - the reason for changes and a concise summary of what you've done.
As I always say in these conversations, PR descriptions and comments are ephemeral. Your git history should be forever, but you’re not guaranteed to be in the same repo on the same host for eternity.
I have already worked on multiple projects that got handed to us as a .git/ folder. Commit messages referencing non-existent issues abound.
I now make my whole team ensure that nothing crucial is left to live in the PR alone.
I fully agree that this is an (even) easy(ier) way to write crap commit messages.
> But of course, any tool can only work with the what
Well, it could be a lot better at least - imagine passing a Jira ticket, telling it it's a bug fix or feature (the script could determine from the API); then you could probably get it not only to neatly summarise 'why' for the subject line but also have a go at relating it to the diff for the body.
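To make that idea concrete, here is a rough sketch in Python of how such a script might combine ticket data with the diff before handing it to a model. Everything here is hypothetical: the `Ticket` fields, the prompt wording and `build_prompt` are made up for illustration, and no real Jira or GPT API is called.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    # Hypothetical fields; a real script would fill these from the tracker's API.
    key: str
    kind: str  # e.g. "bug" or "feature"
    summary: str


def build_prompt(ticket: Ticket, diff: str) -> str:
    """Combine the ticket (the why) with the diff (the what) into one prompt."""
    return (
        f"Ticket {ticket.key} ({ticket.kind}): {ticket.summary}\n\n"
        "Write a commit message. Use the ticket to explain WHY in the subject "
        "line, and relate it to the diff below in the body.\n\n"
        f"{diff}"
    )


prompt = build_prompt(
    Ticket("PROJ-2354", "bug", "Totals overspend due to rounding"),
    "-    return round(total, 2)\n+    return round(total, 3)\n",
)
print(prompt)
```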
You can still write a multi-message commit with two messages:
1. Short summary of what is being changed
2. Explain WHY
I think the point is that even if 1. is missing it can be worked-back by reading through the diff. But if 2. is missing then the future generations have no way of finding out reasons behind some decisions.
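On the command line this is `git commit -m "<summary>" -m "<why>"`; git joins multiple `-m` values as separate paragraphs. The same layout can be sketched in Python. The 72-column wrap is a common convention rather than a git requirement, and `build_message` and the example function name `parse_header` are made up for illustration.

```python
import textwrap


def build_message(subject: str, why: str, width: int = 72) -> str:
    """Join a one-line summary and a wrapped explanation with a blank line."""
    body = textwrap.fill(why.strip(), width=width)
    return f"{subject.strip()}\n\n{body}\n"


message = build_message(
    "Remove redundant bounds check in parse_header",
    "The caller already validates the buffer length before calling "
    "parse_header, so the second check could never fail. Removing it "
    "makes the hot path easier to read.",
)
print(message)
```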
The summary of changes can be inferred, or generated automatically with gpt ;)
"Why", on the other hand can be lost.
Especially during refactoring. Let's say you removed some assertion / safety check from a function, because you verified that it's not necessary there. Without explanation in a commit, someone may not get your reasoning.
Same thing with renaming variables, reordering the code etc.
Comments may be useful in some cases, but in many cases there won't be a right place to put them in.
the body can summarise and expand on (..if you know what I mean) the diff as well as explaining why:
Due to <arcane language reason> in this case <syntax> was
interpreted as a baz, when clearly the author in <blame commit>
intended foo, which would return the response with bars here
as expected.
This commit fixes the issue by adding an explicit semicolon,
thus forcing the foo interpretation.
That's probably overkill for a simple syntax error (unless it really is that arcane in which case it might be a bit of a teaching moment/object lesson).
But "why" is very important for the future code owners. A year or two later, someone else adding a new fix may have a question about the existing parts to avoid breaking them. And the only thing he can rely on is `git blame` to figure out "why it's implemented in this particular way".
Generated commit message could explain the why, if the code changes had comments explaining this sort of thing. Someone still needs to document why changes are being made, but at least you only need to do it once. And maybe GPT-3 can do a good job of selecting the relevant info and summarizing the why of the change?
Would you happen to know the justification behind "capitalize every commit subject line"[1]? I can understand finding it more appealing, but talking about it being as important as limiting the subject to 50 chars and not ending it with a period (which has a sensible justification), not as much.
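For reference, the three subject-line rules mentioned here (50 characters, capitalized, no trailing period) are easy to check in a hook. A minimal sketch in Python, where `lint_subject` and its messages are illustrative names rather than an existing tool:

```python
from typing import List


def lint_subject(subject: str, max_len: int = 50) -> List[str]:
    """Return a list of rule violations for a commit subject line."""
    problems = []
    if len(subject) > max_len:
        problems.append(f"longer than {max_len} characters")
    if subject[:1].islower():
        problems.append("does not start with a capital letter")
    if subject.endswith("."):
        problems.append("ends with a period")
    return problems


print(lint_subject("Fix overspending bug"))  # no violations
print(lint_subject("adds padding."))  # two violations
```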
I just had a realization. Usually in my private repos I do “what;why” so I can go back to commits when I break stuff. But I should be using branches for what and commits for why…
A bad commit (that one of my coworkers always does) is "update file.ext". Says nothing other than the name of the file that was updated, which ends up with tons of repeat commit messages for common files and provides zero info that wasn't already included in the commit itself.
Another poor commit is a description like "adds padding". It's a little too vague and doesn't really tell you much that wasn't already apparent by looking at the change itself.
A better commit might be something more like "Add variable padding to ProductLogo component, fixing logo overflows for issue#78". It summarizes the change, the intended outcome of the change, the reason for the change and a reference to an issue all in one short sentence.
You don't have to go into overwhelming detail for every minor front end change, but if you're intelligently tracking and squashing your commits, writing them well can help a lot later on if you ever need to understand the context of an older commit or even a given line in the codebase.
Subjective and also depends on the culture where you work. I’ve worked at places where the majority of the “why” is in a JIRA ticket, so the commit message better reference that ticket number. Not so at other places. See what I mean?
> In my opinion, commit messages should clarify the intent of WHY you changed things. I can already see WHAT you changed from the diffs.
And I'd scold you for doing that if I were your superior. The WHY should be in the pull request, not in the commit message. A commit message should succinctly explain WHAT was changed from an architecture/organisation perspective.
'Change rounding to thousandths' isn't overly helpful, and probably apparent.
'Fix overspending bug' is vague.
'Fix overspending issue by rounding to thousandths instead of hundredths' is the ideal commit msg here, as it gives a brief what and why. Possibly even with a ticket number, though I see how after years and switching systems that becomes less useful. More useful is briefly describing the why as a code comment, using good judgement of course.
I would say the opposite. The manager who receives the PR (merge request in GitLab) needs to know what has changed (if it is not obvious from the diff) to assess the change before accepting it. He has to know what has changed, for example to decide which non-regression tests to perform.
The final user of the software will receive a changelog (a list of commit messages) that shall identify the bugs that have been fixed and the new user requirements that have been added. He needs to know why the code has changed to know what he has to do.
Some examples of commit messages it generates would be useful, especially compared to good commit messages like the ones usually found in the Linux kernel.
If your code is on GitHub or you use Copilot, it's already part of it. So this is just taking advantage.
Also, didn't we, at this point, establish that source code means nothing? I mean, we could have FB source code today and do nothing with it without the huge network and compute capability.
Unless you have some legal issues, I don't see a problem here.
Why put the commit message there then? You could just use a git-client that adds this text as description for commits. There is generally little use to store automatically generated content in databases, the input for generation should be enough.
Just a few days ago, another implementation of exactly the same concept was discussed at length: “Gptcommit: Never write a commit message again (with the help of GPT-3)”, https://news.ycombinator.com/item?id=34444953
It was a fun project, and I think I will re-use some parts (prompt generation, selection via fzf), but for the specific use-case I think the assumption that a meaningful commit message can be generated by just looking at the changes is flawed, since it's not really possible to distill intent from a git diff.
If the why isn't obvious and there's no link to a tracking system that explains it, it's fine if it's in the message body.
I do want the why in comments, though.
Compare:
[1] https://cbea.ms/git-commit/#capitalize
But this might be a good start: https://www.conventionalcommits.org/en/v1.0.0/#summary
We've used this as a starting point and adapted it to our needs (e.g. some simplification, defining the possible values for scope, etc.)
I must have seen these commit messages so many times that if I had a penny for each one, I would be rich by now.
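As a rough idea of what such an adaptation can look like, here is a sketch in Python that parses the `type(scope)!: description` header from Conventional Commits 1.0.0 and rejects scopes outside a fixed set. The scope list and `parse_header` are hypothetical, and the regex is a simplification (lowercase types only).

```python
import re
from typing import Optional

# Header shape from Conventional Commits 1.0.0: type, optional (scope),
# optional "!" for a breaking change, then ": " and a description.
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()]+)\))?(?P<breaking>!)?: (?P<description>.+)$"
)

# Hypothetical team-specific restriction on scope values.
ALLOWED_SCOPES = {"ui", "api", "build"}


def parse_header(header: str) -> Optional[dict]:
    """Parse a commit header; return None if the shape or scope is not accepted."""
    match = HEADER_RE.match(header)
    if match is None:
        return None
    parts = match.groupdict()
    if parts["scope"] is not None and parts["scope"] not in ALLOWED_SCOPES:
        return None
    parts["breaking"] = parts["breaking"] == "!"
    return parts


print(parse_header("fix(ui): add variable padding to ProductLogo"))
print(parse_header("fix(database): tune pool size"))  # None: scope not in the allowed set
```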
[1] https://whatthecommit.com/