This is a great idea. Interestingly, Google has a tool that will analyse your git history and identify "hotspots", i.e. code that is regularly associated with commit messages containing words like "fix".
I'm wondering if the same general idea is applicable to other types of commits given your list. For example, if you are regularly adding features and a certain part of the code base is touched, perhaps with a lower ratio of "refactor" commits, that code could be a solid candidate for refactoring.
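To make the hotspot idea concrete, here is a minimal sketch in Python. It is not the algorithm of the Google tool mentioned above; it only counts how many fix-like commits touched each file. The keyword pattern, the `hotspot_counts` name and the sample history are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword list; the real tool's matching rules are not described here.
FIX_PATTERN = re.compile(r"\b(fix|fixes|fixed|bug)\b", re.IGNORECASE)


def hotspot_counts(commits):
    """Count, per file, how many fix-like commits touched it.

    `commits` is an iterable of (message, files) pairs.
    """
    counts = Counter()
    for message, files in commits:
        if FIX_PATTERN.search(message):
            # set() so a file listed twice in one commit is counted once.
            counts.update(set(files))
    return counts


# Invented sample history, standing in for parsed `git log` output.
history = [
    ("Fix login redirect", ["auth/login.py", "auth/session.py"]),
    ("Add play button", ["ui/button.py"]),
    ("Fixed null check in session", ["auth/session.py"]),
    ("Refactor button styles", ["ui/button.py"]),
]

for path, n in hotspot_counts(history).most_common():
    print(path, n)
```

The same counting could be repeated with a pattern for "add" or "refactor" commits to compare ratios per file, which is roughly the extension suggested above.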
Not a bad idea actually. It's like the PowerShell 'approved verbs'. At first I was like 'meh', but after a while it makes sense, as it greatly improves discoverability. Looking at a couple of repositories I contribute to, this also looks close to what 'good' committers tend to use automatically.
The current buzzword for feature flags is 'branch by abstraction'.
The idea is that instead of making a version control branch to change feature A to feature B and then merging back into the mainline of development, you build an abstraction over the thing you want to change, build a new implementation of that abstraction, switch out the two and then (if you like) remove the abstraction, all within the main line of development.
So instead of history that looks like this:
* Merge branch 'feature-cookie-login' into master
|\
| * Polish up cookie feature
| |
. * Switch from tokens to cookies
. |
. * Clean up and refactor login code
|/
.
.
.
Your history looks like this:
* Stop abstracting the authentication type.
|
* Switch from auth tokens to session cookies
|
* Add a SessionCookie authentication type.
|
* Start abstracting the authentication tokens as a generic authentication type
|
.
.
.
But with any completely arbitrary commits interspersed between those commits, as none of them break other code. The first one creates an interface, the second one reimplements the interface, the third switches the used implementation and the optional fourth removes the abstraction and deletes the old implementation.
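In code, the four commits might look roughly like the sketch below. This is only an illustration under assumptions: the class and function names (`Authenticator`, `TokenAuth`, `SessionCookieAuth`, `login_headers`) are hypothetical and the "credentials" are placeholder strings.

```python
from typing import Protocol


class Authenticator(Protocol):
    """Commit 1: the abstraction the rest of the code depends on."""

    def credentials(self, user: str) -> dict: ...


class TokenAuth:
    """The existing behaviour, moved behind the abstraction."""

    def credentials(self, user: str) -> dict:
        return {"Authorization": f"Token {user}-token"}


class SessionCookieAuth:
    """Commit 2: the new implementation, added before anything uses it."""

    def credentials(self, user: str) -> dict:
        return {"Cookie": f"session={user}-session"}


# Commit 3: switching implementations is a one-line change on the main line.
ACTIVE_AUTH: Authenticator = SessionCookieAuth()


def login_headers(user: str) -> dict:
    """Callers never mention tokens or cookies directly."""
    return ACTIVE_AUTH.credentials(user)


print(login_headers("alice"))
```

The optional fourth commit would delete `TokenAuth` and, if the indirection is no longer useful, `Authenticator` as well.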
The idea is usually to allow committing partially completed or deployed features, which are hidden by config flags until you're ready to activate them. When the feature is fully baked and effectively always on in production, you just remove the flag to make it permanent.
Using feature flags can increase a team's productivity by encouraging multiple commits a day, every day. It can also make rollbacks faster.
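A config flag can be as small as an environment variable. The sketch below assumes a made-up `FEATURE_<NAME>` naming scheme and a hypothetical `login` function; real projects often use a config file or a flag service instead.

```python
import os


def flag_enabled(name: str, env=os.environ) -> bool:
    """Treat FEATURE_<NAME>=1/true/on as enabled; anything else is off."""
    value = env.get(f"FEATURE_{name.upper()}", "")
    return value.strip().lower() in {"1", "true", "on"}


def login(user: str, env=os.environ) -> str:
    if flag_enabled("cookie_login", env):
        # New, possibly unfinished path: merged but dark until the flag is set.
        return f"cookie session for {user}"
    # Existing path stays the default, so rolling back means unsetting the flag.
    return f"auth token for {user}"


print(login("alice", env={}))
print(login("alice", env={"FEATURE_COOKIE_LOGIN": "1"}))
```

Once the feature is always on in production, the `if` and the old branch are deleted in a "Stop" style commit.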
I particularly like this because it doesn't interfere with the flow of the commit message's first line in explaining what it does. There are too many commits out there that waste half the first line with the ticket number, area of code, etc.
Instead of 'JAT-1241: app/index.js(opt): Optimised the index', 'Optimised the index' should be fine. Tools can understand that, and can already work out which files changed.
Some of the active verbs are also commands that automatically close/reference issues right out of the box on GitHub & BitBucket (& I'm sure on GitLab too)
This is a really nice way of doing commits. It's the kind of simplicity that can be made into a simple document and shared easily, or printed out and put somewhere everyone can see.
Might try and get my team into using this.
At the moment our team's commit messages are a mangled mess of everyone's own 'commit language'. It can be really tricky to quickly scan over commit logs and get a feel for where development has been heading over the last x weeks.
I use a similar strategy with my team. In addition, I ask them to summarize in one line the job they are going to do before starting it, which is related to the task description on the task board. That is normally the commit message. Works (most of the time...).
How do you enforce such commit messages? People make mistakes, or forget stuff. But when you have a pull request, all intermediate commits are already pushed to the central repository. They are already public. You can't change them anymore. Pre-commit hook?
The commit-msg hook. You can use it to validate your project state or commit message before allowing a commit to go through. The git docs demonstrate using this hook to check that your commit message is conformant to a required pattern.
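As a rough sketch of such a hook in Python: git runs `commit-msg` with one argument, the path of the file holding the proposed message, and a non-zero exit aborts the commit. The verb list is the one proposed in this thread; the `problems` and `main` names and the exact rules are assumptions, not anything git prescribes.

```python
import re
import sys

# Verb list taken from the thread; treat it as a hypothetical team convention.
VERBS = ("Add", "Cut", "Fix", "Bump", "Make", "Start", "Stop",
         "Refactor", "Reformat", "Optimize", "Document")
SUBJECT = re.compile(r"^(%s) \S" % "|".join(VERBS))


def problems(message: str) -> list:
    """Return a list of human-readable complaints; empty means acceptable."""
    # In the default cleanup mode git drops '#' comment lines, so skip them.
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    subject = lines[0] if lines else ""
    found = []
    if not SUBJECT.match(subject):
        found.append("subject must start with one of: " + ", ".join(VERBS))
    if len(lines) > 1 and lines[1].strip():
        found.append("leave a blank line between subject and body")
    return found


def main(argv) -> int:
    """Read the message file named by git and report any problems."""
    with open(argv[1], encoding="utf-8") as fh:
        found = problems(fh.read())
    for problem in found:
        print("commit-msg:", problem, file=sys.stderr)
    return 1 if found else 0
```

To use it, the script would be saved as an executable `.git/hooks/commit-msg` and end with `sys.exit(main(sys.argv))`. Note that client-side hooks are per-clone and can be skipped with `git commit --no-verify`, so this catches honest mistakes rather than enforcing anything.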
There are so many articles like this, and all of them focus so much on prescriptive rules for commit messages. There's a similar set of articles on how to deal with branching and merging. Somehow, everybody comes to slightly different conclusions.
The discussion that needs to happen before this is to understand what tools you want to make available to your developers in the future. Using git's history as a first class debugging tool is powerful, but it's by no means mandatory to provide. There's also a real cost to providing each of the tools.
- Do you want bisect to be available? Well, then you should have most commits represent a fully-functional version of the software. Consider squashing branches when you merge them.
- Do you want narrative documentation around strange choices? Fine-grained commits are a great place to put those thoughts, but they may discourage devs from writing those thoughts in inline comments.
- Do you want ownership via git blame? Line-by-line changes may help you identify who wrote the code, but that might prevent your developers from ever fully transferring ownership, which could create bottlenecks in startups that have a few long-tenure devs and a lot of recently hired devs.
I really like to think of git history as a context tool, like monitoring or unit testing or documentation. It's worthwhile to sit down with your team, define what you want them to be able to do with commit history, and build your commit style from there.
My personal, subjective impression: commits are getting smaller and smaller nowadays. In the Subversion days, many people committed only a few times a day, sometimes not for several days. SVN commits of course involved a sync with the server (a "push" in git lingo), and thus usually represented a much larger increment with a substantial change to the code base. [X]
With git, it became very common to structure changes to a code base as many very small commits. Rename a variable? Commit. Write some docs? Commit. Of course, the overall changes when developing a feature did not become smaller; they are now just distributed over many more commits. So I'd argue that an SVN commit was often conceptually closer to what we now have with a git pull request.
Why does this matter? Because it is hard, and doesn't help anyone, to describe the renaming of a local variable with an extensive, docstring-style description.
What I do miss, however, is a good description of the overall change. Nowadays the description in the merge commit is often just the autogenerated message, but this is where I would like people to really take the time and describe the change extensively. This is why I like `--squash` merges: they let people focus on the relevant parts in their description. I know, rewriting history is bad, but overall I would rather read a history book than 18th century newspapers.
[X] not saying that there weren't small one-line-change commits, but overall they were rarer.
Never thought of that usage of merge commits. This is a great place to write the couple paragraphs that you might have in a Pull Request, better than squashing IMO.
I've found that for smaller commits, if you have something long you want to explain in the commit message body... you should probably put it in a code comment!
If you don't think it merits a code comment, it's probably not important enough for people to look up the commit message body either (if only because the commit message body is less likely to be seen).
Changing public history is bad, because it makes collaboration and two devs working on one branch harder.
But I do not see a problem with rewriting history on a branch, if (and only if) you kind of know that no one else is pulling the changes. Or, when merging a PR, a rewrite is okay too, if the next feature will be branched off of the trunk, too.
Also, mercurial's tooling seems to help https://www.mercurial-scm.org/wiki/ChangesetEvolution with rewritten history by making it easier to track history rewrites. Basically I think this is a path in version control systems worth exploring.
Not only not a problem, but a must in my book, and I'm fairly sure I'm not alone. For me it's a new workflow which I always wanted but never could have without git. A lot of my days now consist of creating a lot of small commits and then, every couple of hours when a single 'thing' is finished, starting an interactive rebase to create a storyline which is easy to read, understand and follow. This can even be a single commit sometimes, if it makes sense. And in repos I manage myself, if the change spans several days it's usually big, so I might create a separate branch and have a merge commit so it's extra clear all commits belong to feature/xxx.
I find tons of small commits to be clutter and a waste of time. I don't see any reason for doing so. On the contrary, I can see a disadvantage: reading and understanding the history later may become a difficult task. After all what counts is your full chunk of work, reviewed via pull request, and merged to master. It should be treated as a whole.
Has it really become so common with git? I don't see such a trend around me.
> On the contrary, I can see a disadvantage: reading and understanding the history later may become a difficult task.
I'm replying to you but this is directed at everybody who advocates squash merge and discourages small commits.
IMO this is a tooling problem, plain and simple. When I am committing to Git, I am using the "write" components of Git which are incredibly powerful. I can commit in as small a chunk as I want and preserve the richest history of all the small changes I've made, knowing full well that the state of the code at HEAD will not be degraded for doing so. If I make two small independent changes, I can feel free to branch them separately and then merge them together to show that they could have been performed in any order.
When you read my history, you are using the "read" components of Git. Unfortunately these are not as powerful. You can do some nice things: if you want to treat history as a straight line, you can use `git log --first-parent`, which follows only the first parent of each merge, so every merged branch shows up as a single merge commit (as if all merges had been squash-rebases).
It would be much better if you were able to collapse or expand any sequence of linear commits to gloss over the lower level details. But as far as I'm concerned, this is a problem with the "read" components of Git, not the "write" components, and so I will continue to use the "write" components to their full power. And the best part is that if I do it this way, we can improve the "read" components and allow the reader to collapse my verbose history, but we will never be able to expand pre-collapsed history.
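For anyone unsure what "following the first parent" means, here is a toy model in Python. It is not how git implements `git log --first-parent`; the commit names and the graph shape are invented, and the only assumption borrowed from git is that a merge commit lists the branch it was merged into as its first parent.

```python
# Each commit maps to its list of parents; for a merge, parent 1 comes first.
# The names and shape below are made up for illustration.
PARENTS = {
    "m2": ["m1", "f2"],   # merge of a feature branch into master
    "f2": ["f1"],
    "f1": ["m1"],
    "m1": ["m0", "g1"],   # an earlier merge
    "g1": ["m0"],
    "m0": [],
}


def first_parent_history(head, parents):
    """Follow only the first parent of each commit, starting at `head`."""
    history = []
    current = head
    while current is not None:
        history.append(current)
        ps = parents[current]
        current = ps[0] if ps else None
    return history


print(first_parent_history("m2", PARENTS))
```

The small commits `f1`, `f2` and `g1` are still in the graph; the reader has simply chosen not to expand them, which is the "collapse on read" behaviour argued for above.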
The main reason I request commits to be split up is for ease of code review. It's much easier to review three commits that each do one easily comprehensible small thing than one commit that does three things at once. It's also better if you find there's a bug -- you can bisect down to a commit that's fairly small where the bug should be easy to see, rather than one that's enormous and where the bug is hard to find among all the other changes.
Small, incremental commits are an asset with git blame, git bisect and git revert. I find it much easier to deal with too many small ones, rather than too few large ones. Especially if you keep the convention that master is always "merged into", i.e. "left of the merge", i.e. "parent 1".
> After all what counts is your full chunk of work, reviewed via pull request, and merged to master. It should be treated as a whole.
I find the PR mechanism works great for the view of the whole, whereas the individual commits are great for the pieces. So in my commit history, you can read the timeline, and then if you want to see the commits squashed down, you click on the individual PR. On the PR screen (assuming you're using GitHub), it has a nice list of the subject lines of each of the individual commits.
Commits can serve as a supplement to documentation. When you properly commit the different logical steps that led to the current state of the code, it becomes much easier for another team member to understand why and how you implemented things a certain way.
Would be interesting if there was a way to annotate a set of commits, like "commit ???? - ????: refactored A,B, and C" so you'd get the advantage of small commits and clearer messages.
This is what PRs are good for. Also, with my particular approach to commits, I always have at least one issue associated with a commit, and I'm always working on a particular branch associated with the issue. I pick an emoji that captures the issue/branch in a single concept, and I have that in my subject line. This is combined with my git commit template mechanism, and I like it. At a glance, I can see which commits belong together, and if I want to look at the whole, I go to the PR.
I think you can do that in a merge commit, sort of.
The more I think about it, the stranger a strong aversion to rewriting commit history for clarity is. In university if I did some math / physics calculation, I would often start, and once I got somewhere, make a clean copy of the successful work to have a concise and revised version.
Unfortunately, I'm guilty of the opposite: I rarely, rarely commit. Maybe one commit per point. I have to consciously remind myself to commit more often.
But it was also partly the tooling. Most svn projects I worked on were trunk-based and thus integrated much more tightly than git code built on feature branches. However, the times I merged subversion branches, I was fairly sure that subversion had lost some changes.
If you like this format, then you may want to try a similar format that serves the same purpose, but uses words that are easier to read and that make more sense to people in more cultures.
We use Add, Fix, Refactor, Reformat, Optimize, etc.
Agreed. I started using these on private & professional projects a year ago (and mostly got the team to use them, too) and it's a pleasure to browse the git log!
In the beginning the definition of "scope" is a bit wonky per project. However, once it solidifies you can easily start going through your log looking for "feat(endpoint)" to find new routes that have been added to an API for example.
I've been writing commit messages this way for a little under a year (I think this is the same guide I used when I was looking for a more consistent form to write them and to avoid the dreaded -m "Fixed some things").
One thing I noticed is that it's increased my confidence in my commits. When I go to write the commit message, describing why I made the decisions I did exposes logical inconsistencies between what I've actually done and what I think I've done. If I'm able to explain all of the changes, I'm much more confident that they're correct.
Another is that bad commit messages have trained people not to read them. Often people will ask why I've made a change and then discover that the commit message contains the answer to their question!
Why wrap the body at 72 characters? Because the git CLI doesn't wrap properly? To borrow a quote, that seems like a 'you' problem, not a 'me' problem.
Maybe I'm just biased because these days I almost entirely interact with git through a GUI (either desktop client or web interface), and though I use the CLI occasionally (mostly for branch management, sometimes for quick commits) I can't think of the last time I used it for any type of history viewing -- pretty much any GUI is going to do a better job of that.
My team often uses markdown (mainly bulleted lists) and the output looks terrible when you insert manual line breaks (because markdown interprets that as meaning that you explicitly want a line break there) and you're viewing it on a screen/viewport that is either larger or smaller than 72 characters wide.
Unless you're explicitly using a publishing format (eg, LaTeX, PDF, postscript), the function of wrapping text should be a concern of the rendering of the output, not the origin.
Am I missing something here? Is there any other reason to manually wrap text besides the git CLI's handling of it as a viewer?
Linus Torvalds answered exactly this question [0]. Not that that means you should unblinkingly take it on authority, but the original reasoning is that the renderer doesn't always know when a line should be wrapped. Examples: a stack trace, a long log line, or essentially any other quoted artifact that has a specific predetermined format.
The relevant quote from the link:
> Some things should not be word-wrapped. They may be some kind of
> quoted text - long compiler error messages, oops reports, whatever.
> Things that have a certain specific format.
> The tool displaying the thing can't know. The person writing the
> commit message can. End result: you'd better do word-wrapping at
> commit time, because that's the only time you know the difference.
I understand this rationale, but I think most developers will encounter commit messages written by bozos who don't press enter after every 72 characters, far more frequently than commit messages that contain stack traces or other fixed-format artifacts. (Disclosure: I am one of these bozos.) The tool flubs every non-wrapped paragraph just so it can preserve the occasional blob of ASCII art.
If the tool applied reasonable wrapping heuristics and got it wrong once in a while, it could easily offer a `--no-wrap` option to let users see the message exactly as it was composed.
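A "reasonable heuristic" could be something like the sketch below. To be clear, this is an assumption about what such a tool might do, not something git or any viewer actually implements: paragraphs made only of unindented lines are re-wrapped, and any paragraph containing an indented line is treated as preformatted and left alone. The `wrap_message` name is made up.

```python
import textwrap


def wrap_message(body: str, width: int = 72) -> str:
    """Re-wrap prose paragraphs; leave indented blocks exactly as written.

    Heuristic (an assumption, not what any git tool does): a paragraph in
    which any line starts with whitespace is treated as preformatted.
    """
    out = []
    for paragraph in body.split("\n\n"):
        lines = paragraph.split("\n")
        if any(line[:1] in (" ", "\t") for line in lines):
            out.append(paragraph)  # stack trace, log excerpt, ASCII art...
        else:
            out.append(textwrap.fill(" ".join(lines), width=width))
    return "\n\n".join(out)
```

It gets plenty wrong: an unindented bulleted list would be merged into one paragraph, and an unindented stack trace would be reflowed. That is the trade-off being argued over here, and why an escape hatch like the suggested `--no-wrap` would be needed.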
Sounds like your markdown interpreter has an issue, or you're leaving lots of white space at the end of your lines.
Generally, in markdown, a single line break won't translate to an explicit line break unless you put two in a row, or there are two or more spaces at the end of the line.
First off, the commit message is plain text (by design) and can't be "wrapped" automatically, and any tool that tried would be insane.
The reason for 72 characters is that the CLI, like lots of other presentation mechanisms (including quoting in other commits or in code), wants to indent your message for readability. And the uniform standard width for terminals has been 80 characters for like four decades now.
Must it be? I dunno. I can imagine a uniform agreement among a broad team that everyone will assume a 100 character line and all tools should enforce that. Maybe a little more, but not that much because even on a modern screen you want to have two full terminals of text readable at a time.
But that's just a number. You'd still be told by your commit message style guide (or checkpatch.pl, or whatever) to wrap your lines manually at 92 characters. Is 25% more bytes on a line really worth yelling about?
I really like that Gerrit code reviews allow reviewers to comment on the commit message in the same way as on the code. The way to ensure useful commit message practices is the review process, if you ask me.
We agree on a short list of leading active verbs:
Add = Create a capability e.g. feature, test, dependency.
Cut = Remove a capability e.g. feature, test, dependency.
Fix = Fix an issue e.g. bug, typo, accident, misstatement.
Bump = Increase the version of something e.g. dependency.
Make = Change the build process, or tooling, or infra.
Start = Begin doing something; e.g. create a feature flag.
Stop = End doing something; e.g. remove a feature flag.
Refactor = A code change that MUST be just a refactoring.
Reformat = Refactor of formatting, e.g. omit whitespace.
Optimize = Refactor of performance, e.g. speed up code.
Document = Refactor of documentation, e.g. help files.
https://github.com/joelparkerhenderson/git_commit_message
Here's the tool I mentioned anyway: https://google-engtools.blogspot.co.uk/2011/12/bug-predictio...
> Start = Begin doing something; e.g. create a feature flag.
> Stop = End doing something; e.g. remove a feature flag.
Could someone explain these and give a few examples?
http://lkml.iu.edu/hypermail/linux/kernel/1702.2/03492.html
https://github.com/ribasushi/dbix-class/commit/1cf609901
Something like that I take it? :)
E.g. https://github.com/ibgib/ibgib/pull/180
This was often the source of merge hell. Half of what makes git merges easier is the smaller commits that it encourages.
type(scope) message
e.g.
feat(button) added play button
Types are:
- feat: A new feature
- fix: A bug fix
- docs: Documentation only changes
- style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
- refactor: A code change that neither fixes a bug nor adds a feature
- perf: A code change that improves performance
- test: Adding missing or correcting existing tests
- chore: Changes to the build process or auxiliary tools and libraries such as documentation generation
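One nice property of this format is that it is easy to parse. A small Python sketch follows, under some assumptions: the type list is the one above, the regex accepts an optional colon after the scope (the example above has none, but some variants of this convention use one), and the `parse_subject` and `by_scope` helpers are hypothetical names.

```python
import re

# Types as listed above; the regex itself is an assumption about the layout.
TYPES = ("feat", "fix", "docs", "style", "refactor", "perf", "test", "chore")
HEADER = re.compile(r"^(?P<type>%s)\((?P<scope>[^)]+)\):? (?P<message>.+)$"
                    % "|".join(TYPES))


def parse_subject(subject: str):
    """Return (type, scope, message), or None if the subject doesn't fit."""
    m = HEADER.match(subject)
    return (m["type"], m["scope"], m["message"]) if m else None


def by_scope(subjects, type_, scope):
    """Filter subjects, like searching the log for 'feat(endpoint)'."""
    return [s for s in subjects
            if (p := parse_subject(s)) and p[:2] == (type_, scope)]
```

Fed with the subject lines of a log, `by_scope(subjects, "feat", "endpoint")` would give the list of new API routes mentioned elsewhere in this thread.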
See my comment on this thread or https://github.com/joelparkerhenderson/git_commit_message
However, the format in the linked article has no visual separation between type and message (and no scope), which I consider less useful.
> Wrap the body at 72 characters
[0] https://github.com/torvalds/linux/pull/17#issuecomment-56611...
See [1] for an illustration
[1] https://johnmacfarlane.net/babelmark2/?text=This+line%0Ashou...
This text box I'm replying to you with is a plain text textbox. It word-wraps just fine.
There are myriad plain text inputs and outputs you encounter every day, and they all word-wrap just fine.
The terminal is an aberration in that regard.
https://github.com/torvalds/subsurface-for-dirk/blob/0f58510...
I always send this to people as I teach them git.