I've long been in the practice of "commit early, commit often". If one use case works, I commit; if the unit tests pass, I commit. The code may be a mess, the variables may have names like 'foo' and 'bar', but I commit to have a last known good state. If I start mass refactoring and break the unit tests, I can revert everything and start over.
I also push often because I'm forever aware disks can fail. I'm not leaving a day's worth of work on my local drive and hoping it's there the next morning.
I've become increasingly aware that my coworkers have nice clean commit histories. When I look at their PRs, there are 2-4 commits and each is a clean, completely functioning feature. No "fix misspellings and whitespace" comments.
What flow do you follow?
When I'm developing, but before I create a PR, I'll create a bunch of stream-of-consciousness commits. This is stuff like "Fix typo" or "Minor formatting changes" mixed in with actual functional changes.
Right before I create the PR, or push up a shared branch, I do an interactive rebase (git rebase -i).
This allows me to organize my commits. I can squash commits, amend commits, move commits around, rewrite the commit messages, etc.
Eventually I end up with the 2-4 clean commits that your coworkers have. Often I design my commits around "cherry-pick" suitability. The commit might not be able to stand on its own in a PR, but does it represent some reasonably contained portion of the work that could be cherry-picked onto another branch if needed?
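To make the cleanup step above concrete, here's a minimal, self-contained sketch of squashing messy WIP commits with `git rebase -i`. It scripts the normally interactive todo list via `GIT_SEQUENCE_EDITOR` so it can run non-interactively; the repo, file names, and commit messages are invented for the demo, and `sed -i`/`git init -b` assume GNU sed and git >= 2.28.

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q -b main
git config user.email dev@example.com
git config user.name Dev

# Simulate a stream-of-consciousness history.
echo base > app.txt; git add app.txt; git commit -qm "Add feature skeleton"
echo wip1 >> app.txt; git commit -qam "WIP"
echo wip2 >> app.txt; git commit -qam "Fix typo"
echo wip3 >> app.txt; git commit -qam "Minor formatting changes"

# Squash the last two commits into the "WIP" commit by rewriting the
# rebase todo list: change "pick" to "squash" on every line but the first.
GIT_SEQUENCE_EDITOR="sed -i '2,\$s/^pick/squash/'" \
GIT_EDITOR=true \
git rebase -i HEAD~3

git log --oneline   # two commits remain
```

In real use you'd run `git rebase -i` with your normal editor and reorder, squash, and reword by hand; the scripted version just shows what the todo-list edit does.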
Granted, all of the advice above requires you to adhere to a "prefer rebase over merge" workflow, and that has some potential pitfalls, e.g. you need to be aware of the Golden Rule of Rebasing:
https://www.atlassian.com/git/tutorials/merging-vs-rebasing#...
But I vastly prefer this workflow to both "merge only," where you can never get rid of those stream-of-consciousness commits, and "squash everything," where every PR ends up with a single commit, even if it would be more useful to have multiple commits that could be potentially cherry-picked.
(This works for me because auditing commit history is not important where I work, if it were I would organize commits better.)
*glances at $corp git repo and sees 'updates' 'fix' 'updates'* Sigh.
edit: googled, "Never rebase while on a public branch" i.e. a shared branch
I see some people whose projects (Furnace Tracker, PipeWire, previously FamiStudio) seem to make progress very quickly without getting noticeably slowed down by technical debt, despite sloppy programming and unorganized commit logs (push-to-head). Meanwhile I move slowly, dread reviewing hundreds of lines of my own code, and produce technical debt (regrets) anyway: not as many surface-level lintable errors, but plenty of entrenched mistakes. I wish I could move faster, but instead struggle to make progress.
Only the keybindings are a bit weird if you're not accustomed to Vim bindings:
- Open tig
- Change into the staging view with `s`
- Select your file using the arrow or `j` and `k` keys
- Press Return to show the diff
- Navigate to the line(s) in question with `j` and `k` (arrow keys will switch files)
- Stage parts with `1` (single line), `2` (chunk parts), `u` (chunks) or split chunks with `\`
- "Leave" the diff with `q`
- You can find the keybindings with `h` in the help screen, which also uses Vim keys -- like manpages usually do
1. I `git checkout PUBLIC -b CLEANUP` to a new branch.
2. Do a `git difftool CHANGES`, which opens each changed file in vimdiff one at a time.
3. For each file, I use :diffput/:diffget or just edit in changes I want.
4. Commit these changes on the CLEANUP branch.
5. Use `git difftool CHANGES` again to see the remaining diff.
6. Repeat until the diff comes back empty!
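The loop above can be sketched as a self-contained script. Since vimdiff is interactive, this demo stands in for steps 2-3 with `git checkout CHANGES -- <file>` (pulling over a whole file instead of hand-picking hunks); the branch and file names are invented.

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q -b PUBLIC
git config user.email dev@example.com; git config user.name Dev
echo v1 > lib.txt; git add lib.txt; git commit -qm "public state"

# Simulate a branch full of unstructured changes.
git checkout -qb CHANGES
echo v2 > lib.txt; git commit -qam "unstructured changes"

# 1. Start a cleanup branch from the public state.
git checkout -q PUBLIC -b CLEANUP
# 2-3. Pull over the changes you want (vimdiff :diffget in real life).
git checkout CHANGES -- lib.txt
# 4. Commit them on CLEANUP.
git commit -qm "refactor: pull in lib.txt changes"
# 5-6. Repeat until the diff against CHANGES is empty.
git diff --quiet CHANGES && echo "diff is empty"
```

The exit-code check in the last line is what tells you the loop is done: once `git diff CHANGES` is empty, CLEANUP contains everything, just reorganized.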
My unstructured changes tend to contain a handful of small typo fixes, whitespace fixes, localized refactors, and 1 or 2 larger refactors and a behavioral change. Once they're all broken out, it's usually easy enough to use `git rebase -i` and reorder the smaller changes first, put out PRs for just those first, etc.
I've tried most Git clients on Mac over the years and kept gravitating back to Sourcetree.
I only tend to use it for this particular workflow (picking out very granular changes on a line-by-line basis). Otherwise, 90% of my git stuff is via IDE integrations or command line.
[0] https://www.sourcetreeapp.com
Edit: looks like it's back from the dead again :) https://github.com/gitx/gitx
Additionally, git itself comes with a simple `git gui` command that allows you to do partial commits on a line by line basis. It also has a nice "amend last commit" mode.
Shameless plug, but here are other efficiency tips I wrote about, for working in a highly demanding environment: https://dimtion.fr/blog/average-engineer-tips/
I've seen someone post on HN, apparently seriously, that the history is more important than the source.
I know a (potentially) really good developer that spends his time pulling in the recent patches and reorganizing them to make an alternate history that is prettier somehow.
sure, every once in a while it becomes useful/necessary to bisect, and a 'clean' history might help with that.
but seriously - why do we fetishize this? this is a medium where the amount of writing vastly outweighs the amount of reading.
when people are looking for a bug do they seriously find value in seeing how the code evolved? or do they just figure out why it doesn't work? is there an implicit assumption that the code all worked at some point and the task is to find out when/how it was broken?
just really confused
Obviously it's possible to go too far; not every commit needs an attached essay. Many of my commits are just "fixed typo" or "added unit test for X", but then sometimes I'll write a short paragraph or two explaining my rationale, referencing the commits that came before.
https://docs.github.com/en/pull-requests/collaborating-with-...
- commit early/commit often. I usually push one commit when I think the feature is done. While others review my code, I commit to improve the code/fix the issues they found. The advantage here is that future readers looking at the history of file X line N can know what other files were introduced alongside file X (as a reader of big codebases, this is a nice side effect). I don't like hiding defects from the git history either (one could in theory squash all the commits of a given PR in order to keep the "history clean"... In my experience, having a trace of bugs fixed at PR time, or other subtle details, is also worth it and serves as documentation of what not to do).
In the cases where I need to work through many days on a single feature, and only if the feature is so complicated/critical that I could not reproduce it from scratch by myself, then yes, I push the progress upstream. This is usually not the case though: I stash progress. I tend to open small PRs and usually I remember what I've done (so I could write the entire code again easily). Plus, hard drives fail, sure, but they are also quite reliable. In 20 years of work I never experienced losing "non critical" work because of disk failure (for critical work, I of course have a different workflow).
My team (and myself) prefer this workflow:
- One commit per PR. This allows for easy reverts and cherry-picks.
- One developer per branch. You can do a few devs per branch, but rebases need to be coordinated and carefully handled, because:
- No merge commits. Only rebase onto latest main. Which means force-pushing PR branches and, thus, rewriting history (other devs working on same branch need to be aware that history has changed).
If you're constantly rebasing onto main, then all of your working commits sit on top of the latest code in main. Which means you do not have to deal with tricky merge conflicts, where your commits may weave in and out of the main branch at various points in time because you were doing "git merge" at random points. In addition, if you squash your commits before doing a rebase this will also make merge conflicts rather trivial, because you're only dealing with one set of merge conflicts on one commit.
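Here's a minimal local simulation of that "rebase onto latest main" step, with invented branch and file names. In a real team you'd `git fetch` first, rebase onto `origin/main`, and force-push with `--force-with-lease` since history was rewritten.

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q -b main
git config user.email dev@example.com; git config user.name Dev
echo a > main.txt; git add main.txt; git commit -qm "main: initial"

# Start a feature branch, then let main move on underneath it.
git checkout -qb feature
echo f > feature.txt; git add feature.txt; git commit -qm "feature: work"
git checkout -q main
echo b >> main.txt; git commit -qam "main: moves on"

# Replay the feature commits on top of the latest main.
git checkout -q feature
git rebase -q main
# git push --force-with-lease   # needed after rewriting history

git log --oneline   # feature now sits on top of "main: moves on"
```

Because the feature commits always sit on top of main's tip, any conflicts are resolved once, against current code, instead of weaving through old merge points.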
That's the big picture, team workflow. For my personal workflow, I rely on "git add -p" and stashes. The only time I do a commit and push up code is: a) when I have a large change and want to make sure I don't lose it or b) others have already reviewed my PR and I want to keep the new changes separate to make their life easier when reviewing a 2nd time. I use "git reset --soft HEAD~<number-of-commits>" to squash instead of "git rebase -i" because I find it easier and quicker.
I must emphasize this point: learn "git add -p". It's extremely useful in the case where you have some changes like debugging code or random unrelated changes that you do not want to commit. It's a filtering mechanism.
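The `git reset --soft` squash mentioned above can be sketched in a few lines: it rewinds HEAD by N commits while leaving all the changes staged, so one new commit replaces them. The repo and messages here are invented for the demo.

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q -b main
git config user.email dev@example.com; git config user.name Dev
echo 1 > f.txt; git add f.txt; git commit -qm "feature"
echo 2 >> f.txt; git commit -qam "review fix 1"
echo 3 >> f.txt; git commit -qam "review fix 2"

# Undo the last 2 commits but keep their changes staged...
git reset --soft HEAD~2
# ...then recommit everything as one commit.
git commit -qm "feature (squashed)"
git log --oneline
```

Unlike `git rebase -i`, there's nothing to reorder or edit: it's a single squash of the most recent commits, which is often all you need before merging.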
I had never even considered that some teams might have multiple developers active on the same branch.
How does that even scale? I would imagine that in a team of 10, you would be rebasing 90% of your day and only 10% doing actual work?
a big project of mine is about 2500 commits ahead. rebasing this beast is partially automated, but still I get about 2000 upstream changes through once a month. you need scripts to rebase and to rollback for a wrong choice.
it scales trivially.
1.a. a lot of dirty commits/wip commits
1.b. a few clean commits, when I spot changes that I know already form a commit by themselves
Before opening the PR:
2. `git log -p`: I inspect the commits I've done and I decide what should go together, what should be edited and what can stay as it is
3. `git rebase -i`: I apply the changes I've decided during 2
4. repeat 2 and 3 until I'm happy with the results
5. the last `git rebase -i`: reword almost every commit, as almost all the commits at this point have placeholder descriptions
I'm very happy with this strategy. It takes some time to get used to, but in the end my PRs are very clean and well thought out.
That way, you won't be afraid of losing your recent work by messing something up, because you have the stash, and you won't be afraid of losing your whole project/progress, because you have a recent backup of it.
For instance, I have a backup script that runs every time I shut down my work computer, so I won't have to worry if suddenly my hard drive gives up on everything.
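A shutdown backup script along those lines could be as simple as a timestamped tarball. The paths in a real setup (a project directory and a backup drive mount) are up to you; this sketch uses temporary directories so it is self-contained.

```shell
set -e
work=$(mktemp -d)    # stand-in for your project directory, e.g. ~/work
backup=$(mktemp -d)  # stand-in for a backup destination, e.g. /mnt/backup
echo "important" > "$work/notes.txt"

# Archive the working directory with a timestamp so old snapshots are kept.
tar -czf "$backup/work-$(date +%Y%m%d-%H%M%S).tar.gz" -C "$work" .
ls "$backup"
```

Hooking it into shutdown varies by system (a systemd unit, a logout hook, or just running it by hand); the key point is that it runs without you having to remember it.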