I have to admit that I learned a lot of these things fairly recently. The large repository stuff has been added into core piece by piece by Microsoft and GitHub over the last few years, so it's hard to find one place that describes everything they've done. Hope it's helpful.
I've also had some fun conversations with the Mercurial guys about this. They've recently started writing some Hg internals in Rust and are getting some amazing speed improvements.
I'm also thinking of doing a third edition of Pro Git, so if there are other things like this that you have learned about Git the hard way, or just want to know, let me know so I can try to include it.
"git fza" shows a list of modified/new files in an fzf window, and you can select each file with tab plus arrow keys. When you hit enter, those files are fed into "git add". Needs fzf: https://github.com/junegunn/fzf
"git gone" removes local branches that don't exist on the remote.
"git root" prints out the root of the repo. You can alias it to "cd $(git root)", and zip back to the repo root from a deep directory structure. This one is less useful now for me since I started using zoxide to jump around. https://github.com/ajeetdsouza/zoxide
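The alias definitions themselves aren't shown above, so here is a rough sketch of how the three could be written as shell functions. The function bodies are my own guesses at the behavior described, not the original aliases, and the demo repo layout (`upstream`, `work`, a `feature` branch) is made up for illustration.

```shell
#!/bin/sh
set -eu

# git root: print the top-level directory of the current working tree.
git_root() {
    git rev-parse --show-toplevel
}

# git gone: delete local branches whose upstream no longer exists on the remote.
git_gone() {
    git fetch --prune --quiet
    git for-each-ref --format='%(refname:short) %(upstream:track)' refs/heads |
        while read -r branch track; do
            if [ "$track" = "[gone]" ]; then
                git branch -q -D "$branch"
            fi
        done
}

# git fza: pick modified/untracked files in fzf (Tab marks a file) and stage them.
# Defined for illustration only; it needs fzf and a terminal, so it is not run here.
git_fza() {
    git ls-files -z --modified --others --exclude-standard |
        fzf --read0 --print0 --multi |
        xargs -0 git add --
}

# Throwaway demo: an "upstream" repo with a feature branch, and a clone of it.
demo=$(mktemp -d)
git init -q "$demo/upstream"
cd "$demo/upstream"
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "initial"
git branch feature

git clone -q "$demo/upstream" "$demo/work"
cd "$demo/work"
git branch -q --track feature origin/feature

# Delete the branch upstream, then clean up locally.
git -C "$demo/upstream" branch -q -D feature
git_gone

remaining=$(git branch --format='%(refname:short)')
root=$(git_root)
echo "branches left: $remaining"
echo "repo root: $root"
```

Note that `git branch -D` refuses to delete the branch you are currently on, so a "gone" branch that is checked out would need a switch first.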
This one I'm less sure about. I haven't yet gotten it to the point where I really like using it, but I'm sharing since someone might find it useful as a starting point:
It's intended to be used for creating a cherry-picking branch. You give it a branch name, let's say "node", and it creates a branch with that as its parent, and the short commit hash as a suffix. So running "git brancherry node" creates the branch "node-abc1234" and switches to it.
The intended workflow being you cherry pick into that branch, create a PR, which then gets merged into the parent.
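The alias body isn't included in the comment, so the following is only a guess at what a "brancherry" helper might look like. In particular, I'm assuming the suffix is the abbreviated hash of the parent branch's tip; the original may use a different commit.

```shell
#!/bin/sh
set -eu

# Hypothetical sketch of "git brancherry <parent>".
git_brancherry() {
    parent=$1
    # Assumption: the suffix is the abbreviated hash of the parent branch's tip.
    short=$(git rev-parse --short "$parent")
    git checkout -q -b "$parent-$short" "$parent"
}

# Throwaway demo repo with a branch called "node".
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "initial"
git branch node

git_brancherry node
current=$(git symbolic-ref --short HEAD)
expected="node-$(git rev-parse --short node)"
echo "now on: $current"
```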
meta-tip: you can also put your aliases that start with '!' into stand-alone shell scripts named `git-fza` (e.g.) and then call it as `git fza` which will search your PATH for `git-fza` and invoke it as if it's built-in.
I do this for some of my more complicated aliases because I generally think it's poor form to embed shell scripts into configuration languages. (Looking at you, yaml.)
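A tiny demonstration of that mechanism, in case it helps: any executable named `git-<name>` on your PATH becomes `git <name>`. The `git-hello` script and the temporary bin directory are made up for the example.

```shell
#!/bin/sh
set -eu

# A scratch directory standing in for something like ~/bin.
bindir=$(mktemp -d)

# Hypothetical subcommand; arguments after "git hello" arrive as "$@".
cat > "$bindir/git-hello" <<'EOF'
#!/bin/sh
echo "hello from git-hello, args: $*"
EOF
chmod +x "$bindir/git-hello"

# Put the directory on PATH so git can find the script.
PATH="$bindir:$PATH"
export PATH

output=$(git hello world)
echo "$output"
```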
In the part about whitespace diffs, you might want to mention ignore-revs-file [0]. We check an ignore-revs file into the repo, and anyone who does a significant reformat adds that SHA to the file to avoid breaking git-blame.
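For anyone who hasn't used it, here is a small sketch of that workflow in a throwaway repo. The file name `.git-blame-ignore-revs` is only a common convention (git doesn't require it), and the one-line `a.c` is made up for the demo.

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git config user.name demo
git config user.email demo@example.com

# A real change, followed by a whitespace-only reformat of the same line.
printf 'int x=1;\n' > a.c
git add a.c
git commit -q -m "add a.c"
original=$(git rev-parse HEAD)

printf 'int x = 1;\n' > a.c
git commit -q -am "reformat a.c"
reformat=$(git rev-parse HEAD)

# Record the reformat commit in a checked-in file.
echo "$reformat" > .git-blame-ignore-revs
git add .git-blame-ignore-revs
git commit -q -m "ignore reformat in blame"

# Without the setting, blame points at the reformat commit.
before=$(git blame --porcelain a.c | head -n 1 | cut -d' ' -f1)

# With the setting, blame looks through it to the commit that wrote the line.
git config blame.ignoreRevsFile .git-blame-ignore-revs
after=$(git blame --porcelain a.c | head -n 1 | cut -d' ' -f1)

echo "before: $before"
echo "after:  $after"
```

The config is per-clone, so each contributor has to set `blame.ignoreRevsFile` once (or pass `--ignore-revs-file` by hand).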
One thing about git I learned the hard way is the use of diffs and patches (more accurately, 3-way merges) for operations like merging, cherry picking and rebasing. Pro-git (correctly) emphasizes the snapshot storage model of git - it helps a lot in understanding many of its operations and quirks. But the snapshot model can cause confusion in the case of the aforementioned operations - especially rebasing.
For example, I couldn't understand why the deletion/dropping of a commit during a rebase caused changes to all subsequent commits. After all, I only asked for a snapshot to be dropped. I didn't ask for the subsequent snapshots to be modified.
Eventually, I figured out that it was operating on diffs, not snapshots (though storage was still exclusively based on snapshots). The correction on that mental model allowed me to finally understand rebasing. (I did learn later that they were 3-way merges, but that didn't affect the conclusions).
That assumption was eventually corroborated somewhere in Pro-Git or the man pages. But I couldn't find those lines again when I searched a second time. I feel that these operations can be better understood if their diff/patch nature is emphasized a bit more. My experience training people in rebasing also supports this.
PS: Thanks for the book! It's a fantastic example of what software documentation should look like.
I guess while it's true the storage layer is snapshot based, as you say, that only gets you so far conceptually, and it's probably best to focus on the _operation_ you're doing, as rebase, cherry-pick, apply-patch, etc are easier to think in terms of diffs.
When I used to use Phabricator, the fact that I could always fall back to handing it a raw patch file to submit changes also made it easier to reason about (regardless of what the server and client were actually doing).
What I'd stress is that rebasing is nothing more than automated cherry-picking, and it's hard to imagine cherry-picking as anything other than a 3-way merge or patch operation.
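To make that concrete, here is a small sketch that "drops" a middle commit by hand with cherry-pick, which is roughly what an interactive rebase would do for you. The three-file history is made up; the point is that the replayed commit carries the same patch but is a different snapshot with a different hash.

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com

# History: A (a.txt) -> B (b.txt) -> C (c.txt)
for name in a b c; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "add $name.txt"
done
old_c=$(git rev-parse main)

# "Drop" B by hand: start from A and replay only C on top of it.
git checkout -q -b dropped main~2
git cherry-pick main >/dev/null
new_c=$(git rev-parse HEAD)

# Same patch (identical patch-id) ...
old_patch=$(git show "$old_c" | git patch-id --stable | cut -d' ' -f1)
new_patch=$(git show "$new_c" | git patch-id --stable | cut -d' ' -f1)

# ... but a different snapshot (tree) and therefore a different commit.
old_tree=$(git rev-parse "$old_c^{tree}")
new_tree=$(git rev-parse "$new_c^{tree}")

echo "commit: $old_c -> $new_c"
echo "tree:   $old_tree -> $new_tree"
echo "patch:  $old_patch == $new_patch"
```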
> Eventually, I figured out that it was operating on diffs, not snapshots
The snapshot includes all the history that led to the current snapshot. So even if you did a squash instead of dropping, you're changing everything that depends on that.
I met you and we chatted for a bit at a bar after hours at a tech conference years ago, before you dropped that you were a GitHub co-founder towards the end. You actually gave me some advice that has worked out well for me. Just wanted to say thanks!
One question that I have is what is happening to large file support within Git? Has that been merged into core, given that Microsoft's changes have also made it into core? Obviously there is a difference between supporting very many small files and a few very large files, but wouldn't it make sense to roll LFS into core as well?
What a great question. If I recall correctly, the LFS project is a Go project, which makes it difficult to integrate with Git core. However, I believe that the Git for Windows binary _does_ include LFS out of the box.
There was a discussion very recently about incorporating Rust into the Git core project that I think had a point about LFS then being viable for some reason, but I'd have to find the thread.
I remember watching your FOSDEM talk on YouTube, where you asked whether people have rerere turned on _and_ know what it is, in one question. I have it on, but only the faintest of clues what it is! Just git things, I suppose.
Hey little feedback on the terminal images in your posts. I'm viewing this on a phone, and it would be better if the terminal images were just the terminal (some are) and not surrounded by a large blank space which is your wallpaper. This would make it a bit easier to read on small screens, without the need to zoom in!
First off, I loved your presentation. And your book. As someone who actually bothers to read most of GitHub's "Highlights from Git" blog posts, I was somewhat familiar with some of them, but it was still very informative.
Also liked your side-swipe at people who prefer rebase over merge, I'm a merge-only guy myself...
I also took a look at GitButler and it looks like it could potentially solve one of my pain points.
If you're looking for things which are confusing to beginners, for a future version of your book, there are many useful / interesting / sometimes entertaining git discussions/rants here on HN. One of the recent ones is:
I watched the FOSDEM talk yesterday, and I laughed hard when I heard "Who use git blame -L? Does anybody know what that does?" because it suddenly looked like the beginning of a git wat session. But it was really informative, I learned a lot of new things! Thanks
A part of Git's complexity is due to the fact that it was originally meant to be just the plumbing. It was expected that more user-friendly porcelain would be written on top of the git data model. Perhaps that is still the best bet at having a simple and consistent UI. Jujutsu and Got (game of trees) are possible examples.
It's a collection of hacky tools for manipulating a DAG of objects, identified by a SHA-1 hash. If you look at it this way, you wouldn't expect any consistency in the CLI interface.
this describes all of unix. as soon as scripts were allowed to use commands, those commands could never be changed. lest we have a nerd riot on our hands
I understand your sentiment but git is really not all that hard. And knowing a few things that go beyond bog-standard checkout/commit/push, especially history-rewriting activities, will greatly improve quality of commit-history - which might not be of much use for you but might help other engineers working on your project to make easier sense of what's going on.
And on another note, git is probably one of the longer-lasting constants in our industry. Technologies develop and change all the time, but for git, it looks like it's here to stay for a while, and it's probably one of the tools we interact with most in day-to-day dev-work. Might be worth having a bit of a look at :)
Isn’t that where most interest starts? A computer really is a tool. I know for me, it was an unfortunate discovery at the very start of my interest in computing that to do the things I wanted I had to deal with all these tedious bits of programming.
Even today I’d like to skip most of the underlying tedious bits although I understand knowledge and willingness to deal with much of those underlying tedious bits are what keep money flowing into my account regularly. That’s about the only saving grace of it. There are so many ideas I’d love to explore but the unfortunate fact is there’s a lot of work to develop or even glue together what one needs to test out, not to mention associated infrastructure costs these days. Even useful prototypes take quite an endeavor.
My feeling is that the git interface is a leaky abstraction. I also don't want to learn git tricks, but unfortunately I learned more about it than I wanted to.
> do not want to learn git tricks. I just wanna use it as simple as possible.
Simplicity is in the eye of the beholder. A single trick can save you a whole lot of work. Take for example interactive rebase, which allows you to update your local branches by merging and reordering local commits. If you had to do everything by hand you would certainly have to work a lot more.
I had the same experience for a long time and then I took a bit of time to have a deeper look behind the curtain and I have to say, once you grasp the data-model of git itself (a branch is a pointer to a commit, a commit is a pointer with metadata to a tree, a tree is...), many of the commands start to make sense all of a sudden, or at the very least "stop looking dangerous".
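If you want to poke at that chain yourself, something like the following walks from a branch to a commit to a tree to a blob in a scratch repo. The file name and contents are made up for the demo.

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git config user.name demo
git config user.email demo@example.com
echo hello > greeting.txt
git add greeting.txt
git commit -q -m "add greeting"

# A branch is just a name that resolves to a commit id.
branch=$(git symbolic-ref HEAD)
commit=$(git rev-parse "$branch")

# A commit points at a tree (plus parents, author, message).
tree=$(git rev-parse "$commit^{tree}")

# A tree lists names and the blobs (or subtrees) they point at.
blob=$(git rev-parse "$commit:greeting.txt")

echo "$branch -> $commit"
git cat-file -p "$commit"
git cat-file -p "$tree"
content=$(git cat-file -p "$blob")
echo "blob content: $content"
```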
As it's one of those rare tools that's probably meant to stay for quite some time and we interact with quite frequently, it was time well spent for me, and it turns out it's really not as hard as the scary-looking commands imply.
As long as you remember that the reflog exists (and it hasn’t run gc, but usually you immediately know when you’ve messed up), you’ll be fine. It’s exceedingly hard to break your repo beyond repair without trying to do so.
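A minimal sketch of that kind of recovery, in a scratch repo: throw away a commit with a hard reset, then use the reflog entry to get it back. The commit messages are made up.

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
tip=$(git rev-parse HEAD)

# Oops: the "second" commit is no longer reachable from any branch.
git reset -q --hard HEAD~1

# The reflog still remembers where HEAD was before the reset.
git reflog -n 3

# HEAD@{0} is the reset itself, HEAD@{1} is where HEAD pointed just before it.
git reset -q --hard 'HEAD@{1}'
recovered=$(git rev-parse HEAD)
echo "recovered: $recovered"
```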
It's unfortunate that the weight of ecosystem and tooling (and the 800-pound Microsoft-owned GitHub gorilla) has effectively locked the profession into using git. I don't hate it, I'm just keenly aware that a better approach is possible.
I wish someone with deep pockets would hook the pijul team up with the money and talent they need to make pijul a full-featured alternative with first-class hosting tools. The way it models change is principled and based on solid theory, and I'm convinced that a markedly better tool than git could be built on that foundation.
Totally agree. However, then coworkers who don't understand even the simple git commands mess up their branches (somehow), and... then my git tricks save the day (unfortunately).
I don't totally disagree. I love Git and I find all these things very cool, but I know it's overhead a lot of people don't want. The post is on the blog of the new GUI that I'm trying to build to make the cool things that Git can do much faster and more straightforward, so maybe check it out if the CLI isn't your favorite thing.
Beyond a junior engineer, I’d expect an engineer to know more than the basics if they’ve been using git for their entire career so far.
Git is the power saw for software engineers. You don’t want someone who can’t keep all their fingers and toes anywhere near your code.
Not knowing git, when you’ve been interacting with it for years, is a red flag for me. I’m not expecting people to know the difference between rebase and rebase --onto, but they should at least know about the reflog and how to unfuck themselves.
Learnt something new about core.fsmonitor. Thanks.
On the subject of large monorepos, I wish "git clone" had a resume option.
I had this issue back in 2000s when trying to clone the kernel repo on a low bandwidth connection. I was able to get the source only after asking for help on a list and someone was kind enough to host the entire repo as a compressed tar on their personal site.
I still have this problem occasionally while trying to clone a large repo on a corporate VPN that can disconnect momentarily for any reason (mainly ISP level). Imagine trying to clone the Windows repo (300GB) and then losing the wifi connection for a short time after downloading 95%.
It is wild that both git and docker, the two major bandwidth-intensive pieces of software in the modern development stack, don't have proper support (afaik) for resuming their downloads.
I suppose you could do this by shallow cloning and then expanding it multiple times. But yes, the fetch/push protocols really expect smaller repos or really good inet connections and servers.
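Roughly what that looks like, sketched against a local five-commit repo standing in for a big remote. This doesn't resume an interrupted transfer; it only splits the download into smaller pieces, so a dropped connection costs one step rather than the whole clone. The depth values are arbitrary.

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)

# A small local repo standing in for a large remote one.
git init -q "$demo/origin"
cd "$demo/origin"
git config user.name demo
git config user.email demo@example.com
for i in 1 2 3 4 5; do
    git commit -q --allow-empty -m "commit $i"
done

# Start with only the newest commit (file:// is needed for --depth on a local repo).
git clone -q --depth 1 "file://$demo/origin" "$demo/shallow"
cd "$demo/shallow"
count_initial=$(git rev-list --count HEAD)

# Fetch a bit more history at a time.
git fetch -q --deepen=2
count_deeper=$(git rev-list --count HEAD)

# Finally pull in everything that is left.
git fetch -q --unshallow
count_full=$(git rev-list --count HEAD)

echo "commits: $count_initial, then $count_deeper, then $count_full"
```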
I read (and upvote) anything git related by Scott Chacon. He was instrumental in me forming my initial understanding of the git model/flow more than 10 years ago, and I continue to understand things better by consuming the content he puts out. Thanks Scott!
I'd like to see anyone else solve the challenge of many people contributing code towards different releases, different features, hotfixes, tagging releases, going back to find bugs, with an "easier" interface.
It's like people who want a low-level language that hides all the complexity of the system - those two goals are literally exclusive of each other. I'm happy with git, it's not that hard to learn, and some people need to just grow some (metaphorical) balls and learn git.
That's why I'm a huge shill for gitkraken. It's a paid product so I'm a little hesitant sometimes but I've used them all and nothing compares to the power it unleashes. It completely lifts the curtain on the black box that many developers experience in the terminal and puts the graph front and center. It exposes rebasing operations in an effortless and intuitive visual way that makes git fun. As a result, I feel really proficient and I'm not scared of git at all. I can fix just about anything and paint the picture I want to see by carefully composing commits rather than being at the mercy of the CLI. I still see CLI proficiency as a valuable skill but it's so painful sometimes to watch seasoned 10 yr developers try to solve the most basic problems or completely wreck the history in a project because they're taught you can't be a real engineer if you don't use the git CLI exclusively. Lately I've resorted to arguing "use the CLI but you should at least be looking at the graph in another window throughout the day - which you can do for free in vs code, jetbrains, or even the CLI"
For example: anytime one of my teammates merges a PR, I see it and I rebase my branch right away. As a result my branch is always up to date and based on main, so I never run into merge hell or drop those awful "fix conflicts" commits in the history.
I never really understood why the majority of developers insist on using the git CLI, when modern UI clients like GitKraken [0] are perfectly usable and very helpful. :shrug:
Knowledge of the CLI transfers to writing scripts. Knowledge learned from using Git in scripts transfers to day-to-day use.
Also, if I SSH into my Raspberry Pi that I'm using as server, I don't want to feel useless just because I'm forced to use a CLI.
I'm not entirely against using a GUI, it's just that at this point I'm more efficient using the CLI, and I don't want to spend effort searching for a GUI that is:
* Good-looking.
* Is native, not an outdated vendored copy of a web browser.
* Doesn't have telemetry, or at least it's disabled by default.
* Is fully open source; not open-core or proprietary.
* I can reasonably expect that it won't disappear 5 years in the future.
* That it doesn't make things more confusing. Like for example Visual Studio[1] having a button that very ominously says "Accept merge", when it really means "Mark conflict as resolved". If an IDE wants to use its own cute way of labelling things, good; I can accept more friendly terms that make it more approachable for a wider audience. But at least don't make things confusing for people that already expect certain words to mean certain things.
* That I can trust that it won't "helpfully" do fancy stuff, like having a button saying "Commit changes" that "helpfully" also pushes to remote. I don't know if any GUI does this, but my trust is low.
[1]: I had to use it at a previous company because it was the only realistic way to work with their codebase.
Shrug back at ya. I find the cli perfectly usable. I use an editor plugin as well but if I'm already on the command line I use it there. Having to switch to a different program just to make a commit kills the desire to commit often.
I do as little with git as possible unless I'm facing some very specific issues, so for me at least it seems overkill to use a GUI for essentially just push, pull, and checkout.
Looks neat, but I tend to get way too distracted by graphical interfaces. I assume it's really a question of personal preference. CLIs are faster to use, but have a bigger learning curve. (we will probably not solve that debate here, but I do wonder sometimes, whether to recommend the CLI or not)
Most of my git usage on the CLI is nothing fancy, just a few commands, but I keep a text file for some tips/tricks I don't use regularly.
Thanks, I knew about -committerdate but not that you can set it as default sort, super useful. A few notes...
1. git columns gets real confusing if you have more data than fits the screen and you need to scroll. Numbers would help...
2. git maintenance sounds great but since I do a lot of rebases and stuff, I am worried: does this lose loose objects faster than gc would? I see gc is disabled but it's not clear.
3. Regarding git blame a little known but super useful script is https://github.com/gnddev/git-blameall . (I mean, it's so little known I myself needed to port it to Python 3 and I am no Python developer by any stretch.)
> git maintenance sounds great but since I do a lot of rebases and stuff, I am worried: does this lose loose objects faster than gc would? I see gc is disabled but it's not clear.
“gc” is disabled for the scheduled maintenance. It’s enabled as a task when running “maintenance run” explicitly.
It would not collect loose objects faster than gc would, because it just runs gc.
Is there a way to do the reverse sort? A simple solution to branches going off screen seems to be to have the latest branch last. That's what I do (although with a custom script).
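As far as I know the leading minus is what flips the order, so dropping it should give oldest first and the most recently committed branch last. A quick sketch in a scratch repo, with made-up branch names and commit dates:

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com

# Two branches whose tips have clearly different committer dates.
GIT_COMMITTER_DATE='2024-01-01 00:00:00 +0000' git commit -q --allow-empty -m "old work"
git checkout -q -b newer
GIT_COMMITTER_DATE='2024-02-01 00:00:00 +0000' git commit -q --allow-empty -m "new work"

# "-committerdate": newest first.
newest_first=$(git branch --sort=-committerdate --format='%(refname:short)' | tr '\n' ' ')

# "committerdate" without the minus: newest last.
newest_last=$(git branch --sort=committerdate --format='%(refname:short)' | tr '\n' ' ')

# Make newest-last the default for plain "git branch" in this repo.
git config branch.sort committerdate
default_order=$(git branch --format='%(refname:short)' | tr '\n' ' ')

echo "newest first: $newest_first"
echo "newest last:  $newest_last"
```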
However, I've been using this git pager/difftool: https://github.com/dandavison/delta
While it's not structural like difft, it does produce more readable output for me (at least when scrolling fast through git log -p /scanning quickly)
I've been wanting something like this for years...
[0] https://git-scm.com/docs/git-blame#Documentation/git-blame.t...
https://jvns.ca/blog/2024/01/05/do-we-think-of-git-commits-a...
https://xkcd.com/1597/
The part about `--force-with-lease` could include `--force-if-includes` as well. `--force-with-lease` doesn’t do much if you fetch often.
https://stackoverflow.com/questions/65837109/when-should-i-u...
https://youtu.be/aolI_Rz0ZqY?t=910
If you're looking for things which are confusing to beginners, for a future version of your book, there are many useful / interesting / sometimes entertaining git discussions/rants here on HN. One of the recent ones is:
https://news.ycombinator.com/item?id=38112951
Thanks for the info.
most of the time trying the main URL + /rss works
also the tag is there <link rel="alternate" type="application/rss+xml" title="GitButler" href="https://blog.gitbutler.com/rss/">
https://mergebase.com/blog/doing-git-pull-wrong/
TLDR: don’t be afraid of rewriting history but ALWAYS do “git pull -r --autostash”
Kudos to all who love git, for me, it's just a tool I have to use.
So I'm happy for the 'complexity' of git.
Kudos to all who love programming, for me, it's just a tool I have to use.
Hopefully the incantation is on the Cheat Sheet and I don't make it worse.
This isn't a rhetorical question.
† https://git-scm.com/docs/git-bundle
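Presumably the idea behind the bundle link is that a bundle is a single ordinary file, so it can be served over plain HTTP or rsync and downloaded with any tool that knows how to resume, then cloned from locally. A rough sketch, with a local three-commit repo standing in for the big one and the download step left out:

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)

# A small repo standing in for the large one on the server side.
git init -q "$demo/origin"
cd "$demo/origin"
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com
for i in 1 2 3; do
    git commit -q --allow-empty -m "commit $i"
done
tip=$(git rev-parse HEAD)

# Pack every ref and its history into one file.
git bundle create "$demo/repo.bundle" --all >/dev/null 2>&1

# (The bundle would be downloaded here with a resumable tool.)
# Check that the file is complete, then clone straight from it.
git bundle verify "$demo/repo.bundle" >/dev/null 2>&1
git clone -q "$demo/repo.bundle" "$demo/copy"
cloned=$(git -C "$demo/copy" rev-parse HEAD)
echo "cloned tip: $cloned"
```

After cloning, `origin` points at the bundle file, so you would still want `git remote set-url origin` with the real repository URL followed by a normal fetch to catch up.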
Great tips, thank you!
There are some great extra git commands out there.
https://github.com/tummychow/git-absorb
https://leahneukirchen.org/dotfiles/bin/git-attic
It lists files that were deleted and which commit deleted them.
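I haven't copied the linked script here, but the same idea can be approximated with plain `git log`. This is my own sketch, not the contents of git-attic, and it assumes file names without whitespace; the two demo files are made up.

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git config user.name demo
git config user.email demo@example.com

echo keep > keep.txt
echo old > old.txt
git add keep.txt old.txt
git commit -q -m "add files"

git rm -q old.txt
git commit -q -m "remove old.txt"
del_hash=$(git rev-parse --short HEAD)

# List "<commit> <file>" for every deletion in the history.
# --diff-filter=D keeps only deletions; --no-renames stops a rename
# from hiding the removal of the old path.
attic=$(git log --no-renames --diff-filter=D --name-status --format='commit %h' |
    awk '$1 == "commit" { hash = $2 } $1 == "D" { print hash, $2 }')
echo "$attic"
```

To bring a file back, something like `git checkout <commit>~1 -- <file>` restores it from the parent of the commit that deleted it.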
[0] https://www.gitkraken.com/
3. Nice, I may have to try this