Readit News
friendzis · 2 years ago
> For a large change, I don’t want to do a “diff review”, I want to do a proper code review of a codebase at a particular instant in time, paying specific attention to the recently changed areas,

obviously every team and ticket is different, but IMO unless the person doing the review is some sort of principal engineer mostly responsible for the code at large, this does not align with what I would personally consider a code review. In my book a general code review is a simple sanity check by a second pair of eyes, which can result in suggestions to use a different API or to use an API slightly differently. If commits in the PR are properly ordered/squashed it is relatively easy to review incremental changes in isolation anyway.

I guess the complaint the author has is really about review types. I do not have terminology ready at hand, but there is the run-of-the-mill sanity-check review and then there is the deep, architectural pre-merge feature review. The author complains about the latter, while most of the code reviews in web tools are of the former variety.

dylukes · 2 years ago
> In my book a general code review is simple sanity check by a second pair of eyes, which can result in suggestions to use different API or use an API slightly differently.

This is an impoverished view of code review. Code review is a principal mechanism for reducing individual code ownership, for propagating conventions, and for skill transfer.

A good code review starts with a good PR: one that outlines what its goals were and how it achieved them.

First item on a good review then: does the code achieve what it set out to do per the outline?

Second: does it contain tests that validate the claimed functionality?

Third: does it respect the architectural conventions of the system so far?

Fourth: is the style in line with expectations?

Fifth, and finally: do you have any suggestions as to better API usage?

A code review that is nothing more than a sanity check is useless and could have been done by CI infrastructure. Code review is a human process and should maximally take advantage of the things only humans could do. Leave the rest to machines.

An implicit question in several of the above is "will this set a good example for future contributions?"

crdrost · 2 years ago
I go back and forth on this.

On the one hand, I really like constant deep feedback. I really like the consistency benefits of having another person say “that’s too much, I find that unreadable.”

On the other, I have now been at a lot of places where it was very hard to get my code reviewed, latencies of days and sometimes weeks if folks are in a particularly heinous crunchtime... And then when it does get reviewed, the stuff that I worked really hard on to get the right speed or to properly centralize cache invalidation... Suddenly someone is like “I would have done it from this other approach” and you have no idea whether it's tractable or not.

While I have never been at a place that did this, I have in my head the idea that the code should be an unfolding collective conversation, kind of like when folks are all collaborating on a shared Google Doc, I see that you are editing this section and I throw in a quick comment “don't forget to add XYZ” and then jump to a different part that I won't be stepping on their toes with. So the basic idea would be to get everybody to merge to `main` like 2 or 3 times a day if possible. In that case code review really is just “make sure this doesn't break the build or break prod, everything is behind a feature toggle, if I don't like the design I will comment on the code near the design or sketch a proof of concept showing that my approach is superior in clarity or speed or whatever”... Nobody ever takes me up on this culture shift it seems.

mattarm · 2 years ago
Those are good things to consider in review, but I maintain that the answer might be "no" to one or more of those questions and still be acceptable.

I'm old enough to have worked in the pre-code-review era. Things were fine. People still learned from each other, software could still be great or terrible, etc. It wasn't appreciably worse or better than things are today.

> An implicit question in several of the above is "will this set a good example for future contributions?"

Which in my experience can be an almost circular requirement. What do you consider a good example? As perfect as perfect can be? Rapid development? Extreme pragmatism?

The more experienced I get, the less I complain about in code review, especially when reviewing for a more junior dev, and especially for frequent committers. People can only get so much out of any single code review, and any single commit can only do so much damage.

Put another way, code review is also about a level of trust. Will the committer be around next week? Are they on the same team as me? If yes, give them some leeway to commit incremental work and make improvements later. Not all incremental work need occur pre-commit. Mention areas for improvement, sure, but don't go overboard as a gatekeeper.

Things are obviously going to be different when reviewing code from what amounts to a stranger on a mission critical piece of code, etc.

phillipcarter · 2 years ago
As a counterpoint, individual code ownership can be a fantastic model too. It lets engineers specialize and takes advantage of social systems that form naturally anyways. I’ve not personally seen group ownership work well, and in practice, it’s still an individual who knows a given area the best.
jeffreygoesto · 2 years ago
Well put. I'd like to add that individual code ownership needn't be lived as a loner sitting on that code. Being an owner means hatching it, keeping it clean, and also knowing when getting help is beneficial. Accepting your bias and asking for a look from a different perspective is not getting rid of or dispersing ownership, in my view; it is a central part of it.

Often this includes getting feedback on whether the change could cause problems outside the implemented feature, or in the future, like introducing dependencies that cost more in total than they help locally.

mannykannot · 2 years ago
This is an ideal (if not utopian) vision of software review (though please let's handle style checking automatically already!). It implicitly requires, however, such a thorough examination of the code that diffs would seem to be an irrelevant distraction.

In practice, however, starting from the changes in a system that was previously working well enough is a very effective way of focusing limited human attention on where the problems are likely to be, optimizing for error detection with limited resources over the broader knowledge-dissemination goals espoused in this approach. If we are going to use diffs, then this brings us back around to the topic of the article.

Personally, I find having any form of diff embedded in what I am trying to understand just makes it harder to follow, so I move the diffs onto a secondary screen and use them as a guide and reference to what has been changed. The author of the article seems to want the same, but the mock-up is somewhat misleading, as it only has substitutions and additions. Where something is removed or refactored to another place, the two views will no longer line up as depicted here.

ozim · 2 years ago
Code style checks should be automated. I do not accept discussions on style in PRs on my team, beyond pointing out obvious deviations that should be automated. Discussion on that should happen way before making the PR.

Architectural changes should be discussed by developers way before the PR, on Slack or in a call. Most features should not change the architecture, and the team should make an effort to keep the architecture aligned all the time, at least before someone makes a PR. Unless of course the PR is a PoC to showcase an approach.

friendzis · 2 years ago
As I have said - every team is different :)

In response to your first point I guess things really depend on where you store PR metadata and how ephemeral/permanent it is. Some teams store that information in the ticket, some fill implementation notes with change request, some add that to the PR, some discuss in their standup (or similar) meeting.

Regarding 2-3, you are right, I just lumped them all under umbrella term.

Maybe we could use terms like "PR review" and "code review". The former is a shallow LGTM check, while the latter may involve code checkout, poking around, architectural discussions, pair/team programming and so on. In my book they are entirely different beasts and web tools are geared (not without justification) towards the former, where both types of diff should serve the purpose.

kazinator · 2 years ago
This is why you should insist on changes being reviewed by at least two pairs of eyes. Someone responsible for that area of the code might raise those kinds of objections. Someone who is more of an outsider can focus on other problems.

E.g. I'm kind of a human compiler; I can say things like "you have undefined behavior here" in a completely unfamiliar code base, where the maintainer of that code might miss the issue, too busy pointing out that it doesn't respect architectural conventions and is not tested.

lapcat · 2 years ago
In my book, superficial code review produces only superficial results. This is how so many bugs get shipped despite the ubiquity of code review.

Call me crazy, but I've always approached code review from the standpoint of a tester: I build and run the code, see whether it does what it's supposed to do, look for possible weak points, and try to break it.

The diffs are the beginning rather than the end. They show you where to start poking around in the code. The proof is in the pudding, though.

You might respond, "That's what testers are for, not engineers." But who can test better than someone who understands the code and knows exactly where to look and what to look for?

I know this attitude puts me in the minority. The majority of engineers seem to have an inborn horror at the prospect of, gasp, manually running code. (They're also strangely averse to using debuggers.) They'll do anything in the world and write vast infrastructure to avoid it. Automate all the things! But in my view, automation is just more possibly buggy code. Who tests the tests? Quis custodiet ipsos custodes?

eru · 2 years ago
Automated tests aren't so much to make sure that the test you just wrote works. You are right that manual testing can do that better sometimes.

Automated tests are so that the code you just implemented still works next month, when lots of other people have made unrelated changes and weren't always aware of the interactions with other parts of the code.

The automation is important, so that the tests get run, even when people forget or are under time pressure.

lvncelot · 2 years ago
I agree about what to expect from a code review - but for me, certain ways of working need to be in place for this to function properly.

A code review is the "last line of defense" (well, disregarding CI), but in the teams I've worked in, that meant that the general idea of what the PR is introducing has been discussed by multiple people at that point. (This could be either a pairing session, an in-depth explanation, or just a coffee chat, depending on the complexity) This way the PR isn't about reviewing the general strategy (splitting off a new module, introducing a huge dependency, reworking the API surface), but just about reviewing the tactics used to implement that strategy.

Without the communication beforehand, doing only sanity checks on parts of the code in isolation does run the risk of fracturing code ownership. ("What's that module doing?" "Beats me, ask bob")

That all notwithstanding, I like the author's idea of what a diff view should look like, regardless of the "mode" of reviewing.

jerf · 2 years ago
I think you have some good truths in your post, but I also think they're rendered somewhat irrelevant by the fact that a "diff review" is, in my repeated and frequent experience, simply incomprehensible. I can't do any review with an incomprehensible mass of red and green lines interspersed with each other. It's not a useful view of the code, or at least not a useful primary view.

I push through and do it anyhow, because this is one of those "they're not paying me to have an endless party" sort of things. But I completely 100% agree with the author that it is not a very useful view.

codemac · 2 years ago
Most of modern technology company culture has strayed too far from formal processes for managing development.

By not having a basic process for defects, features, etc, every engineer has to make up and invent the process for how they will ship their feature through collaboration.

In most cases this ends up with doing little to nothing, with code review being the only form of collaboration on the topic that is in any meaningful detail. So we're left trying to stuff all possible interventions into a single moment.

robertlagrant · 2 years ago
The problem is that this isn't a great way to catch areas of the codebase that should've been updated, but weren't.
apocalyptic0n3 · 2 years ago
> unless the person doing the review is some sort of principal engineer mostly responsible for the code at large [..] In my book a general code review is simple sanity check by a second pair of eyes

I don't necessarily disagree with you, but I don't agree either.

A typical workflow should be:

1. Engineer writes code
2. Engineer does manual and automated testing to verify things work correctly
3. Engineer commits, pushes, and creates a PR/MR
4. CI runs test suite
5. Another Engineer reviews PR/MR
6. Approval causes merge which causes deployment to staging
7. QA
8. UAT
9. Repeat steps 1-8 until everybody is happy

In this, we need to place checks on #1 and #2.

In my eyes, QA and UAT are superficial reviews/sanity checks on #2.

#3 is a check on #1. However, it relies on A) previous engineers implementing good tests and B) the current engineer doing the same. This means trusting that people are doing the right things, but a Review implies you don't completely trust that.

#5 is a check on #3 and #1. They're essential. Having a full understanding of what the code looks like is more important in my mind than what was actually changed. The changes don't show how other code is interacting with those changes (which should be, but isn't always, captured by unit tests).

That's why I think the article author's preferred view is ideal. In fact, that's roughly what I work with when reviewing without realizing it (diff in GitLab/GitHub/Sublime Merge, code in JetBrains where most of the review happens).

gumby · 2 years ago
For most changes I want more context than -c provides, because I see that your change is syntactically correct, but semantically, was your change from < to <= appropriate? I need to see the definition of the things you're comparing to.

Often -c is enough, but not always. I don't believe you can make a hard and fast rule. I think all you can really do is have a diff that allows you to 'jump to definition' or 'list callers', just like you can when looking at the code itself.

delaaxe · 2 years ago
I'd call the latter an audit
knubie · 2 years ago
A third (fourth?) option worth mentioning here is difftastic[0], which uses "structural" diffing (as opposed to line diffing) for more granular diff highlighting.

[0] https://github.com/Wilfred/difftastic
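If you want to try it, difftastic installs as a `difft` binary that Git can use as an external diff; a minimal setup sketch (display-only, since difftastic output can't be applied as a patch):

```shell
# Use difftastic for all `git diff` output:
git config --global diff.external difft

# Or try it one-off without touching config:
GIT_EXTERNAL_DIFF=difft git diff
```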

mega_dean · 2 years ago
A fourth (fifth?) option worth mentioning is patdiff: https://opensource.janestreet.com/patdiff/ From what I remember, it sometimes (35%) made diffs easier to read, usually (60%) made no difference, and rarely (5%) made them harder to read. I used it a few years ago though, so I don't remember specifically what the problem was. The only reason I stopped using it was because I started using magit for git diffs.
roryokane · 2 years ago
I agree that the patience diff algorithm generally produces better results, but you don’t need the patdiff tool for that. You can configure Git’s own diffs to use patience: https://git-scm.com/docs/git-config#Documentation/git-config...

I see that patdiff also provides word-level diffing. You could instead get that feature with `git diff --word-diff` or Delta (https://github.com/dandavison/delta).
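For reference, the corresponding settings (the last line assumes the `delta` binary is installed):

```shell
# Use the patience algorithm for all of Git's own diffs:
git config --global diff.algorithm patience

# One-off word-level diff:
git diff --word-diff

# Route diff output through delta as the pager:
git config --global core.pager delta
```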

eviks · 2 years ago
This is a great approach indeed; a pity it's not integrated widely enough (I think it only recently got a proper structured JSON output that other tools could use)
videlov · 2 years ago
Thank you for sharing this project! I have been searching for something like this for some time
grotorea · 2 years ago
I also wonder if it's possible to go beyond this project and have git itself work on the syntax level instead of pure text.
m463 · 2 years ago
ediff in emacs does this. Refine is what highlights the words that are different.
ParetoOptimal · 2 years ago
This blog post also details using difftastic within emacs and has neat comparisons:

https://shivjm.blog/better-magit-diffs/

rmwaite · 2 years ago
`git diff --word-diff` has something close to this and it’s really nice.
mi_lk · 2 years ago
only side-by-side view, unfortunate for my unified view taste
nix0n · 2 years ago
It has a --display=inline option now.
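For example (assuming a recent difftastic, and two hypothetical file revisions):

```shell
# Unified-style output instead of the default side-by-side display:
difft --display inline before.rs after.rs
```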
JNRowe · 2 years ago
That kinda feels like how I review with vim.

* A little scripting around opening the PR, which basically performs a "vimdiff <(git show baseref:file) file"-style dance on the changes (see :h diff). Using vim's tabs is great for this as they're really only views, so you can hold individual buffers open in distinct states at the same time.

* Scroll locking still works as expected in the main view, but you can avoid it in a separate tab when needed.

* [c and ]c move between hunks from the set of changes as they exist in the PR not in the working directory.

* dp and dg allow you to mark hunks as "done" by pushing/pulling the hunk in to the read-only diff buffer so that they're now hidden from the highlighted changes in the live buffer.

* Changes you make in the live buffer are available to commit directly, or push as a comment.

* All your regular editor things work as expected in the current state of the tree; go to definition, build integration, popup docs, etc.

Working like this means you're viewing changes against the PR's base, but have a clean working directory. That, to me, feels like a significant improvement over matklad's solution of having the working directory be in an unclean state to view the changes.
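A bare-bones sketch of that per-file dance, assuming bash and that the PR's base is reachable as `origin/main` (the real tooling described here wraps much more around this):

```shell
#!/usr/bin/env bash
# For each file the PR touches, diff the working copy against its base version.
BASE=${1:-origin/main}
git diff --name-only "$BASE"...HEAD | while read -r f; do
    # <(...) hands vim a read-only buffer holding the base version of the file.
    vimdiff <(git show "$BASE:$f") "$f"
done
```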

The environment I work in makes this behaviour super nice as changes will often be added with a --fixup commit, and then the tooling mangles them back together with a git-interpret-trailers call to attribute the fixup commit's author to the original commit at merge time. It also pulls text comments out of the PR and attaches a Reviewed-by trailer where appropriate, or the +1 equivalent to tack an Acked-by trailer on.

kerneis · 2 years ago
What do you mean exactly by "push as a comment" and "pulls text comments"? Is it some sort of custom logic specific to your work place?
JNRowe · 2 years ago
Oops, poorly worded and using internal terminology. I'll try again ;)

We use some internal tooling to make things work, but the concepts are generic git and $forge.

Push a comment: I'll make a hand-wavy suggested edit, then a vim mapping basically performs a ":`<,`>w !curl". If you only used GitHub, then piping to something like "gh pr comment"¹ could perform a similar-ish role (albeit a little weaker).

Pull comments: We automate merges so that PR text (including replies) is available in the repo. For the general discussion they're attached as notes², and for code comments (including --fixup commits) they're processed to assign the correct attribution via trailers³ to the original commit at merge time. Most of the attribution functionality could be re-implemented with "git rebase --autosquash" and by providing a "git interpret-trailers" wrapper as $EDITOR.

¹ https://cli.github.com/

² https://git-scm.com/docs/git-notes

³ https://git-scm.com/docs/git-interpret-trailers
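The trailer mechanics themselves are plain Git; a standalone sketch with made-up names:

```shell
# Append a Reviewed-by trailer to a commit message read from stdin;
# the message is echoed back with the trailer added after a blank line.
printf 'Fix cache invalidation\n\nLonger description.\n' |
  git interpret-trailers --trailer 'Reviewed-by: Alice <alice@example.com>'
```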

iot_devs · 2 years ago
This is an huge issue for us in a very big and complex codebase with a lot of engineers working on it.

Code review is hard because the diff always looks reasonable, the tests always pass and all the basic stuff is always checked.

However, it happens often that, even if the changes look reasonable, they are wrong.

The whole architecture may drift after one bad change that looks reasonable.

As always this is not strictly a problem with the tooling, but more of culture and knowledge sharing.

And we are not going to solve it with a better tool.

zmj · 2 years ago
> the tests always pass

> it happens often that, even if the changes looks reasonable they are wrong

I’m having trouble reconciling these two statements.

When I review code, I’m rarely looking for bugs. Instead I’m looking for tests that would catch those bugs.

thfuran · 2 years ago
Do you have tests that verify that an abstraction hasn't sprung a leak?
talent_deprived · 2 years ago
A somewhat minor nitpick: the word huge begins with a consonant sound, not a vowel sound. It would be correct to write "a huge issue", not "an huge issue".
beaugunderson · 2 years ago
depends if you pronounce it 'uge or not ;P

(which is why you see some people write an history, because of the 'istory pronunciation)

lelanthran · 2 years ago
I'm probably missing something here: author says that the split diff doesn't work for him, but doesn't say why.

His ideal diff is pretty much the same as a split-diff but with redundant context removed (context is only on the left, not on both left and right). What utility does he get out of removing redundant context on the RHS of the pane?

klauserc · 2 years ago
The author writes

> I need to run tests, use goto definition and other editor navigation features, apply local changes to check if some things could have been written differently, look at the wider context to notice things that should have been changed, and in general notice anything that might be not quite right with the codebase, irrespective of the historical path to the current state of the code.

The editors/IDEs I've used usually only offer rudimentary support in diffs (goto defn, but only in the same file; maybe auto-completion, but usually not for newly added items; no refactoring functionality)

lelanthran · 2 years ago
> The editors/IDEs I've used usually only offer rudimentary support in diffs (goto defn, but only in the same file; maybe auto-completion, but usually not for newly added items; no refactoring functionality)

Sure, but (to me it seems that) that doesn't necessitate the display change he is advocating for.

A diff program that has nice code navigation (and/or other IDE features) would be great, but what does that have to do with whether it displays the common split-diff or his take on the split-diff?

pama · 2 years ago
Maybe time to try Emacs? The diff support is great. And magit is native to Emacs.
psd1 · 2 years ago
I'm early in my emacs journey, but used VSCode for years. The Github extension is absolutely amazing for pull requests, since you have all your LSP and regex code navigation goodness. I only used it for a small percentage of reviews, but it made it vastly easier to grok the impact of every change.
CraigJPerry · 2 years ago
One area where vscode+gitlens beats IntelliJ: the plugin adds a fully featured, editable RHS to the diff view
IshKebab · 2 years ago
The author is the author of the Rust VSCode extension so I'm guessing he is familiar with VSCode.

If you use Gitlens's "compare working tree with..." you get the split view and you can edit the "current" version and all IDE tools work.

friendzis · 2 years ago
LHS shows current code, without any modifications. By showing full context in one view (with code browsing capabilities!) and changes in the other, the author is able to browse the code and see what changes, if any, are applied in a particular context.

In my understanding the author wants to flip the diff around: instead of looking at changes themselves, the author wants to look at code and see if there are any associated changes.

The article is light on detail, but my guess would be that author wants to look at code and browse to implementation/callsite to check if appropriate changes are there.

lelanthran · 2 years ago
> LHS shows current code, without any modifications. By showing full context in one view (with code browsing capabilities!) and changes in the other, the author is able to browse the code and see what changes, if any, are applied in a particular context.

That's the bit I don't understand - how does the "changes in the other" help if displayed only as the unified snippet[1]? To my mind, the magic sauce bit is the full code navigation, and there's no reason that the RHS pane has to be limited to a unified diff when it can simply show the whole file with highlighted lines, the way split diff views do on the RHS.

[1] Perhaps (and I'm only guessing here), that the RHS must show the entire unified diff for the entire changeset, and not just the unified diff for the current file. To me, that makes the proposal an upgrade from "either show unified diff, or split-diff, file-by-file".

exclipy · 2 years ago
I'm not the author, but I can empathise with them.

The diff highlighting can be surprisingly distracting and it can definitely help readability to turn it off. I certainly do turn it off from time to time in my diff viewer so I can see the code with fresh eyes, but I don't have the option to show the unified diff in a split pane.

phailhaus · 2 years ago
If you hit `.` in GitHub, you'll get dropped into a full IDE inside the browser. I've found this to be invaluable for reviews, because it lets you see the changes within the context of the entire file, rather than just seeing snippets. I'm much more likely to catch subtle design issues that way.
flanbiscuit · 2 years ago
ok this is really cool, thanks for the tip! I've been in this IDE before but only when making edits to files directly in Github. I did not know I could shortcut to it directly from a commit diff.

By the way, this shortcut also works in Gitlab (just tried it)

jtreminio · 2 years ago
This is game changing, thank you for telling us this.
Offpics · 2 years ago
That's bonkers, thanks
buro9 · 2 years ago
I miss some of the capabilities from p4merge, such as the split diff having a margin in between showing which parts of the file correspond to the other file, while also showing the impact of the diff on both sides: https://www.perforce.com/manuals/p4merge/Content/P4Merge/dif...

I also like their 3-way merge capability https://www.perforce.com/manuals/p4merge/Content/P4Merge/dif...

these features never made it into the web-based diff tools that are widespread; I think the 3-way diff is a good way to show the result of a difficult merge
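On the git side, the closest built-in is the diff3 conflict style, which at least shows the merge base inside conflict markers:

```shell
# Show ours / base ("ancestor") / theirs in merge conflict markers.
git config --global merge.conflictstyle diff3
# Git 2.35+ also offers zdiff3, which moves lines common to both sides out of the markers.
```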

jakub_g · 2 years ago
FYI if using GitHub, on github.com pull request page, press `.`, or change domain name to github.dev.

This will open VSCode in browser, with the pull request in diff view.

Advantages:

- you see whole files there

- diff algo is different than on github.com, sometimes more readable for complex diffs

inferiorhuman · 2 years ago
Disadvantages:

* S-L-O-W

* Glitchy

Which is a travesty because a side by side diff view with an out-of-line list of PR comments could be incredibly useful if it didn't involve spinning up a whole VSCode instance in the browser. In fact it's so useful that GH used to have a split view available without forcing people into their buzzword AI ML crypto blockchain cloud enabled dev environment nonsense.

Good idea, abysmal execution (which pretty much sums up all of GH these days I suppose).

jakub_g · 2 years ago
GH still has a split view available. Above the diff, there is a settings icon which shows a dropdown where you can change between unified and split diff.
vasergen · 2 years ago
thanks for the tip