Unfortunately GitHub doesn't have a way to render symbols for whitespace, but you can tell by selecting the spaces that the previous version had leading tabs. Linus changed it so that the token `default` and the number (e.g. `12`) are also separated by a tab. This is tricky: because the token "default" is seven characters long, the added tab always has a width of one character, so it lays out exactly as a space would, whether you use tab widths of 1, 2, 4, or 8.
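The width claim can be checked with a small Python sketch that computes how wide each tab renders for a given tab size. It assumes the changed line is one leading tab, then `default`, a tab, and the value `12`; `tab_widths` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
def tab_widths(line: str, tabsize: int) -> list[int]:
    """Rendered width of each tab in `line`, with tab stops every `tabsize` columns."""
    widths = []
    column = 0
    for ch in line:
        if ch == "\t":
            # A tab advances to the next multiple of `tabsize`.
            width = tabsize - column % tabsize
            widths.append(width)
            column += width
        else:
            column += 1
    return widths

# Assumed shape of the changed Kconfig line.
line = "\tdefault\t12"

for size in (1, 2, 4, 8):
    print(size, tab_widths(line, size), repr(line.expandtabs(size)))

# The second tab is 1 column wide for every size above;
# with a tab size of 3 it would be 2 columns wide.
print(tab_widths(line, 3))  # [3, 2]
```

The leading tab always ends on a tab stop, and seven characters later the next stop is exactly one column away whenever the tab size divides 8.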
This gets to a dimension of the problem that is often overlooked: Git web viewers, like every other code viewer we use, carry their own notion of where the tab stops are.
Notably, this includes CLI shells connected to a "terminal emulator", where what is being emulated is an ancient piece of hardware, the Teletype Model 33.
A far-downstream consequence of this is that source code formatted on the assumption that tab stops are at intervals other than 8 columns, as is not uncommon in JavaScript, produces unreadable CLI output from diff, git-diff, ...
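A minimal Python sketch of that failure, using `str.expandtabs` to stand in for how a viewer renders tabs. The two lines are a hypothetical JavaScript object literal whose author aligned the values with tabs in an editor using 4-column tab stops; `value_column` is an illustrative helper.

```python
def value_column(line: str, tabsize: int) -> int:
    """Column where the text after the last tab starts, for a given tab size."""
    prefix = line[: line.rindex("\t") + 1]
    return len(prefix.expandtabs(tabsize))

# Hypothetical JavaScript lines, aligned for tab stops every 4 columns.
lines = ["\tx:\t\t1,", "\tcount:\t2,"]

for size in (4, 8):
    # Same columns means the values line up; different columns means they don't.
    print(size, [value_column(l, size) for l in lines])
    for l in lines:
        print(l.expandtabs(size))
```

With a tab size of 4 both values start at column 12; at the terminal default of 8 they land at columns 24 and 16, so the "aligned" block comes out ragged.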
Parser fails to parse data --> Fixed by modifying ingested data
It's just that the ingested data is part of the Linux kernel codebase.
Quite some hubris to go ahead and apply such a "fix" by making a commit to the Linux kernel...
Arguably it would be fine if the community benefited from the parser. After all, it's just a custom, undocumented format that lives only in this one repo.
But I totally sympathise with Linus' annoyance when the issue is with an external tool, and the author didn't explain which tool or give any reason why it's hard to fix that tool.
You can't use “hubris” as a pejorative and go on to claim that Linus is the good guy. There’s clearly hubris on both sides, and only one of the sides made a big deal about it.
Developers change things for parsers all the time. That’s, like, coding.
It's both Linus's plain arbitrary right, and his plain job, his defined role and office, to make exactly such decisions for this project. That's not hubris. It's just a role that affects a lot of people.
What makes it hubris on one side is "Who do you think you are making such a change to the Linux kernel that everyone else will have to accept?"
The reason things like that are phrased as questions is to allow for the possibility that there might be an answer.
For one of these parties, the question is rhetorical.
For one of these parties, the question is not rhetorical.
It's hubris to think that a parser failing to identify whitespace properly warrants changing code in the KERNEL of arguably the most widespread operating system in the world.
And THEN not even providing more justification for this in the description.
I'm not claiming anywhere that Linus is the good guy. Or bad guy.
In this case I agree with his stance that whatever this parser is, it had better fail harder so that it gets fixed.
And if you know how he reacts when he's making a big deal of something, you know that this one isn't one of those times...
There are only two ways to write code where the lines of text stay legible regardless of the tab-size setting:
1. Use all spaces
2. Use tabs for indentation and spaces for alignment
Unfortunately, only individual developers seem competent on their own to do #2, so everyone who cares about readability inevitably practices #1 by default.
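Why option 2 survives any tab width can be shown with a short Python sketch, again using `str.expandtabs` as the renderer. The `foo(first, second)` call is a hypothetical example: one version indents with a tab and aligns with spaces, the other uses tabs for both.

```python
def column_of(line: str, text: str, tabsize: int) -> int:
    """Display column at which `text` starts in `line` for a given tab size."""
    return line.expandtabs(tabsize).index(text)

def aligned(lines: list[str], tabsize: int) -> bool:
    """True if `second` renders directly under `first`."""
    return column_of(lines[0], "first", tabsize) == column_of(lines[1], "second", tabsize)

smart = ["\tfoo(first,", "\t    second)"]   # tab for indentation, spaces for alignment
all_tabs = ["\tfoo(first,", "\t\tsecond)"]  # tabs used for alignment too

for size in (2, 4, 8):
    print(size, aligned(smart, size), aligned(all_tabs, size))
```

The tab-plus-spaces version lines up at every size, because the alignment spaces are counted from the same tab stop on both lines; the all-tabs version only lines up at size 4.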
A special case of (2) that is easy to do is to use tabs* for indentation and not do column alignment at all. To be clear, by "column alignment" I think we are both referring to patterns like this:
This is, e.g., what Go uses for struct fields, and what some Python style guides use for hanging function definitions. Regardless of tabs/spaces preference, both of these are independently bad because they churn diffs unnecessarily: if you change `afterward` to `afterward2` then you need to change all the nearby lines, and likewise if you change `my_long_function` to `my_longer_function`. Some formatters, like Black, Prettier, and (mostly) Rustfmt, avoid this pattern entirely, and they are better for it.
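The diff churn can be measured with Python's `difflib`. This is a sketch with a hypothetical three-argument function: renaming `my_long_function` to `my_longer_function` is diffed once with arguments aligned to the open parenthesis and once with a plain one-level indent.

```python
import difflib

def changed_lines(before: list[str], after: list[str]) -> int:
    """Number of original lines a unified diff marks as removed/changed."""
    return sum(
        1
        for line in difflib.unified_diff(before, after, lineterm="")
        if line.startswith("-") and not line.startswith("---")
    )

def aligned_signature(name: str) -> list[str]:
    # Continuation lines padded to sit under the first argument.
    head = f"def {name}("
    pad = " " * len(head)
    return [head + "first_arg,", pad + "second_arg,", pad + "third_arg):"]

def indented_signature(name: str) -> list[str]:
    # Hanging indent of a fixed width, no column alignment.
    return [f"def {name}(", "    first_arg,", "    second_arg,", "    third_arg,", "):"]

print(changed_lines(aligned_signature("my_long_function"),
                    aligned_signature("my_longer_function")))   # 3
print(changed_lines(indented_signature("my_long_function"),
                    indented_signature("my_longer_function")))  # 1
```

The aligned form touches every line of the signature for a one-word rename; the fixed-indent form touches only the line that actually changed.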
* You can do this and still use spaces if you prefer, too.
> Use tabs for indentation and spaces for alignment
I'm not surprised that this isn't something that projects have been able to adopt successfully very often because I've never found it very intuitive that those are separate things. In what way is "indentation" not also a form of "alignment"?
The other problem with it is that it assumes that people have visible whitespace on, and that their tools even have that option to show whitespace, otherwise it’s like navigating the Fuchsia Gym.
I don’t mind if people use tabs but mixing the two is not great.
Aligning a character on one line with an arbitrary character on another line is purely a choice of style, not a requirement.
It is perfectly doable to do only tabs, but many end up mixing in spaces.
The curse of space-only files is people who manage to commit indentation errors, which break auto-detection in some editors, which in turn propagates even more indentation errors... All it takes is an inattentive reviewer, or a review-less merge.
Readability of the code is not mere style, and can directly translate into errors being more visible. Compare:
```python
a_variable = (
    'lorem ipsum dolor sit amet ' +
    'my poor memory has left me quite upset ' +
    'for i cannot remember what word comes next '
    'in this long descriptive text' +
    'surely this is bound for the incinerator ' +
    'but remember any haiku can end, refrigerator.'
)
```
But now if I choose to align certain characters:
```python
a_variable = (
      'lorem ipsum dolor sit amet'
    + ' my poor memory has left me quite upset'
    + ' for i cannot remember what word comes next'
      ' in this long descriptive text'
    + 'surely this is bound for the incinerator '
    + ' but remember any haiku can end, refrigerator.'
)
```
… the errors in the first version are now plainly obvious. (Both the missed space, as well as the missed +.)
(This is an example. Yes, there are languages for which you don't need the +. There are some for which you do, however. There are also some that resist having the + moved about: for example, in JavaScript, the parens become required, or you'll trigger the horrid auto-semicolon "feature".)
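Python is one of the languages where the missing `+` is not even an error: adjacent string literals are joined at compile time. A quick check of the first version (with "lorem" spelled conventionally) shows both bugs passing silently:

```python
a_variable = (
    'lorem ipsum dolor sit amet ' +
    'my poor memory has left me quite upset ' +
    'for i cannot remember what word comes next '
    'in this long descriptive text' +
    'surely this is bound for the incinerator ' +
    'but remember any haiku can end, refrigerator.'
)

# Implicit literal concatenation: no `+` needed, so no error is raised.
print("next in" in a_variable)     # True
# The missing trailing space runs two words together.
print("textsurely" in a_variable)  # True
```

Nothing fails at runtime, which is why making the mistake visible in the layout matters.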
> Use tabs for indentation and spaces for alignment
The pain is that most websites and editors never handled that well enough. You end up with mixed tabs and spaces in unexpected positions and never know about it.
Just banning tabs is probably not the most 'correct' fix, but it is the most feasible way to get the job done, because fixing all the tools, editors and websites is nearly impossible for the average person.
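Invisible mixing can at least be caught mechanically. The sketch below (a hypothetical `suspicious_indentation` helper, not an existing tool) flags lines whose leading whitespace has a space before a tab, which is wrong under either convention, while leaving tab-then-spaces alignment alone. Git has a comparable built-in check, the `space-before-tab` class of `core.whitespace`.

```python
def suspicious_indentation(lines: list[str]) -> list[int]:
    """1-based numbers of lines whose leading whitespace has a space before a tab."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        # In a run of only spaces and tabs, any space before a tab
        # implies an adjacent " \t" pair somewhere in the run.
        if " \t" in indent:
            flagged.append(number)
    return flagged

sample = [
    "\tif (x) {",
    "\t\treturn 1;",
    "  \treturn 2;",    # spaces, then a tab: flagged
    "\t    aligned,",   # tab indent plus space alignment: fine
]
print(suspicious_indentation(sample))  # [3]
```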
Thankfully there's a trend of excessive merge conflicts caused by reformatters.
Reformat-on-every-commit really only works for highly-centralized, tightly-coupled, monorepo-using monolithic organizations. Basically the exact opposite of kerneldev. For those folks reformat-on-every-commit works great.
What do you mean by reformat? Any decent code formatter keeps a consistent style. Conflicts only happen if you misconfigure your editor or don't have checks to catch invalid formatting before merging to remote.
It's much worse for monolithic organizations, because they develop complex software, and reformatting scrambles the code history, making it difficult to untangle business logic.
Actually, what we want is code management tools that work with tokenized code and do not depend on formatting. I want a diff tool that shows me exactly which tokens have changed and which haven't, regardless of how they are laid out. When we get that, we should get even fewer merge conflicts.
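A toy version of such a diff is easy to sketch for Python source, using the standard `tokenize` module as the tokenizer and `difflib.SequenceMatcher` to compare token sequences. This only illustrates the idea for one language; the helper names are hypothetical, and a real tool would need a tokenizer per language.

```python
import difflib
import io
import tokenize

# Pure layout: line breaks inside brackets, and the end marker.
_SKIP = {tokenize.NL, tokenize.ENDMARKER}
# Structural tokens whose text is whitespace; compare them by kind only.
_BY_KIND = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}

def tokens(source: str) -> list[str]:
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        out.append(tokenize.tok_name[tok.type] if tok.type in _BY_KIND else tok.string)
    return out

def token_diff(old: str, new: str) -> list[tuple[str, list[str], list[str]]]:
    """Non-equal token runs between two versions, ignoring layout."""
    a, b = tokens(old), tokens(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (op, a[i1:i2], b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]

before = "total = compute(alpha,\n                beta)\n"
reflowed = "total = compute(alpha, beta)\n"
renamed = "total = compute(alpha, gamma)\n"

print(token_diff(before, reflowed))  # []
print(token_diff(before, renamed))   # [('replace', ['beta'], ['gamma'])]
```

Reflowing the call produces an empty diff, while an actual edit shows up as a single token replacement.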
My bad, I should have added "/s" (because Cunningham's Law). It was a reference to Futurama, where a problem was not solved at all.
But on a more serious note, in my experience I've not had any issues with Go or Rust codebases (for example). Not using their formatters is heavily frowned upon, so I haven't really seen any reformatting happen at all; not in my bubble, at least[1].
Other languages, on the other hand? Yeah, good luck with trying to have consistent formatting. Even if a project has formatting rules "enforced", there's always (always) going to be an exception, bikeshedding, etc.
[1]: Unless it's someone obviously very junior. The few times I've noticed badly formatted Go code have been in random repos from someone who clearly didn't have much programming experience in general (judging by how the code was written).
It's good that Linus is really exercising those 3rd party tools! They should send some money his way for helping them test their code.
This web view renders tabs as spaces, so it's not possible to see what was changed.
https://github.com/torvalds/linux/commit/d5cf50dafc9dd5faa1e...
https://github.com/torvalds/linux/blob/d5cf50dafc9dd5faa1e61...
The "ancient piece of hardware" that terminal emulators emulate:
https://en.wikipedia.org/wiki/Teletype_Model_33
Semantics of the ASCII tab byte code:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1...
Unfortunately until somebody spends the time to create a Kconfig test suite, the kernel itself needs to be the test case for this oddity.
https://docs.rs/nom-kconfig/latest/nom_kconfig/
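As a small example of the kind of case such a test suite would pin down, here is a whitespace-tolerant sketch for the `default <expr> [if <expr>]` line, written in Python rather than with the Rust crate linked above. `parse_default` is hypothetical; it splits on any run of spaces or tabs and does not handle quoted string values that contain whitespace.

```python
from typing import Optional, Tuple

def parse_default(line: str) -> Tuple[str, Optional[str]]:
    """Return (value, condition) for a Kconfig-style `default` line.

    Any run of spaces or tabs separates tokens. Quoted values
    containing whitespace are NOT handled by this sketch.
    """
    fields = line.split()  # str.split() with no argument splits on any whitespace
    if not fields or fields[0] != "default":
        raise ValueError(f"not a default line: {line!r}")
    rest = fields[1:]
    if "if" in rest:
        i = rest.index("if")
        return " ".join(rest[:i]), " ".join(rest[i + 1:])
    return " ".join(rest), None

# Tab- and space-separated variants should all parse the same way.
for text in ("\tdefault\t12", "\tdefault 12", "    default   12 if !BASE_SMALL"):
    print(repr(text), parse_default(text))
```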
You can never use only tabs.
Sure, but could I not say the same of using any indentation at all?
- If you have project-wide automatic code formatting: Tabs
- Otherwise: Spaces
Nowadays, most of my projects use option 1.
What do you mean?
> You can never use only tabs.
You can.