> we calculate the swear factor as the number of swearwords divided by the lines of code
That's what I suspected. Assuming that most swear words will be contained in comments, what this is actually measuring is the ratio of comments to code. In other words, code that is more heavily commented is better.
I think we already knew this.
That said I would like to see a more critical analysis. First control for comment density. Then compare code quality to swearing in comments and also variable names.
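To make the "control for comment density" suggestion concrete, here is a rough sketch of how the swear factor and comment density could be measured separately. The word list, the `//`-only comment detection, and the names `SWEAR_WORDS`, `split_comment`, and `metrics` are all assumptions for illustration; none of this is taken from the thesis.

```python
import re

# Hypothetical word list for illustration only; the thesis' actual list is not shown here.
SWEAR_WORDS = {"damn", "crap", "wtf"}

WORD_RE = re.compile(r"[a-z]+")


def split_comment(line: str) -> tuple[str, str]:
    """Split a line at the first '//' into (code, comment).

    Deliberately naive: ignores /* ... */ blocks and '//' inside string literals.
    """
    code, _, comment = line.partition("//")
    return code, comment


def count_swears(text: str) -> int:
    """Count alphabetic tokens that appear in SWEAR_WORDS (case-insensitive)."""
    return sum(1 for word in WORD_RE.findall(text.lower()) if word in SWEAR_WORDS)


def metrics(source: str) -> dict[str, float]:
    """Return the swear factor plus the numbers needed to control for comments."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    total = len(lines)
    comment_lines = swears_in_comments = swears_in_code = 0
    for ln in lines:
        code, comment = split_comment(ln)
        if comment.strip():
            comment_lines += 1
        swears_in_comments += count_swears(comment)
        swears_in_code += count_swears(code)  # identifiers such as crap_tolerance
    return {
        # Swearwords divided by lines of code, as quoted above.
        "swear_factor": (swears_in_comments + swears_in_code) / total if total else 0.0,
        # Share of non-blank lines that carry a comment.
        "comment_density": comment_lines / total if total else 0.0,
        # Swearing normalised by comments rather than by all lines.
        "swears_per_comment_line": swears_in_comments / comment_lines if comment_lines else 0.0,
        "swears_in_code": float(swears_in_code),
    }


sample = """\
int crap_tolerance = 3;  // threshold
// this damn parser is crap
x = parse(y);
"""
m = metrics(sample)
print(m)
```

Comparing `swears_per_comment_line` against code quality, instead of `swear_factor`, would be one way to separate "swears more" from "comments more".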
It is anecdata, but I can confirm this is the case for my code.
I tend to focus more on documenting the surprising code paths, not the mundane. And when my code needs to do something special because some other component (library, hardware, API) has issues, there's usually some colourful language describing the sad state of the world outside my control.
Also, any mature, well-maintained code will have found a lot more of those truly ‘wtf’ bugs and edge cases, which often involve the same colourful language when we finally figure them out.
>...code that is more heavily commented is better.
>I think we already knew this
Who's "we"?
In my many years of software development, I've found a very large fraction of developers use very few, or even zero comments, and it's getting worse. Just look at the posts below here: there's a bunch of people arguing that comments are useless or harmful. It's no wonder that software sucks so much these days, since apparently no one believes in documentation or code maintenance any more.
> It's no wonder that software sucks so much these days, since apparently no one believes in documentation or code maintenance any more.
I think this comment explains why software gets worse in many cases:
> Of course nowadays, this is legacy nonsense. Everything uses UTF-8 for "char", and what doesn't is broken and terrible anyway. But the old ways stayed with us, and the stupidity of it as well.
The problem is the "legacy nonsense" tends to accumulate over time & as people depend on it, takes a long time to finally remove.
> They are so hilariously misdesigned and insufficient, I can't even fathom how this shit was _standardized_.
They did their best given their circumstances & abilities. Now we must forever pay the price.
> Several decades later, the moronic standard committees noticed that this was (still is) kind of a bad situation. Instead of fixing the situation, they added more garbage on top of it. (Probably for the sake of "compatibility").
At least they tried...
> All in all, I believe this proves that software developers as a whole and as a culture produce worse results than drug addicted butt fucked monkeys randomly hacking on typewriters while inhaling the fumes of a radioactive dumpster fire fueled by chinese platsic toys for children and Elton John/Justin Bieber crossover CDs for all eternity.
> In other words, code that is more heavily commented is better.
It could also be that understanding code in any non-trivial project is likely to back the developer into a corner where they become frustrated and swear at the computer.
More importantly, the lack of swearing might be a sign that the devs lack the competence to know when they are cornered.
I think anger is a sign the developer actually cares about what they are doing. In my experience, people who don't care aren't at all irritated by the imperfections of the software they have to use; they just accept it, slay their dragon and move on. People who care tend to get very angry about what are ultimately philosophical matters.
He doesn't seem to use the swear factor anywhere. The actual statistical comparison (Table 3.1) is simply mean SoftWipe score of repos with swears (5.87) vs. mean SoftWipe score of repos with 4+ stars (5.41). The increase is due to 2-3 clusters of swear repos with SoftWipe score ~7.5 and ~20k lines of code. It seems like he deduplicated the repos based on URL, not content, and GitHub could have biased the results returned in its search, so I wonder if it is simply sample bias.
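A quick sketch of why URL-based deduplication could matter: if forks or copies of the same code show up under different URLs, a cluster of identical repos can pull the mean around. The repos, scores, and helper names below are invented toy data, not figures from the thesis.

```python
import hashlib
from statistics import mean


def content_digest(files: dict[str, str]) -> str:
    """Hash paths and contents in sorted order so identical copies get the same digest."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode())
        h.update(b"\0")
        h.update(files[path].encode())
        h.update(b"\0")
    return h.hexdigest()


def dedupe_by_content(repos: list[dict]) -> list[dict]:
    """Keep the first repo seen for each distinct content digest."""
    seen: set[str] = set()
    unique = []
    for repo in repos:
        digest = content_digest(repo["files"])
        if digest not in seen:
            seen.add(digest)
            unique.append(repo)
    return unique


# Toy data, invented for illustration only: two URLs carrying identical code.
repos = [
    {"url": "https://example.com/a/tool", "score": 7.5,
     "files": {"main.c": "int main(void){return 0;}"}},
    {"url": "https://example.com/b/tool-fork", "score": 7.5,
     "files": {"main.c": "int main(void){return 0;}"}},
    {"url": "https://example.com/c/other", "score": 4.5,
     "files": {"main.c": "int main(void){return 1;}"}},
]

unique = dedupe_by_content(repos)
mean_by_url = mean(r["score"] for r in repos)       # every URL counted once
mean_by_content = mean(r["score"] for r in unique)  # every distinct codebase counted once
print(mean_by_url, mean_by_content)
```

Whether this actually explains the reported gap is unknown; it only shows the mechanism by which duplicated content could shift a mean.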
Isn’t commented code no longer considered a good idea in most companies?
I used to work for a bank and the policy was no comments unless absolutely necessary, because comments become out of date. Doxygen comments were the only real comments allowed.
It really depends on the comments. Best practice is to comment “why” something was done, not “what” is being done, or “how”. Every programmer can read code, most code should be pretty self-explanatory. But in any sufficiently complex routine, there are going to be some things that a programmer struggled to get working the first time, that they had to work around, or that was simply unintuitive. These should be commented. Generally I’ve found uncommented code bases to be thrown together haphazardly and to be of lower quality. Same with those that only trivially leave comment breadcrumbs about what is being done, duplicating the code itself.
No, that is stupid. People just don't want you to write comments like
// Set foo to true
foo = true;
Somebody saw one too many comments like that and overreacted. As long as you a) don't write comments describing what is self evident from the code, and b) try to make the code as descriptive as possible, then it's fine. Comment away.
Inline comments are a reflection on the authors' abilities to write good comments. They can be kinda useless, actually-bad, or really helpful.
One canonical example of a "good comment" is explaining why a strange or not-the-least-complex approach was taken to implementing a certain solution. The code is like Chesterton's fence, and the comment is a post explaining why it's there. That way, future readers can better assess for themselves whether it's worth their time trying to tear down the fence.
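For illustration, a tiny made-up case contrasting a "what" comment with a "why" comment. The function name and the legacy config writer it mentions are hypothetical.

```python
def parse_port(raw: str) -> int:
    """Parse a port number from a config value."""
    # A "what" comment would say: strip the string and convert it to int.
    # The "why" comment is the fence post: a (hypothetical) legacy config
    # writer pads ports with spaces and appends a colon, e.g. " 8080:",
    # so plain int(raw) would raise ValueError on files it produced.
    return int(raw.strip().rstrip(":"))


print(parse_port(" 8080:"))
```

Without the second comment, the extra `rstrip(":")` looks like noise a future reader might delete.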
No-comments are better than low-effort, low-quality, unmaintained comments, for sure.
You can imagine a world where all the projects that aren't realistically going to spend the effort on high-quality maintained comments make the correct choice to skip comments unless absolutely necessary. And where projects that are realistically going to put effort into high-quality, maintained comments, do so.
In this world, comment density would correlate highly with code quality per line of code. Profanity might not, I'm not sure. I do think you'd still find profanity in high-effort, high-quality, maintained comments, but it might indicate lower quality surrounding code, not higher.
And it would still be unclear whether the existence of comments is a cause of higher quality code, or just a proxy for amount of effort and care taken per line of code.
While I carefully keep others away from my code, the notes say I have a complicated relationship with my future self. Comments should be least useful to me, yet I've tried many formats and found that it is spectacular to have a full, elaborate comment above each bite-sized nugget of code. By a nugget I mean whatever was solved as a single thing after breaking down the larger problem.
The result is that I don't read any code at all. The whole thing is compiled to the native format that is human language. The code is great for illustration.
If I keep it in separate files as documentation it takes too much effort to find and update. It takes needless extra effort and is less precise.
It is just a personal preference of course but if one had any experience writing code in any language it should be easy to grasp say at 4 am while drunk.
I'm not sure the ratio of comments to LoC is a sign of good quality code.
Too many comments might actually be a bad thing. It's more lines to maintain, and sometimes the comments just tell what the code is doing where there is no need to.
If you have a process where every commit is well documented, you don't need many comments since you can rely on whatever is your analogue for git blame. It's not a lack of comments; it's actually the opposite, just kept outside the code base.
When I worked at SAP where VCS for ABAP is ancient and has no analogue for git blame we had a practice of putting a SAP Note next to every code change, since some of the things that we had to implement are dictated by business/legislation, so you need a proper explanation from time to time. Without it, the code becomes unmaintainable.
I get where they are going with it - every block of code should probably be obvious and final since it has one item to do well. Unfortunately there are always times when n random fields will be used for a conditional that is completely non-obvious. Comments will always be necessary to some degree.
That's why three developers have left our company, and I think a fourth will, after the original developer left behind undocumented and hard-to-understand code (it's pretty complex and has tons of hacks to work on different OSes). They spend their first year learning what the hell that code is and how it works, and they are not allowed to comment anything (I asked a few times why we waste so much time on basic stuff; they said it is unnecessary... ok I guess). Now they have hired the original developer for tons of money just to consult with the newest developer and explain the code to him. Reasonable I guess.
I never write comments, and I think they typically have a negative correlation to quality (this code is so f**ed up, plain English is, of all things, more clear and precise than code!). Unless you are releasing code publicly, and you are documenting the public API, I've never seen a valuable comment. I've seen plenty of harmful comments. However I wholeheartedly endorse using this otherwise useless but common language feature for swearing.
I think swearing in comments indicates you are unburdened by bureaucracy and pointy haired bosses (because they prohibit such things), which would certainly lead to better code.
Possible explanation: swearing is more likely to be committed into code by people who either (1) own the code, or (2) know they're too valuable to be punished. So it self-selects.
I personally have very different commenting styles between my work and personal projects. Not that any of it's good.
Alternative explanation in the same vein as your theory:
The cognitive and time cost of compliance for language policing takes away from valuable programming and planning involved in developing solutions. (i.e. "banned words" [swear words] and politicized words [whitelist/blacklist, etc.])
Another possibility is the people who don't want to deal with that are gone and we're seeing a loss of their contributions.
I'd bet a lot of the non-profanity code is people open sourcing code just to be impressive on resumes or for school, where the profanity code is probably real code.
Sounds likely to be a classic case of correlation != causation
Rorschach test for programmers: give your confident gut feeling explanation for this phenomenon.
I'll do mine: there's likely a correlation between needing to maintain a professional conduct which includes forgoing foul language (you're programming at work) and writing code under time pressure where getting a product ready for release is more important than strict adherence to clean programming practice (you're programming at work).
Take almost any two things like this and you're actually virtually guaranteed to draw out some weak, but quite likely statistically significant, correlation.
What lies behind that correlation is probably an entropic mishmash of so many factors that it defies human explanation, and also defies any attempt to try to "harness" the forces that seem to appear. It could be that all the siblings to the comment are right all at once.
I'll cop to just glancing at the graphs, but they don't look out of line for this effect to me intuitively.
Also backing this is that more-or-less the same article/thesis could easily have been written for the opposite correlation.
My gut feeling: when you start to submit swear words in your code, it indicates that you "breathe" the code and know it in and out.
The other extreme: if you have no idea what you are doing, you might try to mimic "corp speak" in your code to hide the fact that you actually have no clue.
In other words: it needs some confidence in your ability to assess some aspect of the code in order to use swear words.
This seems unlikely to be true in this case because the study was looking at GitHub projects, and it seems unlikely the sample had enough code from "uptight" workplaces to have an effect one way or another.
The developer who knows what they're doing is also more likely to be 1) overworked because they do much of the useful stuff and 2) cognizant of bureaucracy which gets in the way of them doing useful stuff.
I remember there being a startup in the Dotcom era, I forget the name but for people familiar with Cambridge, MA it was where the IDEO is now. They were notorious for a few things, but one of them was writing open source software with a lot of profanity.
I thought this was cool, and was talking excitedly about it to my boss and some of the senior devs. They were less amused. Cut to 20 years later and I too am less impressed by this.
Not that I think it's *bad* per se, I'm not clutching pearls or anything. But I never find myself thinking what the code really needs are profanities in the comments. Whereas back then I thought it'd be funny/cool and went out of my way to do so when I could. Which wasn't often.
Swearing for the sake of it does look childish, yes. I've noticed that in a few streaming TV shows, where they've gotten so excited over a lack of censorship that they just end up looking like teenagers who still think saying "fuck" is an act of rebellion.
On the other hand, I'd like to write something like "this is a bit shit but will be replaced later" because that's how I naturally speak. Sanitising it to "crap" or "poor" just makes me feel like I'm teaching a youth club or something, and it is a minor pipeline stall in my train of thought while I do a mental synonym search.
I wonder if swearing can help "free the mind" in some way, with the "rebellion" opening up more, perhaps non-standard/out of the box, "fucking good" ideas?
I hear this, comments generally should not draw attention to themselves. For this, short & terse win. I routinely look to cut any unnecessary words from comments.
It was the most painful code review where I asked someone to remove a joke they wrote in the comments. It was a good joke, funny, short, in good taste, I loved it, but.. distracting and unnecessary.
I mean, I think the article is implying that. However, I think the bigger thing is that the correlation is misleading due to the sample being the long tail of GitHub projects, which I don't think is representative of "production" open source projects and certainly not software in general.
Nobody suggested causation. The idea that you can improve code quality by adding profane comments is so self-evidently absurd that nobody would even suggest such a thing. Except you kind of just did.
I skimmed the paper, and it looks like they are looking for swearing _anywhere_ in the repos' code, not just comments.
I would be curious to see the ratio of swearing in comments vs code identifiers. I'd also be curious to see if the repos with swearing in their comments just have more comments in total. Perhaps the correlation is, "code with more comments is more likely to be higher quality".
The jury is still out on whether I'm a good programmer, but I did one time need to use a hashmap that had to grow to about 100 GB in size. Because of that, I ended up calling it "bigassHashTable".
It makes me happy that it remained being called that for quite a while.
I remember a day at a previous job when our CEO came in and told us we weren't an early stage startup anymore and had to start acting like it. Remove profanity and inside jokes from the code, and no more Quake during lunch breaks. Morale took a big hit that day.
The best programmers I've worked with swore at their coworkers regularly, but never in their code.
They were not great people, and I'd happily kick them in the face if I would encounter no legal or professional repercussions, but, there definitely does seem to be some correlation (in my experience) between being abrasive and being a skilled programmer.
I'm sure the top comment here will be something like "this is invalid because no way can you assign a numerical value to code quality! wtf?!"
I'm withholding my own judgement on that.
For anyone curious, the authors are coming up with a code quality score using an open-source tool called SoftWipe[0]. From the paper:
> SoftWipe is an open source tool and benchmark to assess, rate, and review scientific software written in C or C++ with respect to coding standard adherence. The coding standard adherence is assessed using a set of static and dynamic code analysers such as Lizard (https://github.com/terryyin/lizard) or the Clang address sanitiser (https://clang.llvm.org/). It returns a score between 0 (low adherence) and 10 (good adherence). In order to simplify our experimental setup, we excluded the compilation warnings, which require a difficult to automate compilation of the assessed software, from the analysis using the --exclude-compilation option.
While at Sun in the early 2000s, I was part of the due diligence team for an acquisition and had two days to review the entire code base of a 3-year-old, 50-person software team.
This was standard practice, and the M&A policies knew that there was no way to actually understand all the code so there was a policy document to describe what to look for.
Of course the red flag things were unexpected 3rd party copyrights and/or license terms in case the code was encumbered.
But "swear words" were on the yellow flag list, in addition to "ToDo", "XXXX", and "Fix Me" types of things.
I remember thinking about places I have been in the past and that the people who used those styles of comments tended to be the better programmers.
I mentioned this to the person leading the evaluation, and was told that the point of noticing these kinds of comments was to look more closely at the nearby code and try to decide if major functionality was missing or being faked.
It all worked out for that acquisition, but I remember being curious about whatever deal had gone bad in the distant past that made them codify this specific practice.
Correlation is not causality. Swearing in the comments will not magically make your code better, but fixing a hidden bug that you have been chasing for weeks will certainly make you swear once it's fixed.
I'm fond of pointing out, despite every time I get downvoted, that causation is the thing we have no knowledge of, and therefore correlation is all we have. As Feynman said about gravity, there is no how or why to gravity, as far as we know it's simply a property of matter. But of course, that means we only know that because of the perfect correlation between matter and gravity, including every time we conduct an experiment about it; but still we have no cause to point to.
A reasonable working definition of causality, used by almost all scientists today, is that X causes Y if a change in X, unaccompanied by any other change, changes Y. At root, this is indeed a statement about correlations, but it's a special kind of correlation, which is hard to estimate from observational data where many other things may change along with X.
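As a toy illustration of that definition, the simulation below builds a world where a hidden factor drives both X and Y while X itself has no effect on Y. Observationally the two correlate; once X is set independently of everything else, the correlation vanishes. The model, variable names, and sample size are assumptions made up for this sketch and say nothing about the thesis data.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n, intervene, rng):
    """Return corr(X, Y) in a world where only the hidden Z influences Y."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # hidden confounder, e.g. "how much the dev cares"
        # Observational: X follows Z. Interventional: X is set independently of Z.
        x = rng.gauss(0, 1) if intervene else z + rng.gauss(0, 1)
        y = z + rng.gauss(0, 1)  # Y never depends on X
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)


rng = random.Random(0)
observational = simulate(20_000, intervene=False, rng=rng)
interventional = simulate(20_000, intervene=True, rng=rng)
print(f"observational r = {observational:.3f}, interventional r = {interventional:.3f}")
```

In this model the observational correlation is 0.5 in expectation and the interventional one is 0, which is the gap between "swearing correlates with quality" and "adding swearing improves quality".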
While that may be the case, the correlation coefficient of matter and gravity is so close to 1 that we can't tell the difference and the correlation coefficient of swearing in code to good code is far less.
Sounds like you're suggesting a causal relationship the other way, though. As per this explanation, putting effort into debugging edge cases will statistically cause the comments to swear more.
My pet theory is that this is because honest, emotional comments are much more useful than the usual “professional” style that tries to hide it when you have no clue what you’re doing.
When it’s clear someone was stuck, frustrated, banging their head against the wall etc while writing a particular bit of code, you can refactor a lot less defensively because you know the crappy parts weren’t secretly there for a reason.
I love real, honest, emotional comments. Pour all the frustration in there. Future you and your colleagues will thank you.
Yeah! Time to get back to work...
Credit to https://news.ycombinator.com/item?id=36626018 for pointing this out.
paper: https://cme.h-its.org/exelixis/pubs/JanThesis.pdf
I wouldn’t be surprised if code quality goes up with comment curses and down again with commit message curses.
You can really feel the author's rage at the state of the world.
Saying "control for comment density" presumes one knows how to even do that or how to even define it.
How do you decide that a given line of code or comment should weigh more or less than another?
If a codebase has both a lot of swear words and a lot of all other words, so what?
Code gets out of date as well, so let's just stop writing it altogether..
1. An odd business requirement (share the origin story)
2. It took research (summarize with links)
3. Multiple options were considered (justify decision)
4. Question in a code review (answer in a comment)
And the article on how/what/why in code: https://max.engineer/maintainable-code
"This is bullshit" is an important realization. If you can't say it, then things will stay miserable.
(But I concede that effort and productivity are not the same thing.)
Who's the narc on your team that would even point it out? It's not like HR has some commit hook on the repos filtering for this stuff...
Everyone post your favourite conjecture!
Places uptight enough that developers never swear in comments are uptight in other ways that lead to poor team dynamics, which hinder quality.
if (*some_bullshit >= shit_tolerance) {
    fucks_given = 0;
    exit(IM_DONE);
}
[0]: https://github.com/adrianzap/softwipe
> This means that swearing will not automatically improve the quality of your code.
Everyone swears sometimes. If you never do it in front of others, it signals that you're always filtering yourself.