Open source code with profanity in comments is statistically better

From the research paper:

> we calculate the swear factor as the number of swearwords divided by the lines of code

That's what I suspected. Assuming that most swear words will be contained in comments, what this is actually measuring is the ratio of comments to code. In other words, code that is more heavily commented is better.

I think we already knew this.

That said I would like to see a more critical analysis. First control for comment density. Then compare code quality to swearing in comments and also variable names.
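A rough sketch of what measuring both quantities could look like. The word list, the function name `metrics`, and the choice to count only non-comment lines as "lines of code" are assumptions made for illustration, not the thesis's method. It only recognises full-line `#` comments.

```python
import re

# Hypothetical word list for illustration; not the list used in the thesis.
SWEARWORDS = {"shit", "fuck", "damn", "crap"}


def metrics(source: str) -> dict:
    """Return swear factor and comment density for '#'-commented source text."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    comment_lines = [ln for ln in lines if ln.lstrip().startswith("#")]
    code_lines = [ln for ln in lines if not ln.lstrip().startswith("#")]

    # Count swear words anywhere in the text, so identifiers are included too.
    words = re.findall(r"[a-z]+", source.lower())
    swears = sum(1 for w in words if w in SWEARWORDS)

    loc = max(len(code_lines), 1)  # assumption: LoC excludes comment lines
    return {
        "swear_factor": swears / loc,
        "comment_density": len(comment_lines) / loc,
    }


sample = "# damn workaround for a broken API\nx = 1\ny = 2\n# plain note\nz = x + y\n"
m = metrics(sample)
```

With both numbers per repository, one could then compare quality scores among repositories of similar comment density instead of across the whole sample.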

tremon · 3 years ago

It is anecdata, but I can confirm this is the case for my code.

I tend to focus more on documenting the surprising code paths, not the mundane. And when my code needs to do something special because some other component (library, hardware, API) has issues, there's usually some colourful language describing the sad state of the world outside my control.

taneq · 3 years ago

Also any mature, well maintained code will have found a lot more of those truly ‘wtf’ bugs and edge cases, which often involve the same colourful language when we finally figure them out.

balaji1 · 3 years ago

that's *** brilliant I @#@#@ know that feeling

midoridensha · 3 years ago

> ...code that is more heavily commented is better.

> I think we already knew this

Who's "we"?

In my many years of software development, I've found a very large fraction of developers use very few, or even zero comments, and it's getting worse. Just look at the posts below here: there's a bunch of people arguing that comments are useless or harmful. It's no wonder that software sucks so much these days, since apparently no one believes in documentation or code maintenance any more.

briantakita · 3 years ago

> It's no wonder that software sucks so much these days, since apparently no one believes in documentation or code maintenance any more.

I think this comment explains why software gets worse in many cases:

> Of course nowadays, this is legacy nonsense. Everything uses UTF-8 for "char", and what doesn't is broken and terrible anyway. But the old ways stayed with us, and the stupidity of it as well.

The problem is the "legacy nonsense" tends to accumulate over time & as people depend on it, takes a long time to finally remove.

> They are so hilariously misdesigned and insufficient, I can't even fathom how this shit was _standardized_.

They did their best given their circumstances & abilities. Now we must forever pay the price.

> Several decades later, the moronic standard committees noticed that this was (still is) kind of a bad situation. Instead of fixing the situation, they added more garbage on top of it. (Probably for the sake of "compatibility").

At least they tried...

> All in all, I believe this proves that software developers as a whole and as a culture produce worse results than drug addicted butt fucked monkeys randomly hacking on typewriters while inhaling the fumes of a radioactive dumpster fire fueled by chinese platsic toys for children and Elton John/Justin Bieber crossover CDs for all eternity.

Yeah! Time to get back to work...

Credit to https://news.ycombinator.com/item?id=36626018 for pointing this out.

jancsika · 3 years ago

> In other words, code that is more heavily commented is better.

It could also be that understanding code in any non-trivial project is likely to back the developer into a corner where they become frustrated and swear at the computer.

More importantly, the lack of swearing might be a sign that the devs lack the competence to know when they are cornered.

matheusmoreira · 3 years ago

I think anger is a sign the developer actually cares about what they are doing. In my experience, people who don't care aren't at all irritated by the imperfections of the software they have to use, they just accept it, slay their dragon and move on. People who care tend to get very angry about what's ultimately philosophical matters.

darkerside · 3 years ago

By the time you are swearing in your comments, I'm pretty sure you know you're cornered

Mathnerd314 · 3 years ago

He doesn't seem to use the swear factor anywhere. The actual statistical comparison (Table 3.1) is simply mean SoftWipe score of repos with swears (5.87) vs. mean SoftWipe score of repos with 4+ stars (5.41). The increase is due to 2-3 clusters of swear repos with SoftWipe score ~7.5 and ~20k lines of code. It seems like he deduplicated the repos based on URL, not content, and GitHub could have biased the results returned in the GitHub search, so I wonder if it is simply sample bias.

paper: https://cme.h-its.org/exelixis/pubs/JanThesis.pdf
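For what deduplicating on content rather than URL might look like, here is a minimal sketch. The function name `dedupe_by_content`, the example URLs, and the idea of hashing one byte blob per repo are all assumptions for illustration; nothing here is taken from the thesis.

```python
import hashlib


def dedupe_by_content(repos: dict) -> dict:
    """Keep one URL per distinct content digest (the first URL seen wins)."""
    seen = set()
    unique = {}
    for url, content in repos.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique[url] = content
    return unique


# Hypothetical sample: a fork with identical content under a different URL.
repos = {
    "https://example.com/a": b"int main(){}",
    "https://example.com/a-fork": b"int main(){}",
    "https://example.com/b": b"other",
}
result = dedupe_by_content(repos)
```

A URL-only dedup would count the fork as a separate data point, which is how a few copied codebases could show up as clusters.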

tessierashpool · 3 years ago

as long as we’re designing the ideal experiment for someone else to do, let’s throw in the commit messages as well.

I wouldn’t be surprised if code quality goes up with comment curses and down again with commit message curses.

0cf8612b2e1e · 3 years ago

My favorite infamous example being the MPV C locale commit: https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f02...

You can really feel the author's rage at the state of the world.

Brian_K_White · 3 years ago

The observation of the association is already valid since it doesn't try to say anything that no one can say.

Saying "control for comment density" presumes one knows how to even do that or how to even define it.

How do you decide that a given line of code or comment should weigh more or less than another?

If a codebase has both a lot of swear words and a lot of all other words, so what?

Deleted Comment

slowmovintarget · 3 years ago

That assumes that most comments contain cursing.

comfypotato · 3 years ago

No it doesn’t. If any more than 0% of comments contain profanity, then on average code with more profanity will be better.
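One way to spell that out, assuming (my assumption, not something the thesis measures) that the average number of swear words per comment, call it p, does not depend on how densely the code is commented:

```latex
\text{swear factor}
  = \frac{\text{swear words}}{\text{lines of code}}
  = \underbrace{\frac{\text{swear words}}{\text{comments}}}_{p}
    \times
    \underbrace{\frac{\text{comments}}{\text{lines of code}}}_{\text{comment density}}
```

For any fixed p > 0 the swear factor grows in proportion to comment density, however small p is.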

mr_00ff00 · 3 years ago

Isn’t commented code no longer considered a good idea in most companies?

I used to work for a bank and the policy is no comments unless absolutely necessary, because comments become out of date. Doxygen is the only real comments allowed.

flatline · 3 years ago

It really depends on the comments. Best practice is to comment “why” something was done, not “what” is being done, or “how”. Every programmer can read code, and most code should be pretty self-explanatory. But in any sufficiently complex routine, there are going to be some things that a programmer struggled to get working the first time, that they had to work around, or that were simply unintuitive. These should be commented. Generally I’ve found uncommented code bases to be thrown together haphazardly and to be of lower quality. Same with those that only trivially leave comment breadcrumbs about what is being done, duplicating the code itself.

paxys · 3 years ago

That's an idiotic policy and definitely not something that is industry standard.

Code gets out of date as well, so let's just stop writing it altogether..

hakunin · 3 years ago

Pasting again my 4 reasons to leave a code comment:

1. An odd business requirement (share the origin story)

2. It took research (summarize with links)

3. Multiple options were considered (justify decision)

4. Question in a code review (answer in a comment)

And the article on how/what/why in code: https://max.engineer/maintainable-code

IshKebab · 3 years ago

No, that is stupid. People just don't want you to write comments like

    // Set foo to true
    foo = true;

Somebody saw one too many comments like that and overreacted. As long as you a) don't write comments describing what is self evident from the code, and b) try to make the code as descriptive as possible, then it's fine. Comment away.
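A small sketch of the contrast, in Python. The function `parse_port` and the upstream tool mentioned in the comment are hypothetical, made up only to have something worth explaining:

```python
def parse_port(value: str) -> int:
    # "What" comment (adds nothing): convert value to an int.
    port = int(value)

    # "Why" comment: a hypothetical upstream config tool writes 0 to mean
    # "unset", so fall back to the default rather than let the OS pick a
    # random port.
    if port == 0:
        port = 8080
    return port
```

The first comment could be deleted with no loss; the second records a reason that the code alone cannot show.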

gen220 · 3 years ago

If you're curious to read a well-earned take on comments: http://antirez.com/news/124.

Inline comments are a reflection on the authors' abilities to write good comments. They can be kinda useless, actually-bad, or really helpful.

One canonical example of a "good comment" is explaining why a strange or not-the-least-complex approach was taken to implementing a certain solution. The code is like chesterton's fence, and the comment is a post explaining why it's there. That way, future readers can better assess for themselves whether it's worth their time trying to tear down the fence.

makeworld · 3 years ago

Wow, that seems strange to me. Seems like the policy should be to make comments when needed, and keep them up to date.

furyofantares · 3 years ago

No-comments are better than low-effort, low-quality, unmaintained comments, for sure.

You can imagine a world where all the projects that aren't realistically going to spend the effort on high-quality maintained comments make the correct choice to skip comments unless absolutely necessary. And where projects that are realistically going to put effort into high-quality, maintained comments, do so.

In this world, comment density would correlate highly with code quality per line of code. Profanity might not, I'm not sure. I do think you'd still find profanity in high-effort, high-quality, maintained comments, but it might indicate lower quality surrounding code, not higher.

And it would still be unclear whether the existence of comments is a cause of higher quality code, or just a proxy for amount of effort and care taken per line of code.

throwaway14356 · 3 years ago

While I carefully keep others away from my code, the notes say I have a complicated relationship with my future self. While comments should be the least useful to me, I've tried many formats and found that it is spectacular to have the full elaborate comment above each bite-sized nugget of code. By a nugget I mean whatever was solved as a single thing after breaking down the larger problem.

The result is that I don't read any code at all. The whole thing is compiled to the native format that is human language. The code is great for illustration.

If I keep it in separate files as documentation it takes too much effort to find and update. It takes needless extra effort and is less precise.

It is just a personal preference of course but if one had any experience writing code in any language it should be easy to grasp say at 4 am while drunk.

almet · 3 years ago

I'm not sure the ratio of comments to LoC is a sign of good quality code.

Too many comments might actually be a bad thing. It's more lines to maintain, and sometimes the comments just tell what the code is doing where there is no need to.

vezuchyy · 3 years ago

If you have a process where every commit is well documented, you don't need many comments since you can rely on whatever is your analogue for git blame. It's not a lack of comments, it's actually the opposite but aside from the code base.

When I worked at SAP where VCS for ABAP is ancient and has no analogue for git blame we had a practice of putting a SAP Note next to every code change, since some of the things that we had to implement are dictated by business/legislation, so you need a proper explanation from time to time. Without it, the code becomes unmaintainable.

smrtinsert · 3 years ago

I get where they are going with it - every block of code should probably be obvious and final since it has one item to do well. Unfortunately there are always times when n random fields will be used for a conditional that is completely non-obvious. Comments will always be necessary to some degree.

fx1994 · 3 years ago

That's why three developers have left our company, and I think a fourth will too. The original developer left behind undocumented, hard-to-understand code (it's pretty complex and has tons of hacks to work on different OSes). New hires spend their first year learning what the hell that code is and how it works, and they are not allowed to comment anything (I asked a few times why we waste so much time on basic stuff; they said it is unnecessary... ok I guess). Now they have hired the original developer for tons of money just to consult with the newest developer and explain the code to him. Reasonable I guess.

jacobsenscott · 3 years ago

I never write comments, and I think they typically have a negative correlation to quality (this code is so f**ed up, plain english is, of all things, more clear and precise than code!). Unless you are releasing code publicly, and you are documenting the public API, I've never seen a valuable comment. I've seen plenty of harmful comments. However I wholeheartedly endorse using this otherwise useless but common language feature for swearing.

I think swearing in comments indicates you are unburdened by bureaucracy and pointy haired bosses (because they prohibit such things), which would certainly lead to better code.

chinchilla2020 · 3 years ago

you've never seen a useful comment?

Dead Comment

I'd bet a lot of the non-profanity code is people open sourcing code just to be impressive on resumes or for school, where the profanity code is probably real code.

Sounds likely to be a classic case of correlation != causation

bitofhope · 3 years ago

Rorschach test for programmers: give your confident gut feeling explanation for this phenomenon.

I'll do mine: there's likely a correlation between needing to maintain a professional conduct which includes forgoing foul language (you're programming at work) and writing code under time pressure where getting a product ready for release is more important than strict adherence to clean programming practice (you're programming at work).

Everyone post your favourite conjecture!

jerf · 3 years ago

Everything is correlated: https://gwern.net/everything

Take almost any two things like this and you're actually virtually guaranteed to draw out some weak, but quite likely statistically significant, correlation.

What lies behind that correlation is probably an entropic mishmash of so many factors that it defies human explanation, and also, defies any attempt to try to "harness" the forces that seem to appear. It could be that all the siblings to the comment are right all at once.

I'll cop to just glancing at the graphs, but they don't look out of line for this effect to me intuitively.

Also backing this is that more-or-less the same article/thesis could easily have been written for the opposite correlation.

dogleash · 3 years ago

> Everyone post your favourite conjecture!

Places uptight enough that developers never swear in comments are uptight in other ways that lead to poor team dynamics which hinders quality.

painted-now · 3 years ago

My gut feeling: when you start to submit swear words in your code, it indicates that you "breathe" the code and know it in and out.

The other extreme: if you have no idea what you are doing, you might try to mimic "corp speak" in your code to hide the fact that you actually have no clue.

In other words: it needs some confidence in your ability to assess some aspect of the code in order to use swear words.

bawolff · 3 years ago

This seems unlikely to be true in this case because the study was looking at github projects, and it seems unlikely the sample had enough code from "uptight" work places to have an effect one way or another

lcnPylGDnU4H9OF · 3 years ago

The developer who knows what they're doing is also more likely to be 1) overworked because they do much of the useful stuff and 2) cognizant of bureaucracy which gets in the way of them doing useful stuff.

jghn · 3 years ago

I remember there being a startup in the Dotcom era, I forget the name but for people familiar with Cambridge, MA it was where the IDEO is now. They were notorious for a few things, but one of them was writing open source software with a lot of profanity.

I thought this was cool, and was talking excitedly about it to my boss and some of the senior devs. They were less amused. Cut to 20 years later and I too am less impressed by this.

Not that I think it's *bad* per se, I'm not clutching pearls or anything. But I never find myself thinking what the code really needs are profanities in the comments. Whereas back then I thought it'd be funny/cool and went out of my way to do so when I could. Which wasn't often.

didntcheck · 3 years ago

Swearing for the sake of it does look childish, yes. I've noticed that in a few streaming TV shows, where they've gotten so excited over a lack of censorship that they just end up looking like teenagers who still think saying "fuck" is an act of rebellion

On the other hand, I'd like to write something like "this is a bit shit but will be replaced later" because that's how I naturally speak. Sanitising it to "crap" or "poor" just makes me feel like I'm teaching a youth club or something, and it is a minor pipeline stall in my train of thought while I do a mental synonym search

nomel · 3 years ago

I wonder if swearing can help "free the mind" in some way, with the "rebellion" opening up more, perhaps non-standard/out of the box, "fucking good" ideas?

seadan83 · 3 years ago

I hear this, comments generally should not draw attention to themselves. For this, short & terse win. I routinely look to cut any unnecessary words from comments.

It was the most painful code review where I asked someone to remove a joke they wrote in the comments. It was a good joke, funny, short, in good taste, I loved it, but.. distracting and unnecessary.

rfw300 · 3 years ago

I don’t think anyone is saying it’s causation, the correlation is in and of itself interesting!

bawolff · 3 years ago

I mean i think the article is implying that. However i think the bigger thing is the correlation is misleading due to the sample being the long tail of github projects, which i don't think is representative of "production" open source projects and certainly not software in general.

MoSattler · 3 years ago

So, you're saying that my code won't improve simply by sprinkling F-Bombs everywhere?

mikrl · 3 years ago

The C code so impressive they had to remove it from K&R:

    if (*some_bullshit >= shit_tolerance) {
        fucks_given = 0;
        exit(IM_DONE);
    }

bawolff · 3 years ago

Correct: fork bombs rarely help

passwordoops · 3 years ago

There's only one way to find out!

gweinberg · 3 years ago

Nobody suggested causation. The idea that you can improve code quality by adding profane comments is so self-evidently absurd that nobody would even suggest such a thing. Except you kind of just did.

zitterbewegung · 3 years ago

I would bet the opposite because I can make a blind assertion.

bawolff · 3 years ago

You're betting that people swear in code in order to impress future employers?