The most copied StackOverflow snippet of all time is flawed (2019)

I'm the author of #6 on the same list. It's definitely interesting to see it has been used thousands of times on GitHub, and who knows how many more in proprietary code. I don't think it's buggy, but I now think it could definitely be improved.

I think this shows an example of a big problem with StackOverflow compared to its initial vision. I remember listening to Jeff and Joel's podcast, and hearing the vision of applying the Wikipedia model to tech Q&A. The idea was that answers would continue to improve over time.

For the most part, they don't. I'm not quite sure if it's an issue of incentives or culture. Probably some of both. I think that having a person's name attached to their answer, along with a visible score really gives a sense of ownership. As a result, other people don't feel enabled to come along and tweak the answer to improve it.

Then, once an answer is listed at the top, it is given more opportunity for upvotes, so other improved answers don't seem to bubble up. This is a larger issue with most websites that sort by ratings. Generally they sort items based on the total number of votes, including hacker news itself. Instead, to measure the quality of an item, we should look at the number of votes, divided by the number of views. It may be tough to measure the number of views of an item, but we should be able to get a rough estimate based on the position on a page, for example.

If the top comment on a HN discussion is getting 100 views in a minute and 10 upvotes, but the 10th comment down gets 20 views and 5 upvotes, the 10th comment is likely a better quality comment. It should be sorted above the top ranked comment! There would still need to be some smoothing and promotion of new comments to get them enough views to measure their quality as well.

Such a policy on StackOverflow would also help newer, but better answers sort to the top.

jsmeaton · 5 years ago

An idea I've had for a long time is that "the community" can vote to override an accepted answer. There are many times when the accepted answer is incorrect, or a newer answer is now more correct, but the only person who can change an accepted answer is the OP.

I think community-based changes to the accepted answer would go a long way to solving your problem too, but it requires someone to be reviewing newer answers and identifying when there's another that would be more appropriate.

It'd incentivise writing newer answers to older questions. Correcting accepted answers that probably weren't ideal to begin with. A new "role" where users hunt through older questions and answers looking for improvements to make.

Stack Overflow answers are supposed to be community-based, but we unfairly prioritise the will of the original questioner *forever*. I don't think that's optimal.

irrational · 5 years ago

As a side gig I teach an intro to web development class online. Every semester I get students asking for help about why their code isn’t working. Nine times out of ten, they are trying to use some jQuery code they copied from stackoverflow because it is the accepted answer. They don’t yet know enough to recognize that it isn’t vanilla JavaScript (which they are required to use).

dotancohen · 5 years ago

  > but the only person who can change an accepted answer is the OP.

This system makes the person arguably _least qualified_ to understand the situation the single arbitrator as to which answer is accepted.

Was it the most efficient? First to answer? Copied-and-pasted right in with no integration work? Written by someone with an Indian username? Got the most upvotes? Made a Simpsons reference? Written by someone with an Anime avatar?

Breza · 5 years ago

Currently the only incentive to post a new answer to an old question is you get a special badge. That's neat but limited. I've gone through old R questions and posted answers with a more modern syntax and my answers rarely get much attention.

I'd be cautious about overriding an accepted answer. Imagine a situation where there's an easy-to-understand algorithm that's O(n^2) and the "Correct" algorithm that's O(n). If OP only has a dozen datapoints, the former might be the best answer for her specific problem, despite it clearly not being the right approach for most people finding the thread via Google in the future.

inglor · 5 years ago

They actually recently added this feature - you have a "this answer is outdated" button you can press. Note sure what the reputation threshold to see it is.

weinzierl · 5 years ago

"An idea I've had for a long time is that "the community" can vote to override an accepted answer."

I don't know if this is still a thing, but for some time in the past when an answer was edited more than a certain amount of times it automatically turned into what was called a "community wiki" answer.

zatkin · 5 years ago

Or you could just edit the accepted answer if it’s wrong? I’ve seen a few posts where the top contains an “UPDATE” that, in summary, links to another answer.

saganus · 5 years ago

One of the things that baffles me the most about SO is that I can't sort answers by _newest first_.

If I search for something related to javascript for example, I know there will be a ton of answers for older versions that I am most likely not interested in. However I can only sort by oldest first (related to date).

Old answers are definitely useful a lot of times, but the fact that there's not even the option to sort them the other way around tells me that SO somehow, at it's core, considers new answers less important.

A strange decision if you ask me, considering software changes so much over time.

If anyone has a possible explanation for this I'd love to hear it.

apnorton · 5 years ago

There are three buttons that act as sorting directions at the top of the answers section: "Votes," "Oldest," and "Active." The "Active" option sorts by most recently modified, which is _usually_ what you'd want instead of strictly newest. (i.e. an edit would update the timestamp, making that answer have a more recent activity date)

So, I guess the answer to your question of "why can't I" is "good news! you can" :)

ooOOoo · 5 years ago

This is why Stack Overflow has just started the "Outdated Answers project" in which users can set answers as outdated: https://meta.stackoverflow.com/questions/405302/introducing-...

acomjean · 5 years ago

I always thought the should have a language version. Eg python3, php7. JavaScript es6....

bachmeier · 5 years ago

> If I search for something related to javascript for example

As someone that's been learning a little JS over the last year, I quickly came to the realization that you skip over the SO links that come up in the search, and you go to one of the many other sites. I've had good luck with w3schools and mdn. SO is a lost cause for JS.

hansvm · 5 years ago

> we should look at the number of votes, divided by the number of views

Closer, but still not quite what you want probably or a few stray votes can make a massive impact just from discretization effects. What you really care about is which answer is "best" by some metric, and you're trying to infer that as best as possible from the voting history. Average votes do a poor job. Check out this overview from the interblags [0].

[0] https://www.evanmiller.org/how-not-to-sort-by-average-rating...

Matumio · 5 years ago

This isn't just a statistical problem, it's also a classical exploration/exploitation trade-off. You want users to notice and vote on new answers (exploration), but users only want to see the best answers (exploitation). The order you show will influence future votes (and future answers).

In addition, it's a social engineering problem. At least people with a western psychology seem to respond very strongly when a score is attributed to their person (as opposed to a group success like in a wiki). So you better make the score personal and big and visible, and do not occasionally sort by random just to discover the true score.

ddlatham · 5 years ago

I think that's a great example of the "smoothing" that I was alluding to, though not in a format accessible to most programmers. However it is still just using a function of upvotes and downvotes. I think true rating can be much better when you also incorporate number of opportunities to vote. Because having the opportunity to vote (by viewing an item, or purchasing it, or whatnot) and choosing not to vote is still a really useful piece of data about the quality of an item. Especially when you are comparing old items that have had millions of opportunities against new items with only thousands.

slightwinder · 5 years ago

> I think this shows an example of a big problem with StackOverflow compared to its initial vision. I remember listening to Jeff and Joel's podcast, and hearing the vision of applying the Wikipedia model to tech Q&A. The idea was that answers would continue to improve over time.

Interessting. As a random visitor this was something that never came to me from the way SO presents itself.

> For the most part, they don't. I'm not quite sure if it's an issue of incentives or culture.

I think it's more a problem of communication and UI. SO is not really the kind of site that animates people to answer or improve things. The overall design is also more technical and strange, not motivating and userfriendly.

Today for the first time I realized that there is a history for answers and an "improve"-Button that seems to allow me to change someone else answer. I only saw that because I expliciet looked for this because of this thread.

Wikipedia in the beginning was very vocal and motivating to engage all kind people to help and improve articles. SO never had that vibes for me. Additionally, it simply has not the interface that makes it simple to do this stuff. There are only this aweful comments under each answer, which are not really useful to discus an answer in all lenght and from all sides. Might be better to change them to a full fletched forum with some kollaboration editing and some small wiki-functionality or something like that.

I remember they tried to do some kind of wiki with high quality-code-parts, what happend to that?

rcthompson · 5 years ago

One of the really frustrating things about SO is that once you reach a certain rep threshold, you lose the ability to suggest edits, and instead gain the ability to just make the edits directly. I'm a lot more likely to do the former, because it helps ensure that if I actually made a mistake, it will be caught by the people voting on it. And so SO has lost out on a bunch of my suggested edits because they took away my ability to suggest edits.

analyte123 · 5 years ago

What would really help with the vision here is some way to comment and associate tests against posted code. I have corrected algorithms on Wikipedia that were obviously wrong with even a cursory test. Then people can adjust the snippet, debate the test parameters, or whatever else they need to do while maintaining some sort of sanity check. If it’s good enough for random software projects used by a dozen people, it’s probably good enough for snippets used by thousands of developers and even more users.

travisjungroth · 5 years ago

This post made me think the same thing. It would be nice to have a StackOverflow that was actually more code focused. People could write tests or code and actually run them.

cerved · 5 years ago

I always try and improve existing answers with edits. Often just adding important context when the answer is just a line of bash and adding links to source documentation.

There's very little gamification incentive to do so and often the edit queue is full. Still, there are lots of times where important caveats and information is pointed out in the comments and never added to the answer

ant6n · 5 years ago

The other day I asked a question about the c/c++ plugin of vscode, somebody swooped in to edit it to just be c++ because “c/c++ is not a programming language”. The question wasn’t answered. I wonder what’s the incentive for people to do something like that.

shkkmo · 5 years ago

> As a result, other people don't feel enabled to come along and tweak the answer to improve it.

It's worse than that. Edits have to go through a review process that is much more selective and often arbirarily rejects good edits.

matsemann · 5 years ago

Only if you're a low rep user, though. And no, many more bad edits are accepted, than good edits being rejected. By orders of magnitude.

bachmeier · 5 years ago

Editing answers is a complete waste of time. You can post a correction along with a copy and paste of the relevant section from the documentation, yet have your edit disappear without explanation.

lkrubner · 5 years ago

To correctly measure the quality of an item one needs to take something like Google's PageRank algorithm and apply it to people. That is, there needs to be some measure of the reputation of the person posting. This doesn't mean that a person who was correct in the past is necessarily correct right now, but it is true that people who are often correct tend to go on being correct, and people who are often wrong tend to go on being wrong. Careful people tend to continue to be careful, and sloppy people tend to continue to be sloppy. It's important to capture that reality and use it as a weight given to any particular answer.

L_226 · 5 years ago

Potentially a stupid question; why is it not possible to just make a MediaWiki site explicitly for SO questions? Does it exist already?

fragmede · 5 years ago

The technical cost/effort for someone like you or me to do that is minimal. The expensive part is the ongoing social maintenance fee aka moderation. As evident by the stack overflow drama re: Monica, it’s an unsolved (non-technical) problem that you could make your own mint to print money on, if you were able to fix any tiny part of it.

LoveMortuus · 5 years ago

Wouldn't a simple TTL - Time to live, solve that problem, of course with an option to see the graveyard.

This would mean that the same questions would get answered again and again over the years, but I think that could also solve the negative reputation problem of the website.

Two bird with one stone, or if you're Slovenian, two flies with one swat. ^^

ayewo · 5 years ago

For anyone else that is curious like I was, the #6 answer on that list is from 12 years ago: https://stackoverflow.com/a/140861/

macksd · 5 years ago

>> For the most part, they don't. I'm not quite sure if it's an issue of incentives or culture.

Classic example of "good is the enemy of best".

What’s wrong with a simple loop (like the one near the top)? Why does it have to branchless? Wouldn’t the IO take longer than missed branches/pipeline flushes?

Not to mention that the fixed version now has branches as well…

rkagerer · 5 years ago

Not sure why some programmers these days have aversion to simple loops and other boring - but readable - code.

Instead we have overused lambdas and other tricks that started out clever but become a nightmare when wielded without prudence. In this article, the author even points out why not to use his code:

Note that this started out as a challenge to avoid loops and excessive branching. After ironing out all corner cases the code is even less readable than the original version. Personally I would not copy this snippet into production code.

hnedeotes · 5 years ago

I'm not against using for loops when what you need is an actual loop. The thing is most of the times, previously, for loops where actually doing something for which there are concepts that express exactly what was being done - though not in all languages.

For instance, map - I know that it will return a new collection of exactly the same number of items the iterable being iterated has. When used correctly it shouldn't produce any side-effects outside the mapping of each element.

In some languages now you have for x in y which in my opinion is quite ok as well, but still to change the collection it has to mutate it, and it's not immediate what it will do.

If I see a reduce I know it will iterate again a definite number of times, and that it will return something else than the original iterable (usually), reducing a given collection into something else.

On the other hand forEach should tell me that we're only interested in side-effects.

When these things are used with their semantic context in mind, it becomes slightly easier to grasp immediately what is the scope of what they're doing.

On the other hand, with a for (especially the common, old school one) loop you really never know.

I also don't understand what is complex about the functional counterparts - for (initialise_var, condition, post/pre action) can only be simpler in my mind due to familiarity as it can have a lot of small nuances that impact how the iteration goes - although to be honest, most of the times it isn't complex either - but does seem slightly more complex and with less contextual information about the intent behind the code.

xelxebar · 5 years ago

I can't comment on the social phenomenon here, but there is indeed a decent technical argument for avoiding for loops when possible.

In a nutshell, it's kind of like "prinicple of least priviledge" applied to loops. Maps are weaker than Folds which are weaker than For loops, meaning that the stronger ones can implement the weaker ones but not vice-versa. So it makes sense to choose the weakest version.

More specifically, maps can be trivially parallelized; same for folds, but to a lesser degree, if the reducing operation is associative; and for-loops are hard.

In a way, the APL/J/K family takes this idea and explores it in fine detail. IMHO, for loops are "boring and readable" but only in isolation; when you look at the system as a whole lots of for loops make reasoning about the global behaviour of your code a lot harder for the simple reasone that for-loops are too "strong", giving them unweildy algebraic properties.

chousuke · 5 years ago

Very often processes are naturally modelled as a series of transformations. In those cases, writing manual loops is tedious, error-prone, harder to understand, less composable and potentially less efficient (depending on language and available tools) than using some combination of map, filter and reduce.

dragonwriter · 5 years ago

> Not sure why some programmers these days have aversion to simple loops and other boring - but readable - code.

Like goto, basic loops are powerful, simple constructs that tell you nothing at all about what the code is doing. For…in loops in many languages are a little better, but map, reduce, or comprehensions are much more expressive as to what the code is doing, but mostly address common cases of for loops.

While loops are weakly expressive (about equal to for…in), but except where they are used as a way (in language without C-style for loops) but there is less often a convenient replacement.

BrandoElFollito · 5 years ago

Disclamer: amateur developer for 25 years, no formal education in that area

a loop that iterates over indices when I want elements is not readable, e.g. I prefer

    for element in elements:

rather than

    for (i = 0 , i < len(elements), i++) { element = elements[i] ...

This is maybe where this aversion comes from, people usually [citation needed] want to iterate over elements, rather than indices.

nvarsj · 5 years ago

Yes, this plagues JDK8+ code. Every fashionable Java coder has to use an overly complex, lazy stream vs a simple loop in every case.

MauranKilom · 5 years ago

The irony is that a single log computation is going to take longer than the loop. (No idea if implementing a log approximation involves loops either.)

slavik81 · 5 years ago

https://code.woboq.org/userspace/glibc/sysdeps/ieee754/dbl-6...

I don't see any loops, but there are a number of branches. The code could probably be generalized using loops to support arbitrary precision, but I think any optimized implementation for a specific precision will have unrolled them.

bottled_poe · 5 years ago

Sounds like textbook example of when theory is misaligned with reality.

kortex · 5 years ago

Waiting for someone to post some fast-inverse-sqrt-esque hack to compute the logarithm. Although in Java that's probably not likely to be faster.

I wonder how fast it'd be to convert to string and count digits.

tzs · 5 years ago

Many architectures include a logarithm instruction. Does Java use that if available? Would it make a difference?

Deleted Comment

nn3 · 5 years ago

Besides log()'s implementation is certainly not branch-less.

It's the ostrich approach: if you don't see the branches they don't matter.

ceronman · 5 years ago

Simplicity FTW. The simple loop version is very easy to understand. It's probably really fast, as it's just a loop over seven items. And more importantly it's more correct. It doesn't use floating point arithmetic, so you don't have to worry about precision issues.

The logarithmic approach is harder to reason about, prone to bugs (as proven by this post). I'm baffled at the fact that tons of people considered it a more elegant solution! It's completely the opposite!

xxpor · 5 years ago

the original version had branches too, in fact a majority of the lines had them! ? is just shorthand for if.

enedil · 5 years ago

This isn't true, this form of conditionals can be compiled into cmov type of instructions, which is faster than regular jump if condition.

kruczek · 5 years ago

Exactly. As the article itself mentions:

> Granted it’s not very readable and log / pow probably makes it less efficient

So, the "improved" solution is both less readable and probably less efficient... where is the improvement then?

sixothree · 5 years ago

If it were me in my programming language, I would just use Humanizr and be freaking done with it.

xfer · 5 years ago

The real question is why is it a bug to report 1 mB instead of 999.9 kB for human readable output? It seems like a nice excursion to FP related pitfalls, but i don't think this is a problem to get entangled in that.

Groxx · 5 years ago

Because it doesn't print 999.9 kB or 1 mB.

It prints 1000.0 kB.