crazygringo · 2 years ago
Answers are either good or not.

It doesn't matter if they're generated by a 13-year-old in their bedroom, someone studying CS at university, a well-respected IC at a top tech company... or an AI.

If answers are good, keep them. If they're bad, downvote them. If they're redundant or off-topic or gibberish, delete them.

And to those asking why you would ever want AI-generated content on StackOverflow when you could just go to ChatGPT/Copilot/etc... it's because of all of the commentary. People are giving different answers, arguing the pros and cons, pointing out errors... all of the discussion around a StackOverflow question is usually just as valuable as any given answer, if not more so.

Of course, I understand that AI-powered accounts that can ask/answer hundreds of questions a minute are a problem simply because moderation can't keep up with them. But StackOverflow already has a lot of limits just for human accounts -- lots of actions you can't take until you've contributed certain amounts of value. Just extend these protections to do things like rate-limiting and so forth, to ensure that the normal ratio of content submitted vs. moderated stays constant and manageable.

koochi10 · 2 years ago
This doesn't work at scale. Stack Overflow as a platform has been handling user-generated input via moderators, voting, and testing. That's fine when there are only 26.8 million coders on the planet, most of whom aren't posting on Stack Overflow regularly. With LLMs, all of a sudden there is a huge influx of mediocre content on the platform that people can't handle. Inevitably this will erode trust in the platform. When someone posts an answer I assume they actually ran the code and can verify the result. LLMs can spit out seemingly correct code that just doesn't work.
jsheard · 2 years ago
> This doesn't work at scale.

See also: the Clarkesworld saga of them being bombarded with mediocre AI-generated short stories. Filtering out bad submissions has always come with the territory, but they're suddenly drowning in them with the advent of LLMs, which make it trivial to churn out vaguely story-shaped text on an industrial scale. The generated content isn't good by any measure, but it's "good enough" to pass the smell test and waste a curator's time before they realise it has zero actual merit, and there's so much of it that sorting through it becomes a Sisyphean task.

Likewise with image generation, it's now incredibly easy to churn out images that look like something a person might make to express themselves, which are actually just a loosely guided slice through a statistical model of pre-existing images, passing the smell test for "good art" despite having zero actual intent or substance. It's spam, but for culture.

yyyk · 2 years ago
>>If answers are good, keep them. If they're bad, downvote them.

>This doesn't work at scale... with LLMs all of a sudden there is a huge influx of mediocre content

The GP's answer may not work at scale - however LLM detection doesn't work at all. So the only semi-workable solution is aggressive filtering and banning users who post trash (LLM or not).

Also, there's a need to think about score and trust mechanisms - the same mechanisms which can be used for filtering also provide an incentive for LLM use. Is there a way to avoid that?

>When someone posts an answer I assume they actually ran the code and can verify the result

I wish we lived in a world where this assumption wasn't naive.

crazygringo · 2 years ago
Again, you can handle this by rate-limiting and standard anti-abuse measures. To elaborate: don't allow new-ish accounts to post more than one question/answer per day, don't allow accounts to post more than one question/answer per week/month if their previous content hasn't reached a certain quality threshold of votes, and so forth.
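A minimal sketch of what that kind of throttling could look like, in Python. The thresholds, cooldowns, and the account/post fields here are hypothetical and purely illustrative, not actual Stack Overflow policy:

    from datetime import datetime, timedelta

    # Hypothetical knobs -- illustrative only, not real Stack Overflow policy.
    NEW_ACCOUNT_AGE = timedelta(days=90)
    NEW_ACCOUNT_COOLDOWN = timedelta(days=1)    # one post per day for new-ish accounts
    LOW_QUALITY_COOLDOWN = timedelta(weeks=1)   # one post per week below the quality bar
    MIN_AVG_SCORE = 1.0                         # required average vote score on past posts

    def can_post(account, now=None):
        """Return True if the account may submit another question/answer right now."""
        now = now or datetime.utcnow()
        cooldown = timedelta(0)

        # New-ish accounts get throttled harder.
        if now - account.created_at < NEW_ACCOUNT_AGE:
            cooldown = max(cooldown, NEW_ACCOUNT_COOLDOWN)

        # Accounts whose previous content hasn't reached a quality threshold
        # of votes get throttled harder still.
        scores = [post.score for post in account.posts]
        if scores and sum(scores) / len(scores) < MIN_AVG_SCORE:
            cooldown = max(cooldown, LOW_QUALITY_COOLDOWN)

        last_post = max((post.created_at for post in account.posts), default=None)
        return last_post is None or now - last_post >= cooldown

Tightening those cooldowns is the lever that keeps the ratio of submitted to moderated content manageable.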

It's entirely possible to set up the system to prevent it from being flooded with content that moderation can't handle. In fact, StackOverflow has already been largely set up that way, and this will just require a little more tweaking of the kinds of policies that have been in place for a long time. People attempting to flood internet forums with low-quality content or outright spam isn't anything new.

rpastuszak · 2 years ago
Exactly, it’s not that useful if the answer I’m looking for exists on the platform but I can’t find it because of the signal vs. noise ratio. To me, usually, the context of the answer is more important than the answer text itself.
kristofferR · 2 years ago
How are they going to check for LLM usage?

I think it's way more likely that poor answers won't mention the use of LLMs to generate the answer, while good answers aided by LLMs will more often mention it.

Punishing honesty just seems incredibly counterproductive.

Automatic detection is downright dystopian... being censored by an algorithm because it mistook my effort and work for an LLM.

wnevets · 2 years ago
> This doesn't work at scale.

Sounds like a job for AI

nologic01 · 2 years ago
> If answers are good, keep them. If they're bad, downvote them. If they're redundant or off-topic or gibberish, delete them.

Yeah, let's keep providing free labor to help train somebody else's models and improve somebody else's infrastructure so that they can dominate even more effectively.

LelouBil · 2 years ago
I think you missed the fact that you are mainly doing this to help other users find relevant answers, on a free website.

Or maybe you would want to pay for a version of Stack Overflow that is curated only by employed people?

But I guess you would lose much content.

Me1000 · 2 years ago
How is that any different than providing free labor to help someone learn how to build the next [X] so that they can more effectively dominate?

But more importantly banning AI generated content from Stack Overflow doesn’t solve the “problem” you’re describing.

matsemann · 2 years ago
If you as an answerer use an LLM to generate content, then verify and vet it yourself based on your own knowledge before posting, I'd think that's fine.

But spamming thousands of answers an hour automatically and expecting the community to do all the work just isn't sustainable, I feel. It'll also kill the sense of community if half the actors are bots.

NotTheDr01ds · 2 years ago
Agreed - That's the basis of my "responsible use of AI on SO" post at https://meta.stackexchange.com/a/389675/902710
bastawhiz · 2 years ago
> If answers are good, keep them. If they're bad, downvote them. If they're redundant or off-topic or gibberish, delete them.

In practice this isn't possible. Lots of accepted answers are bad. Often for subtle reasons! An answer with a SQL injection vulnerability might get plenty of upvotes and be accepted, but it's objectively bad (even if it answers the question).
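As a hypothetical illustration (not taken from any actual answer) of how subtle that can be, compare these two Python snippets; the first often looks perfectly reasonable and still earns upvotes:

    import sqlite3  # assuming an sqlite3 connection is passed in

    def find_user_unsafe(conn, username):
        # "Answers the question", but interpolating user input into the SQL
        # string allows injection (e.g. username = "x' OR '1'='1").
        cur = conn.execute(f"SELECT id FROM users WHERE name = '{username}'")
        return cur.fetchone()

    def find_user_safe(conn, username):
        # Parameterized query: the driver handles escaping of the value.
        cur = conn.execute("SELECT id FROM users WHERE name = ?", (username,))
        return cur.fetchone()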

The problem is that there is no AI that can accurately fix answers. AI only generates mediocre answers; it doesn't have the capacity to moderate them. Humans simply can't keep up, or don't have the acumen to pick up on the subtle inaccuracies in accepted-but-bad answers (after all, that's often why they're looking for the answer in the first place).

Even with protections like rate limiting, I'm not sure you could prevent the majority of the damage that crappy AI can accomplish. Simply paying (pennies) for proxy servers with residential IPs gets around much of that, anyway.

x86x87 · 2 years ago
haha. no.

answers are not either good or not. it's not binary. there is a lot of nuance and sometimes the questions and followups contain details that can take the answers in a completely new direction.

just telling if an answer is good or not is a lot of work. sometimes it requires an expert to figure it out. when the answers are given by a human and it's a good faith effort, both people in the loop benefit from it. when the BS generation is automated there is zero incentive for a human being to even look and correct the answers. what is the incentive to do so?

zuiper · 2 years ago
> when the BS generation is automated there is zero incentive for a human being to even look and correct the answers. what is the incentive to do so?

You hit a good point here. If users can't be bothered to put time, effort, energy, and cognition into answering a question, why should the readers and correctors do so?

sholladay · 2 years ago
The volume is the real problem, as you mentioned. People are worried about AI content because it’s hard to even review it or accurately detect it. I can’t remember the last time I looked at my email spam folder, even though I’m sure there are some useful messages in there, mostly because it would take too much time to go through it. If the inbox starts being filled with AI content that isn’t quite spam, it might start feeling like a chore to go through too, again mostly because of the volume. We will have to see if the existing mechanics of unsubscribe, report, downvote, etc. will be sufficient to hold back the tsunami.

My guess is the spam detection arms race will evolve into a content moderation arms race, and we will end up with advanced AIs filtering and moderating content for us, with varying degrees of success; things like email, forums, and Q&A sites will become increasingly hands-off.

raman162 · 2 years ago
Agreed. I have found myself sometimes asking ChatGPT for guidance on obscure error messages, and occasionally it's more useful than Google or Stack Overflow.
rafark · 2 years ago
When it doesn’t hallucinate, which in my case it’s most of the time.
GrqP · 2 years ago
I generally agree if the content can be kept clean. Certainly, human users will have more to contend with.

There are just so many interesting questions raised by having chatbots interact with StackOverflow:

What happens when AI-powered accounts are allowed to vote?

What kind of questions might AI-powered accounts be asking or be interested in?

How will the questions/answers/responses change between model revisions?

You can imagine things spiraling out of control. Perhaps there needs to be a chatbot version of StackOverflow to service all the questions that interest chatbots. :P

rich_sasha · 2 years ago
I guess the premise is that it is easier to identify AI-generated content, with a low hit rate (?) of correctness, than to identify correct / incorrect answers based on merit alone.

Also StackOverflow is kind of gamified (a rare case of it working IMHO) and the rules of the game don't work so well when kind of good looking content is easy to generate. SE answers are hard to verify but easy to write. If writing becomes too easy, it is a recipe for spam - as has indeed happened.

softwaredoug · 2 years ago
Before jumping to conclusions, be sure to check out this context:

https://meta.stackexchange.com/questions/389582/what-is-the-...

TL;DR: the policy is "we can't tell if something is AI-generated, so it's hard to justify removal on that basis alone."

> We recently performed a set of analyses on the current approach to AI-generated content moderation. The conclusions of these analyses strongly indicate to us that AI-generated content is not being properly identified across the network, and that the potential for false-positives is very high. Through no fault of moderators' own, we also suspect that there have been biases for or against residents of specific countries as a potential result of the heuristics being applied to these posts. Finally, internal evidence strongly suggests that the overapplication of suspensions for AI-generated content may be turning away a large number of legitimate contributors to the site.

minimaxir · 2 years ago
This follows from the recent case where Texas A&M students were falsely accused of using ChatGPT and subsequently failed, which caused a ton of negative publicity: https://www.rollingstone.com/culture/culture-features/texas-...
whatyesaid · 2 years ago
I would've thought the problem is bots that are mass solving instead of human + ChatGPT.

If a bot mass solves you can identify it by posting frequency and quality over time.

fabian2k · 2 years ago
SE has not provided the data here, and what they communicated is a mess. Some of their main arguments are rather dubious and didn't convince the mods.

Keep in mind that moderators have more than just the pure content of the answers available. This is usually not about a single post, but many of them.

isoprophlex · 2 years ago
I don't understand why you would, for the foreseeable future, want to allow AI-generated content on SO. Not with the current state of the art in generative AI.

If I want an AI-generated answer, with all the pros and cons specific to LLMs, I'll just open ChatGPT or turn on Copilot... SO answers are (or used to be) in an entirely different league in terms of trustworthiness.

I'm totally on board with this moderator strike.

Edit: it appears there is a very high false positive rate, making it difficult to distinguish between some answers. So, there's that to consider...

lordnacho · 2 years ago
If someone has already put in the right prompt to get the best answer, you won't have to do that yourself.

Also, SO isn't just about getting the right answer, it's also about finding the pros and cons of each proposed solution.

rich_sasha · 2 years ago
There could still be value: if ChatGPT is right 60% of the time, there is value in someone filtering that down to the correct answers.
rafark · 2 years ago
If the code works just fine, why remove it? For small algorithms it should be good enough.
NotTheDr01ds · 2 years ago
"Code working" isn't necessarily black-and-white. For a new user (the one asking the question), the code may appear to solve the problem, but may have corner-cases or even security risks. That's entirely possible with user-generated code as well, of course, but GPT/AI allows it to be produced at a much higher rate, with the person who posted the answer often not being capable of (or not caring to) validate or correct it.
dlivingston · 2 years ago
What's the motivation for Stack Overflow to allow AI-generated content in the first place?

These AI models were likely heavily trained on SO data, so any LLM-based answer is merely a regurgitation of thousands of human answers before it.

In addition, asking a question on Stack Overflow and having some LLM respond seems to me like the equivalent of asking GPT-X.Y directly, albeit with extra steps.

cosmolev · 2 years ago
There is a possibility that everything we output is merely a regurgitation of thousands of human answers, ideas, and thoughts we have encountered before. Including this comment of yours. And mine.
dvt · 2 years ago
There's this new trend of what I'll dub techno-nihilism, which is essentially a counterargument to the stochastic parrot argument. The former being: well what if WE are stochastic parrots, after all that's how we learn, right? Well yes, but actually no.

It's trivially false because ChatGPT was trained on something (in this case, Stack Overflow), which, in turn, was trained on something else (maybe a book), and so on. So knowledge, imagination, and genuine creativity must exist somewhere down that chain. Everything can't just be repeating what was learned prior ad infinitum, or we'd have nothing new. Ironically, even the development of large language models is an exercise in creativity.

endisneigh · 2 years ago
I hate this sort of retort since it’s fundamentally meaningless. All atoms that will ever exist were in the singularity and exploded during the big bang to encompass all of the universe, and so we are all just moving along. Ok, so what.
omnicognate · 2 years ago
A possibility that every idea necessary to put the James Webb Space Telescope in place and analyse the data it collects from the universe's earliest and most distant galaxies was already present thousands of years ago, and in fact must have somehow been built into the first humans? I don't think there is.
joshuanapoli · 2 years ago
I'd guess that valid human-authored submissions were being rejected by moderators because they appeared to the moderator to be AI-authored. Moreover, there is probably a significant grey area where a human selects among multiple AI answers and refines them.

As a user of Stack Overflow answers, I don't care too much about who the author was. I do value the voting, comments and community on Stack Overflow, since they add confidence and color to an answer.

jatins · 2 years ago
Why is that a bad thing? People do ask very similar things on Stackoverflow so if an AI can answer that, let it!
lelandfe · 2 years ago
Probably scaling moderation to account for a growing volume of wrong but right-looking answers.

Deleted Comment

noirscape · 2 years ago
Not surprised - the universal issue with AI generated content is that it's near impossible to curate (not helped by what we'll call... a certain eagerness of its advocates to share the output of these tools) and often is wrong in imperceptibly tiny ways.
lukev · 2 years ago
This just makes obvious sense.

If someone wants a GPT-generated answer, they can use ChatGPT themselves.

If I'm bothering to post my question to a forum, the social contract is that I'd like a response from a human.

adamgordonbell · 2 years ago
Anyone around who moderates on SO? Is there a general sense of alienation from corporate SO?

My understanding is that in the early days, a lot of the devs at SO were actually recruited from the SO and Meta moderation userbase, but that probably doesn't scale.

Example:

    Ben: And then, April 29th, I was, again, sitting in the agency, doing my thing, and I also had my personal email open, and I suddenly got an email from Jeff Atwood. 

    But it said, “Who should work for Stack Overflow?” That was the subject line, and then the body of the message said, “You should,” and that was basically it. 

    And that was kind of like this jaw-dropping thing, because from that moment on, it was this, “Okay, maybe the skill that I have is actually marketable,” kind of moment.
https://corecursive.com/stack-overflow/

wordcloudsare · 2 years ago
Not a mod but mods have complained publicly about the sense of isolation for years
NobodyNada · 2 years ago
I used to be heavily active in curating SO, and yes, there is an incredibly strong sense of alienation from the corporation (which is what drove me away).

In the old days, most of the staff, from devs to management to the CEO, were active users of the site and hung out on Meta and in chat. They were easy to reach and happy to answer questions; and whenever they made major changes to the site they would go to Meta to ask for feedback and adjust their plans accordingly.

There was a noticeable shift in this dynamic starting around ~2016. Around this time, the company stopped focusing development effort on the core site functionality, and instead prioritized side products & attempts to monetize the site (most of which failed). Feature requests on Meta were almost completely ignored, and site features that had been on previously-announced timelines/roadmaps were never delivered. But the community was still as strong as ever; and so people started implementing these missing features themselves in the form of bots and userscripts. This was the "golden age" of moderation bots, and a really fun time to be a part of the community -- power users ran heavily modified frontends that could display all kinds of additional information and automate repetitive actions, and integrate with bots to do things that Stack Exchange's systems were bad at -- like flagging spam and low-quality posts; identifying plagiarism; and detecting flamewars in comments.

As the company grew, they hired a ton of middle management who were not active participants in the site, and largely did not care about the day-to-day. This was alright when they left us alone, but around 2018-2019 they began to take an openly hostile stance towards the "power users". Here's an excellent post from that time summarizing the general sentiment: https://meta.stackexchange.com/questions/331513

The short version is: the company began blaming power users for things like the site's "unwelcoming" reputation (which is really a symptom of the site's outdated and opaque moderation tooling, and power users had been clamoring for better tools for years). They began a pattern of rolling out features and UI changes that took major steps backwards in usability and accessibility -- and due to all the negative feedback these changes received on Meta, they announced staff would no longer participate on Meta because it was "too negative". A high-level manager famously quipped that the opinion of Meta was not relevant as it represented "0.015%" of Stack Exchange's userbase -- despite the fact that that 0.015% was responsible for the majority of content & moderation activity contributed to the site.

In late 2019 it got a whole lot worse when, in rapid succession: 1) the company updated the site terms-of-service to illegally change the license of user-submitted content, and 2) an employee abruptly revoked a volunteer's moderation privileges without due process, and then went to news outlets making false accusations about that user's behavior (https://meta.stackexchange.com/questions/333965). Shortly after that, Stack Overflow fired several well-loved and highly respected staff moderators, for undisclosed reasons (https://meta.stackexchange.com/questions/342039/). A lot of people, including myself, left the community in the wake of this.

Since then -- at least from my outsider perspective of checking in once in a while to see what's going on -- it seemed for a while like the company was learning from its mistakes. They apologized for ignoring Meta, began asking for and listening to community feedback once again, created new policies to protect volunteers from the kind of abuse that happened in 2019, and began implementing some of those long-ignored and long-overdue feature requests. But in 2021 the company got bought out by a VC firm that is even more aggressive about trying to monetize the site; they started cranking up advertising and pushing generally unwanted side products, but they mostly left the community alone.

That brings us to generative AI. As soon as ChatGPT came out, a deluge of users began copy/pasting Stack Overflow questions into ChatGPT and copy/pasting its answers into the answer box (usually with no editing or fact-checking effort). General consensus among the community seems to be that ChatGPT produces wrong or unhelpful information to an unacceptable degree, and that allowing machine-generated content on Stack Overflow defeats the purpose of the site (you go to ChatGPT if you want answers from a machine, but you go to SO if you want answers from a human). The staff supported this consensus and made it official policy -- but at the same time the CEO kept making rambling blog posts about how "AI is the future of Stack Overflow" and launching sweeping initiatives within the company to do...AI-related things? Nobody really knows what he's talking about.

That all leads up to the events that triggered the strike: out of nowhere, the company suddenly announced a few days ago that it was overruling previous community consensus and prohibiting users from deleting content on the basis of it being AI-generated.

adamgordonbell · 2 years ago
Wow, great summary. I did not know about the relicensing bit. That seems problematic!
raincole · 2 years ago
I'd propose a different solution. For every single new question, auto-generate an answer with GPT and mark it as AI-generated.
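A rough sketch of that proposal, assuming the OpenAI Python client and a hypothetical labeling convention on the platform side:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def auto_answer(question_title: str, question_body: str) -> dict:
        """Generate a provisional answer for a new question and label it as AI-generated."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[
                {"role": "system", "content": "You answer programming questions concisely."},
                {"role": "user", "content": f"{question_title}\n\n{question_body}"},
            ],
        )
        return {
            "body": response.choices[0].message.content,
            "label": "AI-generated (unverified)",  # displayed prominently to readers
        }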
Me1000 · 2 years ago
At least the generated responses won’t start by asking me “why in the world would you want to do that?”
mediumsmart · 2 years ago
You nailed it.