bravura · 3 years ago
So Mechanical Turk, which was a robot that actually contained a human, is now a robot that contains a human that contains a robot.

We have achieved complete ouroboros.

sangnoir · 3 years ago
> We have achieved complete ouroboros.

That's a generous / romantic take. For me, the Human Centipede of online text generation is now a complete circle.

neilv · 3 years ago
Awful, but strangely appropriate, considering all the "I don't care about quality; just dump a pile of text" uses we're seeing.
nashashmi · 3 years ago
We call that reinforcement.

It’s like actors in movies copy real world people who copy actors in movies.

jeron · 3 years ago
RLHBARF - Reinforcement Learning from Human But Actually Robot Feedback
ry4nolson · 3 years ago
Q: Does art imitate life or does life imitate art?

A: Yes

dv_dt · 3 years ago
Watch some videos of various factory production lines, or agricultural planting, for example: machines that feed stuff to humans who may feed more machines.
yetanotherloser · 3 years ago
There's a great, if rather old, word for these machines: "tools".
bhuga · 3 years ago
Or perhaps "ourorobos" :)
cscurmudgeon · 3 years ago
And the final robot has millions-billions of humans fused using linear algebra.
Terr_ · 3 years ago
And the humans are trillions of tiny nanobots (after there was a "Grey Goo" event billions of years ago) that formed a hivemind that isn't even aware of all its own parts. :P
treprinum · 3 years ago
It's robots all the way down!
awinter-py · 3 years ago
No, it is only an ouroboros if it's biting its own tail -- this is more of a cyborg centipede situation.
0x000xca0xfe · 3 years ago
Just wait until the LLM extensions are better. The future looks like this:

1. You use MTurk to brainstorm new product ideas

2. The human figures out your company and tasks the LLM with the brainstorming

3. The LLM hits you up for a chat about your company's future :)))

cyanydeez · 3 years ago
The true ouroboros comes when it's undetectable which level of the turtle you're on.
dxbydt · 3 years ago
AndrewKemendo · 3 years ago
The Turing Syndrome speeds up!

Turing Syndrome (like the Kessler syndrome) is when the amount of AI-generated data on the internet surpasses human-generated content to the extent that it eventually becomes impossible to distinguish between the two.

somenameforme · 3 years ago
I think this is probably a poor metric for it. Mechanical Turk is full of people trying to race their way through surveys to make a few pennies. So there's a motivation to just create some output that can't be automatically filtered out (not missing a 'please select D as the answer for this question' attention test, for example), while otherwise just going as quickly as possible. It's a dream scenario for any sort of automation, no matter how crude. It's one step above MMO gold farming.

Tangentially related: Mechanical Turk and its various less well known clones, which come down to the exact same thing, are increasingly the status quo for social science studies. They keep making really shocking discoveries with like 99.99999% statistical certainty, given the huge sample sizes these services enable. Kind of weird how often they fail to replicate.

mike_hearn · 3 years ago
Sociology's reliance on MTurk is absurd but most professionally run polls are now the same thing, just dressed up to look nicer. This is why polls keep reporting lots of weird results which are then taken as absolute fact by researchers and pundits.

I saw one the other day from a professional polling firm in which exactly 7% of "the public" said they had attended a protest, no matter what the protest was about. They asked about six or seven things people might protest about, and for every single one, 7% of the panel said they'd been on such a protest. Taken at face value, it led to nonsense like concluding millions of people had attended protest marches about CBDCs and 15-minute cities.

Unfortunately, the panels these firms use are extremely unrepresentative of the public but are advertised as perfect proxies. People then accept this claim unthinkingly. It's a problem.

AndrewKemendo · 3 years ago
The fact that MTurkers are paid makes them unreliable for reflecting genuine human responses.

That they can't replicate indicates that the sampling population isn't a consistent representation of population demographics, and so there's no real signal there that can be applied to a general population, like most failure-to-replicate studies.

z3c0 · 3 years ago
Or maybe the complete opposite, as it will only be learning patterns created by AI, producing less variance at higher temperatures, ultimately overfitting onto the same bland jargon.
Rebelgecko · 3 years ago
Neal Stephenson made some interesting predictions about this ~a decade ago

Edit: Anathem actually came out 15 years ago

kipple · 3 years ago
I forget, how does this come up in Anathem?

I remember similar notions in his novel Fall — where the internet is so full of fake news and sponsored content that people need additional services to filter out useful information

jvanderbot · 3 years ago
Side topic: Anathem is actually my favorite book of his. I feel like it's under-represented in a list of top NS books.
kristianp · 3 years ago
Did you just coin that? Looks like previous instances on Google were trying to say Tourette's syndrome.

I like the phrase; I have been wondering about this problem. Future web crawls are going to contain ever-increasing amounts of GPT-generated content.

A similar problem: who is going to use Stack Overflow when an LLM can do a better job for simple problems?

AndrewKemendo · 3 years ago
Yes I did, though I thought I originally posted it on Twitter or HN long ago.

It’s on the internet somewhere from me at some point previously.

Deleted Comment

koochi10 · 3 years ago
^^ This comment was generated by A.I.
shmatt · 3 years ago
This is more like cousins having a baby with 3 nostrils
gibolt · 3 years ago
And a few extra fingers
adventured · 3 years ago
That's not how it's actually going to play out.

As the AI generated content becomes dramatically overwhelming in scale, the human content will become increasingly easy to spot (and there will be multiple cues to the human content that make it fairly obvious). There will be a crossing of the two along the way with regard to the amount of content generated, a relatively brief time when it will be difficult to tell which is which.

The more AI content there is, and the more it advances, the easier it's going to be to play spot-the-human. The time in which it'll be hardest to tell them apart will be in the middle frames rather than in the later stages.

jerf · 3 years ago
"As the AI generated content becomes dramatically overwhelming in scale, the human content will become increasingly easy to spot (and there will be multiple cues to the human content that make it fairly obvious)"

I feel like you're not accounting for the amount of that AI content that will be deliberately and intelligently intended to masquerade as human. Every signal you can think of and may start using is a signal that intelligent humans can and will forge. And they'll get the ones you didn't think of, too, because that's their job.

I don't even have to hedge or qualify this prediction, because we have decades of experience with people already forging every signal that is technically possible to forge, and pouring very substantial effort into doing so, right up to having dark businesses that provide these things as a service. "Forge all the signals that are technically possible to forge" is a product you can buy right now. For example, from 13 days ago: https://news.ycombinator.com/item?id=36151140

I don't know what signals you think you're going to see, but I'll guarantee A: yes, you will indeed see them, because there are a lot of degrees of competence in the use of these systems (I still periodically get spams where the sender sent out their template rather than filling it in with a mail merge), but B: those will just be the ones you notice, not the entire population.

Rather than a binary decision, I like to measure with "how much information does it take for me to detect a forgery". For instance, things like modern architecture renders can absolutely fool me as being real at standard definition, but at 4K they still struggle (too precise, too clean, even when they try to muss it up on purpose). I need a few paragraphs of GPT text to identify it conclusively, and that's just the default tone it takes. Ask it to give you some text in a specific style and I don't know how much it would take. For all I know I'm already missing things. You're probably already missing things you didn't realize.

tjoff · 3 years ago
Like Kessler, though, the problem isn't to detect it but to get anything through. We might not see human content for a few decades, until we've sorted it out.
317070 · 3 years ago
Because the human data will become "stupid" in comparison, is that the reason?

I don't understand why spot-the-human will become easier otherwise.

jmount · 3 years ago
That is hilarious, and it utterly defeats one of the quality checks on MTurk-style tasks: using agreement as a proxy for correctness.
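For anyone unfamiliar with the check being defeated, here is a minimal sketch (hypothetical data, purely illustrative) of agreement-based aggregation; if several workers just relay the same LLM answer, "unanimous" agreement is really one model voting multiple times:

    from collections import Counter

    def aggregate_by_agreement(responses):
        # responses: list of worker answers for a single task
        counts = Counter(responses)
        answer, votes = counts.most_common(1)[0]
        # agreement ratio is treated as a confidence score
        return answer, votes / len(responses)

    # Three workers who each pasted the same ChatGPT answer look unanimous:
    print(aggregate_by_agreement(["Paris", "Paris", "Paris"]))  # ('Paris', 1.0)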
famouswaffles · 3 years ago
There's the paper that shows ChatGPT outperforming MTurk crowdworkers, and another one showing GPT-4 going toe to toe with experts (and blowing past crowdworkers) who set the benchmarks they were evaluating.

https://arxiv.org/abs/2303.15056

https://www.artisana.ai/articles/gpt-4-outperforms-elite-cro...

If quality is the issue, then rest easy lol. The age of state-of-the-art artificial natural language processors being obviously inferior to human performance has come and gone.

Choco31415 · 3 years ago
You could potentially introduce an occasional prompt and check the answer against a few LLMs to see if they match.
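A rough sketch of that idea, assuming free-text answers and using difflib purely as an illustrative similarity measure (the threshold is a guess, and this is nothing like a reliable detector):

    import difflib

    def looks_like_llm_output(worker_answer, llm_answers, threshold=0.85):
        # Flag a worker answer that is near-identical to any sampled LLM answer.
        for candidate in llm_answers:
            ratio = difflib.SequenceMatcher(
                None, worker_answer.lower(), candidate.lower()).ratio()
            if ratio >= threshold:
                return True
        return False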
adventured · 3 years ago
Good LLM outputs are typically not the same for every identical query. For now that makes AI checkers fairly incompetent (and leads to disastrous results for teachers trying to find the students using AI).

If I ask even just GPT-3.5 something like "Who was John Adams?", it'll give me a slightly varied answer pretty much every single time (even if I modify its settings to make it less creative with responses, it'll still usually vary a bit).

Here is a simple API hit on gpt-3.5-turbo-16k-0613

Output 1) John Adams was an American statesman, lawyer, diplomat, and Founding Father who served as the second President of the United States from 1797 to 1801. He was one of the key figures in the American Revolution and played a crucial role in drafting the Declaration of Independence. Adams also served as the first Vice President under George Washington. He was known for his strong advocacy of republicanism and his belief in a strong central government. Adams was a prolific writer and his letters and writings provide valuable insights into the early years of the United States.

Output 2) John Adams was an American statesman, lawyer, diplomat, and Founding Father who served as the second President of the United States from 1797 to 1801. He was one of the key figures in the American Revolution and played a crucial role in drafting the Declaration of Independence. Adams was also a strong advocate for the separation of powers and a strong central government. He was known for his intellect and commitment to public service.

And so on.
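For reference, that "simple API hit" might look roughly like the following with the pre-1.0 openai Python client (the model name is taken from above; the loop just shows that even a low temperature usually yields slightly different completions):

    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    for _ in range(2):
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-16k-0613",
            messages=[{"role": "user", "content": "Who was John Adams?"}],
            temperature=0.2,  # lower temperature reduces, but doesn't remove, variation
        )
        print(resp.choices[0].message.content)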

flangola7 · 3 years ago
How would you check that? What if they're using a self-hosted LLM?
CoastalCoder · 3 years ago
Out of curiosity, what happens to an LLM if, over time, it's increasingly trained on its own output and/or that of other LLMs?

And has anyone figured out a good way to minimize that when sourcing training data from e.g. Reddit?

pjmorris · 3 years ago
I read (probably on HN) that someone suggested pre-LLM data will become the 'low-background steel' (pre-nuclear-age steel with lower levels of internal radiation) of machine learning.

It seems like provenance/traceability of data sources will become important, both for the builders and users of LLMs.

verytrivial · 3 years ago
There's a no-longer-very-speculative bit of Sci-Fi waiting here about whole societies "forking" based upon the curation of their training data. Arguably social-media + bots + paid state actors means we're already there.
lozaning · 3 years ago
Habsburg AI – a system that is so heavily trained on the outputs of other generative AIs that it becomes an inbred mutant, likely with exaggerated, grotesque features.

from https://twitter.com/jathansadowski/status/162524580321127219...

samstave · 3 years ago
Or even the Zuckerberg of AI, in a meta-sorta-way.

This was pretty clever:

>CaligulAI <-- @Rogntuudju

gnicholas · 3 years ago
Hm, sounds like dog-fooding, but where the 'food' is dog shit. Dog-shitting? Dog-shit-fooding?
gpderetta · 3 years ago
Dog breakfasting
TheSpiceIsLife · 3 years ago
Dogenshittification
jstarfish · 3 years ago
The Poopboros.
Daishiman · 3 years ago
AI Centipede
lordnacho · 3 years ago
dogging
hospadar · 3 years ago
I certainly have no idea, but it's interesting to think about! One could argue that humans are trained on their own output, so presumably interesting things could arise, especially if it's not a totally closed loop (LLM#1 -> some human curation -> LLM#2).
fladrif · 3 years ago
> humans are trained on their own output

And it seems we're emulating that same cycle. Humans are trained not only on our own output, but also on input from our environment. So in this sense, an LLM's environment would be human input.

veave · 3 years ago
Lately I have been trying a new LLM called Falcon that was supposedly created in the UAE but it was mostly trained on conversations with ChatGPT... the result is that if you ask Falcon who created it, it will happily say OpenAI. I found that really amusing.
whimsicalism · 3 years ago
The instruct trained variation, you mean.
spacemanspiff01 · 3 years ago
I believe that performance degrades, as the models end up training on data that is more homogenized and less diverse when compared to original data.

At least according to this paper (I think; FYI, I am not an expert): https://arxiv.org/abs/2305.17493

Aerbil313 · 3 years ago
I read a research paper which says LLMs don’t actually improve by training on the outputs of other LLMs. They only gain the ability to answer the specific questions included in the LLM output training dataset. Their reasoning capabilities etc. don’t improve.
Aerbil313 · 3 years ago
I’m probably wrong. See the Orca model which came out recently.
avereveard · 3 years ago
Possibly such data will be used in a different way. If it's embedded in a social network, engagement (votes, retweets, whatever) will act as a sort of crowd-sourced reinforcement learning signal instead of being a direct part of the training set.
cj · 3 years ago
Perhaps the source training data will shift to transcripts or subtitles of podcasts, cable television, TV shows, and other sources that haven't (yet) been polluted.

Combined with pre-LLM data sets.

the8472 · 3 years ago
Training is becoming multi-modal anyway and will need grounding in physical reality so human-generated text will most likely become a smaller fraction of the data.
asow92 · 3 years ago
Hopefully some of this is democratized away by humans voting on quality of output directly and/or indirectly?

Deleted Comment

MengerSponge · 3 years ago
Have you seen Multiplicity?

"Hey Steve, come on up, I'm spittin on bugs"

Deleted Comment

jerpint · 3 years ago
It can reinforce its own biases, for one.
Mistletoe · 3 years ago
You’ve heard it at a concert when the microphone picks up sound from the speakers and a positive feedback loop ensues. A lot of screeching.
xwdv · 3 years ago
Context drift.
remote_phone · 3 years ago
What will happen is that it will stratify people even more and income inequality will become even more pronounced.

The educated people will learn how to write on their own and the less educated will use LLMs and never rise above it. They will be the McDonald's workers of the Information Age.

This is the future that I’m preparing my kids for so that they land on top. I’m investing heavily in creativity and writing for them so that they will land in the upper levels of society and not the lower ones that become slaves to AI.

WesternWind · 3 years ago
A friend was joking that LLMs explain Star Trek's obsession with the 20th and 21st centuries.
antaviana · 3 years ago
Funny, because I remember at an AWS conference the presenter touted Mechanical Turk as AAI (Artificial Artificial Intelligence), because it was a fake AI service done by humans, and it seems that it will soon become AAAI (Artificial Artificial Artificial Intelligence, or A3I).