This keeps me up at night too. I’d like to stake the position that LLMs are antagonistic to the (beleaguered) idea of an internet.
LLMs increase the burden of effort on users to successfully share information with other humans.
LLMs are already close to indistinguishable from humans in chat, and bots are already better at persuading humans [1], suggesting that users who feel ineffective at conveying their ideas online are better served by having a bot do it for them.
All of this effectively puts a fitness function on online interactions, increasing the cognitive effort required for humans to interact or be heard. I don't see this playing out in a healthy manner. The only steady state I can envision is one where we assume that we ONLY talk to bots online.
Free speech and the marketplace of ideas see us bouncing ideas off each other: our way of refining our thoughts and forcing ourselves to test them. This conversation is meant to be the bedrock of democratic societies.
It does not envisage an environment where the exchange of ideas flows into a bot.
Yes, yes, this is a sky-is-falling view: not everyone is going to fall off the deep end, and not everyone is going to use a bot.
In a funny way, LLMs will outcompete average forum critters and trolls for their ecological niches.
> increasing the cognitive effort required for humans to interact or be heard. I don't see this playing out in a healthy manner
We are at the stage where it's still mostly online, but the first ways this will leak into the real world in big ways are easy to guess: job applications, college applications, loan applications, litigation. The small percentage of people who are savvy, naturally inclined toward being spammy, and able to afford any relevant fees will soon be responsible for over 80 percent of all traffic, not only drowning out others but also overwhelming services completely.
Fees will increase; then the institutions involved will use more AI to combat AI submissions, and so on. Schools, banks, and employers will also attempt to respond by networking, so that no one looks at applicants directly any more: they just reject anyone some other institution has rejected. Other publishing, from calls for scientific papers to poetry submissions, progresses the same way under the same pressures, but the problem of being "flooded with junk" isn't so new there, and the stakes are also a bit lower.
In Peter Watts' Maelstrom (2001) it's ultimately self-replicating code that pushes the internet from a brutal, rough, competitive infoscape into something worse and even more rawly aggressive. But the book, with its tattered wasteland of an internet, still has such tone-setting power for me; it set up an image of an internet after humans, where the competing forces of exploitation have degraded and degraded and degraded the situation, pushing humans out.
The only way to solve it for decentralized messaging systems is a decentralized system for verifying identities, based on a chain of trust and on digital signatures by default. It must be a legal framework supported by technical means. For example, ID providers could be given the responsibility to confirm certain assumptions about their clients (is a real human, is an adult, etc.) while keeping their identity confidential. The government and the corporations would know only what the person allows the ID provider to disclose (unless there is a legal basis for more, such as a court's decision to accept a lawsuit or a court order to identify a suspect or witness). The ID provider can issue an ID card that can be used as an authentication factor.

As long as a real person can be confirmed behind a nickname or email address, the cost of abuse becomes a permanent ban from a platform or network. Not many people would risk that. Notaries are natural candidates for ID providers.
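A minimal sketch of that attestation idea, using Python's stdlib `hmac` as a stand-in for a real digital signature (an actual deployment would use asymmetric signatures such as Ed25519, so platforms could verify without holding the provider's key; all names here are illustrative):

```python
import hashlib
import hmac
import json

# Stand-in for the ID provider's private signing key.
PROVIDER_KEY = b"id-provider-secret"

def issue_attestation(pseudonym: str, claims: dict) -> dict:
    """The ID provider attests to claims (is_human, is_adult, ...) about
    a pseudonym without revealing the underlying legal identity."""
    payload = json.dumps({"sub": pseudonym, "claims": claims}, sort_keys=True)
    tag = hmac.new(PROVIDER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": tag}

def verify_attestation(att: dict) -> bool:
    """A platform checks the provider's signature before trusting the claims."""
    expected = hmac.new(PROVIDER_KEY, att["payload"].encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, att["sig"])

att = issue_attestation("night_owl_42", {"is_human": True, "is_adult": True})
print(verify_attestation(att))   # True: claims are intact
att["payload"] = att["payload"].replace("true", "false", 1)
print(verify_attestation(att))   # False: tampered claims fail verification
```

The platform never sees the legal identity, only the signed claims, which is what makes the "permanent ban with confidentiality" trade-off possible.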
Yes, I think we'll see the rise of id-verified online communities. As long as all the other members of the community are also id-verified, the risk of abuse (bullying, doxing, etc) is minimized. This wouldn't stop someone from posting AI-generated content, but it would tend to suppress misinformation and spam, which arguably is the real issue. Would people complain about AI-generated content that is genuinely informative or thought-provoking?
Having links in comments has always been problematic.
For myself, I usually link to my own stuff; not because I am interested in promoting it, but as relevant backup/enhancement of what I am writing about. I think that a link to an article that goes into depth, or to a GitHub repo, is better than a rough (and lengthy) summary in a comment. It also gives others the opportunity to verify what I say. I like to stand behind my words.
I suspect that more than a few HN members have written karmabots, and also attackbots.
I recall blogs from over 20 years ago with blatant comment spam, where the blog author would respond to the spam individually as if it came from real readers. Most didn't fall for it, but a few clearly didn't understand what was going on.
I'm not sure LLMs deviate from a long term trend of increasing volume of information production. It certainly does break the little bubble we had from the early 1990s until 2022/3 where you could figure out you were talking to a real human based on the sophistication of the conversation. That was nice, as was usenet before spammers.
There is a bigger question of identity here. I believe the mistake is to go down the path of photo ID, voice verification, and video verification (all trivially bypassable now). Taking it a step further with Altman's eyeball thing is another mistake, since a human can always be commandeered by a third party. In the long term, do we really care whether the person we are talking to is real or an AI model? Most of the conversations generated in the future will be AI. They may not care.
I think what actually matters more is some sort of larger history of identity and ownership, matched to whatever one wishes (I see no problem with multiple IDs, nicks, avatars). What does this identity represent? In a way, proof of work.
Now, when someone makes a comment somewhere, if it is just a throwaway spam account, there is no value. Sure, spammers can and will do all the extra work to build a fake identity just to promote some bullshit product, but that already happens with real humans.
I think that, ultimately, systems that humans use to interact on the internet will have to ditch anonymity. If people can't cheaply and reliably distinguish human output from LLM output, and people care about only talking to humans, we will need to establish authenticity via other mechanisms. In practice that means PKI or web of trust (or variants/combinations), plus reputation systems.
Nobody wants this, because it's a pain, it hurts privacy (or easily can hurt it) and has other social negatives (cliques forming, people being fake to build their reputation, that episode of Black Mirror, etc.). Anonymity is useful like cash is useful. But if someone invents a machine that can print banknotes that fool 80% of people, eventually cash will go out of circulation.
I think the big question is: How much do most people actually care about distinguishing real and fake comments? It hurts moderators a lot, but most people (myself included) don't see this pain directly and are highly motivated by convenience.
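The web-of-trust half of that can be sketched as a reachability check over a graph of key signatures; a simplified, hypothetical version:

```python
from collections import deque

def is_trusted(graph: dict, roots: list, target: str, max_hops: int = 3) -> bool:
    """Web-of-trust check: 'target' is trusted if reachable from one of my
    directly trusted 'roots' within max_hops signature edges."""
    frontier = deque((r, 0) for r in roots)
    seen = set(roots)
    while frontier:
        node, hops = frontier.popleft()
        if node == target:
            return True
        if hops == max_hops:
            continue  # don't extend trust beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, hops + 1))
    return False

# Each edge means "key A has signed key B".
graph = {"alice": ["bob"], "bob": ["carol"], "carol": ["dave"]}
print(is_trusted(graph, ["alice"], "carol"))             # True (2 hops)
print(is_trusted(graph, ["alice"], "dave", max_hops=2))  # False (3 hops away)
```

The hop limit is the knob that trades reach against the "cliques forming" problem: the shorter the chain, the smaller and more personal the trusted circle.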
We will ditch anonymity, but for pseudonymity, not eponymity. Meaning: someone, somewhere will know who is who and can attest that 1000 usernames are humans, but people will be able to identify with just a username to everyone else, except that one party.
>In practice that means PKI or web of trust (or variants/combinations), plus reputation systems.
Yep, that is the way.
Also, LLMs will help us create new languages or dialects from existing ones, with the purpose of distinguishing the inner group of people from the outer group, and from the outer group of LLMs as well. We have been in a language arms race for that particular purpose for thousands of years; LLMs are just one more reason for the arms race to continue.
If, for example, we focus on making new languages or dialects that sound better to the ear, LLMs have no ears, so it is always humans who will be one step ahead of the machine, provided that the language evolves non-stop. If it doesn't evolve all the time, LLMs will have time to catch up. Ears are some of the most advanced machinery in our bodies.
BTW, I am currently writing a program that takes a book written in Ancient Greek and automatically creates an audiobook or videobook using Google's text-to-speech, the same engine used on the Google Translate website.
I think in the future people will be hooked on how new languages sound, or how they can be sung.
No one is trying to take away your right to host or participate in anonymous discussions.
> Those systems also use astroturfing. It was not invented with LLMs.
No one is claiming that LLMs invented astroturfing, only that they have made it considerably more economical.
> You're just doing the bidding of corporations who want to sell ID online systems for a more authoritarian world.
Sure, man. Funny that I mentioned "web of trust" as a potential solution, a fully decentralised system designed by people unhappy with the centralised nature of PKI. I guess I must be working in deep cover for my corporate overlords, cunningly trying to throw you off the scent like that. But you got me!
If you want to continue drinking from a stream that's been becoming increasingly polluted since November 2022, you're welcome to do so. Many other people don't consider this an appealing tradeoff and social systems used by those people are likely to adjust accordingly.
You could have authenticated proofs of human-ness without providing your full identity. There are similar systems today which can prove your age without providing your full identity.
You ditch anonymity, and you get a cascading chilling effect through the interwebs, because you cannot moderate communities against the political headwinds of your nation.
Worse, it won’t work. We are already able to create fake human accounts, and it’s not even a contest.
And with LLMs, I can do some truly nefarious shit. I could create articles about some discovery of an unknown tribe in the Amazon, populate some unmanned national Wikipedia version with news articles, and substantiate the credentials of a fake anthropologist, and use that identity to have a bot interact with people.
Heck I am bad at this, so someone is already doing something worse than what I can imagine.
Essentially, we can now cheaply create enough high-quality supporting evidence for proof of existence. We can spoof even proof-of-life photos to the point that account-takeover resolution tickets can't be sure whether the selfies are faked. <Holy shit, I just realized this. Will people have to physically go to Meta offices now to recover their accounts???>
Returning to moderation, communities online, and anonymity:
The reason moderation and misinformation have been the target of American Republican senators is that the janitorial task of reducing the spread of conspiracy theories touched the conduits carrying political power.
That threat to their narrative production and distribution capability has unleashed a global campaign to target moderation efforts and regulation.
Dumping anonymity requires us to basically jettison ye olde internet.
I kind of wonder if I care whether comments are from real people, and I actually probably don't, as long as they're thought-provoking. I actually thought it would be an interesting experiment to make my own walled-garden LLM link aggregator, sans all the rage bait.
I mean, I care if meetup.com has real people, and I care if my kids' school's Facebook group has real people, and other forums where there is an expectation of online/offline coordination, but hacker news? Probably not.
I feel like part of why comments here are thought provoking is because they're grounded in something? It's not quite coordination, but if someone talks about using software at a startup or small company I do assume they're genuine about that, which tells you more about something being practical in the real world.
And use cases like bringing up an issue on HN to get companies to reach out to you and fix it would probably become much harder with LLMs taking up the bandwidth.
I could understand that position, except that I don't think most LLM-generated text is produced for the purpose of thought-provoking conversation.
My expectation would be that anyone going through the effort of putting an LLM-generated comment bot online is doing it for some ulterior motive, typically profit or propaganda.
Given this, I would equate not caring about the provenance of a comment to not caring whether you're being intentionally misinformed for some deceptive purpose.
Agree. Another complicating factor for detection is that I don't personally mind seeing a sliver of self-promotion in a comment/post if I feel it's "earned" by the post being on-topic and insightful overall. If such a comment was posted by an LLM, I think I would actually be fine with that.
That seems to say only that synthetic data is a larger part of models today than in the past. The newer OpenAI models reportedly hallucinate more. Claude 4 seems great, but not a multiplier better. That makes me think the effect of synthetic data is at best a net zero. It still remains to be seen, though.
Debunked is a bit too strong. He quotes the phi-4 report saying that it is easier for the LLM to digest synthetic data. A bit like feeding broiler chickens other dead chickens.
Maybe one day we will have organic LLMs guaranteed to be fed only human generated content.
Even supposing the purported "model collapse" does occur, it doesn't destroy the LLMs we already have -- which are clearly already capable of fooling humans. I don't see the clown party being over, just reaching a stable equilibrium.
Exactly. It logically can't occur, even by the flawed assumptions of the people who say this. Just freeze all training data at 2024, or keep existing models; the worst-case scenario is that the models plateau.
Twitter, LinkedIn, and others are following the credit-card and ID (KYC) route, but the issue remains once people start automating interactions: not spam per se, but a waste of time, since users cannot cope with the triggering of zillions of interactions that no human can keep up with.
> Because if there's one place where Google didn't solve spam, it's on YT's comments
I do believe that this problem is very self-inflicted (and perhaps even desired) by YouTube:
- The way comments on YouTube are structured and ordered makes it very hard to have deep discussions there.
- I think there is also a limit on comment length, which again makes it hard to write longer, sophisticated arguments.
- Videos for which a lot of comments are generated tend to be promoted by YouTube's algorithm. Thus YouTubers encourage viewers to write lots of comments (and thus a lot of low-quality comments), i.e. YouTube incentivizes videos being "spammed" with comments.
The correct solution would be to incentivize few but high-quality comments, i.e. to disincentivize comments that contribute nothing valuable (nothing worth your time to read). This makes it much easier to detect and remove the (real) spam among them.
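One toy way to express that incentive, assuming a hypothetical per-user daily comment budget (all numbers made up):

```python
def score(quality: float, nth_comment_today: int, budget: int = 3) -> float:
    """Toy ranking rule: comments within the daily budget keep their full
    quality weight; beyond it, weight falls off sharply, so flooding a
    video with comments stops paying off."""
    over = max(0, nth_comment_today - budget)
    return quality / (1 + over)

print(score(0.9, 1))   # 0.9    – within budget, full weight
print(score(0.9, 10))  # 0.1125 – the tenth comment of the day is penalized
```

Under a rule like this, the algorithmic reward for "spam the comments!" calls to action largely disappears, while an occasional thoughtful comment is unaffected.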
This doesn’t work in perpetuity. One of the reasons spam is so persistent is that when you ban a spammer, they can just create a new identity and go again. If payment is required, then not only do they have to pay again every time they get banned, they also need a new payment card, because you aren’t limited to banning their account: you can ban the payment mechanism they used.
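A sketch of banning the payment mechanism rather than the account, storing only a salted hash of the card number (illustrative only; real systems use processor-issued card fingerprints and slower, stronger hashing, since raw card numbers have little entropy):

```python
import hashlib

# Set of fingerprints of banned payment instruments; the site never
# stores raw card numbers.
BANNED_INSTRUMENTS: set[str] = set()

def fingerprint(card_number: str, salt: bytes = b"site-salt") -> str:
    """Derive a stable, non-reversible identifier for a payment card."""
    return hashlib.sha256(salt + card_number.encode()).hexdigest()

def ban(card_number: str) -> None:
    """Ban the instrument itself, not just the account that used it."""
    BANNED_INSTRUMENTS.add(fingerprint(card_number))

def can_sign_up(card_number: str) -> bool:
    return fingerprint(card_number) not in BANNED_INSTRUMENTS

ban("4111111111111111")
print(can_sign_up("4111111111111111"))  # False – same card, new account
print(can_sign_up("5555555555554444"))  # True  – a different card works
```

This is what makes the economics bite: a banned spammer needs a fresh card, not just a fresh username.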
[1] (https://arxiv.org/pdf/2505.09662)
Recently revisited on Peter's blog: https://www.rifters.com/crawl/?p=11220
It will not stop misinformation either.
Verification is expensive and hard, and currently completely spoofable. How will a Reddit community verify an ID? In person?
If Reddit itself verifies IDs, then nations across the world will start asking for those IDs and Reddit will have to furnish them.
https://www.youtube.com/watch?v=4VrLQXR7mKU
Previously (6 months ago but didn't trend, perhaps due for a repost?):
https://news.ycombinator.com/item?id=42353508
Thanks!
Not so sure I'd call it "nice."
I am ashamed to say that I was one of the reasons that it wasn't so "nice."
This was not always the case. I used to be a Grade A asshole, and have a lot to atone for.
I also like to make as much of my work open, as I can.
You're just doing the bidding of corporations who want to sell ID online systems for a more authoritarian world.
Those systems also use astroturfing. It was not invented with LLMs.
See my other comment https://news.ycombinator.com/item?id=44130743#44150878 for how this is "bleak" mostly if you were comfortable with your Overton window and censorship.
Relevant meme video (watching it is, in my opinion, worth your time):
Once the algorithms predominantly feed on their own shit, the bazillion-dollar clown party is over.
For web spam this was HTTPS. For account spam this is phone-number 2FA. I think requiring a form of ID or a payment card is the next step.
Because if there's one place where Google didn't solve spam, it's on YT's comments