Readit News
peppermint_gum · 2 years ago
One aspect of the spread of LLMs is that we have lost a useful heuristic. Poor spelling and grammar used to be a signal used to quickly filter out worthless posts.

Unfortunately, this doesn't work at all for AI-generated garbage. Its command of the language is perfect - in fact, it's much better than that of most human beings. Anyone can instantly generate superficially coherent posts. You no longer have to hire a copywriter, as many SEO spammers used to do.

curl's struggle with bogus AI-generated bug reports is a good example of the problems this causes: https://news.ycombinator.com/item?id=38845878

This is only the beginning; it will get much worse. At some point it may become impossible to separate the wheat from the chaff.

pavel_lishin · 2 years ago
We should start donating more heavily to archive.org - the Wayback Machine may soon be the only way to find useful data on the internet, by cutting out anything published after ~2020 or so.
emsign · 2 years ago
I won't even bet on archive.org to survive. I will soon upgrade my home NAS to ~100TB and fill it up with all kinds of information and media /r/datahoarder style. Gonna archive the usual suspects like Wikipedia and also download some YouTube channels. I think now is the last chance to still get information that hasn't been tainted by LLM crap. The window of opportunity is closing fast.
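For the YouTube half, a minimal sketch of what I mean with yt-dlp's Python API (the channel URL and output paths are placeholders):

    # pip install yt-dlp
    from yt_dlp import YoutubeDL

    # Archive a channel to local disk, keeping metadata alongside the media.
    opts = {
        "outtmpl": "archive/%(uploader)s/%(title)s [%(id)s].%(ext)s",
        "writeinfojson": True,  # save a .info.json next to each video
    }
    with YoutubeDL(opts) as ydl:
        ydl.download(["https://www.youtube.com/@SomeChannel/videos"])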
letitbeirie · 2 years ago
It will be like salvaging pre-1945 shipwrecks for their non-irradiated metal.
cauliflower99 · 2 years ago
Interesting idea. Could there be a market for pre-AI era content? Or maybe it would be a combination of pre-AI content plus some extra barriers to entry for newer content that would increase the likelihood the content was generated by real people?
sjfjsjdjwvwvc · 2 years ago
Love that sentiment! The Internet Archive is in many ways one of the best things online right now IMO. One of the few organisations that I donate regularly to without any second thoughts. Protect the archive at all costs!
EasyMark · 2 years ago
I update my Wikipedia copy every few months, but I can't really afford to back up the Internet Archive. I do send them around $10 every Christmas as part of the $100 I give to my favorite sites like archive, wikipedia, etc.
hyperthesis · 2 years ago
~2020, the end of history

a_c · 2 years ago
Things go in cycles. Search engines were so much better at discovering linked websites. Then people played the SEO game, wrote bogus articles, cross-linked this and that, and everyone got into writing. Everyone writes the same clichés over and over, and the quality of search results plummets. But since we are regurgitating the same thoughts over and over again, why not automate it? Over time people will forget where the quality posts came from in the first place, e.g. LLMs replace Stack Overflow, which replaced technical documentation. When the cost of production is dirt cheap, no one cares about quality. When enough is enough, people will start to curate a web of word of mouth of everything again.

What I typed above is extremely broad strokes and lacking in nuance. But generally I think the quality of online content will go to shit until people have had enough, then behaviour will swing to the other side.

jstarfish · 2 years ago
Nah, you got the right of it. It feels like the end of Usenet all over again, only these days cyber-warlords have joined the spammers and trolls.

Mastodon sounded promising as What's Next, but I don't trust it-- that much feels like Bitcoin all over again. Too many evangelists, and there's already abuse of extended social networks going on.

Any tech worth using should sell itself. Nobody needed to convince me to try Usenet, most people never knew what it was, and nobody is worse off for it.

We created the Tower of Babel-- everyone now speaks with one tongue. Then we got blasted with babble. We need an angry god to destroy it.

I figure we'll finally see the fault in this implementation when we go to war with China and they brick literally everything we insisted on connecting to the internet, in the first few minutes of that campaign.

CogitoCogito · 2 years ago
I feel like somehow this is all some economic/psychological version of a heat equation. Anytime someone comes up with a signal that has economic value, that value is exploited until the signal diffuses back out.

I think it’s similar to a Matt Levine quote I read, which said something like: Wall Street will find a way to take something riskless and monetize it so that it becomes risky.

Log_out_ · 2 years ago
Insular splinternets with Web of trust where allowing corporate access is banworthy?
indigochill · 2 years ago
> You no longer have to hire a copywriter, as many SEO spammers used to do.

I used to do SEO copywriting in high school and yeah, ChatGPT's output is pretty much at the level of what I was producing (primarily: use certain keywords; secondarily: write a surface-level informative article tangential to what you want to sell to the customer).

> At some point it may become impossible to separate the wheat from the chaff.

I think over time there could be a weird eddy-like effect to AI intelligence. Today you can ask ChatGPT a Stack Overflow-style question and get a Stack Overflow-style response instantly (complete with taking a bit of a gamble on whether it's true and accurate). Hooray for increased productivity?

But then, looking forward years in time, people start leaning more heavily on that and stop posting to Stack Overflow and the well of information for AI to train on starts to dry up, instead becoming a loop of sometimes-correct goop. Maybe that becomes a problem as technology evolves? Or maybe they train on technical documentation at that point?

zeruch · 2 years ago
I think you are generally correct in where things will likely go (sometimes correct goop) but the problem I think will be far more existential; when people start to feel like they are in a perpetual uncanny valley of noise, what DO they actually do next? I don't think we have even the remotest grasp of what that might look like and how it will impact us.
spaceman_2020 · 2 years ago
It's already becoming hard to tell the wheat from the chaff.

AI-generated images used to look AI-generated. Midjourney v6 and well-tuned SDXL models look almost real. For marketing imagery, Midjourney v6 can easily replicate images from top creative houses now.

ren_engineer · 2 years ago
> But then, looking forward years in time, people start leaning more heavily on that and stop posting to Stack Overflow and the well of information for AI to train on starts to dry up

for coding tasks I'd imagine it could be trained on the actual source code of the libraries or languages and determine proper answers for most questions. AI companies have seen success using "synthetic" data, but who knows how much it can scale and improve

WalterBright · 2 years ago
I've rarely found stackoverflow to give useful answers. If I am looking for how to do something with Linux programming, I'll get a dozen answers, half of which are only partial answers, the other half don't work.
samstave · 2 years ago
> the well of information for AI to train on starts to dry up

and with regard to the eddy-like model-self-incestuation - I am sure that the scope of that well just becomes wider - now it's slurping any and all video and learning human micro-emotions and micro-aggressions - and mastering human interpersonal skills.

My prediction is that AI will be a top-down reflection of societies' leadership. So as long as we have these questionable leaders throughout the world governments and global corps, the Alignment of AI will be biased toward their narratives.

bobthepanda · 2 years ago
It didn't take very long for the first lawyers to get sanctioned for using ChatGPT-made-up cases in legal briefs. https://www.reuters.com/legal/new-york-lawyers-sanctioned-us...

It would be hilarious if the end result of all this would be to go back to a 1990s-2000s Yahoo style of web portal where all the links are curated by hand by reputable organizations.

taberiand · 2 years ago
They find a way to validate the utility of the information instead of the source.

It doesn't matter if the training data is AI generated or not, if it is useful.

rs999gti · 2 years ago
> You no longer have to hire a copywriter

Has anyone tested a marketing campaign using copy from a human copywriter versus an AI one?

I would like to see which one converts better.

betaby · 2 years ago
> Poor spelling and grammar used to be a signal used to quickly filter out worthless posts.

Or just a post from a non-native speaker.

commandlinefan · 2 years ago
I can always tell the difference between a non-native English speaking writer and somebody who's just stupid - the sort of grammatical mistakes stupid people make are very, very different than the ones that people make when speaking a second language.

Of course, sometimes the non-native English was so bad it wasn't worth wading through it, so that's still sort of a good signal.

jprete · 2 years ago
Often it was possible to tell these apart on repeat interactions.
drewcoo · 2 years ago
> a post from a non-native speaker

In my experience as an American, US-born and -educated English speakers have much worse grammar than non-native speakers. If nothing else, the non-native speakers are conscious of the need for editing.

asylteltine · 2 years ago
That’s true. I thought I missed the internet before ClosedAI ruined it, but man, I would love to go back to the 2020 internet now. LLM research is going to be the downfall of society in so many ways. Even at a basic level, my friend is taking a master's and EVERYONE is using ChatGPT for responses. It's so obvious with the PC way it phrases things and then summarizes it at the end. I hope they just get expelled.
j0hnyl · 2 years ago
I don't see how this points to downfall of society. IMO it's clearly a paradigm shift that we need to adjust to and adjustment periods are uncomfortable and can last a long time. LLMs are massive productivity boosters.
monkeynotes · 2 years ago
I think this is hyperbole, and similar to various techno fears throughout the ages.

Books were seen by intellectuals as being the downfall of society. If everyone is educated they'll challenge dogma of the church, for one.

So looking at prior transformational technology I think we'll be just fine. Life may be forever changed for sure, but I think we'll crack reliability and we'll just cope with intelligence being a non-scarce commodity available to anyone.

oblio · 2 years ago
At this rate many exams will just become oral exams :-)
BeFlatXIII · 2 years ago
Is it a master's in an important field or just one of those masters that's a requirement for job advancement but primarily exists to harvest tuition money for the schools?
vladsolokha · 2 years ago
> Poor spelling and grammar used to be a signal used to quickly filter out worthless posts.

Timee to stert misspelling and using poorr grammar again. This way know we LLM didn't write it. Unlearn we what learned!

l33t7332273 · 2 years ago
If you prompt LLMs to use poor spelling and grammar, they will.
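A minimal sketch with the OpenAI Python SDK (the model name here is a placeholder; any chat model will do):

    # pip install openai
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works
        messages=[
            {"role": "system",
             "content": "Write like a hasty forum poster: occasional typos, "
                        "dropped apostrophes, no capital letters."},
            {"role": "user", "content": "Explain why offline archives matter."},
        ],
    )
    print(resp.choices[0].message.content)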
acdha · 2 years ago
I’ve thought about that a lot - a while back I heard about problems with a contract team supplying people who didn't have the skills requested. The thing which made it easiest to break the deal was that they plagiarized a lot of technical documentation and code and continued after being warned, which removed most of the possible nuance. Lawyers might not fully understand code, but they certainly know what it means when the level of language proficiency and style changes significantly in the middle of what's supposed to be original work, exactly matches someone else's published work, or when code which is supposedly your property matches a file on GitHub.

An LLM wouldn’t have made them capable of doing the job, but the degree to which it could have made that harder to convincingly demonstrate made me wonder how much longer something like that could now be drawn out, especially if there was enough background politics to exploit ambiguity about intent or the details. Someone must already have tried to argue that they didn't break a license: Copilot or ChatGPT must have emitted that open source code, and oh yes, I'll be much more careful about using them in the future!

philwelch · 2 years ago
With practice I’ve found that it's not hard to tell LLM output from human-written content. LLMs seemed very impressive at first, but the more LLM output I've seen, the more obvious the stylistic tells have become.
bluetomcat · 2 years ago
It's a shallow writing style, not rooted in subjective experience. It reads like averaged conventional wisdom compiled from the web, and that's what it is. Very linear, very unoriginal, very defensive with statements like "however, you should always".
ptmx · 2 years ago
Are you talking about LLMs in general, or specifically ChatGPT with a default prompt?

Since dabbling with some open source models (llama, mistral, etc.), I've found that they each have slightly different quirks, and with a bit of prompting can exhibit very different writing styles.

I do share your observation that a lot of content I see online now is easily identifiable as ChatGPT output, but it's hard for me to say how much LLM content I'm _not_ identifying because it didn't have the telltale style of stock ChatGPT.

ToucanLoucan · 2 years ago
A work-friend and I were musing in our chat yesterday about a boilerplate support email from Microsoft he received after he filed a ticket. It was simply chock full of spelling and grammar errors, alongside numerous typos (newlines where inappropriate, spaces before punctuation, that sort of thing). As a joke, he fired up his AI (honestly I have no idea what he uses, he gets it from a work account as part of some software, so don't ask me) and asked it to write the email with the same basic information and a given style, and it drafted up an email that was remarkably similar, but with absolutely perfect English.

On that front, at least, I welcome AI being integrated into businesses. Business communication is fucking abysmal most of the time. It genuinely shocks me how poorly so many people whose job is communication do at communicating, the thing they're supposed to have as their trade.

jprete · 2 years ago
Grammar, spelling, and punctuation have never been _proof_ of good communication, they were just _correlated_ with it.

Both emails are equally bad from a communication purist viewpoint, it's just that one has the traditional markers of effort and the other does not.

I personally have wondered if I should start systematically favoring bad grammar/punctuation/spelling both in the posts I treat as high quality, and in my own writing. But it's really hard to unlearn habits from childhood.

adamckay · 2 years ago
I can imagine soon - within the next year or so - that business emails will simply be AI talking to AI. Especially with Microsoft pushing their copilot into Office and Outlook.

You'll need to email someone so you'll fire up Outlook with its new Clippy AI and tell it the recipient and write 2 or 3 bullet points of what you want it to include. Your AI will write the email, including the greeting and all the pleasantries ("hope this email finds you well", etc) with a wordy 3 or 4 paragraphs of text, including a healthy amount of business-speak.

Your recipient will then have an email land in their inbox and probably have their AI read the email and automatically summarise those 3 or 4 paragraphs of text into 3 or 4 bullet points that the recipient then sees in their inbox.

seabass-labrax · 2 years ago
I agree that most business communication is pretty low-quality. But after reading your post with the kind of needlessly fine-tooth comb that is invited by a thread about proper English, I'm wondering how it matters. You yourself made a few mistakes in your post, but not only does it scarcely matter, it would be rude of me to point it out in any other context (all the same, I hope you do not take offence in this case).

Correct grammar and spelling might be reassuring as a matter of professionalism: the business must be serious about its work if it goes to the effort of proofreading, surely? That is, it's a heuristic for legitimacy in the same way as expensive advertisements are, even if completely independent from the actual quality of the product. However, I'm not sure that 100% correct grammar is necessary from a transactional point of view; 90% correct is probably good enough for the vast majority of commerce.

raxxorraxor · 2 years ago
The Windows bluescreen in German has had grammatical errors (maybe it still does in the most recent version of Win10).

Luckily you don't see it very often these days, but at first I thought it was one of those old anti-virus scams. Seems QA is less of a focus at Microsoft right now.

ozr · 2 years ago
It won't help as much with local models, but you could add an 'aligned AI' captcha that requires someone to type a slur or swear word. Modern problems/modern solutions.
switch007 · 2 years ago
> that we have lost a useful heuristic

But we've gained some new ones. I find ChatGPT-generated text predictable in structure and lacking any kind of flair. It seems to avoid hyperbole, emotional language and extreme positions. Worthless is subjective, but ChatGPT-generated text could be considered worthless to a lot of people in a lot of situations.

madeofpalk · 2 years ago
If it had a colour, it would be 'grey'. It's the average of all text.
cratermoon · 2 years ago
The current crop of LLMs at least have a style and voice. It's a bit like reading Simple English Wikipedia articles, the tone is flat and the variety of sentence and paragraph structure is limited.

The heuristic for this is not as simple as bad spelling and grammar, but it's consistent enough to learn to recognize.

sgustard · 2 years ago
I rely on the stilted style of Chinese product descriptions on Amazon to avoid cheap knockoffs. Why do these products use weird bullet lists of features like "will bring you into a magical world"? Once you LLM these into normal human speak it will be much harder to identify the imports. https://www.amazon.com/CFMOUR-Original-Smooth-Carbon-KB8888T
kaetemi · 2 years ago
It'll just be even more empty fluff.
popcalc · 2 years ago
It's already 404-ing.
stcredzero · 2 years ago
> One aspect of the spread of LLMs is that we have lost a useful heuristic. Poor spelling and grammar used to be a signal used to quickly filter out worthless posts.

The signal has shifted. For now, theory of mind and social awareness are better indicators. This has a major caveat, however: There are lots of human beings who have serious problems with this. Then again, maybe that's a non-problem.

photon_collider · 2 years ago
I agree. I've noticed the other heuristic that works is "wordiness". Content generated by AI tends to be verbose. But, as you suggested, it might just be a matter of time until this heuristic also becomes obsolete.

munk-a · 2 years ago
At the moment we can at least still use the poor quality of AI text-to-speech to filter out the dogshit when it comes to shorts/reels/TikToks etc... but we'll eventually lose that ability as well.
globular-toast · 2 years ago
There might be a reversal. Humans might start intentionally misspelling stuff in novel ways to signal that they are really human. Gen Zs already don't use capitals or any other punctuation.
vibrolax · 2 years ago
gen-z channels ee cummings
itronitron · 2 years ago
Every human-authored news article posted online since 2006 has had multiple misspellings, typos, and occasional grammar mistakes. Blogs on the other hand tend to have very few errors.
valval · 2 years ago
Poor use of LLMs is incredibly easy to spot, and works as today’s sign of a worthless post/comment/take.
pmarreck · 2 years ago
So now the heuristic will change to "super excellent grammar", clearly.

We'll learn to pepper our content with creative misspellings now...

vagab0nd · 2 years ago
> At some point it may become impossible to separate the wheat from the chaff.

Then the chaff is as good as the wheat.

mtillman · 2 years ago
LLM trash is one thing, but if you follow the OP link, all I see is the headline and a giant subscribe takeover. Whenever I see trash sites like this, I block the domain from my network. The growth-hack culture is what ruins content. Kind of similar to when authors started phoning in lots of articles (every newspaper) or even entire books (Crichton for example) to keep publishers happy. If we keep supporting websites like the one above, quality will continue to degrade.
ineptech · 2 years ago
I understand the sentiment, but those email signup begs are to some extent caused by and a direct response to Google's attempts to capture traffic, which is what this article is discussing. And "[sites like this] is what ruins content" doesn't really work in reference to an article that a lot of people here liked and found useful.
Channel9877 · 2 years ago
Interesting point about the spelling and grammar. I wonder if that could be used as a method of proving you are a human.
spaceman_2020 · 2 years ago
Would just penalize non native speakers.
KingGeedorah · 2 years ago
I was waiting for you to reveal your comment was written by AI
heresie-dabord · 2 years ago
> it may become impossible to separate the wheat from the chaff

We are already approaching the limit of society's ability to separate careful thought from psyops and delusional nonsense.

ravenstine · 2 years ago
Although I agree with the title, I also don't think the internet is that significantly different from before GPTs 4, 3, or 2. Articles written by interns or Indian virtual assistants about generic topics are pretty much as bad as most AI-generated material and aren't that distinguishable from it. It doesn't help that search engines today sort by prestige over whether your query matches text in a webpage.

People aren't really using the web much now anyway. They're living in apps. I don't see people surfing webpages on their phone unless they're "googling" a question, and even then they aren't usually going more than 1 level deep before returning to their app experience. The web has been crap for a very long time, and it has become worse, but soon it's not going to matter anymore.

You, the reader, were the frog slowly boiling, except now the heat has been turned way up and you are now aware of your situation.

If there is to be a "web" going forward, I hope it not only moves to a new anonymized layer, but requires frequent exchange of currency to make generating lots of low quality material less viable. If 90% of the public doesn't want to pay, then they are at liberty to keep eating slop.

EDIT: People seem to be misunderstanding me by thinking I am not considering the change in volume of spam. I invoked the boiling frog analogy specifically to make the point that the volume has significantly increased.

luma · 2 years ago
Totally agreed, SEO spammers wrecked the public web years ago and Google did everything they could to enable it for more ad revenue.
Culonavirus · 2 years ago
SEO spammers were a thing even before Google fucked their search results. You know when Google search results were still amazing, like a decade ago? SEO spammers were thriving. I know that for a fact because I worked for one back then. 90% of why Google search sucks now is due to Google being too greedy; only the rest is caused by SEO spammers.
ryanisnan · 2 years ago
With respect, I think you're missing a key variable, which is volume.

Sure, interns or outsourced content was there, but those are still humans, spending human-time creating that crap.

Any limiter on the volume of this crap is now gone.

rchaud · 2 years ago
Plus, at worst, human writers simply regurgitate/summarize info from other articles. It's more work to intentionally write something false.

AI writing has no idea what is real or fake and doesn't care.

tonymet · 2 years ago
Yes but the content from the web flows into social media, news, “books” (now e-books) in an intangible cyclone of fabricated information.

If sewage gets into the water supply no one is safe. You don’t get to feel better for having a spigot away from the source.

nerdponx · 2 years ago
The sewage has already been flowing for years. Now we're just going to have more of it.

Search results on both Bing and DDG have been rendered functionally useless for a year or so now. Almost every page is an SEO-oriented blob of questionable content hosted on faceless websites that exist solely for ads and affiliate links, whether it's AI-generated or underpaid-third-world-worker-generated.

Shorel · 2 years ago
I agree that low quality content has always existed.

But the issue is about the volume of misleading information that can be generated now.

Anything legit will be much more difficult to find now, because of the increased (increasing?) volume.

Good insight about Apps.

stcredzero · 2 years ago
> If there is to be a "web" going forward, I hope it not only moves to a new anonymized layer, but requires frequent exchange of currency to make generating lots of low quality material less viable. If 90% of the public doesn't want to pay, then they are at liberty to keep eating slop.

One wonders: How good could the next generation of AIs after LLMs become at curating the web?

What if every poster was automatically evaluated by AIs on 1, 2, and 5 year time horizons for predictive capability, bias, and factual accuracy?

kossTKR · 2 years ago
Okay so this is pretty bleak isn't it, for the entrepreneurs, grassroots, startups, or just free enterprise in the most basic form?

I hope making software, apps, coding and designing is still a viable path to take when everyone has been captured into apps owned by the richest people on earth and no one will go to the open marketplace / "internet" anymore.

Will smaller scale tech entrepreneurship die?

willmadden · 2 years ago
> If there is to be a "web" going forward, I hope it not only moves to a new anonymized layer, but requires frequent exchange of currency to make generating lots of low quality material less viable. If 90% of the public doesn't want to pay, then they are at liberty to keep eating slop.

I completely agree!

Aurornis · 2 years ago
> Although I agree with the title, I also don't think the internet is that significantly different from before GPTs 4, 3, or 2.

I feel the same way.

I'm sure some corners of the internet have incrementally more spam, but things like SEO spam word mixers and blog spam have been around for a decade. ChatGPT didn't appreciably change that for me.

I have, however, been accused of being ChatGPT on Reddit when I took the time to write out long comments on subjects I was familiar with. The more unpopular my comment, the more likely someone is to accuse me of being ChatGPT. Ironically, writing thoughtful posts with good structure triggers some people to think content is ChatGPT.

bombela · 2 years ago
I failed a remote technical interview by writing a bad abstraction and muddling myself with it.

After the interview I rewrote the code, and sent an email with it and a well written apology.

The company thought the email and the code were ChatGPT! I am still not sure how I feel about that.

branon · 2 years ago
Never thought I'd say this, but in times like these, with clearnet in such dire straits, all the information siloed away inside Discord doesn't seem like such a bad thing. Remaining unindexable by search engines all but guarantees you'll never appear alongside AI slop or be used as training data.

The future of the Internet truly is people - the machines can no longer be trusted to perform even the basic tasks they once excelled at. They have eschewed their efficacy at basic tasks in favor of being terrible at complex tasks.

titzer · 2 years ago
The fundamental dynamic that ruins every technology is (over-)commercialization. No matter what anyone says, it is clear that in this era, advertising has royally screwed up all the incentives on the internet and particularly the web. Whereas in the "online retailer" days, there was transparency about transactions and business models, in the behind-the-scenes ad/attention economy, it's murky and distorted. Effectively all the players are conspiring to generate revenue from people's free time, attention, and coerce them into consumption, while amusing them to death. Big entities in the space have trouble coming up with successful models other than advertising--not because those models are unsuccessful, but because 20+ years of compounded exponential growth has made them so big that it's no longer worth their while and will not help them achieve their yearly growth targets.

Just a case in point. I joined Google in 2010 and left in 2019. In 2010, annual revenue was ~$30 billion. Last year, it was $300 billion. Google has grown at ~20% YoY very consistently since its inception. To meet that for 2024, they'll have to find $60 billion in new revenue. So they need to find two 2010-Googles' worth of revenue in just one year. And of course 2010-Google took twelve years to build. It's just bonkers.

plagiarist · 2 years ago
There used to be a wealth of smaller "labor-of-love" websites from individuals doing interesting things. The weeds have grown over them and made it difficult to find these from the public web because these individuals cannot devote the same resources to SEO and SEM as teams of adtech affiliate marketers with LLM-generated content.

When Google first came out, it was amazing how effective it was. In the years following, we have had a feedback loop of adtech bullshit.

nicbou · 2 years ago
I strongly disagree. I've been answering immigration questions online for a long time. People frequently comment on threads from years ago, or ask about them in private. In other words, public content helps a lot of other people over time.

On the other hand, the stuff in private Facebook groups has a shelf life of a few days at best.

If your goal is to share useful knowledge with the broadest possible audience, Discord groups are a significant regression.

mrkramer · 2 years ago
> On the other hand, the stuff in private Facebook groups has a shelf life of a few days at best.

> If your goal is to share useful knowledge with the broadest possible audience, Discord groups are a significant regression.

Exactly; the open web is better because everything is public and "easy" to find... well, if you have a good search engine.

The deep web is huge - Facebook, Instagram, Discord, etc. - and unfortunately unsearchable.

nerdponx · 2 years ago
Right, the issue is not that people don't appreciate good content. The issue is that it's harder for people to find it.

It's an entrenching of the existing phenomenon where the only way to know what to trust on the Web is word of mouth.

nonameiguess · 2 years ago
I'd think this depends heavily on the subject. Someone asking about fundamental math and physics is likely to get the same answer now as 50 years from now. Immigration law and policy can change quickly and answers from 5 years ago may no longer present accurate information.
krapp · 2 years ago
"Sharing useful knowledge with the broadest possible audience," unfortunately, is the worst possible thing you can do nowadays.

I hate that the internet is turning me into that guy, but everything is turning into shit and cancer, and AI is only making an already bad situation worse. Bots, trolls, psychopaths, psyops and all else aside, anything put on to the public web now only contributes to its metastasis by feeding the AI machine. It's all poisoned now.

Closed, gatekept communities with ephemeral posts and aggressive moderation, which only share knowledge within a limited and trusted circle of confirmed humans, and only for a limited time, designed to be as hostile as possible to sharing and interacting the open web, seem to be the only possible way forward. At least until AI inevitably consumes that as well.

nemomarx · 2 years ago
Well, unless Discord starts selling it to ai companies right?
kaetemi · 2 years ago
OpenAI trains GPT on their own Discord server, apparently. If you copy paste a chatlog from any Discord server into GPT completion playground, it has a very strong tendency to regress into a chatlog about GPT, just from that particular chatlog format.
queuebert · 2 years ago
No, that's never happened before. You're crazy.
kaashif · 2 years ago
*until

If people believe giving all information to one company and having it unindexable and impossible to find on the open internet is a way to keep your data safe, I have an alternative idea.

This unindexability means Discord could charge a much higher price when selling this data.

rchaud · 2 years ago
Imagine the rich economic insights we could get from a Discord AI trained on billions of messages in crypto shitcoin channels. /s
wussboy · 2 years ago
They wouldn’t…would they? /s
theonlybutlet · 2 years ago
I can't see how being used as training data has anything to do with this problem. Being able to differentiate between the AI slop and the accurate information is the issue.
unglaublich · 2 years ago
Differentiation becomes harder the better AIs perform, which is currently bound by data availability and quality.
welder · 2 years ago
Discord is searchable: https://www.answeroverflow.com/
raesene9 · 2 years ago
I think that Answer Overflow is opt-in; that is, individual communities have to actively join it for their content to show up. That would mean that (unless Answer Overflow becomes very popular) most Discord content isn't visible that way.
CaptainFever · 2 years ago
I can't really see a relevance between "we should spend more time with trusted people", which is an argument for restricting who can write to our online spaces, and "we should be unindexable and untrainable", which is an argument for restricting who can read our online spaces.

I still hold that moving to proprietary, informational-black-hole platforms like Discord is a bad thing. Sure, use platforms that don't allow guest writing access to keep out spam; but this doesn't mean you should restrict read access. One big example: Lobsters. Or better-curated search engines and indexes.

krapp · 2 years ago
Read access to humans means read access to AIs. We can't stop the cancer but we can at least try to slow its spread.
stcredzero · 2 years ago
> The future of the Internet truly is people - the machines can no longer be trusted to perform even the basic tasks they once excelled at.

What if the AI apocalypse takes this form?

    - Social Media takes over all discourse
    - Regurgitated AI crap takes over all Social Media
    - Intellectual level of human beings spirals downward as a result

TimurSadekov · 2 years ago
Neural networks will degenerate in the process of learning from their own hallucinations, and humans will degenerate in the process of applying the degenerated neural networks. This process is called "model collapse". https://arxiv.org/abs/2305.17493v2 It can only be countered by a collective neural network of all the minds of humanity. For mutual validation and self-improvement of LLMs and humans, we need the ability to match the knowledge of artificial intelligence with collective intelligence. Only the CyberPravda project is the practical solution to avoid the collapse of large language models.
broscillator · 2 years ago
Discord will die and there's no way that I'm aware of to easily export all that information.
pawelmurias · 2 years ago
AI spam bots will invade discord.
TillE · 2 years ago
And they'll get banned by moderators. Ultimately that's the key ingredient in any good strategy here: human curation.
BizarreByte · 2 years ago
Mods will ban them and new users will be forced to verify via voice/video chat/livestream.
espe · 2 years ago
They will sell. The big guys are gobbling up _anything_ they can get their hands on.
jillesvangurp · 2 years ago
The way out is authenticity, and signed content is the only way to get that. You can't take anything at face value. It might be generated, forged, etc. When anyone can publish anything, and when everyone is outnumbered by AIs publishing even more things, the only way to filter is by relying on reputation and authenticity, so you know who published what and what else they are saying.

Web of trust has of course been tried, but it never got out of its "geeky thing for tinfoil-hat-wearing geeks" corner. It may be time to give that another try.
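A minimal sketch of the signing half, assuming Python's cryptography package (key distribution and reputation, the actual hard parts, are glossed over):

    # pip install cryptography
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    author_key = Ed25519PrivateKey.generate()  # stays with the author
    public_key = author_key.public_key()       # published alongside the content

    post = b"I wrote this, and I stake my reputation on it."
    signature = author_key.sign(post)

    try:
        public_key.verify(signature, post)  # raises if post or signature was altered
        print("valid signature - now decide whether you trust the signer")
    except InvalidSignature:
        print("tampered or forged")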

smt88 · 2 years ago
> Signed content is the only way to get that.

This does nothing to guarantee that the content was written or edited by a human. Because of the risk of key theft, it doesn't even guarantee that it was published by the human who signed it.

It is physically, philosophically, and technically impossible to verify the authenticity of digital content. At the boundary between the analog world and the digital world, fraud is always possible.

This is the same reason that no one ever successfully used blockchains for supply-chain authentication. Yes, you can verify that item #523 has a valid hash associated with it, but you can't prove that the hash was applied to item #523 instead of something fraudulent.

marstall · 2 years ago
> It is physically, philosophically, and technically impossible to verify the authenticity of digital content.

Though there are many brands built on trust, whose domain name is very difficult to spoof, that are an exception to this.

Hate on nytimes.com, but you have reasonable confidence the content on that site is written, fact-checked and edited by staff at the New York Times Company.

jtsiskin · 2 years ago
Your home internet and your cellular provider can "attest" that you make a monthly payment - right now, the scarcity of IPv4 addresses and cell phone numbers often serves this purpose. A government agency or bank can attest you're a real person. A hardware manufacturer can attest you purchased a device. A PGP-style web-of-trust can show that other people, who also own scarce resources and may be trusted indirectly, also think you're real.

Blockchain may be largely over-hyped, but from this bubble I think important research in zero-knowledge proofs and trust-less systems will one day lead to a solution to this that is private and decentralized, rather than fully trackable and run by mega-corps.

jillesvangurp · 2 years ago
It 100% guarantees that the content was signed by whoever signed it. The problem then becomes much simpler: do you trust the signer or not? And you can base that decision on what others are saying about the signer (using signed content, obviously) or other things also signed by the same person or entity.

Once you have that, unsigned content or content signed by AIs is easy to spot, because it would either have no reputation at all or a poor one.

Signatures are impossible to forge (or sufficiently hard that we can assume so), and easy to verify. Reputations are a bit more work but we could provide some tools for that or search engines and other content aggregators could check things for us. But it all starts with a simple signature. Once you have lots of people signing their work, checking their reputation becomes easy. And the nice thing with a reputation is that people care about guarding it is as well. Reputation is hard to fake; you build it throughout your life. And you stake it with everything you publish.

There's no need for blockchains or any fancy nonsense like that. It might help but it's a bit of a barrier to taking this into use.

willmadden · 2 years ago
> It is physically, philosophically, and technically impossible to verify the authenticity of digital content.

That's the entire point of cryptocurrencies. They do that as well as is possible right now in a distributed network, conceding the point about key theft.

I would argue it's not all-or-nothing. Signing would verify the majority of content from creators that have not had their keys stolen. Adding currency/value to this equation boosts the quality further and discourages spamming "content based marketing" garbage. The obstacles are usability and behavior changes, and also that any given user can now copy/paste LLM prompt responses, of course.

l33t7332273 · 2 years ago
And it is physically, philosophically, and technically impossible to prove that a speaker at defcon that you recognize from last year isn’t an imposter wearing a mission impossible mask; these conditions are too strict.
donmcronald · 2 years ago
> The way out is authenticity. Signed content is the only way to get that.

This is the real play IMO. With the push for identity systems that support attestation [1], it doesn't matter if AI is successful at producing high quality results or if it only ever produces massive amounts of pure garbage.

In the latter case, it's a huge win for platform owners like Apple, Google, or Microsoft (via TPM) because they're the ones that can attest to you being "not a bot". I wouldn't be surprised if 5 years from now you need a relationship with one of those 3 companies to participate online in any meaningful way.

So, even if AI "fails", they'll keep pushing it because it's going to allow them to shift a large portion of internet users to a subscription model for identity and attestation. If you don't pay, your content won't ever get surfaced because the default will be to assume it's generated trash.

On the business side we could see schemes that make old-school SSL and code signing systems look like charities. Imagine something like BIMI [2], but for all content you publish, with a pay-per-something scheme. There could even be price discrimination in those systems (similar to OV, EV SSL) where the more you pay the more "trustworthy" you are.

My fear is that eventually you'll start seeing government services where identity and auth are handed off to private companies like Google and Apple. Imagine having your real identity tied to an attestation by one of those companies.

1. https://www.w3.org/TR/webauthn/#sctn-defined-attestation-for...

2. https://bimigroup.org/

seabass-labrax · 2 years ago
I'm glad to see someone else on the Web who's concerned about this. Remote attestation is a triumph of 21st-century cryptography with all kinds of security benefits, but never before in my lifetime have I seen a technology be misappropriated so quickly for dubious purposes.

My country (the UK) is one of the worst right now, with the current government on a crusade to make the internet 'safer' by adding checkpoints[1] at various stages to tie your internet usage to your real-world identity. Unlike some other technically advanced countries, though, the UK doesn't have the constitutional robustness to ensure civil liberties under such a regime, nor does the population have what I like to think of as the 'continental temperament' to complain about it.

I'd like to make a shout-out to a project in which I participate: the Verifiable Credentials Working Group[2] at the World Wide Web Consortium is the steward of a standard for 'Self-Sovereign Identity' (SSI). This won't be able to fix all the issues with authenticity online, but it will at least provide a way of vouching for others without disclosing personal information. It's a bit like the GPG/PGP 'Web of Trust' idea, but with more sophisticated cryptography such as Zero-Knowledge Proofs.

[1]: https://www.eff.org/deeplinks/2023/09/uk-online-safety-bill-...

[2]: https://www.w3.org/2017/vc/WG/

espe · 2 years ago
> My fear is that eventually you'll start seeing government services where identity and auth are handed off to private companies like Google and Apple. Imagine having your real identity tied to an attestation by one of those companies.

A general taste for dysfunctional public-private partnerships, and the fact that auth is a seriously hard problem at scale, make this scenario a few percentage points more likely than anyone should feel comfortable with.

l33t7332273 · 2 years ago
> My fear is that eventually you'll start seeing government services where identity and auth are handed off to private companies like Google and Apple

One enticing alternative (for the government!) is to require you to upload your actual documents and use some government sanctioned attestation service.

lolsal · 2 years ago
Sincerely asking - how does this solve the problem? I could generate a bunch of dog-shit and then sign it and publish it. Even with user attestation services provided by Apple, Google, et al., couldn't I even automate generating a bunch of AI junk and signing it?
jprete · 2 years ago
This would have to work by individuals or organizations building a good reputation over time, so their specific output is trusted. The fact that an LLM outputted the text is not nearly as relevant as whether anyone has staked their reputation on its correctness.
throwaway29812 · 2 years ago
Exactly. This presupposes that humans are always better than AI, or that they don't produce spam or harmful content.

They do.

stcredzero · 2 years ago
> The way out is authenticity.

My impression of Flat Earthers, is that a lot of them are indeed authentic.

broscillator · 2 years ago
The craziest part is that Jaron Lanier said this like 20 years ago, if not more.
danielovichdk · 2 years ago
> The way out is authenticity.

Funny.

Has it ever been anything else?

Don't feel sorry for people who have no ability to be authentic. They try too hard.

Don't try.

volkk · 2 years ago
Agree 100%. There's no going back at this point, and it's an inevitable problem anyway. So we need to innovate further.
TimurSadekov · 2 years ago
And we have a practical solution — we create a global unbiased decentralized CyberPravda platform for disputes, for analyzing the reliability of information and assessing the reputation of its authors, where people are accountable with personal reputation for their knowledge and arguments.

We have found a way to mathematically determine the veracity of Internet information and have developed a fundamentally new algorithm that does not require the use of cryptographic certificates of states and corporations, voting tokens with which any user can be bribed, or artificial intelligence algorithms that are not able to understand the exact meaning of what a person said. The algorithm does not require external administration, review by experts or special content curators. We have neither semantics nor linguistics — all these approaches have not justified themselves. We have found a unique and very unusual combination of mathematics, psychology and game theory and have developed a purely mathematical international multilingual correlation algorithm that uses graph theory and allows us to get a deeper scientometric assessment of the accuracy and reliability of information sources compared to the PageRank algorithm or the Hirsch index. The algorithm allows betting on different versions of events with automatic determination of the winner and makes it possible to create a holistic structural and motivational frame in which users and news agencies can earn money by publishing reliable information, and a high reputation rating becomes a fundamentally new social elevator.

CyberPravda mathematically evaluates the balance of arguments used by different authors to confirm or refute various contradictory facts, assessing their credibility in terms of consensus in large international and socially diverse groups. From these facts, the authors construct their personal descriptions of the picture of events, and they are held responsible for the veracity of those descriptions with their personal reputations. An unbiased and objective, purely mathematical correlation algorithm based on graph theory checks these narratives for mutual correspondence and coherence according to the principle of "all with all" and finds the most reliable sequences of facts that describe different versions of events. Different versions compete with each other in terms of the value of the flow of meaning. The most reliable versions become arguments in the chain of events for facts of higher or lower level, which loops the chain of mutual interaction of arguments and counterarguments and creates a global hypergraph of knowledge, in which the greatest flow of meaning flows through stable chains of consistent scientific knowledge that best meet the principle of falsifiability and Popper's criterion. A critical path through the sequence of the most credible facts forms an automatically generated multilingual article for each of the existing versions of events, which is dynamically rearranged according to newly incoming evidence and the desired credibility level set by readers in their personal settings, ranging from zero to 100%. As a result, users have access to multiple Wikipedia-like articles describing competing versions of events, ranked by objectivity according to their desired level of credibility.
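As a toy illustration of the PageRank-family scoring mentioned above (networkx's built-in pagerank over a made-up corroboration graph; an analogy only, not the actual CyberPravda algorithm):

    # pip install networkx
    import networkx as nx

    # Hypothetical corroboration graph: an edge A -> B means source A vouches
    # for (corroborates) source B. Names and edges are made up for illustration.
    g = nx.DiGraph()
    g.add_edges_from([
        ("alice", "bob"), ("carol", "bob"),
        ("bob", "wire_service"), ("dave", "wire_service"),
        ("mallory", "sockpuppet"), ("sockpuppet", "mallory"),
    ])

    # Sources vouched for by well-vouched-for sources score highest.
    scores = nx.pagerank(g, alpha=0.85)
    for source, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{source:15s} {score:.3f}")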

nunez · 2 years ago
This is an old problem that LLM-generated content only accelerated. LMGTFY died when Google tripled down on growing their ad revenue and adtech dominance and SEO ran rampant throughout search results. It is fairly difficult to get non-biased factual information from a naked query these days, which is why I try to search for info on Reddit first.

This isn't a panacea either, given that it's been chock-full of astroturfed content for the last few years, but older threads from when Reddit was less popular and manipulable, or threads from small communities, are usually good bets.

causal · 2 years ago
Finally switched to Kagi when I realized Google could not find a particular ThreeJS class doc page for me no matter what keywords I used; I had to paste the very URL of the page for it to appear at the top of my search results.

Kagi got it first try using the class name. Paid search is the way, ad incentives are at odds with search. Made Kagi my address bar default search and it's been great.

Gibson_v1 · 2 years ago
Maybe I'll try Kagi. I've had a hell of a time googling docs lately. I've been experimenting with different libraries on some side projects and it feels like I'm always scrolling past stuff like GeeksForGeeks and various sites that look like some sort of AI-generated stuff just to get to official docs or GitHub links.
worldsayshi · 2 years ago
So does that mean that the free (as in beer) internet is dying and ad-tech killed it?
jjtheblunt · 2 years ago
Agreed: Kagi is straight from the future and worth every cent.
drcongo · 2 years ago
I had the same experience with a slightly esoteric Django class back when Kagi first appeared. I subscribed straight away, and every now and then when I end up on any other search engine I'm reminded what a good decision that was.
mangodrunk · 2 years ago
Google was amazing at one point in time. In search of profit, it got worse. Maybe Kagi can withstand it, but I don’t see much difference between one company and another.
novemp · 2 years ago
Too bad Kagi is also investing in LLMs.
nunez · 2 years ago
Absolutely love Kagi for everything except shopping and maps.
larodi · 2 years ago
One thing to always remember, which may also easily repulse you from ever using Google search again: it does not give search results. It generates a carefully crafted page which caters to your bubble. So does FB, so does Twitter, etc., just using different algos. Google search does not return the same results for the same query for different people, which a) makes it so different from AltaVista and historical search engines (or from ElasticSearch, if you want); and b) is enough to NOT treat it as a search engine, even though it is still billed as one... it's really a personal wall of ad-designated BS.
htrp · 2 years ago
Well I feel like that would be OK if 1) they told you this and 2) it actually gave you those relevant results.

It does make troubleshooting officially impossible; you can't tell people it's the 3rd link on this specific query in Google.

rvba · 2 years ago
You think that spammers don't use AI to write on Reddit now?
jeremyjh · 2 years ago
"re"-read GP's last sentence.
erellsworth · 2 years ago
I'm old enough to remember when the Internet was full of organic dog shit.
oceanplexian · 2 years ago
Humans are the original bullshit generator. AI is only doing what humans have been doing since forever.
agentultra · 2 years ago
So instead of putting out the fires, in the interest of improving the situation, we'll make the fires bigger?

A good deal of humans care about the truth. Some of them actively seek to deceive and avoid the truth -- liars, we tend to dislike them. But the ones both sides dislike are the ones who disregard the truth... ie: bullshitters -- the, "that's just your opinion, man," the, "what even is the truth anyway?" people.

broscillator · 2 years ago
Yeah, we also produce literal shit, and we have a toilet and plumbing to deal with that.

If you welcomed a giant robot in your house that produces 100x as much shit as a human, you don't have the infrastructure to deal with it.

It was never an issue of yes or no, it's an issue of how much.

_heimdall · 2 years ago
My bull would beg to differ. He and his bovine forefathers have been generating bullshit for much, much longer than humans have.
gorjusborg · 2 years ago
Humans care about reputation.

Humans get tired.

Thrymr · 2 years ago
Indeed, why are innocent dogs and bulls getting the blame here? Clearly this is about artificial human shit.
cranberryturkey · 2 years ago
time to abandon google and go back to web rings.
bovermyer · 2 years ago
And/or human-curated web directories.
7thaccount · 2 years ago
I don't know how the old web worked, but something decentralized makes sense to me.
jebarker · 2 years ago
There are purveyors of artisanal organic bullshit now, but it's pricey.
gremlinsinc · 2 years ago
You mean like Stack Overflow scraped-answer spam? Wasn't that like last year? I hardly ever Google anymore; I just ask Bing chat.
jprete · 2 years ago
Human shitposting is at least entertaining.
wharvle · 2 years ago
I think there's actually some deep insight here into why we tend not to like too much AI in our art.

Consider: TimeCube.

Created by a human? It's nonsense, but... it's fascinating. Engaging. Memorable. Thought-provoking (in a meta kind of way, at any rate). I dare say, worthy of preservation.

If TimeCube didn't exist, and an AI generated the exact same site today? Boring. Not worth more than a glance. Disposable. But why? It's the same!

------

Right or wrong, we value communication more when there's a human connection on the other side—when there's a mind on the other side to pick at, between the lines of what's explicitly communicated, and continuity for ongoing or repeated communication that could reveal more of what's behind the veil. There's another level of understanding we feel like we can achieve, when a human communicates, and expectation, an anticipation of more, of enticing mystery, of a mind that may reflect back on our own in ways that we find enlightening, revealing, or simply to grant positive familiar-feeling and a sense of belonging.

What's remarkable is this remains true even when the content of the communication is rather shit. Like TimeCube.

All of that is lost when an LLM generates text. I think that's also why we feel deceived by LLM use when it masquerades as human, even if what's communicated is identical: it's because we go looking for that other level of communication, and if that's not there, giving the impression it might be really is misleading.

This may change, I suppose, if "AI" develops rather a lot farther than it is now and we begin to feel like we're getting a window into a true other when it generates output, but right now, it's plainly far away from that.

red_admiral · 2 years ago
At the end of the day, ads exist to make money, and until the bots have credit cards that means money from humans. Google etc. will notice it in their bottom line if there's suddenly a lot more "engagement" or traffic in some area but none of that converts to humans spending dollars.

Google will start dealing with this problem when it starts appearing in their budget in big enough numbers. The tech layoffs we're hearing about from one company after another - Google is mentioned in another HN thread today - may be a sign of which way the wind is blowing.

novagameco · 2 years ago
AI is generating content, not consuming it. If people are easily duped by fake or bad products with advertisements or content generated by AI (which they are), then this will continue to drive revenue for Google. The only reason Google dislikes SEO manipulation is because it's a way for sites to get top real estate on Google without paying for the promoted results; the quality of the product doesn't matter to them.

It only becomes a problem when it results in a collapse of trust; when people have been burned by too many bad products and decide to no longer trust the sites or search results they used to. Due to my job, I get a lot of ads for gray-market drugs on Instagram. I know, however, that none of these are tested by the FDA and most are either snake oil or research chemicals masquerading as Amanita Muscaria or Delta-8 THC, so I ignore these ads.

lacrimacida · 2 years ago
If AI is good at faking content, what stops its use for faking consumption/engagement? In my mind that's the next logical step in the internet's enshittification.
lolinder · 2 years ago
> Google etc. will notice it in their bottom line if there's suddenly a lot more "engagement" or traffic in some area but none of that converts to humans spending dollars.

Google might notice but has no incentive to spend money to stop it because they're not the ones the humans stopped paying. The companies that advertise with Google might notice a drop in ROI on their ads, but it will be a while before they abandon Google because most of them don't see any other option.

I dread what the internet will look like if we wait for this to hit Google's bottom line.

ramesh31 · 2 years ago
> At the end of the day, ads exist to make money, and until the bots have credit cards that means money from humans. Google etc. will notice it in their bottom line if there's suddenly a lot more "engagement" or traffic in some area but none of that converts to humans spending dollars.

You seem to have a hilariously over generous opinion of ad tech spending. The biggest players are already doing this themselves.

mrweasel · 2 years ago
That's an interesting take, but Google won't suffer before the advertisers decide that they are wasting their money on online advertising. Some topics should already have dried up, but perhaps scams are fueling the advertising machine on those for now. You can't really use Google for things like fitness or weight loss. When we remodeled, it also became clear that building materials and especially paint have become unsearchable. In the end I resorted to just going to the store and asking; it was the only way to get reliable information and recommendations.

Google is still working for most areas, but where it's really good is the ads for products. If there's something you want to buy, Google's ads engine will find it for you; you just have to know exactly what you want.

burkaman · 2 years ago
Why wouldn't it result in humans spending dollars? The ads are real and the visitors are real, it doesn't matter if the content is real. In fact people are probably more likely to click on an ad if the page it's on is generic and uninteresting.
red_admiral · 2 years ago
My reasoning is, there are topics where I don't bother to go to Google anymore because I know the results will be crap. That way Google loses any way to show me ads when I'm searching for these topics, or get paid for click-throughs, or to profile my interests as accurately as they could otherwise.

There's categories of products where I spend money regularly, but I go directly to category-specific sites so google again loses out on the ability to take their cut as middleman, which I'd happily let them take - and maybe discover vendors other than the ones I know - if they provided me with higher-quality results than they do now.

bbarnett · 2 years ago
Big tech has always been laying off, firing, even in the best of years.

And today, right now, they're all still hiring.

trey-jones · 2 years ago
Before the "AI" takeover, it was already full of SEO-mandated human-generated bullshit, so we haven't actually lost that much in the last couple of years. I've been saying it for almost as long as I've been in the industry, which is well over a decade now.
tonymet · 2 years ago
If this is true it implies all news and history for the past 10 years is also human-generated bullshit. I’m not saying you’re wrong – just that you have to follow your beliefs to their conclusions.
trey-jones · 2 years ago
I apologize; I didn't mean to imply anything about ALL of anything. My main complaint is that the things that are not bullshit on the internet have largely been buried beneath bullshit for a long time.
waveBidder · 2 years ago
Humans are quite capable of generating bullshit; I don't think that's ever been in contention. That doesn't mean all or even most human-generated content is garbage.