Readit News
6gvONxR4sf7o · 2 years ago
This could be the weirdest kind of moat yet. If you crawled all the things and built a model before everything became bot-generated, you can get clean post-2024 human data from the human inputs to your tool. If you haven't, then maybe you're stuck with the 2023-and-earlier crawls, limiting your models' relevance. We've already seen that the feedback loops of training models on model outputs aren't nearly as valuable, and can get wacky fast. It'll be weird to see how that plays out.
baq · 2 years ago
throwing_away · 2 years ago
That is such a fantastic comparison and this is the first place I've heard it made. I'll be stealing it, thank you :)
thunderbong · 2 years ago
I was immediately reminded of this too.

I'm wondering now, does the same effect apply to regular HN readers? In the sense that, we're contaminated (for lack of a better word), and are unable to see things out there without having equivalent connections pop into our heads! :)

nyc_data_geek1 · 2 years ago
The analogy I've been using is an ouroboros of bullshit, consuming ai generated bullshit to generate ai bullshit to consume to generate ai bullshit ad infinitum
sheepscreek · 2 years ago
Very cool - I wonder what else fits the analogy. No plastic meat?
carlosjobim · 2 years ago
The shadow libraries are the largest collection of human knowledge to date, and completely untainted by AI. Any search engine that crawls and indexes them will have a tenfold increase in quality and be as revolutionary as the invention of the internet. No LLM model needed.

On top of that, there is no incentive for AI generated content to enter the shadow libraries at all.

DaiPlusPlus · 2 years ago
> On top of that, there is no incentive for AI generated content to enter the shadow libraries at all.

I think you underestimate just how many people/entities/forces that exist that would love to see further decline, division, and discord in the Anglosphere...

ilaksh · 2 years ago
What makes you assume they have not already been used by OpenAI, Google, or Baidu, etc?
CuriouslyC · 2 years ago
Except that human-generated doesn't really seem to matter; all that seems to matter is some basic guard rails on the data you choose. Meta has models generating training data, then grading it and selecting the best examples to reincorporate into the training set, and it's improving benchmarks.
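Roughly, the loop is generate, grade, select, reincorporate. A minimal sketch of that idea (not Meta's actual pipeline; generate() and grade() below are made-up placeholders for a real LLM and a real reward/quality model):

    import random

    def generate(prompt: str) -> str:
        # Placeholder: a real system would sample a completion from an LLM here.
        return prompt + " " + random.choice(["answer A", "answer B", "answer C"])

    def grade(candidate: str) -> float:
        # Placeholder: a real system would use a reward model or a grading rubric.
        return random.random()

    def self_improve(prompts, training_set, samples_per_prompt=4, threshold=0.8):
        # Generate several candidates per prompt, keep only the best-scoring one
        # if it clears the bar, and fold it back into the training set. The
        # threshold is the "basic guard rail" on what gets reincorporated.
        for prompt in prompts:
            candidates = [generate(prompt) for _ in range(samples_per_prompt)]
            best_score, best = max((grade(c), c) for c in candidates)
            if best_score >= threshold:
                training_set.append({"prompt": prompt, "completion": best})
        return training_set

    corpus = self_improve(["How do I solve x problem?"], training_set=[])
    print(corpus)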
kromem · 2 years ago
The problem with model collapse is that it reinforces the mean at the cost of the edges of your distribution curve, particularly on repeat.

One thing being overlooked: offsetting the job loss from AI replacing mean work, there are going to be new markets for edge-case creation and curation.

Jackson Pollock and Hunter S. Thompson for the AI generation, with a primary audience of AIs rather than humans, sponsored by large tech and data companies the way the Renaissance Vatican sponsored artists.

inerte · 2 years ago
Another way they can use this is to log the generated text; then, when crawling pages, if they find text that Chrome didn't generate, there's a chance it was written by a human, or by another tool. But I doubt people who have access to this in Chrome will really use another tool, so Google can probably differentiate between sources.
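As a toy sketch of that idea (purely hypothetical, not anything Google has described): fingerprint the passages the assistant generates, then check crawled text against the log; anything that doesn't match was probably written by a human or another tool.

    import hashlib

    def shingles(text: str, n: int = 8):
        # Hash overlapping n-word chunks so near-verbatim reuse still matches.
        words = text.lower().split()
        for i in range(max(len(words) - n + 1, 1)):
            chunk = " ".join(words[i:i + n])
            yield hashlib.sha256(chunk.encode()).hexdigest()

    generated_log = set()  # fingerprints of everything the assistant produced

    def record_generation(text: str) -> None:
        generated_log.update(shingles(text))

    def looks_generated(crawled_text: str, min_overlap: float = 0.5) -> bool:
        hashes = list(shingles(crawled_text))
        overlap = sum(h in generated_log for h in hashes) / len(hashes)
        return overlap >= min_overlap

    record_generation("I'm interested in your property. Its exactly what I've been looking for.")
    print(looks_generated("I'm interested in your property. Its exactly what I've been looking for."))  # True
    print(looks_generated("im interested in this place - do you allow dogs?"))  # False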
HeatrayEnjoyer · 2 years ago
> We've already seen that the feedback loops of training models on model outputs aren't nearly as valuable, and can get wacky fast.

IIRC this is less true with the very largest SOTA models, and OpenAI is now using synthetic data with success.

kjkjadksj · 2 years ago
Reminds me of how they need to raise sunken WWI ships to get clean steel for certain applications, after all the nuclear weapons testing that happened.
mensetmanusman · 2 years ago
It still helps build synthetic data.
krajzeg · 2 years ago
I can already see the wonderful cyberpunk future, where people writing e-mails use Gmail's AI assistant to add all the polite boilerplate, while the recipients trying to get through their overflowing inbox use the Gmail-integrated AI summarizer to pare it all back down.
o0-0o · 2 years ago
Or, the spam bot that checks for AI content and ignores it.
AlexandrB · 2 years ago
Assuming that, like ChatGPT, the model runs on Google's servers, doesn't this vastly increase the cost to Google of offering Chrome for free? Now you have to provide AI compute time to every 4chan poster and forum warrior.

The economics of AI still seems nuts to me. Feels like another bait and switch in the making when all these "free" services need to start showing some revenue.

ilaksh · 2 years ago
It's a direct evolution of the search paradigm. You go from entering a few keywords roughly related to what you want and then clicking on ads to continue the search, to having a short conversation with the AI, homing in on precisely what you need, and then having the AI complete the transaction or even generate the content for you, optionally with a transaction attached.

The direct interactions with AI increase the fidelity of the customer model of you that Google has and uses to optimize sales to you for its customers.

Even further, the most common source of inspiration for purchases is the behavior of other people. If the AI can sufficiently emulate humans and ingratiate itself with you enough, then it can directly influence your behavior just by suggesting that it would make certain decisions in your place, or that others already have.

This is actually not far removed from the existing situation, just the next level of technological capability.

By actually generating responses for you, it starts training you to allow it to make decisions on your behalf. This may readily extend into purchase decisions.

apantel · 2 years ago
Human: “How do I solve x problem?”

GPT: “You can buy {x} product. Would you like me to put it on your card? It will arrive in two days.”

Human: “Sure!”

This is what Google wants to control.

Deleted Comment

rozim · 2 years ago
With WASM or tf-js, the models, or smaller "good enough" versions of them, might be able to run in the browser.
ukuina · 2 years ago
And people complain about browser tabs taking up 100% CPU today!
notaustinpowers · 2 years ago
We're gonna start getting ads when you open a new tab and a 5-second unskippable ad while a website loads! /s
AlexandrB · 2 years ago
Or maybe users will just get "subtle" product placement in their AI assisted output.
mulmen · 2 years ago
Or brands can buy weight in the model.
tenpoundhammer · 2 years ago
I find it interesting that the Edge browser already has this feature. I wonder if Chrome feels pressured to have feature parity specifically on AI, or if they believe this change will actually improve their usage metrics?
kjkjadksj · 2 years ago
Little keeping-up-with-the-Joneses moves like these are always great for a bump in the stock price; it's not always about shooting for some metric or business profit.
croon · 2 years ago
In the example screenshot, the assistant takes this input:

> im interested in this place - do you allow dogs?

and writes this output:

> I'm interested in your property. Its exactly what I've been looking for. To make it perfect for me, I just need to know if the unit is pet-friendly. Thank you for your time and consideration. I look forward to your response.

The input is concise and to the point; the output is infuriatingly verbose and formulaic. But I guess it'll be easy to filter out humans I would actually be willing to communicate with.

jvanderbot · 2 years ago
My wife makes a living asking people for things.

She writes like the latter example. I find myself continuously frustrated by people; she loves them. I find that I'm constantly rejected when suggesting things; she isn't.

I'm with you, but I think we're wrong.

mega_dingus · 2 years ago
I was talking to somebody who worked in HR at a multi-disciplinary shop, and she said you could always identify the emails coming from programmers.

It was a complaint, definitely not a compliment. She said programmers listed things out in bullet points, bluntly and to the point. She complained they were dry and intimidating, and that she hated dealing with them.

I still write concisely and with bullet points when writing to other programmers. But I now expand things when talking to everybody else, and I've found I get better responses.

kristjansson · 2 years ago
It shouldn't be terribly surprising that humans incorporate signals beyond the pure denotational content of a message. Text is a pretty low-bandwidth channel, so we infer as much meaning as possible from the bits of information we receive. All the stylistic choices encode additional information about the sender; part of one's job as an effective communicator is evaluating the effect of all those choices and adapting the entire message (not just its content) to convey the intended impression (not just the meaning).

Incidentally, this is why AI writing isn't necessarily better communication. The robot can help translate intentions into prose, but it can't decide what one should actually intend to say.

anon373839 · 2 years ago
This reminds me of Craigslist. When I get a response that’s written in a terse and grammatically incorrect style, I ignore it. Experience tells me these transactions don’t tend to go well.
sirspacey · 2 years ago
This is it. This is why I think AI is a better writer than I am.
JohnFen · 2 years ago
The latter also says quite a lot that was just made up and wasn't even implied by the original.
SoftTalker · 2 years ago
There's a middle ground which is what a normal person would write:

I'm interested in your property, but I have a dog. Will that be an issue? Thank you!

or

I'm interested in your property, it looks like just what I need. But I need to know if you allow dogs. Thanks!

People are busy. The kind of filler in the AI example shows that you value trying to sound sophisticated more than you value their time when making a simple inquiry. But people also don't have time to decipher possibly cryptic text-message shorthand. Think about your audience, and write accordingly.

NoZebra120vClip · 2 years ago
We are still working with yes/no questions. While a landlord may perhaps reply with more information, I would phrase the question as open-ended: "What can you tell me regarding dog ownership in your community?" That is an invitation to describe the pet deposit, size/breed limitations, places to walk them, etc.
stonogo · 2 years ago
It's not only pointlessly verbose, it ruins the intention behind the input! The user wants to know if they allow dogs, not pets. They can get a "yes we allow some pets" response and now they have to start all over to figure out which pets those are, whether dogs are included, etc.

This is a shitload of computational expenditure to make things objectively worse by introducing an entirely new class of problem to the original message. It's literally "I had a problem, so I used AI, and now I have two problems"

achrono · 2 years ago
Well, we obviously then need a de-verbosifier. In which case, how do you filter for your aforementioned humans?
mega_dingus · 2 years ago
Why is this downvoted? I consider it and its replies interesting and relevant

If there's an HN policy violation in this post, I'm legit curious what it is

RandomLensman · 2 years ago
When I take the output apart: the first sentence is to the point and short. The second is potentially redundant but might increase the likelihood of a reply. The third is perhaps a bit over the top and could be merged into a shorter second sentence (e.g., "... looking for, but I was wondering if ..."). The next one is just basic politeness. The last one feels optional but might, at the margin, increase the likelihood/speed of a reply.

Not perfect but not bad either (assuming a human reader on the receiving side).

emporas · 2 years ago
You can fine-tune LLMs on new styles, without even considering all the styles they are already trained on. The formulaic style of response is not needed at all.

The formulaic response, rewritten in the style of Coding Horror:

"Hey there! Your property has piqued my interest—it's what I've been looking for. Just a tiny detail left to seal the deal: Is the unit cool with pets? Thanks a bunch for your time and consideration. Anticipating your swift response!"

coffeebeqn · 2 years ago
It’s a BSifyer
kirykl · 2 years ago
> I'm interested in your property. Its exactly what I've been looking for.

The AI may be giving up some of the user's negotiating leverage there.

pixl97 · 2 years ago
Are younger generations, at least in the US, interested that much in negotiating?

I'm kind of in that age gap where the world started converting to barcodes and computer-driven prices, and at least to me it seems a lot less haggling occurs now. Then again, a lot more of our purchases are with corporate entities where haggling doesn't occur. Transactions now are based more on smoothness and speed: you have X for $Y, here is $Y, good day.

Deleted Comment

bluerooibos · 2 years ago
Huh, maybe this is why big-G hasn't been too concerned about the rise of ChatGPT. As long as they have Chrome, they still have direct access to a huge portion of web users - even if said users have shifted away from using their search engine.

Deleted Comment