Claude can now search the web

Searching the web is a great feature in theory, but every implementation I've used so far looks at the top X hits and then interprets it to be the correct answer.

When you're talking to an LLM about popular topics or common errors, the top results are often just blogspam or unresolved forum posts, so the you never get an answer to your problem.

More of an indicator that web search is more unusable than ever, but interesting that it affects the performance of generative systems, nonetheless.

Almondsetat · a year ago

>looks at the top X hits and then interprets it to be the correct answer.

LLMs are truly reaching human-like behavior then

dstroot · a year ago

https://danstroot.imgix.net/assets/blog/img/comic2.png?auto=...

yoyohello13 · a year ago

The longer I've been in the workforce, the more I realize most humans actually kind of suck at their jobs. LLMs being more human like is the opposite of what I want.

dartos · a year ago

Splitting hairs, but LLMs themselves don’t search.

LLMs themselves don’t choose the top X.

That’s all regular flows written by humans run via tool calls after the intent of your message has been funneled into one of a few pre-defined intents.

pizza · a year ago

It would probably be really great for web searching llms to let you calibrate how they should look for info by letting you do a small demonstration of how you would pick options yourself, then storing that preference feedback in your profile’s system prompt somehow.

rendaw · a year ago

Here though they're not replacing a random person, they're replacing _you_ (doing the search yourself). _You_ wouldn't look at the top X hits then assume it's the correct answer.

ChrisRR · 10 months ago

Bold of you to assume that most people even bother googling simple questions

wvh · a year ago

Be careful what you call AI, you might just get what you wish for...

LightBug1 · a year ago

Degenerative AI ?

johndhi · a year ago

lol

johntb86 · a year ago

I've found that OpenAI's Deep Research seems to be much better at this, including finding an obscure StackOverflow post that solved a problem I had, or finding travel wiki sites that actually answered questions I had around traveling around Poland. However it finds its pages, they're much better than just the top N Google results.

wongarsu · a year ago

Grok's DeepSearch and DeeperSearch are also pretty good, and you can look at their stream of thought to see how it reaches its results.

Not sure how OpenAIs version works, but grok's approach is to do multiple rounds of searches, each round more specific and informed by previous results

dontlikeyoueith · a year ago

They're probably doing RAG on a huge chunk of the internet, i.e. they built their own task-specific search engine.

matwood · a year ago

I'm glad you mentioned this. I asked Deep Research to lay out a tax strategy in a foreign country and it cited a ton of great research I hadn't yet found.

HankWozHere · a year ago

Kagi Assistant allows you to do search with LLM queries. So far I feel it bears reliable results. For instance - I tried couple of queries for product suggestions and came back with some good results. Whilst it’s a premium service , I find the offering to be of good value.

chrisweekly · a year ago

Yeah, Kagi's search results are so much better than Google's, it defies comparison.

eli · a year ago

It's neat but I've found the value kinda variable. It seems heavily influenced by whatever the first few hits are for a query based on your question, so if it's the kind of question that can be answered with a simple search it works well. But of course those are the kinds of questions where you need it the least.

I find myself much more often using their "Quick Answer" feature, which shows a brief LLM answer above the results themselves. Makes it easier to see where it's getting things from and whether I need to try the question a different way.

dmazin · a year ago

Has anyone compared Perplexity with Kagi Assistant?

I am always looking for Perplexity alternatives. I already pay for Kagi and would be happy to upgrade to the ultimate plan if it truly can replace Perplexity.

hooli_gan · a year ago

Does it just start a search or does the chat continue with the results? Would be cool to continue the chat with result, which were filtered acording to the blacklist.

KoolKat23 · a year ago

I have a subscription, please could I ask how you do this? I only know of the append ? Feature.

mavamaarten · a year ago

Oh yeah this is very much the case. Every time I ask ChatGPT something simple (thinking it'd be a perfect fit for an LLM, not for a google search) and it starts searching, I already know the "answer" is going to be garbage.

spoaceman7777 · a year ago

I have in my prompt for it to always use search, no matter what, and I get pretty decent results. Of course, I also question most of its answers, forcing it to prove to me that its answer is correct.

Just takes some prompt tweaking, redos, and followups.

It's like having a really smart human skim the first page of Google and give me its take, and then I can ask it to do more searches to corroborate what it said.

NavinF · a year ago

Try their Deep Research or grok's DeepSearch. Both do many searches and read many articles over a couple of minutes

osigurdson · a year ago

That is interesting. I have often been amazed at how good it is at picking up when to search vs use its weights. My biggest problem with ChatGPT is the horrendous glitchyness.

bambax · a year ago

"Searching" doesn't mean much without information about the ranking algorithm or the search provider, because with most searches there will be millions of results and it's important to know how the first results have been determined.

It's amazing that the post by Anthropic doesn't say anything about that. Do they maintain their own index and search infrastructure? (Probably not?) Or do they have a partnership with Bing or Google or some other player?

andai · a year ago

>top results are blogspam

It gets even better. When I first tested this feature in Bard, it gave me an obviously wrong answer. But it provided two references. Which turned out to be AI generated web pages.

Oddly enough in my own Googles I could not even find those pages in the results.

dspillett · a year ago

> Bard […] it provided two references. Which turned out to be AI generated web pages.

Welcome to the Habsburg Internet.

kelseyfrog · a year ago

Search engines now have an incentive to offer a B2B search product that solves the blogspam problem. Don't worry, the AIs will get good search results, and you'll still get the version that's SEOed to the point of uselessness.

wenc · a year ago

I just tried Claude’s web search. It works pretty well.

I’m not sure if Claude does any reranking (see Cohere Reranker) where it reorders the top n results or just relies on Google’s ranking.

But a web search that does re-ranking should reduce the amount of blogspam or incomplete answers. Web search isn’t inherently a lost cause.

macrolime · a year ago

Deep search/deep research in grok, chatgpt, perplexity etc works much better. It can also do things like search in different languages. Wonder about something in some foreign country? Ask it to search in the local language and find things you won't find in English.

wickedsight · a year ago

> Ask it to search in the local language and find things you won't find in English.

Yeah, this is one of my favorite use cases. Living in Europe, surrounded by different languages, this makes searching stuff in other countries so much more convenient.

PStamatiou · a year ago

Could not agree more. I wrote in detail about some of these issues last week https://paulstamatiou.com/browse-no-more

osigurdson · a year ago

My experience with ChatGPT is really good. I find standard web searches very annoying now.

GraffitiTim · a year ago

Exa (YC S21) is trying to solve this problem by re-indexing the web in an LLM-friendly way.

ipaddr · a year ago

2021? how are they doing?

magackame · a year ago

Google search is crap. It seems to be a sentiment among many HNers, but is it really that bad? I mostly use it for programming, so documentation/forums and it works out greatly. For some queries it even returns personal blogs (which people seem to bash google for not happening). Of course there are some queries that return purely AI blogspam, but reformatting the query with a bit more thought usually solves it. I wonder if that is a US thing? Do search results differ greatly based on the region?

Beijinger · a year ago

Is google search bad? Click here to find ten reasons why it is bad and 10 reasons why you should still use it.

Yes, it is that bad.

Website of Nike? Website of Starbucks? Likely position number one.

Every product, category etc., e.g. what rice cooker should I buy? Is diseased by link and affiliate spam. There is a reason why people put +reddit on search terms.

harrall · a year ago

I watched one of my friends who says Google is useless use Google one day.

If I were looking for a song, I would type in something like “song used at beginning of X movie indie rock”

He would type in “X songs.”

I basically find everything in Google in one search and it takes him several. I type in my thought straight whereas he seems to treat Google like a dumb keyword index.

hansmayer · a year ago

I mean, for those of us who used it since way before the '20s, it's not really a sentiment - it's a fact. You used to be able to type in 3 words and whatever error message your stack trace was showing, and the first 3 links returned were very likely a definitive source to solving your problem.Written by a human, and believe my word for it - it was much better back then than the crap you get out of torturing whatever your LLM of choice is. However the weird MBas took it over to and did exactly what you are describing - forced people to spend more time "engaging with the platform" (to increase the revenue). As you can see, they seem to have achieved this goal, and we all now spend time reformatting our queries as they wanted us to, and yes Google search is complete crap.

tim333 · a year ago

Personally I like Google search. I think it's not crap - actually quite good. I use it multiple times a day (just checked - about 42 times yesterday). It's different from what it was 10 years ago but still works for most stuff.

That said I also use Perplexity which does things Google never really did.

I've got a theory that people just like to be negative about stuff, especially market leaders, and are a bit in denial as to how it still has the majority search share in spite of many billions spent trying to compete with it and ernest HN posts saying Google is crap use Kagi. For amusement I tried to find their share of search and Google is approx 90%, Kagi approx 0.01% by my calculations.

simonw · a year ago

How long have you been using Google search for?

It used to be SO much less likely to return junk.

UltraSane · a year ago

Google randomly deletes words from your search term. Why would anyone think that was a good idea?

sky2224 · a year ago

It's kind of surprising to me that I can't customize the search ability at all with a lot of models (at least I wasn't really able to last time I checked). Would providing a blacklist to the model really be that hard?

Actually, it's astounding to me that companies haven't created a more user friendly customization interface for models. The only way to "customize" things would be through the chat interface, but for some reason everyone seems to have forgotten that configuration buttons can exist.

grapesodaaaaa · a year ago

> Actually, it's astounding to me that companies haven't created a more user friendly customization interface for models.

To be fair, LLM technology in its current form, is still relatively new. I would also like to see what you are suggesting, though.

lairv · a year ago

Overall LLMs (that I've tested) don't know how to use a search engine, their queries are bad and naive, probably because the way to use a search engine isn't part of training data, it's just something that people learn to do by using them. Maybe Google has the data to make LLMs good at using search engines but would it serve their business?

tymonPartyLate · a year ago

This is actually not true. I'm getting traffic from ChatGpt and Perplexity to my website which is fairly new, just launched a few months ago. Our pages rarely rank in the top 4, but the AI answer engines mange to find them anyways. And I'm talking about traffic with UTM params / referrals from chatgpt, not their scraper bots.

ForTheKidz · a year ago

If chatgpt is scraping the web, why can they not link tokens to source of token? being able to cite where they learned something would explode the value of their chatbot. At least a couple of orders of magnitude more value. Without this chatbots are mostly a coding-autocomplete tool for me—lots of people have takes, but it's the tying into the internet that makes a take from an unknown entity really valuable.

Perplexity certainly already approximates this (not sure if it's at a token level, but it can cite sources. I just assumed they were using a RAG.)

hibikir · a year ago

Imagine how much fun it will be when the breakthrough in search engine quality comes from companies building a better engine to get good LLM answers.

This is ultimately google's problem: They are making money from the fact that the page is now mostly ads and not necessarily going to lead to a good, quick answer, leading to even more ads. They probably lose money if they make their search better

UnreachableCode · a year ago

> web search is more unusable than ever

I’m curious why I’m seeing a lot of people thinking this lately. Google definitely made the algorithm worse for customers and better for ads, but I’m almost always able to find what I’m looking for in the working day still. What are other people’s experiences?

vbezhenar · a year ago

My experience is that Google works perfectly for me and I almost never have any issues with it, despite all the doomsaying.

cudgy · a year ago

AI results typically blow away Google results in both quality and definitely speed.

For example, when searching for product information, Google results in top 50 to 100 listed items titled “the 10 best …“ full of vapid articles that provide little to no insight beyond what is provided in a manufacturers product sheet. Many times I have to add “Reddit” to my search to try and find real opinions about a product or give up and go to Youtube review videos from trusted sources.

For technical searches like programming questions, AI is basically immediately nailing most basic questions while Google results require scanning numerous somewhat related results from technical discussion forums, many of which are outdated.

input_sh · a year ago

This about sums it up: https://www.reddit.com/r/mildlyinfuriating/comments/1hsf0to/...

PetahNZ · a year ago

It would be nice if I could tell it what page to look at (maybe you can, I am not sure). Often if I am getting an LLM to write some code that I can see is obviously wrong, I would love to say here is the docs ... use that to formulate your response.

oytis · a year ago

Well, they are professionals, they sure add "reddit" to every query.

taude · a year ago

Do you think that if it's a non-Google company, that maybe doesn't rank search by ad payment $$$, that this new company could in theory do a better job?

OscarTheGrinch · a year ago

If only search engines weren't also in the business of inserting unverifiable AI assertions into our information ecosystem.

johnisgood · a year ago

Yeah, this is why I almost never enable the search feature. Hopefully Claude (I have not tried) has a way of disabling it.

Xenoamorphous · a year ago

Is there any viable alternative to pass knowledge to the LLMs that goes beyond their training cut off date?

jonny_eh · a year ago

Via their context window, but new knowledge could easily fill it up.

collyw · a year ago

Isn't that the same as any place (like here for example), that uses an up-voting system?

colordrops · a year ago

Ugh, what a nightmare, now search engines are going to start optimizing for bots.

darkhorse13 · a year ago

This is basically AGI because that's what we humans do.

Tycho · a year ago

It’s good if it hits on high quality sources like ons.gov.uk

ryukoposting · a year ago

I reiterate: https://news.ycombinator.com/context?id=42012631

RAG was dead on arrival because it uses the same piss-poor results a human would, wrapped in more obfuscation and unwanted tangents.

My question is why the degradation of search wouldn't affect LLMs. These chatbot god-oracle businesses are already unprofitable because of their massive energy footprint, now you expect them to build their own search engine in-house to try to circumvent SEO spam? And you expect SEO spam to not catch up with whatever tricks they use? Come on, people.

chairmansteve · a year ago

It needs to use Kagi.

yorkeccak · a year ago

founder here. solved this problem. old news mate. https://exchange.valyu.network

yorkeccak · a year ago

AI-native search api to retrieve over web/proprietary content - full semantic search (e.g. we indexed all of arxiv), reranking built in, simple pricing, cheap

zk108 · a year ago

We’re giving away free credits to try out our platform — no card required. If you’re building with AI and need quality data, we’d love your feedback!

elliotrpmorris · a year ago

Lol so true

blackeyeblitzar · a year ago

For me LLMs have basically removed any need to visit search engines. I was already not using Google due to how bad its interface had become, but I feel like LLMs at least are more efficient as an interface even if they’re still looking at the same blogspam or unresolved forum posts. My anecdotal experience though, is that I get better answers from LLMs, perhaps because I am able to give them really detailed prompts that seem to improve the answers based how specific I get. Generic search engines don’t seem to do that, in my experience.

Dead Comment

MuffinFlavored · a year ago

> the top results are often just blogspam

top results are blogspam but the LLM isn't? /s

I wonder if it will actually respect the robots.txt this time.

creddit · a year ago

I don't think it should. If a user asks the AI to read the web for them, it should read the web for them. This isn't a vacuum charged with crawling the web, it's an adhoc GET request.

birken · a year ago

The AI isn't "reading the web" though, they are reading the top hits on the search results, and are free-riding on the access that Google/Bing gets in order to provide actual user traffic to their sites. Many webmasters specifically opt their pages out of being in the search results (via robots.txt and/or "noindex" directives) when they believe the cost/benefit of the bot traffic isn't worth the user traffic they may get from being in the search results.

One of my websites that gets a decent amount of traffic has pretty close to a 1-1 ratio of Googlebot accesses compared to real user traffic referred from Google. As a webmaster I'm happy with this and continue to allow Google to access the site.

If ChatGPT is giving my website a ratio of 100 bot accesses (or more) compared to 1 actual user sent to my site, I very much should have to right to decline their access.

1shooner · a year ago

>You can now use Claude to search the internet to provide more up-to-date and relevant responses.

It's a search engine. You 'ask it to read the web' just like you asked Google to, except Google used to actually give the website traffic.

I appreciate the concept of an AI User-agent, but without a business model that pays for the content creation, this is just going to lead to the death of anonymously accessible content.

internetter · a year ago

You could make this justification for a lot of unapproved bot activity.

scoofy · a year ago

Many if not most websites are paid for by eyeballs not by get requests. A bot is a bot is a bot. Respect robots.txt or expect to have your IPs banned.

bayindirh · a year ago

How can you be so sure? Processors love locality, so they fetch the data around the requested address. Intel even used to give names to that.

So, similarly, LLM companies can see this as a signal to crawl to whole site to add to their training sets and learn from it, if the same URL is hit for a couple of times in a relatively short time period.

usrbinbash · a year ago

> This isn't a vacuum charged with crawling the web, it's an adhoc GET request.

Doesn't matter. The robots-exclusion-standard is not just about webcrawlers. A `robots.txt` can list arbitrary UserAgents.

Of course, an AI with automated websearch could ignore that, as can webcrawlers.

If they chose do that, then at some point, some server admins might, (again, same as with non-compliant webcrawlers), use more drastic measures to reduce the load, by simply blocking these accesses.

For that reason alone, it will pay off to comply with established standards in the long run.

mvdtnz · a year ago

No thank you, when I define a robots.txt file I expect all automated systems to respect it.

GuinansEyebrows · a year ago

Someday I’ll have enough “karma” to downvote things like this.

The agent should respect robots.txt no matter who is using the Robot.

Deleted Comment

JimDabell · a year ago

The LLM shouldn’t.

robots.txt is intended to control recursive fetches. It is not intended to block any and all access.

You can test this out using wget. Fetch a URL with wget. You will see that it only fetches that URL. Now pass it the --recursive flag. It will now fetch that URL, parse the links, fetch robots.txt, then fetch the permitted links. And so on.

wget respects robots.txt. But it doesn’t even bother looking at it if it’s only fetching a single URL because it isn’t acting recursively, so robots.txt does not apply.

The same applies to Claude. Whatever search index they are using, the crawler for that search index needs to respect robots.txt because it’s acting recursively. But when the user asks the LLM to look at web results, it’s just getting a single set of URLs from that index and fetching them – assuming it’s even doing that and not using a cached version. It’s not acting recursively, so robots.txt does not apply.

I know a lot of people want to block any and all AI fetches from their sites, but robots.txt is the wrong mechanism if you want to do that. It’s simply not designed to do that. It is only designed for crawlers, i.e. software that automatically fetches links recursively.

manquer · a year ago

While robots.txt is not there to directly prevent automated requests, it does prevent crawling which is needed for search indices.

Without recursive crawling, it will not possible for a engine to know what are valid urls[1]. They will otherwise either have to brute-force say HEAD calls for all/common string combinations and see if they return 404s or more realistically have to crawl the site to "discover" pages.

The issue of summarizing specific a URL on demand is a different problem[2] and not related to issue at hand of search tools doing crawling at scale and depriving all traffic

Robots.txt does absolutely apply to LLMs engines and search engines equally. All types of engines create indices of some nature (RAG, Inverted Index whatever) by crawling, sometimes LLM enginers have been very aggressive without respecting robots.txt limits, as many webmasters have reported over the last couple of years.

---

[1] Unless published in sitemap.xml of course.

[2] You need to have the unique URL to ask the llm to summarize in the first place, which means you likely visited the page already, while someone sharing a link with you and a tool automatically summarizing the page deprives the webmaster of impressions and thus ad revenue or sales.

This is common usage pattern in messaging apps from Slack to iMessages and been so for a decade or more, also in news aggregators to social media sites, and webmasters have managed to live with this one way or another already.

mtkd · a year ago

Do really think LLM vendors that download 80TB+ of data over torrents are going to be labeling their crawler agents correctly and running them out of known datacenters?

Arnt · a year ago

The ones I noticed in my logfiles behave impeccably: retrieve robots.txt every week or so and act on it.

(I noticed Claude, OpenAI and a couple of others whose names were less familiar to me.)

teh_infallible · a year ago

Apparently they use smart appliances to scrape websites from residential accounts.

noddleah · a year ago

https://thelibre.news/foss-infrastructure-is-under-attack-by...

SoftTalker · a year ago

Maybe we need a new "ai.txt" that says "yes I mean you, ChatGPT et. al."

JadoJodo · a year ago

https://github.com/ai-robots-txt/ai.robots.txt

verdverm · a year ago

Bluesky / ATProto has a proposal for User Intents for data. More semantics than robots.txt, but equally unenforceable. Usage with AI is one of the intents to be signaled by users

https://github.com/bluesky-social/proposals/tree/main/0008-u...

whoami_nr · a year ago

Small difference. Its called llms.txt

https://llmstxt.org/

jsheard · a year ago

If they don't comply with robots.txt, why would they comply with anything else?

furyofantares · a year ago

Presumably the crawler that produces whatever index it uses does, which is how it knows what sites to read. Unless you provide it a URL yourself I guess, in which case, it shouldn't.

explain · a year ago

robots.txt is meant for automated crawlers, not human-driven actions.

zupa-hu · a year ago

Every automated crawler follows human-driven actions.

nicce · a year ago

It must form the search index somehow. That is prior the human action. Simply it would not find the page at all if it respects.

bayindirh · a year ago

So, do you mean LLMs are human-like and conscious?

I thought they were just machine code running on part GPU and part CPU.

postexitus · a year ago

if a human triggers the web crawlers by pressing a button, should they ignore robots.txt?

dudeinjapan · a year ago

In practice, robots.txt is to control which pages appear in Google results, which is respected as a matter of courtesy, not legality. It doesn't prevent proxies etc. from accessing your sites.

micromacrofoot · a year ago

almost no one does, robots.txt is practically a joke at this point — right up there with autocomplete=off

Demiurge · a year ago

In what circles is it a joke? Google bots seem to respect it on my sites according to logs.

geekrax · a year ago

I have replaced all robots.txt rules with simple WAF rules, which are cheaper to maintain than dealing with offending bots.

NewJazz · a year ago

Why wonder. You can test for yourself.

tylersmith · a year ago

It's a user agent not a robot.

Y_Y · a year ago

Why not both?