Posted by u/chaztaubelman a day ago
AI search engine – How to prevent bots?
Hi, I'm launching an AI search engine (e.g., Perplexity-like). I don't want to force people to sign up to use it. I want free visitors to be able to discover it and use it. However, I've had issues in the past with bots spamming usage, which exploded my costs.

What are the best methods to prevent those bots while keeping the UX frictionless? I've heard of Cloudflare. Will its challenge pop up for every user, or only for those who are truly suspicious?

Thanks

timshell · a day ago
Check out a demo of a similar tool we created (https://model-guessr.com/) that was bot-gated by Roundtable Proof of Human.

Happy to talk more details about PoH (disclaimer: I'm a cofounder and this is my YC S23 company)

reliefcrew · a day ago
Can you comment on the notion that Turnstile's primary goal isn't to keep bots out 100% but instead to slow them down to "human" speeds?

Asking because, as a dev, I hate when sites don't allow bots... however, I can appreciate that automation should be rate-limited. IOW, isn't preventing bot access actually an anti-pattern, since rate-limiting is sufficient?

I see a lot of marketing which bashes Turnstile [detection] rates and tries to leverage this misunderstood nuance. It seems to be a dishonest point of contention, but I am willing to hear opposing arguments.

Thanks.

timshell · a day ago
Yup! It depends on your use case.

Cloudflare is really good at network bot detection. Rate-limiting is super helpful here, for example during DDoS attacks.
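For anyone who wants to see what rate-limiting anonymous visitors can look like in application code, here is a minimal token-bucket sketch. It is only an illustration under stated assumptions, not how Cloudflare or Turnstile work internally: the `RateLimiter` class, the `client_key` argument (for example an IP address or anonymous session ID), and the default capacity and refill values are all hypothetical.

```python
import time
from typing import Dict, Optional, Tuple


class RateLimiter:
    """In-memory token bucket per client (hypothetical helper, single process only)."""

    def __init__(self, capacity: float = 5.0, refill_per_sec: float = 0.5):
        self.capacity = capacity              # maximum burst size, in requests
        self.refill_per_sec = refill_per_sec  # sustained requests per second
        # client_key -> (tokens remaining, timestamp of last request)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_key: str, now: Optional[float] = None) -> bool:
        """Return True if this client may make a request right now."""
        now = time.monotonic() if now is None else now
        # Unknown clients start with a full bucket.
        tokens, last = self._buckets.get(client_key, (self.capacity, now))
        # Refill in proportion to the time elapsed, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0  # spend one token for this request
        self._buckets[client_key] = (tokens, now)
        return allowed


if __name__ == "__main__":
    limiter = RateLimiter(capacity=3, refill_per_sec=1.0)
    for i in range(5):
        print(i, limiter.allow("203.0.113.7"))
```

A real deployment would likely keep the buckets in a shared store and evict idle keys. Keying on IP alone is also easy to evade with proxies, which is part of why this is usually combined with a challenge.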

Our customers are a little different. They sometimes struggle with high-volume bot attacks (e.g. SMS toll fraud in ticketing marketplaces), but we specifically focus on online platforms that want to verify a human is on the other side of the screen. For example, survey pollsters and labor marketplaces want to stop a slow agent that can complete a traditional CAPTCHA, even if it's solving it at human speed.

reliefcrew · a day ago
n1xis10t · a day ago
Another option to consider (which marginalia-search.com uses) is Anubis (anubis.techaro.lol). The operator of Marginalia told me that he was getting lots of people spamming the same queries over and over, which he thought might be them trying to influence suggested searches. He put Anubis in place and the query volume dropped to much more reasonable levels. It works by running some sort of complex calculation in JavaScript, so it won’t get rid of all bots, but it should slow them all down.

The downside is that their silly anime girl mascot is displayed whenever the challenge is running, which I think some people might find off-putting.
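To make the "complex calculation" idea concrete, here is a hashcash-style proof-of-work sketch. This is not Anubis's actual code or protocol. It only assumes the general idea of making the client search for a nonce whose SHA-256 hash starts with a given number of zero bits, and the function names, the `challenge:nonce` format, and the difficulty value are all made up for illustration.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: issue a random challenge string (hypothetical format)."""
    return secrets.token_hex(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash checks the client's answer."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force nonces until one passes verification."""
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty_bits=16)  # roughly 65,000 hashes on average
    print(challenge, nonce, verify(challenge, nonce, 16))
```

The asymmetry is the point: solving costs the client many hashes, while checking costs the server one. A single visitor barely notices, but a bot sending thousands of queries pays that cost each time it needs a fresh challenge.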

Edit: Are you going to announce the search engine on Hacker News?

2nd edit: If you are making a search engine, this is probably a good article to read: https://archive.org/details/search-timeline (it talks about various search engines that have disappeared mysteriously over the years).

xena · 20 hours ago
n1xis10t · 19 hours ago
I noticed that, but that page makes it sound like it can only be unbranded if you pay for the commercial version. It looks like Anubis is open source though, so I suppose you might be able to download the source and switch out the images for your own. Is that correct?

Also, since you are who you are, can I ask how you came across this post? Did you notice it because of the content of the original post or because I mentioned Anubis?

Deleted Comment