I dislike the widespread use of captcha regardless of provider.
I realize anything connected to the internet will be subject to automated abuse, and it's impossible to run some types of services without taking some steps to defend against it, but it seems to me there's usually a way to handle that without invading the user's privacy or wasting their time. The exact details will vary based on the type of service, of course.
One particularly egregious misuse of captcha in a service I use presents one after I enter a correct username and password. An incorrect login says so without presenting a captcha. The potential reward for an attacker who successfully gains access to an account is high, so it seems almost certain anyone running a targeted attack would defeat this by handing it off to a human upon detecting that they had a good account.
> it seems to me there's usually a way to handle that without invading the user's privacy or wasting their time
As much as I agree with your dislike of captchas, I don't think this is true at scale (unless universal online identities existed, which could and should include anonymous identifiers by design). When you need to accept information from anonymous users (comments, votes, forms, registrations), there's no way to not invade users privacy and not waste their time, unless you are manually filtering / moderating all the input data, in which case you can't really say it scales. You might say emails can solve the problem. Well, they don't really solve the problem against dedicated attackers / spammers, and they do invade privacy for the average user. You can use statistical approaches to try to reduce privacy invasion or others, but I don't know of anything that really solves the problem without manual identity verification at some point.
I built an alternative[0] that takes a proof of work approach. As a site owner you set the difficulty that makes sense for you: so perhaps you would want 20 seconds of computation before you can submit. The nice thing is that this can happen entirely in the background while the user fills in the form.
Also with multiple requests from the same IP in a short timespan, the difficulty increases.
There are downsides to to any captcha, but in my opinion make a much better tradeoff. Accessibility and privacy are respected, and there are no annoying tasks.
CAPTCHA does not scale. CAPTCHA spams real people with requests and wastes my VALUABLE time, and still labels disabled people as subhuman. It's offensive. It's ineffective. It's outdated.
It's reaching a point where encapsulating a VPN with anti-captcha is something I'd pay for.
> unless universal online identities existed, which could and should include anonymous identifiers by design
Yes but no. Anonymized identifiers can be deanonymized. They should utilize zero-knowledge proofs in such a way that they can prove "yes, I have an identity verified by entity X (and Y and Z) (based on passport/phone number/...)", without disclosing any of those details.
It could, optionally, yield an identifier unique to each requester and unlinkable to others unless an explicit proof of the link is provided. Though if this is included, there has to be some mechanism to avoid huge ad networks sharing the same "requester entity".
This is a solved problem. All that's left is politics, implementation and alignment.
One such solution would be a small payment, something like 1 cent for access. That's not too much, because I am already paying 3 cents to a service solving captchas for me.
For a lot of people, they want to run a service and not have to spend a significant amount of time and energy investing in anti-abuse. In general anti-abuse work is not nearly as useful as product work, a day off, or a variety of other things.
I agree, there should be better ways to do anti-abuse. Yet I find myself coming up empty when I try to find better options for the common scenario where people would really rather invest deeply in their service than in anti-abuse.
I would love to hear some ideas about how to solve this nasty general problem while also respecting user time and privacy. Unfortunately, I've found that entirely too often the vague sense that there must be a better way fails to translate into substantive better way.
Better way? I'd be hard pushed to come up with a worse way.
The number of things that are "wrong" with reCatcha etc, have been mentioned on here ad nauseam. In fact, I'll quote myself from another debate on the subject, a while back:
>1: It's never made clear exactly what you're supposed to click on. For example. If I'm told to click on "traffic lights" does that mean just the lights?... or the poles as well?... and what about a square that only has a tiny bit in it? Does that count too, or is it only squares which are mostly filled by the object in question?
>2: They make no concession to non-US English speakers. I've been asked to identify things before, where I had to guess what the word means because the same thing is called something completely different in UK English.
>The only thing that approaches the level of rage that reCaptchas instil in me are those captchas where you've got to transcribe what's in a photo of some letters & numbers and where they NEVER fecking tell you whether it's case sensitive or not, or where they use identical characters for zero and letter O, one and letter I, etc.
>One particularly egregious misuse of captcha in a service I use presents one after I enter a correct username and password
Is it eBay by any chance?
That recently started randomly showing reCaptchas to me when I'm already logged in and have been using the site for some time. When this happens, it descends into a never-ending cycle of more login screens and then more reCaptchas.
But thankfully eBay have taken note of the dozens of complaints about this on their user forums, dating back to 2018 and rushed their best people in to fix it.
[That last sentence was dripping with sarcasm, in case anyone unfamiliar with the company thought eBay ever took any notice whatsoever of their users' concerns]
I'm not a violent person at all. But if I ever meet the person who spawned reCaptcha and all its equally annoying clones, which are a pox on the internet, I won't be responsible for my own actions.
I'm about as anti-Google as it comes, but I didn't even mind the first incarnation of reCaptcha as a concept. You prove that you're human, and you also help transcribe books so that they're more accessible/searchable! Sure, it's in Google's interest in that it improves Google Books, but it at least seems like a symbiotic exchange (to, e.g. humanity in general.)
Contrast that with today's form of reCaptcha where you identify stop signs/crosswalks/et c. for Google's benefit, but at the same time you're also improving...oh, wait, Google again. It almost seems like forced labor, in a sense.
>I'm not a violent person at all. But if I ever meet the person who spawned reCaptcha and all its equally annoying clones, which are a pox on the internet, I won't be responsible for my own actions.
With our hCaptcha Enterprise product (https://www.botstop.com), showing a CAPTCHA actually only happens in rare cases (relatively speaking..) - vast majority of bots are caught and stopped in the background (using ML), and most users will never see one.
I'm curious what how rare it is / what triggers it. In my experience, at least Google triggers hard mode if you use any sort of privacy preserving technology, etc ublock, brave, etc. It's very frustrating.
I find that when I solve a Captcha too quickly, I get another one. And another one. And another one. So instead, I wait a short time, click a few wrong boxes, then enter the correct Captcha. Maybe this is part of it, but I don't like it.
If the Buster plugin can't solve the reCaptcha for me [It does fail from time to time] then I just don't bother visiting that website. Or if it's a site I need to use, then I'll try again later and see if I either get let in without being asked to jump through hoops, or get a reCaptcha Buster can solve.
I simply refuse to waste my time and drive up my blood pressure by doing unpaid training work for Google's AI, in order to visit some crappy website. I really wish more people would start boycotting any site which uses reCaptcha [or its derivatives], so we could get rid of this blight on the internet.
I've spotted this new hCaptcha junk show up recently on a couple of sites I used to frequent. I don't visit those sites any more. So well done webmasters. Apparently annoying the shit out of visitors to your site tends to drive them away. Who'da thunk it?!
One particularly egregious misuse of captcha in a
service I use presents one after I enter a correct
username and password.
That's nothing.
eBay will CAPTCHA me after I enter my e-mail address, and then again after I enter my password too. Every time. And I'll be damned if I don't "fail" this CAPTCHA at least once a week, with it telling me to try again.
Come on, there are only so many mountains/hills, taxis, traffic lights, bicycles, and cross-walks I can look at before I go cross-eyed.
They even have the nerve to suggest that I can avoid this by using the latest version of my browser (Firefox), which I already am and always do.
> The potential reward for an attacker who successfully gains access to an account is high, so it seems almost certain anyone running a targeted attack would defeat this by handing it off to a human upon detecting that they had a good account.
Then it may surprise you to know that simply preventing automation makes many types of account takeover attacks infeasible in practice. It won't mitigate the attack if you are personally a high value, named target. But most account takeover attacks operate en masse and are coordinated after large security breaches, so having to hand over accounts to a human operator as part of the auth loop would make the campaign uneconomical. It also introduces another step at which an attack can be logged, recognized, fingerprinted and stopped by an incident response team.
This is something your security team would probably gladly tell you about if you asked them. There's also a bunch of talks about this presented at conferences like Blackhat, DEFCON, USENIX, etc.
Stated in another way: not all potential rewards for successful account takeover are high. The modal account in the modal campaign is low value, which is made up for by volume and particular purpose of accessing accounts. If you model these campaigns economically, you can eliminate entire classes of "low margin, high volume" attacks simply by introducing friction that mitigates automation.
Then there is a natural cost-benefit tradeoff as to how much friction is allowable on a per-user basis to prevent the most common types of account takeover attacks.
>One particularly egregious misuse of captcha in a service I use presents one after I enter a correct username and password.
I run a problem validation community platform. Couple of days back an individual launched automated spam/DDOS attack by commenting an abusive, demoralising text on every single thread by creating different users.
Fortunately, I had systems in place to identify and mitigate it with Cloudflare. So, in this case even genuine users would have received captcha. I found out soon enough who the attacker was from the firewall, he had earlier created an account with his own name and was using the same IP to attack, after I blocked his IP he tried with couple of other IP addresses incl. Tor; but stopped with his activity after couple of hours.
I generally don't like re-captcha because it takes cultural background for granted(e.g. 'Pie' is not a common food worldwide), Accessibility as a disabled person myself and has no mitigation for captcha-solving farms.
But in nuisance cases like the one I detailed above, captcha is the easiest method available en masse.
Exactly, or be able to just use a text-mode browser.
Or wget to save a set of pages for later.
I understand protecting commenting with captcha, or contact forms. But captcha on regular read-only access to public web pages in the style of Cloudflare is a bit ridiculous.
One thing contact forms should have is a static indication there's a captcha in use. I've filled all too many forms that just sent my written text to void, because I block some domains.
Being scraped isn't free, if it's at a large enough scale.
Plus, it's not just benign read-only scrapers. Have you looked at the spam folder of your email recently? That's what every comment section and user bio and god knows what else would look like if you just blindly allow all automated traffic.
They used to be completely local and even some DIY solutions, evolved to signature updates, but eventually the attacks grew so advanced that only online services could be updated and aggressive enough, which is of course how gmail took over the internet with near perfect spam filter (when was the last time you checked a gmail spam folder).
The last generation of local spam filters were pretty good though. Anyone remember Eudora and Spamnix?
Local spam filtering still works quite fine. It just needs a lot of data most users probably don't have when starting out.
I just use bogofilter, and it worked almost perfectly from the start, just because I saved years upon years of SPAM and HAM. 10's of thousands of messages each.
It got slightly worse over years, because I incrementally only train it on new SPAM but not on new HAM, because of laziness.
People probably have HAM archives, but don't usually save their SPAM, to be able to start using Bayesian spam filters right away with great results.
Personally I find it much better than whatever Google uses. I don't even bother with SMTP level domain/IP blacklists, or reverse IP/domain checks anymore. All mail is just passed right to the mailbox and is then pre-filtered by a bogofilter to SPAM folder that I check once weekly, and barely find any HAM there. I receive about 500k mails a year.
Here's a thought experiment. This one requires some long-term thinking, outside the box and well past recent history and the status quo.
What if the majority internet usage is non-interactive, from so-called "bots", what we may refer to as "automated use". Google and Facebook, among others, rely on the use of automation and "bots". The non-interactive clients ("bots") being used by these companies are not asked to solve captchas. (In turn, after collecting data from public sources, these websites attempt to prohibit the use of automation by their users wishing to access it. What is interesting is that neither company provides any definition of "automated" nor any clearly stated limits on the speed at which a user may access resources or the quantity of resources they may access in a stated time period. One might be apt to find such limits associated with an "API".)
In 2013 an Incapsula report suggested that the majority of internet usage is in fact automated and not "malicious"^1 -- what if public information sources on the internet catered to the use of automation rather than trying to limit such use, e.g., with speed bumps^2 like "captchas". What if servers treated all clients equally, instead of having data forcibly collected by a few large clients that receive preferential treatment, then siloed and protected from "automation". What effects would this have on "centralisation" and levelling the playing field.
"Do not ask for permission, ask for forgiveness." What does it really mean when applied to the internet. Perhaps it means there is an endemic lack of clarity about "the rules". Prohibiting "automation" is far too vague and in many cases it makes no sense. The growth of computers and the internet is the growth of automation. Both servers and clients may have concerns about resource utilisation. Websites do not ask for permission when they decide to use large amounts of the user's computer resources.
Consider that a Google could not exist without being "given permission" to use automation. Does the GoogleBot have to solve captchas. No automation means no company such as this could exist. How useful would the web be without anyone being able to use automation to create an index. Based on the HN comments about web search I have read over the years, I would guess that for many commenters, it means the usefulness of the web would be dramatically reduced.
Imagine an automation-friendly internet. The truth is, I think (the data shows) we already have one, except we are in denial that "the rules" actually allow it. An early metaphor for internet and web use was "surfing". It may be that those who are constantly fighting against automation are fighting against the waves instead of riding them. Time will tell. It stands to reason, IMO, that every internet user, whether a server or a client, should be expected to use automation.
Could the captcha be there to keep spam bots from posting? Sometimes it is trivial to get a new or just valid account, so just checking for that wouldn't stop spam bots.
There's a good reason for what you're identifying as misuse.
If you show a captcha after a failed password, you need to show a one after a correct password as well. Otherwise you leak information. You can have other solutions, e.g. in a login flow that splits the username and password entry, it's advantageous to put the captcha between those two steps. But even in those solutions the display of the captcha must be independent of password correctness.
There's a lot of arguments against captchas, but I do not agree with this one. You will always leak whether or not a password is correct based on how your app behaves - a correct password will grant entry to the application. If you only ask for a captcha when a user account exists but fail to ask if they use a made up username, that's an information leak.
> If you show a captcha after a failed password, you need to show a one after a correct password as well. Otherwise you leak information.
Presumably, if the person has entered the right username and password they're going to get access to the service at which point they'll know they entered the right one. What information exactly is leaked here?
I think it's great. So many sites sit behind Cloudflare now and Cloudflare now uses hCaptcha, which is a big win. And the hCaptchas themselves are easy to complete. No more wondering if you actually clicked on 'all' the traffic lights anymore, yay!
I inspected the source code of Google's reCaptcha offering and was disgusted at how many bits of information they were collecting. They also seem to be fingerprinting users so they can't keep registering new accounts on a platform, locking out anonymous users who are usually the best types of users on the platform, as IMHO anonymous voices are (usually) the best voices, or at least the more interesting of voices.
Google's reCaptcha code seemed to be very keen on knowing my 'cadence' or the way I used my mouse and how quickly (or how slow) I completed the captcha. It also looked at things like timezone, screen resolution, battery charge level etc So they could determine if it was 'you' who was using the captcha, soon after, in a separate session (even on a different device!)
> Google's reCaptcha code seemed to be very keen on knowing my 'cadence' or the way I used my mouse and how quickly (or how slow) I completed the captcha. It also looked at things like timezone, screen resolution, battery charge level etc So they could determine if it was 'you' who was using the captcha, soon after, in a separate session (even on a different device!)
I'd bet a good amount that they store that along with all the other personally identifying info they have on you (and google of course has a massive amount of that); which is basically why after a single reCAPTCHA solve, you wont see them prompt you again for ages - they know who you are.
Just turn on "Resist Fingerprinting" in Firefox and you'll find ReCAPTCHA _really_ annoying! I have to solve 3-5 "panes" of a ReCAPTCHA on _every_ page... It's very annoying that preserving privacy comes with this cost.
I almost want to just add a "DeathByCaptcha" extension to handle these for me and pay a few cents for every page I visit, lol
> which is basically why after a single reCAPTCHA solve, you wont see them prompt you again for ages - they know who you are.
If only. If the same site has reCaptcha across more than one page, within mere minutes of having to slog through multiple screens of one, I can guarantee I'll be doing it again.
And I'm never sure if Google has served me either a very long sequence of reCaptchas, or whether they've decided I'm not a person and are serving me an infinite reCaptcha.
reCaptcha has also gotten increasingly annoying lately.
I forgot my password to one site and tried about 2 or 3 different passwords and in-between each it asked me to do about 7 or 8 of those labelling exercises. I finally just gave up and left the site.
Not only that, but the labelling exercises weren't clear. It wanted me to label a "公交車" which means more like a public city bus and there were also school buses which would normally not be called that in Chinese so I didn't label them but Google thought they were part of that class, and wouldn't let me proceed without me labelling them, and furthermore, punished me with more "hard" exercises like that. I guess they are trying to turn me into a stupid bot.
Recently Google's captcha asked me to mark all the traffic meters on the photos, and amongst the choices was a photo of a mailbox. It didn't let me through until I marked it as a meter as well.
Good luck to whatever self driving car they are training using this data.
I’m not particularly fond of reCaptcha either but I disagree that it’s an obviously good thing for someone to be able to repeatedly make new accounts with no restrictions. Abusive users use this to bypass account bans.
yes we need captcha that supports browsers like links. NOJS browser ought to make a comeback. display image, text and video. For many sites that's ALL we need.
For sure enable javascript to get the fancy stuff, but mostly we just want to read the text, view the picture and see the video.
Creating a monoculture makes it easier to implement systems that automatically bypass the captchas, so it’s good for end users, especially people that are visually impaired, or otherwise unable to solve captchas.
Unlike Google, which is making hundreds of billions with ads, we have zero reasons to track users - customers pay us to stop bots, and that's the product we provide.
Worth noting that this title is primarily due to Cloudflare having switched to them from ReCAPTCHA, and Cloudflare is... well, relatively popular, to say the least.
I'm curious what kind of data may exist on the experience of switching for larger providers; do the users like it? how much more/less time do they spend solving? do they care, let alone even notice that it's not Google's ReCAPTCHA?
Regardless, as ReCAPTCHA is not only terribly annoying but also built for surveillance from the ground up, I still view this as a good improvement.
Disclaimer: I've been an engineer at hCaptcha for a few years now building out the service. I'm just as interested in you as hearing about customer and user success/pain stories!
> Worth noting that this title is primarily due to Cloudflare having switched to them from ReCAPTCHA, and Cloudflare is... well, relatively popular, to say the least.
That's definitely a part of it, but we also have a number of other large sites and services that use hCaptcha to protect against bots, and more that get added every day because of our more advanced bot detection special sauce.
> I'm curious what kind of data may exist on the experience of switching for larger providers; do the users like it? how much more/less time do they spend solving? do they care, let alone even notice that it's not Google's ReCAPTCHA?
From what we've seen, the integration process is generally smooth, especially if you're a previous reCAPTCHA user, since we keep the interface and workflow largely the same.
Solving is roughly the same although we have a number of other protections that irritate bot maintainers and get activated when we detect them.
Not sure if the majority of people are aware of the change, I'm sure some technically savvy people pick up on it more than not.
> Regardless, as ReCAPTCHA is not only terribly annoying but also built for surveillance from the ground up, I still view this as a good improvement.
That's actually one the top reasons we've had a lot of customers come over to us; we put a heavy emphasis on user privacy / security, including adopting/supporting privacy-preserving protocols (PrivacyPass, Tor), and minimal retention of data (see our data privacy policy on our site).
Your CAPTCHA accessibility leaves much to be desired. You require screen reader users to register an account to create a magic cookie that itself requires Safari users to disable security protections in their browser in order to use -- and then it doesn't actually work.
Please do better. You're blocking off a non-trivial amount of the Internet to blind users. You will eventually be sued for this.
I usually just bounce when I see a captcha (if I get one, I usually get a string of them, so I don’t bother).
However, I checked secondary markets where you can pay a human to solve a captcha.
It takes a professional captcha solver 70 seconds to solve an hCaptcha but only 15-20 seconds to solve a reCaptcha. Is that typical? That seems horrible.
The market rate for a captcha solution is 1-3 cents, which is clearly worth it, until you think of the ethics of paying someone slave wages so you can browse the internet slowly, but at least without breaking concentration.
Have you considered a more ethical approach, like micropayments that go to charity or something?
Love the response, happy to see that it's going well then! After reading a lot of feedback I got from 'You (probably) don’t need ReCAPTCHA' (https://nearcyan.com/you-probably-dont-need-recaptcha/), it started to seem pretty obvious to me that there was an open market space for some better competitors, so I'm glad hCaptcha got around to being adopted with such success sooner rather than later. Hopefully the challenges of the future go just as smoothly as things are going in the present.
This is completely anecdotal (and seems antithetical to the typical HN response to hCaptcha vs ReCAPTCHA), but I feel like I end up spending at least twice as much time trying to solve hCaptchas successfully because they have a lot less consistency in the objects you're searching for. I always have to zoom in to the modal and carefully search through each image, which invariably breaks whatever flow I'm in (moreso than other captchas).
For example, here's a screenshot from the hCaptcha website's "try it out" section [1] -- I barely recognized either boat in image #1 because it was so small. I missed image #3 because I didn't realize it was a huge cruise-esque boat (so big you can't even see any water) and I spent a good amount of time deliberating on #4 because, well, it looks like a car + windshield but... on the water? If it's a boat, I can't really tell, but I marked it as one solely because of the water in the background. Not sure if it was right or not.
It also seems to occasionally provide "find all the X" challenges without there actually being any X, which feels super cognitively weird ("am I just not seeing it?!").
I'd say ReCAPTCHA's main problem is deciding whether mostly-consistent objects being partially in-frame is enough to "count", whereas hCaptcha's main problem is actually recognizing the widely-varying objects in the frame. I think the former is a little more frustrating when you get something wrong, but the latter is mentally "harder" and takes more time on average, for me at least.
Honest question: How do you view it as an improvement? The same data is being shared, and the only difference is that Cloudflare isn't immediately behaving in the same evil ways as Google. But once you concentrate power in an entity, perhaps bad things might happen?
... If there was an on-premise captcha implementation that actually worked, that would be great.
Unlike Google, hCaptcha isn't running an ad network "on the side" of their bot management business :) joking aside, hCaptcha is an extremely privacy-conscious operation, Google is not.
For site operators, they don’t like the change since users are more likely to complain to the website than directly to CF. The following community post has 20k views and >100 replies asking Cloudflare to move back to recaptcha in some form.
To be fair it doesn't seem to be _that_ bad on this thread: There's the very vocal OP as well as a "discussion" between various users that ranges from "please switch back to ReCaptcha" to "please keep hCaptcha".
For a change that affects "15% of the internet" this seems like very little negative feedback in a period of 8 months.
> hCaptcha is making cloudflare money by earning them Human Tokens on the Ethereum blockchain
> Most people do the convenience from Google CAPTCHA, although they sell some kind of info, but they won’t hurt you
I can't even...this is the Cloudflare forum wow.
I've personally had a few hiccups with hCaptcha quite some time back as I "wasn't sure what I was looking for" and consistently fail on VPNs. But in recent months these there's definitely been substantial improvement , and needless to say I hope to see hCaptcha be the majority provider
Absolutely. Having to solving only one captcha every few days beats solving 5 or 6 on each page visit. hcaptcha supports privacy pass but Recaptcha doesn't.
OP here, and full disclosure I work with the hCaptcha team. Yep, Cloudflare is a big part of this, but you'll find our enterprise offering (BotStop.com) running on many many other large sites and apps. If you've used the internet in 2020, you almost certainly interacted with our products :)
I'm really starting to hate all the captchas with a burning passion. Partly because the corporation I work for seems to have gotten our NAT addresses onto a blacklist so I get captcha'd constantly, and partly because my close up vision is getting noticeably weaker (pushing 50, that's why) and without hunting down my reading glasses it can be difficult to make out the smaller details necessary to solve the puzzle. Especially when I'm on my phone.
I really wish we could find something relatively foolproof that didn't rely heavily on tracking or really good vision.
Similar deal where I am at present in India: the small ISP uses carrier-grade NAT, so there’s malware and related activity occurring every day from at least one of the who-knows-how-many people behind this one IP address. Last time I was here in 2016 it was actually a lot worse than it is now (then, any Cloudflare site would trigger it, so I’d be hitting dozens of challenges per day), but I still get the occasional hCaptcha here (e.g. the Audacity wiki), and they’re awful. I normally take two or three attempts (quite apart from the regular times when you finish the challenge and press submit, and it just does nothing), guessing things like whether they want to count this particular dark smudge as a motorcycle or not, or whether this fragment of a motorcycle should count or not.
I wish people would just face up to the reality that challenge-based CAPTHCA techniques have failed, and stop using them.
We've moved to hCaptcha from reCAPTCHA after Google surprised us with their pricing (blog[1], hn discussion[2]), and couldn't be happier. We use it in invisible mode and it does a great job at finding bots while getting out of users' way.
Also top-notch customer support. The CEO was personally in the slack channel helping us. Highly recommended.
My mom and dad's shared IP (somewhere in Europe) repeatedly gets on CloudFlare's IP ban list meaning my mom keeps having to solve these hCaptcha's. hCaptcha's is a lot more difficult to complete than Google's reCaptcha and she has a lot of trouble with it.
I think they get on these IP lists because it's a general consumer ISP, and a lot of the people on it probably end up in botnets.
I've learned that folks end up unintentionally installing software which acts as essentially "proxy server as a service". I've heard of browser extensions doing this, but I would be unsurprised if mobile apps did it. Every holiday I do a sweep of my parents' devices to make sure they haven't installed anything silly (my mom somehow always has three or four different weather apps). I'd suggest giving it a look the next time you can.
Or, it could be that their computers or home network is infested with malware or bots.
The majority of people complaining about captchas need to look at their own systems first. Of course any detection system has false positives, but the false positive rate is not in the double-digit percentages in the vast majority of cases.
Install Privacy Pass on their computers. It won't eliminate the captchas completely but will decrease the number of times they'll see them. I believe Cloudflare gives you 30 passes for each captcha solved.
As someone who scrapes, captchas are pretty silly. One of the sites we scrape implemented hCaptcha, and it was a breeze to get around. There are a few things that make my life more difficult, but captchas aren't one of them, and nothing can stop scraping altogether.
Meh, there's always going to be a long tail of targeted abuse, so it's not much to boast over. Even back in 2001, Xrumer would let you sit at your computer and fill out those common PHP-lib captchas (like on EZBoard) while it spammed internet forums and blogs. You could even hire a cubicle farm of humans to manually abuse a web service.
Captchas filter out the bulk of automated abuse, around 90%.
Btw, web scraping is on the nearly harmless side of abuse.
That makes sense that there's long-tail abuse, thanks. What would you say is more harmful abuse? Spamming endpoints for SQL and other injection attacks?
I'm not OP but there are cheap solving services that will fill them in for you. The cost is trivial and they have decent APIs for automatic integration into scrapers.
That's great to see! At Plausible Analytics, we had a wave of spam attacks two months ago or so and hCaptcha saved us. Great product and great service both for companies and for users. We're very happy with how it works. And great to have a quality de-Googled alternative for this use case!
Also, with multiple requests from the same IP in a short timespan, the difficulty increases.
There are downsides to any captcha, but in my opinion this makes a much better tradeoff. Accessibility and privacy are respected, and there are no annoying tasks.
[0]: https://friendlycaptcha.com
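The proof-of-work idea can be sketched in a few lines. This is a hypothetical illustration, not Friendly Captcha's actual protocol: the client grinds for a nonce whose hash of the server's challenge has enough leading zero bits, the server verifies with a single hash, and the server can raise the difficulty for IP ranges that submit often.

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    """Count leading zero bits in a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # count zero bits in the first nonzero byte, then stop
        for shift in range(7, -1, -1):
            if byte >> shift:
                break
            bits += 1
        return bits
    return bits

def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: grind nonces until the hash meets the difficulty.
    Expected work doubles with each extra bit of difficulty."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1

def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: one hash checks the whole proof."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty

# The challenge binds the work to a form and (hypothetically) an IP bucket,
# so the server can bump the difficulty for busy buckets.
challenge = b"form-id:42|ip-bucket:7"
nonce = solve(challenge, difficulty=12)   # ~2^12 hashes on average
assert verify(challenge, nonce, 12)
```

Because solving happens in the background while the user fills in the form, the only user-visible cost is battery/CPU, not attention.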
It's reaching a point where encapsulating a VPN with anti-captcha is something I'd pay for.
Yes but no. Anonymized identifiers can be deanonymized. They should utilize zero-knowledge proofs in such a way that they can prove "yes, I have an identity verified by entity X (and Y and Z) (based on passport/phone number/...)", without disclosing any of those details.
It could, optionally, yield an identifier unique to each requester and unlinkable to others unless an explicit proof of the link is provided. Though if this is included, there has to be some mechanism to avoid huge ad networks sharing the same "requester entity".
This is a solved problem. All that's left is politics, implementation and alignment.
I agree, there should be better ways to do anti-abuse. Yet I find myself coming up empty when I try to find better options for the common scenario where people would really rather invest deeply in their service than in anti-abuse.
I would love to hear some ideas about how to solve this nasty general problem while also respecting user time and privacy. Unfortunately, I've found that all too often the vague sense that there must be a better way fails to translate into a substantive better way.
The number of things that are "wrong" with reCaptcha etc. has been mentioned on here ad nauseam. In fact, I'll quote myself from another debate on the subject, a while back:
eBay recently started randomly showing reCaptchas to me when I'm already logged in and have been using the site for some time. When this happens, it descends into a never-ending cycle of more login screens and then more reCaptchas.
But thankfully eBay have taken note of the dozens of complaints about this on their user forums, dating back to 2018 and rushed their best people in to fix it.
[That last sentence was dripping with sarcasm, in case anyone unfamiliar with the company thought eBay ever took any notice whatsoever of their users' concerns]
I'm not a violent person at all. But if I ever meet the person who spawned reCaptcha and all its equally annoying clones, which are a pox on the internet, I won't be responsible for my own actions.
Contrast that with today's form of reCaptcha, where you identify stop signs/crosswalks/etc. for Google's benefit, but at the same time you're also improving... oh, wait, Google again. It almost seems like forced labor, in a sense.
>I'm not a violent person at all. But if I ever meet the person who spawned reCaptcha and all its equally annoying clones, which are a pox on the internet, I won't be responsible for my own actions.
Be careful, or they'll start sending you a dead horse's head and planting GPS trackers in your car: https://www.justice.gov/usao-ma/pr/two-former-ebay-executive...
I simply refuse to waste my time and drive up my blood pressure by doing unpaid training work for Google's AI, in order to visit some crappy website. I really wish more people would start boycotting any site which uses reCaptcha [or its derivatives], so we could get rid of this blight on the internet.
I've spotted this new hCaptcha junk show up recently on a couple of sites I used to frequent. I don't visit those sites any more. So well done webmasters. Apparently annoying the shit out of visitors to your site tends to drive them away. Who'da thunk it?!
eBay will CAPTCHA me after I enter my e-mail address, and then again after I enter my password too. Every time. And I'll be damned if I don't "fail" this CAPTCHA at least once a week, with it telling me to try again.
Come on, there are only so many mountains/hills, taxis, traffic lights, bicycles, and cross-walks I can look at before I go cross-eyed.
They even have the nerve to suggest that I can avoid this by using the latest version of my browser (Firefox), which I already am and always do.
Then it may surprise you to know that simply preventing automation makes many types of account takeover attacks infeasible in practice. It won't mitigate the attack if you are personally a high value, named target. But most account takeover attacks operate en masse and are coordinated after large security breaches, so having to hand over accounts to a human operator as part of the auth loop would make the campaign uneconomical. It also introduces another step at which an attack can be logged, recognized, fingerprinted and stopped by an incident response team.
This is something your security team would probably gladly tell you about if you asked them. There's also a bunch of talks about this presented at conferences like Blackhat, DEFCON, USENIX, etc.
Stated another way: not all potential rewards for successful account takeover are high. The modal account in the modal campaign is low value, which is made up for by volume and by the particular purpose for which the accounts are accessed. If you model these campaigns economically, you can eliminate entire classes of "low margin, high volume" attacks simply by introducing friction that mitigates automation.
Then there is a natural cost-benefit tradeoff as to how much friction is allowable on a per-user basis to prevent the most common types of account takeover attacks.
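The economic argument is easy to make concrete with a back-of-the-envelope model. All the numbers below are invented for illustration, not taken from any real campaign: a captcha on every login attempt forces the attacker to pay a human solving service per attempt, while only the tiny fraction of valid hits produce revenue.

```python
# Hypothetical bulk credential-stuffing campaign.
attempts       = 1_000_000   # stolen credential pairs tried
hit_rate       = 0.001       # fraction that still work on this site
value_per_acct = 5.00        # assumed resale value of a compromised account

hits = attempts * hit_rate   # expected compromised accounts: 1000

# Without a captcha, trying a credential pair costs essentially nothing.
profit_no_captcha = hits * value_per_acct

# With a captcha, every attempt (not just every hit) needs a human solve.
solve_cost = 0.02            # assumed per-solve price on a solving market
profit_with_captcha = hits * value_per_acct - attempts * solve_cost

print(profit_no_captcha)     # 5000.0
print(profit_with_captcha)   # -15000.0: the campaign is now uneconomical
```

The point is not the specific numbers but the shape: costs scale with attempts while revenue scales with hits, so even cheap per-attempt friction can flip the sign of the whole campaign.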
I run a problem-validation community platform. A couple of days back, an individual launched an automated spam/DDoS attack, creating different users to post abusive, demoralising text on every single thread.
Fortunately, I had systems in place to identify and mitigate it with Cloudflare, so in this case even genuine users would have received a captcha. I found out soon enough from the firewall who the attacker was: he had earlier created an account under his own name and was using the same IP to attack. After I blocked his IP he tried a couple of other IP addresses, including Tor, but stopped his activity after a couple of hours.
I generally don't like reCaptcha because it takes cultural background for granted (e.g. 'pie' is not a common food worldwide), has accessibility problems (I'm disabled myself), and offers no mitigation for captcha-solving farms.
But in nuisance cases like the one I detailed above, captcha is the easiest method available en masse.
Or wget to save a set of pages for later.
I understand protecting commenting or contact forms with a captcha. But a captcha on regular read-only access to public web pages, in the style of Cloudflare, is a bit ridiculous.
One thing contact forms should have is a static indication that a captcha is in use. I've filled out all too many forms that just sent my written text into the void, because I block some domains.
Sadly, there still doesn't seem to be much in the way of micropayment infrastructure.
Plus, it's not just benign read-only scrapers. Have you looked at the spam folder of your email recently? That's what every comment section and user bio and god knows what else would look like if you just blindly allow all automated traffic.
Spam filters used to be completely local (there were even DIY solutions), then evolved to signature updates, but eventually the attacks grew so advanced that only online services could be updated quickly and aggressively enough. That, of course, is how Gmail took over the internet with its near-perfect spam filter (when was the last time you checked a Gmail spam folder?).
The last generation of local spam filters were pretty good though. Anyone remember Eudora and Spamnix?
I just use bogofilter, and it worked almost perfectly from the start, simply because I had saved years upon years of SPAM and HAM: tens of thousands of messages each.
It has gotten slightly worse over the years, because out of laziness I only incrementally train it on new SPAM, not on new HAM.
People probably have HAM archives, but don't usually save their SPAM, which is what you'd need to start using a Bayesian spam filter right away with great results.
Personally I find it much better than whatever Google uses. I don't even bother with SMTP-level domain/IP blacklists or reverse IP/domain checks anymore. All mail is passed straight to the mailbox and then pre-filtered by bogofilter into a SPAM folder that I check once weekly, and I barely find any HAM there. I receive about 500k mails a year.
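For the curious, the core of a word-frequency Bayesian filter in the spirit of bogofilter fits in a few lines. This is a toy sketch (whole-word tokens, add-one smoothing, a log-odds score), not bogofilter's actual algorithm, but it shows why a large labelled SPAM/HAM archive gives good results immediately.

```python
import math
from collections import Counter

class BayesFilter:
    """Toy Bayesian spam filter: score > 0 means 'looks like spam'."""

    def __init__(self):
        self.spam = Counter()   # word counts seen in spam
        self.ham = Counter()    # word counts seen in legitimate mail
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text: str, is_spam: bool) -> None:
        words = text.lower().split()
        if is_spam:
            self.spam.update(words)
            self.n_spam += 1
        else:
            self.ham.update(words)
            self.n_ham += 1

    def spam_score(self, text: str) -> float:
        # start from the prior log-odds, then add per-word evidence
        score = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for w in set(text.lower().split()):
            p_spam = (self.spam[w] + 1) / (sum(self.spam.values()) + 1)
            p_ham = (self.ham[w] + 1) / (sum(self.ham.values()) + 1)
            score += math.log(p_spam / p_ham)
        return score

f = BayesFilter()
f.train("cheap pills buy now", True)
f.train("meeting agenda attached", False)
assert f.spam_score("buy cheap pills") > 0
assert f.spam_score("meeting agenda") < 0
```

With only two training messages the scores are crude; the commenter's "tens of thousands of messages each" is exactly what makes the word statistics sharp enough to work almost perfectly from the start.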
What if the majority of internet usage is non-interactive, from so-called "bots", what we may refer to as "automated use". Google and Facebook, among others, rely on the use of automation and "bots". The non-interactive clients ("bots") being used by these companies are not asked to solve captchas. (In turn, after collecting data from public sources, these websites attempt to prohibit the use of automation by their users wishing to access it. What is interesting is that neither company provides any definition of "automated", nor any clearly stated limits on the speed at which a user may access resources or the quantity of resources they may access in a stated time period. One might be apt to find such limits associated with an "API".)
In 2013 an Incapsula report suggested that the majority of internet usage is in fact automated and not "malicious"^1 -- what if public information sources on the internet catered to the use of automation rather than trying to limit such use, e.g., with speed bumps^2 like "captchas". What if servers treated all clients equally, instead of having data forcibly collected by a few large clients that receive preferential treatment, then siloed and protected from "automation". What effects would this have on "centralisation" and levelling the playing field.
"Do not ask for permission, ask for forgiveness." What does it really mean when applied to the internet. Perhaps it means there is an endemic lack of clarity about "the rules". Prohibiting "automation" is far too vague and in many cases it makes no sense. The growth of computers and the internet is the growth of automation. Both servers and clients may have concerns about resource utilisation. Websites do not ask for permission when they decide to use large amounts of the user's computer resources.
Consider that a Google could not exist without being "given permission" to use automation. Does the GoogleBot have to solve captchas. No automation means no company such as this could exist. How useful would the web be without anyone being able to use automation to create an index. Based on the HN comments about web search I have read over the years, I would guess that for many commenters, it means the usefulness of the web would be dramatically reduced.
Imagine an automation-friendly internet. The truth is, I think (the data shows) we already have one, except we are in denial that "the rules" actually allow it. An early metaphor for internet and web use was "surfing". It may be that those who are constantly fighting against automation are fighting against the waves instead of riding them. Time will tell. It stands to reason, IMO, that every internet user, whether a server or a client, should be expected to use automation.
1. https://www.incapsula.com/blog/bot-traffic-report-2013.html
2. An early metaphor for the internet was a "superhighway". Speed bumps would seem out of place on a superhighway.
If you show a captcha after a failed password, you need to show one after a correct password as well. Otherwise you leak information. There are other solutions, e.g. in a login flow that splits the username and password entry, it's advantageous to put the captcha between those two steps. But even in those solutions, the display of the captcha must be independent of password correctness.
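A sketch of that rule in code, with all names and data hypothetical: the captcha gate runs unconditionally before the credentials are checked, so whether a captcha appears (and what error comes back) reveals nothing about whether the account or password was valid.

```python
# Toy stand-ins for a real captcha verifier and credential store.
VALID = {"alice": "hunter2"}

def captcha_ok(resp: str) -> bool:
    # placeholder for a call to a captcha provider's verification API
    return resp == "solved"

def credentials_valid(user: str, pw: str) -> bool:
    return VALID.get(user) == pw

def login(username: str, password: str, captcha_response: str) -> str:
    """The captcha is checked first, on every attempt, so its presence
    is independent of credential correctness."""
    if not captcha_ok(captcha_response):
        return "captcha_failed"
    if credentials_valid(username, password):
        return "success"
    # same generic message for an unknown username and a wrong password
    return "invalid_credentials"

print(login("alice", "hunter2", "solved"))   # success
print(login("alice", "wrong", "solved"))     # invalid_credentials
```

The flow the original comment complained about inverts this: showing the captcha only after correct credentials turns the captcha itself into an oracle confirming a valid account.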
Presumably, if the person has entered the right username and password they're going to get access to the service at which point they'll know they entered the right one. What information exactly is leaked here?
> An incorrect login says so without presenting a captcha.
I inspected the source code of Google's reCaptcha offering and was disgusted at how many bits of information it collects. They also seem to fingerprint users so they can't keep registering new accounts on a platform, locking out anonymous users, who IMHO are usually the best, or at least the most interesting, voices on a platform.
Google's reCaptcha code seemed very keen on knowing my 'cadence': the way I used my mouse, and how quickly (or slowly) I completed the captcha. It also looked at things like timezone, screen resolution, battery charge level, etc., so they could determine whether it was 'you' using the captcha soon after, in a separate session (even on a different device!).
I'd bet a good amount that they store that along with all the other personally identifying info they have on you (and Google of course has a massive amount of that), which is basically why after a single reCAPTCHA solve you won't see them prompt you again for ages: they know who you are.
I almost want to just add a "DeathByCaptcha" extension to handle these for me and pay a few cents for every page I visit, lol
If only. If the same site has reCaptcha across more than one page, within mere minutes of having to slog through multiple screens of one, I can guarantee I'll be doing it again.
And I'm never sure whether Google has served me a very long sequence of reCaptchas, or has decided I'm not a person and is serving me an infinite reCaptcha.
https://www.hcaptcha.com/accessibility
hCaptcha is not easy as is being claimed here. I have lost a lot of time and been blocked from much content due to hCaptcha.
It won’t solve the privacy issues but at least you’re not working on google’s training set anymore and captchas are automatically solved for you.
I forgot my password to one site and tried about 2 or 3 different passwords and in-between each it asked me to do about 7 or 8 of those labelling exercises. I finally just gave up and left the site.
Not only that, but the labelling exercises weren't clear. It wanted me to label a "公交車", which means more like a public city bus, and there were also school buses, which would normally not be called that in Chinese, so I didn't label them. But Google thought they were part of that class and wouldn't let me proceed without labelling them, and furthermore punished me with more "hard" exercises like that. I guess they are trying to turn me into a stupid bot.
Good luck to whatever self driving car they are training using this data.
For sure enable javascript to get the fancy stuff, but mostly we just want to read the text, view the picture and see the video.
I'm curious what kind of data may exist on the experience of switching for larger providers; do the users like it? how much more/less time do they spend solving? do they care, let alone even notice that it's not Google's ReCAPTCHA?
Regardless, as ReCAPTCHA is not only terribly annoying but also built for surveillance from the ground up, I still view this as a good improvement.
> Worth noting that this title is primarily due to Cloudflare having switched to them from ReCAPTCHA, and Cloudflare is... well, relatively popular, to say the least.
That's definitely a part of it, but we also have a number of other large sites and services that use hCaptcha to protect against bots, and more that get added every day because of our more advanced bot detection special sauce.
> I'm curious what kind of data may exist on the experience of switching for larger providers; do the users like it? how much more/less time do they spend solving? do they care, let alone even notice that it's not Google's ReCAPTCHA?
From what we've seen, the integration process is generally smooth, especially if you're a previous reCAPTCHA user, since we keep the interface and workflow largely the same.
Solving is roughly the same although we have a number of other protections that irritate bot maintainers and get activated when we detect them.
Not sure if the majority of people are aware of the change, I'm sure some technically savvy people pick up on it more than not.
> Regardless, as ReCAPTCHA is not only terribly annoying but also built for surveillance from the ground up, I still view this as a good improvement.
That's actually one of the top reasons we've had a lot of customers come over to us; we put a heavy emphasis on user privacy / security, including adopting/supporting privacy-preserving protocols (Privacy Pass, Tor), and minimal retention of data (see our data privacy policy on our site).
Please do better. You're blocking off a non-trivial amount of the Internet to blind users. You will eventually be sued for this.
However, I checked secondary markets where you can pay a human to solve a captcha.
It takes a professional captcha solver 70 seconds to solve an hCaptcha but only 15-20 seconds to solve a reCaptcha. Is that typical? That seems horrible.
The market rate for a captcha solution is 1-3 cents, which is clearly worth it, until you think of the ethics of paying someone slave wages so you can browse the internet slowly, but at least without breaking concentration.
Have you considered a more ethical approach, like micropayments that go to charity or something?
This is completely anecdotal (and seems antithetical to the typical HN response to hCaptcha vs ReCAPTCHA), but I feel like I end up spending at least twice as much time trying to solve hCaptchas successfully because they have a lot less consistency in the objects you're searching for. I always have to zoom in to the modal and carefully search through each image, which invariably breaks whatever flow I'm in (moreso than other captchas).
For example, here's a screenshot from the hCaptcha website's "try it out" section [1] -- I barely recognized either boat in image #1 because it was so small. I missed image #3 because I didn't realize it was a huge cruise-esque boat (so big you can't even see any water) and I spent a good amount of time deliberating on #4 because, well, it looks like a car + windshield but... on the water? If it's a boat, I can't really tell, but I marked it as one solely because of the water in the background. Not sure if it was right or not.
It also seems to occasionally provide "find all the X" challenges without there actually being any X, which feels super cognitively weird ("am I just not seeing it?!").
I'd say ReCAPTCHA's main problem is deciding whether mostly-consistent objects being partially in-frame is enough to "count", whereas hCaptcha's main problem is actually recognizing the widely-varying objects in the frame. I think the former is a little more frustrating when you get something wrong, but the latter is mentally "harder" and takes more time on average, for me at least.
[1] https://i.imgur.com/uyqvs5u.png from https://www.hcaptcha.com/
If there were an on-premises captcha implementation that actually worked, that would be great.
https://community.cloudflare.com/t/stop-using-hcaptcha/15896...
For a change that affects "15% of the internet" this seems like very little negative feedback in a period of 8 months.
> Most people do the convenience from Google CAPTCHA, although they sell some kind of info, but they won’t hurt you
I can't even...this is the Cloudflare forum wow.
I've personally had a few hiccups with hCaptcha quite some time back, as I "wasn't sure what I was looking for", and I consistently fail on VPNs. But in recent months there's definitely been substantial improvement, and needless to say I hope to see hCaptcha become the majority provider.
Although having said that, maybe I am hitting it, and the fact that I've been unaware and uninterrogated is high praise! Hm.
Absolutely. Having to solve only one captcha every few days beats solving 5 or 6 on each page visit. hCaptcha supports Privacy Pass but reCaptcha doesn't.
[1]: https://blog.repl.it/anon
[2]: https://news.ycombinator.com/item?id=25004476
Interesting, I didn't realize this was a thing hCaptcha did[0]. It's basically reCaptcha in terms of tracking which sites you visit, then, no?
0: https://docs.hcaptcha.com/invisible