I dislike the widespread use of captcha regardless of provider.
I realize anything connected to the internet will be subject to automated abuse, and it's impossible to run some types of services without taking some steps to defend against it, but it seems to me there's usually a way to handle that without invading the user's privacy or wasting their time. The exact details will vary based on the type of service, of course.
One particularly egregious misuse of captcha in a service I use presents one after I enter a correct username and password. An incorrect login says so without presenting a captcha. The potential reward for an attacker who successfully gains access to an account is high, so it seems almost certain anyone running a targeted attack would defeat this by handing it off to a human upon detecting that they had a good account.
> it seems to me there's usually a way to handle that without invading the user's privacy or wasting their time
As much as I agree with your dislike of captchas, I don't think this is true at scale (unless universal online identities existed, which could and should include anonymous identifiers by design). When you need to accept information from anonymous users (comments, votes, forms, registrations), there's no way to not invade users privacy and not waste their time, unless you are manually filtering / moderating all the input data, in which case you can't really say it scales. You might say emails can solve the problem. Well, they don't really solve the problem against dedicated attackers / spammers, and they do invade privacy for the average user. You can use statistical approaches to try to reduce privacy invasion or others, but I don't know of anything that really solves the problem without manual identity verification at some point.
I built an alternative[0] that takes a proof of work approach. As a site owner you set the difficulty that makes sense for you: so perhaps you would want 20 seconds of computation before you can submit. The nice thing is that this can happen entirely in the background while the user fills in the form.
Also with multiple requests from the same IP in a short timespan, the difficulty increases.
There are downsides to to any captcha, but in my opinion make a much better tradeoff. Accessibility and privacy are respected, and there are no annoying tasks.
CAPTCHA does not scale. CAPTCHA spams real people with requests and wastes my VALUABLE time, and still labels disabled people as subhuman. It's offensive. It's ineffective. It's outdated.
It's reaching a point where encapsulating a VPN with anti-captcha is something I'd pay for.
> unless universal online identities existed, which could and should include anonymous identifiers by design
Yes but no. Anonymized identifiers can be deanonymized. They should utilize zero-knowledge proofs in such a way that they can prove "yes, I have an identity verified by entity X (and Y and Z) (based on passport/phone number/...)", without disclosing any of those details.
It could, optionally, yield an identifier unique to each requester and unlinkable to others unless an explicit proof of the link is provided. Though if this is included, there has to be some mechanism to avoid huge ad networks sharing the same "requester entity".
This is a solved problem. All that's left is politics, implementation and alignment.
One such solution would be a small payment, something like 1 cent for access. That's not too much, because I am already paying 3 cents to a service solving captchas for me.
For a lot of people, they want to run a service and not have to spend a significant amount of time and energy investing in anti-abuse. In general anti-abuse work is not nearly as useful as product work, a day off, or a variety of other things.
I agree, there should be better ways to do anti-abuse. Yet I find myself coming up empty when I try to find better options for the common scenario where people would really rather invest deeply in their service than in anti-abuse.
I would love to hear some ideas about how to solve this nasty general problem while also respecting user time and privacy. Unfortunately, I've found that entirely too often the vague sense that there must be a better way fails to translate into substantive better way.
Better way? I'd be hard pushed to come up with a worse way.
The number of things that are "wrong" with reCatcha etc, have been mentioned on here ad nauseam. In fact, I'll quote myself from another debate on the subject, a while back:
>1: It's never made clear exactly what you're supposed to click on. For example. If I'm told to click on "traffic lights" does that mean just the lights?... or the poles as well?... and what about a square that only has a tiny bit in it? Does that count too, or is it only squares which are mostly filled by the object in question?
>2: They make no concession to non-US English speakers. I've been asked to identify things before, where I had to guess what the word means because the same thing is called something completely different in UK English.
>The only thing that approaches the level of rage that reCaptchas instil in me are those captchas where you've got to transcribe what's in a photo of some letters & numbers and where they NEVER fecking tell you whether it's case sensitive or not, or where they use identical characters for zero and letter O, one and letter I, etc.
>One particularly egregious misuse of captcha in a service I use presents one after I enter a correct username and password
Is it eBay by any chance?
That recently started randomly showing reCaptchas to me when I'm already logged in and have been using the site for some time. When this happens, it descends into a never-ending cycle of more login screens and then more reCaptchas.
But thankfully eBay have taken note of the dozens of complaints about this on their user forums, dating back to 2018 and rushed their best people in to fix it.
[That last sentence was dripping with sarcasm, in case anyone unfamiliar with the company thought eBay ever took any notice whatsoever of their users' concerns]
I'm not a violent person at all. But if I ever meet the person who spawned reCaptcha and all its equally annoying clones, which are a pox on the internet, I won't be responsible for my own actions.
I'm about as anti-Google as it comes, but I didn't even mind the first incarnation of reCaptcha as a concept. You prove that you're human, and you also help transcribe books so that they're more accessible/searchable! Sure, it's in Google's interest in that it improves Google Books, but it at least seems like a symbiotic exchange (to, e.g. humanity in general.)
Contrast that with today's form of reCaptcha where you identify stop signs/crosswalks/et c. for Google's benefit, but at the same time you're also improving...oh, wait, Google again. It almost seems like forced labor, in a sense.
>I'm not a violent person at all. But if I ever meet the person who spawned reCaptcha and all its equally annoying clones, which are a pox on the internet, I won't be responsible for my own actions.
With our hCaptcha Enterprise product (https://www.botstop.com), showing a CAPTCHA actually only happens in rare cases (relatively speaking..) - vast majority of bots are caught and stopped in the background (using ML), and most users will never see one.
I'm curious what how rare it is / what triggers it. In my experience, at least Google triggers hard mode if you use any sort of privacy preserving technology, etc ublock, brave, etc. It's very frustrating.
I find that when I solve a Captcha too quickly, I get another one. And another one. And another one. So instead, I wait a short time, click a few wrong boxes, then enter the correct Captcha. Maybe this is part of it, but I don't like it.
If the Buster plugin can't solve the reCaptcha for me [It does fail from time to time] then I just don't bother visiting that website. Or if it's a site I need to use, then I'll try again later and see if I either get let in without being asked to jump through hoops, or get a reCaptcha Buster can solve.
I simply refuse to waste my time and drive up my blood pressure by doing unpaid training work for Google's AI, in order to visit some crappy website. I really wish more people would start boycotting any site which uses reCaptcha [or its derivatives], so we could get rid of this blight on the internet.
I've spotted this new hCaptcha junk show up recently on a couple of sites I used to frequent. I don't visit those sites any more. So well done webmasters. Apparently annoying the shit out of visitors to your site tends to drive them away. Who'da thunk it?!
One particularly egregious misuse of captcha in a
service I use presents one after I enter a correct
username and password.
That's nothing.
eBay will CAPTCHA me after I enter my e-mail address, and then again after I enter my password too. Every time. And I'll be damned if I don't "fail" this CAPTCHA at least once a week, with it telling me to try again.
Come on, there are only so many mountains/hills, taxis, traffic lights, bicycles, and cross-walks I can look at before I go cross-eyed.
They even have the nerve to suggest that I can avoid this by using the latest version of my browser (Firefox), which I already am and always do.
> The potential reward for an attacker who successfully gains access to an account is high, so it seems almost certain anyone running a targeted attack would defeat this by handing it off to a human upon detecting that they had a good account.
Then it may surprise you to know that simply preventing automation makes many types of account takeover attacks infeasible in practice. It won't mitigate the attack if you are personally a high value, named target. But most account takeover attacks operate en masse and are coordinated after large security breaches, so having to hand over accounts to a human operator as part of the auth loop would make the campaign uneconomical. It also introduces another step at which an attack can be logged, recognized, fingerprinted and stopped by an incident response team.
This is something your security team would probably gladly tell you about if you asked them. There's also a bunch of talks about this presented at conferences like Blackhat, DEFCON, USENIX, etc.
Stated in another way: not all potential rewards for successful account takeover are high. The modal account in the modal campaign is low value, which is made up for by volume and particular purpose of accessing accounts. If you model these campaigns economically, you can eliminate entire classes of "low margin, high volume" attacks simply by introducing friction that mitigates automation.
Then there is a natural cost-benefit tradeoff as to how much friction is allowable on a per-user basis to prevent the most common types of account takeover attacks.
>One particularly egregious misuse of captcha in a service I use presents one after I enter a correct username and password.
I run a problem validation community platform. Couple of days back an individual launched automated spam/DDOS attack by commenting an abusive, demoralising text on every single thread by creating different users.
Fortunately, I had systems in place to identify and mitigate it with Cloudflare. So, in this case even genuine users would have received captcha. I found out soon enough who the attacker was from the firewall, he had earlier created an account with his own name and was using the same IP to attack, after I blocked his IP he tried with couple of other IP addresses incl. Tor; but stopped with his activity after couple of hours.
I generally don't like re-captcha because it takes cultural background for granted(e.g. 'Pie' is not a common food worldwide), Accessibility as a disabled person myself and has no mitigation for captcha-solving farms.
But in nuisance cases like the one I detailed above, captcha is the easiest method available en masse.
Exactly, or be able to just use a text-mode browser.
Or wget to save a set of pages for later.
I understand protecting commenting with captcha, or contact forms. But captcha on regular read-only access to public web pages in the style of Cloudflare is a bit ridiculous.
One thing contact forms should have is a static indication there's a captcha in use. I've filled all too many forms that just sent my written text to void, because I block some domains.
Being scraped isn't free, if it's at a large enough scale.
Plus, it's not just benign read-only scrapers. Have you looked at the spam folder of your email recently? That's what every comment section and user bio and god knows what else would look like if you just blindly allow all automated traffic.
They used to be completely local and even some DIY solutions, evolved to signature updates, but eventually the attacks grew so advanced that only online services could be updated and aggressive enough, which is of course how gmail took over the internet with near perfect spam filter (when was the last time you checked a gmail spam folder).
The last generation of local spam filters were pretty good though. Anyone remember Eudora and Spamnix?
Local spam filtering still works quite fine. It just needs a lot of data most users probably don't have when starting out.
I just use bogofilter, and it worked almost perfectly from the start, just because I saved years upon years of SPAM and HAM. 10's of thousands of messages each.
It got slightly worse over years, because I incrementally only train it on new SPAM but not on new HAM, because of laziness.
People probably have HAM archives, but don't usually save their SPAM, to be able to start using Bayesian spam filters right away with great results.
Personally I find it much better than whatever Google uses. I don't even bother with SMTP level domain/IP blacklists, or reverse IP/domain checks anymore. All mail is just passed right to the mailbox and is then pre-filtered by a bogofilter to SPAM folder that I check once weekly, and barely find any HAM there. I receive about 500k mails a year.
Here's a thought experiment. This one requires some long-term thinking, outside the box and well past recent history and the status quo.
What if the majority internet usage is non-interactive, from so-called "bots", what we may refer to as "automated use". Google and Facebook, among others, rely on the use of automation and "bots". The non-interactive clients ("bots") being used by these companies are not asked to solve captchas. (In turn, after collecting data from public sources, these websites attempt to prohibit the use of automation by their users wishing to access it. What is interesting is that neither company provides any definition of "automated" nor any clearly stated limits on the speed at which a user may access resources or the quantity of resources they may access in a stated time period. One might be apt to find such limits associated with an "API".)
In 2013 an Incapsula report suggested that the majority of internet usage is in fact automated and not "malicious"^1 -- what if public information sources on the internet catered to the use of automation rather than trying to limit such use, e.g., with speed bumps^2 like "captchas". What if servers treated all clients equally, instead of having data forcibly collected by a few large clients that receive preferential treatment, then siloed and protected from "automation". What effects would this have on "centralisation" and levelling the playing field.
"Do not ask for permission, ask for forgiveness." What does it really mean when applied to the internet. Perhaps it means there is an endemic lack of clarity about "the rules". Prohibiting "automation" is far too vague and in many cases it makes no sense. The growth of computers and the internet is the growth of automation. Both servers and clients may have concerns about resource utilisation. Websites do not ask for permission when they decide to use large amounts of the user's computer resources.
Consider that a Google could not exist without being "given permission" to use automation. Does the GoogleBot have to solve captchas. No automation means no company such as this could exist. How useful would the web be without anyone being able to use automation to create an index. Based on the HN comments about web search I have read over the years, I would guess that for many commenters, it means the usefulness of the web would be dramatically reduced.
Imagine an automation-friendly internet. The truth is, I think (the data shows) we already have one, except we are in denial that "the rules" actually allow it. An early metaphor for internet and web use was "surfing". It may be that those who are constantly fighting against automation are fighting against the waves instead of riding them. Time will tell. It stands to reason, IMO, that every internet user, whether a server or a client, should be expected to use automation.
Could the captcha be there to keep spam bots from posting? Sometimes it is trivial to get a new or just valid account, so just checking for that wouldn't stop spam bots.
There's a good reason for what you're identifying as misuse.
If you show a captcha after a failed password, you need to show a one after a correct password as well. Otherwise you leak information. You can have other solutions, e.g. in a login flow that splits the username and password entry, it's advantageous to put the captcha between those two steps. But even in those solutions the display of the captcha must be independent of password correctness.
There's a lot of arguments against captchas, but I do not agree with this one. You will always leak whether or not a password is correct based on how your app behaves - a correct password will grant entry to the application. If you only ask for a captcha when a user account exists but fail to ask if they use a made up username, that's an information leak.
> If you show a captcha after a failed password, you need to show a one after a correct password as well. Otherwise you leak information.
Presumably, if the person has entered the right username and password they're going to get access to the service at which point they'll know they entered the right one. What information exactly is leaked here?
I think it's great. So many sites sit behind Cloudflare now and Cloudflare now uses hCaptcha, which is a big win. And the hCaptchas themselves are easy to complete. No more wondering if you actually clicked on 'all' the traffic lights anymore, yay!
I inspected the source code of Google's reCaptcha offering and was disgusted at how many bits of information they were collecting. They also seem to be fingerprinting users so they can't keep registering new accounts on a platform, locking out anonymous users who are usually the best types of users on the platform, as IMHO anonymous voices are (usually) the best voices, or at least the more interesting of voices.
Google's reCaptcha code seemed to be very keen on knowing my 'cadence' or the way I used my mouse and how quickly (or how slow) I completed the captcha. It also looked at things like timezone, screen resolution, battery charge level etc So they could determine if it was 'you' who was using the captcha, soon after, in a separate session (even on a different device!)
> Google's reCaptcha code seemed to be very keen on knowing my 'cadence' or the way I used my mouse and how quickly (or how slow) I completed the captcha. It also looked at things like timezone, screen resolution, battery charge level etc So they could determine if it was 'you' who was using the captcha, soon after, in a separate session (even on a different device!)
I'd bet a good amount that they store that along with all the other personally identifying info they have on you (and google of course has a massive amount of that); which is basically why after a single reCAPTCHA solve, you wont see them prompt you again for ages - they know who you are.
Just turn on "Resist Fingerprinting" in Firefox and you'll find ReCAPTCHA _really_ annoying! I have to solve 3-5 "panes" of a ReCAPTCHA on _every_ page... It's very annoying that preserving privacy comes with this cost.
I almost want to just add a "DeathByCaptcha" extension to handle these for me and pay a few cents for every page I visit, lol
> which is basically why after a single reCAPTCHA solve, you wont see them prompt you again for ages - they know who you are.
If only. If the same site has reCaptcha across more than one page, within mere minutes of having to slog through multiple screens of one, I can guarantee I'll be doing it again.
And I'm never sure if Google has served me either a very long sequence of reCaptchas, or whether they've decided I'm not a person and are serving me an infinite reCaptcha.
reCaptcha has also gotten increasingly annoying lately.
I forgot my password to one site and tried about 2 or 3 different passwords and in-between each it asked me to do about 7 or 8 of those labelling exercises. I finally just gave up and left the site.
Not only that, but the labelling exercises weren't clear. It wanted me to label a "公交車" which means more like a public city bus and there were also school buses which would normally not be called that in Chinese so I didn't label them but Google thought they were part of that class, and wouldn't let me proceed without me labelling them, and furthermore, punished me with more "hard" exercises like that. I guess they are trying to turn me into a stupid bot.
Recently Google's captcha asked me to mark all the traffic meters on the photos, and amongst the choices was a photo of a mailbox. It didn't let me through until I marked it as a meter as well.
Good luck to whatever self driving car they are training using this data.
I’m not particularly fond of reCaptcha either but I disagree that it’s an obviously good thing for someone to be able to repeatedly make new accounts with no restrictions. Abusive users use this to bypass account bans.
yes we need captcha that supports browsers like links. NOJS browser ought to make a comeback. display image, text and video. For many sites that's ALL we need.
For sure enable javascript to get the fancy stuff, but mostly we just want to read the text, view the picture and see the video.
Creating a monoculture makes it easier to implement systems that automatically bypass the captchas, so it’s good for end users, especially people that are visually impaired, or otherwise unable to solve captchas.
Unlike Google, which is making hundreds of billions with ads, we have zero reasons to track users - customers pay us to stop bots, and that's the product we provide.
Worth noting that this title is primarily due to Cloudflare having switched to them from ReCAPTCHA, and Cloudflare is... well, relatively popular, to say the least.
I'm curious what kind of data may exist on the experience of switching for larger providers; do the users like it? how much more/less time do they spend solving? do they care, let alone even notice that it's not Google's ReCAPTCHA?
Regardless, as ReCAPTCHA is not only terribly annoying but also built for surveillance from the ground up, I still view this as a good improvement.
Disclaimer: I've been an engineer at hCaptcha for a few years now building out the service. I'm just as interested in you as hearing about customer and user success/pain stories!
> Worth noting that this title is primarily due to Cloudflare having switched to them from ReCAPTCHA, and Cloudflare is... well, relatively popular, to say the least.
That's definitely a part of it, but we also have a number of other large sites and services that use hCaptcha to protect against bots, and more that get added every day because of our more advanced bot detection special sauce.
> I'm curious what kind of data may exist on the experience of switching for larger providers; do the users like it? how much more/less time do they spend solving? do they care, let alone even notice that it's not Google's ReCAPTCHA?
From what we've seen, the integration process is generally smooth, especially if you're a previous reCAPTCHA user, since we keep the interface and workflow largely the same.
Solving is roughly the same although we have a number of other protections that irritate bot maintainers and get activated when we detect them.
Not sure if the majority of people are aware of the change, I'm sure some technically savvy people pick up on it more than not.
> Regardless, as ReCAPTCHA is not only terribly annoying but also built for surveillance from the ground up, I still view this as a good improvement.
That's actually one the top reasons we've had a lot of customers come over to us; we put a heavy emphasis on user privacy / security, including adopting/supporting privacy-preserving protocols (PrivacyPass, Tor), and minimal retention of data (see our data privacy policy on our site).
Your CAPTCHA accessibility leaves much to be desired. You require screen reader users to register an account to create a magic cookie that itself requires Safari users to disable security protections in their browser in order to use -- and then it doesn't actually work.
Please do better. You're blocking off a non-trivial amount of the Internet to blind users. You will eventually be sued for this.
I usually just bounce when I see a captcha (if I get one, I usually get a string of them, so I don’t bother).
However, I checked secondary markets where you can pay a human to solve a captcha.
It takes a professional captcha solver 70 seconds to solve an hCaptcha but only 15-20 seconds to solve a reCaptcha. Is that typical? That seems horrible.
The market rate for a captcha solution is 1-3 cents, which is clearly worth it, until you think of the ethics of paying someone slave wages so you can browse the internet slowly, but at least without breaking concentration.
Have you considered a more ethical approach, like micropayments that go to charity or something?
Love the response, happy to see that it's going well then! After reading a lot of feedback I got from 'You (probably) don’t need ReCAPTCHA' (https://nearcyan.com/you-probably-dont-need-recaptcha/), it started to seem pretty obvious to me that there was an open market space for some better competitors, so I'm glad hCaptcha got around to being adopted with such success sooner rather than later. Hopefully the challenges of the future go just as smoothly as things are going in the present.
This is completely anecdotal (and seems antithetical to the typical HN response to hCaptcha vs ReCAPTCHA), but I feel like I end up spending at least twice as much time trying to solve hCaptchas successfully because they have a lot less consistency in the objects you're searching for. I always have to zoom in to the modal and carefully search through each image, which invariably breaks whatever flow I'm in (moreso than other captchas).
For example, here's a screenshot from the hCaptcha website's "try it out" section [1] -- I barely recognized either boat in image #1 because it was so small. I missed image #3 because I didn't realize it was a huge cruise-esque boat (so big you can't even see any water) and I spent a good amount of time deliberating on #4 because, well, it looks like a car + windshield but... on the water? If it's a boat, I can't really tell, but I marked it as one solely because of the water in the background. Not sure if it was right or not.
It also seems to occasionally provide "find all the X" challenges without there actually being any X, which feels super cognitively weird ("am I just not seeing it?!").
I'd say ReCAPTCHA's main problem is deciding whether mostly-consistent objects being partially in-frame is enough to "count", whereas hCaptcha's main problem is actually recognizing the widely-varying objects in the frame. I think the former is a little more frustrating when you get something wrong, but the latter is mentally "harder" and takes more time on average, for me at least.
Honest question: How do you view it as an improvement? The same data is being shared, and the only difference is that Cloudflare isn't immediately behaving in the same evil ways as Google. But once you concentrate power in an entity, perhaps bad things might happen?
... If there was an on-premise captcha implementation that actually worked, that would be great.
Unlike Google, hCaptcha isn't running an ad network "on the side" of their bot management business :) joking aside, hCaptcha is an extremely privacy-conscious operation, Google is not.
For site operators, they don’t like the change since users are more likely to complain to the website than directly to CF. The following community post has 20k views and >100 replies asking Cloudflare to move back to recaptcha in some form.
To be fair it doesn't seem to be _that_ bad on this thread: There's the very vocal OP as well as a "discussion" between various users that ranges from "please switch back to ReCaptcha" to "please keep hCaptcha".
For a change that affects "15% of the internet" this seems like very little negative feedback in a period of 8 months.
> hCaptcha is making cloudflare money by earning them Human Tokens on the Ethereum blockchain
> Most people do the convenience from Google CAPTCHA, although they sell some kind of info, but they won’t hurt you
I can't even...this is the Cloudflare forum wow.
I've personally had a few hiccups with hCaptcha quite some time back as I "wasn't sure what I was looking for" and consistently fail on VPNs. But in recent months these there's definitely been substantial improvement , and needless to say I hope to see hCaptcha be the majority provider
Absolutely. Having to solving only one captcha every few days beats solving 5 or 6 on each page visit. hcaptcha supports privacy pass but Recaptcha doesn't.
OP here, and full disclosure I work with the hCaptcha team. Yep, Cloudflare is a big part of this, but you'll find our enterprise offering (BotStop.com) running on many many other large sites and apps. If you've used the internet in 2020, you almost certainly interacted with our products :)
I'm really starting to hate all the captchas with a burning passion. Partly because the corporation I work for seems to have gotten our NAT addresses onto a blacklist so I get captcha'd constantly, and partly because my close up vision is getting noticeably weaker (pushing 50, that's why) and without hunting down my reading glasses it can be difficult to make out the smaller details necessary to solve the puzzle. Especially when I'm on my phone.
I really wish we could find something relatively foolproof that didn't rely heavily on tracking or really good vision.
Similar deal where I am at present in India: the small ISP uses carrier-grade NAT, so there’s malware and related activity occurring every day from at least one of the who-knows-how-many people behind this one IP address. Last time I was here in 2016 it was actually a lot worse than it is now (then, any Cloudflare site would trigger it, so I’d be hitting dozens of challenges per day), but I still get the occasional hCaptcha here (e.g. the Audacity wiki), and they’re awful. I normally take two or three attempts (quite apart from the regular times when you finish the challenge and press submit, and it just does nothing), guessing things like whether they want to count this particular dark smudge as a motorcycle or not, or whether this fragment of a motorcycle should count or not.
I wish people would just face up to the reality that challenge-based CAPTHCA techniques have failed, and stop using them.
We've moved to hCaptcha from reCAPTCHA after Google surprised us with their pricing (blog[1], hn discussion[2]), and couldn't be happier. We use it in invisible mode and it does a great job at finding bots while getting out of users' way.
Also top-notch customer support. The CEO was personally in the slack channel helping us. Highly recommended.
My mom and dad's shared IP (somewhere in Europe) repeatedly gets on CloudFlare's IP ban list meaning my mom keeps having to solve these hCaptcha's. hCaptcha's is a lot more difficult to complete than Google's reCaptcha and she has a lot of trouble with it.
I think they get on these IP lists because it's a general consumer ISP, and a lot of the people on it probably end up in botnets.
I've learned that folks end up unintentionally installing software which acts as essentially "proxy server as a service". I've heard of browser extensions doing this, but I would be unsurprised if mobile apps did it. Every holiday I do a sweep of my parents' devices to make sure they haven't installed anything silly (my mom somehow always has three or four different weather apps). I'd suggest giving it a look the next time you can.
Or, it could be that their computers or home network is infested with malware or bots.
The majority of people complaining about captchas need to look at their own systems first. Of course any detection system has false positives, but the false positive rate is not in the double-digit percentages in the vast majority of cases.
Install Privacy Pass on their computers. It won't eliminate the captchas completely but will decrease the number of times they'll see them. I believe Cloudflare gives you 30 passes for each captcha solved.
As someone who scrapes, captchas are pretty silly. One of the sites we scrape implemented hCaptcha, and it was a breeze to get around. There are a few things that make my life more difficult, but captchas aren't one of them, and nothing can stop scraping altogether.
Meh, there's always going to be a long tail of targeted abuse, so it's not much to boast over. Even back in 2001, Xrumer would let you sit at your computer and fill out those common PHP-lib captchas (like on EZBoard) while it spammed internet forums and blogs. You could even hire a cubicle farm of humans to manually abuse a web service.
Captchas filter out the bulk of automated abuse, around 90%.
Btw, web scraping is on the nearly harmless side of abuse.
That makes sense that there's long-tail abuse, thanks. What would you say is more harmful abuse? Spamming endpoints for SQL and other injection attacks?
I'm not OP but there are cheap solving services that will fill them in for you. The cost is trivial and they have decent APIs for automatic integration into scrapers.
That's great to see! At Plausible Analytics, we had a wave of spam attacks two months ago or so and hCaptcha saved us. Great product and great service both for companies and for users. We're very happy with how it works. And great to have a quality de-Googled alternative for this use case!
Also, with multiple requests from the same IP in a short timespan, the difficulty increases.
There are downsides to any captcha, but in my opinion this makes a much better tradeoff. Accessibility and privacy are respected, and there are no annoying tasks.
[0]: https://friendlycaptcha.com
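The proof-of-work idea can be sketched in a few lines. This is a hypothetical illustration, not Friendly Captcha's actual protocol: the client grinds for a nonce whose hash of the server's challenge has enough leading zero bits, the server verifies with a single hash, and the server can raise the difficulty for IP ranges that submit often.

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    """Count leading zero bits in a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # count zero bits in the first nonzero byte, then stop
        for shift in range(7, -1, -1):
            if byte >> shift:
                break
            bits += 1
        return bits
    return bits

def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: grind nonces until the hash meets the difficulty.
    Expected work doubles with each extra bit of difficulty."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1

def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: one hash checks the whole proof."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty

# The challenge binds the work to a form and (hypothetically) an IP bucket,
# so the server can bump the difficulty for busy buckets.
challenge = b"form-id:42|ip-bucket:7"
nonce = solve(challenge, difficulty=12)   # ~2^12 hashes on average
assert verify(challenge, nonce, 12)
```

Because solving happens in the background while the user fills in the form, the only user-visible cost is battery/CPU, not attention.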
It's reaching a point where encapsulating a VPN with anti-captcha is something I'd pay for.
Yes but no. Anonymized identifiers can be deanonymized. They should utilize zero-knowledge proofs in such a way that they can prove "yes, I have an identity verified by entity X (and Y and Z) (based on passport/phone number/...)", without disclosing any of those details.
It could, optionally, yield an identifier unique to each requester and unlinkable to others unless an explicit proof of the link is provided. Though if this is included, there has to be some mechanism to avoid huge ad networks sharing the same "requester entity".
This is a solved problem. All that's left is politics, implementation and alignment.
I agree, there should be better ways to do anti-abuse. Yet I find myself coming up empty when I try to find better options for the common scenario where people would really rather invest deeply in their service than in anti-abuse.
I would love to hear some ideas about how to solve this nasty general problem while also respecting user time and privacy. Unfortunately, I've found that all too often the vague sense that there must be a better way fails to translate into a substantive better way.
The number of things that are "wrong" with reCaptcha etc. has been mentioned on here ad nauseam. In fact, I'll quote myself from another debate on the subject, a while back:
eBay recently started randomly showing reCaptchas to me when I'm already logged in and have been using the site for some time. When this happens, it descends into a never-ending cycle of more login screens and then more reCaptchas.
But thankfully eBay have taken note of the dozens of complaints about this on their user forums, dating back to 2018 and rushed their best people in to fix it.
[That last sentence was dripping with sarcasm, in case anyone unfamiliar with the company thought eBay ever took any notice whatsoever of their users' concerns]
I'm not a violent person at all. But if I ever meet the person who spawned reCaptcha and all its equally annoying clones, which are a pox on the internet, I won't be responsible for my own actions.
Contrast that with today's form of reCaptcha, where you identify stop signs/crosswalks/etc. for Google's benefit, but at the same time you're also improving... oh, wait, Google again. It almost seems like forced labor, in a sense.
>I'm not a violent person at all. But if I ever meet the person who spawned reCaptcha and all its equally annoying clones, which are a pox on the internet, I won't be responsible for my own actions.
Be careful, or they'll start sending you a dead horse's head and planting GPS trackers in your car: https://www.justice.gov/usao-ma/pr/two-former-ebay-executive...
I simply refuse to waste my time and drive up my blood pressure by doing unpaid training work for Google's AI, in order to visit some crappy website. I really wish more people would start boycotting any site which uses reCaptcha [or its derivatives], so we could get rid of this blight on the internet.
I've spotted this new hCaptcha junk show up recently on a couple of sites I used to frequent. I don't visit those sites any more. So well done webmasters. Apparently annoying the shit out of visitors to your site tends to drive them away. Who'da thunk it?!
eBay will CAPTCHA me after I enter my e-mail address, and then again after I enter my password too. Every time. And I'll be damned if I don't "fail" this CAPTCHA at least once a week, with it telling me to try again.
Come on, there are only so many mountains/hills, taxis, traffic lights, bicycles, and cross-walks I can look at before I go cross-eyed.
They even have the nerve to suggest that I can avoid this by using the latest version of my browser (Firefox), which I already am and always do.
Then it may surprise you to know that simply preventing automation makes many types of account takeover attacks infeasible in practice. It won't mitigate the attack if you are personally a high value, named target. But most account takeover attacks operate en masse and are coordinated after large security breaches, so having to hand over accounts to a human operator as part of the auth loop would make the campaign uneconomical. It also introduces another step at which an attack can be logged, recognized, fingerprinted and stopped by an incident response team.
This is something your security team would probably gladly tell you about if you asked them. There's also a bunch of talks about this presented at conferences like Blackhat, DEFCON, USENIX, etc.
Stated another way: not all potential rewards for successful account takeover are high. The modal account in the modal campaign is low value, which is made up for by volume and by the particular purpose for which the accounts are accessed. If you model these campaigns economically, you can eliminate entire classes of "low margin, high volume" attacks simply by introducing friction that mitigates automation.
Then there is a natural cost-benefit tradeoff as to how much friction is allowable on a per-user basis to prevent the most common types of account takeover attacks.
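The economic argument is easy to make concrete with a back-of-the-envelope model. All the numbers below are invented for illustration, not taken from any real campaign: a captcha on every login attempt forces the attacker to pay a human solving service per attempt, while only the tiny fraction of valid hits produce revenue.

```python
# Hypothetical bulk credential-stuffing campaign.
attempts       = 1_000_000   # stolen credential pairs tried
hit_rate       = 0.001       # fraction that still work on this site
value_per_acct = 5.00        # assumed resale value of a compromised account

hits = attempts * hit_rate   # expected compromised accounts: 1000

# Without a captcha, trying a credential pair costs essentially nothing.
profit_no_captcha = hits * value_per_acct

# With a captcha, every attempt (not just every hit) needs a human solve.
solve_cost = 0.02            # assumed per-solve price on a solving market
profit_with_captcha = hits * value_per_acct - attempts * solve_cost

print(profit_no_captcha)     # 5000.0
print(profit_with_captcha)   # -15000.0: the campaign is now uneconomical
```

The point is not the specific numbers but the shape: costs scale with attempts while revenue scales with hits, so even cheap per-attempt friction can flip the sign of the whole campaign.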
I run a problem-validation community platform. A couple of days back, an individual launched an automated spam/DDoS attack, creating different users to post abusive, demoralising text on every single thread.
Fortunately, I had systems in place to identify and mitigate it with Cloudflare, so in this case even genuine users would have received a captcha. I found out soon enough from the firewall who the attacker was: he had earlier created an account under his own name and was using the same IP to attack. After I blocked his IP he tried a couple of other IP addresses, including Tor, but stopped his activity after a couple of hours.
I generally don't like reCaptcha because it takes cultural background for granted (e.g. 'pie' is not a common food worldwide), has accessibility problems (I'm disabled myself), and offers no mitigation for captcha-solving farms.
But in nuisance cases like the one I detailed above, captcha is the easiest method available en masse.
Or wget to save a set of pages for later.
I understand protecting commenting or contact forms with a captcha. But a captcha on regular read-only access to public web pages, in the style of Cloudflare, is a bit ridiculous.
One thing contact forms should have is a static indication that a captcha is in use. I've filled out all too many forms that just sent my written text into the void, because I block some domains.
Sadly, there still doesn't seem to be much in the way of micropayment infrastructure.
Plus, it's not just benign read-only scrapers. Have you looked at the spam folder of your email recently? That's what every comment section and user bio and god knows what else would look like if you just blindly allow all automated traffic.
Spam filters used to be completely local (there were even DIY solutions), then evolved to signature updates, but eventually the attacks grew so advanced that only online services could be updated quickly and aggressively enough. That, of course, is how Gmail took over the internet with its near-perfect spam filter (when was the last time you checked a Gmail spam folder?).
The last generation of local spam filters were pretty good though. Anyone remember Eudora and Spamnix?
I just use bogofilter, and it worked almost perfectly from the start, simply because I had saved years upon years of SPAM and HAM: tens of thousands of messages each.
It has gotten slightly worse over the years, because out of laziness I only incrementally train it on new SPAM, not on new HAM.
People probably have HAM archives, but don't usually save their SPAM, which is what you'd need to start using a Bayesian spam filter right away with great results.
Personally I find it much better than whatever Google uses. I don't even bother with SMTP-level domain/IP blacklists or reverse IP/domain checks anymore. All mail is passed straight to the mailbox and then pre-filtered by bogofilter into a SPAM folder that I check once weekly, and I barely find any HAM there. I receive about 500k mails a year.
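For the curious, the core of a word-frequency Bayesian filter in the spirit of bogofilter fits in a few lines. This is a toy sketch (whole-word tokens, add-one smoothing, a log-odds score), not bogofilter's actual algorithm, but it shows why a large labelled SPAM/HAM archive gives good results immediately.

```python
import math
from collections import Counter

class BayesFilter:
    """Toy Bayesian spam filter: score > 0 means 'looks like spam'."""

    def __init__(self):
        self.spam = Counter()   # word counts seen in spam
        self.ham = Counter()    # word counts seen in legitimate mail
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text: str, is_spam: bool) -> None:
        words = text.lower().split()
        if is_spam:
            self.spam.update(words)
            self.n_spam += 1
        else:
            self.ham.update(words)
            self.n_ham += 1

    def spam_score(self, text: str) -> float:
        # start from the prior log-odds, then add per-word evidence
        score = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for w in set(text.lower().split()):
            p_spam = (self.spam[w] + 1) / (sum(self.spam.values()) + 1)
            p_ham = (self.ham[w] + 1) / (sum(self.ham.values()) + 1)
            score += math.log(p_spam / p_ham)
        return score

f = BayesFilter()
f.train("cheap pills buy now", True)
f.train("meeting agenda attached", False)
assert f.spam_score("buy cheap pills") > 0
assert f.spam_score("meeting agenda") < 0
```

With only two training messages the scores are crude; the commenter's "tens of thousands of messages each" is exactly what makes the word statistics sharp enough to work almost perfectly from the start.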
What if the majority of internet usage is non-interactive, from so-called "bots", what we may refer to as "automated use". Google and Facebook, among others, rely on the use of automation and "bots". The non-interactive clients ("bots") being used by these companies are not asked to solve captchas. (In turn, after collecting data from public sources, these websites attempt to prohibit the use of automation by their users wishing to access it. What is interesting is that neither company provides any definition of "automated", nor any clearly stated limits on the speed at which a user may access resources or the quantity of resources they may access in a stated time period. One might be apt to find such limits associated with an "API".)
In 2013 an Incapsula report suggested that the majority of internet usage is in fact automated and not "malicious"^1 -- what if public information sources on the internet catered to the use of automation rather than trying to limit such use, e.g., with speed bumps^2 like "captchas". What if servers treated all clients equally, instead of having data forcibly collected by a few large clients that receive preferential treatment, then siloed and protected from "automation". What effects would this have on "centralisation" and levelling the playing field.
"Do not ask for permission, ask for forgiveness." What does it really mean when applied to the internet. Perhaps it means there is an endemic lack of clarity about "the rules". Prohibiting "automation" is far too vague and in many cases it makes no sense. The growth of computers and the internet is the growth of automation. Both servers and clients may have concerns about resource utilisation. Websites do not ask for permission when they decide to use large amounts of the user's computer resources.
Consider that a Google could not exist without being "given permission" to use automation. Does the GoogleBot have to solve captchas. No automation means no company such as this could exist. How useful would the web be without anyone being able to use automation to create an index. Based on the HN comments about web search I have read over the years, I would guess that for many commenters, it means the usefulness of the web would be dramatically reduced.
Imagine an automation-friendly internet. The truth is, I think (the data shows) we already have one, except we are in denial that "the rules" actually allow it. An early metaphor for internet and web use was "surfing". It may be that those who are constantly fighting against automation are fighting against the waves instead of riding them. Time will tell. It stands to reason, IMO, that every internet user, whether a server or a client, should be expected to use automation.
1. https://www.incapsula.com/blog/bot-traffic-report-2013.html
2. An early metaphor for the internet was a "superhighway". Speed bumps would seem out of place on a superhighway.
If you show a captcha after a failed password, you need to show one after a correct password as well. Otherwise you leak information. There are other solutions, e.g. in a login flow that splits the username and password entry, it's advantageous to put the captcha between those two steps. But even in those solutions, the display of the captcha must be independent of password correctness.
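A sketch of that rule in code, with all names and data hypothetical: the captcha gate runs unconditionally before the credentials are checked, so whether a captcha appears (and what error comes back) reveals nothing about whether the account or password was valid.

```python
# Toy stand-ins for a real captcha verifier and credential store.
VALID = {"alice": "hunter2"}

def captcha_ok(resp: str) -> bool:
    # placeholder for a call to a captcha provider's verification API
    return resp == "solved"

def credentials_valid(user: str, pw: str) -> bool:
    return VALID.get(user) == pw

def login(username: str, password: str, captcha_response: str) -> str:
    """The captcha is checked first, on every attempt, so its presence
    is independent of credential correctness."""
    if not captcha_ok(captcha_response):
        return "captcha_failed"
    if credentials_valid(username, password):
        return "success"
    # same generic message for an unknown username and a wrong password
    return "invalid_credentials"

print(login("alice", "hunter2", "solved"))   # success
print(login("alice", "wrong", "solved"))     # invalid_credentials
```

The flow the original comment complained about inverts this: showing the captcha only after correct credentials turns the captcha itself into an oracle confirming a valid account.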
Presumably, if the person has entered the right username and password they're going to get access to the service at which point they'll know they entered the right one. What information exactly is leaked here?
> An incorrect login says so without presenting a captcha.
I inspected the source code of Google's reCaptcha offering and was disgusted at how many bits of information it collects. They also seem to fingerprint users so they can't keep registering new accounts on a platform, locking out anonymous users, who IMHO are usually the best, or at least the most interesting, voices on a platform.
Google's reCaptcha code seemed very keen on knowing my 'cadence': the way I used my mouse, and how quickly (or slowly) I completed the captcha. It also looked at things like timezone, screen resolution, battery charge level, etc., so they could determine whether it was 'you' using the captcha soon after, in a separate session (even on a different device!).
I'd bet a good amount that they store that along with all the other personally identifying info they have on you (and Google of course has a massive amount of that), which is basically why after a single reCAPTCHA solve you won't see them prompt you again for ages: they know who you are.
I almost want to just add a "DeathByCaptcha" extension to handle these for me and pay a few cents for every page I visit, lol
If only. If the same site has reCaptcha across more than one page, within mere minutes of having to slog through multiple screens of one, I can guarantee I'll be doing it again.
And I'm never sure whether Google has served me a very long sequence of reCaptchas, or has decided I'm not a person and is serving me an infinite reCaptcha.
https://www.hcaptcha.com/accessibility
hCaptcha is not easy as is being claimed here. I have lost a lot of time and been blocked from much content due to hCaptcha.
It won’t solve the privacy issues but at least you’re not working on google’s training set anymore and captchas are automatically solved for you.
I forgot my password to one site and tried about 2 or 3 different passwords and in-between each it asked me to do about 7 or 8 of those labelling exercises. I finally just gave up and left the site.
Not only that, but the labelling exercises weren't clear. It wanted me to label a "公交車", which means more like a public city bus, and there were also school buses, which would normally not be called that in Chinese, so I didn't label them. But Google thought they were part of that class and wouldn't let me proceed without labelling them, and furthermore punished me with more "hard" exercises like that. I guess they are trying to turn me into a stupid bot.
Good luck to whatever self driving car they are training using this data.
For sure enable javascript to get the fancy stuff, but mostly we just want to read the text, view the picture and see the video.
I'm curious what kind of data may exist on the experience of switching for larger providers; do the users like it? how much more/less time do they spend solving? do they care, let alone even notice that it's not Google's ReCAPTCHA?
Regardless, as ReCAPTCHA is not only terribly annoying but also built for surveillance from the ground up, I still view this as a good improvement.
> Worth noting that this title is primarily due to Cloudflare having switched to them from ReCAPTCHA, and Cloudflare is... well, relatively popular, to say the least.
That's definitely a part of it, but we also have a number of other large sites and services that use hCaptcha to protect against bots, and more that get added every day because of our more advanced bot detection special sauce.
> I'm curious what kind of data may exist on the experience of switching for larger providers; do the users like it? how much more/less time do they spend solving? do they care, let alone even notice that it's not Google's ReCAPTCHA?
From what we've seen, the integration process is generally smooth, especially if you're a previous reCAPTCHA user, since we keep the interface and workflow largely the same.
Solving is roughly the same although we have a number of other protections that irritate bot maintainers and get activated when we detect them.
Not sure if the majority of people are aware of the change, I'm sure some technically savvy people pick up on it more than not.
> Regardless, as ReCAPTCHA is not only terribly annoying but also built for surveillance from the ground up, I still view this as a good improvement.
That's actually one of the top reasons we've had a lot of customers come over to us; we put a heavy emphasis on user privacy / security, including adopting/supporting privacy-preserving protocols (Privacy Pass, Tor), and minimal retention of data (see our data privacy policy on our site).
Please do better. You're blocking off a non-trivial amount of the Internet to blind users. You will eventually be sued for this.
However, I checked secondary markets where you can pay a human to solve a captcha.
It takes a professional captcha solver 70 seconds to solve an hCaptcha but only 15-20 seconds to solve a reCaptcha. Is that typical? That seems horrible.
The market rate for a captcha solution is 1-3 cents, which is clearly worth it, until you think of the ethics of paying someone slave wages so you can browse the internet slowly, but at least without breaking concentration.
Have you considered a more ethical approach, like micropayments that go to charity or something?
This is completely anecdotal (and seems antithetical to the typical HN response to hCaptcha vs ReCAPTCHA), but I feel like I end up spending at least twice as much time trying to solve hCaptchas successfully because they have a lot less consistency in the objects you're searching for. I always have to zoom in to the modal and carefully search through each image, which invariably breaks whatever flow I'm in (moreso than other captchas).
For example, here's a screenshot from the hCaptcha website's "try it out" section [1] -- I barely recognized either boat in image #1 because it was so small. I missed image #3 because I didn't realize it was a huge cruise-esque boat (so big you can't even see any water) and I spent a good amount of time deliberating on #4 because, well, it looks like a car + windshield but... on the water? If it's a boat, I can't really tell, but I marked it as one solely because of the water in the background. Not sure if it was right or not.
It also seems to occasionally provide "find all the X" challenges without there actually being any X, which feels super cognitively weird ("am I just not seeing it?!").
I'd say ReCAPTCHA's main problem is deciding whether mostly-consistent objects being partially in-frame is enough to "count", whereas hCaptcha's main problem is actually recognizing the widely-varying objects in the frame. I think the former is a little more frustrating when you get something wrong, but the latter is mentally "harder" and takes more time on average, for me at least.
[1] https://i.imgur.com/uyqvs5u.png from https://www.hcaptcha.com/
If there were an on-premises captcha implementation that actually worked, that would be great.
https://community.cloudflare.com/t/stop-using-hcaptcha/15896...
For a change that affects "15% of the internet" this seems like very little negative feedback in a period of 8 months.
> Most people do the convenience from Google CAPTCHA, although they sell some kind of info, but they won’t hurt you
I can't even...this is the Cloudflare forum wow.
I've personally had a few hiccups with hCaptcha quite some time back, as I "wasn't sure what I was looking for", and I consistently fail on VPNs. But in recent months there's definitely been substantial improvement, and needless to say I hope to see hCaptcha become the majority provider.
Although having said that, maybe I am hitting it, and the fact that I've been unaware and uninterrogated is high praise! Hm.
Absolutely. Having to solve only one captcha every few days beats solving 5 or 6 on each page visit. hCaptcha supports Privacy Pass but reCaptcha doesn't.
[1]: https://blog.repl.it/anon
[2]: https://news.ycombinator.com/item?id=25004476
Interesting, I didn't realize this was a thing hCaptcha did[0]. It's basically reCaptcha in terms of tracking which sites you visit, then, no?
0: https://docs.hcaptcha.com/invisible