Is Google reCAPTCHA GDPR Compliant?

CAPTCHAs are overused because of groupthink and fashions/fads. Before you use a CAPTCHA of any kind, consider very carefully if you really need one.

I've seen this a number of times in design meetings: someone will say "oh, an account registration form, we will of course need a CAPTCHA there", everyone will nod their heads and move on. In reality, in most of those cases, no one will ever conceivably even try to automate/script the thing being designed.

janpieterz · 3 years ago

I thought the same. We had a pleasant signup form for a small SaaS platform nobody really knows about, with no captcha. Then someone, or some group, found it, and there has been a barrage of attacks varying in intensity, vectors, etc. It has cost us so much in vendor fees that the small company is now in danger of going bankrupt.

I appreciate the sentiment, as I had it, but rest assured any future publicly accessible form I build will get at least a CAPTCHA in front of it.

newaccount74 · 3 years ago

I have a bunch of publicly accessible forms and none of them have captchas.

I did once run into an issue where a signup form was abused by a spammer, but that was a simple fix (tip: in verification emails, do not include any information that the user typed in the form).
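A minimal Python sketch of that tip, assuming a hypothetical `build_verification_email` helper, an in-memory `pending` token store, and a placeholder `https://example.com/verify` URL. The message is built only from server-generated text and a random token, so whatever a spammer types into a name or comment field never reaches the recipient's inbox.

```python
import secrets

# Hypothetical in-memory store: verification token -> email address awaiting confirmation.
pending: dict[str, str] = {}


def build_verification_email(form: dict[str, str],
                             base_url: str = "https://example.com/verify") -> tuple[str, str]:
    """Return (subject, body) for the verification email sent to form["email"].

    Only the address is read from the form, and it is used as the recipient,
    never echoed into the text. Fields like "name" are deliberately ignored,
    so the form cannot relay a spammer's message to arbitrary inboxes.
    """
    token = secrets.token_urlsafe(32)  # unguessable, URL-safe token
    pending[token] = form["email"]
    subject = "Confirm your email address"
    body = (
        "Someone signed up for an account with this address.\n"
        f"To confirm, open: {base_url}?token={token}\n"
        "If this wasn't you, you can ignore this message.\n"
    )
    return subject, body
```

In a real deployment the token store would be a database table with an expiry time rather than a module-level dict.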

If you are careful with your forms, you don't need captchas. Captchas add a lot of friction for some users, so if they can be avoided, they should be.

lapser · 3 years ago

A long time ago, I was still in college (UK college, i.e., pre-university), and still learning.

I discovered a classmate was involved in some event, and found the event's website. They didn't have a captcha. By your logic, this was the right choice.

In reality, my dumb ass decided it would be fun to script something that would register millions of users (another classmate ran the script with me). After a few hundred thousand registrations, the website was brought to its knees. I was a bit shaken, but didn't think much of it.

The next morning I came into class and was reprimanded by my teacher. It turned out the owner of said event had threatened to sue the school and me, among other things. Their servers were down: their email server had been brought to its knees, their web servers had died, and generally I had caused a lot of damage without even thinking about it. It potentially cost them some money. None of this was my intention, of course, but I didn't know any better.

Point is, kids will kid, and spammers will spam. There are plenty of bots that just scrape the internet and fill out forms indiscriminately.

Captcha may or may not be the best option here (I'm always of the opinion it's not, especially not reCAPTCHA), but something has to be put in place, even if to stop the majority of bad actors.

asddubs · 3 years ago

You can also just limit the number of signups from one IP each day. There are simpler heuristics like that to prevent unsophisticated abuse.
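A minimal sketch of a per-IP daily signup cap, assuming a hypothetical `DailySignupLimiter` class and an arbitrary default of five signups per IP per calendar day. It keeps counts in process memory, so they are lost on restart and not shared between servers; a real deployment would use a shared store and prune old days.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class DailySignupLimiter:
    """Allow at most `limit` signups per IP address per calendar day (in-memory sketch)."""

    def __init__(self, limit: int = 5) -> None:
        self.limit = limit
        # (ip, day) -> number of signups accepted so far; old days are never pruned here.
        self.counts: dict[tuple[str, date], int] = defaultdict(int)

    def allow(self, ip: str, today: Optional[date] = None) -> bool:
        key = (ip, today or date.today())
        if self.counts[key] >= self.limit:
            return False  # over the cap: reject this signup
        self.counts[key] += 1
        return True
```

Note that many people can share one public IP (offices, campuses, carrier-grade NAT), so a cap set too low will block legitimate users.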

chollida1 · 3 years ago

Life as a developer has taught me to take the other side of your argument. I'd disagree on this.

Once you release something to the wild you need to have robust controls in place to prevent one person or group of people from using all your resources.

I wouldn't release a product that doesn't have rate limiting of some kind, and a captcha is one way to rate limit.

Always trust people to push the boundaries of your app as far as they possibly can. I have yet to build a system where someone doesn't. And that includes tools I've built for in-house users :(

Whether intentionally or not, they always find a way to push the boundaries:)

danuker · 3 years ago

> no one will ever conceivably even try to automate/script the thing being designed.

Spammers will spam everywhere they can. My minuscule personal site suffers from it very rarely, but I can imagine it being worth their while for anyone getting a lot of page views.

yonixw · 3 years ago

On my custom-built site I have none of those. But on my WordPress site, I had to install a captcha on the second day. Spammers are just using scripts, which cost next to nothing...

heipei · 3 years ago

I don't know. I run a SaaS that allows free user signup and significantly more than 50% of my daily signups are just signup "spam", without any visible motivation for doing so. The user name or information doesn't show up anywhere publicly and there is no inherent value in having a free user account. I've implemented some basic countermeasures (dummy form fields which reject the submission), which weren't enough. I've added reCaptcha, and I'm still getting 50% spam signups from working (!) gmail addresses, meaning someone is able to receive emails on them. The majority of these are from places like India, Bangladesh, Vietnam, etc.

I don't even want to know what my site would look like without my own countermeasures and reCaptcha, or if it were a service where a user account has any kind of "value"...

daveoc64 · 3 years ago

Is there a particular problem if someone signs up for an account on your system and doesn't use it?

Is such an account using a lot of resources?

vincnetas · 3 years ago

A CAPTCHA on our registration page eliminated quite a lot of automated registrations. What other options are there to prevent or reduce automated registrations? (One off the top of my head: email/phone verification.)

realusername · 3 years ago

Hidden fields will remove most of the non-targeted attacks.
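A minimal server-side sketch of such a hidden ("honeypot") field check, assuming the form contains a decoy input with the hypothetical name `website` that is hidden from humans, for example with CSS. Real users leave it empty, while naive bots that fill in every input populate it.

```python
# Hypothetical name of the decoy input; pick something a form-filling bot is likely to fill.
HONEYPOT_FIELD = "website"


def is_probably_bot(form: dict[str, str]) -> bool:
    """Return True if the hidden decoy field contains anything.

    A missing field is treated the same as an empty one.
    """
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

As the reply says, this only filters out untargeted bots; someone scripting against your specific form can simply leave the field blank.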

And if they really are targeted, I don't think CAPTCHA will help much.

giancarlostoro · 3 years ago

We use Auth0, which determines when to show a captcha. I think "Smarter Captcha" should be the industry standard: if you don't suspect the end user of being a bad actor, why show them a captcha every time? In fact, Google's captcha is awful for almost always showing it, which tells you they don't care about stopping bots, only about the data they get from user inputs.

Edit: And come to think of it, A TON of websites do "smarter captcha" or whatever you want to call it, because on one of my computers I enabled the resist-fingerprinting setting in Firefox, and now I get a captcha on every visit to some sites that NEVER show one otherwise (I think it might be Cloudflare-driven, but I'm unsure). Walmart comes to mind: it shows me a pill-shaped button where I have to hold the mouse click until it fills.

a_c · 3 years ago

It took me one incident to turn from "no one will ever conceivably even try to..." to "everyone will nod their heads and move on"

staringback · 3 years ago

Years ago I had a blood test taken at a local pathology place. The form the staff were submitting had a CAPTCHA, and the pictures they were given weren't easy by any means. I'm talking the kind of stuff you get trying to go to google.com in Tor Browser.

As far as I could tell this was an internal form that wasn't publicly accessible.

mschuster91 · 3 years ago

> In reality, in most of those cases, no one will ever conceivably even try to automate/script the thing being designed.

There are more than enough people running automated crawlers, probably fed from Google "inurl: contact-form" searches or whatever, and just blanket spam you.

efields · 3 years ago

We ignored them until we needed them. Then we needed them.

V__ · 3 years ago

This is in line with my experience as well. For most sites, CAPTCHAs are overkill and an accessibility problem. Hidden honeypot, maybe a simple “How much is 5 + 2” keeps 99% of spam out. I had a few more difficult cases, which were solved by blocking some geographic IP regions and adding blacklists for certain words, like “crypto” for example.
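A minimal sketch of the "How much is 5 + 2" question and the word blocklist described above, assuming hypothetical helper names and that the expected answer is kept server-side (for example in the session), never in the page. The blocklist matches whole words case-insensitively, so "crypto" is caught but "cryptography" is not; substring matching would trade more false positives for fewer misses.

```python
import random
import re

# Example blocklist; "crypto" is the word mentioned above, the rest is up to each site.
BLOCKED_WORDS = {"crypto"}


def make_question(rng: random.Random) -> tuple[str, int]:
    """Return a question like 'How much is 5 + 2?' and its expected answer."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"How much is {a} + {b}?", a + b


def check_answer(expected: int, submitted: str) -> bool:
    """Accept the submission only if it parses to the expected integer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False


def contains_blocked_word(text: str) -> bool:
    """Whole-word, case-insensitive check against BLOCKED_WORDS."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return not words.isdisjoint(BLOCKED_WORDS)
```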

frodowtf · 3 years ago

I'm not an expert on honeypot inputs but wouldn't it be super easy to check for type=hidden or opacity=0 if you'd like to spam?

revicon · 3 years ago

Remember that Google reCAPTCHA v3 is invisible to the user. No accessibility issues.
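"Invisible" still means the server has to check a token and decide what to do with it. A minimal sketch, assuming Python's standard library, a hypothetical `accept` policy, and a starting threshold of 0.5 (Google's documentation suggests starting there and tuning per site). The token is POSTed with the site's `secret` to the `siteverify` endpoint, whose JSON reply includes `success`, `score` (0.0 likely bot, 1.0 likely human) and `action`. Note that v3 still loads Google's script in the visitor's browser, so it does not sidestep the IP-address question discussed in this thread.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def fetch_verdict(secret: str, token: str) -> dict:
    """POST the client-side token to Google's siteverify endpoint and parse the JSON reply."""
    data = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    with urllib.request.urlopen(VERIFY_URL, data=data, timeout=5) as resp:
        return json.load(resp)


def accept(verdict: dict, expected_action: str, threshold: float = 0.5) -> bool:
    """Hypothetical policy: the token must be valid, for this action, and score high enough."""
    return (
        verdict.get("success") is True
        and verdict.get("action") == expected_action
        and verdict.get("score", 0.0) >= threshold
    )
```

v3 never blocks anyone by itself; a low score just tells you to fall back to something else, such as email verification or a visible challenge.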

Pxtl · 3 years ago

If you're writing your own account registration form instead of using something off-the-shelf that provides a captcha service for you, or, even better, just using OAuth or a similar technology so users don't have to manage yet another password, I already hate you.

smeagull · 3 years ago

Spam is ever present, and Captchas protect from the massive torrent of trash.

Zardoz84 · 3 years ago

I had to put a CAPTCHA system on a public registration form for digital libraries, because they were getting spammed by bots.

The mistake here in Europe is that an IP address is considered personal information:

https://www.ra-plutte.de/lg-muenchen-dynamische-einbindung-g...

This makes it impossible to use any components hosted by third parties without getting consent by the users. And for components hosted in a different country than the visitor, even consent might not make using those external components legal.

This is bad and not in line with reality. The IP can only be turned into personal information via cooperation of the users internet provider.

So in Europe, the whole internet is made illegal based on a wrong assumption.

mk89 · 3 years ago

As a European, I don't consider this a mistake, for the simple reason that the IP address is so easily abused by trackers and people with bad intentions - the extent of abuse that we have experienced until now is absolutely ridiculous.

Hell, even a small startup with a few thousand euros can start to track and trace user behaviour on a massive scale that in reality you wouldn't or shouldn't be able to do.

The tooling (free, cheap and not) at our disposal nowadays makes everything so easy that even something that in theory should not serve as identifier can be used to identify you - so let's start with the most common ones: IP, email, etc.

The Internet in Europe is not illegal - it's just pure BS that a simple page like reuters.com references 14 external scripts when loaded, when actually all you need is two, maybe three (the CDN for images and videos plus the page itself). The rest is crap used explicitly to identify and market to people - that's it: ads, ads, and probably uglier things done just to profile people online.

jansan · 3 years ago

As another European I consider this a grave mistake and I am not surprised that we do not see many successful startups in Europe. It is part of the narrow-minded micro managing mindset that too many European politicians (especially the greens) have.

Instead of finding an innovative solution (how about mandating ISPs to make IP addresses unmappable to a user?) they only know one solution: Making things illegal, even if in almost all cases the use is benign or even makes a lot of sense.

smeagull · 3 years ago

IP isn't personal, isn't unique, and isn't identifying.

dutchbrit · 3 years ago

> This is bad, because the IP can only be turned into personal information via cooperation of the users internet provider.

Not 100% true: you can often trace users back by IP using leaked databases and through companies that sell user data. Might not be legal, but you definitely don't need cooperation from an ISP.

ericpauley · 3 years ago

In that case the database is the PII, not the IP.

themitigating · 3 years ago

If the IP is dynamic then how would you know who had it at the time?

zpeti · 3 years ago

You might be able to trace an IP back to a user, but you're absolutely not guaranteed that that IP was only used by that particular user.

Therefore, even on a technical level, this EU legal interpretation is insane: hundreds or thousands of people can potentially use the same IP address, so how is that personal information?

mcpackieh · 3 years ago

> This makes it impossible to use any components hosted by third parties without getting consent by the users.

Sounds good to me, that's the way it should be. I shouldn't have to use third party extensions to stop my browser from automatically loading facebook crap every time I visit websites that aren't facebook. Companies should only include 3rd party components in their websites if there is a very good reason for it, and only then after the user has explicitly consented to it.

prepend · 3 years ago

> The IP can only be turned into personal information via cooperation of the users internet provider.

It’s not a direct identifier but with geo-ip or other data, it can identify an individual (eg, have 100 possibilities and geoip narrows it down to only 1 in that region based on IP).

The PII aspect isn’t based on getting a link from the provider. The PII aspect is based on the IP itself standing out in data and allowing reidentification. It’s not 100% accurate, but accurate enough to make money off advertising.

tremon · 3 years ago

> It’s not a direct identifier

Except when it is. I have a semi-permanent home IP (it only changes when the MAC address on my router changes and I get assigned a new lease) and only one user in my home. My IP address pretty uniquely identifies me.

smeagull · 3 years ago

They're not stable over time either. And they can be misleading if you try to use them to geolocate a user. The ARIN record for my IPs makes me appear 500 km away.

RobotToaster · 3 years ago

>This makes it impossible to use any components hosted by third parties without getting consent by the users.

Good.

There are very few legitimate uses for third-party-hosted proprietary components.

Why it became standard to load simple things like scripts or fonts from third parties, that can be trivially hosted locally, is beyond me.

jansan · 3 years ago

So hosting an ad is now illegitimate? How about embedding a video? And why should you not embed stuff from a CDN?

jacquesm · 3 years ago

> 'So in Europe, the whole internet is made illegal based on a wrong assumption.'

Is based on a wrong assumption.

Semaphor · 3 years ago

It should be mentioned that we don’t have case law the same way the US does. So one court deciding this does not mean that this is now law.

jdsnape · 3 years ago

That is not entirely true - in many cases you can associate it with a user based on their activity too. For example, they might log in, which would link the IP to an identity.

janosett · 3 years ago

Yes, but the IP itself is not personal without the connection to other information. I do think considering IP address personal is a bit of a reach, especially given the common case of ephemeral addresses.

jdietrich · 3 years ago

>This makes it impossible to use any components hosted by third parties without getting consent by the users.

There are six lawful grounds for processing personal data under GDPR; only one of those grounds is consent. Consent is not always necessary, nor is it always sufficient.

An IP address is potentially personal data, because it could relate to a natural living person. There are all sorts of legitimate reasons to use that data without consent, the most obvious being to fulfil a request by the user. You will run into issues if you're using that data in ways that aren't strictly necessary - keeping logs indefinitely, using that data for marketing purposes, sharing that data with third parties without good reason and without adequate safeguards etc.

https://gdpr-info.eu/art-6-gdpr/
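One common data-minimization step for the logging point above is to truncate addresses before storing them. A minimal sketch using Python's standard `ipaddress` module, assuming a hypothetical `truncate_ip` helper that keeps the first 24 bits of IPv4 and the first 48 bits of IPv6 addresses. Those prefix lengths are common conventions, not legal thresholds, and truncation reduces identifiability rather than guaranteeing the result is anonymous under GDPR.

```python
import ipaddress


def truncate_ip(addr: str) -> str:
    """Zero the host part of an address before it is written to logs.

    IPv4 keeps a /24 (last octet zeroed); IPv6 keeps a /48.
    """
    ip = ipaddress.ip_address(addr)
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)
```

Anything that genuinely needs the full address, such as per-IP rate limiting, can use it briefly in memory and store only the truncated form.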

olivierduval · 3 years ago

Yeah... it could be nice if people stop spreading FUD on GDPR ;-)

Mostly, all GDPR asks is this: you only gather the minimal PII needed to provide a service (if needed at all). If you use PII for a purpose other than providing the service or meeting operational needs (like fraud detection or monitoring your infra), then you must obtain the user's consent (for marketing or selling your users' data, for example). This extends to your providers too (like Google Analytics...)

The problem is that a lot of "internet services" take for granted that they can do whatever they want with the data they got for a specific purpose... even without informing the user! And that's not good... so GDPR has been created.

But if your service is "fair" to the user (meaning: you only use the data to provide the service), then there's no problem...

TekMol · 3 years ago

Unfortunately, a judge can always say that using an external resource was not necessary to fulfil a user request.

As they did in the judgement I linked to.

A judge can always claim you could have used a local version of whatever external resource you used.

yonixw · 3 years ago

I was wondering about it too, but I guess that for some customers in rural places (everywhere big, like the USA and EU) an IP address is as good as a home address. Combine that with providers that won't change your IP until you manually request it (not even on a router restart) and you have real PII on your hands.

kalleboo · 3 years ago

I used to live somewhere where the reverse-DNS for my IP was literally my home address (student housing network)

that_guy_iain · 3 years ago

> This makes it impossible to use any components hosted by third parties without getting consent by the users. And for components hosted in a different country than the visitor, even consent might not make using those external components legal.

This is based on some very faulty knowledge of GDPR and the law.

You are allowed to process data without the consent of the user for various things. This would include their IP address. You're allowed to have third party data processors process data on your behalf without user consent for various things.

The Google Fonts ruling was partially due to who Google is. Google data mines; they're famous for it. So giving Google data they can use to map to your internet persona, which may even be linked to your name directly, is obviously something many people want to happen only with their consent. The fact that Google Fonts could be self-hosted was another part of the reason for the ruling. That is, sharing the information wasn't required to perform the action they wanted to perform: using a font.

Data processing done by US companies is not currently GDPR compliant. However, no one is enforcing that. It would be a complete mess and there are far too many. In reality, everyone is ignoring it waiting for the new laws to be created to make it legal. The reason the US companies are an issue is a US court can issue a judgement to a US company and they are forced to comply no matter where the data is.

> This is bad and not in line with reality. The IP can only be turned into personal information via cooperation of the users internet provider.

This is also not true. If you visit a website that sells B2B accounting software and your IP is identifiable to a company, the site owner could phone up the company and ask to talk to the person who is responsible for finance. If there is only one person, boom, easily identified. There are also various other ways.

> So in Europe, the whole internet is made illegal based on a wrong assumption.

Really, your comment is wrong based on multiple wrong assumptions.

red_trumpet · 3 years ago

> So in Europe, the whole internet is made illegal based on a wrong assumption.

No, just websites which include components hosted by third parties. This is not the whole internet (e.g. HN doesn't include third party components).

TekMol · 3 years ago

Use the search function at the bottom of HN. It is provided by a different company.

HN is also hosted by a different company. They get your IP too.

sfg · 3 years ago

If there was a proxy service that acted as an ip mask, and there was a list of the ip addresses of such masking proxies, then could EU customers using such services solve the issue?

cccbbbaaa · 3 years ago

Yes, this is what the CNIL suggests for people who want to use Google Analytics.

jansan · 3 years ago

Yes, but making stuff illegal is much easier than being inventive.

y7 · 3 years ago

Indeed, the GDPR defines "personal data" as follows (Article 4(1)):

> ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

This is not out of step with reality, nor is it a wrong assumption; it is simply a definition. The reasoning behind it is explained somewhat in the recitals of the GDPR:

> (26) The principles of data protection should apply to any information concerning an identified or identifiable natural person. Personal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person. To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments. The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.

> (30) Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.

martin_a · 3 years ago

> This makes it impossible to use any components hosted by third parties without getting consent by the users.

That is simply not true, please do not spread misinformation like this.

Using/embedding third-party resources is allowed IF it is, e.g., technically necessary to provide the service or core functionality at all.

Collecting personal information and using a third-party service to do so in a shop checkout? That's okay.

Collecting personal information and shoving everything into Google Analytics because you want to know how many people visited your site? Not so okay, there are less intrusive ways to do that.