Breaking the 4Chan CAPTCHA

cherryteastain · a year ago

The part about the bad Keras<->TensorFlow.js interop is classic TensorFlow. Using TF always felt like using a bunch of vaguely related tools put under the same umbrella rather than an integrated, streamlined product.

Actually, I'll extend that: every open-source Google library/tool feels like that.

alecco · a year ago

related (15 days ago)

Francois Chollet is leaving Google: https://news.ycombinator.com/item?id=42130881

> "Why did you decide to merge Keras into TensorFlow in 2019": I didn't! The decision was made in 2018 by the TF leads -- I was a L5 IC at the time and that was an L8 decision.

Retr0id · a year ago

something something Conway's law

Dachande663 · a year ago

Semi-related, but I needed a CAPTCHA on my site[0], mainly to block comment-form spam, and settled on repurposing a fun method I'd seen before. It's definitely not foolproof (or hard at all), but I really liked making it.

[0] https://www.hybridlogic.co.uk/contact

vunderba · a year ago

Reminds me of the Doom captcha.

https://vivirenremoto.github.io/doomcaptcha/

Dachande663 · a year ago

99% certain this is where I copied the idea from.

winrid · a year ago

It says I've been blocked when I try to view that. Not on a VPN.

Dachande663 · a year ago

The site runs off a tiny little server at home, so I've got some very aggressive firewall rules. Anything from the usual bad countries, certain signatures, etc. is blocked. That reduced traffic to 1% of the previous load.

EasyMark · a year ago

Are you in a safari browser?

chamomeal · a year ago

No way, that is a cool fucking captcha!!

tayiorrobinson · a year ago

Cool, sure; good, probably not. I've never played Halo, so I didn't entirely know what I was doing (do I shoot the blue guys too? It's not letting me through, so I guess I do), and I don't doubt some people wouldn't even understand what it meant by "shoot". And God forbid anyone with a disability that affects their mouse accuracy, or who needs a screen reader, tries to use it.

Haven't looked at the dev console, but it'd probably be easily bypassed by someone dedicated.

account42 · a year ago

Cool as a one-off use on some random blog contact form. Infuriatingly annoying if used somewhere you have to solve it with any frequency.

bawolff · a year ago

There is a reason why people moved away from distorted-text captchas. We are basically at the point where computers are better at them than humans.

https://www.usenix.org/system/files/conference/woot14/woot14... is a paper on the subject I think is really interesting.

However, a surprising number of text-based captchas can be solved with a few-line shell script: use ImageMagick to convert to greyscale, dilate and undilate (erode), then pass the result to Tesseract.

However, there are also sites like https://2captcha.net, so really captchas are more like putting a small min amount of effort.
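The greyscale, dilate, "undilate" recipe above amounts to thresholding followed by a morphological closing. A minimal pure-Python sketch of just those preprocessing steps follows, assuming a tiny RGB image held in nested lists; the actual ImageMagick commands and the Tesseract OCR step are left out, and the threshold of 128 is an arbitrary choice.

```python
# Toy version of the clean-up steps: greyscale -> threshold -> dilate -> erode.
# Images are small nested lists of (r, g, b) tuples; no ImageMagick or Tesseract.

Pixel = tuple[int, int, int]
Grid = list[list[int]]


def to_binary(img: list[list[Pixel]], threshold: int = 128) -> Grid:
    """Greyscale via BT.601 luma weights, then mark dark pixels as ink (1)."""
    return [[1 if 0.299 * r + 0.587 * g + 0.114 * b < threshold else 0
             for (r, g, b) in row]
            for row in img]


def _morph(grid: Grid, op) -> Grid:
    """Apply a 3x3 max (dilate) or min (erode); out-of-bounds counts as background (0)."""
    h, w = len(grid), len(grid[0])

    def at(y: int, x: int) -> int:
        return grid[y][x] if 0 <= y < h and 0 <= x < w else 0

    return [[op(at(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)]
            for y in range(h)]


def dilate(grid: Grid) -> Grid:
    return _morph(grid, max)


def erode(grid: Grid) -> Grid:
    return _morph(grid, min)


def close_gaps(grid: Grid) -> Grid:
    """Dilate then erode (morphological closing): bridges one-pixel breaks in strokes."""
    return erode(dilate(grid))


# A horizontal stroke with a one-pixel gap at column 3.
stroke = [[0] * 7, [0] * 7, [0, 1, 1, 0, 1, 1, 0], [0] * 7, [0] * 7]
print(close_gaps(stroke)[2])  # [0, 1, 1, 1, 1, 1, 0]
```

Closing rejoins broken glyph strokes without fattening them much, which is why it tends to help OCR on noisy captcha text.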

noprocrasted · a year ago

Just because you can technically crack them doesn't mean they're useless.

There's a significant amount of time, skill and effort that went into the solution from this post, and the end result doesn't generalize well (you'd have to start all over for a different kind of captcha).

The vast majority of spammers would not be able to replicate this; those who do would either make money legitimately, or focus their skills on juicier targets (if you have AI/ML skills and want to do nefarious things there are other options that pay much better than spamming).

Such captchas still work well at raising the cost of successful spamming above the expected payoff from said spam.

reaperman · a year ago

So, I do this type of AI development for solving CAPTCHAs.

I can't get any real jobs that pay me for my more advanced skills. My primary sins were going to a second/third-tier university and some performance concerns in a portion of my previous roles due to divorce and burn-out. I make $80k/year in government IT, and $30-150k/year as the "AI" guy in a small 2-5 person group that offers a CAPTCHA-breaking API.

The spammers aren't the ones replicating this. They just pay B2B rates (combo of SaaS + Consulting, depending on client needs) to help them remove the roadblocks.

fragmede · a year ago

> there are other options that pay much better than spamming

Are there? Say you've got a felony record and can't get a legit AI/ML job at e.g. OpenAI or anywhere else. What would you do instead? Most of the options I can think of involve getting paid for doing things that are basically spam if you zoom out enough.

hamilyon2 · a year ago

Captchas are now useful to distinguish well-intentioned bots (they stop whenever they see captcha) from malicious ones, which solve them, but still behave a lot like bots.

Well-intentioned bots are first-class citizens.

TZubiri · a year ago

Interesting, subtle difference, but I always thought of captchas as having computational difficulty; as you say, that's clearly not the point. The cost is not compute but developer time.

If you manage to crack it at 1 MHz per captcha, or 1 GHz, or 1000 GHz, it makes no difference, as the bottleneck is the network identifier (IP address/block).

While still a type of PoW, these economics are different from offline mechanisms like password hashing or crypto, where a 1 GHz cost is still significantly different from 1 MHz.
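The rate-limit argument can be made concrete with a toy model. It assumes each IP may post once per cooldown period and that the next captcha can be solved while that cooldown runs; the function name and numbers are illustrative only.

```python
def posts_per_hour(n_ips: int, cooldown_s: float, solve_s: float) -> float:
    """Upper bound on posts/hour for a bot farm under a per-IP cooldown.

    Assumes the next captcha is solved while the cooldown runs, so each IP
    is gated by whichever is slower: the cooldown or the captcha solve.
    """
    per_post_s = max(cooldown_s, solve_s)
    return n_ips * 3600 / per_post_s


# A 1 ms solver and a 1 s solver are equally fast behind a 60 s cooldown.
print(posts_per_hour(10, cooldown_s=60, solve_s=0.001))  # 600.0
print(posts_per_hour(10, cooldown_s=60, solve_s=1.0))    # 600.0
```

Under these assumptions solve speed only matters once it exceeds the cooldown; below that, throughput scales with the number of IPs. Offline work such as password hashing has no such cap, so every extra unit of per-attempt cost directly cuts an attacker's throughput.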

atomicnumber3 · a year ago

The watershed of "good enough at programming to just get a real job" vs "can code enough to be really annoying to businesses, but not enough to hack it as a dev" is a lot more on the annoying side than you'd think.

I say this with the chagrin of someone who works on a cool software product that is also coincidentally really well-shaped to make people want to abuse it.

delfinom · a year ago

>The vast majority of spammers would not be able to replicate this;

Eh? They just need to buy their software from someone who can. I would say much of the malware and spamware isn't created by each individual deploying it, but by vendors that got good at it and decided to make revenue by licensing their software to other bad actors.

brian-armstrong · a year ago

Makes me wonder what comes next. Could we create a forum where every member must do a 15 minute video interview with a moderator? I know this "doesn't scale" but I think it could make for a funny gimmick.

matchamatcha · a year ago

When I was a teenager, I stumbled upon a music forum that required phone interviews for signing up. They had other interesting sign-up rules, like you could not have silly usernames (judged by the admin). I guess it served as an effective filter for their member base.

jabroni_salad · a year ago

Private torrent trackers are/were doing that. It was really just to make sure you understood how P2P culture works and what the expectations are, and it was really easy to pass if you just followed a guide. However, I did see many people fail their interview.

ggu7hgfk8j · a year ago

We are increasingly moving to ID checks; Australia just passed such a law. For all its faults, it solves spam as a side effect.

bobsmooth · a year ago

A small signup fee is much easier.

3abiton · a year ago

I think captchas are just another kind of defense to make it harder for actors to abuse the system. It's not a solution, just a little (increasingly outdated) fortification.

poincaredisk · a year ago

Small? From your own link, reCAPTCHA v3 takes 10-15 s and costs $1.30 per 1000 captchas. That's actually huge, and prohibitively expensive for many things where you would want to use it (like scraping a large website).
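Taking the quoted figures at face value ($1.30 per 1000 reCAPTCHA v3 solves, 10-15 s each), a rough budget for a scrape looks like this. The 12.5 s midpoint and the `hit_rate` parameter (the fraction of requests that actually trigger a captcha) are assumptions for illustration.

```python
def captcha_budget(n_requests: int, hit_rate: float = 1.0,
                   usd_per_1000: float = 1.30, secs_each: float = 12.5):
    """Return (dollars, serial solving days) for a scrape of n_requests.

    hit_rate is the assumed fraction of requests that trigger a captcha;
    secs_each is the midpoint of the quoted 10-15 s solve time.
    """
    n_captchas = n_requests * hit_rate
    dollars = n_captchas / 1000 * usd_per_1000
    serial_days = n_captchas * secs_each / 86_400  # seconds per day
    return dollars, serial_days


dollars, days = captcha_budget(1_000_000)
print(f"${dollars:,.0f}, {days:.0f} days if solved one at a time")
# $1,300, 145 days if solved one at a time
```

Solving services work many captchas in parallel, so the serial figure overstates wall-clock time; the dollar cost is the part that scales linearly no matter what, and a lower `hit_rate` shrinks both numbers proportionally.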

costco · a year ago

Depends on the website, but you don't always get a reCAPTCHA, so the cost is a lot lower than that. You usually get one if you're exceeding some rate limit or doing a sensitive action like registering.

RobotToaster · a year ago

> so really captchas are more like putting a small min amount of effort.

At that point a proof-of-work captcha (mCaptcha.org is one, but there are others) is probably the best option, especially given that any reasonably effective traditional captcha is an accessibility nightmare.
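The general idea behind a proof-of-work captcha: the server hands the browser a random challenge, the browser burns CPU searching for a nonce whose hash meets a difficulty target, and the server checks the answer with a single hash. Below is a minimal hashcash-style sketch of that idea using SHA-256 and a leading-zero-bits target. It is not mCaptcha's actual protocol, and the function names and difficulty values are illustrative.

```python
import hashlib
import itertools
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte:
            return bits + 8 - byte.bit_length()
        bits += 8
    return bits


def new_challenge() -> bytes:
    """Server side: a fresh random challenge per form load."""
    return os.urandom(16)


def _hash(challenge: bytes, nonce: int) -> bytes:
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce; expected cost is about 2**difficulty hashes."""
    for nonce in itertools.count():
        if leading_zero_bits(_hash(challenge, nonce)) >= difficulty:
            return nonce


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash checks the client's work."""
    return leading_zero_bits(_hash(challenge, nonce)) >= difficulty


challenge = new_challenge()
nonce = solve(challenge, difficulty=16)
print(verify(challenge, nonce, difficulty=16))  # True
```

Each extra difficulty bit doubles the expected client work while verification stays one hash. A real deployment would also have to remember issued challenges and reject reuse; that bookkeeping is omitted here.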

cubefox · a year ago

It's completely unclear what a "proof of work" captcha is supposed to be.

nyclounge · a year ago

Wow, FunCaptcha costs the most, and it is open source.

mieko · a year ago

If you're into this, here's my 2014 breakdown of the Silk Road CAPTCHA: https://github.com/mieko/sr-captcha

mbs159 · a year ago

Intriguing, thanks for sharing!

antirez · a year ago

Appropriate response by 4Chan to this: simplify the human work given that anyway it's simple to solve via NNs. We are at a point where designing very hard captchas has high probabilities to increase the human annoyance without decreasing the machine solvability.

codetrotter · a year ago

> simplify the human work given that anyway it's simple to solve via NNs. We are at a point where designing very hard captchas has high probabilities to increase the human annoyance without decreasing the machine solvability

Or disallow free users from posting at all, and require everyone who wants to post to buy the 4chan Pass for $20 USD per year.

https://4chan.org/pass

The Pass already exists as a way to skip the CAPTCHA. So if the CAPTCHA is totally ineffective, it follows that they should do away with both the CAPTCHA and free posting, and everyone who wants to post should buy the 4chan Pass.

fullspectrumdev · a year ago

This kills the board. Users will go elsewhere; fuck all people pay for the Pass.

ranger_danger · a year ago

Agreed, charging for accounts is the only halfway viable solution I have seen any service use that gives a sizable downtick in the sheer number of bots/spam.

Of course it's not perfect, and it will still happen, but I have yet to hear any better solutions. Please prove me wrong though!

poincaredisk · a year ago

At this point I have to wait 90 seconds before every post (maybe because I don't persist cookies). I posted very rarely, but now I've just stopped. I get it when someone shows me the door.

matheusmoreira · a year ago

That would work. It would also kill the site.

efilife · a year ago

What? So you use 4chan? It would completely kill what makes this website special

YeahThisIsMe · a year ago

We've been stuck at that point for at least 5, if not 10, years.

hackernewds · a year ago

Just use Worldcoin retina scans next

gosub100 · a year ago

"Drag each symbol to the group that is most likely to be offended by it."

xp84 · a year ago

Ooh I love this, all off-the-shelf AI won’t touch it due to all their “safety” (aka anti-hurt-feelings) protocols

encom · a year ago

4chan doesn't care about human annoyance. They just started doing a 15 minute post delay, which is infuriating. I had to whitelist 4chan in Cookie AutoDelete.

poincaredisk · a year ago

Hi fellow cookie autodeleter, I experienced the same thing, but I just decided to stop posting. Whitelisting felt too much like giving in to terrorists. I'm considering just not going there in the future. Maybe after all this time I will finally be free.

matheusmoreira · a year ago

Just stop posting there. The whole point of it is to post anonymously in a high traffic forum. The rate limiting timers have reduced traffic to the point many boards feel dead, and their solution to that problem is to sell accounts.

hsbauauvhabzb · a year ago

What is NN?

numpad0 · a year ago

"AI" but pre-COVID

layer8 · a year ago

https://en.wikipedia.org/wiki/Neural_network_(machine_learni...

brodo · a year ago

I am totally in favor of increasing the annoyance of 4chan users.

somat · a year ago

I wonder if it would be better to pretend to have a captcha while really analysing the user's timing and actions. Honestly, I half suspect this is already going on.

If you wanted to go full meta ("never go full meta"), you would train an AI to figure out whether the agent on the other side was human or not; that is, invent the reverse Turing test. It's a human if the AI is unable to differentiate its responses from normal human responses, as opposed to marketing human responses.

Well now I have to go have a lay down, I feel a little ill from even thinking on the subject.
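The first idea, scoring how someone interacts rather than what they answer, can be illustrated with a deliberately naive heuristic: scripted input tends to be either too fast or too evenly spaced. The function and both thresholds below are hypothetical, not taken from any real captcha vendor, and a real system would combine many more signals.

```python
import statistics


def looks_scripted(event_times_ms: list[float],
                   min_cv: float = 0.15, min_gap_ms: float = 30.0) -> bool:
    """Hypothetical heuristic: flag input whose inter-event gaps are
    implausibly fast or implausibly regular. Thresholds are made up."""
    gaps = [b - a for a, b in zip(event_times_ms, event_times_ms[1:])]
    if len(gaps) < 2:
        return False  # not enough data to judge
    mean = statistics.fmean(gaps)
    if mean < min_gap_ms:
        return True  # faster than a plausible human
    cv = statistics.stdev(gaps) / mean  # coefficient of variation
    return cv < min_cv  # too metronome-like


print(looks_scripted([0, 100, 200, 300, 400]))  # True (perfectly regular)
print(looks_scripted([0, 120, 310, 390, 620]))  # False (human-like jitter)
```

A bot that adds random jitter defeats this immediately, which is why commercial systems lean on many signals at once rather than any single timing rule.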

wraptile · a year ago

That's kinda what every major captcha distributor does already!

Even before a captcha is served, your TLS handshake is fingerprinted, then your IP, then your HTTP/2 fingerprint, then your request, then your JavaScript environment (including font and image rendering capabilities) and the browser itself. These are used to calculate a trust score that determines whether a captcha will be served at all. Only then does it make sense to analyze the captcha input, but by that time you've caught 90% of bots either way.

The amount your browser can tell about you to any server without your awareness is insane, to the point where every single one of us probably has a more unique digital fingerprint than our very own physical fingerprint!
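The layered check described above (fingerprint first, captcha only for low-trust traffic) might look like the toy below. The signal names, point values and threshold are all hypothetical; vendors do not publish their scoring, and real systems use far more signals, often with learned rather than hand-set weights.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    tls_matches_browser: bool  # TLS handshake looks like a mainstream browser
    datacenter_ip: bool        # IP belongs to a hosting/cloud range
    http2_matches_ua: bool     # HTTP/2 settings fit the claimed User-Agent
    js_env_consistent: bool    # fonts, canvas rendering, navigator fields agree


# Hypothetical points; real vendors do not publish their scoring.
BASE_SCORE = 20
POINTS = {
    "tls_matches_browser": 30,
    "datacenter_ip": -40,
    "http2_matches_ua": 20,
    "js_env_consistent": 30,
}


def trust_score(s: Signals) -> int:
    """Sum the points for every signal that is present, clamped to 0..100."""
    score = BASE_SCORE + sum(p for name, p in POINTS.items() if getattr(s, name))
    return max(0, min(100, score))


def needs_captcha(s: Signals, threshold: int = 70) -> bool:
    """Only low-trust traffic ever sees a captcha."""
    return trust_score(s) < threshold


print(needs_captcha(Signals(True, False, True, True)))  # False (score 100)
print(needs_captcha(Signals(True, True, True, True)))   # True (score 60)
```

The second case shows the kind of collateral damage complained about below: a perfectly ordinary browser still gets challenged because one signal (here, a datacenter IP such as a VPN exit) drags the score under the threshold.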

encom · a year ago

This is how ClownFlare and its ilk make life hell on the internet when you use a "weird" browser on a "weird" OS.

gosub100 · a year ago

Re: your last paragraph, https://coveryourtracks.eff.org/

EFF have been running this for years. It gives an estimate of how many unique traits your browser has. Even things like screen resolution are measured.
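Uniqueness of this kind is usually expressed in bits, which is how Cover Your Tracks reports it: a trait shared by 1 in N browsers carries log2(N) bits, and about 33 bits are enough to single out one person among roughly 8 billion. A small sketch of that arithmetic follows; summing bits across traits assumes the traits are independent, which real browser traits are not, so the total tends to overstate things. The example trait frequencies are hypothetical.

```python
import math


def bits(one_in: float) -> float:
    """Surprisal of a trait shared by 1 in `one_in` browsers, in bits."""
    return math.log2(one_in)


def combined_bits(one_in_values: list[float]) -> float:
    """Sum of per-trait bits. Assumes independent traits; real traits are
    correlated, so this tends to overstate how identifying the combination is."""
    return sum(bits(x) for x in one_in_values)


# Bits needed to single out one person among ~8 billion.
print(round(bits(8_000_000_000), 1))  # 32.9

# Hypothetical: a screen resolution shared by 1 in 1024 plus a font list shared by 1 in 64.
print(combined_bits([1024, 64]))  # 16.0
```

Two fairly ordinary traits already get you halfway to a globally unique identifier, which is why a handful of extra signals makes most browsers stand out.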

zoltrix303 · a year ago

Would it be possible to serve a fake fingerprint that appears legitimate? Or, even better, mimic the fingerprints of real users who've visited a site you own, for example?

PUSH_AX · a year ago

In that case why do I ever receive a captcha?

kccqzy · a year ago

That's what reCAPTCHA does.

benreesman · a year ago

In my opinion, the granddaddy of all 4chan CAPTCHA busts is still Yannic Kilcher's GPT-J fine-tune on the "Raiders of the Lost Kek" dataset, which might be the coolest thing an LLM has ever done on video: https://youtu.be/efPrtcLdcdM?si=errY0PrEhnX9ylDw

chiph · a year ago

Nearly a full minute of disclaimers and warnings about 4chan. That's got to be a record.

ValentinA23 · a year ago

>I released the model, the code and I evaluated the model on a huge set of benchmarks and it turns out this horrible, terrible, model is more truthful-yes more truthful-than any other GPT out there

Pikamander2 · a year ago

> The official TensorFlow-to-TFJS model converter doesn't work on Python 3.12. This doesn't seem to really be documented.

> TensorFlow.js doesn't support Keras 3.

I tried getting into some casual machine learning stuff a few years ago and more or less gave up because of stuff like this. It was staggering how many recent tutorials were already outdated, how many random pitfalls there were, and how many "getting started" guides assumed you were already an expert.
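One way to avoid the two quoted pitfalls is to check for them before attempting an export. The sketch below is hypothetical and only encodes the constraints as the article states them (Python 3.12 breaking the converter, no Keras 3 support in TensorFlow.js); both may have changed since, so treat the version cut-offs as assumptions to verify against current release notes.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_major(package: str) -> Optional[int]:
    """Major version of an installed package, or None if it isn't installed."""
    try:
        return int(version(package).split(".")[0])
    except PackageNotFoundError:
        return None


def tfjs_export_warnings(py_version: tuple[int, int],
                         keras_major: Optional[int]) -> list[str]:
    """Hypothetical preflight check encoding the two limitations the article reports."""
    warnings = []
    if py_version >= (3, 12):
        warnings.append("Python 3.12+: the TF-to-TFJS converter reportedly fails here")
    if keras_major is not None and keras_major >= 3:
        warnings.append("Keras 3 installed: TensorFlow.js reportedly doesn't support it")
    return warnings


for w in tfjs_export_warnings(sys.version_info[:2], installed_major("keras")):
    print("warning:", w)
```

Failing fast with a clear message is cheaper than discovering the incompatibility halfway through a conversion.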

sigmoid10 · a year ago

As someone who has been working in ML for years, I can only recommend staying away from anything recent. Grab an old Bayesian statistics textbook and learn the fundamentals, then progress to learning the major frameworks like PyTorch. Try to write every part of a CNN, RNN and Transformer architecture and training pipeline yourself the first time (including data loaders, but maybe leave out CUDA matrix kernels). Stay the hell away from wrappers for other people's wrappers like LangChain. Their documentation is often not just outdated, but flat out wrong regarding the fundamentals. Hugging Face is great if you know the basics and thus how to fix things when their standard wrappers break.
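In the spirit of writing every part yourself, the core of a Transformer layer, scaled dot-product attention softmax(QK^T / sqrt(d)) V, fits in a few lines of plain Python. This is a teaching sketch with no batching, masking, multiple heads or learned projections, and it is far too slow for real use; in practice you would express the same computation with PyTorch tensor ops.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain row-by-column matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d)) V for one head, no masking."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


# Two identical keys get equal weight, so the output is the mean of the value rows.
q = [[1.0, 0.0]]
k = [[1.0, 1.0], [1.0, 1.0]]
v = [[0.0, 2.0], [4.0, 6.0]]
print(attention(q, k, v))  # [[2.0, 4.0]]
```

Once this version makes sense, rewriting it with tensors and adding masking and multiple heads is a natural next exercise.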

rohansuri · a year ago

Any book you would recommend?