seba_dos1 commented on We put a coding agent in a while loop   github.com/repomirrorhq/r... · Posted by u/sfarshid
salomonk_mur · 4 hours ago
We, too, are just auto-complete, next-token machines.
seba_dos1 · an hour ago
We are auto-complete, next-token machines, but plastic ones, attached to many other no less important subsystems, which is a crucial difference.
seba_dos1 commented on We put a coding agent in a while loop   github.com/repomirrorhq/r... · Posted by u/sfarshid
throwaway290 · 3 hours ago
I don't think an LLM can generate good docs for code that isn't self-documenting :) Any obscure long function you can't figure out yourself, and you're out of luck.
seba_dos1 · 2 hours ago
Yeah, when I see all those hyped people, I keep wondering: have they not spent enough time with LLMs to notice that yet, or is what they work on just so trivial that it doesn't matter?
seba_dos1 commented on In the long run, LLMs make us dumber   desunit.com/blog/in-the-l... · Posted by u/speckx
hnuser123456 · 3 days ago
When working well, they let us offload memorizing a Wikipedia's worth of information and think about higher-level problems. We become more intelligent at higher-level solutions. Of course people don't know what was written if they were required to "submit an essay" where the main grade is whether they submitted one at all, on a topic that may not have interested them. Ask people to write essays about things they're truly, honestly interested in, and those with access to an LLM are likely to enrich their knowledge faster than those without.
seba_dos1 · 3 days ago
Don't forget to make fact checking a part of the evaluation though.
seba_dos1 commented on Why are anime catgirls blocking my access to the Linux kernel?   lock.cmpxchg8b.com/anubis... · Posted by u/taviso
rfoo · 4 days ago
> and frankly, it likely will only need to work until the bubble bursts, making "the long run" irrelevant

Now I get why people are being so weirdly dismissive about the whole thing. Good luck, it's not going to "burst" any time soon.

Or rather, a "burst" would not change the world in the direction you want it to be.

seba_dos1 · 4 days ago
Not exactly sure what you're talking about. The problem is caused by tons of shitty companies cutting corners to collect training data as fast as possible, fueled by easy money that you get by putting "AI" somewhere in your company's name.

As soon as the investment boom is over, this will be largely gone. LLMs will continue to be trained and data will continue to be scraped, but that alone isn't the problem. Search engine crawlers somehow manage not to DDoS the servers they pull data from; competent AI scrapers can do the same. In fact, a competent AI scraper wouldn't even be stopped by Anubis as it currently is, and yet Anubis works pretty well in practice. Go figure.
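For comparison, the politeness that keeps search crawlers from DDoSing anyone isn't complicated. A minimal sketch, assuming a fixed per-host delay (the delay value, the ExampleBot user agent, and the function names are illustrative, not taken from any real crawler):

```python
import time
import urllib.robotparser
from urllib.parse import urlparse
from urllib.request import urlopen

CRAWL_DELAY = 5.0  # assumed per-host delay in seconds
_last_hit: dict[str, float] = {}

def polite_fetch(url: str, user_agent: str = "ExampleBot") -> bytes | None:
    host = urlparse(url).netloc
    # Honor robots.txt (a real crawler would cache this per host).
    rp = urllib.robotparser.RobotFileParser(f"https://{host}/robots.txt")
    rp.read()
    if not rp.can_fetch(user_agent, url):
        return None
    # Wait out the per-host delay instead of hammering the server.
    elapsed = time.monotonic() - _last_hit.get(host, 0.0)
    if elapsed < CRAWL_DELAY:
        time.sleep(CRAWL_DELAY - elapsed)
    _last_hit[host] = time.monotonic()
    with urlopen(url) as resp:
        return resp.read()
```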

seba_dos1 commented on Why are anime catgirls blocking my access to the Linux kernel?   lock.cmpxchg8b.com/anubis... · Posted by u/taviso
dale_glass · 4 days ago
How is it going to hurt those?

If it's an actual botnet, then it's hijacked computers belonging to other people, who are the ones paying the power bills. The attacker doesn't care that each computer takes a long time to calculate. If you have 1000 computers each spending 5s/page, then your botnet can retrieve 200 pages/s.

If it's just a cloud deployment, still it has resources that vastly outstrip a normal person's.

The fundamental issue is that you can't serve example.com slower than a legitimate user on a crappy 10-year-old laptop could tolerate, because that starts losing you real human users. So if, say, a user is happy to wait 5 seconds per page at most, then this is absolutely no obstacle to a modern 128-core Epyc. If you make it troublesome for the 128-core monster, then no normal person will find the site usable.
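In rough numbers, using the figures from this comment:

```python
pow_seconds_per_page = 5    # the most a human on an old laptop will tolerate

botnet_machines = 1000      # hijacked computers, power bills paid by others
print(botnet_machines / pow_seconds_per_page)  # 200.0 pages/s for the botnet

epyc_cores = 128            # a single modern server grinding in parallel
print(epyc_cores / pow_seconds_per_page)       # 25.6 pages/s from one box
```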

seba_dos1 · 4 days ago
> How is it going to hurt those?

In an endless cat-and-mouse game, it won't.

But right now, it does, as these bots tend to be really dumb (presumably, a more competent botnet operator wouldn't have it do the equivalent of copying Wikipedia by crawling through every single page in the first place). With a bit of luck, that will be enough until the bubble bursts and the problem is gone, and you won't need to deploy Anubis just to keep your server running anymore.

seba_dos1 commented on Why are anime catgirls blocking my access to the Linux kernel?   lock.cmpxchg8b.com/anubis... · Posted by u/taviso
tptacek · 4 days ago
I just said you didn't have to justify it. I don't care why you run it. Run whatever you want. The point of the post is that regardless of your reasons for running it, it's unlikely to work in the long run.
seba_dos1 · 4 days ago
And what I said is that the most visible deployments of Anubis weren't set up as content protection systems of any kind, so it doesn't have to work that way for them at all. As long as the server no longer struggles under load after deploying Anubis, it's a win - and so far it works.

(and frankly, it likely will only need to work until the bubble bursts, making "the long run" irrelevant)

seba_dos1 commented on Why are anime catgirls blocking my access to the Linux kernel?   lock.cmpxchg8b.com/anubis... · Posted by u/taviso
tptacek · 4 days ago
I'm not moralizing, I'm talking about whether it can work. If it's your site, you don't need to justify putting anything in front of it.
seba_dos1 · 4 days ago
Did you accidentally reply to a wrong comment? (not trying to be snarky, just confused)

The only "justification" there would be is that it keeps the server online that struggled under load before deploying it. That's the whole reason why major FLOSS projects and code forges have deployed Anubis. Nobody cares about bots downloading FLOSS code or kernel mailing lists archives; they care about keeping their infrastructure running and whether it's being DDoSed or not.

seba_dos1 commented on Vibe coding creates a bus factor of zero   mindflash.org/coding/ai/a... · Posted by u/AntwaneB
scarface_74 · 4 days ago
Reviewing code another person wrote also takes longer than reviewing code I wrote. Hell, code I wrote six months ago might as well be someone else's.

My job right now, depending on the week, is either leading large projects dealing with code I don't write, or smaller "full stack" POCs: design, cloud infrastructure (IaC), database, backend code, ETL jobs, and rarely front-end code. Even before LLMs, if I had to look at a project I'd done, it took me time to ramp up.

seba_dos1 · 4 days ago
> Reviewing code another person wrote also takes longer than code I wrote.

Yes, and water is wet, but that's not exactly relevant. If you have an LLM generate slop at you that you then have to review and adjust, you need to compare the time the whole process took you, rather than just the "generating slop" step, against the time needed to write the code yourself.

It may still save you time, but it won't be anywhere close to 2 minutes anymore for anything but the most trivial stuff.

seba_dos1 commented on Why are anime catgirls blocking my access to the Linux kernel?   lock.cmpxchg8b.com/anubis... · Posted by u/taviso
st3fan · 4 days ago
"But still enough to prevent a billion request DDoS" - don't you just do the PoW once to get a cookie and then you can browse freely?
seba_dos1 · 4 days ago
Yes, but a single bot is not a concern. It's the first "D" in DDoS that makes it hard to handle.

(and these bots tend to be very, very dumb - which often makes them more effective at DDoSing the server, as they take the worst, most expensive routes to scrape content that's openly available more efficiently elsewhere)
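For reference, the solve-once-then-cookie pattern discussed above looks roughly like this. This is a minimal sketch of the generic shape, not Anubis's actual scheme; the SHA-256 construction, the DIFFICULTY value, and the function names are all assumptions:

```python
import hashlib
import os

DIFFICULTY = 16  # required leading zero bits; value assumed for illustration

def solve(challenge: bytes) -> int:
    # Client side: brute-force a nonce once per session.
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0:
            return nonce
        nonce += 1

def verify(challenge: bytes, nonce: int) -> bool:
    # Server side: a single hash to check, then a session cookie is issued
    # and later requests skip the PoW entirely.
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0

challenge = os.urandom(16)
assert verify(challenge, solve(challenge))
```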

seba_dos1 commented on Why are anime catgirls blocking my access to the Linux kernel?   lock.cmpxchg8b.com/anubis... · Posted by u/taviso
tptacek · 4 days ago
Respectfully, I think it's you missing the point here. None of this is to say you shouldn't use Anubis, but Tavis Ormandy is offering a computer science critique of how it purports to function. You don't have to care about computer science in this instance! But you can't dismiss it because it's computer science.

Consider:

An adaptive password hash like bcrypt or Argon2 uses a work function to apply asymmetric costs to adversaries (attackers who don't know the real password). Both users and attackers have to apply the work function, but the user gets ~constant value for it (they know the password, so to a first approx. they only have to call it once). Attackers have to iterate the function, potentially indefinitely, in the limit obtaining 0 reward for infinite cost.
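To make the asymmetry concrete, here's a minimal sketch using Python's stdlib hashlib.pbkdf2_hmac as the work function (the iteration count, the password, and the toy guess list are illustrative assumptions, not details of bcrypt or Argon2):

```python
import hashlib
import os

ITERATIONS = 600_000  # the tunable work factor; value assumed for illustration

def work(password: bytes, salt: bytes) -> bytes:
    # One application of the work function.
    return hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS)

salt = os.urandom(16)
stored = work(b"correct horse", salt)

# Legitimate user: knows the password, pays the cost once per login.
assert work(b"correct horse", salt) == stored

# Attacker: pays the full cost for every guess, almost all of them worthless.
for guess in (b"123456", b"password", b"hunter2"):  # in reality, billions
    work(guess, salt)  # same cost per call, ~zero value per call
```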

A blockchain cryptocurrency uses a work function principally as a synchronization mechanism. The work function itself doesn't have a meaningfully separate adversary. Everyone obtains the same value (the expected value of attempting to solve the next round of the block commitment puzzle) for each application of the work function. And note in this scenario most of the value returned from the work function goes to a small, centralized group of highly-capitalized specialists.

A proof-of-work-based antiabuse system wants to function the way a password hash functions. You want to define an adversary and then find a way to incur asymmetric costs on them, so that the adversary gets minimal value compared to legitimate users.

And this is in fact how proof-of-work-based antispam systems function: the value of sending a single spam message is so low that the EV of applying the work function is negative.

But here we're talking about a system where legitimate users (human browsers) and scrapers get the same value for every application of the work function. The cost:value ratio is unchanged; it's just that everything is more expensive for everybody. You're getting the worst of both worlds: user-visible costs and a system that favors large centralized well-capitalized clients.
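The contrast in toy numbers (every figure below is made up for illustration): one work-function call per message pushes a spammer's expected value negative, while an Anubis-style challenge shifts costs for the human and the scraper identically:

```python
pow_cost = 0.001  # dollars of compute per work-function application (assumed)

# Antispam: the value of one spam message is tiny, so the sender's EV
# goes negative and spamming stops paying.
spam_value_per_message = 0.0001
print(spam_value_per_message - pow_cost)  # -0.0009

# Anubis-style PoW: human and scraper get the same value per page served,
# so the cost:value ratio moves identically for both.
value_per_page = 0.01
for client in ("human", "scraper"):
    print(client, value_per_page - pow_cost)  # 0.009 either way
```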

There are antiabuse systems that do incur asymmetric costs on automated users. Youtube had (has?) one. Rather than simply attaching a constant extra cost for every request, it instead delivered a VM (through JS) to browsers, and programs for that VM. The VM and its programs were deliberately hard to reverse, and changed regularly. Part of their purpose was to verify, through a bunch of fussy side channels, that they were actually running on real browsers. Every time Youtube changed the VM, the bots had to do large amounts of new reversing work to keep up, but normal users didn't.

This is also how the Blu-Ray BD+ system worked.
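The general shape of such a scheme, sketched very loosely (a toy, not YouTube's or BD+'s actual design; the op set and rotate-by-seed mechanism are assumptions): the server ships a small randomized program the client must interpret correctly, and rotates it so that reversing work keeps expiring:

```python
import random

def run(program: list[tuple[str, int]]) -> int:
    # Client side: must faithfully interpret whatever ops this round's VM defines.
    acc = 1
    for op, arg in program:
        if op == "add":
            acc += arg
        elif op == "xor":
            acc ^= arg
        elif op == "mul":
            acc = (acc * arg) % 2**32
    return acc

def make_challenge(seed: int) -> tuple[list[tuple[str, int]], int]:
    # Server side: emit a tiny randomized "program" and precompute its answer.
    rng = random.Random(seed)
    program = [(rng.choice(["add", "xor", "mul"]), rng.randrange(1, 100))
               for _ in range(8)]
    return program, run(program)

program, expected = make_challenge(seed=42)
assert run(program) == expected  # a real system would also fingerprint the
                                 # runtime through side channels, as above
```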

The term of art for these systems is "content protection", which is what I think Anubis actually wants to be, but really isn't (yet?).

The problem with "this is good because none of the scrapers even bother to do this POW yet" is that you don't need an annoying POW to get that value! You could just write a mildly complicated Javascript function, or do an automated captcha.

seba_dos1 · 4 days ago
> The term of art for these systems is "content protection", which is what I think Anubis actually wants to be, but really isn't (yet?).

No, that's missing the point. Anubis is effectively a DDoS protection system; all the talk about AI bots comes from the fact that the latest wave of DDoS attacks was initiated by AI scrapers, whether intentionally or not.

If these bots cloned git repos instead of unleashing hordes of the dumbest bots on Earth pretending to be thousands and thousands of users browsing through the git blame web UI, there would be no need for Anubis.

u/seba_dos1

Karma: 8193 · Cake day: September 19, 2013
About
Hi, I'm dos. I do mobile GNU/Linux stuff and also make a lot of silly games. he/him/any

https://dosowisko.net/

[ my public key: https://keybase.io/dos1; my proof: https://keybase.io/dos1/sigs/FMkys0vVncNF9H85wHvZfiUcJf8lrE0d9O7MqS7_E3o ]
