rfoo (u/rfoo) - Readit News

rfoo commented on DeepSeek-v3.1 api-docs.deepseek.com/new... · Posted by u/wertyk

Yes I had that at the start, but people kept complaining they don't know how to actually run terminal commands, hence the shortcut :(

I was thinking if I can do it during the pip install or via setup.py which will do the apt-get instead.

As a fallback, I'll probably for now remove shell executions and just warn the user

rfoo · 2 days ago

IMO the correct thing to do to make these people happy, while being sane, is - do not build llama.cpp on their system. Instead, bundle a portable llama.cpp binary along with unsloth, so that when they install unsloth with `pip` (or `uv`) they get it.

Some people may prefer using whatever llama.cpp in $PATH, it's okay to support that, though I'd say doing so may lead to more confused noob users spam - they may just have an outdated version lurking in $PATH.

Doing so makes unsloth wheel platform-dependent, if this is too much of a burden, then maybe you can just package llama.cpp binary and have it on PyPI, like how scipy guys maintain a https://pypi.org/project/cmake/ on PyPI (yes, you can `pip install cmake`), and then depends on it (maybe in an optional group, I see you already have a lot due to cuda shit).

rfoo commented on Io_uring, kTLS and Rust for zero syscall HTTPS server blog.habets.se/2025/04/io... · Posted by u/guntars

tayo42 · 2 days ago

Refcel didn't work? Or rc?

rfoo · 2 days ago

Slapping Rc<T> over something that could be clearly uniquely owned is a sign of very poorly designed lifetime rules / system.

And yes, for now async Rust is full of unnecessary Arc<T> and is very poorly made.

rfoo commented on Why are anime catgirls blocking my access to the Linux kernel? lock.cmpxchg8b.com/anubis... · Posted by u/taviso

seba_dos1 · 3 days ago

And what I said is that all these most visible deployments of Anubis did not deploy it to be a content protection system of any kind, so it doesn't have to work this way at all for them. As long as the server doesn't struggle with load anymore after deploying Anubis, it's a win - and it works so far.

(and frankly, it likely will only need to work until the bubble bursts, making "the long run" irrelevant)

rfoo · 3 days ago

> and frankly, it likely will only need to work until the bubble bursts, making "the long run" irrelevant

Now I get why people are so weirdly being dismissive about the whole thing. Good luck, it's not going to "burst" any time soon.

Or rather, a "burst" would not change the world in the direction you want it to be.

rfoo commented on Why are anime catgirls blocking my access to the Linux kernel? lock.cmpxchg8b.com/anubis... · Posted by u/taviso

Almondsetat · 3 days ago

Scrapers are orders of magnitude faster than humans at browsing websites. If the challenge takes 1 second but a human stays on the page for 3 minutes, then it's negligible. But if the challenge takes 1 second and the scraper does ita job in 5 seconds, you already have a 20% slowdown

rfoo · 3 days ago

Scrapers do not care about having a 20% slowdown. All they care is being able to scale up. This does not block any scale up attempt.

rfoo commented on Why are anime catgirls blocking my access to the Linux kernel? lock.cmpxchg8b.com/anubis... · Posted by u/taviso

voidnap · 3 days ago

The proof of work isn't really the crux. They've been pretty clear about this from the beginning.

I'll just quote from their blog post from January.

https://xeiaso.net/blog/2025/anubis/

Anubis also relies on modern web browser features:

- ES6 modules to load the client-side code and the proof-of-work challenge code.

- Web Workers to run the proof-of-work challenge in a separate thread to avoid blocking the UI thread.

- Fetch API to communicate with the Anubis server.

- Web Cryptography API to generate the proof-of-work challenge.

This ensures that browsers are decently modern in order to combat most known scrapers. It's not perfect, but it's a good start.

This will also lock out users who have JavaScript disabled, prevent your server from being indexed in search engines, require users to have HTTP cookies enabled, and require users to spend time solving the proof-of-work challenge.

This does mean that users using text-only browsers or older machines where they are unable to update their browser will be locked out of services protected by Anubis. This is a tradeoff that I am not happy about, but it is the world we live in now.

rfoo · 3 days ago

... except when you do not crawl with a browser at all. It's so trivial to solve just like the taviso post demostrated.

This makes zero sense, this is simply the wrong approach. Already tired of saying so and been attacked. So I'm glad professional-random-Internet-bullshit-ignorer Tavis Ormandy wrote this one.

rfoo commented on Analysis of the GFW's Unconditional Port 443 Block on August 20, 2025 gfw.report/blog/gfw_uncon... · Posted by u/kotri

rfoo · 4 days ago

Pretty sure it's an incident.

rfoo commented on StarDict sends X11 clipboard to remote servers lwn.net/SubscriberLink/10... · Posted by u/pabs3

avhception · 12 days ago

While I have a lot of respect for the effort that goes into Debian, I always disliked this kind of "maximalism" from the package manager. Oh, the user wants "foo"? Let's install every software that might be even remotely useful somehow in combination with foo! Oh there is a network daemon in there? Fantastic, let's start it immediately!

I know that there is a flag to disable the installation for "recommended" packages. I just think the default is a disservice here.

rfoo · 12 days ago

For me it's my most used super long command line flag.

For a brief moment `--break-system-packages` surpassed it, then I discovered `pip` accepts abbrev flags so `--br` is enough, and sounds like bruh.

rfoo commented on A ChatGPT Pro subscription costs 38.6 months of income in low-income countries policykahani.substack.com... · Posted by u/WasimBhai

the_mitsuhiko · 13 days ago

So a handful of things here. One is that you can actually at the moment at least use a lot of it for free. Secondarily, I think when it comes to access to ChatGPT and other services, in a lot of low-income countries there's a much bigger hurdle than the money, which is a combination of language and device that is capable of connecting to ChatGPT. There are a lot of countries still where you're limited to feature phones or not even having a phone at all.

rfoo · 13 days ago

> There are a lot of countries still where you're limited to feature phones

I thought China fixed this for most of the world, at least for Africa it's fixed. It's the Internet access being the bottleneck now.

rfoo commented on Open models by OpenAI openai.com/open-models/... · Posted by u/lackoftactics

foundry27 · 19 days ago

Model cards, for the people interested in the guts: https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7...

In my mind, I’m comparing the model architecture they describe to what the leading open-weights models (Deepseek, Qwen, GLM, Kimi) have been doing. Honestly, it just seems “ok” at a technical level:

- both models use standard Grouped-Query Attention (64 query heads, 8 KV heads). The card talks about how they’ve used an older optimization from GPT3, which is alternating between banded window (sparse, 128 tokens) and fully dense attention patterns. It uses RoPE extended with YaRN (for a 131K context window). So they haven’t been taking advantage of the special-sauce Multi-head Latent Attention from Deepseek, or any of the other similar improvements over GQA.

- both models are standard MoE transformers. The 120B model (116.8B total, 5.1B active) uses 128 experts with Top-4 routing. They’re using some kind of Gated SwiGLU activation, which the card talks about as being "unconventional" because of to clamping and whatever residual connections that implies. Again, not using any of Deepseek’s “shared experts” (for general patterns) + “routed experts” (for specialization) architectural improvements, Qwen’s load-balancing strategies, etc.

- the most interesting thing IMO is probably their quantization solution. They did something to quantize >90% of the model parameters to the MXFP4 format (4.25 bits/parameter) to let the 120B model to fit on a single 80GB GPU, which is pretty cool. But we’ve also got Unsloth with their famous 1.58bit quants :)

All this to say, it seems like even though the training they did for their agentic behavior and reasoning is undoubtedly very good, they’re keeping their actual technical advancements “in their pocket”.

rfoo · 18 days ago

Or, you can say, OpenAI has some real technical advancements on stuff besides attn architecture. GQA8, alternating SWA 128 / full attn do all seem conventional. Basically they are showing us that "no secret sauce in model arch you guys just sucks at mid/post-training", or they want us to believe this.

The model is pretty sparse tho, 32:1.

rfoo commented on From Async/Await to Virtual Threads lucumr.pocoo.org/2025/7/2... · Posted by u/Bogdanp

hardwaresofton · 25 days ago

> Personally I'm more annoyed of async-Rust itself than not having a blessed async solution in-tree. Having to just Arc<T> away things here and there because you can't do thread::scope(f) honestly just demonstrates how stackless coroutine is unreasonably hard to everyone.

Yeah as annoying as this is, I think it actually played out to benefit Rust -- imagine if the churn that we saw in tokio/async-std/smol/etc played out in tree? I think things might ahve been even worse

That said, stackless coroutines are certainly unreasonably hard.

> Back to the original topic, I bring this up because I believe the performance advantages claimed in these "fibers bad" papers are superficial, and the limit is almost the same (think 1.00 vs 1.02 level almost), even in languages which consider raw performance as a selling-point. In case you need the absolutely lowest overhead and latency you usually want the timing to be as deterministic as possible too, and it's not even a given in async-await solutions, you still need to be very careful about that.

Yeah, I don't think this is incorrect, and I'd love to see some numbers on it. The only thing that I can say definitively is that there is overhead to doing the literal stack switch. There's a reason async I/O got us past the C10k problem so handily.

One of the nice things about some recent Zig work was how clearly you can see how they do their stack switch -- literally you can jump in the Zig source code (on a branch IIRC) and just read the ASM for various platforms that represents a user space context switch.

Agree with the deterministic timing thing too -- this is one of the big points that people who only want to use threads (and are against tokio/etc) argue -- the pure control and single-mindedness of a core against a problem is clearly simple and performant. Thread per core is still the top for performance, but IMO the ultimate is async runtime thread per core, because some (important) problems are embarassingly concurrent.

> Let alone Python.

Yeah, I' trying really not to comment much on Python because I'm out of my depth and I think there are...

I mean I'm of the opinion that JS (really TS) is the better scripting language (better bolt-on type systems, got threads faster, never had a GIL, lucked into being async-forward and getting all it's users used to async behavior), but obviously Python is a powerhouse and a crucially important ecosystem (excluding the AI hype).

rfoo · 25 days ago

> The only thing that I can say definitively is that there is overhead to doing the literal stack switch. There's a reason async I/O got us past the C10k problem so handily.

You can also say that not having to constantly allocate & deallocate stuff and rely on a bump allocator (the stack) most of the time more than compensate for the stack switch overhead. Depends on workload of course :p

IMO it's more about memory and nowadays it might just be path dependence. Back in C10k days address spaces were 32-bit (ok 31-bit really), and 2**31 / 10k ~= 210KiB. Makes static-ish stack management really messy. So you really need to extract the (minimal) state explicitly and pack them on heap.

Now we happily run ASAN which allocates 1TiB (2**40) address space during startup for a bitmap of the entire AS (2**48) and nobody complains.