Readit News
natrys commented on Emacs as your video-trimming tool   xenodium.com/emacs-as-you... · Posted by u/xenodium
shadowgovt · 4 days ago
That's actually the zen of programming emacs: don't avoid using buffers; create as many of them as you need.

"But that's like opening a document every time I want to glue two strings together!"

Not at all. A buffer is just a fancy blob of RAM. It's not file-backed unless you make it file-backed. Buffers do take up RAM, but you're programming on a modern computer, not a PDP-11; if you're comfortable with Python using a whole in-memory object to represent an integer, you can be comfortable with buffers.

"But it's messy to leave them lying around."

It's a feature. Yes, buffers aren't well-encapsulated, and if your program crashes mid-run they get left open. That's by design. You don't need encapsulation because you're not doing multithreading here (and if you are, there are primitives for that, which take a bit more work to use). Emacs is for editing and there's only one you, so if the current program creates buffers and has no way to run two copies of itself at once, who cares?

And your program crashing and leaving buffers around is a feature: you can inspect a buffer to see what it looked like at crash time, or set up the buffers the way you want them before firing off the program to get the desired effect (try those tricks in most languages without slapping on a debugger). And there are scripting blocks to create temp buffers and clean them up for you anyway.
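(Those "scripting blocks" are macros like `with-temp-buffer`, which creates a buffer, makes it current for the body, and kills it when the body exits, even on error. A minimal sketch:)

```elisp
;; Build and transform text in a throwaway buffer; the buffer is
;; killed automatically when the body exits, normally or via error.
(with-temp-buffer
  (insert "hello, ")
  (insert "world")
  (upcase-region (point-min) (point-max))
  (buffer-string))   ;; => "HELLO, WORLD"
```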

"But it's weird to have two ways to talk about strings in the language!"

That's true; it's a bit weird to have the string primitives and also buffers. But that's a pretty common flavor of weird; Java has strings and also has StringBuilder. My rule of thumb is "any time I'd reach for StringBuilder in Java, I should probably consider using a buffer in elisp."
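(To make that rule of thumb concrete, here's a sketch of StringBuilder-style accumulation in elisp: append pieces into a temp buffer instead of repeatedly concatenating immutable strings. The function name is mine, not from the thread.)

```elisp
;; StringBuilder analogue: accumulate pieces in a buffer, then
;; extract the final string once at the end.
(defun my/join-lines (items)
  (with-temp-buffer
    (dolist (item items)
      (insert item "\n"))
    (buffer-string)))

(my/join-lines '("foo" "bar" "baz"))  ;; => "foo\nbar\nbaz\n"
```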

natrys · 4 days ago
I agree with you, therefore I am pretty sure you meant to reply to the parent I was also replying to.
natrys commented on Emacs as your video-trimming tool   xenodium.com/emacs-as-you... · Posted by u/xenodium
rafram · 5 days ago
The flaw in that metaphor is that elisp is a pretty suboptimal programming language for general-purpose programming. The standard library (in my limited experience) seems to use buffers as the base primitive and doesn’t help you very much if you want to do anything complicated without touching the current buffer.
natrys · 5 days ago
I don't really see how the string (and other usual container types) or filesystem APIs are lacking in any significant way compared to the stdlibs of other scripting languages.

I also believe that the buffer as an abstraction strictly makes many harder things easier, to the point that I often wonder about creating a native library based on elisp's buffer-manipulation APIs alone that could be embedded in other runtimes instead. So "without touching the current buffer" is a premise/constraint I don't quite understand to begin with.
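(As one illustration of how buffer APIs make such tasks easier — the example and function name here are mine, not from the thread — a "rewrite every match in a string" job is naturally expressed by loading the text into a temp buffer and walking it with point:)

```elisp
;; Hypothetical example: redact every run of digits in a string
;; using buffer navigation primitives instead of string slicing.
(defun my/redact-numbers (text)
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (while (re-search-forward "[0-9]+" nil t)
      (replace-match "N"))
    (buffer-string)))

(my/redact-numbers "order 123 shipped 2024-05-09")
;; => "order N shipped N-N-N"
```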

natrys commented on GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models [pdf]   arxiv.org/pdf/2508.06471... · Posted by u/SerCe
chvid · 13 days ago
This is a great model for software development - probably the best of the freely available ones.
natrys · 13 days ago
Yep, I think it's the best, period. Qwen3-coder perhaps took the limelight, but the GLM models perform and behave better in agentic loops. I can hardly believe they went from a 32B, frontend-focused GLM-4 to these beasts that can challenge Claude, in a matter of months.
natrys commented on Janet: Lightweight, Expressive, Modern Lisp   janet-lang.org... · Posted by u/veqq
girvo · a month ago
PEGs (even outside Janet) are amazing and what I reach for all the time :) definitely one of those tools that’s worth learning!
natrys · a month ago
Yep, peg.el[1] has been built into Emacs since 30.1 (which is how I came to know of it, though the library itself seems much older), and it makes certain things much simpler and faster to do than before (once you figure out its quirks).

[1] https://github.com/emacs-mirror/emacs/blob/master/lisp/progm...

natrys commented on Qwen VLo: From “Understanding” the World to “Depicting” It   qwenlm.github.io/blog/qwe... · Posted by u/lnyan
echelon · 2 months ago
Hunyuan Image 2.0, which is of Flux quality but has ~20 milliseconds of inference time, is being withheld.

Hunyuan 3D 2.5, which is an order of magnitude better than Hunyuan 3D 2.1, is also being withheld.

I suspect that now that they feel these models are superior to Western releases in several categories, they no longer have a need to release these weights.

natrys · 2 months ago
> I suspect that now that they feel these models are superior to Western releases in several categories, they no longer have a need to release these weights.

Yes, that I can totally believe. Standard corporate behaviour (Chinese or otherwise).

I do think DeepSeek would be an exception to this though. But they lack diversity in focus (not even multimodal yet).

natrys commented on Qwen VLo: From “Understanding” the World to “Depicting” It   qwenlm.github.io/blog/qwe... · Posted by u/lnyan
echelon · 2 months ago
The era of open weights from China appears to be over for some reason. It's all of a sudden and seems to be coordinated.

Alibaba just shut off the Qwen releases

Tencent just shut off the Hunyuan releases

Bytedance just released Seedream, but it's closed

It seems like it's over.

They're still clearly training on Western outputs, though.

I still suspect that the strategic thing to do would be to become 100% open and sell infra/service.

natrys · 2 months ago
> Alibaba just shut off the Qwen releases

Alibaba has from the beginning had series of models that are always closed-weights (*-max, *-plus, *-turbo, etc., but also QvQ). It's not a new development, nor does it prevent their open models. And the VL models are opened up after 2-3 months of GA in the API.

> Tencent just shut off the Hunyuan releases

Literally released one today: https://huggingface.co/tencent/Hunyuan-A13B-Instruct

natrys commented on Magistral — the first reasoning model by Mistral AI   mistral.ai/news/magistral... · Posted by u/meetpateltech
refulgentis · 2 months ago
I don't know what you're talking about, partially because of poor grammar ("you knowledge cut-off does appear") and "presumption" (this was front and center on their API page at r1 release, and it's in the r1 update notes). I sort of stopped reading after there because I realized you might be referring to me having a "knowledge cut-off", which is bizarre and also hard to understand, and it's unlikely to be a particularly interesting conversation given that and the last volley relied on lots of stuff about tool calling being, inter alia, niche.
natrys · 2 months ago
> you might be referring to me having a "knowledge cut-off"

Don't forget I also referred to you having "hallucination". In retrospect, likening your logical consistency to an LLM's was premature, because not even gpt-3.5 era models could pull off a gem like:

> You: to your Q about why no one can compete with DeepSeek R1 25-01 blah blah blah

>> Me: ...why would you presume I was talking about 25-01 when 28-05 exists and you even seem to know it?

>>> You: this was front and center on their API page!

Riveting stuff. A few more digs about poor grammar and how many times you stopped reading, and you might even sell the misdirection.

natrys commented on Magistral — the first reasoning model by Mistral AI   mistral.ai/news/magistral... · Posted by u/meetpateltech
refulgentis · 2 months ago
> I am struggling to connect the relevance of this

> focusing on a single specific capability and

> I am not really invested in this niche topic

Right: I definitely ceded a "but it doesn't matter to me!" argument in my comment.

I sense a little "doth protest too much" in the multiple paragraphs devoted to taking that and extending it into: the underpinning of automation is "irrelevant", "single", "specific", "niche".

This would also be news to DeepSeek, who put a lot of work to launch it in the r1 update a couple weeks back.

Separately, I assure you, it would be news to anyone on the Gemini team that they don't care because they want to own everything. I passed this along via DM and got "I wish :)" in return - there's been a fire drill trying to improve it via AIDER in the short term, is my understanding.

If we ignore that, and posit there's an upper-management conspiracy to suppress performance that's just getting public cover from a lower-upper-management rush to improve scores... I guess that's possible.

Finally, one of my favorite quotes is "when faced with a contradiction, first check your premises" - to your Q about why no one can compete with DeepSeek R1 25-01, I'd humbly suggest you may be undergeneralizing, given even tool calls are "irrelevant" and "niche" to you.

natrys · 2 months ago
Interesting presumption about R1 25-01 being what's talked about, you knowledge cut-off does appear to know R1 update two weeks back was a thing, and that it even improved on function calling.

Of course you have to pretend I meant the former, otherwise "they all have" doesn't entirely make sense. Not that it made total sense before either, but if I say your definition of "they" is laughably narrow, I suspect you will go back to your google contact and confirm that nothing else really exists outside it.

Oh and do a ctrl-f on "irrelevant" please, perhaps some fact grounding is in order. There was an interesting conversation to be had about underpinning of automation somehow without intelligence (Llama 4) but who has time for that if we can have hallucination go hand in hand with forced agendas (free disclaimer to boot) and projection ("doth protest too much")? Truly unforeseeable.

natrys commented on Magistral — the first reasoning model by Mistral AI   mistral.ai/news/magistral... · Posted by u/meetpateltech
refulgentis · 2 months ago
They all have. I don't hope to convince you of that, everyones use case differs. Generally, AIME / prose / code benchmarks that don't involve successive tool calls are used to hide some very dark realities.

IMHO tool calling is by far the most clearly economically valuable function for an LLM, and r1 self-admittedly just...couldn't do it.

There's a lot of puff out there that's just completely misaligned with reality; ex. Gemini 2.5 Pro is by far the worst tool caller, Gemini 2.5 Flash thinking is better, 2.5 Flash is even better. And Llama 4 beats all Gemini 2.5s except 2.5 Flash non-thinking.

I'm all for "these differences will net out in the long run", Google's at least figured out how to micro optimize for Aider edit formatting without tools. Over the last 3 months, they're up 10% on edit performance. But it's horrible UX to have these specially formatted code blocks in the middle of prose. They desperately need to clean up their absurd tool-calling system. But I've been saying that for a year now. And they don't take it seriously, at all. One of their most visible leads tweeted "hey what are the best edit formats?" and a day later is tweeting the official guide for doing edits. I'm a Xoogler and that absolutely reeks of BigCo dysfunction - someone realized a problem 2 months after release and now we have "fixed" it without training, and now that's the right way to do things. Because if it isn't, well, what would we do? Shrugs

I'm also unsure how much longer it's worth giving a pass on this stuff. Everyone is competing on agentic stuff because that's the golden goose, real automation, and that needs tools. It would be utterly unsurprising to me for Google to keep missing a pain signal on this, vis a vis Anthropic, which doubled down on it mid-2024.

As long as I'm dumping info, BFCL is not a good proxy for this quality. Think "converts prose to JSON" not "file reading and editing"

natrys · 2 months ago
I don't mind the info dump, but I am struggling to connect its relevance to the topic at hand. I mean, focusing on a single specific capability and generalising it to mean "they all have" caught up with DeepSeek across the board (which was the original topic) is a reductive and wild take. Especially when it seems to me this is more down to misaligned incentives than to it being a truly hard problem.

I am not really invested in this niche topic, but I will observe that, yes, I agree Llama 4 is really good here. And yet it's a far worse coder, far less intelligent than DeepSeek, and that's not even arguable. So no, it didn't "catch up" any more than you could claim by pointing out that Llama is multimodal but DeepSeek isn't. That's just talking about a different thing entirely.

Regardless, I do agree BFCL is not the best measure either; Tau-bench is more relevant to the real world. But at the end of the day, most frontier labs are not incentive-aligned to care about this. Meta cares because this is something Zuck personally cares about: Llama models are actually for small businesses solving grunt automation, not for random people coding at home. People like Salesforce care (xLAM); even China had GLM before DeepSeek was a thing. DeepSeek might care so long as it looks good on coding benchmarks, but that's pretty much the extent of it.

And I suspect Google doesn't truly care because in the long run they want to build everything themselves. They already have a CodeAssist product around coding, which likely uses a fine-tune of their mainline Gemini models to do something even more specific to their plugin.

There is a possibility that at the frontier, models are struggling to get better in a specific and constrained way without getting worse at other things. It's either that, or even Anthropic has gone rogue, because their Aider scores are way down now from before. How does that make sense if they are supposed to be all-around better at agentic stuff in a tool-agnostic way? Then you realise they now have Claude Code, and it just makes way more economic sense to tie yourself to that; be context-inefficient to your heart's content so that you can burn tokens instead of being, you know, just generally better.

natrys commented on Magistral — the first reasoning model by Mistral AI   mistral.ai/news/magistral... · Posted by u/meetpateltech
adventured · 2 months ago
It's because DeepSeek was a fast copy. That was the easy part and it's why they didn't have to use so much compute to get near the top. Going well beyond o3 or 2.5 Pro is drastically more expensive than fast copy. China's cultural approach to building substantial things produces this sort of outcome regularly, you see the same approach in automobiles, planes, Internet services, industrial machinery, military, et al. Innovation is very expensive and time consuming, fast copy is more often very inexpensive and rapid. 85% good enough is often good enough, that additional 10-15% is comically expensive and difficult as you climb.
natrys · 2 months ago
Not disagreeing with the overarching point but:

> That was the easy part

is a bit hand-wavy in that it doesn't explain why only DeepSeek can do this "easy" thing, but still not Meta, Mistral, or anyone else really. There are many other players with way more compute than DeepSeek (even inside China, never mind the rest of the world), and I can assure you more or less everyone trains on synthetic data/distillation from whatever bigger model they can access.

u/natrys

Karma: 685 · Cake day: May 9, 2018