runeblaze commented on LLM Structured Outputs Handbook   nanonets.com/cookbooks/st... · Posted by u/vitaelabitur
libraryofbabel · a month ago
Question for the well-informed people reading this thread: do SoTA models like Opus, Gemini and friends actually need output schema enforcement still, or has all the RLVR training they do on generating code and JSON etc. made schema errors vanishingly unlikely? As a user of those models, I almost never see them make syntax mistakes when generating JSON or code; perhaps they still do output schema enforcement for "internal" things like tool call schemas though? I would just be surprised if it was actually catching that many errors. Maybe once in a while; LLMs are probabilistic after all.

(I get why you need structured generation for smaller LLMs, that makes sense.)

runeblaze · a month ago
Schemas can get pretty complex (and LLMs might not be the best at counting). Also, schemas are sometimes the first line of defense against the stochasticity of LLMs.

With that said, the models are pretty good at it.
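To make the guard-against-stochasticity point concrete, here is a minimal sketch of a schema check (the schema and field names are hypothetical). Even a model that almost never makes JSON *syntax* errors can still violate the *schema*: a missing key or a wrong type, which a validator catches before the output reaches downstream code.

```python
import json

# Illustrative schema: field name -> expected Python type.
SCHEMA = {"name": str, "age": int, "tags": list}

def validate(raw: str) -> dict:
    data = json.loads(raw)  # catches syntax errors (rare in SoTA models)
    # Catch schema drift, the more common failure mode:
    for key, typ in SCHEMA.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], typ):
            raise ValueError(f"wrong type for {key}: expected {typ.__name__}")
    return data

ok = validate('{"name": "Ada", "age": 36, "tags": ["math"]}')
print(ok["name"])  # -> Ada

try:
    validate('{"name": "Ada", "age": "36", "tags": []}')  # age is a string
except ValueError as e:
    print(e)  # -> wrong type for age: expected int
```

Real systems typically use a declarative validator (JSON Schema, Pydantic, etc.) instead of hand-rolled checks, but the failure modes it guards against are the same.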

runeblaze commented on “Erdos problem #728 was solved more or less autonomously by AI”   mathstodon.xyz/@tao/11585... · Posted by u/cod1r
jjmarr · a month ago
Post-training doesn't transfer over when a new base model arrives so anyone who adopted a task-specific LLM gets burned when a new generational advance comes out.
runeblaze · a month ago
Resources permitting, if you are chasing the frontier of some more niche task, you redo your training regime on the new-gen LLMs
runeblaze commented on “Erdos problem #728 was solved more or less autonomously by AI”   mathstodon.xyz/@tao/11585... · Posted by u/cod1r
jjmarr · a month ago
Seeing a task-specific model be consistently better at anything is extremely surprising given rapid innovation in foundation models.

Have you tried Aristotle on other, non-Lean tasks? Is it better at logical reasoning in general?

runeblaze · a month ago
Is it though? There is a reason GPT has Codex variants: RL on a specific task raises the performance on that task
runeblaze commented on Tesla sales fell by 9 percent in 2025, its second yearly decline   arstechnica.com/cars/2026... · Posted by u/rbanffy
Analemma_ · a month ago
Obviously we're just dueling anecdotes here, but FWIW, I'm a US tech worker who bought a Tesla in 2022 and certainly never will again. I have four friends with Teslas in tech and all of them say the same thing: never again. Replacement cycles for cars are so long that this will take a while to fully show up in the data, but I don't see growth anywhere in their future, especially when BYD is eating their lunch in seemingly every non-US market.
runeblaze · a month ago
Sure, "never again" is totally fair and I am sure a lot of people hate it. I was mostly objecting to the "radioactive" characterization. Your friends would say something more like "I am looking to sell my Tesla in 3 months" if it were truly radioactive.

Let’s be realistic in our portrayal here.

runeblaze commented on Tesla sales fell by 9 percent in 2025, its second yearly decline   arstechnica.com/cars/2026... · Posted by u/rbanffy
bpt3 · a month ago
They are considered radioactive by their primary target audience in the US, have been surpassed in various ways outside the US, seem to be focused on a few boondoggles internally rather than fixing what is broken in their core business, and their CEO has been distracted by other ventures.

I expect this decline to continue indefinitely. I also wonder when the stock price will reflect the company's past and projected results.

runeblaze · a month ago
I think radioactive is a strong word here… I have talked to a lot of people in tech
runeblaze commented on Critical vulnerability in LangChain – CVE-2025-68664   cyata.ai/blog/langgrinch-... · Posted by u/shahartal
avaer · 2 months ago
I somewhat take issue as a LangChain hater + Mastra lover with 20+ years of coding experience and coding awards to my name (which I don't care about, I only mention it for context).

Langchain is `left-pad` -- a big waste of your time, and Mastra is Next.js -- mostly saving you infrastructure boilerplate if you use it right.

But I think the primary difference is that Python is a very bad language for agent/LLM stuff (a static type system, streaming, isomorphic code, and a strong package management ecosystem are what you want, and Python is weak on all of them). And if for some ungodly reason you had to do it in Python, you'd avoid LangChain anyway so you could bolt on strong shim layers to fix Python's shortcomings in a way that won't break when you upgrade packages.
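A minimal sketch of what such a shim layer could look like (all names here are hypothetical, not any real SDK): application code depends only on a small typed interface, so vendor-package churn is confined to one adapter that can be rewritten on upgrade without touching the rest of the codebase.

```python
from dataclasses import dataclass
from typing import Protocol

class ChatClient(Protocol):
    """The stable, typed interface the rest of the app codes against."""
    def complete(self, prompt: str) -> str: ...

@dataclass
class FakeClient:
    """Stand-in for a real SDK adapter; only this class touches the vendor package."""
    canned: str

    def complete(self, prompt: str) -> str:
        return self.canned

def summarize(client: ChatClient, text: str) -> str:
    # App logic sees only ChatClient, never a concrete vendor client.
    return client.complete(f"Summarize: {text}")

print(summarize(FakeClient(canned="short summary"), "a long document..."))
# -> short summary
```

The same structural-typing pattern also makes the app trivially testable, since a fake client satisfies `ChatClient` without any mocking framework.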

Yes, I know there's LangChain.js. But at that point you might as well use something that isn't a port from Python.

> what would you say indicates a high quality candidate when they are discussing agent harnessing and orchestration?

Anything that shows they understand exactly how data flows through the system (because at some point you're gonna be debugging it). You can even do that with LangChain, but then all you'd be doing is complaining about LangChain.

runeblaze · 2 months ago
> And if for some ungodly reason you had to do it in Python

I literally invoke sglang and vllm in Python. You are supposed to (if not using them over-the-network) use the two fastest inference engines there are via Python.

runeblaze commented on Yann LeCun to depart Meta and launch AI startup focused on 'world models'   nasdaq.com/articles/metas... · Posted by u/MindBreaker2605
rapsey · 3 months ago
Yann was never a good fit for Meta.
runeblaze · 3 months ago
Agreed, I am surprised he was willing to stay this long. On paper he would have been a far better match at a place like pre-Gemini-era Google
runeblaze commented on Using Generative AI in Content Production   partnerhelp.netflixstudio... · Posted by u/CaRDiaK
goatsi · 3 months ago
If you ask the Adobe stock image generation for "Adventurer with a whip and hat portrait view , Brown leather hat, jacket, close-up"

It gives you an image of Harrison Ford dressed like Indiana Jones.

https://stock.adobe.com/ca/images/adventurer-with-a-whip-and...

runeblaze · 3 months ago
I don't know the data distribution, but are you sure that image was generated by an Adobe model? I can only see that it is in Stock and that it is tagged as AI-generated (that is, could it have been generated by some other model?)

Disclaimer: I used to work at Adobe GenAI. Opinions are of my own ofc.

runeblaze commented on Using Generative AI in Content Production   partnerhelp.netflixstudio... · Posted by u/CaRDiaK
kpw94 · 3 months ago
> one can absolutely check the text to remove all occurrences of Indiana Jones

How do you handle this kind of prompt:

“Generate an image of a daring, whip-wielding archaeologist and adventurer, wearing a fedora hat and leather jacket. Here's some back-story about him: With a sharp wit and a knack for languages, he travels the globe in search of ancient artifacts, often racing against rival treasure hunters and battling supernatural forces. His adventures are filled with narrow escapes, booby traps, and encounters with historical and mythical relics. He’s equally at home in a university lecture hall as he is in a jungle temple or a desert ruin, blending academic expertise with fearless action. His journey is as much about uncovering history’s secrets as it is about confronting his own fears and personal demons.”

Try copy-pasting it in any image generation model. It looks awfully like Indiana Jones for all my attempts, yet I've not referenced Indiana Jones even once!

runeblaze · 3 months ago
Emmmm sure, but throw this at a human artist who has not heard of Indiana Jones and see if they draw something similar.

u/runeblaze

Karma: 324 · Cake day: September 20, 2016
About
Yet another medium-sized creature prone to great ambition.

Applied type theorist gone haywire for GenAI-ish things. Previously a game-dev. https://runeblaze.github.io/

Reach me at `me ~at~ baqiaoliu.com`.
