huevosabio (u/huevosabio)

huevosabio commented on Ask HN: Relatively SoTA LLM Agents from Scratch? · Posted by u/solsane

huevosabio · 3 days ago

Olmo is, AFAIK, the only SOTA-ish model with fully open-source code and data. Their report is fantastic: https://www.datocms-assets.com/64837/1763662397-1763646865-o...

It should give you an idea of how hard it is to do a SOTA model from scratch!

If you relax the SOTA aspect, Karpathy's nanochat has you covered: https://github.com/karpathy/nanochat

huevosabio commented on Getting a Gemini API key is an exercise in frustration ankursethi.com/blog/gemin... · Posted by u/speckx

huevosabio · 4 days ago

This nonsense alone justifies the existence of OpenRouter.

huevosabio commented on If you're going to vibe code, why not do it in C? stephenramsay.net/posts/v... · Posted by u/sramsay

huevosabio · 5 days ago

> Wouldn’t a language designed for vibe coding naturally dispense with much of what is convenient and ergonomic for humans in favor of what is convenient and ergonomic for machines? Why not have it just write C? Or hell, why not x86 assembly?

In the game we're building, we generate, compile and run code (C#) in real time to let the player "train and command" their monster in creative ways. So, I've thought about this.

You need a language that is both popular and comes with a ton of built-in verification tooling.

The author correctly highlights the former, but dismisses the latter as being targeted at humans. I think it is even more important for LLMs!

These coding agents are excellent at generating plausible solutions, but they come with no guarantees whatsoever. So you need to pair them with a verifying system: unit tests, integration tests, static/type checks, formal methods, etc. The point is that without these "verifier" systems you are running an open loop, and your code will quickly devolve into nonsense [0].
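To make the "closed loop" concrete, here is a minimal sketch in Python (not our game's C# pipeline). It assumes a hypothetical `fake_llm` standing in for the model and a toy verifier that only checks that the code compiles and that `add(2, 3) == 5`. The idea is the same with a real type checker or test suite: generate, verify, feed the error back, and never accept unverified output.

```python
from typing import Callable, Optional

# Hypothetical stand-in for an LLM call: (task, feedback from last attempt) -> source code.
CodeGenerator = Callable[[str, Optional[str]], str]


def verify(source: str) -> Optional[str]:
    """Return None if the candidate passes every check, else an error message.

    The checks (does it compile? does add(2, 3) == 5?) are toy assumptions; a real
    verifier would be a type checker, a test suite, a linter, etc.
    """
    try:
        code = compile(source, "<candidate>", "exec")
    except SyntaxError as exc:
        return f"syntax error: {exc.msg}"
    namespace: dict = {}
    try:
        # NOTE: exec is NOT a sandbox; only run code you are willing to trust.
        exec(code, namespace)
        result = namespace["add"](2, 3)
    except Exception as exc:
        return f"test failure: {exc!r}"
    if result != 5:
        return f"test failure: add(2, 3) returned {result!r}, expected 5"
    return None


def closed_loop(task: str, generate: CodeGenerator, max_attempts: int = 3) -> Optional[str]:
    """Generate, verify, feed the error back; accept only verified code."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = generate(task, feedback)
        feedback = verify(candidate)
        if feedback is None:
            return candidate
    return None  # give up instead of returning unverified code


def fake_llm(task: str, feedback: Optional[str]) -> str:
    """Hypothetical model: plausible-but-wrong first, fixed once it sees feedback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


print(closed_loop("implement add(a, b)", fake_llm))
```

Without `verify`, the loop would happily ship the first (wrong) answer; that is the open loop.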

In my view, the best existing languages for vibe coding are:

- Rust: reasonably popular, very powerful and strict type system, excellent compiler error messages. If it compiles, you can be confident that a whole class of errors won't exist in your program. Best for "serious" programs, but probably requires more back-and-forth with the coding agent.
- TypeScript: extremely popular, powerful type system, ubiquitous. Best for rapid iteration.
- Luau: acceptably popular, but typed and embeddable. Best as a real-time scripting sandbox for LLMs (like our use case).

I think there is space for a "Vibe-Oriented Programming" language (VOP, as the author calls it), but the dust needs to settle a bit on LLM capabilities before we know how much popularity we can afford to give up (a new language starts with none!) in exchange for the verifiability we should endow it with. My bet is that something like AssemblyScript would be the way to go, i.e., something very, very similar to an existing popular typed language (TS) but with extra features that serve VOP needs.

Another aspect to consider besides verifiability is being able to analyze code incrementally. For structured outputs, we can already guarantee well-formed structures thanks to grammar-based sampling. There are papers studying how to use LSPs to guide LLM outputs at the token level [1]. We can imagine analyzers that also provide context as needed based on what the LLM is doing; for example, a recent project could trace all upstream and downstream information flow in a program thanks to Rust's ownership features [2].
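For anyone who hasn't seen grammar-based sampling, a toy sketch of the core trick: at every step, tokens the grammar would reject get their logit set to -inf before picking. Everything here is an assumption for illustration: a hypothetical 7-token vocabulary, a hand-written automaton for `[1,2,...]`-style lists, and `fake_logits` standing in for the model. Real implementations compile a full grammar into something like this automaton and apply the mask to the model's actual logits.

```python
import math

VOCAB = ["[", "]", ",", "1", "2", "x", "<eos>"]
DIGITS = {"1", "2"}
ACCEPT = 4  # DFA state reached after the closing "]"


def next_state(state, tok):
    """Toy grammar:  list := "[" DIGIT ("," DIGIT)* "]"  as a hand-written DFA.
    Returns the next state, or None if `tok` is not allowed in `state`."""
    if state == 0 and tok == "[":
        return 1
    if state in (1, 3) and tok in DIGITS:
        return 2
    if state == 2 and tok == ",":
        return 3
    if state == 2 and tok == "]":
        return ACCEPT
    return None


def fake_logits(prefix):
    """Hypothetical model scores: it likes "x" (invalid here) and prefers
    closing the list once the output is 4 tokens long."""
    scores = {"x": 5.0, "<eos>": 4.0, "1": 3.0, ",": 2.0, "]": 1.5, "[": 1.0, "2": 0.5}
    if len(prefix) >= 4:
        scores["]"] = 6.0
    return [scores[t] for t in VOCAB]


def constrained_greedy(logits_fn, max_len=20):
    """Greedy decoding where tokens the grammar forbids get a logit of -inf."""
    state, out = 0, []
    for _ in range(max_len):  # if max_len is hit, the output may be incomplete
        masked = []
        for tok, score in zip(VOCAB, logits_fn(out)):
            if tok == "<eos>":
                allowed = state == ACCEPT
            else:
                allowed = next_state(state, tok) is not None
            masked.append(score if allowed else -math.inf)
        tok = VOCAB[max(range(len(VOCAB)), key=masked.__getitem__)]
        if tok == "<eos>":
            break
        out.append(tok)
        state = next_state(state, tok)
    return "".join(out)


print(constrained_greedy(fake_logits))  # prints "[1,1]"
```

Unconstrained greedy decoding with the same scores would emit "x" forever; the mask is what makes the output well-formed by construction. An LSP-guided decoder is the same idea with a much smarter "allowed" check.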

Finally, the importance of an LLM-coding-friendly sandbox will only increase: we are already seeing Anthropic move towards having LLMs generate scripts that make tool calls rather than calling tools directly. And we know that verifiable outputs are easier to hill-climb. So coding will keep getting better and will probably mediate everything these agents do. I think this is why Anthropic bought Bun.

[0] Very much in the spirit of the LLM-Modulo framework: https://arxiv.org/pdf/2402.01817
[1] https://proceedings.neurips.cc/paper_files/paper/2023/file/6...
[2] https://cel.cs.brown.edu/paper/modular-information-flow-owne...

huevosabio commented on Implications of AI to schools twitter.com/karpathy/stat... · Posted by u/bilsbie

raincole · 20 days ago

I think this kind of approach is the root of (the US's) hustle culture. Instead of receiving a fair score, you get a zero and need to "hustle" and challenge your teacher.

The teacher effectively filtered out the shy boys/girls who are not brave enough to "hustle." Gracefully.

huevosabio · 19 days ago

Nah, the professor wasn't American (as is often the case) and she had a tricky situation. She had strong reasons to believe people were cheating and had to sort out who did and who did not in a swift way.

This has nothing to do with American hustle culture; it was just that professor's judgment.

huevosabio commented on Implications of AI to schools twitter.com/karpathy/stat... · Posted by u/bilsbie

ubj · 20 days ago

One of my students recently came to me with an interesting dilemma. His sister had written (without AI tools) an essay for another class, and her teacher told her that an "AI detection tool" had classified it as having been written by AI with "100% confidence". He was going to give her a zero on the assignment.

Putting aside the ludicrous confidence score, the student's question was: how could his sister convince the teacher she had actually written the essay herself? My only suggestion was for her to ask the teacher to sit down with her and have a 30-60 minute oral discussion on the essay so she could demonstrate she in fact knew the material. It's a dilemma that an increasing number of honest students will face, unfortunately.

huevosabio · 20 days ago

When I was in college, there was a cheating scandal for the final exam where somehow people got their hands on the hardest question of the exam.

The professor noticed it (presumably because the "show your work" was poor) and gave zero points on that question to everyone. If you went to complain about your grade, she would ask you to explain the answer right there in her office and work through the problem live.

I thought it was a clever and graceful way to deal with it.

huevosabio commented on WorldGen – Text to Immersive 3D Worlds meta.com/en-gb/blog/world... · Posted by u/smusamashah

huevosabio · 22 days ago

This is cool, but it seems much closer to 3D asset generation than to the kind of scene generation World Labs does.

huevosabio commented on Roblox CEO interview about child safety didn't go well kotaku.com/roblox-new-yor... · Posted by u/tobr

noitpmeder · 23 days ago

To say nothing of the Roblox situation, anyone else having a hard time reading this piece of "reporting"?

It reads, to me, as so obviously slanted and opinionated against Roblox from the outset. It's not trying to portray facts, it's clearly trying to make the reader interpret the situation in an anti-roblox light, instead of letting the reader arrive there on their own.

huevosabio · 23 days ago

Yes, I closed it immediately; it had the tone of a tabloid.

huevosabio commented on The Baumol Effect and Jevons paradox are related a16z.news/p/why-ac-is-che... · Posted by u/cubefox

CGMthrowaway · a month ago

Going back to econ 101 & supply/demand curves:

Jevons describes the supply curve moving out, resulting in increased quantity

Baumol describes the supply curve moving back, resulting in higher prices

huevosabio · a month ago

Yes, and while it is obvious why Jevons happens (efficiency shifts the supply curve), Baumol is less apparent because the cause is more indirect.
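A small worked example of the econ-101 framing above, under loudly stated assumptions: linear demand Q = a - bP and linear supply Q = c + dP, with all numbers made up. Shifting supply out (a higher intercept c, standing in for an efficiency gain) lowers the price and raises quantity; shifting it back (a lower c, standing in for costlier inputs) raises the price and lowers quantity.

```python
def equilibrium(a, b, c, d):
    """Linear demand Q = a - b*P and linear supply Q = c + d*P, with b, d > 0.
    Setting them equal: a - b*P = c + d*P  =>  P* = (a - c) / (b + d), Q* = c + d*P*."""
    p = (a - c) / (b + d)
    return p, c + d * p


base = equilibrium(100, 2, 10, 1)   # made-up starting market
out = equilibrium(100, 2, 40, 1)    # supply moves out (e.g. efficiency gain)
back = equilibrium(100, 2, -20, 1)  # supply moves back (e.g. costlier inputs)
print(base, out, back)
```

With these numbers the equilibrium goes from P=30, Q=40 to P=20, Q=60 when supply moves out, and to P=40, Q=20 when it moves back. The Baumol part is the indirect bit: nothing about the orchestra changed; its supply curve moves back because wages elsewhere went up.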

huevosabio commented on The Baumol Effect and Jevons paradox are related a16z.news/p/why-ac-is-che... · Posted by u/cubefox

parpfish · a month ago

so... where does the money go? is it insurance or have duplo block prices just gotten really out of hand?

huevosabio · a month ago

Landowners!

huevosabio commented on The Baumol Effect and Jevons paradox are related a16z.news/p/why-ac-is-che... · Posted by u/cubefox

huevosabio · a month ago

```

Each of these phenomena have a name: there’s Jevons Paradox, which means, “We’ll spend more on what gets more productive”, and there’s the Baumol Effect, which means, “We’ll spend more on what doesn’t get more productive.”

```

I don't think that's exactly right. Jevons says "we consume more of what gets more productive" and Baumol says "the unit cost increases for that which is less productive".

The typical example for Baumol is the orchestra (or live music) which is today much more expensive than in the 1800s. I don't think we spend more in aggregate than we did in the 1800s!
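The orchestra mechanism can be seen with a little arithmetic. A sketch under made-up assumptions (2% yearly wage growth driven by manufacturing productivity, 100 musician-hours per concert, starting wage 1.0): the widget's unit cost stays flat because productivity offsets wages, while the concert's unit cost tracks wages because it takes the same hours it always did.

```python
def baumol_unit_costs(wage_growth, years, widget_hours_0=1.0, concert_hours=100.0):
    """Return (cost per widget, cost per concert) after `years`, starting from a wage of 1.0.

    Assumptions (illustrative only): wages grow at `wage_growth` per year because
    manufacturing productivity grows at that rate, so hours per widget shrink by the
    same factor; a concert takes the same musician-hours every year, but musicians
    still have to be paid the going wage.
    """
    growth = (1 + wage_growth) ** years
    wage = 1.0 * growth
    widget_cost = wage * (widget_hours_0 / growth)  # productivity gains offset wage growth
    concert_cost = wage * concert_hours             # no productivity gain to offset it
    return widget_cost, concert_cost


for y in (0, 50, 100):
    widget, concert = baumol_unit_costs(0.02, y)
    print(f"year {y:3d}: widget {widget:.2f}, concert {concert:.2f}")
```

Note that this only says the unit cost of a concert rises; it says nothing about how many concerts people buy, which is why aggregate spending on live music need not go up.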

Edit as I continue reading:

```

Other goods and services, where AI has relatively less impact, will become more expensive - and we’ll consume more of them anyway.

```

This is definitely NOT the case. Basically, the author is saying we will consume more of everything, which is not true! We famously stopped using horses and all the associated industries.

The unit cost for horses, however, did increase!

What the author should be stating is that the new production bottlenecks will command a higher price and probably play a bigger role in the economy, but not everything gets to be a new bottleneck.