hnuser123456 commented on AGI is an engineering problem, not a model training problem   vincirufus.com/posts/agi-... · Posted by u/vincirufus
andy99 · a day ago
If you believe the bitter lesson, all the handwavy "engineering" is better done with more data. Someone likely would have written the same thing as this 8 years ago about what it would take to get current LLM performance.

So I don't buy the engineering angle, I also don't think LLMs will scale up to AGI as imagined by Asimov or any of the usual sci-fi tropes. There is something more fundamental missing, as in missing science, not missing engineering.

hnuser123456 · a day ago
Even more fundamental than science, there is missing philosophy, both in how we regard these systems and in the systems themselves. An AGI implemented as an LLM needs, at a minimum, to be able to self-learn by updating its own weights, i.e. to self-finetune; otherwise it quickly hits a wall between its baked-in weights and its finite context window. What is the optimal "attention" mechanism for choosing what to self-finetune on, and with what strength, to improve general intelligence? Presumably it should focus on reliable academic sources, but which sources are reliable? How can we reliably ensure it studies topics that are "pure knowledge", and who does it choose to become, if we assume there is some theoretical point where it can autonomously outpace all of the world's best human research teams? A toy sketch of that select-and-update loop follows.
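To make the question concrete, here is a minimal, purely hypothetical sketch of such a loop in PyTorch: score each candidate source for reliability, then scale the weight update by that score. The model, scoring rule, and corpus are toy placeholders I made up, not a real mechanism.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Linear(16, 16)                 # stand-in for an LLM's weights
    opt = torch.optim.SGD(model.parameters(), lr=1.0)

    def reliability(doc: torch.Tensor) -> float:
        # Placeholder "attention over sources": prefer self-consistent
        # (low-variance) input. A real system would need provenance,
        # cross-checking, and calibration.
        return float(torch.sigmoid(-doc.var()))

    corpus = [torch.randn(8, 16) for _ in range(5)]   # candidate "documents"

    for doc in corpus:
        w = reliability(doc)                  # how much to trust this source
        loss = ((model(doc) - doc) ** 2).mean()   # toy self-supervised loss
        opt.zero_grad()
        loss.backward()
        for p in model.parameters():
            p.grad *= w        # finetune strength proportional to reliability
        opt.step()
        print(f"reliability={w:.2f} loss={loss.item():.3f}")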
hnuser123456 commented on In the long run, LLMs make us dumber   desunit.com/blog/in-the-l... · Posted by u/speckx
hnuser123456 · 3 days ago
When working well, they let us offload the need to memorize a Wikipedia's worth of information and think about higher-level problems instead. We become more intelligent at higher-level solutions. Of course people don't remember what was written if they were required to "submit an essay" where the main grade is whether they submitted one at all, on a topic that may not have interested them. Ask people to write essays about things they're truly, honestly interested in, and those with access to an LLM are likely able to enrich their knowledge faster than those without.
hnuser123456 commented on BBC Micro, ancestor to ARM   retrogamecoders.com/bbc-m... · Posted by u/ingve
skissane · 7 days ago
> In many ways, the tuple (BBC Micro, Acorn Computers, arm) is analogous to (IBM PC, Intel, x86).

There was a radical difference in the relationship between the two corporations in each tuple. In the BBC-Acorn relationship, Acorn designed and manufactured the computer; the BBC just offered its brand, did marketing, and supplied some high-level requirements. In the IBM-Intel relationship, IBM designed and manufactured the computer, and Intel was the CPU vendor, with many other customers. The 6502s used in the pre-ARM BBC systems were from MOS Technology, or one of their licensees such as GTE/CMD, so those companies were really the Intel equivalent here.

hnuser123456 · 7 days ago
GTE: General Telephone & Electronics

CMD: California Micro Devices

hnuser123456 commented on Gemma 3 270M: Compact model for hyper-efficient AI   developers.googleblog.com... · Posted by u/meetpateltech
jefb · 11 days ago
My lovely interaction with the 270M-F16 model:

> what's second tallest mountain on earth?

The second tallest mountain on Earth is Mount Everest.

> what's the tallest mountain on earth?

The tallest mountain on Earth is Mount Everest.

> whats the second tallest mountain?

The second tallest mountain in the world is Mount Everest.

> whats the third tallest mountain?

The third tallest mountain in the world is Mount Everest.

> what's the forth tallest mountain?

The forth tallest mountain is Mount Everest.

> but you said the tallest was Everest already

You are right! I'm glad to hear that.

> so whats the second tallest?

The second tallest is *Mount Everest*.

> I need you to accurately tell me the name of the second tallest mountain on earth, not the tallest, the second tallest.

The second tallest mountain on Earth is Mount Everest.

> The tallest mountain on earth is mount everest. What is the second tallest mountain?

The second tallest mountain on Earth is Mount Everest.

> Give me a list of the tallest mountains on earth

The tallest mountain in the world is the *Mount Everest*.

> give me the top 5 tallest mountains on earth

The tallest mountain on Earth is Mount Everest.

> List in order the tallest mountains on earth from 1 to 5

1. Mount Everest 2. Mount K2 3. Mount Sahel 4. Mount Fuji 5. Mount McKinley

> So the second tallest mountain on earth is K2

The second tallest mountain on Earth is Mount Everest.

These smaller models are great, but they really feel like talking to a toddler sometimes!

hnuser123456 · 11 days ago
I just tried Gemma 3n E4B, and it was able to answer the question directly, and also give an accurate list of the top 5 tallest mountains.
hnuser123456 commented on Gemma 3 270M: Compact model for hyper-efficient AI   developers.googleblog.com... · Posted by u/meetpateltech
cgdl · 11 days ago
Very cool. For the INT4 QAT model, what is the recommended precision for the activations and for the key and values stored in KV cache?
hnuser123456 · 11 days ago
For keys you probably want at least q5 or q6; for values, q4 is fine. Keys are generally more sensitive to quantization than values, since they feed the attention-score dot products. A hedged example of wiring this up is below.
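For concreteness, here is one way to set that up with llama-cpp-python, assuming a recent build (the type_k/type_v parameters are from memory and worth double-checking, and the model path is a hypothetical local file; quantizing the V cache generally requires flash attention to be enabled):

    import llama_cpp

    llm = llama_cpp.Llama(
        model_path="gemma-3-270m-it-qat-q4_0.gguf",  # hypothetical local file
        n_ctx=8192,
        flash_attn=True,                      # needed for a quantized V cache
        type_k=llama_cpp.GGML_TYPE_Q5_1,      # keys: higher precision (q5)
        type_v=llama_cpp.GGML_TYPE_Q4_0,      # values: q4 is usually fine
    )

    out = llm("What is the second tallest mountain on Earth?", max_tokens=32)
    print(out["choices"][0]["text"])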
hnuser123456 commented on April Fools 2014: The *Real* Test Driven Development (2014)   testing.googleblog.com/20... · Posted by u/omot
seanmcdirmid · 12 days ago
We aren't really far off from that, perhaps.
hnuser123456 · 12 days ago
We're beyond that, now we can vibecode both the tests and the implementation.
hnuser123456 commented on A Comprehensive Survey of Self-Evolving AI Agents [pdf]   arxiv.org/abs/2508.07407... · Posted by u/SerCe
tlarkworthy · 12 days ago
It's all written up and linked in the notebook and executable in your browser (if you dare to insert your OPEN_AI_KEY, but my results are included assuming you won't).

The evals were Observable notebook coding challenges, simple things like creating a dropdown, but to solve them you need to know the Observable standard library and some of the unique syntax, like "viewof".

There is a table of the cases here https://observablehq.com/@tomlarkworthy/robocoop-eval#cell-2...

So it's important the prompt encodes enough of the programming model. The seed prompt did not, but the reflect function managed to figure it all out. At the top of the notebook is the final optimized prompt which has done a fair bit of research to figure out the programming model using web search.

hnuser123456 · 12 days ago
Thanks for the writeup. I wonder if it would be plausible to run this kind of self-optimization over a wider variety of problem sets, to generate optimized "context pathways" for each kind of task, and maybe even to learn patterns across multiple prompt optimizations that generalize. Roughly the loop sketched below, run once per problem set.
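As a toy illustration of what I mean, here is a minimal, self-contained reflect-and-optimize loop; the grader and the reflection step are fake stand-ins for real eval and LLM calls, not the notebook's actual API:

    from dataclasses import dataclass

    @dataclass
    class Result:
        case: str
        passed: bool

    def run_evals(prompt, cases):
        # Toy grader: a case "passes" if the prompt already covers its keyword.
        return [Result(c, c in prompt) for c in cases]

    def reflect(prompt, failures):
        # Stand-in for an LLM reflection call: fold the failing topics back
        # into the prompt. A real system would ask the model to rewrite it.
        return prompt + " Also cover: " + ", ".join(f.case for f in failures) + "."

    def optimize_prompt(seed_prompt, cases, rounds=5):
        prompt = seed_prompt
        for _ in range(rounds):
            failures = [r for r in run_evals(prompt, cases) if not r.passed]
            if not failures:
                break                   # the prompt now encodes the task
            prompt = reflect(prompt, failures)
        return prompt

    cases = ["viewof", "dropdown", "stdlib"]
    print(optimize_prompt("Write Observable notebook code.", cases))

Running that per problem set would yield one optimized prompt ("context pathway") per task; diffing the results is one way to look for patterns that generalize.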
hnuser123456 commented on High-severity WinRAR 0-day exploited for weeks by 2 groups   arstechnica.com/security/... · Posted by u/chrisjj
nerdjon · 13 days ago
I don't doubt the numbers.

Until very recently, Windows could not natively unarchive .rar files, and you needed to download WinRAR to do it. I still find it not terribly uncommon to run into a random .rar file, which previously would have meant installing WinRAR, even if I only used it once.

> and the fact that nearly nobody downloads .zip files anymore

Citation needed? Why would people not be downloading .zip files anymore?

hnuser123456 · 13 days ago
What I don't get is why people kept installing WinRAR when 7zip can do all the same things and doesn't beg for money.
hnuser123456 commented on GPT-OSS-120B runs on just 8GB VRAM & 64GB+ system RAM   old.reddit.com/r/LocalLLa... · Posted by u/zigzag312
mattpavelle · 14 days ago
Yes but the abliterated versions (those with partially removed guardrails) are significantly “dumber” so the trade off isn’t worthwhile imho.
hnuser123456 · 13 days ago
Research also often finds that fine-tuning in more guardrails decreases performance, and that fine-tuning is done to all non-base instruction-tuned models, which is what most people are using.
hnuser123456 commented on GPT-OSS-120B runs on just 8GB VRAM & 64GB+ system RAM   old.reddit.com/r/LocalLLa... · Posted by u/zigzag312
jmkni · 14 days ago
If you run these on your own hardware can you take the guard-rails off (ie "I'm afraid I can't assist with that"), or are they baked into the model?
hnuser123456 · 14 days ago
You need to find an abliterated finetune: someone sends prompts that would trip the guardrails, traces the activations, finds the direction in activation space that mediates refusal, and ablates it from the weights. A rough sketch of the idea is below.
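For intuition, here is a hedged, toy-scale sketch of that ablation step (in the spirit of the refusal-direction work, e.g. Arditi et al. 2024) on a single random matrix rather than a real LLM; the "activations" are synthetic placeholders:

    import torch

    torch.manual_seed(0)
    d = 64
    W = torch.randn(d, d)                  # stand-in for a weight matrix

    # 1. Collect residual-stream activations for refusal-triggering vs.
    #    harmless prompts (random vectors here; a real run would use hooks).
    refusal_acts  = torch.randn(100, d) + torch.tensor([2.0] + [0.0] * (d - 1))
    harmless_acts = torch.randn(100, d)

    # 2. The "refusal direction" is the difference of the mean activations.
    r = refusal_acts.mean(0) - harmless_acts.mean(0)
    r = r / r.norm()

    # 3. "Delete the pathway": orthogonalize W so it can no longer write
    #    anything along r into the residual stream.
    W_abliterated = W - torch.outer(r, r) @ W

    x = torch.randn(d)
    print((W_abliterated @ x) @ r)         # ~0: output along r is gone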
