Readit News
nbardy commented on A quarter of US-trained scientists eventually leave   arxiv.org/abs/2512.11146... · Posted by u/bikenaga
chazeon · a day ago
Well, that's a very misleading claim. If US immigration policy weren't this hostile to populous countries, more Chinese would want to stay.
nbardy · a day ago
When are people going to drop the assumption that immigration is good at all costs?

We need a well-managed set of immigration policies or other countries WILL take advantage of the US. These are our military rivals, and we sell our most advanced math, physics, and engineering seats to the highest bidder. It's a self-destructing disaster, and it's not just on us to treat people better.

Look at the rate of Indian asylum seekers in Canada to see the most extreme case. It happens anywhere you extend naivety and boundless good will.

nbardy commented on GPT-5.2   openai.com/index/introduc... · Posted by u/atgctg
nbardy · 5 days ago
Those ARC-AGI-2 improvements are insane.

That's especially encouraging to me because those are all about generalization.

5 and 5.1 both felt overfit and would break down and get stubborn when you pushed them outside their lane, as opposed to Opus 4.5, which is lovely at self-correcting.

It's one of those things you really feel in the model: not whether it can tackle a harder problem, but whether I can go back and forth with it, learning and correcting together.

This whole release makes me insanely optimistic. If they can push this much improvement WITHOUT the new huge data centers and without a newly scaled-up base model, that's incredibly encouraging for what comes next.

Remember, the next big data centers are 20-30x the chip count at 6-8x the efficiency on the new chips.

I expect they can saturate the benchmarks WITHOUT any novel research or algorithmic gains. But at this point it's clear they're capable of pushing research forward qualitatively as well.

nbardy commented on Anthropic taps IPO lawyers as it races OpenAI to go public   ft.com/content/3254fa30-5... · Posted by u/GeorgeWoff25
nbardy · 14 days ago
You haven't actually looked at their fundamentals. They're profitable serving current models (including training costs) and are only losing money on R&D training for future models; if you project revenue growth onto future generations of models, you get a clear path to profitability.

They charge higher prices than OpenAI and have faster-growing API demand. They have great margins on inference compared to the rest of the industry.

Sure, the revenue growth could stop, but it hasn't, and there is no reason to think it will.

nbardy commented on Anthropic taps IPO lawyers as it races OpenAI to go public   ft.com/content/3254fa30-5... · Posted by u/GeorgeWoff25
HarHarVeryFunny · 14 days ago
It's interesting that Amazon don't appear interested in acquiring Anthropic, which would have seemed like somewhat of a natural fit given that they are already partnered, Anthropic have apparently optimized (or at least adapted) for Trainium, and Amazon don't have their own frontier model.

It seems that Amazon are playing this much like Microsoft - seeing themselves as more of a cloud provider, happy to serve anyone's models, and perhaps only putting a moderate effort into building their own models (which they'll be happy to serve to those who want that capability/price point).

I don't see the pure "AI" plays like OpenAI and Anthropic being able to survive as independent companies when they are competing against the likes of Google, and with Microsoft and Amazon happy to serve whatever future model comes along.

nbardy · 14 days ago
Why are you assuming Anthropic is for sale? They have a clear path to profitability, booming growth, and a massive and mission driven founding team.

They could make more money by keeping the company independent, and they'd retain control on top of it.

nbardy commented on OpenAI declares 'code red' as Google catches up in AI race   theverge.com/news/836212/... · Posted by u/goplayoutside
RossBencina · 14 days ago
The SemiAnalysis article that you linked to stated:

"OpenAI’s leading researchers have not completed a successful full-scale pre-training run that was broadly deployed for a new frontier model since GPT-4o in May 2024, highlighting the significant technical hurdle that Google’s TPU fleet has managed to overcome."

Given the overall quality of the article, that is an uncharacteristically convoluted sentence. At the risk of stating the obvious, "that was broadly deployed" (or not) is contingent on many factors, most of which are not of the GPU vs. TPU technical variety.

nbardy · 14 days ago
This is misleading. They had 4.5, which was a new scaled-up training run. It was a huge model and was only served to Pro users, but the biggest models are always used as teacher models for smaller models. That's how you do distillation. It would be stupid not to use the biggest model you have for distillation, and a waste, since they already have the weights.

They would have taken some time to calculate the efficiency gains of pretraining vs. RL, resumed the GPT-4.5 run for whatever budget made sense, and then spent the rest on RL.

Sure, they chose not to serve the large base models anymore, for cost reasons.

But I'd guess Google is doing the same. Gemini 2.5 samples very fast and seems way too small to be their base pretrain. The efficiency gains in pretraining scale with model scale, so it makes sense to train the largest model possible. But then the models end up super sparse and oversized and make little sense to serve for inference without distillation.

In RL the efficiency is very different, because you have to run inference on the model to draw online samples, so smaller models start to make more sense to scale.

Big model => distill => RL

Makes the most theoretical sense for training nowadays if you want efficient spending.

So they already did train a big model, 4.5. Not using it would have been absurd, and they have a known recipe they could go back to scaling if the returns justified it.
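
To make the "big model => distill" step concrete, here is a minimal soft-label knowledge-distillation sketch: a small student is trained to match a frozen teacher's token distribution. This is just an illustration of the general technique; the temperature, KL-on-logits setup, and sizes are my assumptions, not a claim about what OpenAI or Google actually run.

    import torch
    import torch.nn.functional as F

    def distill_loss(student_logits, teacher_logits, temperature=2.0):
        # KL divergence between temperature-softened teacher and student distributions.
        t = temperature
        teacher_probs = F.softmax(teacher_logits / t, dim=-1)
        student_log_probs = F.log_softmax(student_logits / t, dim=-1)
        # Scale by t^2 so gradient magnitude stays comparable across temperatures.
        return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

    # Toy usage: a batch of 4 positions over a 50k-token vocabulary.
    vocab = 50_000
    teacher_logits = torch.randn(4, vocab)                      # from the frozen big teacher
    student_logits = torch.randn(4, vocab, requires_grad=True)  # the small student, to be RL'd later
    loss = distill_loss(student_logits, teacher_logits)
    loss.backward()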

nbardy commented on TiDAR: Think in Diffusion, Talk in Autoregression   arxiv.org/abs/2511.08923... · Posted by u/internetguy
euleriancon · 25 days ago
Diffusion LMs do seem to be able to get more out of the same data. In a world where we are already training transformer-based LLMs on all available text, diffusion LMs' ability to continue learning on a fixed set of data may let them outperform transformers.

https://arxiv.org/abs/2511.03276

nbardy · 25 days ago
There's another paper showing you can get the same effect by training an autoregressive model on fill-in-the-middle data.

So it's more about the masked-modeling objective than about diffusion.
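
For what it's worth, here is a rough sketch of how a fill-in-the-middle example can be built for a plain autoregressive model: cut the document into prefix/middle/suffix, then rearrange so the middle is predicted last. The sentinel strings are placeholders I made up; real recipes use dedicated special tokens.

    import random

    PRE, SUF, MID = "<|fim_pre|>", "<|fim_suf|>", "<|fim_mid|>"

    def make_fim_example(text: str, rng: random.Random) -> str:
        # Pick two cut points, splitting the document into prefix / middle / suffix.
        i, j = sorted(rng.sample(range(len(text) + 1), 2))
        prefix, middle, suffix = text[:i], text[i:j], text[j:]
        # Prefix and suffix come first; the middle is generated left-to-right at the end,
        # which is a masked-span objective expressed autoregressively.
        return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

    rng = random.Random(0)
    print(make_fim_example("def add(a, b):\n    return a + b\n", rng))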

nbardy commented on AI Exponentializes Your Tech Debt   vincentschmalbach.com/ai-... · Posted by u/vincent_s
nbardy · 25 days ago
This is so not a problem.

After a long stretch of coding I just ask Codex to: audit the codebase and recent changes, and describe the data model and all related and possibly overlapping functions.

Plan a new redesign that is simpler, has less code, more reuse, and a cleaner design.

Execute the refactor. Review the code, assess the new code, and re-audit.

… repeat

You can queue these up in Codex and it will just go about its way, reducing your tech debt way faster than an engineer could (rough sketch of the queue below).
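
Rough sketch of what that queue can look like when driven from a script. It assumes a non-interactive `codex exec <prompt>` invocation, which may not match your setup exactly; swap in whatever your agent CLI actually accepts.

    import subprocess

    PROMPTS = [
        "Audit the codebase and recent changes; describe the data model and all "
        "related, possibly overlapping functions.",
        "Plan a redesign that is simpler, has less code, more reuse, and a cleaner design.",
        "Execute the refactor from the plan.",
        "Review and assess the new code, then re-audit.",
    ]

    def run_cycle(rounds: int = 3) -> None:
        for _ in range(rounds):
            for prompt in PROMPTS:
                # Each step runs to completion before the next one is queued.
                subprocess.run(["codex", "exec", prompt], check=True)

    if __name__ == "__main__":
        run_cycle()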

nbardy commented on FAWK: LLMs can write a language interpreter   martin.janiczek.cz/2025/1... · Posted by u/todsacerdoti
nbardy · a month ago
They have been able to write languages for two years now.

I think I was the first to write an LLM language, and the first to use LLMs to write a language, with this project (right at ChatGPT launch, on GPT-3.5): https://github.com/nbardy/SynesthesiaLisp

nbardy commented on Building more with GPT-5.1-Codex-Max   openai.com/index/gpt-5-1-... · Posted by u/hansonw
boole1854 · a month ago
Today I did some comparisons of GPT-5.1-Codex-Max (on high) in the Codex CLI versus Gemini 3 Pro in the Gemini CLI.

- As a general observation, Gemini is less easy to work with as a collaborator. If I ask the same question to both models, Codex will answer the question. Gemini will read some intention behind the question, write code to implement the intention, and only then answer the question. In one case, it took me five rounds of repeatedly rewriting my prompt in various ways before I could get it to not code but just answer the question.

- Subjectively, it seemed to me that the code that Gemini wrote was more similar to code that I, as a senior-level developer, would have written than what I have been used to from recent iterations of GPT-5.1. The code seemed more readable-by-default and not merely technically correct. I was happy to see this.

- Gemini seems to have a tendency to put its "internal dialogue" into comments. For example, "// Here we will do X because of reason Y. Wait, the plan calls for Z instead. Ok, we'll do Z.". Very annoying.

I did two concrete head-to-head comparisons where both models had the same code and the same prompt.

First, both models were told to take a high-level overview of some new functionality that we needed and were told to create a detailed plan for implementing it. Both models' plans were then reviewed by me and also by both models (in fresh conversations). All three of us agreed that Codex's plan was better. In particular, Codex was better at being more comprehensive and at understanding how to integrate the new functionality more naturally into the existing code.

Then (in fresh conversations), both models were told to implement that plan. Afterwards, again, all three of us compared the resulting solutions. And, again, all three of us agreed that Codex's implementation was better.

Notably, Gemini (1) hallucinated database column names, (2) ignored parts of the functionality that the plan called for, and (3) did not produce code that was integrated as well with the existing codebase. In its favor, it did produce a better version of a particular finance-related calculation function than Codex did.

Overall, Codex was the clear winner today. Hallucinations and ignored requirements are big problems that are very annoying to deal with when they happen. Additionally, Gemini's tendencies to include odd comments and to jump past the discussion phase of projects both make it more frustrating to work with, at this stage.

nbardy · a month ago
Yeah, I can't get Gemini to stop and think; even if I tell it not to write code, it will rewrite the code block each time.

nbardy commented on Gemini 3   blog.google/products/gemi... · Posted by u/preek
stephc_int13 · a month ago
What I would do if I were in the position of a large company in this space is arrange an internal team to create an ARC replica, covering very similar puzzles, and use that as part of the training.

Ultimately, most benchmarks can be gamed and their real utility is thus short-lived.

But I also think it's fair to use any means to beat it.

nbardy · a month ago
This isn't gaming the benchmark, though. If training on similar data generalizes, that's called learning. Training on the exact set is memorization.

There are, in fact, teams creating puzzles to RL against as training environments, since it's beneficial to RL training and in particular compute-efficient if you schedule the environment difficulty throughout training (sketch below). There was a great recent paper on this. Creating environment data that generalizes outside the environment is a challenging engineering task, and super valuable whether it looks like ARC-AGI or not.

Also, ARC-AGI is general enough that if you create similar data you're just creating generic visual-puzzle data. Should all visual puzzle data be off limits?
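
On the difficulty-scheduling point above, here is an illustrative sketch (made up by me, not taken from the paper) of ramping puzzle-environment difficulty once the policy's recent success rate clears a threshold. The level, grid, and threshold parameters are arbitrary placeholders.

    from collections import deque

    class DifficultyScheduler:
        def __init__(self, levels=10, window=200, promote_at=0.8):
            self.level = 1
            self.levels = levels
            self.recent = deque(maxlen=window)   # rolling record of solved/failed episodes
            self.promote_at = promote_at

        def record(self, solved: bool) -> None:
            self.recent.append(solved)
            full = len(self.recent) == self.recent.maxlen
            if full and sum(self.recent) / len(self.recent) >= self.promote_at:
                self.level = min(self.level + 1, self.levels)   # promote to harder puzzles
                self.recent.clear()

        def sample_env_config(self) -> dict:
            # Harder levels mean bigger grids and more transformation rules per puzzle.
            return {"grid_size": 4 + self.level, "num_rules": self.level}

    sched = DifficultyScheduler()
    for episode in range(1000):
        cfg = sched.sample_env_config()    # build a puzzle at the current difficulty
        solved = episode % 3 != 0          # stand-in for the policy's actual outcome
        sched.record(solved)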

u/nbardy

Karma: 728 · Cake day: March 28, 2014
About
nbardy @ GitHub

Multimodal models

Language models for art
