Readit News
jumpCastle commented on OpenAI o3 and o4-mini   openai.com/index/introduc... · Posted by u/maheshrijal
gallerdude · 4 months ago
For coding, I like the Aider polyglot benchmark, since it covers multiple programming languages.

Gemini 2.5 Pro gets 72.9%

o3 high gets 81.3%, o4-mini high gets 68.9%

jumpCastle · 4 months ago
It was a good benchmark until it entered the training set.
jumpCastle commented on OpenAI o3 and o4-mini   openai.com/index/introduc... · Posted by u/maheshrijal
ipsum2 · 4 months ago
Looks like a Claude Code clone.
jumpCastle · 4 months ago
But it's open source, like aider
jumpCastle commented on The Llama 4 herd   ai.meta.com/blog/llama-4-... · Posted by u/georgehill
nattaylor · 5 months ago
Is pre-training in FP8 new?

Also, 10M input token context is insane!

EDIT: https://huggingface.co/meta-llama/Llama-3.1-405B is BF16 so yes, it seems training in FP8 is new.

jumpCastle · 5 months ago
DeepSeek-V3 was trained in FP8
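For anyone curious what FP8 pre-training looks like in practice, here is a minimal sketch using NVIDIA's Transformer Engine. It assumes a CUDA GPU with FP8 support (e.g. Hopper) and the transformer_engine package installed; it is an illustration of the technique, not Meta's or DeepSeek's actual training code.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# A single FP8-capable linear layer standing in for a full model.
model = te.Linear(768, 3072, bias=True).cuda()
inp = torch.randn(2048, 768, device="cuda")

# Delayed-scaling recipe: HYBRID uses E4M3 in the forward pass and
# E5M2 for gradients, with scale factors tracked from recent maxima.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# Matmuls inside fp8_autocast run in FP8; master weights and
# optimizer state stay in higher precision.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

out.sum().backward()
```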
jumpCastle commented on The Llama 4 herd   ai.meta.com/blog/llama-4-... · Posted by u/georgehill
highfrequency · 5 months ago
Crazy that there are now five and a half companies that all have roughly state of the art LLMs.

> We developed a new training technique which we refer to as MetaP that allows us to reliably set critical model hyper-parameters such as per-layer learning rates and initialization scales. We found that chosen hyper-parameters transfer well across different values of batch size, model width, depth, and training tokens.

This sounds interesting. Anyone have a link to the paper or other documentation on MetaP?

jumpCastle · 5 months ago
It's quite similar to muP

https://github.com/microsoft/mup
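MetaP's details aren't public, but the core idea of muP is reparametrizing init scales and per-layer learning rates so that hyper-parameters tuned on a small proxy model transfer to a wider one. A rough sketch with the mup package (the model and widths here are made up for illustration):

```python
import torch.nn as nn
from mup import MuReadout, set_base_shapes, MuAdam

class MLP(nn.Module):
    def __init__(self, width):
        super().__init__()
        self.fc1 = nn.Linear(784, width)
        self.fc2 = nn.Linear(width, width)
        # MuReadout replaces the final nn.Linear so the output
        # layer is rescaled correctly as width grows.
        self.readout = MuReadout(width, 10)

    def forward(self, x):
        return self.readout(self.fc2(self.fc1(x).relu()).relu())

# Base/delta models tell mup which dimensions scale with width.
base, delta = MLP(width=8), MLP(width=16)
model = MLP(width=1024)
set_base_shapes(model, base, delta=delta)

# MuAdam applies muP's per-layer learning-rate scaling, so a
# learning rate tuned on the small model transfers to the wide one.
opt = MuAdam(model.parameters(), lr=3e-3)
```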

jumpCastle commented on Humans have caused 1.5 °C of long-term global warming according to new estimates   lancaster.ac.uk/news/huma... · Posted by u/gmays
oldstrangers · 9 months ago
Saving the planet doesn't make the stock prices go up, so no one will care.

Private companies are now getting their own nuclear power stations to power AI. We can't get new nuclear power for public use, but private for profit initiatives? Absolutely.

jumpCastle · 9 months ago
Stock prices cannot go up if the planet is destroyed
jumpCastle commented on Nvidia reportedly delays its next AI chip due to a design flaw   theverge.com/2024/8/3/242... · Posted by u/mgh2
jahewson · a year ago
It’s hard to be disciplined about a black box though. That’s one reason why we’re all speeding off at a thousand miles per hour on transformers: the architecture works, so why try other things?
jumpCastle · a year ago
Attention was invented because Bengio's lab had to be disciplined about a black box (Google had more compute)
jumpCastle commented on Exo: Run your own AI cluster at home with everyday devices   github.com/exo-explore/ex... · Posted by u/simonpure
PostOnce · a year ago
Maybe you want to conduct experiments that the cloud API doesn't allow for.

Perhaps you'd like to plug it into a toolchain that runs faster than API calls can be passed over the network? Eventually your edge hardware is going to be able to infer a lot faster than the 50ms+ per call to the cloud.

Maybe you would like to prevent the monopolists from gaining sole control of what may be the most impactful technology of the century.

Or perhaps you don't want to share your data with Microsoft & Other Evils (formerly known as don't be evil).

You might just like to work offline. Whole towns go offline, sometimes for days, just because of bad weather. Never mind war and infrastructure crises.

Or possibly you don't like that The Cloud model has a fervent, unshakeable belief in the propaganda of its masters. Maybe that propaganda will change one day, and not in your favor. Maybe you'd like to avoid that.

There are many more reasons in the possibility space than my limited imagination allows for.

jumpCastle · a year ago
Don't services like RunPod solve half of these concerns?
jumpCastle commented on Firefox 128 enables "privacy-preserving" ad measurements by default   mstdn.social/@Lokjo/11277... · Posted by u/3by7
bozey07 · a year ago
I suppose I should finally switch to Librewolf.

I really don't like Firefox forks, for the slow updates and because I do genuinely use some bleeding edge features, but I'm tired of Mozilla.

jumpCastle · a year ago
Are the updates really slow?
jumpCastle commented on Ilya Sutskever to leave OpenAI   twitter.com/ilyasut/statu... · Posted by u/wavelander
jonathankoren · a year ago
More like hustle culture’s “spend more time with the family”
jumpCastle · a year ago
Spend more time with my side projects.
jumpCastle commented on 2023 ACM Turing Prize awarded to Avi Wigderson   awards.acm.org/about/2023... · Posted by u/nanna
hinkley · a year ago
> Computer scientists have discovered a remarkable connection between randomness and computational difficulty (i.e., identifying natural problems that have no efficient algorithms). Working with colleagues, Wigderson authored a highly influential series of works on trading hardness for randomness. They proved that, under standard and widely believed computational assumptions, every probabilistic polynomial time algorithm can be efficiently derandomized (namely, made fully deterministic). In other words, randomness is not necessary for efficient computation. This sequence of works revolutionized our understanding of the role of randomness in computation, and the way we think about randomness.

How would I go about catching up with this aspect of his research? It’s not often that I’ve never heard of a Turing winner, but this guy is completely off of my radar.

jumpCastle · a year ago
You can try his book. https://www.math.ias.edu/avi/book
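For the curious, the flagship result in that line of work (Impagliazzo and Wigderson, 1997) can be stated roughly as:

```latex
% Hardness vs. randomness (Impagliazzo--Wigderson, 1997):
% if some problem computable in exponential time requires
% exponential-size circuits, then randomness buys nothing
% extra for polynomial-time computation.
\[
\exists\, L \in \mathrm{E} = \mathrm{DTIME}\!\left(2^{O(n)}\right)
\text{ requiring circuits of size } 2^{\Omega(n)}
\;\Longrightarrow\; \mathrm{P} = \mathrm{BPP}.
\]
```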

u/jumpCastle

Karma: 42 · Cake day: December 30, 2016
About
I design channel codes for flash memories.