sottol commented on Gemini with Deep Think achieves gold-medal standard at the IMO   deepmind.google/discover/... · Posted by u/meetpateltech
vonneumannstan · a month ago
>it seems that the answer to whether or not a general model could perform such a feat is that the models were trained specifically on IMO problems, which is what a number of folks expected.

Not sure that's exactly what that means. It's already likely the case that these models contained IMO problems and solutions from pretraining. It's possible this means they were present in the system prompt or something similar.

sottol · a month ago
Or that they did significant retraining to boost IMO performance, creating a more specialized model at the cost of general-purpose performance.
sottol commented on AI coding tools can reduce productivity   secondthoughts.ai/p/ai-co... · Posted by u/gk1
latenightcoding · a month ago
LLMs make me 10-20x more productive in frontend work, which I barely do. But when it comes to low-level stuff (C/C++) I personally don't find them too useful. They just replace my need to search Stack Overflow.

edit: should have mentioned the low-level stuff I work on is mature code and a lot of times novel.

sottol · a month ago
Interesting, I find the exact opposite, although to a much lesser extent (maybe a 50% boost).

I ended up shoehorned into backend dev in Ruby/Py/Java and don't find it improves my day to day a lot.

Specifically in C, it can bang out complicated but mostly common data structures without fault where I would surely make off-by-one errors. I guess since I do C as a hobby I tend to solve more interesting and complicated problems, like generating a whole array of dynamic C dispatchers from a UI-library spec in JSON that allows parsing and rendering a UI specified in YAML. Gemini Pro even spat out a YAML-dialect parser after a few attempts/fixes.

Maybe it's a function of familiarity and the problems you end up using the AI for.

sottol commented on Blind spots on American cars are expanding   usa.streetsblog.org/2025/... · Posted by u/anigbrowl
bob1029 · 2 months ago
> and regulators aren't stepping in.

Regulators are the reason for this.

This article conveniently omits the reason for the gigantic A-pillars: other safety regulations that enforce a certain coverage of airbags for the passengers. We can't magically regulate this one away. These kinds of higher-order consequences tend to be a really painful, gradual realization.

I would gladly purchase a new vehicle with zero airbags in it if I were allowed to. Especially if the tradeoff is a 50% buff to visibility in the corners. I would also happily sign a form that locks up my vehicle's title for all eternity and prohibits any form of resale to satisfy the safety-at-all-costs extremists who caused this mess in the first place.

sottol · 2 months ago
I don't know that I 100% agree. I bet the A-pillar is for safety, but hoods and grilles are also getting so tall that some reports indicate the front blind spot can be as large as 16 feet! These grilles are also more adept at killing pedestrians. I think it's partially because US safety is focused on occupants and ignores anyone outside the car, afaict.

What I'm seeing in the Suburban example graph in the article is that the vehicle and hood have gotten way taller... I don't know how hoods/grilles this high improve safety - I assume it's mostly the opposite. But they do "look rugged/beefy" - like all trucks and SUVs have to in order to sell - just look at the difference! [1]

"Millions of SUVs, trucks have dangerous front blind zone" [2]

[1] https://static1.hotcarsimages.com/wordpress/wp-content/uploa...

[2] https://www.nbcnews.com/news/us-news/americas-cars-trucks-ar... (or all the other writeups of this report)

sottol commented on I fought in Ukraine and here's why FPV drones kind of suck   warontherocks.com/2025/06... · Posted by u/_tk_
throwawayffffas · 2 months ago
I think you need to compare it to other man-portable guided weapons like the FGM-148 Javelin. The Javelin is much, much better in all respects, except perhaps range, but it is about 100-200 times more expensive.

If you can afford* the Javelins and the TOWs of the world, that's what you are going to use; otherwise, you are stuck with FPVs.

* Afford means not only fiscally, but in terms of production capacity as well.

sottol · 2 months ago
Doesn't a single Javelin missile cost almost $200k? The drones I've seen I'd budget at $150-300 plus explosives. I think that puts the Javelin more at 500-1000x as expensive, imo.
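
As a rough check of the ratio in the comment above, here is a minimal sketch using only the ballpark figures quoted there (about $200k per missile, $150-300 per drone). These are the commenter's estimates, not verified prices, and the explosive payload is left out because no cost is given for it.

```python
# Rough cost ratio between one Javelin missile and one FPV drone.
# All figures are the ballpark numbers quoted in the comment (assumptions,
# not verified prices); the explosive payload is not included.
JAVELIN_COST_USD = 200_000
DRONE_COST_RANGE_USD = (150, 300)


def cost_ratio(expensive: float, cheap: float) -> float:
    """How many times more expensive the first item is than the second."""
    return expensive / cheap


# The smallest ratio comes from the priciest drone, the largest from the cheapest.
low = cost_ratio(JAVELIN_COST_USD, DRONE_COST_RANGE_USD[1])
high = cost_ratio(JAVELIN_COST_USD, DRONE_COST_RANGE_USD[0])
print(f"about {low:.0f}x to {high:.0f}x")
```

Without the payload the ratio comes out somewhat above 500-1000x; adding a payload cost to each drone pulls it back down toward that range.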
sottol commented on Japan builds near $700M fund to lure foreign academic talent   theregister.com/2025/06/1... · Posted by u/rntn
alephnerd · 2 months ago
It's too little.

Even countries like India are offering $50K-100K lab seed grants for western educated academics from the diaspora in high impact fields to take tenure track roles at major institutes, while offering free housing (as in an actual house) and a $15-30k salary depending on experience.

These EU programs are pennywise and pound foolish, and fail to incorporate private sector players or partnerships, and the lack of English fluency and established communities from a number of overrepresented countries in STEM makes the EU not as enticing.

You may as well go to America and earn the top dollar, or go to Australia, Japan, or Korea where you will earn a Western European salary but US level grants and have added cultural competency.

sottol · 2 months ago
How do the EU/Indian programs compare, e.g. in monetary outlay? It's hard to find numbers.
sottol commented on Japan builds near $700M fund to lure foreign academic talent   theregister.com/2025/06/1... · Posted by u/rntn
comrade1234 · 2 months ago
The EU announced a $500 million fund for this a few weeks ago. A couple of weeks ago Microsoft announced a $400 million investment in cloud computing here in Switzerland. Shows how ineffective $500 million is going to be...
sottol · 2 months ago
Why? $500M could pay 1000 scientists/academics (up to) a $50k pay-bump for 10 years each. Even if 50% would be lost to "administrative overhead", $25k over the usual EU market rate (which is lower than US) per year per scientist might entice many to move.

I don't know how this program would be structured, but imo this program is not doomed to fail due to underfinancing - of course this being an EU program it surely has other issues.
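
The back-of-the-envelope numbers in the comment above can be reproduced with a short sketch. The fund size, headcount, duration, and 50% overhead are the commenter's hypothetical figures, not details of the actual program; the function name `yearly_bump` is made up for illustration.

```python
# Per-scientist yearly pay bump that a fixed fund could support.
# Inputs are the hypothetical figures from the comment, not program details.
FUND_USD = 500_000_000
SCIENTISTS = 1_000
YEARS = 10


def yearly_bump(fund: float, scientists: int, years: int,
                overhead_share: float = 0.0) -> float:
    """Money left per scientist per year after removing an overhead share."""
    usable = fund * (1 - overhead_share)
    return usable / (scientists * years)


print(yearly_bump(FUND_USD, SCIENTISTS, YEARS))                      # no overhead
print(yearly_bump(FUND_USD, SCIENTISTS, YEARS, overhead_share=0.5))  # half lost
```

With no overhead this gives $50k per scientist per year, and $25k if half the fund is lost to administration, matching the figures in the comment.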

sottol commented on Gemini-2.5-pro-preview-06-05   deepmind.google/models/ge... · Posted by u/jcuenod
jcuenod · 3 months ago
82.2 on Aider

Still actually falling behind the official scores for o3 high. https://aider.chat/docs/leaderboards/

sottol · 3 months ago
Does 82.2 correspond to the "Percent correct" of the other models?

Not sure if OpenAI has updated o3, but it looks like "pure" o3 (high) has a score of 79.6% in the linked table, and the "o3 (high) + gpt-4.1" combo has the highest score at 82.7%.

The previous Gemini 2.5 Pro Preview 05-06 (yea, not current 06-05!) was at 76.9%.

That looks like a pretty nice bump!

But either way, these Aider benchmarks seem to be the most useful/trustworthy benchmarks currently and really the only ones I'm paying attention to.

sottol commented on My AI skeptic friends are all nuts   fly.io/blog/youre-all-nut... · Posted by u/tabletcorry
sottol · 3 months ago
I think another thing that comes out of not knowing the codebase is that you're mostly relegated to being a glorified tester.

Right now (for me) it's very frequent, depending on the type of project, but in the future it could be less frequent - but at some point you've gotta test what you're rolling out. I guess you can use another AI to do that but I don't know...

Anyway, my current workflow is:

1. write detailed specs/prompt,

2. let agent loose,

3. pull down and test... usually something goes wrong.

3.1 converse with and ask agent to fix,

3.2 let agent loose again,

3.3 test again... if something goes wrong again:

3.3.1 ...

Sometimes the agent gets lost in the fixes, but you now have a better idea of what can go wrong and can start over with a better initial prompt.
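
The workflow above can be sketched as a small retry loop. This is only an illustration of the steps as described: `run_agent` and `test_build` are hypothetical callables standing in for "let agent loose" and "pull down and test", and the round limit is an assumption, not something from the comment.

```python
from typing import Callable, List, Optional, Tuple


def agent_loop(
    prompt: str,
    run_agent: Callable[[str], str],             # hypothetical: request -> change
    test_build: Callable[[str], Optional[str]],  # hypothetical: change -> failure or None
    max_rounds: int = 4,                         # assumed cut-off before starting over
) -> Tuple[bool, List[str]]:
    """Let the agent loose, test the result, and ask for fixes until it passes."""
    failures: List[str] = []
    request = prompt
    for _ in range(max_rounds):
        change = run_agent(request)      # step 2 / 3.2: let agent loose
        failure = test_build(change)     # step 3 / 3.3: pull down and test
        if failure is None:
            return True, failures
        failures.append(failure)
        # step 3.1: converse with the agent and ask it to fix the problem
        request = f"{prompt}\n\nPlease fix: {failure}"
    # The agent got lost; the collected failures can inform a better first prompt.
    return False, failures


# Toy stand-ins so the sketch runs: the "agent" only succeeds once told about the bug.
def fake_agent(request: str) -> str:
    return "fixed" if "Please fix" in request else "buggy"


def fake_test(change: str) -> Optional[str]:
    return None if change == "fixed" else "button does nothing"


ok, seen = agent_loop("Build the settings screen", fake_agent, fake_test)
print(ok, seen)
```

Returning the list of failures alongside the result reflects the point in the comment: even a failed run leaves you with a better idea of what to put in the next initial prompt.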

I haven't had a lot of success with pre-discussing (planning, PRDing) implementations, as in, it worked, but not much better than directly prompting for what I want, and it takes a lot longer. But I'm not usually doing "normal" stuff, as this is purely fun/exploratory side-project stuff, and my asks are usually complicated but not complex, if that makes sense.

I guess development is always a lot of testing, but this feels different. I click around but don't gain a lot of insight. It feels more shallow. I can write a new prompt and explain what's different but I haven't furthered my understanding much.

Also, not knowing the codebase, you might need a couple of attempts at phrasing your ask just the right way. I probably had to ask my agent 5+ times, trying to explain in different ways how to translate phone IMU yaw/pitch/roll into translations of the screen projection. Sometimes it's surprisingly hard to explain what you want to happen when you don't know how it's implemented.
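
For readers wondering what such a mapping might look like, here is one possible sketch, not the commenter's implementation. It assumes a simple pinhole model where a yaw or pitch change shifts the projection by `focal_px * tan(angle)` and roll rotates that shift; the function name, the `focal_px` parameter, and all sign conventions are assumptions.

```python
import math
from typing import Tuple


def screen_offset(yaw_deg: float, pitch_deg: float, roll_deg: float,
                  focal_px: float) -> Tuple[float, float]:
    """Pixel shift of the projection for a change in device orientation.

    Assumed pinhole model: yaw moves the image horizontally, pitch moves it
    vertically, and roll rotates the resulting shift into the rolled screen
    axes. Sign conventions are arbitrary choices for this sketch.
    """
    dx = focal_px * math.tan(math.radians(yaw_deg))
    dy = focal_px * math.tan(math.radians(pitch_deg))
    r = math.radians(roll_deg)
    # Express the shift in the axes of the rolled screen.
    return (dx * math.cos(r) + dy * math.sin(r),
            -dx * math.sin(r) + dy * math.cos(r))


# Example: a 5 degree yaw with an assumed focal length of 1000 px.
print(screen_offset(5.0, 0.0, 0.0, focal_px=1000.0))
```

Even in this small sketch, the axis and sign choices are exactly the kind of detail that is hard to convey in a prompt without knowing how the code represents them.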

sottol commented on Mary Meeker's first Trends report since 2019, focused on AI   bondcap.com/reports/tai... · Posted by u/kjhughes
vivzkestrel · 3 months ago
What he's trying to ask is whether 5 billion people will pay $20 a month for the best AI model out there.
sottol · 3 months ago
It's easy to forget that half the world's population lives on a few dollars a day [1], and sparing $20 a month is unrealistic.

Also, access to market leading AI is not going to cost $20/mo when everything is said and done.

[1] https://blogs.worldbank.org/en/developmenttalk/half-global-p...
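
To put the affordability point above into numbers, here is a small sketch of what share of monthly income a $20 subscription would take. The daily income values are illustrative stand-ins for "a few dollars a day", not figures from the linked post, and the 30-day month is a rounding assumption.

```python
# Share of monthly income taken by a $20/month subscription.
SUBSCRIPTION_USD_PER_MONTH = 20
DAYS_PER_MONTH = 30  # rounding assumption


def share_of_income(daily_income_usd: float) -> float:
    """Fraction of a month's income spent on the subscription."""
    return SUBSCRIPTION_USD_PER_MONTH / (daily_income_usd * DAYS_PER_MONTH)


# Illustrative daily incomes only; they are not taken from the cited source.
for daily in (2, 5, 7):
    print(f"${daily}/day -> {share_of_income(daily):.0%} of monthly income")
```

At these assumed income levels the subscription would take roughly a tenth to a third of all income, which is the sense in which $20 a month is unrealistic.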

sottol commented on DumPy: NumPy except it's OK if you're dum   dynomight.net/dumpy/... · Posted by u/RebelPotato
turtletontine · 3 months ago
Pretty sure Numpy’s einsum[1] function allows all of this reasoning in vanilla numpy (albeit with a different interface that I assume this author likes less than theirs). Quite sure that first example of how annoying numpy can be could be written much simpler with einsum.

[1]: https://numpy.org/doc/stable/reference/generated/numpy.einsu...

sottol · 3 months ago
The author posted a previous article about why they don't like NumPy and their problems with einsum:

https://dynomight.net/numpy/

u/sottol

Karma: 751 · Cake day: July 2, 2022