What's the value of a secret benchmark to anyone but the secret holder? Does your niche benchmark even influence which model you use for unrelated queries? If LLM authors care enough about your niche (they don't) and fake the response somehow, you will learn on the very next query that something is amiss. Now that query is your secret benchmark.
Even for niche topics it's rare that I need to provide more than 1 correction or knowledge update.
Now, personally, I would like to have sane defaults, where I can toggle stuff on and off, but we all know which way the wind blows in this case.
Now JWST is near L2, but it is still in sunlight. It's solar-powered. There's a series of radiating layers to keep heat away from the sensitive instruments. Then there are the solar panels themselves.
Obviously an orbital data center wouldn't need such extreme cooling, but the key takeaway for me is that the solar panels themselves would shield much of the satellite from direct sunlight, by design.
Absent any external heating, the only heating is from the computer chips. Any body in space will radiate away heat; you can make some bodies radiate more effectively than others by increasing surface area per unit mass (I assume). Someone else mentioned thermoses as evidence of insulation. There's some truth to that, but interestingly most of the heat lost from a thermos is via the same IR radiation a satellite would emit.
So in terms of power density you're looking at about three orders of magnitude difference. Heating and cooling are going to be a significant part of the total weight.
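To put the radiative-cooling point in rough numbers, here's a back-of-envelope sketch using the Stefan-Boltzmann law. The 1 MW heat load, 300 K radiator temperature, and emissivity are illustrative assumptions, not figures from the thread:

```python
# Back-of-envelope: radiator area needed to reject waste heat purely by
# IR radiation (Stefan-Boltzmann), assuming an idealized flat radiator
# that sees only deep space. All numbers are illustrative assumptions.

SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)

def radiator_area(power_w, temp_k, emissivity=0.9, two_sided=True):
    """Area (m^2) needed to radiate `power_w` watts at surface temperature `temp_k`."""
    sides = 2 if two_sided else 1
    return power_w / (sides * emissivity * SIGMA * temp_k ** 4)

# Example: 1 MW of chip waste heat with the radiator held at 300 K
print(radiator_area(1_000_000, 300))  # ~1200 m^2 (two-sided, emissivity 0.9)
```

Even under these generous assumptions the radiator area runs to four figures of square meters per megawatt, which is why cooling hardware ends up being a big fraction of the mass budget.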
They made an effort to improve the product, but because everything in tech comes with side effects, it turned out to be a bad decision, which they rolled back. That sounds like highly professional behavior by people doing their best. Not everything will work out 100% of the time.
And this might finally reverse the trend of games being >100 GB, as other teams will be able to point to this decision as a reason not to implement this particular optimization prematurely.
I don't think this is an issue inherent to the technology. Duplicate-code detectors have been around for ages. Give an AI agent a tool which calls one, ask it to reduce duplication, and it will start refactoring.
Of course, there is a risk of going too far in the other direction: refactorings which technically reduce duplication but have unacceptable costs (you can be too DRY). But some possible solutions: (a) ask it to judge whether the refactoring is worth it; if it judges no, just ignore the duplication and move on; (b) get a human to review the decision in (a); (c) if the AI repeatedly makes the wrong decision (according to the human), try prompt engineering, or maybe even just some hardcoded heuristics.
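For concreteness, here is a minimal sketch of the kind of duplicate-detection tool one might hand to an agent. The line-window hashing, window size, and normalization are arbitrary choices for illustration; real detectors (token- or AST-based) are far more robust:

```python
# Toy duplicate-block detector: hashes sliding windows of normalized source
# lines and reports any block of `window` lines that appears more than once.

from collections import defaultdict

def find_duplicate_blocks(source: str, window: int = 6):
    lines = [ln.strip() for ln in source.splitlines()]
    seen = defaultdict(list)  # block text -> list of starting line numbers
    for i in range(len(lines) - window + 1):
        block = "\n".join(lines[i:i + window])
        if block.strip():  # skip windows that are entirely blank
            seen[block].append(i + 1)
    return {blk: locs for blk, locs in seen.items() if len(locs) > 1}

# An agent given this as a tool could be prompted to run it, then judge for
# each reported duplicate whether refactoring is actually worth the cost.
```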
I have not seen evidence that there will be food system collapse driven by climate change that would be worse than those events, but my ears are open if you have some.
Do we have science that demonstrates humans don't autoregressively emit words? (Genuinely curious / uninformed).
From the outset, it's not obvious that auto-regression through the state space of actions (i.e. what LLMs do when yeeting tokens) is the difference they have from humans. Though I guess we can distinguish LLMs from other models like diffusion/HRM/TRM that explicitly refine their output rather than commit to a choice and then run `continue;`.
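Schematically, the distinction being drawn is just a difference in the decoding loop. This is not real model code; `sample_next` and `refine` are hypothetical stand-ins:

```python
# Autoregressive decoding commits to each token and moves on; refinement-style
# models (diffusion/HRM/TRM-like) revisit the whole output on every pass.

def autoregressive_decode(prompt, steps, sample_next):
    out = list(prompt)
    for _ in range(steps):
        out.append(sample_next(out))  # commit to a choice, then `continue;`
    return out

def iterative_refine(draft, steps, refine):
    out = list(draft)
    for _ in range(steps):
        out = refine(out)  # revisit and edit the whole sequence each pass
    return out
```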