So what use case does this test setup reflect? Is there a relevant commercial use case here?
For example, spending the time to label a few examples yourself instead of just blindly sending them out for labeling.
(Not always the case, but another thing to keep in mind besides total time saved and value of learning)
The overall rate of participation in the labor force is falling. I expect this trend to continue as AI makes the economy more and more dynamic and sets a higher and higher bar for participation.
Overall GDP is rising while the labor participation rate is falling, which points to more productivity with fewer people participating. At this point one of the main factors is clearly technological advancement, and within that, I believe if you were to survey CEOs and ask what technological change has allowed them to get more done with fewer people, the resounding consensus would be AI.
The question is what you will/should learn in your limited time alive. Society needs well-educated (I include things like "street smarts" and apprenticeship in "educated" here) people in many different subjects. Some subjects are important enough that everyone needs to learn them (reading, writing, arithmetic). Some subjects are nearly useless but fun (tinplate film photography) and so worth knowing.
Things like basic computer skills are rising to the level where the majority of people today need them. However, I'm not sure that scripting itself is quite at that level (though it is important enough that a significant minority should have it).
I’m talking about a general trend I see in the use of this term, not claiming that it’s always a bad thing to say “I’m not technical, so someone else should write the script.”
I agree with everything you said!
Both things are happening in the world: people using this terminology to throw work at others needlessly, and people doing good division of labor.
Since this is HN, some disclaimers:
- no, that’s not always what’s happening when “not technical” is thrown around
- no, it’s not always appropriate to use AI instead of asking an expert
It seems like, if they did in fact distill, then what we have found is that you can create a worse copy of the model for ~$5M in compute by training on its outputs.
EDIT: Here's a better treatment, and it is the case that they give the exact same orderings: https://ajayp.app/posts/2020/05/relationship-between-cosine-...
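A quick sketch of why the orderings agree, assuming we're comparing cosine similarity against Euclidean distance on unit-normalized vectors (the usual setting for embedding retrieval; variable names here are mine): for unit vectors, ||a - b||² = 2 - 2·cos(a, b), so distance is a monotone decreasing function of cosine similarity and ranking by one is the same as ranking by the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# A query vector and a few "document" vectors, all unit-normalized.
query = rng.normal(size=8)
query /= np.linalg.norm(query)
docs = rng.normal(size=(5, 8))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

# Cosine similarity of unit vectors is just the dot product.
cos_sim = docs @ query
eucl = np.linalg.norm(docs - query, axis=1)

# Identity: squared distance = 2 - 2 * cosine similarity.
assert np.allclose(eucl**2, 2 - 2 * cos_sim)

# Ranking by highest similarity equals ranking by smallest distance.
order_by_cos = np.argsort(-cos_sim)
order_by_dist = np.argsort(eucl)
assert (order_by_cos == order_by_dist).all()
```

This only holds after normalization; for un-normalized vectors the two measures can rank candidates differently.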
The rumour/reasoning I’ve heard is that most advances are being made in synthetic-data experiments happening after post-training. It’s a lot easier and faster to iterate on these with smaller models.
Eventually a lot of these learnings/setups/synthetic-data-generation pipelines will be applied to larger models, but it’s very unwieldy to experiment with the best approach using the largest model you could possibly train. You just get way fewer experiments done per day.
The models the bigger labs are playing with seem to be converging on roughly the largest size at which a researcher can still run an experiment overnight.
We can catch things early; it shouldn’t be limited to smokers only.