- completion-based methods, where you take a big model, give it some queries, and use the answers to post-train a smaller model. This is what DeepSeek did with the Qwen models: they took ~800k traces generated by R1 and ran SFT on smaller Qwen2.5 models. What the Sky team found in their experiments is that you can use as few as 1-2k traces to reach similar results. Much cheaper.
- logit/internal-representation-based methods, where you need access to the raw model, and for each pair q -> response you train the small model on the teacher's entire logit distribution at every token, not just on the tokens that were sampled. This is a method suited for model creators, who can take a big + small pair of models with the same architecture and "distill" the big one into the smaller one. This is likely how they train their -flash, -mini, -pico and so on.
The first method can be used via API access. The second one can't. You need access to things that API providers won't give you.
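To make the difference concrete, here is a minimal sketch of the two training signals for a single token position over a toy 4-token vocabulary. It assumes the usual textbook formulations (cross-entropy on the sampled token for the completion-based case, temperature-scaled KL divergence for the logit-based case); the function names and the temperature value are made up for illustration, and this is not claimed to be what any particular lab runs.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of raw logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def completion_loss(student_logits, teacher_token):
    """Completion-based signal: cross-entropy against the one token the
    teacher actually emitted. Only needs the teacher's text output, so it
    works through an API."""
    probs = softmax(student_logits)
    return -math.log(probs[teacher_token])

def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Logit-based signal: KL(teacher || student) over the whole vocabulary.
    Needs the teacher's raw logits, which API providers typically don't
    expose. The temperature of 2.0 is an arbitrary illustrative choice."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits for one position in a 4-token vocabulary.
teacher = [4.0, 3.5, 0.1, -2.0]   # teacher thinks tokens 0 and 1 are both plausible
student = [1.0, 0.0, 0.5, 0.2]

# The completion-based loss only sees that token 0 was sampled.
print(completion_loss(student, teacher_token=0))
# The logit-based loss also sees that token 1 was nearly as likely.
print(logit_distillation_loss(student, teacher))
```

The second loss carries more information per token (the teacher's near-misses as well as its pick), which is one common explanation for why it can be more sample-efficient when you do have access to the weights.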
"Considering that the distillation requires access to the innards of the teacher model, it’s not possible for a third party to sneakily distill data from a closed-source model like OpenAI’s o1, as DeepSeek was thought to have done. That said, a student model could still learn quite a bit from a teacher model just through prompting the teacher with certain questions and using the answers to train its own models — an almost Socratic approach to distillation."
Which means the job offer still includes stock options, but during the job offer call we don’t talk up the future value of the stock options. We don’t create any expectation that the options will be worth anything.
Upside from a founder perspective is we end up giving away less equity than we otherwise might. Downside from a founder perspective is that in some cases you need to increase cash compensation to close the gap, where you might otherwise talk up the value of options.
Main upside for the employee is they don’t need to worry too much about stock options intricacies because they don’t view them as a primary aspect of their compensation.
In my experience, almost everyone prefers cash over startup stock options. And from an employee perspective, it’s almost always the right decision to place very little value ($0) on the stock option component of your offer. In the vast majority of cases, stock options end up worthless.
Also, even if the company ends up worth a lot of money, there's no guarantee that a way to liquidate, such as an IPO, exit or secondary market, will become available in any reasonable time frame. And as a regular employee you have exceedingly little say in bringing about such events. There's not much fun in having a winning lottery ticket that can't be cashed in. In fact, it's highly stressful.
Off the top of my head, obesity seems like the obvious culprit to investigate. If so, I wonder if semaglutide will close this gap again?
What do you gain from bombing residential buildings?
It reminds me of what happened with the flat UI/anti-skeuomorphism wave a bit over a decade ago. It seemed like someone got so incensed by the faux leather in the iPhone's Find My Friends app (supposedly made to look like it had the same stitching as the leather upholstery in Steve Jobs' private jet) that they went on a crusade against anything "needlessly physical looking" in UI. We got the Metro design language from Microsoft as the fullest expression of it, with Apple somewhat following suit in iOS (but later walking back some things too) and later Google's Material Design walking it back a bit further (drop shadows making a big comeback).
But for a while there, it was genuinely hard to tell which bit of text was a label and which was a button, because it was all just bits of black or monocolor text floating on a flat white background. It's like whoever came up with the flat UI fad didn't realize how much hierarchy and structure was being conveyed by the lines, shadows and gradients that had suddenly gone out of vogue. All of a sudden we needed a ton of whitespace between elements to understand which worked together and which were unrelated. Which is ironic, because the whole thing started as a crusade against designers putting their own desire for artistic expression above their users' needs by wasting UI space on showing off their artistic skill with useless ornaments, but it led to designers putting their own philosophical purity above their users' needs, by wasting UI space on unnecessary whitespace and forcing low information density on everyone.