Hi! Very cool results. Are you able to share some numbers on the slope of the scaling curve you found, i.e. how performance responds to a growing number of demonstrations?
Academically, I'd also be very interested in how much of a data-efficiency improvement you got with the pretrained model + task-specific post-training versus from-scratch task-specific training. For example, if post-training requires say 50 additional demos, and training a smaller model from scratch requires say 250 demos (or whatever) to match performance, that would be an interesting quantification of the efficiency benefit of using the big foundation model.
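To make that concrete, here's a rough sketch of the two numbers I'm asking about (all data points below are made up, purely to illustrate the quantities, not anything from your results):

```python
import numpy as np

# Hypothetical (demos, success-rate) pairs for each regime; not real data.
demos = np.array([10, 25, 50, 100, 250])
posttrain_perf = np.array([0.55, 0.68, 0.80, 0.86, 0.90])  # pretrained + post-training
scratch_perf = np.array([0.20, 0.35, 0.52, 0.65, 0.80])    # from-scratch baseline

# Slope of the scaling curve: fit performance against log(#demos).
slope_posttrain = np.polyfit(np.log(demos), posttrain_perf, 1)[0]
slope_scratch = np.polyfit(np.log(demos), scratch_perf, 1)[0]
print(f"slope (post-trained):  {slope_posttrain:.3f} per log-demo")
print(f"slope (from-scratch):  {slope_scratch:.3f} per log-demo")

# Data-efficiency multiplier: demos needed from scratch vs. with the
# foundation model to reach the same performance (the 50 vs. 250 example).
print(f"efficiency multiplier: {250 / 50:.0f}x")
```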
A huge amount of the bulk in smartwatches is the health sensors and vibration motor. You can easily slim things down if those, along with a big display, aren't required.
That makes sense, and now that I've seen proof that it's possible to build a watch-sized smartwatch, I'm even more puzzled that not a single one is on the market.
Wow, it's amazing how much functionality they packed into this tiny case. This would be the only smartwatch on the planet that isn't comically chunky while still giving you the two things you need (namely the time and notification icons, so you can break the habit of checking your phone). I'd pay for this, no doubt, if only I could!
This post caused me to update my cautious strategy to a somewhat less cautious one. I'm hoping to invite discussion about aspects Zvi might be missing, though.
Happy to answer any questions on the model, hardware, etc.