The autoregressive models consistently show better loss for the same number of training tokens.
I find a lot of the conclusions compelling, but I would’ve loved to see more epochs of training on the 1B model with a 10B dataset, as that model was showing epoch-over-epoch improvements.
Diffusion requires more compute than autoregressive models, and the excess is proportional to sequence length. Time-dilated RNNs and adaptive computation in image recognition hint that we can compute more with the same weights and achieve better results.
This, I believe, also hints at at least one flaw in the TS study: I did not see that they matched DLM and AR by compute; they matched them only by weights.
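To make the compute-matching point concrete, here is a rough back-of-the-envelope sketch in Python. It is my own illustration, not anything from the study, and it rests on three assumptions: the common approximation of about 2·N FLOPs per token for a forward pass through N parameters; an AR model that uses a KV cache, so each generated token costs one single-position forward pass; and a diffusion model that re-processes the whole sequence at every denoising step. The cost of attention itself is ignored, and all function names are hypothetical.

```python
def ar_generation_flops(n_params: int, seq_len: int) -> int:
    """Approximate FLOPs for an AR model to generate seq_len tokens.

    Assumes a KV cache, so each new token is one single-position forward pass.
    """
    return 2 * n_params * seq_len


def diffusion_generation_flops(n_params: int, seq_len: int, n_steps: int) -> int:
    """Approximate FLOPs for a diffusion LM to generate seq_len tokens.

    Assumes every denoising step runs a forward pass over all positions.
    """
    return 2 * n_params * seq_len * n_steps


def compute_excess(n_params: int, seq_len: int, n_steps: int) -> float:
    """Ratio of diffusion to AR generation cost under the assumptions above."""
    diffusion = diffusion_generation_flops(n_params, seq_len, n_steps)
    return diffusion / ar_generation_flops(n_params, seq_len)


if __name__ == "__main__":
    n_params = 1_000_000_000  # e.g. a 1B-parameter model (illustrative value)
    for seq_len in (128, 1024):
        # Hypothetical worst case: one denoising step per generated token.
        print(seq_len, compute_excess(n_params, seq_len, n_steps=seq_len))
```

Under these assumptions the ratio is simply the number of denoising steps, so it grows with sequence length only when the step count does. With a fixed, small step count the gap shrinks accordingly.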
They already did that exercise:
> 3-car trains running at 30-40 trains per hour (a normal peak frequency for automated or even some human-driven metro lines) reach a capacity of about 18,000 passengers per hour per direction, well above the expected demand of any American line that doesn’t go through Manhattan.
> They already did that exercise
No, they didn't. They took "30-40 trains per hour" out of thin air, and the exercise was to calculate whether it is even possible to run shorter trains more frequently.
There is a safe minimum distance between trains; more precisely, a safe distance for a given speed. Shorter trains are not exempt from obeying it. You can make shorter trains more frequent at the expense of lowering travel speed.
What the throughput cap due to these speed limitations actually is remains an exercise left for the author of the article.
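A minimal sketch of that exercise, in Python, under a deliberately simplified "brick wall" stopping model: a following train must be able to brake to a halt before reaching the tail of the train ahead, and must also clear its own length. The deceleration, the fixed time margin (standing in for dwell and reaction time), and all function names are my assumptions, not numbers from the article.

```python
import math


def min_headway_s(speed_mps: float, train_len_m: float,
                  decel_mps2: float = 1.0, margin_s: float = 60.0) -> float:
    """Minimum time between trains at a given cruising speed.

    headway = fixed margin + (braking distance + train length) / speed,
    with braking distance = v^2 / (2a).
    """
    braking_distance_m = speed_mps ** 2 / (2 * decel_mps2)
    return margin_s + (braking_distance_m + train_len_m) / speed_mps


def trains_per_hour(speed_mps: float, train_len_m: float,
                    decel_mps2: float = 1.0, margin_s: float = 60.0) -> float:
    """Throughput implied by the minimum headway."""
    return 3600.0 / min_headway_s(speed_mps, train_len_m, decel_mps2, margin_s)


def best_speed_mps(train_len_m: float, decel_mps2: float = 1.0) -> float:
    """Speed that minimizes headway in this model: v = sqrt(2 * a * L)."""
    return math.sqrt(2 * decel_mps2 * train_len_m)


if __name__ == "__main__":
    # Hypothetical train lengths in meters, e.g. a short and a long consist.
    for train_len_m in (60.0, 180.0):
        v = best_speed_mps(train_len_m)
        print(train_len_m, round(v, 1), round(trains_per_hour(v, train_len_m), 1))
```

In this model the headway is margin + v/(2a) + L/v, which is minimized at v = sqrt(2aL). A shorter train lowers the L/v term, but the speed at which it achieves its best frequency is lower too, which is the trade-off described above.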
They can tell you whether a language is statically typed or not, whether it has effect typing (which necessarily needs higher-order types), whether it has type inference, etc.
The syntax of one is the semantics of another.
Also, please experience being driven in a car, in addition to driving one.
When you are being driven, you are not paying attention to the road and do not know why a given acceleration happened. In this situation your brain can decide that your senses are lying to it because the body has been poisoned, and it will invoke the gag reflex, which is what we call motion sickness.
I looked into that because I have to drive a family in which two members are prone to motion sickness. I will definitely not look into any electric car because of this.
[1] https://abcnews.go.com/Business/ev-drivers-passengers-motion...
In my opinion, it is unavoidable.
Is this year 2000? Chinese cars are overwhelmingly tuned for much softer ride experience at expense of feeling performance / sporty. Especially the 50k+ tier from the last few years: most perform better than Euro cars in terms of noise, vibration and harshness. You generally have to scrape the bottom of the barrel, entry-level 10-15k PRC cars, to get a bad ride experience now. Chinese roads are also great now, down to rural areas.
Quality's caught up since the 2020s. Sure, you can wait 10 years, but there are industry indicators like problems per 100 vehicles (PP100) where PRC EVs are fine / better than foreign brands (built in PRC factories), at least mechanically (powertrains, batteries, chassis). Most PRC weakness comes from stuff like infotainment and driver assist over the last few years, because they've been iterating on software a little too fast. There's also proprietary fleet data on EV taxis / rideshare cars that have been driven to death, and those hold up fine too.
Rolls-Royce tuned their PRC cars to be EXTRA PLUSH, because PRC buyers prefer extra-cloudy rides vs Euro buyers who prefer firmer / more responsive ones; NA is softer than EU, and MENA is somewhere between EU and NA.
> Is this year 2000? Chinese cars are overwhelmingly tuned for much softer ride experience at expense of feeling performance / sporty.
It is the year 2025, in a country that has been flooded with Chinese cars over the last three years. You can guess which one. There is nothing soft in the ride of any Chinese car in my experience. These cars are used as taxis here, and you can experience the ride in pretty much every model and brand, from basic to luxury. No Chinese car I've been driven in compensates for sudden rolls.
European cars have a much softer ride than anything Chinese, even the Chinese "luxury" brands sold here. Since you mentioned it: the "sporty, firm and responsive" BMW 5 Series is much more pleasant to be chauffeured in than anything Chinese.
Chinese luxury car brand Hongqi puts V6 turbocharged hybrid motors into its full-size sedan models [1]; this is really a shame!
[1] https://en.wikipedia.org/wiki/Hongqi_H9
This shows they do not understand what ride quality is: what vibrations are, how they affect ride quality, how electric motors exacerbate vibrations [2], how a motor's torque output affects ride quality, etc.
[2] https://abcnews.go.com/Business/ev-drivers-passengers-motion...
> We essentially have rolled out an L1 through L5, where L5 is the Holy Grail with fully autonomous end-to-end workflows. L1 is where we are today, and maybe heading into L2. L3 involves orchestration and then planning and decision-making. When we get to L5, we’ll be asking questions like, ‘Are junior-level engineers really needed?’
We're seeing this in the software development world too, where it's becoming harder and harder for junior engineers both to learn programming and to be successful in their careers. If the only thing that's needed is senior engineers, how do people grow to become senior engineers? It's a harrowing prospect.
With an FPGA, you can have your purpose-built chip overnight.
Thus, in my not-so-humble opinion, one should use whatever means one can to make FPGAs more efficient.