You're seeing less smog because people are driving modern cars with modern emission systems (we live in the future) and because smog-producing vehicles have been taken out of service, and you're drawing conclusions from the mere correlation of the two. It has nothing to do with ethanol.
A) why do you think car companies needed to develop more modern emission systems to begin with? That’s right - California, a huge car market, started creating and enforcing standards through the introduction of CARB. Prior to this, car companies had no incentive and weren’t doing it
B) there’s more to smog than just cars. CARB tackled emissions across multiple industries.
C) the average car lasts too long for fleet turnover alone to have fixed this. Cars modernized because CARB made owning and operating older vehicles impractical/impossible.
D) population and vehicle miles driven kept growing, so per-unit emissions needed to shrink faster than that growth, and they did. Thanks to CARB.
Is ethanol the primary reason we don’t have smog now? No, but the problem was so bad that CARB took a comprehensive approach, tackling it from many angles. And importantly, they succeeded. It’s quite a silly position to take that “this problem would have solved itself”. It’s the twin of the fatalist position that “this problem is too big and complicated to solve”
Mathematically it comes from the fact that the transformer block is a parallel algorithm. If you batch harder and increase parallelism, you can get higher tokens/s, but you get less throughput. Simultaneously, there is also a dial where you can speculatively decode harder when you have fewer users.
It’s true for basically all hardware and most models. You can draw a Pareto curve of throughput per GPU vs. tokens per second per stream: more tokens/s per stream, less total throughput.
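To make the tradeoff concrete, here’s a toy roofline-style model of decode (every number is made up for illustration, not a measurement of any particular GPU or model): each step streams the weights plus the batch’s KV cache from memory and does the batch’s matmul work, so growing the batch raises total tokens/s per GPU while stretching the time each individual stream waits per token.

```python
# Toy roofline-style model of the batching dial during transformer decode.
# Every number below is hypothetical and only for illustration.

def step_time_s(batch_size: int,
                weight_bytes: float = 60e9,        # hypothetical weight footprint
                kv_bytes_per_stream: float = 1e9,  # hypothetical KV cache per stream
                hbm_bw_bytes_s: float = 3e12,      # hypothetical memory bandwidth
                flops_per_token: float = 2e11,     # hypothetical FLOPs per decoded token
                peak_flops_s: float = 1e15) -> float:
    """Time for one decode step across the whole batch: the step must stream
    the weights plus every stream's KV cache and do the batch's compute;
    it is limited by whichever is slower."""
    memory_time = (weight_bytes + batch_size * kv_bytes_per_stream) / hbm_bw_bytes_s
    compute_time = batch_size * flops_per_token / peak_flops_s
    return max(memory_time, compute_time)

for batch in (1, 8, 32, 128, 512):
    t = step_time_s(batch)
    per_stream = 1.0 / t   # tokens/s each user sees
    per_gpu = batch / t    # total tokens/s the GPU produces
    print(f"batch={batch:4d}  tok/s per stream={per_stream:7.1f}  tok/s per GPU={per_gpu:9.1f}")
```

Sweeping the batch size in this toy model traces out the same kind of Pareto curve: per-GPU throughput climbs while per-stream tokens/s falls.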
See this graph for actual numbers:
[Chart: Token Throughput per GPU vs. Interactivity — gpt-oss 120B, FP4, 1K / 8K. Source: SemiAnalysis InferenceMAX™]
https://inferencemax.semianalysis.com/
I think you skipped the word “total throughput” there, right? Cause tok/s is a measure of throughput, so it’s clearer to say you increase throughput/user at the expense of throughput/GPU.
I’m not sure about the comment on speculative decode, though. I haven’t served a frontier model, but I believe speculative decoding generally doesn’t help beyond a few draft tokens, so I’m not sure you can “speculatively decode harder” with fewer users.
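For what it’s worth, the usual back-of-envelope for why longer drafts hit diminishing returns looks like this. It’s a rough sketch assuming each drafted token is accepted independently with probability alpha, which is a simplification of the analysis in the speculative sampling papers; real acceptance behaviour depends on the models and the prompt.

```python
# Expected tokens emitted per target-model forward pass when drafting k tokens,
# assuming each drafted token is accepted independently with probability alpha.
# Simplified model: acceptance stops at the first rejection, and the target
# model always contributes one more token per pass.

def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Geometric-series expectation: 1 + alpha + alpha^2 + ... + alpha^k."""
    if alpha >= 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)

for alpha in (0.6, 0.8, 0.9):
    row = ", ".join(f"k={k}: {expected_tokens_per_pass(alpha, k):.2f}"
                    for k in (1, 2, 4, 8, 16))
    print(f"alpha={alpha}: {row}")
```

Under these assumptions the expectation saturates at 1/(1 - alpha), so past a handful of draft tokens the extra speculation mostly burns draft compute, which matches the “doesn’t help beyond a few tokens” intuition; whether spare capacity from having fewer users changes that picture is a separate question about where the wasted draft work lands.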