Readit News
thunderbird120 commented on The World War Two bomber that cost more than the atomic bomb   bbc.com/future/article/20... · Posted by u/pseudolus
nopelynopington · 3 months ago
There's a good book by Malcolm Gladwell called "The Bomber Mafia" that's all about the air force doctrine of indiscriminate bombing to break the population, and the rebel faction within the air force who advocated high-precision bombing instead: crippling industry rather than punishing civilians.

They eventually lost out to the former, culminating in horrific napalm raids on Japan that continued even after the atomic bombs were dropped and caused more casualties than the atomic bombs did.

thunderbird120 · 3 months ago
Precision bombing during WW2 was not possible at the required scale. To put a bomb precisely on target back then you needed something like a dive bomber, a tactic which is incompatible with strategic-scale bombing. Even "precise" methods using advanced analog computers like the Norden bombsight could only do so much.

>Under combat conditions the Norden did not achieve its expected precision, yielding an average CEP in 1943 of 1,200 feet (370 m)[1]

This means that 50% of bombs fell within 1,200 feet of the target, which is absolutely awful accuracy if you're trying to hit anything specific.
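For a rough sense of what a 1,200 ft CEP means in practice, here's a toy simulation. It assumes the impacts scatter as a circular bivariate normal around the aim point (an assumption for illustration; the cited figure doesn't specify a distribution), in which case CEP ≈ 1.1774σ:

```python
import numpy as np

# Toy model: impacts scatter as a circular normal around the aim point.
# The 1,200 ft CEP is the reported 1943 figure; the distribution is assumed.
rng = np.random.default_rng(0)
cep_feet = 1200.0
sigma = cep_feet / 1.1774          # for a circular normal, CEP ~= 1.1774 * sigma

miss = np.hypot(rng.normal(0, sigma, 100_000),
                rng.normal(0, sigma, 100_000))

print(np.median(miss))             # ~1,200 ft: half of all bombs miss by more
print((miss > 300).mean())         # ~96% miss a 300 ft radius target entirely
```

Under that model, any individual bomb almost never lands on a target the size of a single building.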

This was further compounded during the campaign against Japan by the heavy reliance of Japanese wartime industry on cottage industries, which were dispersed almost randomly within Japanese population centers rather than being concentrated in specialized industrial districts. From a purely strategic standpoint, concerned only with destroying the enemy's ability to make war, the most effective way to disrupt this kind of industry with 1945 technology was essentially to burn every building in the city to the ground. Other options were simply ineffective.

[1]https://en.wikipedia.org/wiki/Norden_bombsight

thunderbird120 commented on Google is winning on every AI front   thealgorithmicbridge.com/... · Posted by u/vinhnx
marcusb · 8 months ago
From the article:

> I’m forgetting something. Oh, of course, Google is also a hardware company. With its left arm, Google is fighting Nvidia in the AI chip market (both to eliminate its former GPU dependence and to eventually sell its chips to other companies). How well are they doing? They just announced the 7th version of their TPU, Ironwood. The specifications are impressive. It’s a chip made for the AI era of inference, just like Nvidia Blackwell

thunderbird120 · 8 months ago
Nice to see that they added that, but that section wasn't in the article when I wrote that comment.
thunderbird120 commented on Google is winning on every AI front   thealgorithmicbridge.com/... · Posted by u/vinhnx
thunderbird120 · 8 months ago
This article doesn't mention TPUs anywhere. I don't think it's obvious to people outside of Google's ecosystem just how extraordinarily good the JAX + TPU ecosystem is. Google has several structural advantages over other major players, but the largest one is that they roll their own compute solution, which is actually very mature and competitive. TPUs are extremely good at both training and inference[1], especially at scale. Google's ability to tailor their mature hardware to exactly what they need gives them a massive leg up on the competition. AI companies fundamentally have to answer the question "what can you do that no one else can?". Google's hardware advantage provides an actual answer to that question, one which can't be erased the next time someone drops a new model onto huggingface.

[1]https://blog.google/products/google-cloud/ironwood-tpu-age-o...
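For anyone curious what that ecosystem looks like from the outside, here's a minimal JAX sketch (nothing Google-internal, just the public API; the 1D "data" mesh and all shapes are arbitrary examples). The same code compiles via XLA to TPU, GPU, or CPU, which is a big part of the appeal:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

print(jax.devices())  # TpuDevice objects on a TPU VM, otherwise GPU/CPU

# Arrange whatever accelerators are present into a 1D "data" mesh and shard
# the batch across it; jit then compiles for that layout via XLA.
# (The batch of 8 assumes the local device count divides it: 1, 2, 4, or 8.)
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))
x = jax.device_put(jnp.ones((8, 1024)), NamedSharding(mesh, P("data", None)))

@jax.jit
def layer(x, w):
    return jax.nn.relu(x @ w)

w = jnp.ones((1024, 1024))
print(layer(x, w).shape)  # (8, 1024), computed across all local devices
```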

thunderbird120 commented on What Bikini Atoll Looks Like Today (2017)   medium.com/stanford-magaz... · Posted by u/voxadam
genewitch · 9 months ago
> There's no way we're going to convince the middle classes of the central economies to reduce consumption to that level, or even to convince people in that class of development economy to stop aiming for more.

What if there were 200% tariffs on junk they shouldn't be buying anyhow? What if a new car became so expensive that the idea of having to replace it in 3-5 years induced outrage and class-action lawsuits? What if you were only allowed to own one residence? What if out-of-season foods were fantastically expensive unless you had a community "garden"?

I know, HN, straight to -4. I'll meet you down there.

thunderbird120 · 9 months ago
People would correctly identify that their standard of living is being reduced for ideological reasons without tangible individual benefits and would likely not respond well to that, resulting in a loss of political power for whatever movement instituted those policies and a reversal of said policies.
thunderbird120 commented on Google Titans Model Explained: The Future of Memory-Driven AI Architectures   medium.com/@sahin.samia/g... · Posted by u/cmbailey
vessenes · 10 months ago
FWIW, Phil Wang (lucidrains) has been working on a Titans reimplementation since roughly the day the paper was released. It looks to me from the repository that some of the paper's claims have not been reproduced yet, and reading between the lines, it might be that Wang considers the paper not that groundbreaking after all. Hard to say definitively, but the commit pace has definitely slowed down, and the last comments involve failing to replicate some of the key claims.

Unfortunately. The paper looks really good, and I'd like for it to be true.

https://github.com/lucidrains/titans-pytorch

thunderbird120 · 10 months ago
Yeah, that's the normal outcome for papers like this. Papers which claim to be groundbreaking improvements on Transformers universally aren't. Same story roughly once a month for the past 5 years.
thunderbird120 commented on TSMC 2nm Process Disclosure – How Does It Measure Up?   semiwiki.com/semiconducto... · Posted by u/sroussey
zozbot234 · 10 months ago
A standard cell can be a critical performance bottleneck as part of a chip, so it makes sense to offer "high performance" cell designs that can help unblock these where appropriate. But chip cooling operates on the chip as a whole, and there you gain nothing by picking a "higher raw performance" design.
thunderbird120 · 10 months ago
If that were totally true you would expect to see a more or less uniform HP/HD cell mix across different product types, but that's very much not the case. Dennard scaling may be dying, but it's not dead yet. You can still sacrifice efficiency to gain performance. It's not zero-sum.
thunderbird120 commented on TSMC 2nm Process Disclosure – How Does It Measure Up?   semiwiki.com/semiconducto... · Posted by u/sroussey
zozbot234 · 10 months ago
The death of Dennard scaling means that power efficiency is king, because a more power efficient chip is also a chip that can keep more of its area powered up over time for any given amount of cooling - which is ultimately what matters for performance. This effect becomes even more relevant as node sizes decrease and density increases.
thunderbird120 · 10 months ago
If it were that simple, fabs wouldn't offer standard cell libraries in both high-performance and high-density varieties. TSMC continues to provide both for their 2nm process. A tradeoff between power efficiency and raw performance continues to exist.
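Rough back-of-the-envelope for why the tradeoff survives: dynamic power goes roughly as C·V²·f, and pushing clocks higher generally requires raising voltage, so the last chunk of performance is bought at a superlinear power cost. The numbers below are made up for illustration, not TSMC figures:

```python
# Toy model: dynamic power ~ C_eff * V^2 * f. Higher clocks need higher voltage,
# so performance costs superlinear power. All numbers are illustrative.
def dynamic_power(c_eff, volts, freq_ghz):
    return c_eff * volts**2 * freq_ghz

base = dynamic_power(c_eff=1.0, volts=0.75, freq_ghz=3.0)   # efficiency-style operating point
fast = dynamic_power(c_eff=1.0, volts=0.90, freq_ghz=4.0)   # performance-style operating point

print(f"perf: {4.0/3.0:.2f}x  power: {fast/base:.2f}x")     # ~1.33x perf for ~1.92x power
```

HP and HD cell libraries bake a related choice into the transistors and layout themselves (drive strength, leakage, track height), which is why designers still mix both on the same die.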
thunderbird120 commented on TSMC 2nm Process Disclosure – How Does It Measure Up?   semiwiki.com/semiconducto... · Posted by u/sroussey
thiago_fm · 10 months ago
Is there an advantage to going 2nm given the costs? Maybe somebody here on HN can clearly answer this; I love this subject!

It's interesting how the whole valuation of TSMC (and some of Nvidia's) is tied to their current advantage in the 3nm process.

Intel on 18A is literally TSMC's 3nm process + backside power delivery, which means more power efficiency, performance also less heat.

It's definitely what they need to get back into the processor game and beat everybody; maybe we will see Apple doing designs with the Intel factory before 2030?

Hope they don't miss their deadlines: producing 18A by summer this year, and mass production in 2026.

thunderbird120 · 10 months ago
>Intel on 18A is literally TSMC's 3nm process + backside power delivery, which means more power efficiency, performance also less heat.

That's a pretty serious abuse of the word "literally" given that they have nothing in common except vague density figures which don't mean that much at this point.

Here's a line literally from the article

>Based on this analysis it is our belief that Intel 18A has the highest performance for a 2nm class process with TSMC in second place and Samsung in third place.

Given what we currently know about 18A, Intel's process appears to be less dense but with a higher emphasis on performance, which is in line with recent Intel history. Just looking at the density of a process won't tell you everything about it. If density were everything then Intel's 14nm++++ chips wouldn't have managed to remain competitive in raw performance for so many years against significantly denser processes. Chip makers have a bunch of parameters they have to balance when designing new nodes. This has only gotten more important as node shrinks have become more difficult. TSMC has always leaned more towards power efficiency, largely because their rise to dominance was driven by mobile focused chips. Intel's processes have always prioritized performance more as more of their products are plugged into the wall. Ideally, you want both but R&D resources are not unlimited.

thunderbird120 commented on ChatGPT Pro   openai.com/index/introduc... · Posted by u/meetpateltech
upghost · a year ago
So why are the context windows so "small", then? It would seem that if the cost was not so great, then having a larger context window would give an advantage over the competition.
thunderbird120 · a year ago
The cost for both training and inference is vaguely quadratic while, for the vast majority of users, the marginal utility of additional context is sharply diminishing. For 99% of ChatGPT users something like 8,192 tokens, or about 20 pages of context, would be plenty. Companies have to balance the cost of training and serving models. Google did train an ultra-long-context version of Gemini, but since Gemini itself was not fundamentally better than GPT-4 or Claude, this didn't matter much; so few people actually benefited from such a niche advantage that it didn't shift the playing field in their favor.
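To put toy numbers on "vaguely quadratic": the attention score and value matmuls alone cost roughly 4·L²·d FLOPs per layer, so that term grows 256x going from 8K to 128K context. The model shape below (d_model and layer count) is made up for illustration:

```python
# Rough attention cost per layer: ~4 * L^2 * d FLOPs (two L x L matmuls against
# d-dimensional activations). Model dimensions here are illustrative only.
def attn_flops(context_len, d_model=8192, n_layers=80):
    return 4 * context_len**2 * d_model * n_layers

for L in (8_192, 131_072, 1_000_000):
    ratio = attn_flops(L) / attn_flops(8_192)
    print(f"{L:>9} tokens: {attn_flops(L):.2e} attention FLOPs ({ratio:,.0f}x the 8K cost)")
```

The rest of the model (MLPs, projections) scales only linearly with L, which is why the pain is concentrated at very long contexts and why serving them to everyone by default isn't worth it.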
thunderbird120 commented on ChatGPT Pro   openai.com/index/introduc... · Posted by u/meetpateltech
dartos · a year ago
During inference time, yes, but training time does scale exponentially as backpropagation still has to happen.

You can’t use fancy flash attention tricks either.

thunderbird120 · a year ago
No, additional context does not cause exponential slowdowns, and you absolutely can use FlashAttention tricks during training; I'm doing it right now. Transformers are not RNNs: they are not unrolled across timesteps, so the backpropagation path for a 1,000,000-context LLM is not any longer than for a 100-context LLM of the same size. The only thing that is larger is the self-attention calculation, which is quadratic in compute and linear in memory if you use FlashAttention or similar fused self-attention kernels. These calculations can be further parallelized using tricks like ring attention to distribute very large attention calculations over many nodes, which is how Google trained their 10M-context version of Gemini.
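A quick way to see the "backprop path doesn't grow with context" point, sketched in JAX with a deliberately naive attention (it materializes the full L x L score matrix, so memory is quadratic; FlashAttention-style kernels compute the same function blockwise to make that linear). The differentiation graph depth is set by the layer count, not the sequence length, so the gradients look identical at 128 or 1,024 tokens; all shapes below are arbitrary:

```python
import jax
import jax.numpy as jnp

def attention(q, k, v):
    # Naive attention: builds the full L x L score matrix (quadratic memory).
    # Fused kernels like FlashAttention compute the same result blockwise.
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    return jax.nn.softmax(scores, axis=-1) @ v

def loss(params, x):
    # Two "layers": backprop depth depends on this loop, not on len(x).
    for w in params:
        x = attention(x @ w, x @ w, x @ w)
    return jnp.sum(x ** 2)

grad_fn = jax.jit(jax.grad(loss))
params = [jax.random.normal(jax.random.PRNGKey(i), (64, 64)) for i in range(2)]

for seq_len in (128, 1024):          # longer context: bigger activations,
    x = jnp.ones((seq_len, 64))      # but the same differentiation graph
    grads = grad_fn(params, x)
    print(seq_len, [g.shape for g in grads])
```

Ring attention then distributes those blockwise attention chunks across devices, which is the "many nodes" part.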

u/thunderbird120

Karma: 1252 · Cake day: August 1, 2018