Q6T46nT668w6i3m · a year ago
I believe “world models” are the future of the field, so I really need better performance in areas like IVP, FFT, special functions (e.g., harmonics), and dynamic programming. The H100 advances (e.g., DPX instructions) are terrific, but they feel like a starting point. Hell, improved geometric operations (e.g., triangulation and intersection) would be killer too, and surely that expertise exists at NVIDIA! The H100, especially for the price, feels terrible when you’re training a neural network bottlenecked on an operation that flies on a consumer CPU and you know there are GPU optimizations that have been left on the floor.
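For concreteness, here is a minimal sketch of one of the operations mentioned, a batched FFT, running on the GPU. It assumes PyTorch with a CUDA device available and is only illustrative of the kind of workload, not the commenter's actual code.

```python
# Illustrative only (not the commenter's workload): a batched FFT on GPU vs CPU,
# assuming PyTorch built with CUDA support and a CUDA device present.
import torch

x_cpu = torch.randn(4096, 4096)            # 4096 signals of length 4096
x_gpu = x_cpu.to("cuda")

spec_cpu = torch.fft.rfft(x_cpu, dim=-1)   # runs on the host CPU
spec_gpu = torch.fft.rfft(x_gpu, dim=-1)   # dispatched to cuFFT on the GPU

torch.cuda.synchronize()                   # GPU kernels are async; sync before timing
```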
tgtweak · a year ago
I suspect these can be patched in as well - most of these functions have implementations in CUDA implying they should be able to run on the hardware even without dedicated instructions.
rbanffy · a year ago
This was one of the selling points for RISC: to focus on the most frequently used instructions and implement the rest in software would yield smaller and faster designs.

I have the feeling that GPUs are on the sweet spot where smaller footprint directly translates into more executions and, therefore, higher throughput for the same chip area.

pjmlp · a year ago
With an "OS" to go along with it,

https://www.nvidia.com/en-us/data-center/products/ai-enterpr...

It is this kind of delivery that the competition misses out on.

amluto · a year ago
I imagine that Nvidia is trying to build a more sustainable moat. If the only things they have that their competitors don’t are a nice development framework, nice libraries and nice drivers, it’s not that hard for a customer to get their software working on a competing hardware platform and cut out a bunch of Nvidia’s enormous markup. But if Nvidia also has strong buy-in from datacenter operators and an entire platform that magically runs people’s applications without them needing to think about how they’re deployed, then it can try for an AWS-like moat in which customers want to avoid the ongoing cost of DIYing their stack.
shiftpgdn · a year ago
Tiny corp/tinygrad has been working for a year+ and has raised something like $5 million to try to get AMD chips up to speed with Nvidia, without much success. Check out Twitter, where George Hotz has been very vocal about AMD needing to open source their chips to allow someone to help them get up to speed.
gitfan86 · a year ago
Assuming that the hardware isn't a moat because other people make similar hardware is a mistake.

The networking alone is a huge bottleneck at scale. A competitor has to be better at networking AND chips to be competitive.

robot · a year ago
"it’s not that hard for a customer to get their software working on a competing hardware platform and cut out a bunch of Nvidia’s enormous markup"

Agreed, and yet none of the contenders (Intel, AMD, chip startups) have been able to work out their software play for more than a year, which shows how slowly corporates move.

Google is not selling their TPUs AFAIK and their tooling is completely focused on internal use.

So it's really interesting to see that no one else is properly addressing the need even though they have chips (and the chip itself, a systolic matrix-multiplier array, is much simpler than a CPU).

kkielhofner · a year ago
I'm very vocal about this to the point where the naive/cursory view is that I'm an "Nvidia fanboy". It's amazing how many times I've had to try to relate this point and how much hate I get for it - Nvidia is lightyears ahead of AMD and the overall ROCm ecosystem in terms of software support. AMD makes fantastic hardware but at the end of the day it doesn't do anything without software. This is very obvious and very basic.

CUDA will do whatever you want and it more-or-less just works. ROCm (after > six years) is still:

- Won't work on your hardware

- Used to work on your hardware but we removed support within a few years

- Burn 10x more time trying to get something to work

- Be perpetually behind CUDA in terms of what you want/need to do

- Sorry, that just won't work

- Performance is lower than it should be for what is often actually better hardware, to the point where a superior newer generation AMD GPU gets bested by a previous generation Nvidia GPU with inferior (on paper) hardware specs

I've been trying ROCm since it was initially released > six years ago. I want AMD to succeed - I've purchased every new generation of AMD GPU in these six years to evaluate the suitability of AMD/ROCm for my workloads. Once a quarter or so I check back in to evaluate ROCm.

Every. Single. Time. I come away laughing/shaking my head at how abysmal it is. Then I go back to CUDA and sit in wonder at how well it actually works, and throw even more money at Nvidia because I just need to get things done, and my concerns about their monopoly, artificial market segmentation, ridiculously high margins, etc. are a distant second to my livelihood.

AMD (and others) need to understand what Jensen Huang has been saying for years - 30% of their development spend is on software. As the announcements this week show, Nvidia is using their greater and greater financial resources and market share to continue to lap AMD in the only thing people actually care about: here's our product and here's what you can actually do with it.

Many people with a fundamental hate/disgust for Nvidia will come back and say "ok bootlicker, it's supported in torch, you're spreading FUD". Ok, take a look at the Nvidia platform you linked and show me where the ROCm equivalent is. Take a look at inference serving platforms, which are one of the things I care most about. Look at flash attention, ALiBi, and the countless other software components that you actually need beyond torch in many cases. Watch even basic torch crash all over the place with ROCm.

Sure, you /might/ be able to train or run local one-off inference with AMD. How do I actually run this thing for my users? Crickets - or maybe vLLM support for ROCm for LLMs (nothing for other models). Then dig just a little bit deeper and realize even vLLM isn't feature-complete, requires patches and specific versions all around, and, from personal experience, a lot of GitHub/blog spelunking and pain. With CUDA it's `docker run` and it flies.

With CUDA I can run torchserve, HF TGI, vLLM, Triton, and a number of others to actually serve models up for users so I can make money from my work. ROCm, meanwhile, can barely run local experiments.

AMD needs to get it together.
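
For reference, the CUDA-path experience being contrasted here is roughly this short. A minimal sketch, assuming `pip install vllm` on a machine with a supported NVIDIA GPU; the model name is illustrative, not the poster's setup:

```python
# Minimal vLLM sketch on the CUDA path (illustrative model name; assumes
# `pip install vllm` on a machine with a supported NVIDIA GPU).
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")   # weights pulled from HF
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain what an inference server does."], params)
print(outputs[0].outputs[0].text)
```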

treme · a year ago
Crazy that the video game graphics business happened to be the secret-level portal to AI land
dragontamer · a year ago
Watch this 90s commercial for 3dfx: https://www.youtube.com/watch?v=ooLO2xeyJZA

GPUs have always had more compute/gigaflops than traditional computers. GPUs in fact have more in common with 80s-era supercomputer architecture than with normal CPUs.

https://www.youtube.com/watch?v=ODIqbTGNee4

dehrmann · a year ago
I'm not sure if it's just nostalgia, but I still like the final 3dfx logo, at least in a video game context.
jdawg777 · a year ago
That was an unexpected twist.
pjmlp · a year ago
Ever since GPUs became programmable, which goes back earlier than many think (see the TMS34010 from 1986), there have been many attempts to use the cards for general-purpose compute.

It turns out that anything related to neural networks, and similar AI approaches, is all about compute.

dragontamer · a year ago
I'd say almost the opposite in practice.

GPUs, as they became programmable (and maybe even a little bit before that...), started to take cues from SIMD supercomputers. So the compute methodologies were researched first, and then applied to GPUs (i.e., to graphics) afterwards.

I've heard rumors that the first programmable GPUs were considered because GPUs were already doing SIMD-style compute and running instructions programmably at the hardware/firmware level. It just needed to be "revealed" to OpenGL or DirectX programmers.

falcor84 · a year ago
In hindsight I think it makes good sense - game graphics were always aiming to represent worlds with as high fidelity as possible.
barumrho · a year ago
Good point, but it is interesting that the computation used to render visual worlds in high fidelity is the same computation used to "ingest" data and create a model.

Reminds me of how a mic is a speaker and a speaker is a mic.

p_l · a year ago
Nvidia claims that they went into video games because it provided a way to fund their goal of building compute accelerators
packetlost · a year ago
I don't believe that claim one bit. That reads like revisionist history if I've ever seen it. If there are sources that back it up, fine, but until I see pretty reasonable proof I'm going to take that as CEO grandstanding while the market is hot.
pxtail · a year ago
Of course. Obviously, ever since the inception of the company they had the noble goal of uplifting the human race. It was just a set of unfortunate circumstances that forced them to make trivial utility devices just to survive long enough
ripe · a year ago
"When we were selling shovels to coal miners, we always secretly knew there was gold in those hills."

Yeah, sure.

tgtweak · a year ago
Highly doubt this was the initial idea for Nvidia, given they were graphics-only for a very long time. CUDA definitely felt like more of a value-add for the first 5-6 years than a concerted effort to build accelerators and fund that with graphics demand. The first "Tesla" line of GPUs - which had very little compute-only focus - came in 2007.
m3kw9 · a year ago
And tell me what is Nvidia trying to fund with the AI business?
amelius · a year ago
Then why does my graphics card have 5 video outputs?
MangoCoffee · a year ago
Nah, Intel said Nvidia just got "lucky"
xyzzy_plugh · a year ago
So says every loser when they are bested by their own ignorance.
baq · a year ago
Does a series of matrix multiplications have a soul?
AYBABTME · a year ago
Matrix multiplication is just an effective representation for the dense graph structures. The same graphs could be implemented in other ways (adjacency list, edge list), perform the same overall logic, without doing vector/matrix/tensor math. It seems like the magic is in the overall idea of neurons and networks of them following increasingly interesting architectures and training mechanisms.

But because these graphs are mostly dense and involve numerical operations, matrices/tensors are a great implementation.
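
A toy numpy sketch of that equivalence (sizes are made up): the same dense "layer" computed once as a matrix product and once by accumulating over an explicit edge list.

```python
# Toy sketch: the same dense "layer" computed two ways (made-up sizes).
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # weights of a fully connected 3 -> 4 bipartite graph
x = rng.normal(size=3)        # input activations

# 1) As a matrix multiplication.
y_matmul = W @ x

# 2) As an explicit edge list (src, dst, weight), accumulated one edge at a time.
edges = [(j, i, W[i, j]) for i in range(4) for j in range(3)]
y_edges = np.zeros(4)
for src, dst, w in edges:
    y_edges[dst] += w * x[src]

assert np.allclose(y_matmul, y_edges)  # same logic, very different hardware behavior
```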

sirsinsalot · a year ago
Does a series of electrical impulses in a brain?
bheadmaster · a year ago
Define "soul".
paulmd · a year ago
does a SQL table provide a stochastic representation of conceptual symbolics?

oh, sorry, I thought we were just asking questions

VladimirGolovin · a year ago
Does a bunch of atoms arranged into proteins have a soul?

Deleted Comment

transcriptase · a year ago
It would appear the G in GPU now stands for AI.
zacksiri · a year ago
They might re-purpose the G to Generative

Generative Processing Unit works for all cases.

m3kw9 · a year ago
When AI is in everything, G would become General
black_puppydog · a year ago
Together with the newly-peripheral PPU (formerly known as CPU)
adverbly · a year ago
I think the fact that people sometimes use the letter g to denote generalized intelligence might be useful here.

You could call them gPUs.

https://en.m.wikipedia.org/wiki/G_factor_(psychometrics)

sesuximo · a year ago
APU is already taken by aviation tho
Uvix · a year ago
And by AMD (Accelerated Processing Unit, their term for CPUs with good integrated graphics like what Xbox and PlayStation consoles use).
whamlastxmas · a year ago
And The Simpsons
k8sToGo · a year ago
So is GPU (Ground power unit)
Dalewyn · a year ago
Auxiliary Power Unit.

Both aviation and computers use them, though the latter more often call them UPS (Uninterruptible Power Supply).

Deleted Comment

bionhoward · a year ago
If everyone and their cousin is bullish on compute, then what’s the bear thesis here? Why might compute NOT be the best answer to our challenges as software engineers? Why might a focus on compute scaling ultimately be inferior to something else?

I seek all kinds of answers, including ones about fundamental logic, mathematical physics, etc

whiterknight · a year ago
When we don’t have a model of the problem it takes enormous amounts of power to synthesize one out of neurons or another general function approximating primitive.

But once we understand a little bit about the problem we can model 80-90% of its behavior with a handful of parameters. Add in some bias and noise parameters and you have an accurate trainable machine learning model that’s orders of magnitude more efficient.

Take, for example, a spring, which can be modeled by 1 or 2 parameters. But its impulse response looks like a sine curve multiplied by an exponential decay.

If you just train neurons to match the inputs/outputs of a spring, you need a ridiculous number of model parameters to describe that shape.

CNNs have seen an enormous amount of success due to this fact: a lot of processes can be modeled by convolution.
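
To make the spring example concrete, a small sketch with made-up constants: the impulse response as a two-parameter damped oscillator recovered by least squares, versus the thousands of weights even a tiny MLP would spend approximating the same curve.

```python
# Toy sketch of the spring example (made-up constants): a 2-parameter model
# x(t) = exp(-a t) * sin(w t) fit by least squares.
import numpy as np
from scipy.optimize import curve_fit

def impulse_response(t, a, w):
    return np.exp(-a * t) * np.sin(w * t)

t = np.linspace(0.0, 10.0, 500)
true_a, true_w = 0.3, 2.5
noise = 0.01 * np.random.default_rng(0).normal(size=t.size)
data = impulse_response(t, true_a, true_w) + noise

(fit_a, fit_w), _ = curve_fit(impulse_response, t, data, p0=(0.2, 2.0))
print(fit_a, fit_w)  # ~0.3 and ~2.5: two parameters capture the whole shape

# A small MLP approximating the same curve (say 1 -> 64 -> 64 -> 1) already
# carries roughly 4,000+ weights and biases to describe the same thing.
```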

Kon-Peki · a year ago
> what’s the bear thesis here?

Nvidia is ceding the low-end GPU market to anyone who wants it. Not only does it allow a competitor to establish a reliable source of revenue for their R&D department, but it could cut off the sale of the binned chips that are inevitably produced on the expensive, tiny processes that Nvidia uses - which would hurt their margins to some degree.

jauntywundrkind · a year ago
Innovator's Dilemma seems strong on this one. Except typically the flight upmarket is driven by competition below. In this case Nvidia isn't being driven upmarket; the profits upmarket are just too tempting, and there's not much competition downmarket.
paulmd · a year ago
"obviously chevy is just ceding the low-end market to anyone willing to make a camaro for the price of a camry, just think of all the profit waiting for anyone willing to establish themselves in this market"

bro there is like $10 of margin in your idea for a $200 GPU lol, nobody is "ceding" anything (actually the 4060 is a more advanced card than the 7600 on literally every front, for ~10% more money), but the cost floor has climbed to the point where $200-300 GPUs just don't progress that much anymore.

There's very good reasons for this - shrinks are the least effective on low-tier cards (because memory controllers don't shrink), and you simply don't gain much actual savings from shrinking a 200mm2 die - congrats it's 150mm2 now, on a more expensive node, meaning your $10 chip is now $9. And meanwhile gamers want more VRAM every year, manufacturing and testing and shipping costs have gone up (and cost the same for a 4090 as a 4060), etc. The economics of low-end cards is literally terrible and they are simply falling off the edge of profitability.
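
A back-of-the-envelope sketch of that die-cost point; all prices are made-up assumptions and edge/yield losses are ignored, but the shape of the result is the same:

```python
# Back-of-envelope die cost (made-up wafer prices; edge and yield losses ignored).
import math

WAFER_DIAMETER_MM = 300
wafer_area = math.pi * (WAFER_DIAMETER_MM / 2) ** 2   # ~70,700 mm^2

def cost_per_die(die_area_mm2, wafer_price_usd):
    dies_per_wafer = wafer_area // die_area_mm2
    return wafer_price_usd / dies_per_wafer

old = cost_per_die(200, 10_000)   # older node, assumed $10k per wafer
new = cost_per_die(150, 12_000)   # newer node, assumed $12k per wafer

print(f"old: ${old:.2f}  new: ${new:.2f}")
# Roughly $28 vs $25 here: a 25% area shrink on a pricier wafer barely moves the
# per-die cost, and VRAM, board, testing, and shipping costs don't shrink at all.
```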

Intel is willing to lose money hand-over-fist just to get into the market, but AMD and NVIDIA are pretty much charging fair-ish prices, and gamers just are too emotionally immature to accept that moore's law really really actually is dead for realsies and things aren't going to progress 40% perf/$ per gen anymore.

It's so weird, nobody cries about the CPU market like this. A 1600AF went for $85, a 3600 went for $160, nobody said "boo" when the 5600X increased that to $330 or whatever. Nowadays you are spending at least 50% more on your CPU than you did 5 years ago, sometimes closer to 2x. The enthusiast market is buying $250-400 cpus now, not $85-160. And obviously everyone understands that upgrading your CPU every gen is terrible value too, especially when prices have drifted upwards. But they don't have a half-decade of negging from reviewers telling them that this is a market in crisis, and that they should feel bad about buying a CPU, etc.

The literal half-decade of warfare from reviewers against the GPU market is so tired at this point. Bro, things are going to slow down, it just is how it is. GPUs are the processor that's most dependent on moore's law providing growth in transistors at the same cost, and wafer price increases hit them the hardest. Go complain to TSMC instead, or ASML, or the brick wall - it's ultimately a physics problem. But there's a hell of a lot of clicks and youtube ad money to be made whining about it in the meantime.

At least reviewers are finally coming to jesus on DLSS - mostly because they know AMD will finally have a decent upscaler within a year tops, and that RDNA4/5 will be pushing forward on tensor etc. The writing was on the wall as soon as the specs leaked for the PS5 Pro, which is basically adopting RTX features wholesale. https://www.youtube.com/watch?v=CbJYtixMUgI https://www.youtube.com/watch?v=BG-7vyw2YRg&t=1625s

loudmax · a year ago
I'm absolutely not going to short Nvidia stock, but it's plausible that they're overvalued.

Nvidia GPUs are pretty flexible in terms of computation and extremely power hungry. It may be that a next generation of more specialized hardware, such as TPUs or something, outperforms Nvidia GPUs on machine learning tasks to such an extent that those GPUs are obsolete for those tasks. This next generation could come to market sooner than Nvidia anticipates.

Another possibility is that ML researchers figure out some ways to radically reduce the amount of compute required for good training and inference on _less_ specialized hardware. It's really impressive what you can do with llama.cpp. If open source models running on consumer grade hardware ever get to 90% as good as ChatGPT (which, to be clear, is absolutely not the case currently), then those top end GPUs are overkill for most use cases.

I don't think either of those scenarios is particularly likely, but they're at least plausible.

paulmd · a year ago
> Another possibility is that ML researchers figure out some ways to radically reduce the amount of compute required for good training and inference on _less_ specialized hardware

just like the creation of radically simpler internal combustion engines led to us spending a lot less on internal combustion engines, right? /s

vinyl7 · a year ago
If we focus on writing better software with performance in mind instead of this insane stack of abstraction disaster, we could easily get massive increases in compute capability with current hardware.

The most impressive thing about modern computing is that we've had an exponential increase in compute speed, yet everything runs as slowly as it did 30 years ago.

kilpikaarna · a year ago
Not sure about your definition of bearish, but I'm concerned about the geopolitical risk of relying on this one company and their one supplier. In addition to everything else.

Also bearish on programmers keeping up on their fundamental algorithms, rather than trying to throw NNs at every problem.

ApolloFortyNine · a year ago
>Also bearish on programmers keeping up on their fundamental algorithms, rather than trying to throw NNs at every problem.

Besides interviews, at least 90% of developers have no need for 'fundamental algorithms'. The libraries they use rely on them in some way, sure, but the vast majority of devs simply need to know how to use the tool, not how the tool itself is developed.

cactusplant7374 · a year ago
The bear case:

1) AGI won't happen because we are on the wrong path

2) AI being a big part of our lives is still a theory. Aswath Damodaran has some brief thoughts on this.

But the biggest bear case has to be that the technology won't get better. Essentially, everyone assumes that it will without reservations.

ajross · a year ago
Going to skip the "fundamental logic, mathematical physics, etc" angle and go with:

"That AI isn't all that that and won't make much money" seems to be by far the biggest one. So far the applications are impressive and a little scary, but not actually something that anyone is going to pay for. Apple makes a zillion dollars because people want its phones. Google makes a zillion dollars because people want to sell junk to folks on the internet.

You need to posit a product built out of compute that does more. Maybe replaces a bunch of existing workers in an existing industry, something like that. So far the market is still looking.

aurareturn · a year ago
At my work, I'm finding so many incredible things that GPT4 API can do to make my company run much more efficiently.

For example, being able to feed a potential customer's invoice into GPT and ask it what kind of services we can offer to beat that price. Our salespeople had to spend hours doing this before. Now it's done in 2 minutes through an engineered prompt. And it's incredibly accurate.

The problem with GPT4 API is context size and price. That's it. Both are bottlenecked by faster and cheaper compute.

That's my bull case for more compute, not bear case like OP asked.
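
A minimal sketch of the kind of prompt pipeline described above, assuming the OpenAI Python client (v1-style) with an API key in the environment; the model name, system prompt, and function are illustrative, not the poster's actual setup:

```python
# Illustrative sketch only; assumes `pip install openai` (v1-style client)
# and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def analyze_invoice(invoice_text: str) -> str:
    """Ask the model to propose a cheaper equivalent package for a competitor invoice."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # illustrative model name
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Compare the customer's current invoice against our service "
                        "catalog and propose a cheaper equivalent package, line by line."},
            {"role": "user", "content": invoice_text},
        ],
    )
    return response.choices[0].message.content
```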

Macha · a year ago
> You need to posit a product built out of compute that does more. Maybe replaces a bunch of existing workers in an existing industry, something like that. So far the market is still looking.

Although, even if that is the ultimate result, there's still going to be plenty of money changing hands on the way to that conclusion. See also all the blockchain/web3 companies, when there was (to me at least) clearly a lot less substance/potential there.

whiterknight · a year ago
These giant GPUs aren’t getting more efficient, they draw even more power to do more work. It’s impressive and will have important use cases.

But fundamentally technology gets better when we can do more with less.

dotnet00 · a year ago
They are getting more efficient. They draw, say, 2x the power but do 4x the work. The H100 is apparently ~5x more efficient than the A100.
stephbu · a year ago
Ironically the power per cycle is decreasing - power and thermal dissipation are really the limits NVIDIA is exploring. It’s what the software does with those cycles that is leaping exponentially.
hackerlight · a year ago
I think we can be certain that AGI will be compute intensive. Something Ilya Sutskever said made that clear. If you only have a small model, there's logically not much you can do with it. You can represent a single edge of an object, maybe. But it's not enough capacity to represent multiple edges and how they mix together to form an object. And if it can't do that, then it has no representations it can use for reasoning.

There's still the secondary question of how compute heavy it will be, and I don't think anyone knows. But Sam Altman, in a recent speech he gave in Korea, expressed confidence that there isn't a limit in sight for returns from GPT scaling.

jstummbillig · a year ago
> Why might a focus on compute scaling ultimately be inferior to something else?

a) At some point between here and eternity, making humans more efficient becomes easier than scaling compute is hard. Seems unlikely.

b) Compute is overrated, now or will be in the near future. I will be happy to donate to the church of "compute is overrated" if that makes people get off of gpt-4+ and let me cook. Read that as: I doubt it.

I don't see a c)

hnthrowaway0328 · a year ago
I think eventually AMD and other players are going to move in with force and we will see a surplus of supply in a few years, especially when China starts to spit out a lower-end version of... everything.
xyzzy_plugh · a year ago
Nvidia is too big. China could not easily get away with spinning complete rip offs that even support CUDA without poking a hole in their hull. Nvidia is working very hard to placate China. If China does go down this route, it'll be for China-only for a considerable time. Nvidia can afford to placate China and avoid that scenario, should their economics continue to trend upwards, practically forever.
AYBABTME · a year ago
I would normally be tempted to think the S&P500 should look linear or similar-ish to last year, and so on. But I think there's a valid thesis where rapid technological advancement does indeed just grow the pie exponentially: where the amount of value that becomes unlocked in a non-zero-sum manner grows tremendously.

Just with current AI models, the amount of value that is waiting to be created (take technology X, add AI to it) is incredible. Casual things that used to take years for a team to build can now be solved by throwing a GPU at them with a generic model that is fine-tuned a bit. Basically, things that were impractical 2y ago are now on the table.

The bear thesis is that compute will stop being scarce. Which is plausible, since in capitalism, the best cure for high prices tends to be high prices.

Something crazy to think about is that Accelerando by Charles Stross is starting to look like a prophecy being slowly fulfilled.

gitfan86 · a year ago
The demand for compute increases dramatically as the functionality and reliability go up.
swingingFlyFish · a year ago
Examples?
Cacti · a year ago
I mean, the entire tech industry is predicated on continued exponential growth of computing power. It could be that these 70 years and the next couple dozen are a blip in what will be millennia of linear returns.
whamlastxmas · a year ago
I mean I could baselessly argue that the second we have AGI, we will very shortly after have ASI, after which I think most computing could very likely be hundreds or thousands of times more efficient. It’s possible there exists far more computing power in the world than we will ever need.
flohofwoe · a year ago
But Can It Run Crysis?

(seriously though, don't call it a "GPU" when rendering takes the back seat)

ksec · a year ago
I guess Blackwell is too late in the design cycle to use N3. It would be interesting to see, at this sort of margin and volume, whether it would make sense to have a GPU on the latest node: a next-gen 3nm GPU in 2025 and, if they could move aggressively, a 2nm GPU in 2026.
dsir · a year ago
Does anyone know the numbers in layman's terms regarding the demand for compute and what our systems/chips are able to reasonably process with this new tech?

I'm curious if the technology is now vastly outperforming the demand here or if the demand for compute is outpacing the tech.

trueismywork · a year ago
Demand is infinite