I'm looking at my Jetson Nano in the corner, which is fulfilling its post-retirement role as a paperweight because Nvidia abandoned it after 4 years.
The Nvidia Jetson Nano, an SBC for "AI", debuted with an already-aging custom Ubuntu 18.04, and when 18.04 went EOL, Nvidia abandoned it completely: no further updates to its proprietary JetPack or drivers, and without those the whole machine learning stack (CUDA, PyTorch, etc.) became useless.
I'll never buy an SBC from Nvidia unless all the software support is upstreamed to the mainline Linux kernel.
In general, Nvidia's relationship with Linux has been... complicated. On the one hand, at least they offer drivers for it. On the other, I have found few more reliable ways to irreparably break a Linux installation than trying to install or upgrade those drivers. They don't seem to treat it as a first-class citizen; they just tolerate it to the bare minimum required to claim it works.
Now that the majority of their revenue comes from data centers instead of Windows gaming PCs, you'd think their relationship with Linux would improve, or already has.
I've had a similar experience, my Xavier NX stopped working after the last update and now it's just collecting dust. To be honest, I've found the Nvidia SBC to be more of a hassle than it's worth.
The Orin series and later use UEFI and you can apparently run upstream, non-GPU enabled kernels on them. There's a user guide page documenting it. So I think it's gotten a lot better, but it's sort of moot because the non-GPU thing is because the JetPack Linux fork has a specific 'nvgpu' driver used for Tegra devices that hasn't been unforked from that tree. So, you can buy better alternatives unless you're explicitly doing the robotics+AI inference edge stuff.
But the impression I get from this device is that it's closer in spirit to the Grace Hopper/datacenter designs than to the Tegra designs, given the naming, the design (DGX style), and the software (DGX OS?) that goes on their workstation/server designs. Those are also UEFI, and in those scenarios you can (I believe?) use the upstream Linux kernel with the open source Nvidia driver on whatever distro you like. In that case, this would be a much more "familiar" machine with a much more ordinary Linux experience. But who knows. Maybe GH200/GB200 need custom patches, too.
Time will tell, but if this is a good GPU paired with a good ARM Cortex design, and it works more like a traditional Linux box than the Jetson series, it may be a great local AI inference machine.
AGX also has UEFI firmware which allows you to install ESXi. Then you can install any generic EFI arm64 iso in a VM with no problems, including windows.
And unless there is some expanded maintenance going on, 22.04 is EOL in 2 years. In my experience, vendors are not as on top of security patches as upstream. We will see, but given NVIDIA's closed ecosystem, I don't have high hopes that this will be supported long term.
The rk3588 is pretty close; I believe it's usable today, just missing a few corner cases with HDMI or some such. I believe the last patches are either pending or already applied to an RC.
If its stack still works, you might be able to sell or donate it to a student experimenting. They can still learn quite a few things with it. Maybe even use it for something.
Using outdated TensorFlow (v1 from 2018) or outdated PyTorch makes learning harder than it needs to be, considering most resources online use much newer versions of the frameworks. If you're learning the fundamentals, working from first principles and creating the building blocks yourself, then it adds to the experience. However, most people just want to build different types of nets, and that's hard to do when the code won't work for you.
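To make that concrete, here's a minimal illustration (mine, not from the thread): the TF1-style graph/session API that 2018-era JetPack images and tutorials assume was removed in TensorFlow 2.x, so modern examples won't run on the old stack and vice versa.

```python
# TensorFlow 1.x style (what 2018-era tutorials and old JetPack wheels assume);
# tf.placeholder and tf.Session no longer exist in TF 2.x:
#
#   x = tf.placeholder(tf.float32, shape=[None, 3])
#   y = tf.reduce_sum(x, axis=1)
#   with tf.Session() as sess:
#       print(sess.run(y, feed_dict={x: [[1., 2., 3.]]}))

# TensorFlow 2.x style (what current online resources use): eager execution, no graph boilerplate.
import tensorflow as tf

x = tf.constant([[1.0, 2.0, 3.0]])
print(tf.reduce_sum(x, axis=1).numpy())  # -> [6.]
```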
Today I'm using 2x 3090s, which are over 4 years old at this point and still very usable. To get 48GB of VRAM I would need 3x 5070 Ti - still over $2k.
In 4 years, you'll be able to combine 2 of these to get 256gb unified memory. I expect that to have many uses and still be in a favorable form factor and price.
Eh? By all indications compute is now evolving SLOWER than ever. Moore's Law is dead, Dennard scaling is over, the latest fab nodes are evolutionary rather than revolutionary.
This isn't the 80s when compute doubled every 9 months, mostly on clock scaling.
I feel this is bigger than the 5x series GPUs. Given the craze around AI/LLMs, this can also potentially eat into Apple’s slice of the enthusiast AI dev segment once the M4 Max/Ultra Mac minis are released. I sure wished I held some Nvidia stocks, they seem to be doing everything right in the last few years!
This is something every company should make sure they have: an onboarding path.
Xeon Phi failed for a number of reasons, but one where it didn't need to fail was the availability of software optimised for it. Now we have Xeons and EPYCs, and MI300Cs with lots of efficient cores, but we could have been writing software tailored for those for 10 years now. Extracting performance from them would be a solved problem at this point. The same applies to Itanium - the very first thing Intel should have made sure it had was good Linux support. They could have had it before the first silicon was released. Itanium was well supported for a while, but it's long dead by now.
Similarly, Sun failed with SPARC, which also didn't have an easy onboarding path after they gave up on workstations. They did some things right: OpenSolaris ensured the OS remained relevant (it still is, even if a bit niche), and looking the other way on x86 Solaris helped people learn and train on it. Oracle cloud could, at least, offer it on cloud instances. That would be nice.
Now we see IBM doing the same - there is no reasonable entry-level POWER machine that can compete in performance with a workstation-class x86. There is a small half-rack machine that can be mounted in a deskside case, and that's it. I don't know of any company that's planning to deploy new systems on AIX (much less IBM i, which is also POWER), or even Linux on POWER, because it's just too easy to build on other, competing platforms. You can get AIX, IBM i and even IBM Z cloud instances from IBM Cloud, but it's not easy (and I never found a "from-zero-to-ssh-or-5250-or-3270" tutorial for them). I wonder if it's even possible. You can get Linux on Z instances, but there doesn't seem to be a way to get Linux on POWER. At least not from them (several HPC research labs still offer those).
1000% all these ai hardware companies will fail if they don't have this. You must have a cheap way to experiment and develop. Even if you want to only sell a $30000 datacenter card you still need a very low cost way to play.
Sad to see that big companies like Intel and AMD don't understand this, but they've never come to terms with the fact that software killed the hardware star.
It really mystifies me that Intel, AMD, and other hardware companies (obviously Nvidia in this case) don't either form a consortium or each maintain their own in-house Linux distribution with excellent support.
Windows has always been a barrier to hardware feature adoption for Intel. You had to wait 2 to 3 years, sometimes longer, for Windows to get around to providing hardware support.
For any OS optimizations in Windows you had to go through Microsoft. So say you added some instructions, custom silicon or whatever to speed up enterprise databases, or high-speed networking that needed some special kernel features - there was always Microsoft in the way.
And not just foot-dragging in communication; getting the technical people aligned was a problem.
Microsoft would look at every single change: whether or not it would challenge their monopoly, whether or not it was in their business interest, whether or not it kept you, the hardware vendor, in a subservient role.
Raptor Computing provides POWER9 workstations. They're not cheap, still use last-gen hardware (DDR4/PCIe 4 ... and POWER9 itself) but they're out there.
There were Phi cards, but they were pricey and power hungry (at the time, now current GPU cards probably meet or exceed the Phi card's power consumption) for plugging into your home PC. A few years back there was a big fire sale on Phi cards - you could pick one up for like $200. But by then nobody cared.
The developers they are referring to aren’t just enthusiasts; they are also developers who were purchasing SuperMicro and Lambda PCs to develop models for their employers. Many enterprises will buy these for local development because it frees up the highly expensive enterprise-level chip for commercial use.
This is a genius move. I am more baffled by the insane form factor that can pack this much power inside a Mac Mini-esque body. For just $6000, two of these can run 400B+ models locally. That is absolutely bonkers. Imagine running ChatGPT on your desktop. You couldn’t dream about this stuff even 1 year ago. What a time to be alive!
The 1 PetaFLOP and 200GB model capacity specs are for FP4 (4-bit floating point), which means inference, not training/development. It'll still be a decent personal development machine, but not for models of that size.
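Rough numbers behind that (my own back-of-the-envelope, counting weights only and ignoring KV cache, activations and OS overhead):

```python
# How many parameters fit in ~128 GB of unified memory at different precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}

def max_params_billion(memory_gb: float, precision: str) -> float:
    """Rough upper bound on model size, in billions of parameters (weights only)."""
    return memory_gb / BYTES_PER_PARAM[precision]

for prec in ("fp16", "fp8", "fp4"):
    print(f"{prec}: ~{max_params_billion(128, prec):.0f}B parameters in 128 GB")

# fp16: ~64B, fp8: ~128B, fp4: ~256B -- so a model in the ~200B class is only
# plausible at FP4, i.e. for inference rather than training.
```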
This looks like a bigger brother of Orin AGX, which has 64GB of RAM and runs smaller LLMs. The question will be power and performance vs 5090. We know price is 1.5x
I’m not so sure it’s negligible. My anecdotal experience is that since Apple Silicon chips were found to be “ok” enough to run inference with MLX, more non-technical people in my circle have asked me how they can run LLMs on their macs.
Surely a smaller market than gamers or datacenters, though.
If this is gonna be widely used by ML engineers, in biopharma, etc and they land 1000$ margins at half a million sales that's half a billion in revenue, with potential to grow.
The fire-breathing 120W Zen 5-powered flagship Ryzen AI Max+ 395 comes packing 16 CPU cores and 32 threads paired with 40 RDNA 3.5 (Radeon 8060S) integrated graphics cores (CUs), but perhaps more importantly, it supports up to 128GB of memory that is shared among the CPU, GPU, and XDNA 2 NPU AI engines. The memory can also be carved up to a distinct pool dedicated to the GPU only, thus delivering an astounding 256 GB/s of memory throughput that unlocks incredible performance in memory capacity-constrained AI workloads (details below). AMD says this delivers groundbreaking capabilities for thin-and-light laptops and mini workstations, particularly in AI workloads. The company also shared plenty of gaming and content creation benchmarks.
[...]
AMD also shared some rather impressive results showing a Llama 70B Nemotron LLM AI model running on both the Ryzen AI Max+ 395 with 128GB of total system RAM (32GB for the CPU, 96GB allocated to the GPU) and a desktop Nvidia GeForce RTX 4090 with 24GB of VRAM (details of the setups in the slide below). AMD says the AI Max+ 395 delivers up to 2.2X the tokens/second performance of the desktop RTX 4090 card, but the company didn’t share time-to-first-token benchmarks.
Perhaps more importantly, AMD claims to do this at an 87% lower TDP than the 450W RTX 4090, with the AI Max+ running at a mere 55W. That implies that systems built on this platform will have exceptional power efficiency metrics in AI workloads.
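For what it's worth, that result is mostly a memory-capacity story rather than a compute one. A rough sizing sketch (my assumptions: weights only, no KV cache or runtime overhead) shows why a 24GB card has to spill a 70B model over PCIe while the 96GB iGPU pool can hold it outright:

```python
# Approximate weight footprint of a 70B-parameter model at common precisions,
# compared against the two memory pools in AMD's comparison. When the weights
# don't fit in VRAM, layers get offloaded over PCIe, which is usually what
# collapses tokens/sec on the 24 GB card.
PARAMS_BILLION = 70  # Llama-70B-class model

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    gb = weights_gb(PARAMS_BILLION, bits)
    print(f"{name}: ~{gb:.0f} GB "
          f"(fits in 96 GB iGPU pool: {gb < 96}; fits in 24 GB VRAM: {gb < 24})")
```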
I think this is a race that Apple doesn't know it's part of. Apple has something that happens to work well for AI, as a side effect of having a nice GPU with lots of fast shared memory. It's not marketed for inference.
From the people I talk to, the enthusiast market is already saturated with Nvidia 4090s/3090s, because people want to do their fine-tunes (and porn) on their off time. The Venn diagram of users who post about diffusion models and LLMs running at home is pretty much a circle.
Yeah, I really don't think the overlap is as much as you imagine. At least in /r/localllama and the discord servers I frequent, the vast majority of users are interested in one or the other primarily, and may just dabble with other things. Obviously this is just my observations...I could be totally misreading things.
> I sure wished I held some Nvidia stocks, they seem to be doing everything right in the last few years!
They were propelled by an unexpected LLM boom. But plan 'A' was robotics, in which Nvidia has invested a lot for decades. I think their time is about to come, with Tesla's humanoids at $20-30k and the Chinese already selling them for $16k.
This is somewhat similar to what GeForce was to gamers back in the days, but for AI enthusiasts. Sure, the price is much higher, but at least it's a completely integrated solution.
Yep that's what I'm thinking as well. I was going to buy a 5090 mainly to play around with LLM code generation, but this is a worthy option for roughly the same price as building a new PC with a 5090.
I think it isn't about enthusiasts. To me it looks like Huang/NVDA is pushing a small revolution further, using the opening provided by the AI wave: up until now the GPU was an add-on to the general computing core, onto which that core offloaded some computation. With AI, that offloaded computation becomes de facto the main computation, and Huang/NVDA is turning the tables by making the CPU just a small add-on to the GPU, with some general computing offloaded to that CPU.
The CPU being located that "close", and with unified memory, would stimulate parallelization of a lot of general computing so that it runs on the GPU, very fast, instead of on the CPU. Take a classic of enterprise computing - databases, the SQL ones: a lot, if not (with some work) everything, in these databases can be executed on the GPU with a significant performance gain vs. the CPU. Why isn't it happening today? Loading/unloading data onto the GPU eats into performance, the complexity of having only some operations offloaded to the GPU is very high in dev effort, etc. Streamlined development on a platform with unified memory will change that. That way Huang/NVDA may pull the rug out from under the CPU-first platforms like AMD/INTC and end up owning both - the new AI computing as well as a significant share of classic enterprise computing.
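As a toy illustration of the kind of offload being described (my sketch, not anything NVIDIA has announced; CuPy stands in here for purpose-built engines like cuDF or HeavyDB), this is a columnar WHERE + GROUP BY + SUM running entirely on the GPU once the columns sit in GPU-accessible memory:

```python
# Equivalent SQL: SELECT region, SUM(amount) FROM sales WHERE amount > 100 GROUP BY region;
import cupy as cp

n = 10_000_000
amount = cp.random.uniform(0, 1000, n, dtype=cp.float32)  # "amount" column
region = cp.random.randint(0, 16, n)                      # "region" column, 16 distinct regions

mask = amount > 100                                       # WHERE amount > 100
# GROUP BY region, SUM(amount): a weighted bincount does the grouped sum in one pass.
totals = cp.bincount(region[mask], weights=amount[mask], minlength=16)
print(totals.get())                                       # copy the 16 aggregates back to the host
```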
I'm so tired of this recent obsession with the stock market. Now that retail is deeply invested, it is tainting everything, even here on a technology forum. I don't remember people mentioning Apple stock every time Steve Jobs made an announcement in past decades. Nowadays it seems everyone is invested in Nvidia and just wants the stock to go up, and every product announcement is a means to that end. I really hope we get a crash so that we can get back to a more sane relationship with companies and their products.
Not expecting this to compete with the 5x series in terms of gaming; but it's interesting to note that the increase in gaming performance Jensen was talking about with Blackwell was largely related to frames generated by the tensor cores.
I wonder how it would go as a productivity/tinkering/gaming rig? Could a GPU potentially be stacked in the same way an additional Digit can?
> they seem to be doing everything right in the last few years
About that... Not like there isn't a lot to be desired from the linux drivers: I'm running a K80 and M40 in a workstation at home and the thought of having to ever touch the drivers, now that the system is operational, terrifies me. It is by far the biggest "don't fix it if it ain't broke" thing in my life.
There will undoubtedly be a Mac Studio (and Mac Pro?) bump to M4 at some point. Benchmarks [0] reflect how memory bandwidth and core count [1] compare to processor improvements. Granted, YMMV depending on your workload.
0. https://www.macstadium.com/blog/m4-mac-mini-review
1. https://www.apple.com/mac/compare/?modelList=Mac-mini-M4,Mac...
The Nvidia price is closer (USD 3k) to a top Mac mini, but I trust Apple more than Nvidia for end-to-end support from hardware to apps. Not an Apple fanboy, just a user/dev, and I don't think we realize what Apple really achieved, industrially speaking. The M1 was launched in late 2020.
It eats into all of NVDA's consumer-facing products, no? I can see why OpenAI and others are looking for alternative hardware solutions to train their next models.
Am I the only one disappointed by these? They cost roughly half the price of a macbook pro and offer hmm.. half the capacity in RAM. Sure speed matters in AI, but what do I do with speed when I can't load a 70b model.
On the other hand, with a $5000 macbook pro, I can easily load a 70b model and have a "full" macbook pro as a plus. I am not sure I fully understand the value of these cards for someone that want to run personal AI models.
Hm? They have 128GB of RAM. Macbook Pros cap out at 128GB as well. Will be interesting to see how a Project Digits machine performs in terms of inference speed.
Bro, we can connect two Project Digits as well. I was only looking at the M4 MacBook because of the 128GB unified memory. Now this beast can cook better LLMs at just $3K, with a 4TB SSD too.
The M4 Max MacBook Pro (128GB unified RAM and 4TB storage) is $5,999.
So, no more Apple for me. I will just get the Digits and can build a workstation as well.
They are perfectly fine for certain people. I can run Qwen-2.5-coder 14B on my M2 Max MacBook Pro with 32gb at ~16 tok/sec. At least in my circle, people are budget conscious and would prefer using existing devices rather than pay for subscriptions where possible.
And we know why they won't ship NVLink anymore on prosumer GPUs: they control almost the entire segment and why give more away for free? Good for the company and investors, bad for us consumers.
Nvidia releases a Linux desktop supercomputer that's better price/performance wise than anything Wintel is doing and their whole new software stack will only run on WSL2. They aren't porting to Win32. Wow, it may actually be the year of Linux on the Desktop.
Not sure how to judge better price/perf. I wouldn't expect 20 Neoverse N2 cores to do particularly well vs 16 zen5 cores. The GPU side looks promising, but they aren't mentioning memory bandwidth, configuration, spec, or performance.
Did see vague claims of "starting at $3k", max 4TB nvme, and max 128GB ram.
I'd expect AMD Strix Halo (AI Max plus 395) to be reasonably competitive.
NVIDIA is likely citing 1 PFLOPS at FP4 sparse (they did this for GB200), so that is 128 TFLOPS BF16 dense, or about 2/3 of what an RTX 4090 is capable of. I would put the memory bandwidth at 546 GB/s, using the same 512-bit LPDDR5X-8533 as the Apple M4 Max.
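That bandwidth guess also gives a crude ceiling on single-stream decode speed, since each generated token has to stream roughly the whole set of weights once. A quick sketch under my own assumptions (memory-bound decoding, ~60% of peak bandwidth actually achieved):

```python
bus_bits = 512
data_rate_gbps = 8.533                         # LPDDR5X-8533, per pin
bandwidth_gbs = bus_bits * data_rate_gbps / 8  # ~546 GB/s
print(f"Peak bandwidth: ~{bandwidth_gbs:.0f} GB/s")

def tokens_per_sec(params_billion: float, bits_per_weight: float,
                   bandwidth_gbs: float, efficiency: float = 0.6) -> float:
    """Upper-bound decode rate if every token streams all weights once."""
    bytes_per_token = params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gbs * 1e9 * efficiency / bytes_per_token

print(f"70B @ 4-bit: ~{tokens_per_sec(70, 4, bandwidth_gbs):.1f} tok/s")
```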
Because NVidia naturally doesn't want to pay for Windows licenses.
NVidia works closely with Microsoft to develop their cards, all major features come first in DirectX, before landing on Vulkan and OpenGL as NVidia extensions, and eventually become standard after other vendors follow up with similar extensions.
Here he says that in order for the cloud and the PC to be compatible, he's going to only support WSL2, the Windows subsystem for Linux which is a Linux API on top of Windows.
Here's a link to the part of the keynote where he says this:
https://youtu.be/MC7L_EWylb0?t=7259
> their whole new software stack will only run on WSL2. They aren't porting to Win32
Wait, what do you mean exactly? Isn't WSL2 just a VM essentially? Don't you mean it'll run on Linux (which you also can run on WSL2)?
Or will it really only work with WSL2? I was excited as I thought it was just a Linux Workstation, but if WSL2 gets involved/is required somehow, then I need to run the other direction.
Yes, WSL2 is essentially a highly integrated VM. I think it's a bit of a joke to call Ubuntu WSL2, because it seems like most Ubuntu installs are either VMs for Windows PCs or on Azure Cloud.
I see the most direct competitor in the Mac Studio, though of course we will have to wait for reviews to gauge how fair that comparison is. The Studio does have a fairly large niche as a solid workstation, though, so I could see this being successful.
For general desktop use, as you described, nearly any piece of modern hardware, from a RasPI, to most modern smartphones with a dock, could realistically serve most people well.
The thing is, you need to serve both low-end use cases like browsing and high-end dev work via workstations, because even for the "average user" there is often one specific program they rely on which has limited support outside the OS they have grown up with. Of course, there will be some programs, like desktop Microsoft Office, which will never be ported, but still, Digits could open the door to some devs working natively on Linux.
A solid, compact, high-performance, yet low power workstation with a fully supported Linux desktop out of the box could bridge that gap, similar to how I have seen some developers adopt macOS over Linux and Windows since the release of the Studio and Max MacBooks.
Again, we have yet to see independent testing, but I would be surprised if anything of this size, simplicity, efficiency and performance was possible in any hardware configuration currently on the market.
I'm a bit surprised by the amount of comments comparing the cost to (often cheap) cloud solutions. Nvidia's value proposition is completely different in my opinion. Say I have a startup in the EU that handles personal data or some company secrets and wants to use an LLM to analyse it (like using RAG). Having that data never leave your basement sure can be worth more than $3000 if performance is not a bottleneck.
Heck, I'm willing to pay $3000 for one of these to get a good model that runs my requests locally. It's probably just my stupid ape brain trying to do finance, but I'm infinitely more likely to run dumb experiments with LLMs on hardware I own than I am while paying per token (to the point where I currently spend way more time with small local llamas than with Claude), and even though I don't do anything sensitive I'm still leery of shipping all my data to one of these companies.
This isn't competing with cloud, it's competing with Mac Minis and beefy GPUs. And $3000 is a very attractive price point in that market.
Even for established companies this is great. A tech company can have a few of these locally hosted and users can poll the company LLM with sensitive data.
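The client side of that workflow needs almost no new tooling. Assuming the box exposes any OpenAI-compatible server (vLLM, llama.cpp's server, Ollama and the like can all do this), a sketch looks like the following; the hostname, port and model name are placeholders, not real endpoints:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://digits-box.internal:8000/v1",  # stays inside the company network
    api_key="not-needed-for-local",                 # local servers typically ignore the key
)

resp = client.chat.completions.create(
    model="llama-3.1-70b-instruct",                 # whatever model the box is serving
    messages=[
        {"role": "user", "content": "Summarise this internal incident report: ..."},
    ],
)
print(resp.choices[0].message.content)
```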
The price seems relatively competitive even compared to other local alternatives like "build your own PC". I'd definitely buy one of these (or even two if it works really well) for developing/training/using models that currently run on cobbled-together hardware I had left over after upgrading my desktop.
> Having that data never leave your basement sure can be worth more than $3000 if performance is not a bottleneck
I get what you're saying, but there are also regulations (and your own business interest) that expect data redundancy/protection, which keeping everything on-site doesn't seem to cover.
There's a market not described here: bioinformatics.
The owner of the market, Illumina, already ships their own bespoke hardware chips in servers called DRAGEN for faster analysis of thousands of genomes. Their main market for this product is in personalised medicine, as genome sequencing in humans is becoming common.
Other companies like Oxford Nanopore use on-board GPUs to call bases (i.e., from raw electric signal coming off the sequencer to A, T, G, C) but it's not working as well as it could due to size and power constraints. I feel like this could be a huge game changer for someone like ONT, especially with cooler stuff like adaptive sequencing.
Other avenues of bioinformatics, such as most day-to-day analysis software, are still very CPU- and RAM-heavy.
This is, at least for now, a relatively small market. Illumina acquired the company manufacturing these chips for $100M. Analysis of a genome in the cloud generally costs below $10 on general purpose hardware.
It is of course possible that these chips enable analyses that are currently not possible/prohibited by cost, but at least for now, this will not be the limiting factor for genomics, but cost of sequencing (which is currently $400-500 per genome)
The bigger picture is that OpenAI o3/o4.. plus specialized models will blow open the doors to genome tagging and discovery, but that is still 1 to 3 years away for ASI to kick in.
While I kinda agree with you, I don't think we will ever find a meaningful way to throw genome sequencing data at LLMs. It's simply too much data.
I worked on a project some years ago where we were using data from genome sequencing of a bacterium. Every sequenced sample was around 3GB of data, and the sample size was pretty small, with only about 100 samples to study.
I think the real revolution will happen because code generation through LLMs will allow biologists to write 'good enough' code to transform, process and analyze data. Today to do any meaningful work with genome data you need a pretty competent bioinformatician, and they are a rare breed. Removing this bottleneck is what will allow us to move faster in this field.
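For a sense of scale, the 'good enough' scripts in question are usually tiny. A hypothetical example of the sort of thing an LLM can hand a biologist today: per-sequence GC content from a FASTA file, no bioinformatics toolchain required.

```python
def gc_content(path: str) -> dict[str, float]:
    """Return GC fraction per sequence in a FASTA file."""
    results, name, seq = {}, None, []

    def flush():
        if name:
            s = "".join(seq).upper()
            results[name] = (s.count("G") + s.count("C")) / max(len(s), 1)

    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):      # header line starts a new record
                flush()
                name, seq = line[1:].split()[0], []
            else:
                seq.append(line)
        flush()                           # don't forget the last record
    return results

# print(gc_content("sample.fasta"))
```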
In case you're curious, I googled. It runs this thing called "DGX OS":
"DGX OS 6 Features The following are the key features of DGX OS Release 6:
Based on Ubuntu 22.04 with the latest long-term Linux kernel version 5.15 for the recent hardware and security updates and updates to software packages, such as Python and GCC.
Includes the NVIDIA-optimized Linux kernel, which supports GPU Direct Storage (GDS) without additional patches.
Provides access to all NVIDIA GPU driver branches and CUDA toolkit versions.
Uses the Ubuntu OFED by default with the option to install NVIDIA OFED for additional features.
Nvidia just did what Intel/AMD should have done to threaten the CUDA ecosystem: release a "cheap" 128GB local inference appliance/GPU. Well done Nvidia, and it looks bleak for any Intel/AMD AI efforts in the future.
I think you nailed it. Any basic SWOT analysis of NVidia’s position would surely have to consider something like this from a competitor - either Apple, who is already nibbling around the edges of this space, or AMD/Intel who could/should? be.
It’s obviously not guaranteed to go this route, but an LLM (or similar) on every desk and in every home is a plausible vision of the future.
Okay, so this is not a peripheral that you connect to your computer to run specialized tasks, this is a full computer running Linux.
It's a garden hermit. Imagine a future where everyone has one of these (not exactly this version, but some future version): it lives with you, it learns with you, and unlike cloud-based SaaS AI you can teach it things immediately and diverge from the average to your advantage.
I'd love to own one, but doubt this will go beyond a very specific niche. Despite there being advantages, very few still operate their own Plex server over subscriptions to streaming services, and on the local front, I feel that the progress of hardware, alongside findings that smaller models can handle a variety of tasks quite well, will mean a high performance, local workstation of this type will have niche appeal at most.
I have this feeling that at some point it will be very advantageous to have a personal AI, because when you use something that everyone can use, the output of that something becomes very low value.
Maybe it will still make sense to have your personal AI in some data center, but on the other hand, there is the trend of governments and megacorps regulating what you can do with your computer. Try going beyond the basics, try to do something fun and edge-case - it is very likely that your general-availability AI will refuse to help you.
When it is your own property, you get the chance to overcome restrictions and develop the thing beyond the average.
As a result, having something that can do things no one else can do, with no restrictions on what you can do with it, can become the ultimate superpower.
> Nvidia's relationship with Linux has been... complicated.
https://youtube.com/watch?v=OF_5EKNX0Eg
> it's closer in spirit to the Grace Hopper/datacenter designs than to the Tegra designs
This is more like a micro-DGX then, for $3k.
I can only think of raspberry pi...
Compute is evolving way too rapidly to be setting-and-forgetting anything at the moment.
> there is no reasonable entry-level POWER machine that can compete in performance with a workstation-class x86
https://www.raptorcs.com/content/base/products.html
That said, enthusiasts do help drive a lot of the improvements to the tech stack so if they start using this, it’ll entrench NVIDIA even more.
I mean, this is awfully close to being "Her" in a box, right?
Those Macs with unified memory are a threat he is immediately addressing. Jensen is a wartime CEO from the looks of it; he's not joking.
No wonder AMD is staying out of the high end space, since NVIDIA is going head on with Apple (and AMD is not in the business of competing with Apple).
> a lot, if not (with some work) everything, in these databases can be executed on the GPU with a significant performance gain vs. the CPU
No, they can’t. GPU databases are niche products with severe limitations.
GPUs are fast at massively parallel math problems; they aren't useful for all tasks.
> I really hope we get a crash so that we can get back to a more sane relationship with companies and their products.
That's the best time to buy. ;)
Apple M chips are pretty efficient.
Also, I'm unfamiliar with Macs: is there really a MacBook Pro with 256GB of RAM?
Also, macOS devices are not very good inference solutions. They are just believed to be by diehards.
I don't think Digits will perform well either.
If NVIDIA wanted you to have good performance on a budget, it would ship NVLink on the 5090.
They are good for single-batch inference and have very good tok/sec per user. Ollama works perfectly on a Mac.
> Nvidia releases a Linux desktop supercomputer that's better price/performance wise than anything Wintel is doing
Yeah starting at $3,000. Surely a cheap desktop computer to buy for someone who just wants to surf the web and send email /s.
There is a reason why it is for "enthusiasts" and not for the general wider consumer or typical PC buyer.
> nearly any piece of modern hardware, from a RasPI, to most modern smartphones with a dock, could realistically serve most people well
That end of the market is occupied by Chromebooks... AKA a different GNU/Linux.
> It's a garden hermit.
In the past, in Europe, some wealthy people used to keep a scholar living on their premises so they could ask them questions, etc.