I'm looking at my Jetson Nano in the corner, which is fulfilling its post-retirement role as a paperweight because Nvidia abandoned it after 4 years.
The Nvidia Jetson Nano, an SBC for "AI", debuted with an already-aging custom Ubuntu 18.04, and when 18.04 went EOL, Nvidia abandoned it completely: no further updates to its proprietary JetPack or drivers, and without those the whole machine learning stack (CUDA, PyTorch, etc.) became useless.
I'll never buy an SBC from Nvidia unless all the software support is upstreamed to the mainline Linux kernel.
In general, Nvidia's relationship with Linux has been... complicated. On the one hand, at least they offer drivers for it. On the other, I have found few more reliable ways to irreparably break a Linux installation than trying to install or upgrade those drivers. They don't seem to treat it as a first-class citizen; they just tolerate it to the bare minimum required to claim it works.
Now that the majority of their revenue comes from data centers instead of Windows gaming PCs, you'd think their relationship with Linux would improve, or already has.
I've had a similar experience, my Xavier NX stopped working after the last update and now it's just collecting dust. To be honest, I've found the Nvidia SBC to be more of a hassle than it's worth.
The Orin series and later use UEFI and you can apparently run upstream, non-GPU enabled kernels on them. There's a user guide page documenting it. So I think it's gotten a lot better, but it's sort of moot because the non-GPU thing is because the JetPack Linux fork has a specific 'nvgpu' driver used for Tegra devices that hasn't been unforked from that tree. So, you can buy better alternatives unless you're explicitly doing the robotics+AI inference edge stuff.
But the impression I get from this device is that it's closer in spirit to the Grace Hopper/datacenter designs than to the Tegra designs, given the naming, the design (DGX style), and the software (DGX OS?) that goes on their workstation/server designs. Those are also UEFI, and in those scenarios you can (I believe?) use the upstream Linux kernel with the open source Nvidia driver on whatever distro you like. In that case, this would be a much more "familiar" machine with a much more ordinary Linux experience. But who knows. Maybe GH200/GB200 need custom patches, too.
Time will tell, but if this is a good GPU paired with a good ARM Cortex design, and it works more like a traditional Linux box than the Jetson series, it may be a great local AI inference machine.
AGX also has UEFI firmware which allows you to install ESXi. Then you can install any generic EFI arm64 iso in a VM with no problems, including windows.
And unless there is some expanded maintenance going on, 22.04 is EOL in 2 years. In my experience, vendors are not as on top of security patches as upstream. We will see, but given NVIDIA's closed ecosystem, I don't have high hopes that this will be supported long term.
The rk3588 is pretty close; I believe it's usable today, just missing a few corner cases with HDMI or some such. I believe the last patches are either pending or already applied to an RC.
If its stack still works, you might be able to sell or donate it to a student experimenting. They can still learn quite a few things with it. Maybe even use it for something.
Using outdated TensorFlow (v1 from 2018) or outdated PyTorch makes learning harder than it needs to be, considering most resources online use much newer versions of the frameworks. If you're learning the fundamentals, working from first principles and creating the building blocks yourself, then it adds to the experience. However, most people just want to build different types of nets, and that's hard to do when the code won't work for you.
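To make that concrete, here's a minimal illustration (mine, not from the thread): the TF1-style graph/session API that 2018-era JetPack images and tutorials assume was removed in TensorFlow 2.x, so modern examples won't run on the old stack and vice versa.

```python
# TensorFlow 1.x style (what 2018-era tutorials and old JetPack wheels assume);
# tf.placeholder and tf.Session no longer exist in TF 2.x:
#
#   x = tf.placeholder(tf.float32, shape=[None, 3])
#   y = tf.reduce_sum(x, axis=1)
#   with tf.Session() as sess:
#       print(sess.run(y, feed_dict={x: [[1., 2., 3.]]}))

# TensorFlow 2.x style (what current online resources use): eager execution, no graph boilerplate.
import tensorflow as tf

x = tf.constant([[1.0, 2.0, 3.0]])
print(tf.reduce_sum(x, axis=1).numpy())  # -> [6.]
```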
Today I'm using 2x 3090s, which are over 4 years old at this point and still very usable. To get 48GB of VRAM I would need 3x 5070 Ti - still over $2k.
In 4 years, you'll be able to combine 2 of these to get 256gb unified memory. I expect that to have many uses and still be in a favorable form factor and price.
Eh? By all indications compute is now evolving SLOWER than ever. Moore's Law is dead, Dennard scaling is over, the latest fab nodes are evolutionary rather than revolutionary.
This isn't the 80s when compute doubled every 9 months, mostly on clock scaling.
I feel this is bigger than the 5x series GPUs. Given the craze around AI/LLMs, this can also potentially eat into Apple’s slice of the enthusiast AI dev segment once the M4 Max/Ultra Mac minis are released. I sure wished I held some Nvidia stocks, they seem to be doing everything right in the last few years!
This is something every company should make sure they have: an onboarding path.
Xeon Phi failed for a number of reasons, but one where it didn't need to fail was the availability of software optimised for it. Now we have Xeons and EPYCs, and MI300Cs with lots of efficient cores, but we could have been writing software tailored for those for 10 years now. Extracting performance from them would be a solved problem at this point. The same applies to Itanium - the very first thing Intel should have made sure it had was good Linux support. They could have had it before the first silicon was released. Itanium was well supported for a while, but it's long dead by now.
Similarly, Sun failed with SPARC, which also didn't have an easy onboarding path after they gave up on workstations. They did some things right: OpenSolaris ensured the OS remained relevant (it still is, even if a bit niche), and looking the other way on x86 Solaris helped people learn and train on it. Oracle cloud could, at least, offer it on cloud instances. That would be nice.
Now we see IBM doing the same - there is no reasonable entry-level POWER machine that can compete in performance with a workstation-class x86. There is a small half-rack machine that can be mounted in a deskside case, and that's it. I don't know of any company that's planning to deploy new systems on AIX (much less IBM i, which is also POWER), or even Linux on POWER, because it's just too easy to build on other, competing platforms. You can get AIX, IBM i and even IBM Z cloud instances from IBM Cloud, but it's not easy (and I never found a "from-zero-to-ssh-or-5250-or-3270" tutorial for them). I wonder if it's even possible. You can get Linux on Z instances, but there doesn't seem to be a way to get Linux on POWER. At least not from them (several HPC research labs still offer those).
1000% all these ai hardware companies will fail if they don't have this. You must have a cheap way to experiment and develop. Even if you want to only sell a $30000 datacenter card you still need a very low cost way to play.
Sad to see that big companies like Intel and AMD don't understand this, but they've never come to terms with the fact that software killed the hardware star.
It really mystifies me that Intel, AMD, and other hardware companies (obviously Nvidia in this case) don't either form a consortium or each maintain their own in-house Linux distribution with excellent support.
Windows has always been a barrier to hardware feature adoption for Intel. You had to wait 2 to 3 years, sometimes longer, for Windows to get around to providing hardware support.
For any OS optimizations in Windows you had to go through Microsoft. So say you added some instructions, custom silicon or whatever to speed up enterprise databases, or high-speed networking that needed some special kernel features - there was always Microsoft in the way.
And not just foot-dragging in communication; getting the technical people aligned was a problem.
Microsoft would look at every single change: whether or not it would challenge their monopoly, whether or not it was in their business interest, whether or not it kept you, the hardware vendor, in a subservient role.
Raptor Computing provides POWER9 workstations. They're not cheap, still use last-gen hardware (DDR4/PCIe 4 ... and POWER9 itself) but they're out there.
There were Phi cards, but they were pricey and power hungry (at the time, now current GPU cards probably meet or exceed the Phi card's power consumption) for plugging into your home PC. A few years back there was a big fire sale on Phi cards - you could pick one up for like $200. But by then nobody cared.
The developers they are referring to aren’t just enthusiasts; they are also developers who were purchasing SuperMicro and Lambda PCs to develop models for their employers. Many enterprises will buy these for local development because it frees up the highly expensive enterprise-level chip for commercial use.
This is a genius move. I am more baffled by the insane form factor that can pack this much power inside a Mac Mini-esque body. For just $6000, two of these can run 400B+ models locally. That is absolutely bonkers. Imagine running ChatGPT on your desktop. You couldn’t dream about this stuff even 1 year ago. What a time to be alive!
The 1 PetaFLOP and 200GB model capacity specs are for FP4 (4-bit floating point), which means inference, not training/development. It'll still be a decent personal development machine, but not for models of that size.
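Rough numbers behind that (my own back-of-the-envelope, counting weights only and ignoring KV cache, activations and OS overhead):

```python
# How many parameters fit in ~128 GB of unified memory at different precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}

def max_params_billion(memory_gb: float, precision: str) -> float:
    """Rough upper bound on model size, in billions of parameters (weights only)."""
    return memory_gb / BYTES_PER_PARAM[precision]

for prec in ("fp16", "fp8", "fp4"):
    print(f"{prec}: ~{max_params_billion(128, prec):.0f}B parameters in 128 GB")

# fp16: ~64B, fp8: ~128B, fp4: ~256B -- so a model in the ~200B class is only
# plausible at FP4, i.e. for inference rather than training.
```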
This looks like a bigger brother of Orin AGX, which has 64GB of RAM and runs smaller LLMs. The question will be power and performance vs 5090. We know price is 1.5x
I’m not so sure it’s negligible. My anecdotal experience is that since Apple Silicon chips were found to be “ok” enough to run inference with MLX, more non-technical people in my circle have asked me how they can run LLMs on their macs.
Surely a smaller market than gamers or datacenters, though.
If this is gonna be widely used by ML engineers, in biopharma, etc and they land 1000$ margins at half a million sales that's half a billion in revenue, with potential to grow.
The fire-breathing 120W Zen 5-powered flagship Ryzen AI Max+ 395 comes packing 16 CPU cores and 32 threads paired with 40 RDNA 3.5 (Radeon 8060S) integrated graphics cores (CUs), but perhaps more importantly, it supports up to 128GB of memory that is shared among the CPU, GPU, and XDNA 2 NPU AI engines. The memory can also be carved up to a distinct pool dedicated to the GPU only, thus delivering an astounding 256 GB/s of memory throughput that unlocks incredible performance in memory capacity-constrained AI workloads (details below). AMD says this delivers groundbreaking capabilities for thin-and-light laptops and mini workstations, particularly in AI workloads. The company also shared plenty of gaming and content creation benchmarks.
[...]
AMD also shared some rather impressive results showing a Llama 70B Nemotron LLM AI model running on both the Ryzen AI Max+ 395 with 128GB of total system RAM (32GB for the CPU, 96GB allocated to the GPU) and a desktop Nvidia GeForce RTX 4090 with 24GB of VRAM (details of the setups in the slide below). AMD says the AI Max+ 395 delivers up to 2.2X the tokens/second performance of the desktop RTX 4090 card, but the company didn’t share time-to-first-token benchmarks.
Perhaps more importantly, AMD claims to do this at an 87% lower TDP than the 450W RTX 4090, with the AI Max+ running at a mere 55W. That implies that systems built on this platform will have exceptional power efficiency metrics in AI workloads.
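For what it's worth, that result is mostly a memory-capacity story rather than a compute one. A rough sizing sketch (my assumptions: weights only, no KV cache or runtime overhead) shows why a 24GB card has to spill a 70B model over PCIe while the 96GB iGPU pool can hold it outright:

```python
# Approximate weight footprint of a 70B-parameter model at common precisions,
# compared against the two memory pools in AMD's comparison. When the weights
# don't fit in VRAM, layers get offloaded over PCIe, which is usually what
# collapses tokens/sec on the 24 GB card.
PARAMS_BILLION = 70  # Llama-70B-class model

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    gb = weights_gb(PARAMS_BILLION, bits)
    print(f"{name}: ~{gb:.0f} GB "
          f"(fits in 96 GB iGPU pool: {gb < 96}; fits in 24 GB VRAM: {gb < 24})")
```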
I think this is a race that Apple doesn't know it's part of. Apple has something that happens to work well for AI, as a side effect of having a nice GPU with lots of fast shared memory. It's not marketed for inference.
From the people I talk to, the enthusiast market is already saturated with Nvidia 4090s/3090s, because people want to do their fine-tunes (and porn) on their off time. The Venn diagram of users who post about diffusion models and LLMs running at home is pretty much a circle.
Yeah, I really don't think the overlap is as much as you imagine. At least in /r/localllama and the discord servers I frequent, the vast majority of users are interested in one or the other primarily, and may just dabble with other things. Obviously this is just my observations...I could be totally misreading things.
> I sure wished I held some Nvidia stocks, they seem to be doing everything right in the last few years!
They were propelled by an unexpected LLM boom. But plan 'A' was robotics, in which Nvidia has invested a lot for decades. I think their time is about to come, with Tesla's humanoids at $20-30k and the Chinese already selling them for $16k.
This is somewhat similar to what GeForce was to gamers back in the days, but for AI enthusiasts. Sure, the price is much higher, but at least it's a completely integrated solution.
Yep that's what I'm thinking as well. I was going to buy a 5090 mainly to play around with LLM code generation, but this is a worthy option for roughly the same price as building a new PC with a 5090.
I think it isn't about enthusiasts. To me it looks like Huang/NVDA is pushing a small revolution further, using the opening provided by the AI wave: up until now the GPU was an add-on to the general computing core, onto which that core offloaded some computation. With AI, that offloaded computation becomes de facto the main computation, and Huang/NVDA is turning the tables by making the CPU just a small add-on to the GPU, with some general computing offloaded to that CPU.
The CPU being located that "close", and with unified memory, would stimulate parallelization of a lot of general computing so that it runs on the GPU, very fast, instead of on the CPU. Take a classic of enterprise computing - databases, the SQL ones: a lot, if not (with some work) everything, in these databases can be executed on the GPU with a significant performance gain vs. the CPU. Why isn't it happening today? Loading/unloading data onto the GPU eats into performance, the complexity of having only some operations offloaded to the GPU is very high in dev effort, etc. Streamlined development on a platform with unified memory will change that. That way Huang/NVDA may pull the rug out from under the CPU-first platforms like AMD/INTC and end up owning both - the new AI computing as well as a significant share of classic enterprise computing.
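As a toy illustration of the kind of offload being described (my sketch, not anything NVIDIA has announced; CuPy stands in here for purpose-built engines like cuDF or HeavyDB), this is a columnar WHERE + GROUP BY + SUM running entirely on the GPU once the columns sit in GPU-accessible memory:

```python
# Equivalent SQL: SELECT region, SUM(amount) FROM sales WHERE amount > 100 GROUP BY region;
import cupy as cp

n = 10_000_000
amount = cp.random.uniform(0, 1000, n, dtype=cp.float32)  # "amount" column
region = cp.random.randint(0, 16, n)                      # "region" column, 16 distinct regions

mask = amount > 100                                       # WHERE amount > 100
# GROUP BY region, SUM(amount): a weighted bincount does the grouped sum in one pass.
totals = cp.bincount(region[mask], weights=amount[mask], minlength=16)
print(totals.get())                                       # copy the 16 aggregates back to the host
```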
I'm so tired of this recent obsession with the stock market. Now that retail is deeply invested, it is tainting everything, even here on a technology forum. I don't remember people mentioning Apple stock every time Steve Jobs made an announcement in past decades. Nowadays it seems everyone is invested in Nvidia and just wants the stock to go up, and every product announcement is a means to that end. I really hope we get a crash so that we can get back to a more sane relationship with companies and their products.
Not expecting this to compete with the 5x series in terms of gaming; but it's interesting to note that the increase in gaming performance Jensen was talking about with Blackwell was largely related to frames generated by the tensor cores.
I wonder how it would go as a productivity/tinkering/gaming rig? Could a GPU potentially be stacked in the same way an additional Digit can?
> they seem to be doing everything right in the last few years
About that... Not like there isn't a lot to be desired from the linux drivers: I'm running a K80 and M40 in a workstation at home and the thought of having to ever touch the drivers, now that the system is operational, terrifies me. It is by far the biggest "don't fix it if it ain't broke" thing in my life.
There will undoubtedly be a Mac Studio (and Mac Pro?) bump to M4 at some point. Benchmarks [0] reflect how memory bandwidth and core count [1] compare to processor improvements. Granted, YMMV depending on your workload.
0. https://www.macstadium.com/blog/m4-mac-mini-review
1. https://www.apple.com/mac/compare/?modelList=Mac-mini-M4,Mac...
The Nvidia price is closer (USD 3k) to a top Mac mini, but I trust Apple more than Nvidia for end-to-end support from hardware to apps. Not an Apple fanboy, just a user/dev, and I don't think we realize what Apple really achieved, industrially speaking. The M1 was launched in late 2020.
It eats into all of NVDA's consumer-facing products, no? I can see why OpenAI and others are looking for alternative hardware solutions to train their next models.
Am I the only one disappointed by these? They cost roughly half the price of a macbook pro and offer hmm.. half the capacity in RAM. Sure speed matters in AI, but what do I do with speed when I can't load a 70b model.
On the other hand, with a $5000 macbook pro, I can easily load a 70b model and have a "full" macbook pro as a plus. I am not sure I fully understand the value of these cards for someone that want to run personal AI models.
Hm? They have 128GB of RAM. Macbook Pros cap out at 128GB as well. Will be interesting to see how a Project Digits machine performs in terms of inference speed.
Bro, we can connect two Project Digits as well. I was only looking at the M4 MacBook because of the 128GB unified memory. Now this beast can cook better LLMs at just $3K, with a 4TB SSD too.
The M4 Max MacBook Pro (128GB unified RAM and 4TB storage) is $5,999.
So, no more Apple for me. I will just get the Digits and can build a workstation as well.
They are perfectly fine for certain people. I can run Qwen-2.5-coder 14B on my M2 Max MacBook Pro with 32gb at ~16 tok/sec. At least in my circle, people are budget conscious and would prefer using existing devices rather than pay for subscriptions where possible.
And we know why they won't ship NVLink anymore on prosumer GPUs: they control almost the entire segment and why give more away for free? Good for the company and investors, bad for us consumers.
Nvidia releases a Linux desktop supercomputer that's better price/performance wise than anything Wintel is doing and their whole new software stack will only run on WSL2. They aren't porting to Win32. Wow, it may actually be the year of Linux on the Desktop.
Not sure how to judge better price/perf. I wouldn't expect 20 Neoverse N2 cores to do particularly well vs 16 zen5 cores. The GPU side looks promising, but they aren't mentioning memory bandwidth, configuration, spec, or performance.
Did see vague claims of "starting at $3k", max 4TB nvme, and max 128GB ram.
I'd expect AMD Strix Halo (AI Max plus 395) to be reasonably competitive.
NVIDIA is likely citing 1 PFLOPS at FP4 sparse (they did this for GB200), so that is 128 TFLOPS BF16 dense, or about 2/3 of what an RTX 4090 is capable of. I would put the memory bandwidth at 546 GB/s, using the same 512-bit LPDDR5X-8533 as the Apple M4 Max.
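That bandwidth guess also gives a crude ceiling on single-stream decode speed, since each generated token has to stream roughly the whole set of weights once. A quick sketch under my own assumptions (memory-bound decoding, ~60% of peak bandwidth actually achieved):

```python
bus_bits = 512
data_rate_gbps = 8.533                         # LPDDR5X-8533, per pin
bandwidth_gbs = bus_bits * data_rate_gbps / 8  # ~546 GB/s
print(f"Peak bandwidth: ~{bandwidth_gbs:.0f} GB/s")

def tokens_per_sec(params_billion: float, bits_per_weight: float,
                   bandwidth_gbs: float, efficiency: float = 0.6) -> float:
    """Upper-bound decode rate if every token streams all weights once."""
    bytes_per_token = params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gbs * 1e9 * efficiency / bytes_per_token

print(f"70B @ 4-bit: ~{tokens_per_sec(70, 4, bandwidth_gbs):.1f} tok/s")
```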
Because NVidia naturally doesn't want to pay for Windows licenses.
NVidia works closely with Microsoft to develop their cards, all major features come first in DirectX, before landing on Vulkan and OpenGL as NVidia extensions, and eventually become standard after other vendors follow up with similar extensions.
Here he says that in order for the cloud and the PC to be compatible, he's going to only support WSL2, the Windows subsystem for Linux which is a Linux API on top of Windows.
Here's a link to the part of the keynote where he says this:
https://youtu.be/MC7L_EWylb0?t=7259
> their whole new software stack will only run on WSL2. They aren't porting to Win32
Wait, what do you mean exactly? Isn't WSL2 just a VM essentially? Don't you mean it'll run on Linux (which you also can run on WSL2)?
Or will it really only work with WSL2? I was excited as I thought it was just a Linux Workstation, but if WSL2 gets involved/is required somehow, then I need to run the other direction.
Yes, WSL2 is essentially a highly integrated VM. I think it's a bit of a joke to call Ubuntu WSL2, because it seems like most Ubuntu installs are either VMs for Windows PCs or on Azure Cloud.
I see the most direct competitor in the Mac Studio, though of course we will have to wait for reviews to gauge how fair that comparison is. The Studio does have a fairly large niche as a solid workstation, though, so I could see this being successful.
For general desktop use, as you described, nearly any piece of modern hardware, from a RasPI, to most modern smartphones with a dock, could realistically serve most people well.
The thing is, you need to serve both low-end use cases like browsing and high-end dev work via workstations, because even for the "average user" there is often one specific program they rely on which has limited support outside the OS they have grown up with. Of course, there will be some programs, like desktop Microsoft Office, which will never be ported, but still, Digits could open the door to some devs working natively on Linux.
A solid, compact, high-performance, yet low power workstation with a fully supported Linux desktop out of the box could bridge that gap, similar to how I have seen some developers adopt macOS over Linux and Windows since the release of the Studio and Max MacBooks.
Again, we have yet to see independent testing, but I would be surprised if anything of this size, simplicity, efficiency and performance was possible in any hardware configuration currently on the market.
I'm a bit surprised by the amount of comments comparing the cost to (often cheap) cloud solutions. Nvidia's value proposition is completely different in my opinion. Say I have a startup in the EU that handles personal data or some company secrets and wants to use an LLM to analyse it (like using RAG). Having that data never leave your basement sure can be worth more than $3000 if performance is not a bottleneck.
Heck, I'm willing to pay $3000 for one of these to get a good model that runs my requests locally. It's probably just my stupid ape brain trying to do finance, but I'm infinitely more likely to run dumb experiments with LLMs on hardware I own than I am while paying per token (to the point where I currently spend way more time with small local llamas than with Claude), and even though I don't do anything sensitive I'm still leery of shipping all my data to one of these companies.
This isn't competing with cloud, it's competing with Mac Minis and beefy GPUs. And $3000 is a very attractive price point in that market.
Even for established companies this is great. A tech company can have a few of these locally hosted and users can poll the company LLM with sensitive data.
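The client side of that workflow needs almost no new tooling. Assuming the box exposes any OpenAI-compatible server (vLLM, llama.cpp's server, Ollama and the like can all do this), a sketch looks like the following; the hostname, port and model name are placeholders, not real endpoints:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://digits-box.internal:8000/v1",  # stays inside the company network
    api_key="not-needed-for-local",                 # local servers typically ignore the key
)

resp = client.chat.completions.create(
    model="llama-3.1-70b-instruct",                 # whatever model the box is serving
    messages=[
        {"role": "user", "content": "Summarise this internal incident report: ..."},
    ],
)
print(resp.choices[0].message.content)
```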
The price seems relatively competitive even compared to other local alternatives like "build your own PC". I'd definitely buy one of these (or even two if it works really well) for developing/training/using models that currently run on cobbled-together hardware I had left over after upgrading my desktop.
> Having that data never leave your basement sure can be worth more than $3000 if performance is not a bottleneck
I get what you're saying, but there are also regulations (and your own business interest) that expect data redundancy/protection, which keeping everything on-site doesn't seem to cover.
There's a market not described here: bioinformatics.
The owner of the market, Illumina, already ships their own bespoke hardware chips in servers called DRAGEN for faster analysis of thousands of genomes. Their main market for this product is in personalised medicine, as genome sequencing in humans is becoming common.
Other companies like Oxford Nanopore use on-board GPUs to call bases (i.e., from raw electric signal coming off the sequencer to A, T, G, C) but it's not working as well as it could due to size and power constraints. I feel like this could be a huge game changer for someone like ONT, especially with cooler stuff like adaptive sequencing.
Other avenues of bioinformatics, such as most day-to-day analysis software, are still very CPU- and RAM-heavy.
This is, at least for now, a relatively small market. Illumina acquired the company manufacturing these chips for $100M. Analysis of a genome in the cloud generally costs below $10 on general purpose hardware.
It is of course possible that these chips enable analyses that are currently not possible/prohibited by cost, but at least for now, this will not be the limiting factor for genomics, but cost of sequencing (which is currently $400-500 per genome)
The bigger picture is that OpenAI o3/o4.. plus specialized models will blow open the doors to genome tagging and discovery, but that is still 1 to 3 years away for ASI to kick in.
While I kinda agree with you, I don't think we will ever find a meaningful way to throw genome sequencing data at LLMs. It's simply too much data.
I worked on a project some years ago where we were using data from genome sequencing of a bacterium. Every sequenced sample was around 3GB of data, and the sample size was pretty small, with only about 100 samples to study.
I think the real revolution will happen because code generation through LLMs will allow biologists to write 'good enough' code to transform, process and analyze data. Today to do any meaningful work with genome data you need a pretty competent bioinformatician, and they are a rare breed. Removing this bottleneck is what will allow us to move faster in this field.
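For a sense of scale, the 'good enough' scripts in question are usually tiny. A hypothetical example of the sort of thing an LLM can hand a biologist today: per-sequence GC content from a FASTA file, no bioinformatics toolchain required.

```python
def gc_content(path: str) -> dict[str, float]:
    """Return GC fraction per sequence in a FASTA file."""
    results, name, seq = {}, None, []

    def flush():
        if name:
            s = "".join(seq).upper()
            results[name] = (s.count("G") + s.count("C")) / max(len(s), 1)

    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):      # header line starts a new record
                flush()
                name, seq = line[1:].split()[0], []
            else:
                seq.append(line)
        flush()                           # don't forget the last record
    return results

# print(gc_content("sample.fasta"))
```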
In case you're curious, I googled. It runs this thing called "DGX OS":
"DGX OS 6 Features The following are the key features of DGX OS Release 6:
Based on Ubuntu 22.04 with the latest long-term Linux kernel version 5.15 for the recent hardware and security updates and updates to software packages, such as Python and GCC.
Includes the NVIDIA-optimized Linux kernel, which supports GPU Direct Storage (GDS) without additional patches.
Provides access to all NVIDIA GPU driver branches and CUDA toolkit versions.
Uses the Ubuntu OFED by default with the option to install NVIDIA OFED for additional features.
Nvidia just did what Intel/AMD should have done to threaten the CUDA ecosystem: release a "cheap" 128GB local inference appliance/GPU. Well done Nvidia, and it looks bleak for any Intel/AMD AI efforts in the future.
I think you nailed it. Any basic SWOT analysis of NVidia’s position would surely have to consider something like this from a competitor - either Apple, who is already nibbling around the edges of this space, or AMD/Intel who could/should? be.
It’s obviously not guaranteed to go this route, but an LLM (or similar) on every desk and in every home is a plausible vision of the future.
Okay, so this is not a peripheral that you connect to your computer to run specialized tasks, this is a full computer running Linux.
It's a garden hermit. Imagine a future where everyone has one of these (not exactly this version, but some future version): it lives with you, it learns with you, and unlike cloud-based SaaS AI you can teach it things immediately and diverge from the average to your advantage.
I'd love to own one, but doubt this will go beyond a very specific niche. Despite there being advantages, very few still operate their own Plex server over subscriptions to streaming services, and on the local front, I feel that the progress of hardware, alongside findings that smaller models can handle a variety of tasks quite well, will mean a high performance, local workstation of this type will have niche appeal at most.
I have this feeling that at some point it will be very advantageous to have a personal AI, because when you use something that everyone can use, the output of that something becomes very low value.
Maybe it will still make sense to have your personal AI in some data center, but on the other hand, there is the trend of governments and megacorps regulating what you can do with your computer. Try going beyond the basics, try to do something fun and edge-case - it is very likely that your general-availability AI will refuse to help you.
When it is your own property, you get the chance to overcome restrictions and develop the thing beyond the average.
As a result, having something that can do things no one else can do, with no restrictions on what you can do with it, can become the ultimate superpower.
> Nvidia's relationship with Linux has been... complicated.
https://youtube.com/watch?v=OF_5EKNX0Eg
> it's closer in spirit to the Grace Hopper/datacenter designs than to the Tegra designs
This is more like a micro-DGX then, for $3k.
I can only think of raspberry pi...
Compute is evolving way too rapidly to be setting-and-forgetting anything at the moment.
> there is no reasonable entry-level POWER machine that can compete in performance with a workstation-class x86
https://www.raptorcs.com/content/base/products.html
That said, enthusiasts do help drive a lot of the improvements to the tech stack so if they start using this, it’ll entrench NVIDIA even more.
I mean, this is awfully close to being "Her" in a box, right?
Those Macs with unified memory are a threat he is immediately addressing. Jensen is a wartime CEO from the looks of it; he's not joking.
No wonder AMD is staying out of the high end space, since NVIDIA is going head on with Apple (and AMD is not in the business of competing with Apple).
> a lot, if not (with some work) everything, in these databases can be executed on the GPU with a significant performance gain vs. the CPU
No, they can’t. GPU databases are niche products with severe limitations.
GPUs are fast at massively parallel math problems; they aren't useful for all tasks.
> I really hope we get a crash so that we can get back to a more sane relationship with companies and their products.
That's the best time to buy. ;)
Apple M chips are pretty efficient.
Also, I'm unfamiliar with Macs: is there really a MacBook Pro with 256GB of RAM?
Also, macOS devices are not very good inference solutions. They are just believed to be by diehards.
I don't think Digits will perform well either.
If NVIDIA wanted you to have good performance on a budget, it would ship NVLink on the 5090.
They are good for single-batch inference and have very good tok/sec per user. Ollama works perfectly on a Mac.
> Nvidia releases a Linux desktop supercomputer that's better price/performance wise than anything Wintel is doing
Yeah starting at $3,000. Surely a cheap desktop computer to buy for someone who just wants to surf the web and send email /s.
There is a reason why it is for "enthusiasts" and not for the general wider consumer or typical PC buyer.
> nearly any piece of modern hardware, from a RasPI, to most modern smartphones with a dock, could realistically serve most people well
That end of the market is occupied by Chromebooks... AKA a different GNU/Linux.
> It's a garden hermit.
In the past, in Europe, some wealthy people used to keep a scholar living on their premises so they could ask them questions, etc.