I'm looking at using OCI at $DAY_JOB for model distribution across fleets of machines as well, so it's good to see it getting some traction elsewhere.
OCI has some benefits over other systems, namely that tiered caching/pull-through is already pretty battle-tested, as is signing etc., so it beats more naive distribution methods on reliability, performance and trust.
If combined with eStargz or zstd:chunked it's also pretty nice for distributed systems, as long as you can slice things up into files in such a way that not every machine needs to pull the full model weights.
Failing that, there are P2P distribution mechanisms for OCI (Dragonfly, etc.) that can lessen the burden without resorting to DIY on BitTorrent or similar.
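For anyone who hasn't poked at the registry side: it's all content-addressed blobs behind a small HTTP API, which is why pull-through caching and signing fall out almost for free. A minimal sketch of resolving a model artifact's manifest via the OCI distribution API (registry/repo names are placeholders, auth omitted):

    # Minimal sketch: resolve an OCI manifest and list its content-addressed
    # layers. Registry/repo names are placeholders; auth is omitted (real
    # registries usually need a bearer token).
    import requests

    REGISTRY = "registry.example.com"   # placeholder
    REPO = "models/llama-3-8b"          # placeholder
    TAG = "v1"

    resp = requests.get(
        f"https://{REGISTRY}/v2/{REPO}/manifests/{TAG}",
        headers={"Accept": "application/vnd.oci.image.manifest.v1+json"},
    )
    resp.raise_for_status()
    manifest = resp.json()

    # Each layer is an immutable, digest-addressed blob, which is what makes
    # pull-through caches and signature verification straightforward.
    for layer in manifest.get("layers", []):
        print(layer["mediaType"], layer["digest"], layer["size"])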
That is exactly the feature we are using. Right now you need to be on a beta release of containerd, but before long it should be pretty widespread.
In combination with lazy pull (eStargz) it's a pretty compelling implementation.
Damn, that's handy. I now wonder how much trouble it would be to make a CSI driver that does this, for backporting to 1.2x clusters (since I don't think Kubernetes backports anything).
I've been pretty disappointed with eStargz performance, though... Do you have any numbers you can share? All over the internet people refer to numbers from 10 years ago, from workloads that don't seem realistic at all. In my experiments it didn't provide a significant enough speedup.
In our case some machines would need to access less than 1% of the image size, but being able to have an image with the entire model weights as a single artifact is an important feature in and of itself. In our specific scenario, even if eStargz is slow by filesystem standards, it's competing with network transfer anyway, so if it's within the same order of magnitude as rsync, that will do.
I don't have any perf numbers I can share, but I can say we see ~30% compression with eStargz, which is already a small win at least, heh.
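For the "less than 1% of the image" case: if the weights layer is stored uncompressed (or in a seekable format like eStargz/zstd:chunked), you can in principle grab just a byte range of the blob. Rough sketch, noting that Range support on blob GETs varies by registry:

    # Sketch of a partial read of a layer blob via an HTTP Range request.
    # Caveats: Range support on blob GETs is registry-dependent, and partial
    # reads only help if the layer is uncompressed or in a seekable format
    # like eStargz / zstd:chunked. Names below are placeholders.
    import requests

    REGISTRY = "registry.example.com"   # placeholder
    REPO = "models/llama-3-8b"          # placeholder
    DIGEST = "sha256:..."               # a digest taken from the manifest

    resp = requests.get(
        f"https://{REGISTRY}/v2/{REPO}/blobs/{DIGEST}",
        headers={"Range": "bytes=0-1048575"},  # first 1 MiB of the blob
    )
    print(resp.status_code)  # 206 Partial Content if ranges are honored
    chunk = resp.content     # only this slice crosses the network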
Be aware of licensing restrictions. Docker Desktop is free for personal use, but it requires a paid license if you work for an organization with more than 250 employees (or more than $10M in annual revenue). This feature seems to be available in Docker Desktop only.
Note: I'm part of the team developing this feature.
Soon (end of May, according to the current roadmap) this feature will also be available with the Docker Engine (so not only as part of Docker Desktop).
As a reminder, Docker Engine is the Community Edition, Open Source and free for everyone.
My understanding has always been that Docker Engine was only available directly on Linux. If you are running another operating system then you will need to run Docker Desktop (which, in turn, runs a Docker Engine instance in a VM).
This comment kind of makes it sound like maybe you can run Docker Engine directly on these operating systems (macOS, Windows, etc.); is that the case?
I don't understand why you'd add another domain-specific command to a container manager and go out of scope for what the tool was designed for in the first place.
The main benefit I see for cloud platforms: caching/co-hosting various services based on model instead of (model + user's API layer on top).
For the end user, it would be one less deployment headache to worry about: not having to package ollama + the model into docker containers for deployment. Also a more standardized deployment for hardware accelerated models across platforms.
(disclaimer: I'm leading the Docker Model Runner team at Docker)
It's fine to disagree of course, but we envision Docker as a tool that has a higher abstraction level than just container management. That's why having a new domain-specific command (that also uses domain-specific technology that is independent from containers, at least on some platform targets) is a cohesive design choice from our perspective.
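To make the "one less deployment headache" point concrete: the application side shrinks to calling an OpenAI-compatible endpoint instead of shipping its own inference container. Rough sketch, where the base URL and model name are assumptions about a typical setup rather than documented values:

    # Sketch of the "no ollama container to package" workflow: the app just
    # talks to an OpenAI-compatible chat endpoint exposed by the model runner.
    # The base URL and model name are assumptions/placeholders; check your
    # own setup for the actual values.
    import requests

    BASE_URL = "http://localhost:12434/engines/v1"   # assumed endpoint
    MODEL = "ai/llama3.2"                            # example model reference

    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": "Say hello in one word."}],
        },
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])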
They are using OCI artifacts to package models, so you can use your own registry to host these models internally. However, I just can't see any improvement compared with a simple FTP server. I don't think LLM models can adopt hierarchical structures the way Docker images do, so they can't leverage the benefits of layered filesystems, such as caching and reuse.
It's not the only one using OCI to package models. There's a CNCF project called KitOps (https://kitops.org) that has been around for quite a bit longer. It solves some of the limitations of using Docker, one of those being that you don't have to pull the entire project when you want to work on it; instead, you can pull just the data set, tuning, model, etc.
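The "pull just the piece you need" idea boils down to filtering the artifact's layers by media type and fetching only that blob. Hypothetical sketch (the media type string is illustrative, not KitOps' actual value):

    # Sketch of "pull just one component": filter the artifact's layers by
    # media type and download only that blob. The media type string is purely
    # illustrative; registry/repo names are placeholders.
    import requests

    REGISTRY = "registry.example.com"               # placeholder
    REPO = "team/project"                           # placeholder
    TAG = "v1"
    WANTED = "application/x-example.model.weights"  # illustrative media type

    manifest = requests.get(
        f"https://{REGISTRY}/v2/{REPO}/manifests/{TAG}",
        headers={"Accept": "application/vnd.oci.image.manifest.v1+json"},
    ).json()

    for layer in manifest["layers"]:
        if layer["mediaType"] == WANTED:
            blob = requests.get(
                f"https://{REGISTRY}/v2/{REPO}/blobs/{layer['digest']}",
                stream=True,
            )
            with open("weights.bin", "wb") as f:
                for chunk in blob.iter_content(chunk_size=1 << 20):
                    f.write(chunk)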
They imply it should be somehow optimized for Apple silicon, but, yeah, I don't understand what this is. If Docker can use the GPU, well, it should be able to use the GPU in any container that makes proper use of it. If (say) ollama as an app doesn't use it properly, but they figured out a way to do it better, it would make more sense to fix ollama. I have no idea why this should be a different app rather than, well, the Docker daemon itself.
All that work (AGX acceleration...) is done in llama.cpp, not ollama. Ollama's raison d'être is a docker-style frontend to llama.cpp, so it makes sense that Docker would encroach from that angle.
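That GPU work shows up directly in llama.cpp's own knobs, e.g. via its Python bindings. A rough sketch, assuming a Metal-enabled build on Apple silicon and a local GGUF file (the model path is a placeholder):

    # Sketch: the GPU offload lives in llama.cpp itself, shown here via the
    # llama-cpp-python bindings. Assumes a Metal-enabled build on Apple
    # silicon; the model path is a placeholder.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./llama-3-8b.Q4_K_M.gguf",  # placeholder GGUF file
        n_gpu_layers=-1,                        # offload every layer to the GPU
    )

    out = llm("Q: Name one planet.\nA:", max_tokens=16)
    print(out["choices"][0]["text"])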
Can’t say I'm a fan of packaging models as docker images. Feels forced - a solution in search of a problem.
The existing stack - a server and a model file - works just fine. There doesn't seem to be a need to jam an abstraction layer in there. The core problem Docker solves just isn't there.
(disclaimer: I'm leading the Docker Model Runner team at Docker)
We are not packaging models as Docker images, since that is indeed the wrong fit and comes with all kinds of technical problems. It also feels wrong to package pure data (which models are) into an image, which is generally expected to be a runnable artifact.
That's why we decided to use OCI Artifacts and to specify our own OCI Artifact subset that is better suited for the use case. The spec and implementation are OSS; you can check them out here: https://github.com/docker/model-spec
Is this really a Docker feature, though? llama.cpp provides acceleration on Apple hardware; I guess you could create a Docker image with llama.cpp and an LLM model and have mostly this feature.
I'm going to take a contrarian perspective to the theme of comments here...
There are currently very good uses for this, and there are likely to be more. There are increasing numbers of large generative AI models used in technical design work (e.g., semiconductor rules-based design/validation, EUV mask design, design optimization). Many/most don't need to run all the time. Some have licensing that is based on length of time running, credits, etc. Some are just huge and intensive, but not run very often in the design flow. Many are run on the cloud, but industrial customers are reluctant to run them on someone else's cloud.
Being able to have my GPU cluster/data center run a ton of different and smaller models during the day or early in the design, and then be turned over to a full CFD or validation run as your office staff goes home, seems to me to be useful. Especially if you are in any way getting billed by your vendor based on run time or similar. It can mean a more flexible hardware investment. The use case here is going to be Formula 1 teams, silicon vendors, etc. - not pure tech companies.
I wonder if the adult kids of some Docker execs own Macs, and that's why they made this. Why on Earth is this not for the larger installed OSes, you know, the ones running Docker in production?
(disclaimer: I'm leading the Docker Model Runner team at Docker)
We decided to start with Apple silicon Macs, because they provide one of the worst experiences of running LLMs in a containerized form, while at the same time having very capable hardware, so it felt like a very sad situation for Mac users (because of the lack of GPU access within containers).
And of course we understand who our users are, so believe me when I say that macOS users on Apple silicon make up a significant portion of our user base; otherwise we would not have started with it.
In production environments on Docker CE, you can already mount the GPUs, so while the UX is not great, it is not a blocker.
However, we have first class support for Docker Model Runner within Docker CE on our roadmap and we hope it comes sooner rather than later ;)
It will also be purely OSS, so no worries there.
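On the "you can already mount the GPUs" point: on Docker CE that is the equivalent of "docker run --gpus all", which through the Python SDK looks roughly like this (assumes the NVIDIA Container Toolkit is installed on the host; the image tag is just an example):

    # Rough sketch of GPU access on Docker CE via the Python SDK, equivalent
    # to `docker run --gpus all`. Assumes the NVIDIA Container Toolkit is set
    # up on the host; the image tag is illustrative.
    import docker

    client = docker.from_env()
    logs = client.containers.run(
        "nvidia/cuda:12.4.1-base-ubuntu22.04",   # example CUDA base image
        "nvidia-smi",
        device_requests=[
            docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]]),
        ],
        remove=True,
    )
    print(logs.decode())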
(I ended up developing an alternative pull mechanism, which is described in https://outerbounds.com/blog/faster-cloud-compute though note that the article is a bit light on the technical details)
Seems fair to raise 1bn at a valuation of 100bn. (Might roll the funds over into pitching Kubernetes, but with AI, next month.)
There is at least one benefit. I'd be interested to see what their security model is.