It's an interesting project. I'll totally accept "for fun" or "because", but I'm interested in the why. Even if it's just a very narrow thing, are there any benefits we would get from using an ML-based OS? I mean, it is definitely cool, and that has merit in its own right, but people talk about Neural OSs and I just don't "get it".
Unlike other ML-based OS projects (such as Gemini OS, which generates code and renders traditional UIs), NeuralOS directly generates every pixel. While this makes it susceptible to hallucination, in my opinion the flip side of hallucination is full flexibility. In the future, I imagine operating systems running entirely (or mostly) on GPUs, adapting to user intent on the fly rather than relying on pre-designed menus and options.
There is no underlying kernel, no function calls, no program execution, and no networking. Everything is purely visual and imagined by the neural model. You can think of it as a safe, isolated container where nothing can actually run or cause harm, since no real code executes. It's essentially an interactive video simulation, conditioned entirely on user inputs.
Note: The Space is intended as a template, so please duplicate it and run with your own GPU for a better experience. (The default Space has only one worker.)
Recommended GPU: At least an L40, ideally an A100-large. (The original demo at neural-os.com used H100s.)
All code and models are self-contained in the Hugging Face Space.
Could you talk about your hopes for the future on this project? What are your thoughts on having a more simplified interface which could combine inputs in a more abstract way, or are you only interested in simulating a traditional OS?
Thanks again.
PS: the waiting time while Firefox “loads” made me laugh. I presume this is also simulated.
However, my real dream behind this project is to blur the boundaries across applications, not just simulate traditional OS interactions. For example, imagine converting a movie we're watching directly into an interactive video game, or instantly changing the interface of an app (like Signal) to something we prefer (like Facebook Messenger) on the fly.
Of course, the current training data severely limits what's achievable today. But looking forward, I envision combining techniques from controllable text generation (such as Zhiting Hu's "Toward Controlled Generation of Text" paper) or synthesizing new interaction data to achieve greater control and customization. I believe this is a promising path toward creating truly generative and personalized interfaces.
Thanks again for your interest!
although I wasn't able to really use it due to lag
I coded up the demo myself and didn't anticipate how disruptive the intermittent warning messages about waiting users would become. The demo is quite resource-intensive: each session currently requires its own H100 GPU, and I'm already using a dispatcher-worker setup with 8 parallel workers. Unfortunately, demand exceeded that capacity, causing significant lag, and I had to limit each session to 60 more seconds whenever other users are waiting. Additionally, the underlying diffusion model itself is slow, typically running below 2 fps, and network bottlenecks make this worse.
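For readers curious how a policy like the one described above might work (a dispatcher in front of a fixed pool of GPU workers, with running sessions capped at 60 more seconds once someone is queued), here is a rough, hypothetical simulation. It is not the demo's actual code: the `Dispatcher` and `Worker` names, the `GRACE_SECONDS` constant, and the rule that a session's timer is lifted if the queue empties before it expires are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Optional

GRACE_SECONDS = 60.0  # assumed: extra time a session keeps once others are queued


@dataclass
class Worker:
    worker_id: int                    # assumed: one worker per GPU
    session: Optional[str] = None     # user currently holding this worker
    deadline: Optional[float] = None  # None = no time limit while nobody waits


class Dispatcher:
    """Hypothetical dispatcher: assigns users to workers and queues the rest."""

    def __init__(self, num_workers: int = 8) -> None:
        self.workers = [Worker(i) for i in range(num_workers)]
        self.waiting: deque[str] = deque()

    def _start_grace(self, now: float) -> None:
        # Someone is queued: every unlimited session gets GRACE_SECONDS more.
        for w in self.workers:
            if w.session is not None and w.deadline is None:
                w.deadline = now + GRACE_SECONDS

    def request(self, user: str, now: float) -> Optional[int]:
        """Give the user a free worker, or queue them and start the grace timers."""
        for w in self.workers:
            if w.session is None:
                w.session = user
                return w.worker_id
        self.waiting.append(user)
        self._start_grace(now)
        return None

    def tick(self, now: float) -> list[tuple[str, int]]:
        """Hand expired workers to queued users; returns (user, worker_id) pairs."""
        handoffs = []
        for w in self.workers:
            if w.deadline is None or now < w.deadline:
                continue
            w.deadline = None
            if self.waiting:
                # Time is up and someone is waiting: reassign this GPU.
                w.session = self.waiting.popleft()
                handoffs.append((w.session, w.worker_id))
            # Otherwise the queue has emptied, so the current session keeps running.
        if self.waiting:
            self._start_grace(now)
        return handoffs
```

With two workers, a third user arriving at t=10 gives both running sessions a deadline of t=70; at t=70 the first worker is handed to the queued user, and since the queue is then empty, the second session simply continues without a limit.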
As for model capabilities, NeuralOS is indeed quite limited at this point (as acknowledged in my paper abstract). That's why the demo interactions shown in my tweet were minimal (opening Firefox, typing a URL).
Overall, this is meant as a proof-of-concept demonstrating the potential of generative, neural-network-powered GUIs. It's fully open-source, and I hope others can help improve it going forward!
Thanks again for the honest feedback.
See my tweet for more details: https://x.com/yuntiandeng/status/1944802154314916331
- Covers a wide range of topics and languages, all from actual users in the wild.
- Includes 122K conversations from reasoning models (o1-preview and o1-mini) which are long, often involving complex problem solving, and very costly to collect.
- 2.5M conversations from GPT-4o.
Links:
- Non-toxic version: https://hf.co/datasets/allenai/WildChat-4.8M
- Full version (gated): https://hf.co/datasets/allenai/WildChat-4.8M-Full
- Exploration tool: https://wildvisualizer.com