lcastricato commented on Waypoint-1: Real-Time Interactive Video Diffusion from Overworld   huggingface.co/blog/waypo... · Posted by u/avaer
dsrtslnd23 · 17 days ago
10,000 hours training data seems quite low for a world model?
lcastricato · 17 days ago
60fps training data goes a long way ;)
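(A quick back-of-envelope check of the claim: at 60 fps, 10,000 hours of video is a very large number of individual frames. This is just illustrative arithmetic, not a statement about how the model counts training samples.)

```python
# Back-of-envelope: total frames in 10,000 hours of 60 fps video.
HOURS = 10_000
SECONDS_PER_HOUR = 3600
FPS = 60

frames = HOURS * SECONDS_PER_HOUR * FPS
print(f"{frames:,} frames")  # 2,160,000,000 frames
```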
lcastricato commented on Waypoint-1: Real-Time Interactive Video Diffusion from Overworld   huggingface.co/blog/waypo... · Posted by u/avaer
dsrtslnd23 · 17 days ago
great work! Will the medium model be also open/apache-licensed?
lcastricato · 17 days ago
Medium is going to be CC BY-NC-SA 4.0. We may reevaluate in the future and make it more lenient. Small is meant to be the model for builders and hackers.
lcastricato commented on Waypoint-1: Real-Time Interactive Video Diffusion from Overworld   huggingface.co/blog/waypo... · Posted by u/avaer
lcastricato · 17 days ago
Hi,

Louis here. CEO of Overworld. Happy to answer questions :)

lcastricato commented on Stability AI releases StableVicuna, a RLHF LLM Chatbot   stability.ai/blog/stablev... · Posted by u/davidbarker
nickthegreek · 3 years ago
This project was probably extremely affordable for them and there is value in these tests. They can play around with these different fine-tuning techniques and put them out there for people to mess with while gaining insight into better methodologies for when they spend the big time and cash on larger models.
lcastricato · 3 years ago
I'm team lead at Carper. This is correct. It's also just a project one of our engineers did over the course of a few days, so very low risk.

We will swap the base model out for StableLM as soon as we can and iterate from there. We just thought the community would enjoy this research artifact :)

lcastricato commented on DeepSpeed Chat: Easy, fast and affordable RLHF training of ChatGPT-like models   github.com/microsoft/Deep... · Posted by u/quantisan
lcastricato · 3 years ago
FYI they don't compare to trlX because trlX is roughly just as fast. Similarly, they put trl in the worst light possible (trl is actually much faster than they claim).
lcastricato · 3 years ago
We're doing some stuff with NVIDIA right now that I can't talk about yet. Super exciting though.
lcastricato commented on DeepSpeed Chat: Easy, fast and affordable RLHF training of ChatGPT-like models   github.com/microsoft/Deep... · Posted by u/quantisan
summarity · 3 years ago
Also see the example repo README: https://github.com/microsoft/DeepSpeedExamples/tree/master/a...

> With just one click, you can train, generate and serve a 1.3 billion parameter ChatGPT model within 1.36 hours on a single consumer-grade NVIDIA A6000 GPU with 48GB memory. On a single DGX node with 8 NVIDIA A100-40G GPUs, DeepSpeed-Chat enables training for a 13 billion parameter ChatGPT model in 13.6 hours. On multi-GPU multi-node systems (cloud scenarios), i.e., 8 DGX nodes with 8 NVIDIA A100 GPUs/node, DeepSpeed-Chat can train a 66 billion parameter ChatGPT model under 9 hours. Finally, it enables 15X faster training over the existing RLHF systems

> The following are some of the open-source examples that are powered by DeepSpeed: Databricks Dolly, LMFlow, CarperAI-TRLX, Huggingface-PEFT

(disclaimer: MSFT/GH employee, not affiliated with this project)
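(Reading the quoted figures as total GPU-hours makes the configurations easier to compare. Note the 66B figure is quoted as "under 9 hours", so 9.0 is an upper bound here; the config names and timings below are taken straight from the quote.)

```python
# GPU-hours per quoted DeepSpeed-Chat configuration: (num_gpus, wall_clock_hours)
configs = {
    "1.3B on 1x A6000":   (1, 1.36),
    "13B on 8x A100-40G": (8, 13.6),
    "66B on 64x A100":    (64, 9.0),  # "under 9 hours" -> upper bound
}

for name, (gpus, hours) in configs.items():
    print(f"{name}: {gpus * hours:.1f} GPU-hours")
```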

lcastricato · 3 years ago
FYI they don't compare to trlX because trlX is roughly just as fast. Similarly, they put trl in the worst light possible (trl is actually much faster than they claim).

u/lcastricato

Karma: 18 · Cake day: December 21, 2019