Neither does Replit, and your Replit email has to match your GitHub email if you want the two to talk. I guess this is what running and not walking looks like.
I know a lot depends on architecture and number representation, but do people have a sense for how big a compute cluster is needed to train models in these size classes: 1.5B, 3B, 7B, 13B, 70B?
Didn’t Meta say they trained Llama 2 on 2k A100s?
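For a rough sense of scale, here's a back-of-the-envelope sketch using the common ~6·N·D FLOPs approximation (N = parameters, D = training tokens). The token count (2T, roughly what Meta reported for Llama 2), the 2,000-GPU cluster, and the ~40% utilization are my own assumptions for illustration, not official figures:

    # Rough training-time estimate via the ~6 * N * D FLOPs rule of thumb.
    A100_PEAK_FLOPS = 312e12   # A100 bf16 tensor-core peak, FLOP/s
    MFU = 0.40                 # assumed model FLOPs utilization (typical range ~0.3-0.5)

    def training_days(params: float, tokens: float, num_gpus: int) -> float:
        total_flops = 6 * params * tokens
        effective_rate = num_gpus * A100_PEAK_FLOPS * MFU
        return total_flops / effective_rate / 86_400  # seconds -> days

    for params in (1.5e9, 7e9, 13e9, 70e9):
        days = training_days(params, tokens=2e12, num_gpus=2000)
        print(f"{params/1e9:>5.1f}B params: ~{days:,.1f} days on 2,000 A100s")

Under those assumptions a 7B model comes out around 4 days and a 70B model around 39 days on 2,000 A100s, which lines up roughly with the GPU-hour figures Meta published for Llama 2.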
I stopped using Replit after it started taking minutes to install crates...
It started out so awesome: my professor started using it instead of Eclipse, it was super easy to teach on, and it made group projects easy for pair programming.
Crates are especially challenging -- the build process is very expensive across all resources (CPU, RAM, and disk), and the packages are very hard (impossible?) to cache. It works super well on the Pro plan.
Perhaps there is some weirdness if you’ve signed up with GitHub. Feel free to email me and we can take a look: amjad@repl.it