Readit News
jiayq84 commented on What Happens When Hyperscalers and Clouds Buy Most Servers and Storage?   nextplatform.com/2024/04/... · Posted by u/rbanffy
jiayq84 · a year ago
I run a startup called Lepton AI. We provide an AI PaaS and fast AI runtimes as a service, so we keep a close eye on the IaaS supply chain. Over the last few months we've seen the supply chain getting better and better, so the business model that worked six months ago - "we have GPUs, come buy bare-metal servers" - no longer works. However, a bigger problem is emerging, probably one that could shake the industry: people don't know how to use these machines efficiently.

There are clusters of GPUs sitting idle because companies don't know how to use them. It's embarrassing to resell them, too, because that makes the company look bad to VCs, but a secondary market is slowly forming.

Essentially, people want a PaaS or SaaS on top of the bare-metal machines.

For example, over the last couple of months we helped a customer fully utilize their cluster of hundreds of cards. Their IaaS provider was new to the field, so we literally helped both sides to (1) understand InfiniBand, NCCL, the training code, and so on; (2) figure out control-plane traffic; (3) build an accelerated storage layer for training; (4) watch all kinds of subtle signals that need attention - did you know that a GPU can appear OK in nvidia-smi but still encounter issues when you actually run a CUDA or NCCL kernel? That needs care; and (5) provide fast software runtimes, like an LLM runtime, finetuning scripts, and many others.
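To make the nvidia-smi point concrete: a minimal health-check sketch (helper names are mine, not Lepton's) that combines driver visibility with an actual kernel launch, since a device can enumerate fine yet fail the moment real work hits it.

```python
# Sketch of a two-signal GPU health check: nvidia-smi visibility alone is
# not enough, so we also run a tiny CUDA matmul on each device.
import subprocess

def smi_visible_gpus():
    """Count GPUs that nvidia-smi can enumerate (0 if the tool is absent)."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=index", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        return len(out.stdout.splitlines())
    except (OSError, subprocess.CalledProcessError):
        return 0

def kernel_check(device_index):
    """Run a small CUDA matmul; return True only if it actually completes."""
    try:
        import torch
        dev = f"cuda:{device_index}"
        x = torch.randn(1024, 1024, device=dev)
        (x @ x).sum().item()  # .item() forces the kernel to run to completion
        return True
    except Exception:
        return False

def classify(smi_ok, kernel_ok):
    """Fold the two signals into one status string."""
    if not smi_ok:
        return "missing"  # not even visible to the driver
    return "healthy" if kernel_ok else "degraded"  # visible, but kernels fail

if __name__ == "__main__":
    for i in range(smi_visible_gpus()):
        print(i, classify(True, kernel_check(i)))
```

The "degraded" state is exactly the trap described above: the fleet dashboard shows green while training jobs silently crash or hang.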

So I think AI PaaS and SaaS are going to be a very valuable (and big) market once people come out of the frenzy of grabbing GPUs - now we need to use them efficiently.

jiayq84 commented on Show HN: Conversational search in less than 500 lines of Python   search.lepton.run/... · Posted by u/jiayq84
jiayq84 · 2 years ago
Full open-source code with Apache license here: https://github.com/leptonai/search_with_lepton
jiayq84 commented on Show HN: Conversational search in less than 500 lines of Python   search.lepton.run/... · Posted by u/jiayq84
jiayq84 · 2 years ago
Hi folks - Yangqing from Lepton here. The idea came from a coffee chat with a colleague about the question: how much of RAG quality comes from the good old search engine, versus the LLM? We figured the best way to find out was to build a quick experiment and try it. What we learned is that search engine results matter a lot - probably more than the choice of LLM. We decided to put it up as a site and also open source the full code.

You can try plugging in different search engines (or even your own Elasticsearch interface), write different LLM prompts, and pick different LLM models - there are a lot of ablation studies that could be tried.
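The ablation setup can be sketched in a few lines; `search_fn` and `llm_fn` are placeholders for whichever engine and model you plug in, not names from the actual repo.

```python
# Minimal RAG ablation harness: swap the search backend and the LLM
# independently and observe which one moves answer quality.

def build_rag_prompt(question, snippets):
    """Pack numbered search snippets plus the question into one LLM prompt."""
    context = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        "Answer the question using only the sources below, citing them as [n].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

def answer(question, search_fn, llm_fn, top_k=5):
    """One RAG pass: retrieve top-k snippets, then generate with the LLM."""
    snippets = search_fn(question)[:top_k]
    return llm_fn(build_rag_prompt(question, snippets))
```

Holding `llm_fn` fixed while varying `search_fn` (and vice versa) is the experiment behind the "search matters more" observation.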

We appreciate your interest and happy Friday!

jiayq84 commented on Structural Decoding (Function Calling) for All Open LLMs   leptonai.medium.com/struc... · Posted by u/jiayq84
jiayq84 · 2 years ago
Structured decoding is now generally available for ALL open-source models hosted on Lepton AI. Simply provide the schema you want the LLM to follow, and all our model APIs will automatically produce outputs matching that schema. In addition, you can host your own LLMs with structured decoding capability, without having to finetune them.
jiayq84 commented on Super AI Creativity App Run with Local GPU on Windows/Linux/MacOS   blog.hippoml.com/super-ai... · Posted by u/antinucleon
jiayq84 · 2 years ago
Super cool demonstration of what a local machine can already do amid the AI frenzy!
jiayq84 commented on Show HN: Running LLMs in one line of Python without Docker   lepton.ai/... · Posted by u/jiayq84
brucethemoose2 · 2 years ago
SDXL is indeed a monster to install and set up. The UIs are even worse.

IDK if the GPL license is compatible with your business, but I wonder if you could package Fooocus or Fooocus-MRE into a window? It's a hairy monster to install and run, but I've never gotten such consistently amazing results from a single prompt box + style dropdown (including with native HF diffusers and other diffusers-based frontends). The automatic augmentations to the SDXL pipeline are amazing:

https://github.com/MoonRide303/Fooocus-MRE

jiayq84 · 2 years ago
Oh wow yeah, that is a beast. Let me give it a shot.
jiayq84 commented on Show HN: Running LLMs in one line of Python without Docker   lepton.ai/... · Posted by u/jiayq84
swyx · 2 years ago
congrats yangqing et al! i was really impressed by your llama2 demo https://llama2.lepton.run/ where you showed that you were the "fastest llama runners" (https://twitter.com/swyx/status/1695183902770614724). definitely needed for model hosting infra.
jiayq84 · 2 years ago
Thanks so much for the warm words!

u/jiayq84 · karma 119 · joined April 18, 2017