Readit News logoReadit News
sherlockxu commented on LLM Inference Handbook   bentoml.com/llm/... · Posted by u/djhu9
qrios · 2 months ago
Thanks for putting this together! From now on I only need one link to point interested ones to learn.

Only one suggestion: On page "OpenAI-compatible API" it would be great to have also a simple example for the pure REST call instead of the need to import the OpenAI package.

sherlockxu · 2 months ago
Thanks. We just added the example.
sherlockxu commented on LLM Inference Handbook   bentoml.com/llm/... · Posted by u/djhu9
criemen · 2 months ago
Thanks a lot for putting this together!

I have a question. In https://github.com/bentoml/llm-inference-in-production/blob/..., you have a single picture that defines TTFT and ITL. That does not match my understanding (but you guys know probably more than me): In the graphic, it looks like that the model is generating 4 tokens T0 to T3, before outputting a single output token.

I'd have expected that picture for ITL (except that then the labeling of the last box is off), but for TTFT, I'd have expected that there's only a single token T0 from the decode step, that then immediately is handed to detokenization and arrives as first output token (if we assume a streaming setup, otherwise measuring TTFT makes little sense).

sherlockxu · 2 months ago
Thanks. We have updated the image to make it more accurate.
sherlockxu commented on LLM Inference Handbook   bentoml.com/llm/... · Posted by u/djhu9
sherlockxu · 2 months ago
Hi everyone. I'm one of the maintainers of this project. We're both excited and humbled to see it on Hacker News!

We created this handbook to make LLM inference concepts more accessible, especially for developers building real-world LLM applications. The goal is to pull together scattered knowledge into something clear, practical, and easy to build on.

We’re continuing to improve it, so feedback is very welcome!

GitHub repo: https://github.com/bentoml/llm-inference-in-production

u/sherlockxu

KarmaCake day418September 20, 2022
About
Hello world
View Original