darkolorin commented on Show HN: We made our own inference engine for Apple Silicon (github.com/trymirai/uzu...) · Posted by u/darkolorin
giancarlostoro · a month ago
Hoping the author can answer, since I'm still learning how this all works. My understanding is that inference is "using the model", so to speak. How is this faster than established inference engines, specifically on Mac? Are models generic enough that an inference engine focused on, say, AMD or even Intel GPUs could achieve reasonable performance? I always assumed that because Nvidia is king of AI you had to suck it up, or is it just that most inference engines in use are married to Nvidia?

I would love to understand how universal these models can become.

darkolorin · a month ago
Basically, “faster” means better performance, e.g. tokens/s, without losing quality (benchmark scores for the models). So when we say faster, we mean we deliver more tokens per second than llama.cpp. That means we utilize the available hardware APIs more effectively (for example, we wrote our own kernels) to perform better.
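For context, "tokens per second" here is simply generated tokens divided by wall-clock decode time. A minimal sketch of how you might measure it for any engine; the `generate_fn` hook and the dummy stand-in engine are illustrative assumptions, not part of uzu's or llama.cpp's API:

```python
import time

def tokens_per_second(generate_fn, prompt: str, max_tokens: int) -> float:
    """Time a generation call and report decode throughput.

    `generate_fn` is a placeholder for whatever engine you are
    benchmarking (llama.cpp bindings, MLX, uzu, ...); it is assumed
    to return the list of generated token ids.
    """
    start = time.perf_counter()
    tokens = generate_fn(prompt, max_tokens=max_tokens)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Stand-in engine so the sketch runs on its own; swap in a real one.
def dummy_engine(prompt, max_tokens):
    time.sleep(0.5)                  # pretend to decode
    return list(range(max_tokens))   # pretend these are token ids

print(f"{tokens_per_second(dummy_engine, 'Hello', 128):.1f} tok/s")
```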

darkolorin commented on 90T/s on my iPhone llama3.2-1B-fp16 (reddit.com/r/LocalLLaMA/s...) · Posted by u/darkolorin
darkolorin · 5 months ago
I made it! 90 t/s on my iPhone with llama1b fp16

We completely rewrote the inference engine and used a few tricks. This is summarization with Llama 3.2 1B in float16, and most of the time we're much faster than MLX. Let me know in the comments if you want to test the inference and I'll post a link.
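If you want a rough MLX baseline to compare against on your own Apple Silicon machine, the `mlx_lm` package can report generation speed directly. A sketch under assumptions: the exact model repo name below is a guess, so substitute whatever Llama 3.2 1B checkpoint you actually have.

```python
# Rough MLX baseline for comparison; requires `pip install mlx-lm`
# and runs only on Apple Silicon.
from mlx_lm import load, generate

# Assumed model repo; substitute the Llama 3.2 1B build you use locally.
model, tokenizer = load("mlx-community/Llama-3.2-1B-Instruct-bf16")

prompt = "Summarize: Apple Silicon unifies CPU and GPU memory, which ..."

# verbose=True prints generation stats (including tokens/sec) with the output.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(text)
```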

u/darkolorin

Karma: 73 · Cake day: August 5, 2014
About
prev CEO & co-founder Prisma & Capture, now CEO & co-founder LFG