In our deployments, we've seen open source models rival and even outperform lower-tier cloud counterparts. Happy to share some benchmarks if you like.
Our pricing is per monthly active device, regardless of utilization. For voice-agent workflows, you typically reach break-even once you process more than roughly 2 minutes of inference per device per day.
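To make the break-even arithmetic concrete, here's a minimal sketch. The prices below are made-up placeholders for illustration only, not actual rates for either service:

```python
# Hypothetical break-even sketch: a flat per-device monthly fee vs.
# metered cloud inference billed per minute. All numbers are
# illustrative placeholders, not real pricing.
def break_even_minutes_per_day(device_price_per_month: float,
                               cloud_price_per_minute: float,
                               days_per_month: int = 30) -> float:
    """Daily inference minutes at which the flat per-device fee
    equals the metered cloud cost for the month."""
    return device_price_per_month / (cloud_price_per_minute * days_per_month)

# e.g. a $3/month device fee vs. $0.05/min cloud inference:
# 3 / (0.05 * 30) = 2.0 minutes per day
print(break_even_minutes_per_day(3.0, 0.05))  # 2.0
```

Above 2 minutes of daily inference per device (under these assumed prices), the flat fee comes out cheaper.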
I'm very curious what could be done with your impressive optimization on an RK3588, since it has pretty decent bits in all 3 categories, and I'm now seriously considering a Radxa Orion to play with this on :)
One more if you have a moment: will this be limited to text generation, or will it have audio and image capabilities as well? It would be neat to enable not only image generation but also voice recognition, translation, computer vision, and image editing and enhancement in mobile apps, beyond what the big players deign to give us :)
We don't advise using GPUs on smartphones, since they're very energy-inefficient. Mobile GPU inference is actually the main driver behind the stereotype that "mobile inference drains your battery and heats up your phone".
Wrt your last question – the short answer is yes, we'll have multimodal support. We currently support voice transcription and image understanding, and we'll be expanding these capabilities with more models, voice synthesis, and much more.