It’s basically a luxury minivan. It may not be the fastest, prettiest or cheapest, but it’s a safe way for a large family of “data and AI people” to traverse a large organisation.
More seriously, I like to call it an “analytics workbench” in a professional setting.
A Qwen 2.5 500M will get you to ≈45 tok/sec on an iPhone 13. Inference speed is roughly inversely proportional to model size, so a model twice as large will run at about half the speed.
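As a back-of-the-envelope sketch, the rule of thumb above can be turned into a quick estimate anchored on the ≈45 tok/sec figure. This assumes throughput scales strictly as 1/parameter count, which is only an approximation: quantization, memory bandwidth and thermal throttling all shift real numbers. The function name and the model sizes below are illustrative, not measurements.

```python
# Rough estimate only: assumes decode speed scales strictly as 1 / parameter count,
# anchored on the ~45 tok/s reported for a 0.5B-parameter model on an iPhone 13.
BASELINE_PARAMS_B = 0.5    # Qwen 2.5 500M, in billions of parameters
BASELINE_TOK_PER_S = 45.0  # reported speed for that model


def estimate_tok_per_s(params_b: float) -> float:
    """Hypothetical helper: estimated tokens/sec for a model of params_b billion params."""
    if params_b <= 0:
        raise ValueError("params_b must be positive")
    # Inverse proportionality: speed * size stays constant.
    return BASELINE_TOK_PER_S * BASELINE_PARAMS_B / params_b


for size in (0.5, 1.5, 3.0):
    print(f"{size}B -> ~{estimate_tok_per_s(size):.1f} tok/s")
```

Under this assumption a 1.5B model would land around 15 tok/sec and a 3B model around 7.5 tok/sec on the same phone; treat these as ballpark figures rather than benchmarks.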
Yes, speeds are consistent across frameworks, although (and don't quote me on this) I believe React Native is slightly slower because it talks to the C++ engine through a set of bridges.
Most of the standard mobile CPU benchmarks (Geekbench, AnTuTu, et al.) show a 20-40% performance gain over the S23/S24 Ultra. This also bucks the trend, since most other devices are ranked as you'd expect (i.e. newer devices perform better).
Thanks for sharing your project.