Cactus Chat lets you talk to AI directly on your phone. This means it's free, runs offline, and your data stays on your device.
If you're a developer, use Cactus to benchmark the latency and throughput of various LLMs.
Updated on Aug 14, 2025
Productivity
Data safety
Safety starts with understanding how developers collect and share your data. Data privacy and security practices may vary based on your use, region, and age. The developer provided this information and may update it over time.
Ratings and reviews
4.5
20 reviews
T Turner
October 31, 2025
It's a decent enough demo, but it isn't nearly as fast as the GitHub description says it should be; I'm only getting about 9 tokens per second on a Pixel 7 with the default model. If this weren't open source, it would be rated much lower. There are applications such as PocketPal running a Gemma 3 1B model that are as fast or faster, in a more feature-rich app with a smarter model.
John Kintree
November 1, 2025
Running gemma3:1b on my OnePlus 11, the latency with Cactus Chat was less than 1 second, while the latency with Ollama was more than 10 seconds.
wael chateur
September 21, 2025
It does exactly what it says it does, thanks. I would like to see a benchmarking tool to test LLMs, and a feature to import from phone storage would be appreciated 👏