Cactus Chat lets you talk to AI directly on your phone. Because everything runs on-device, it's free, works offline, and your data stays on your device.
If you're a developer, use Cactus to benchmark the latency and throughput of various LLMs.
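For a rough sense of what such a benchmark measures, here is a minimal Kotlin sketch that times time-to-first-token and decode throughput. The `LlmEngine` interface is a hypothetical stand-in for an on-device inference API, not the actual Cactus SDK.

```kotlin
// Minimal on-device LLM benchmark sketch. LlmEngine is a hypothetical
// stand-in for a streaming inference API; it is NOT the actual Cactus SDK.
interface LlmEngine {
    // Streams each generated token to onToken; returns when generation ends.
    fun generate(prompt: String, onToken: (String) -> Unit)
}

data class BenchmarkResult(val timeToFirstTokenMs: Long, val tokensPerSecond: Double)

fun benchmark(engine: LlmEngine, prompt: String): BenchmarkResult {
    var firstTokenAt = 0L
    var tokenCount = 0
    val start = System.nanoTime()
    engine.generate(prompt) { _ ->
        if (tokenCount == 0) firstTokenAt = System.nanoTime()
        tokenCount++
    }
    val end = System.nanoTime()
    require(tokenCount > 0) { "model produced no tokens" }
    val ttftMs = (firstTokenAt - start) / 1_000_000   // prefill latency
    val decodeSeconds = (end - firstTokenAt) / 1e9    // decode phase only
    return BenchmarkResult(ttftMs, tokenCount / decodeSeconds)
}
```

Timing the decode phase separately matters because the two figures reviewers below quote, seconds until a reply appears and tokens per second, correspond to prefill latency and decode throughput respectively.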
Updated on Aug 14, 2025
Productivity
Data safety
Safety starts with understanding how developers collect and share your data. Data privacy and security practices may vary based on your use, region, and age. The developer provided this information and may update it over time.
Ratings and reviews
Phone
4.5
26 reviews
T Turner
October 31, 2025
It's a decent enough demo, but it isn't nearly as fast as the GitHub description says it should be; I'm only getting about 9 tokens a second on a Pixel 7 with the default model. If this weren't open source it would be rated much lower. Applications such as PocketPal, running a 1B Gemma 3 model, are as fast or faster in a feature-rich app with a smarter model.
John Kintree
November 1, 2025
Running gemma3:1b on my OnePlus 11, the latency with Cactus Chat was less than 1 second, while the latency with Ollama was more than 10 seconds.
Jason Blackross
July 2, 2025
Actually works pretty well. The only limitation I've run into is that I can't export chats to PDF or easily copy the text to a file for convenience (I'd actually prefer PDF).