LLM tester with llama.cpp

Content rating
Everyone
500+
Downloads
Content rating
Everyone
Learn more
Screenshot image
Screenshot image
Screenshot image
Screenshot image
Screenshot image

About this app

1. What This App Can Do
This app lets you run large language models (LLMs) entirely on your Android device, enabling private text generation without external servers.
You can load GGUF models from HuggingFace or from local storage, providing a flexible offline AI environment.
The app supports a wide range of llama.cpp‑compatible models, including Gemma‑4 and Bonsai.
An Ollama‑compatible API server is also included, allowing other apps or scripts to access your local LLM via standard HTTP endpoints.
Web UI is Available. Vision as multimodel is Available with Gemma-4.
Upload mmproj file from local device. Only one mmproj model is acceptable at same time.
MCP and Function Calling are available.
---

2. Intended Users and Supported Devices
Ideal for:
- Users who want fully local LLMs
- Those using GGUF models from HuggingFace or local files
- Advanced users needing detailed parameter control
- Developers calling a local LLM from their apps
- Privacy‑focused users

The app works on a wide range of Android devices. By adjusting context size, threads, and batch size, you can tune performance for your hardware.

---

3. Key Features
- Load GGUF models from HuggingFace or local storage
- Fully offline inference
- Supports Gemma‑4 and other llama.cpp‑compatible models
- Detailed parameter settings (Mirostat, DRY, XTC, etc.)
- Ollama‑compatible API: /api/chat, /api/generate, /api/tags
- Automatic prompt template selection
- Streaming output option
- Comprehensive logs and UI safeguards
- Various minor improvements for stability and usability
- Web UI is Available.
- Vision as multimodel is Available with Gemma-4.

---

4. Getting Started
1. Open Settings.
2. Enter a HuggingFace GGUF URL or choose a local GGUF file, then tap Load Model.
3. Adjust parameters and tap Save Config.
4. Tap SAVE & CLOSE to apply settings.

---

5. Main Screen Functions
- Enter Prompt: Input your instruction
- Send: Start generation
- Re‑init Model: Reload current model
- View Log / Clear Log
- Start/Stop API Server
- Copy output or logs
- View timestamped processing logs

---

6. Settings Screen Highlights
- Save/load/delete configurations
- Model selection from URL or local storage
- Context size, temperature, Mirostat, DRY, XTC, etc.
- Streaming output toggle
- Custom or auto‑selected prompt templates
- API server port settings
- Log verbosity options
- Manual and privacy policy

---

7. Prompt Templates and Stop Sequences
The app detects the model family from the filename and selects an appropriate template.
It also stops generation when common delimiters appear to prevent runaway output.

Tips:
Gemma‑4 tends to repeat short phrases. Adding explicit anti‑repetition instructions in the system prompt or using stricter stop sequences can improve output quality.

---

8. API Server Capabilities
Provides:
- /api/chat
- /api/generate
- /api/tags
- /v1/chat/completions,
- /v1/models
- /props, /slots

- / for Web UI http://localhost:11434/

Only one generation request is processed at a time. Android 13+ may require notification permission.

---

9. How This App Stands Out
- GGUF loading from HuggingFace and local device
- Support for Gemma‑4 as multimodel via Web UI
- More detailed parameter control than typical local LLM apps
- Built‑in Ollama‑compatible API server
- Automatic template selection
- Flexible performance tuning
Updated on
May 5, 2026

Data safety

Safety starts with understanding how developers collect and share your data. Data privacy and security practices may vary based on your use, region, and age. The developer provided this information and may update it over time.
No data shared with third parties
Learn more about how developers declare sharing
No data collected
Learn more about how developers declare collection

What’s new

MCP and Function Calling are Available.
Improve Settings usability.