TokForge

10+
Downloads
Content rating
Everyone

About this app

Your phone is smarter than you think.

TokForge runs large language models directly on your Android device — no cloud, no subscription, no data leaving your pocket. Chat with AI characters, attach documents, hear responses spoken aloud, and tune everything to your hardware automatically.

WHAT CAN IT DO?

Chat with AI Characters
Import TavernAI V2 character cards (PNG/JSON), customize personalities with per-character settings, and have real conversations with streaming generation. Reasoning models get collapsible thinking blocks. Lorebooks, alternate greetings, world info — the full spec.

Attach Documents & Ask Questions
Drop in a PDF, DOCX, EPUB, or text file and ask questions grounded in that document. RAPTOR tree indexing and BGE-small embeddings find relevant passages. Follow-up questions stay fast thanks to delta KV cache preservation.

Hear Responses Read Aloud
On-device Kokoro TTS — 11 voices, adjustable speed, two quality tiers. Fully offline.

2x Faster with Speculative Decoding
A small draft model predicts ahead, the main model verifies in batch. Live tok/s indicator in the chat toolbar. Auto-detected pairings with smart per-mode backend routing.
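The draft-then-verify loop behind speculative decoding can be sketched with toy models standing in for the real draft/main pair (function names are illustrative, not TokForge's actual API):

```python
def speculative_decode(main_model, draft_model, prompt, n_draft=4, max_tokens=16):
    """Toy speculative decoding: the draft model proposes n_draft tokens,
    the main model verifies them in one batched pass and keeps the longest
    agreeing prefix, plus one corrected token at the first mismatch.
    May slightly overshoot max_tokens, which is fine for a sketch."""
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_tokens:
        # 1. Draft model predicts several tokens ahead, sequentially (cheap).
        draft = []
        for _ in range(n_draft):
            draft.append(draft_model(tokens + draft))
        # 2. Main model scores every draft position; in a real engine this is
        #    one batched forward pass instead of n_draft sequential passes.
        verified = [main_model(tokens + draft[:i]) for i in range(len(draft))]
        # 3. Accept the agreeing prefix; on mismatch, keep the main model's token.
        accepted = []
        for d, v in zip(draft, verified):
            if d == v:
                accepted.append(d)
            else:
                accepted.append(v)
                break
        tokens.extend(accepted)
    return tokens

# Toy "models" over integer tokens: main predicts last + 1; the draft
# guesses wrong (last + 2) whenever the last token is a multiple of 5.
main = lambda seq: seq[-1] + 1
draft = lambda seq: seq[-1] + (2 if seq[-1] % 5 == 0 else 1)

out = speculative_decode(main, draft, [0], n_draft=4, max_tokens=8)
```

The key property: the output is identical to what the main model would produce alone; the draft model only changes how many main-model passes are needed, which is where the speedup comes from.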

THREE BACKENDS, FIVE GPU PATHS
• MNN with OpenCL and Vulkan GPU — tuned MNN Vulkan GEMV kernels for Mali, OpenCL for Adreno. TQ4 TurboQuant hits 46–57 tok/s on small models.
• GGUF via llama.cpp — ARM i8mm, Vulkan cooperative matrix, flash attention, DRY sampler, Mirostat, full quantization range
• Remote API — OpenAI-compatible streaming to Ollama, vLLM, or llama.cpp server
• SoC-aware auto-routing picks the fastest path for your chipset
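Because the remote backend speaks the standard OpenAI-compatible protocol, any client that can POST JSON can drive it. A minimal sketch, assuming a local Ollama-style server (the base URL, port, and model name are placeholders for your own setup):

```python
import json
import urllib.request

def chat_request(base_url, model, user_message, token=None):
    """Build an OpenAI-compatible /v1/chat/completions request.
    base_url and model are placeholders; point them at your own
    Ollama, vLLM, or llama.cpp server instance."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": True,  # ask the server for server-sent-event chunks
    }
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers=headers,
        method="POST",
    )

req = chat_request("http://localhost:11434", "qwen3:4b", "Hello!")
# urllib.request.urlopen(req) would then yield "data: {...}" SSE lines.
```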

YOUR AI REMEMBERS YOU
Per-character persistent memory with background extraction — no manual tagging. Knowledge graphs track entity relationships. Hybrid keyword + semantic search. Document attachments persist across sessions.

TUNE YOUR DEVICE
ForgeLab benchmarks every model/backend combo on your hardware. AutoForge sweeps all configs and picks the fastest. Named inference profiles save your sampler settings. Shareable PNG report cards.

DEVELOPER API — 120+ ENDPOINTS
Full local control plane over HTTP. Load models, run benchmarks, manage memory, pin documents, send messages — all programmatically. Bearer-token auth, disabled by default.
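Calling a bearer-token-protected local control plane looks like this; the port and endpoint path below are hypothetical stand-ins, not TokForge's documented routes, so check the in-app API docs for the real ones:

```python
import urllib.request

# Hypothetical base URL for illustration only; the real port is whatever
# the app's API settings show once the server is enabled.
BASE = "http://127.0.0.1:8080"

def api_get(path, token):
    """Prepare a GET against a local control-plane endpoint,
    authenticated with a bearer token."""
    return urllib.request.Request(
        f"{BASE}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )

# "/api/models" is a placeholder route; urllib.request.urlopen(req)
# would perform the actual call.
req = api_get("/api/models", "YOUR_TOKEN")
```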

TESTED ON REAL HARDWARE
• RedMagic 11 Pro (SM8850): 21.0 tok/s — Qwen3-8B, OpenCL
• Galaxy S24 Ultra (SM8650): 13.58 tok/s — Qwen3-4B, OpenCL
• OnePlus Ace 5 Ultra (D9400): 11.88 tok/s — Qwen3-8B, MNN Vulkan
• Xiaomi Pad 7 Pro (SM8635): 11.81 tok/s — Qwen3-4B, CPU

PRIVACY IS THE POINT
• Zero analytics, zero telemetry, zero cloud dependency
• All inference on-device — airplane mode works fine
• No accounts, no sign-up

17 curated models (0.6B–14B): Qwen3, DeepSeek-R1, Llama 3, Phi-4 and more. Download in-app or search HuggingFace.
Updated on
Apr 6, 2026

Data safety

Safety starts with understanding how developers collect and share your data. Data privacy and security practices may vary based on your use, region, and age. The developer provided this information and may update it over time.
No data shared with third parties
No data collected

What’s new

Lots of changes since the last upload: TurboQuant added under Advanced Settings, cache clearing, RAG + attachment support (very beta), metrics/API work, and UI/UX cleanup and improvements based on beta-tester feedback.

App support

About the developer
Isaac Maple
isaac.maple@defcon-one.io
United States