TokForge

Age rating
Everyone
10+
Downloads

About this app

Your phone is smarter than you think.

TokForge runs large language models directly on your Android device — no cloud, no subscription, no data leaving your pocket. Chat with AI characters, attach documents, hear responses spoken aloud, and tune everything to your hardware automatically.

WHAT CAN IT DO?

Chat with AI Characters
Import TavernAI V2 character cards (PNG/JSON), customize personalities with per-character settings, and have real conversations with streaming generation. Reasoning models get collapsible thinking blocks. Lorebooks, alternate greetings, world info — the full spec.
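The V2 card format stores the character JSON base64-encoded inside a PNG `tEXt` chunk keyed `chara`. A minimal round-trip sketch of that encoding (not TokForge's actual importer; the builder emits only the chunk a reader needs, not a renderable image):

```python
import base64, json, struct, zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def _chunk(ctype: bytes, data: bytes) -> bytes:
    # length + type + data + CRC over type+data, per the PNG spec
    return struct.pack(">I", len(data)) + ctype + data + \
        struct.pack(">I", zlib.crc32(ctype + data))

def embed_chara_card(card: dict) -> bytes:
    """Build a minimal PNG carrying a card in a 'chara' tEXt chunk."""
    text = b"chara\x00" + base64.b64encode(json.dumps(card).encode())
    return PNG_SIG + _chunk(b"tEXt", text) + _chunk(b"IEND", b"")

def read_chara_card(png: bytes) -> dict:
    """Walk PNG chunks and decode the base64 JSON in the 'chara' tEXt chunk."""
    assert png.startswith(PNG_SIG), "not a PNG"
    pos = len(PNG_SIG)
    while pos < len(png):
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        data = png[pos + 8:pos + 8 + length]
        if ctype == b"tEXt" and data.startswith(b"chara\x00"):
            return json.loads(base64.b64decode(data[6:]))
        pos += 12 + length  # 8-byte header + data + 4-byte CRC
    raise ValueError("no 'chara' chunk found")
```

A real importer would also validate `spec == "chara_card_v2"` and fall back to plain JSON files.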

Attach Documents & Ask Questions
Drop in a PDF, DOCX, EPUB, or text file and ask questions grounded in that document. RAPTOR tree indexing and BGE-small embeddings find relevant passages. Follow-up questions stay fast thanks to delta KV cache preservation.
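The retrieval step reduces to similarity ranking over chunk embeddings; a RAPTOR index additionally ranks tree-level summary nodes alongside leaf chunks. A minimal sketch with toy 2-d vectors standing in for real BGE-small embeddings (which have a few hundred dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=2):
    """chunks: list of (text, embedding). Return the k most similar texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The retrieved passages are then prepended to the prompt; keeping the KV cache for the shared prefix is what makes follow-up questions cheap.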

Hear Responses Read Aloud
On-device Kokoro TTS — 11 voices, adjustable speed, two quality tiers. Fully offline. No internet needed.

2x Faster with Speculative Decoding
A small draft model predicts ahead, the main model verifies in batch. Live tok/s indicator in the chat toolbar. Auto-detected pairings with smart per-mode backend routing.
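The draft/verify loop can be illustrated with a toy greedy version, where `draft` and `target` are any functions mapping a token context to the next token (illustrative stand-ins, not TokForge's API):

```python
def speculative_decode(target, draft, prompt, n_tokens, gamma=4):
    """Greedy speculative decoding: the draft model proposes `gamma` tokens,
    the target model verifies them (one batched pass in a real engine), and
    the longest agreeing prefix is accepted plus one correction token."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # draft phase: the cheap model rolls out gamma tokens autoregressively
        proposal, ctx = [], list(out)
        for _ in range(gamma):
            ctx.append(draft(ctx))
            proposal.append(ctx[-1])
        # verify phase: count how many proposed tokens the target agrees with
        accepted = 0
        for i, tok in enumerate(proposal):
            if target(out + proposal[:i]) != tok:
                break
            accepted += 1
        out += proposal[:accepted]
        if accepted < len(proposal):
            out.append(target(out))  # target's own token at the first mismatch
    return out[len(prompt):][:n_tokens]
```

The key property: the output is exactly what the target model alone would produce, only faster when the draft model guesses well.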

THREE BACKENDS, FIVE GPU PATHS
• MNN with OpenCL and Vulkan GPU — tuned MNN Vulkan GEMV kernels for Mali, OpenCL for Adreno. TQ4 TurboQuant hits 46–57 tok/s on small models.
• GGUF via llama.cpp — ARM i8mm, Vulkan cooperative matrix, flash attention, DRY sampler, Mirostat, full quantization range
• Remote API — OpenAI-compatible streaming to Ollama, vLLM, or llama.cpp server
• SoC-aware auto-routing picks the fastest path for your chipset
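The remote backend speaks the standard OpenAI-compatible wire format that Ollama, vLLM, and llama.cpp server all accept. A sketch of the request body and SSE delta parsing (the model name is purely illustrative):

```python
import json

def chat_request(model: str, prompt: str) -> dict:
    """Body for a POST to an OpenAI-compatible /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # server replies with SSE 'data:' lines
    }

def parse_sse_delta(line: str):
    """Pull the streamed text delta out of one SSE line, or None."""
    if not line.startswith("data: ") or line == "data: [DONE]":
        return None
    chunk = json.loads(line[len("data: "):])
    return chunk["choices"][0]["delta"].get("content")
```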

YOUR AI REMEMBERS YOU
Per-character persistent memory with background extraction — no manual tagging. Knowledge graphs track entity relationships. Hybrid keyword + semantic search. Document attachments persist across sessions.
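Hybrid retrieval typically blends a lexical score with an embedding score. A minimal sketch with toy 2-d embeddings (the weighting and the specific scoring functions here are illustrative, not TokForge's internals):

```python
import math

def keyword_score(query: str, doc: str) -> float:
    """Fraction of query words appearing in the document (toy lexical score)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def hybrid_rank(query, query_vec, memories, alpha=0.5):
    """memories: list of (text, embedding). Blend lexical + semantic scores."""
    scored = [(alpha * keyword_score(query, t) +
               (1 - alpha) * cosine(query_vec, v), t)
              for t, v in memories]
    return [t for _, t in sorted(scored, reverse=True)]
```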

TUNE YOUR DEVICE
ForgeLab benchmarks every model/backend combo on your hardware. AutoForge sweeps all configs and picks the fastest. Named inference profiles save your sampler settings. Shareable PNG report cards.
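An AutoForge-style sweep reduces to benchmarking every combination and keeping the argmax; a sketch where `bench` stands in for an actual timed generation run:

```python
import itertools

def autoforge_sweep(models, backends, bench):
    """Benchmark every model/backend pair via bench(model, backend) -> tok/s,
    returning (all results, fastest combo)."""
    results = {(m, b): bench(m, b)
               for m, b in itertools.product(models, backends)}
    fastest = max(results, key=results.get)
    return results, fastest
```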

DEVELOPER API — 120+ ENDPOINTS
Full local control plane over HTTP. Load models, run benchmarks, manage memory, pin documents, send messages — all programmatically. Bearer-token auth, disabled by default.
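A client call is an ordinary bearer-authenticated HTTP request; the base URL, port, and paths below are placeholders (check the app's API docs for the real endpoint list):

```python
import json, urllib.request

def api_request(path, token, payload=None):
    """Build an authenticated request to the local control plane.
    Base URL and port are illustrative, not TokForge's actual defaults."""
    return urllib.request.Request(
        f"http://127.0.0.1:8080{path}",
        data=json.dumps(payload).encode() if payload else None,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST" if payload else "GET",
    )

# send with: urllib.request.urlopen(api_request("/some/path", token))
```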

TESTED ON REAL HARDWARE
• RedMagic 11 Pro (SM8850): 21.0 tok/s — Qwen3-8B, OpenCL
• Galaxy S24 Ultra (SM8650): 13.58 tok/s — Qwen3-4B, OpenCL
• OnePlus Ace 5 Ultra (D9400): 11.88 tok/s — Qwen3-8B, MNN Vulkan
• Xiaomi Pad 7 Pro (SM8635): 11.81 tok/s — Qwen3-4B, CPU

PRIVACY IS THE POINT
• Zero analytics, zero telemetry, zero cloud dependency
• All inference on-device — airplane mode works fine
• No accounts, no sign-up

17 curated models (0.6B–14B): Qwen3, DeepSeek-R1, Llama 3, Phi-4 and more. Download in-app or search HuggingFace.
Updated on
April 6, 2026

Data safety

Safety starts with understanding how developers collect and share your data. Data privacy and security practices may vary based on your usage, region, and age. The developer provided this information and may update it over time.

What's new

Lots of changes since the last upload: TurboQuant added under advanced settings, cache clearing, RAG + attachment support (very beta), metrics/API work, and UI/UX cleanup and improvements from beta tester feedback.

App support

Developer information
Isaac Maple
isaac.maple@defcon-one.io
United States