Fluent AI: Offline & Cloud LLM

Name: Fluent AI: Offline & Cloud LLM
Availability: InStock
Author: ReadHeights Technologies Private Limited

ReadHeights Technologies Private Limited

Contains adsIn-app purchases

Everyone

10K+

Downloads

Everyone

Learn more

About this app

🤖 Fluent AI — Private Offline LLM + Claude, GPT-4 & Gemini

Run AI entirely on your device — no cloud, no account, no data sent anywhere. Then switch to Claude, GPT-4 or Gemini when you need more power. One app. Every AI. Always private.

✨ WHAT'S NEW IN v1.3

🏥 MEDICAL AI (MedGemma)
• Google's MedGemma 4B — clinical Q&A and biomedical text, 100% on-device
• Requires accepting Google's Health AI Developer Foundation Terms
• Not a substitute for professional medical advice

🤖 AGENTIC MODE
• On-device AI agent with 12 built-in skills
• Runs tasks autonomously: calendar events, web research, document digest, trip planning
• Agent Task Inspector — see every reasoning step in real time
• 3 free agent runs/day — no subscription needed to start
• Scheduled tasks available with Premium

⚡ LITERT MTP — UP TO 2× FASTER
• Gemma 4n E2B/E4B with Multi-Token Prediction on Android GPU
• Speculative decoding — more tokens per step, same quality
• Tok/s display measures decode-phase speed only for accurate results

👁️ ON-DEVICE VISION (Android)
• Attach photos using Gemma 4n — processed entirely on-device
• No image uploaded to any server, ever

🔒 PRIVACY FIRST
• Conversations stay on your device
• Optional local models = zero cloud data
• API keys encrypted with AES — never stored in plain text
• No mandatory account required

🧠 LOCAL AI MODELS
• GGUF / llama.cpp: Gemma 3/4, Qwen 3.5, Phi-4, Llama, DeepSeek R1, Nemotron, MedGemma
• LiteRT (Android GPU/NPU): Gemma 4n E2B/E4B — vision + MTP speculative decoding
• Apple MLX: Native Metal on Apple Silicon and iOS 18+ (A17 Pro+)
• Q5_K GPU acceleration on Qualcomm Adreno (alongside Q4_0)
• Device-aware model recommendations based on your RAM and chipset
• Browse, download, and manage models in-app — no sideloading needed
• Import custom GGUF from HuggingFace URL or device storage

☁️ CLOUD AI
• Claude (Anthropic), GPT-4 (OpenAI), Gemini (Google)
• OpenRouter — 200+ models via a single API key
• Streaming, vision, and tool calling across all providers

🌐 ONLINE SERVERS
• Ollama Cloud and self-hosted Ollama
• LM Studio, vLLM, LocalAI, and any OpenAI-compatible /v1 API
• Multiple server profiles with per-profile encrypted auth headers

🎤 VOICE MODE
• 5 conversation modes: Normal, Interview, Learning, Storytelling, Translation
• Animated waveform, voice commands (speed, repeat, stop)
• Quick-capture mic button directly in the chat input bar

📚 KNOWLEDGE BASES (RAG)
• Import PDFs, TXT, and Markdown — AI references your docs when answering
• Semantic search for relevant context, topic and project organisation

🔧 POWER FEATURES
• Tool calling: Calculator, DateTime, Weather, Web Search, mem0 Memory
• MCP servers: GitHub, Slack, Notion, Supabase, and 20+ presets
• Code execution: Python, Bash, Node.js from code blocks (desktop + mobile JS)
• Model benchmarking: tok/s, TTFT, MMLU-50 quality score, shareable PNG cards
• Slash commands: /agent, /clear, /export, /voice, /template and more
• Per-chat thinking toggle for Qwen3, DeepSeek R1, Nemotron reasoning models
• URL context injection — paste a link, AI reads the page for context
• Polish Before Send — AI rewrites your draft before you hit send
• Continue button — resumes responses cut off at the token limit

📁 CHAT ORGANISATION
• Folders, tags, and cross-chat full-text search across every message
• HuggingFace model browser with bookmarks and memory fitness badges
• Conversation branching and message reactions

🌟 PREMIUM (OPTIONAL)
• Ad-free experience
• Scheduled agent tasks (recurring or one-time)
• Priority feature access and advanced analytics

📱 PERFECT FOR
✓ Privacy-focused users — local models, zero cloud data
✓ Android power users — LiteRT GPU/NPU with MTP acceleration
✓ Developers — benchmark GGUF, LiteRT, and MLX side-by-side
✓ Healthcare researchers — MedGemma on-device, no upload needed
✓ Students — knowledge bases for study documents and materials
✓ Professionals — agentic tasks, document Q&A, and tool calling

Updated on

Jun 18, 2026

Data safety

Safety starts with understanding how developers collect and share your data. Data privacy and security practices may vary based on your use, region, and age. The developer provided this information and may update it over time.

No data shared with third parties

Learn more about how developers declare sharing

No data collected

Learn more about how developers declare collection

Data is encrypted in transit

You can request that data be deleted

What’s new

What's new:
• Bug fixes and stability improvements.
• Groq is now available as a cloud AI provider — add your Groq API key in Settings > Cloud AI Models to chat with fast Groq-hosted models.

Flag as inappropriate

Everyone

Learn more