LLM Hub

500+ downloads
Content rating: Everyone

About this app

LLM Hub brings production-grade AI straight to your Android device — private, fast, and fully local. Run modern on-device LLMs (Gemma-3, Gemma-3n multimodal, Llama-3.2, Phi-4 Mini) with large context windows, persistent global memory, and retrieval-augmented generation (RAG) that grounds answers in indexed documents stored on-device. Create and store embeddings for documents and notes, run vector similarity search locally, and enrich responses with DuckDuckGo-powered web search when you need live facts. Everything important stays on your phone unless you explicitly export it: local-only memory, indexes, and embeddings protect your privacy while delivering high relevance and accuracy.
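As a rough illustration of the retrieval-augmented flow described above, the Kotlin sketch below shows how a prompt might be assembled from retrieved document chunks and persistent memory facts before being passed to a local model. The names (DocumentChunk, buildRagPrompt) are hypothetical and do not reflect LLM Hub's actual API.

```kotlin
// Hypothetical sketch only: DocumentChunk and buildRagPrompt are illustrative
// names, not part of LLM Hub's API.
data class DocumentChunk(val source: String, val text: String)

fun buildRagPrompt(
    question: String,
    retrievedChunks: List<DocumentChunk>,
    memoryFacts: List<String>
): String = buildString {
    appendLine("Answer the question using only the context below.")
    if (memoryFacts.isNotEmpty()) {
        appendLine()
        appendLine("Persistent memory:")
        memoryFacts.forEach { appendLine("- $it") }
    }
    appendLine()
    appendLine("Retrieved context:")
    retrievedChunks.forEach { chunk ->
        appendLine("[${chunk.source}]")
        appendLine(chunk.text)
    }
    appendLine()
    appendLine("Question: $question")
}
```

Grounding the prompt in retrieved, on-device context is what lets a local model answer from your own documents rather than from its training data alone.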

Key Features

On-device LLM inference: Fast, private responses without cloud dependency; choose models that match your device and needs.
Retrieval-Augmented Generation (RAG): Combine model reasoning with indexed document chunks and embeddings to produce fact-grounded answers.
Persistent Global Memory: Save facts, documents, and knowledge to a persistent, device-local memory (Room DB) for long-term recall across sessions.
Embeddings & Vector Search: Generate embeddings, index content locally, and retrieve the most relevant documents with efficient similarity search (a minimal sketch follows this list).
Multimodal Support: Use text + image capable models (Gemma-3n) for richer interactions when available.
Web Search Integration: Supplement local knowledge with DuckDuckGo-powered web results to fetch up-to-date information for RAG queries and instant answers.
Offline-Ready: Work without network access — models, memory, and indexes persist on-device.
GPU Acceleration (optional): Benefit from hardware acceleration where supported — for best results with larger GPU-backed models we recommend devices with at least 8GB RAM.
Privacy-First Design: Memory, embeddings, and RAG indexes remain local by default; no cloud upload unless you explicitly choose to share or export data.
Long-Context Handling: Support for models with large context windows so the assistant can reason over extensive documents and histories.
Developer-Friendly: Integrates with local inference, indexing, and retrieval use-cases for apps requiring private, offline AI.
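The embeddings and vector-search features above can be pictured with a minimal Kotlin sketch: a list of embedded chunks held on-device and a brute-force cosine-similarity top-k lookup. EmbeddedChunk, cosineSimilarity, and topK are illustrative names and assumptions, not LLM Hub's actual classes or how it indexes data internally.

```kotlin
import kotlin.math.sqrt

// Hypothetical sketch only: a real index would be persisted on-device,
// not rebuilt in memory on every query.
data class EmbeddedChunk(val id: Long, val text: String, val embedding: FloatArray)

// Cosine similarity between two equal-length embedding vectors.
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return if (normA == 0f || normB == 0f) 0f else dot / (sqrt(normA) * sqrt(normB))
}

// Brute-force top-k retrieval: score every chunk and keep the k most similar.
fun topK(query: FloatArray, index: List<EmbeddedChunk>, k: Int = 4): List<EmbeddedChunk> =
    index.sortedByDescending { cosineSimilarity(query, it.embedding) }.take(k)
```

The chunks returned by such a lookup could then feed a prompt builder like the one sketched earlier, keeping the whole retrieval path on the device.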
Why choose LLM Hub?

LLM Hub is built to deliver private, accurate, and flexible AI on mobile. It merges the speed of local inference with the factual grounding of retrieval-based systems and the convenience of persistent memory — ideal for knowledge workers, privacy-conscious users, and developers building local-first AI features.

Supported Models: Gemma-3, Gemma-3n (multimodal), Llama-3.2, Phi-4 Mini — choose the model that fits your device capabilities and context needs.
Updated on Sep 16, 2025

Data safety

Safety starts with understanding how developers collect and share your data. Data privacy and security practices may vary based on your use, region, and age. The developer provided this information and may update it over time.
No data shared with third parties
No data collected

What’s new

- Upgraded the Phi-4 Mini max context window to 4096 and enabled the GPU backend
- Model loading configuration now remembers your last settings
- Added translation support for Italian

App support

About the developer
Yuan Qian
timmyboy0623@gmail.com
33 Magdalena Place, Rowville, Clayton VIC 3168, Australia
