LocalAI is a self-hosted platform that exposes an OpenAI-compatible API for running models on your own hardware. It is designed for teams that want local inference without changing client integrations, keeping data, prompts, and logs within their own security boundary.
GitHub: github.com/mudler/LocalAI
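Because the API mirrors OpenAI's, an existing client can usually be repointed by changing only its base URL. Below is a minimal sketch using the official `openai` Python SDK (v1+); port 8080 is LocalAI's documented default, while the `gpt-4` model alias and the placeholder API key are assumptions that depend on your configuration.

```python
from openai import OpenAI

# Point the standard OpenAI client at a LocalAI instance.
# Assumption: LocalAI is listening on localhost:8080 (its default port)
# and a model has been configured under the alias "gpt-4".
client = OpenAI(
    base_url="http://localhost:8080/v1",  # LocalAI's OpenAI-compatible endpoint
    api_key="not-needed",  # any placeholder works unless an API key is configured
)

response = client.chat.completions.create(
    model="gpt-4",  # the alias defined in your LocalAI model configuration
    messages=[{"role": "user", "content": "Summarize what LocalAI does."}],
)
print(response.choices[0].message.content)
```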
Project milestones:
- LocalAI was created by Ettore Di Giacinto
- Initial focus: OpenAI API compatibility for local model inference
- Early support for llama.cpp backend
- Added support for multiple model backends (vLLM, transformers)
- Image generation support (Stable Diffusion)
- Audio processing (whisper.cpp, TTS)
- GPU acceleration (NVIDIA CUDA, AMD ROCm, Intel oneAPI)
- MLX backend for Apple Silicon
- Agent management capabilities
- New React UI
- WebRTC support for real-time communication
- Model Context Protocol (MCP) support
Project status:
- Latest Release: v4.1.3 (April 6, 2026)
- GitHub: 45.4k+ stars, 3.9k+ forks
- Backends: 20+ model backends for text, image, audio, and specialized AI tasks
- Hardware Support: NVIDIA CUDA 12/13, AMD ROCm, Intel oneAPI, Apple Metal, Vulkan, NVIDIA Jetson
LocalAI supports multiple inference backends (a client-side usage sketch follows the list):
Text Generation:
- llama.cpp, vLLM, transformers, MLX, MLX-VLM, vLLM Omni
Audio Processing:
- whisper.cpp, faster-whisper, moonshine (transcription)
- kokoro, piper, chatterbox, silero-vad, vibevoice, qwen-tts, neutts (TTS)
- ace-step (music generation)
Image Generation:
- stablediffusion.cpp, diffusers, Flux
Specialized AI:
- rfdetr (object detection)
- rerankers (document reranking)
- local-store (vector database)
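The non-text backends are reachable through the corresponding OpenAI-style routes, so the same client library covers them. A hedged sketch of transcription and image generation; the `whisper-1` and `stablediffusion` names are illustrative aliases, not guaranteed defaults, and depend on which backends and models you have installed.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Speech-to-text, served by a backend such as whisper.cpp or faster-whisper.
# "whisper-1" is an assumed alias; use whatever name your config defines.
with open("meeting.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)

# Image generation, served by stablediffusion.cpp or diffusers.
# "stablediffusion" is likewise an assumed model alias.
image = client.images.generate(
    model="stablediffusion",
    prompt="a lighthouse at dusk, watercolor",
    size="512x512",
)
print(image.data[0].url)
```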
Hardware acceleration:

| Acceleration | Hardware | Notes |
|---|---|---|
| NVIDIA CUDA 12/13 | NVIDIA GPUs | Full CUDA support |
| AMD ROCm | AMD GPUs | llama.cpp, vLLM, transformers |
| Intel oneAPI | Intel Arc, iGPUs | llama.cpp, vLLM, transformers |
| Apple Metal | M1/M2/M3+ | llama.cpp, MLX, diffusers |
| Vulkan | Cross-platform | llama.cpp, whisper, stablediffusion |
| NVIDIA Jetson | AGX Orin, DGX Spark | ARM64 embedded AI |
| CPU | AVX/AVX2/AVX512 | All backends with quantization |
LocalAI is a mature platform for self-hosted AI inference with:
- OpenAI-compatible API - Drop-in replacement for the OpenAI, Anthropic, and ElevenLabs APIs
- Multi-modal support - Text, image, audio, vision, embeddings, reranking
- No GPU required - Runs on consumer hardware, GPU acceleration optional
- Backend Gallery - Install/remove backends via OCI images
- Agent capabilities - Built-in agents with LocalAGI, MCP support
- Memory & Knowledge Base - LocalRecall for semantic search and RAG (see the embeddings sketch after this list)
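The embeddings route is the building block behind the memory and RAG features. A minimal sketch; `text-embedding-ada-002` is an assumed alias for whichever local embedding model is installed, and the cosine-similarity helper is illustrative rather than part of LocalAI (LocalRecall or the local-store backend would normally handle retrieval).

```python
import math
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

def embed(text: str) -> list[float]:
    # "text-embedding-ada-002" is an assumed alias for your local embedding model.
    result = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return result.data[0].embedding

def cosine(a: list[float], b: list[float]) -> float:
    # Plain cosine similarity; a vector store would normally do this for you.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

doc = embed("LocalAI runs OpenAI-compatible models on your own hardware.")
query = embed("self-hosted inference")
print(f"similarity: {cosine(doc, query):.3f}")
```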
LocalAI is released under the MIT License:
- Free to use, modify, and distribute
- Commercial use allowed
- No warranty provided