RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that combines RAG with agent capabilities. With over 74,400 GitHub stars, it is best known for deep document understanding of complex formats. RAGFlow provides template-based chunking, grounded citations, and support for heterogeneous data sources.
License: Apache-2.0
GitHub: infiniflow/ragflow
- 📄 Deep Document Understanding - OCR and layout analysis for complex formats (PDF, DOCX, slides, images)
- 📋 Template-Based Chunking - Intelligent, explainable chunking with multiple templates
- 🔗 Grounded Citations - Visualized text chunking and traceable citations to reduce hallucinations
- 🗂️ Heterogeneous Data - Word, slides, Excel, images, scanned copies, structured data, web pages
- 🔄 Automated RAG Workflow - Streamlined orchestration with fused re-ranking
- 🤖 Agent Capabilities - Agentic workflow, MCP support, Python/JavaScript code executor
- 💾 Memory Support - AI agent memory functionality (since Dec 2025)
- 🔌 Data Sync - Confluence, S3, Notion, Discord, Google Drive integration
- Enterprise Document Q&A - Transform complex documents into searchable knowledge bases
- Research & Analysis - Find the “needle in a data haystack” across effectively unlimited tokens
- Customer Support Automation - Grounded answers with traceable citations
- Legal/Compliance Document Review - Deep document understanding for complex formats
- Multi-Modal Document Processing - Extract insights from images within documents
| Component | Technology |
|---|---|
| Backend | Python 3.12+ |
| Frontend | TypeScript (33.7%) |
| Database | MySQL, Redis |
| Search Engine | Elasticsearch or Infinity |
| Object Storage | MinIO |
| Deployment | Docker, Kubernetes |
Language Breakdown:
- Python: 48.5%
- TypeScript: 33.7%
- C++: 9.7%
- Go: 6.3%
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 8+ cores |
| RAM | 16 GB | 32+ GB |
| Disk | 50 GB | 100+ GB |
| Docker | 24.0.0+ | Latest |
| Docker Compose | v2.26.1+ | Latest |
| vm.max_map_count | ≥262144 | ≥262144 |
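The minimums in the table can be checked before installation. The following is a minimal sketch, not part of RAGFlow itself: the helper names (`parse_version`, `read_max_map_count`, `check_host`) and the `measured` dictionary are hypothetical and exist only for illustration. It assumes a Linux host, where `vm.max_map_count` is exposed at `/proc/sys/vm/max_map_count`, and it expects the version strings to be supplied by the caller, for example from the output of `docker --version` and `docker compose version`.

```python
import re
from pathlib import Path
from typing import Optional

# Minimum values copied from the requirements table above.
MINIMUMS = {
    "cpu_cores": 4,
    "ram_gb": 16,
    "disk_gb": 50,
    "max_map_count": 262144,
}
MIN_DOCKER = (24, 0, 0)
MIN_COMPOSE = (2, 26, 1)


def parse_version(text: str) -> tuple:
    """Return the first dotted version in text as a 3-part tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # pad "24.0" to (24, 0, 0)
    return tuple(parts)


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> Optional[int]:
    """Read vm.max_map_count on Linux; return None if the file is absent."""
    p = Path(path)
    if not p.exists():
        return None
    return int(p.read_text().strip())


def check_host(measured: dict, docker_version: str, compose_version: str) -> list:
    """Compare measured values with the table minimums and list any problems."""
    problems = []
    for key, minimum in MINIMUMS.items():
        value = measured.get(key)
        if value is None:
            problems.append(f"{key}: could not be determined")
        elif value < minimum:
            problems.append(f"{key}: found {value}, need at least {minimum}")
    if parse_version(docker_version) < MIN_DOCKER:
        problems.append(f"docker: found {docker_version!r}, need 24.0.0+")
    if parse_version(compose_version) < MIN_COMPOSE:
        problems.append(f"docker compose: found {compose_version!r}, need v2.26.1+")
    return problems
```

A caller would fill `measured` with values gathered on the host (for instance `os.cpu_count()` for cores and `read_max_map_count()` for the kernel setting) and print whatever `check_host` returns. If `vm.max_map_count` is too low, it can be raised on Linux with `sudo sysctl -w vm.max_map_count=262144`.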
Important Notes:
- ⚠️ x86 Platform Only - Docker images built for x86; ARM64 requires custom build
- ⚠️ gVisor Required - For code executor sandbox feature
- ⚠️ Memory Mapping - vm.max_map_count must be ≥262144
- ✅ Open-source and self-hosted
- ✅ Apache-2.0 License
- ✅ Active development (v0.24.0 - February 10, 2026)
- ✅ 75.7k+ GitHub stars, 8.3k+ forks
- ✅ Recent: Memory support, MCP, data sync (Confluence, S3, Notion)
- ⚠️ Requires significant resources for deep document understanding
- ⚠️ x86 platform only (ARM64 needs custom build)