The history and evolution of RAGFlow, from its inception to becoming one of the most popular open-source RAG engines with deep document understanding capabilities.
RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine developed by Infiniflow. It specializes in deep document understanding for complex formats and has grown to more than 75,700 GitHub stars (as of February 2026) since its launch.
RAGFlow was created to address the challenges of building production-ready RAG applications, offering:
- Deep document understanding for complex formats
- Template-based intelligent chunking
- Grounded citations with reduced hallucinations
- Support for heterogeneous data sources
The founding mission centered on:
- The Quality In, Quality Out (QIQO) principle
- Finding the "needle in a data haystack" across effectively unlimited tokens
- Streamlined RAG orchestration for personal and enterprise use
- Configurable LLMs and embedding models
Early 2024:
- Initial Release - RAGFlow launched with deep document understanding
- Core Features - Template-based chunking, OCR, layout analysis
- Platform Support - Docker deployment, x86 architecture
Mid 2024:
- Grounded Citations - Visual text chunking, traceable citations
- Heterogeneous Data - Support for Word, slides, Excel, images, scanned copies
- Automated RAG Workflow - Streamlined orchestration with fused re-ranking
Late 2024:
- API Integration - Intuitive APIs for business integration
- Multiple Recall - Paired with fused re-ranking
- Community Growth - Thousands of GitHub stars
Early 2025:
- Agentic Workflow - Agent capabilities added
- MCP Support - Model Context Protocol integration
- Python/JavaScript Executor - Code execution in workflows
Mid 2025:
- GPT-5 Support - OpenAI’s GPT-5 series models (August 2025)
- MinerU & Docling - Document parsing methods (October 2025)
- Orchestrable Ingestion Pipeline - Flexible data ingestion (October 2025)
Late 2025:
- Data Synchronization - Confluence, S3, Notion, Discord, Google Drive (November 2025)
- Gemini 3 Pro Support - Google’s latest models (November 2025)
- Memory Support - AI agent memory functionality (December 2025)
February 2026:
- Version 0.24.0 - Latest release (February 10, 2026)
- 75.7k+ GitHub Stars - Growing community
- 8.3k+ Forks - Active development community
- 489+ Contributors - Community-driven development
| Date | Milestone |
|------|-----------|
| 2024 Q1 | RAGFlow launched with deep document understanding |
| 2024 Q2 | Template-based chunking, grounded citations |
| 2024 Q3 | Automated RAG workflow, multiple recall |
| 2024 Q4 | API integration, heterogeneous data support |
| 2025 Q1 | Agentic workflow, MCP support |
| 2025 Q2 | Code executor (Python/JavaScript) |
| 2025 Aug | GPT-5 series support |
| 2025 Oct | MinerU & Docling parsing, orchestrable ingestion |
| 2025 Nov | Data sync (Confluence, S3, Notion, Discord, Google Drive) |
| 2025 Nov | Gemini 3 Pro support |
| 2025 Dec | Memory support for AI agents |
| 2026 Feb | v0.24.0 release, 75.7k+ stars |
Document Understanding
Early Versions:
- Basic OCR and layout analysis
- Simple text extraction
- Limited format support
Current Versions:
- Advanced OCR with layout preservation
- Complex format handling (PDF, DOCX, slides, images)
- Template-based intelligent chunking
- Visual text chunking for human intervention
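The template-based chunking listed above can be pictured as choosing a splitting strategy according to the kind of document. The sketch below is an illustration under stated assumptions, not RAGFlow's implementation: the template names `general` and `qa`, the helper functions, and the whitespace word count used in place of a real tokenizer are all hypothetical.

```python
import re
from typing import Callable

# Hypothetical templates for illustration only; RAGFlow's real templates differ.


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def general_template(text: str, max_tokens: int = 8) -> list[str]:
    # Split into sentences, then greedily merge them up to max_tokens per chunk.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = current + [sentence]
        if current and count_tokens(" ".join(candidate)) > max_tokens:
            # Budget exceeded: close the current chunk and start a new one.
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            # An oversized single sentence still becomes its own chunk.
            current = candidate
    if current:
        chunks.append(" ".join(current))
    return chunks


def qa_template(text: str) -> list[str]:
    # One chunk per "Q: ... A: ..." pair, so each answer stays with its question.
    pairs = re.split(r"\n(?=Q:)", text.strip())
    return [p.strip() for p in pairs if p.strip()]


TEMPLATES: dict[str, Callable[[str], list[str]]] = {
    "general": general_template,
    "qa": qa_template,
}


def chunk(text: str, template: str) -> list[str]:
    # Dispatch on the template chosen for this document type.
    return TEMPLATES[template](text)
```

For example, `chunk("RAGFlow parses PDFs. It keeps layout. Chunks stay small here.", "general")` yields two chunks, because adding the third sentence would push the first chunk past the eight-word budget. Keeping each strategy behind a named template is what makes the chunking explainable: a reader can see which rule produced a given chunk.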
Retrieval and Ranking
Early Versions:
- Basic retrieval
- Simple ranking
- Single recall
Current Versions:
- Multiple recall strategies
- Fused re-ranking
- Hybrid retrieval (dense + sparse)
- Graph-based workflow orchestration
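Hybrid retrieval with fused re-ranking can be illustrated as a weighted fusion of two recall lists, one from keyword (sparse) search and one from vector (dense) search. This is a minimal sketch, assuming min-max normalization of each list and a single weight `vector_weight`; the function names, the 0.3 weight, and the 0.2 threshold are hypothetical choices for illustration, not RAGFlow's actual scoring code.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    # Rescale scores to [0, 1] so sparse and dense scores become comparable.
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(
    keyword: dict[str, float],
    vector: dict[str, float],
    vector_weight: float = 0.3,  # hypothetical default, for illustration
    threshold: float = 0.2,  # hypothetical cut-off for weak matches
) -> list[tuple[str, float]]:
    kw, vec = min_max(keyword), min_max(vector)
    fused: dict[str, float] = {}
    # Multiple recall: take the union of both candidate sets;
    # a chunk missing from one list scores 0 on that side.
    for doc in kw.keys() | vec.keys():
        score = (1 - vector_weight) * kw.get(doc, 0.0) + vector_weight * vec.get(doc, 0.0)
        if score >= threshold:
            fused[doc] = score
    # Highest fused score first; ties broken by chunk id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

A weighted sum keeps the fused score easy to interpret and tune. Reciprocal rank fusion, which combines rank positions instead of raw scores, is a common alternative when the two retrievers' scores are not meaningfully comparable.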
Agent Capabilities
Early Versions:
- Basic agent execution
- Limited tool access
Current Versions:
- Full agentic workflow
- MCP (Model Context Protocol) support
- Python/JavaScript code executor with gVisor sandbox
- Memory support (cross-session context)
- Data synchronization from multiple sources
Deployment and Infrastructure
Initial:
- x86 Docker images
- CPU-only inference
Current:
- x86 Docker images (ARM64 requires custom build)
- GPU acceleration support (NVIDIA CUDA)
- Elasticsearch and Infinity document engines
- MinIO object storage
Licensing and Hosting:
- Apache-2.0 License - Free for personal and commercial use
- Open-Source - Community-driven development
- Self-Hosted - Full control over infrastructure
Community and Ecosystem
Official Repository:
| Metric | Value |
|--------|-------|
| GitHub Stars | 75.7k+ |
| Forks | 8.3k+ |
| Contributors | 489+ |
| Releases | 24+ (v0.24.0 latest) |
Supported LLM Providers:
- OpenAI (GPT-4, GPT-4o, GPT-5)
- Anthropic (Claude)
- Google (Gemini, Gemini 3 Pro)
- Azure OpenAI
- Ollama (local models)
- LocalAI
- Mistral
- Groq
Data Sources:
- Confluence
- S3
- Notion
- Discord
- Google Drive
- Local files (PDF, DOCX, XLSX, images, etc.)
RAGFlow has influenced the RAG space in several areas:
- Deep Document Understanding - Made complex format handling a first-class concern
- Template-Based Chunking - Intelligent, explainable chunking
- Grounded Citations - Reduced hallucinations with traceable references
- Hybrid Retrieval - Multiple recall with fused re-ranking
- Agent Integration - Combined RAG with agentic workflows
Adoption:
- 75.7k+ GitHub stars
- Widely adopted in enterprise and research settings
- Frequently recommended as a RAG engine in the AI community
- Featured in RAG and LLM application guides
- Used by enterprises for internal knowledge bases
Development Status:
- Latest Version: v0.24.0 (February 10, 2026)
- Release Cadence: Regular updates
- Active Development: Continuous feature additions
- Bug Tracking: Public GitHub issues
Current Feature Set:
- Deep document understanding with OCR
- Template-based chunking (multiple templates)
- Grounded citations with visual text chunking
- Heterogeneous data support (Word, slides, Excel, images)
- Automated RAG workflow with fused re-ranking
- Agentic workflow with MCP support
- Memory support for AI agents
- Data sync from Confluence, S3, Notion, Discord, Google Drive
- Python/JavaScript code executor
- GPU acceleration (NVIDIA CUDA)
- Elasticsearch and Infinity document engines
Free and Open-Source:
- Apache-2.0 license
- Free for personal and commercial use
- Community-driven development
- Built in public
Likely future directions, based on development patterns and public communications:
- Enhanced document understanding
- More data source integrations
- Improved agent capabilities
- Better GPU optimization
- Expanded MCP ecosystem
- Maintain Apache-2.0 license
- Expand enterprise features
- Enhanced collaboration tools
- Better multi-modal support
- Improved workflow orchestration
- Memory enhancements
About Infiniflow:
- Focus: RAG and document understanding infrastructure
- Mission: Make RAG accessible and effective for everyone
- Products: RAGFlow (open-source)
- Website: https://ragflow.io
- GitHub: https://github.com/infiniflow/ragflow
- Discord: https://discord.gg/NjYzJD3GM3
- Twitter: @infiniflowai
- Demo: https://demo.ragflow.io
RAGFlow represents a deep document understanding approach to RAG. Its evolution from a basic RAG engine to a full-featured platform with agent capabilities, memory support, and data synchronization reflects the broader trajectory of the GenAI ecosystem.
Key principles that guide RAGFlow:
- Quality In, Quality Out - Deep document understanding for better results
- Template-Based - Intelligent, explainable chunking
- Grounded - Traceable citations to reduce hallucinations
- Flexible - Support for heterogeneous data sources
- Open Source - Apache-2.0 license for maximum flexibility