Curated list of official documentation, community resources, tutorials, and tools for vLLM.
| Resource | URL | Description |
|---|---|---|
| Official Website | vllm.ai | Project homepage |
| Documentation | docs.vllm.ai | Complete documentation |
| GitHub Repository | github.com/vllm-project/vllm | Source code and issues (73.9k+ ⭐) |
| Docker Hub | hub.docker.com/r/vllm/vllm-openai | Official Docker images |
| PyPI Package | pypi.org/project/vllm | Python package |
| Section | URL | Description |
|---|---|---|
| Getting Started | Installation | Installation guides |
| Quickstart | Quickstart | Quick start guide |
| Models | Supported Models | Model compatibility |
| Configuration | Engine Args | Engine arguments |
| Deployment | Deployment Guide | Production deployment |
| API Reference | OpenAI API | API documentation |
| Paper | Link | Description |
|---|---|---|
| PagedAttention (SOSP 2023) | arxiv.org/abs/2309.06180 | Original vLLM paper |
| ACM Digital Library | doi.org/10.1145/3600006.3613165 | SOSP 2023 proceedings |
| Paper | Link | Description |
|---|---|---|
| Continuous Batching | arxiv.org/abs/2309.06180 | Batching optimization |
| FlashAttention | arxiv.org/abs/2205.14135 | Efficient attention |
| FlashInfer | flashinfer.ai | Kernel library |
| Platform | Link | Description |
|---|---|---|
| GitHub Discussions | discussions | Q&A and announcements |
| Slack Workspace | vLLM Dev Slack | Community chat |
| Hugging Face | vLLM integration | HF integration docs |
| Platform | Link | Description |
|---|---|---|
| Twitter/X | @vllm_project | Official updates |
| vLLM Project | Company page |
| Tutorial | Link | Description |
|---|---|---|
| Basic Inference | docs.vllm.ai | Getting started |
| API Server | OpenAI API | API usage |
| Multi-GPU | Tensor Parallel | Distributed serving |
| Tutorial | Source | Description |
|---|---|---|
| vLLM Complete Guide | Towards Data Science | Comprehensive tutorial |
| Production Deployment | Medium | Production best practices |
| Benchmarking Guide | Blog posts | Performance testing |
| Video | Platform | Description |
|---|---|---|
| vLLM Introduction | YouTube | Project overview |
| SOSP 2023 Talk | YouTube | Conference presentation |
| Deployment Tutorial | YouTube | Step-by-step guide |
| Tool | Link | Description |
|---|---|---|
| vLLM Benchmarks | GitHub | Performance benchmarks |
| vLLM Examples | GitHub | Example deployments |
| Integration | Link | Description |
|---|---|---|
| LangChain | python.langchain.com | LangChain integration |
| LlamaIndex | docs.llamaindex.ai | RAG integration |
| Haystack | haystack.deepset.ai | NLP pipeline |
| Open WebUI | openwebui.com | Web interface |
| LiteLLM | docs.litellm.ai | Unified API |
| Tool | Link | Description |
|---|---|---|
| Prometheus | prometheus.io | Metrics collection |
| Grafana | grafana.com | Dashboards |
| vLLM Dashboard | GitHub | Community dashboard |
| Provider | Service | Link |
|---|---|---|
| AWS | SageMaker | aws.amazon.com/sagemaker |
| GCP | Vertex AI | cloud.google.com/vertex-ai |
| Azure | ML Service | azure.microsoft.com/en-us/products/machine-learning |
| Anyscale | Managed vLLM | anyscale.com |
| Provider | Image | Link |
|---|---|---|
| AWS | AMI | AWS Marketplace |
| GCP | VM Image | GCP Marketplace |
| Azure | VM Image | Azure Marketplace |
| Report | Source | Description |
|---|---|---|
| vLLM vs TGI vs Ollama | buildwithmatija.com | Performance comparison |
| SGLang vs vLLM | localaimaster.com | Feature comparison |
| LLM Inference Battle | worldline.tech | Multi-engine comparison |
| Metric | Source | Description |
|---|---|---|
| Throughput Benchmarks | Official docs | Official benchmarks |
| Community Benchmarks | GitHub issues | User benchmarks |
| Independent Tests | Blogs | Third-party tests |
| Source | Link | Description |
|---|---|---|
| GitHub Releases | Releases | Version history |
| Changelog | docs.vllm.ai | Feature changelog |
| Source | Link | Description |
|---|---|---|
| Official Blog | vllm.ai/blog | Project updates |
| Hugging Face Blog | huggingface.co/blog | HF integration news |
| AI News | The Batch | Industry news |
| Resource | Link | Description |
|---|---|---|
| GitHub Issues | Issues | Bug reports |
| Discussions | Discussions | Q&A |
| Slack | vLLM Dev | Community support |
| Documentation | docs.vllm.ai | Self-help |
| Provider | Service | Link |
|---|---|---|
| Anyscale | Enterprise support | anyscale.com |
| Cloud Providers | Managed services | See cloud integrations |
| Project | Link | Description |
|---|---|---|
| SkyPilot | skypilot.readthedocs.io | Cloud orchestration |
| Ray | ray.io | Distributed computing |
| Sky Computing Lab | sky.cs.berkeley.edu | Lab homepage |
| Project | Link | Description |
|---|---|---|
| SGLang | github.com/sgl-project/sglang | Alternative engine |
| TGI | github.com/huggingface/text-generation-inference | HF inference |
| Ollama | ollama.com | Local inference |
| TensorRT-LLM | github.com/NVIDIA/TensorRT-LLM | NVIDIA optimization |
| Repository | Link | Description |
|---|---|---|
| Hugging Face Hub | huggingface.co/models | Model hub |
| ModelScope | modelscope.cn | Alibaba models |
| Civitai | civitai.com | Community models |
| Source | Link | Description |
|---|---|---|
| TheBloke | huggingface.co/TheBloke | AWQ/GPTQ models |
| MaziyarPanahi | huggingface.co/MaziyarPanahi | Quantized models |
Any questions?
Feel free to contact us. Find all contact information on our contact page.