vLLM is a high-throughput and memory-efficient LLM inference and serving engine originally developed in the Sky Computing Lab at UC Berkeley. It features PagedAttention for efficient memory management, continuous batching, and an OpenAI-compatible API server. vLLM is widely used for production LLM deployments requiring high performance.
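The key idea behind PagedAttention is to store each sequence's KV cache in fixed-size blocks that are allocated on demand, much like virtual-memory pages, instead of reserving one large contiguous buffer per sequence. A toy sketch of that mapping (illustrative only, not vLLM's actual implementation; the class names and the block size of 16 are assumptions for the example):

```python
BLOCK_SIZE = 16  # tokens per physical block (a common vLLM default)

class Allocator:
    """Hands out physical block ids; a real engine would also free and reuse them."""
    def __init__(self):
        self.next_free = 0

    def allocate(self):
        block, self.next_free = self.next_free, self.next_free + 1
        return block

class BlockTable:
    """Maps a sequence's logical token positions to physical cache blocks."""
    def __init__(self):
        self.blocks = []      # physical block ids, allocated lazily
        self.num_tokens = 0

    def append_token(self, allocator):
        # A new physical block is needed only every BLOCK_SIZE tokens,
        # so memory waste is bounded by one partially filled block.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.blocks.append(allocator.allocate())
        self.num_tokens += 1

    def physical_slot(self, pos):
        # logical position -> (physical block id, offset within block)
        return self.blocks[pos // BLOCK_SIZE], pos % BLOCK_SIZE

alloc = Allocator()
seq = BlockTable()
for _ in range(20):
    seq.append_token(alloc)
print(len(seq.blocks))        # 2 blocks cover 20 tokens
print(seq.physical_slot(17))  # (1, 1)
```

Because blocks need not be contiguous, sequences of very different lengths can share one GPU memory pool with little fragmentation, which is what enables vLLM's high batch occupancy.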
License: Apache-2.0
GitHub: vllm-project/vllm
Website: vllm.ai
Documentation: docs.vllm.ai
| Component | Technology |
|---|---|
| Languages | Python (87.7%), CUDA (6.7%), C++ (4.1%), CMake, C |
| Framework | PyTorch, HuggingFace Transformers |
| GPU | NVIDIA CUDA, AMD ROCm, Intel XPU |
| CPU | x86_64, ARM AArch64, Apple Silicon, IBM Z (S390X), PowerPC |
| Deployment | Docker, Kubernetes, Bare-metal |
| Build System | CMake, pip/PyPI |
| Component | Minimum | Recommended (Production) |
|---|---|---|
| GPU VRAM | 8 GB+ | 24 GB+ per GPU |
| RAM | 16 GB | 64 GB+ |
| CUDA | 11.8+ | Latest stable |
| Storage | 50 GB SSD | 500 GB+ NVMe |
| Network | 1 Gbps | 10 Gbps+ (distributed) |
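Much of the VRAM budget above goes to the KV cache rather than the weights. A rough back-of-the-envelope estimate (the formula is the standard per-token KV footprint; the Llama-2-7B shapes used below are from its public config):

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Per-token KV-cache footprint: a K and a V vector for every layer."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

# Llama-2-7B: 32 layers, 32 KV heads, head_dim 128, fp16 (2 bytes/element)
per_token = kv_cache_bytes_per_token(32, 32, 128)
print(per_token)                 # 524288 bytes = 0.5 MiB per token
print(per_token * 4096 / 2**30)  # 2.0 GiB for one full 4096-token context
```

On top of the ~13 GB of fp16 weights, even a handful of concurrent full-length sequences quickly exhausts an 8 GB card, which is why 24 GB+ per GPU is the recommendation for production.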
| Hardware | Status | Notes |
|---|---|---|
| NVIDIA GPUs | ✅ Native | Primary platform, full feature support |
| AMD GPUs | ✅ Supported | ROCm backend |
| Intel XPU | ✅ Supported | Intel GPU support |
| TPU | ✅ Supported | Google TPU deployments |
| AWS Trainium/Inferentia | ✅ Supported | AWS hardware plugins |
| CPU Only | ✅ Supported | x86, ARM, Apple Silicon |
| Page | Description |
|---|---|
| Setup | Installation and setup guide |
| Docker Setup | Docker deployment guide |
| Ansible Setup | Ansible automation |
| Configuration | Configuration options |
| Security | Security hardening (coming soon) |
| History | Project history and milestones |
| Links | External resources |
| Alternatives | Alternative tools |
```bash
pip install vllm
```
```bash
docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest \
    --model meta-llama/Llama-2-7b-chat-hf
```
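For a longer-lived deployment, the same container can be described declaratively. A minimal Docker Compose sketch, assuming the `vllm/vllm-openai` image from the command above and the conventional HuggingFace cache location (mounting it avoids re-downloading model weights on restart):

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    command: --model meta-llama/Llama-2-7b-chat-hf
    ports:
      - "8000:8000"
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```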
```bash
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-2-7b-chat-hf",
        "prompt": "Hello, my name is",
        "max_tokens": 100
    }'
```
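The same request can be issued from Python with only the standard library. The `build_payload` and `completion_request` helpers below are hypothetical names for this sketch; the payload mirrors the curl example exactly:

```python
import json
import urllib.request

def build_payload(model, prompt, max_tokens=100):
    """Build the same JSON body as the curl example above."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}

def completion_request(base_url, model, prompt, max_tokens=100):
    """POST a completion request to a running vLLM OpenAI-compatible server."""
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(build_payload(model, prompt, max_tokens)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# With a server running (see the docker command above):
# result = completion_request("http://localhost:8000",
#                             "meta-llama/Llama-2-7b-chat-hf",
#                             "Hello, my name is")
# print(result["choices"][0]["text"])
```

Because the server speaks the OpenAI API, the official `openai` client library can also be pointed at `http://localhost:8000/v1` instead of hand-rolling requests.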
Any questions?
Feel free to contact us. Find all contact information on our contact page.
Need professional help? We offer consulting for vLLM deployments. Contact us or email office@linux-server-admin.com.