A Docker deployment guide for the GPT4All OpenAI-compatible API server.

GPT4All provides a Docker-based API server that exposes OpenAI-compatible HTTP endpoints for local LLM inference, so existing OpenAI client libraries and tooling can run against locally hosted models.
| Requirement | Details |
|---|---|
| Docker | Docker Engine 20+ or Docker Desktop |
| Docker Compose | Docker Compose v2+ |
| RAM | 8GB minimum, 16GB+ recommended |
| Disk | 10GB minimum, 50GB+ recommended |
| GPU | Optional (NVIDIA with CUDA, AMD with Vulkan) |
```shell
# Pull the image
docker pull nomicai/gpt4all:latest

# Run the container
docker run -d \
  --name gpt4all \
  -p 4891:4891 \
  -v gpt4all-models:/app/models \
  nomicai/gpt4all:latest

# Check container status
docker ps

# View logs
docker logs gpt4all

# Test API
curl http://localhost:4891/v1/models
```
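The `/v1/models` response should follow the OpenAI list format (`{"object": "list", "data": [...]}`), so a small Python helper can double as a smoke test. The base URL and the helper names here are illustrative assumptions, not part of the image:

```python
import json
import urllib.request


def model_ids(response):
    """Extract model ids from an OpenAI-style model list response."""
    return [entry["id"] for entry in response.get("data", [])]


def list_models(base_url="http://localhost:4891/v1"):
    """Query a running server for its available models."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as resp:
        return model_ids(json.load(resp))
```

With the container up, `list_models()` returns the model names the server reports.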
The same deployment can be managed with Docker Compose. A minimal `docker-compose.yml`:

```yaml
version: '3.8'

services:
  gpt4all:
    image: nomicai/gpt4all:latest
    container_name: gpt4all
    restart: always
    ports:
      - "4891:4891"
    volumes:
      - gpt4all-models:/app/models
    environment:
      - MODEL_NAME=llama-3-8b-instruct.Q4_0.gguf

volumes:
  gpt4all-models:
```
With NVIDIA GPU acceleration (requires the NVIDIA Container Toolkit on the host):

```yaml
version: '3.8'

services:
  gpt4all:
    image: nomicai/gpt4all:latest
    container_name: gpt4all
    restart: always
    ports:
      - "4891:4891"
    volumes:
      - gpt4all-models:/app/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - MODEL_NAME=llama-3-8b-instruct.Q4_0.gguf
      - CUDA_VISIBLE_DEVICES=0

volumes:
  gpt4all-models:
```
With AMD GPU acceleration via Vulkan (passes the host's DRI devices through to the container):

```yaml
version: '3.8'

services:
  gpt4all:
    image: nomicai/gpt4all:latest
    container_name: gpt4all
    restart: always
    ports:
      - "4891:4891"
    volumes:
      - gpt4all-models:/app/models
      - /dev/dri:/dev/dri
    group_add:
      - video
    environment:
      - MODEL_NAME=llama-3-8b-instruct.Q4_0.gguf
      - VULKAN_VISIBLE_DEVICES=0

volumes:
  gpt4all-models:
```
| Variable | Description | Default |
|---|---|---|
| `MODEL_NAME` | Model to load | - |
| `MODEL_PATH` | Path to models | `/app/models` |
| `HOST` | Binding address | `0.0.0.0` |
| `PORT` | API port | `4891` |
| `CUDA_VISIBLE_DEVICES` | GPU selection | - |
| `VULKAN_VISIBLE_DEVICES` | Vulkan GPU selection | - |
| Endpoint | Method | Purpose |
|---|---|---|
| `/v1/chat/completions` | POST | Chat completions |
| `/v1/completions` | POST | Text completions |
| `/v1/models` | GET | List available models |
```shell
# Chat completion
curl http://localhost:4891/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-8b-instruct",
    "messages": [
      {"role": "system", "content": "You are helpful"},
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 1024
  }'

# Text completion
curl http://localhost:4891/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-8b-instruct",
    "prompt": "The capital of France is",
    "max_tokens": 100
  }'

# List models
curl http://localhost:4891/v1/models
```
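The same chat request can be issued from Python with only the standard library; the payload mirrors the curl example above, and any OpenAI SDK pointed at `base_url="http://localhost:4891/v1"` should work equally well. The helper names are illustrative:

```python
import json
import urllib.request


def build_chat_request(model, messages, max_tokens=1024):
    """Assemble an OpenAI-style chat completion payload."""
    return {"model": model, "messages": messages, "max_tokens": max_tokens}


def chat(base_url, payload):
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running: `chat("http://localhost:4891/v1", build_chat_request("llama-3-8b-instruct", [{"role": "user", "content": "Hello!"}]))`.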
Models are downloaded automatically when first requested. To pre-download, open a shell in the container:

```shell
# Inside container
docker exec -it gpt4all bash

# Download model (implementation depends on image)
```
| Volume | Purpose |
|---|---|
| `gpt4all-models` | Model storage |
To serve models from a host directory instead of the named volume, change the service's volume mapping:

```yaml
volumes:
  - /path/to/your/models:/app/models
```
To keep the API reachable only from the host itself, bind the published port to the loopback interface:

```yaml
ports:
  - "127.0.0.1:4891:4891"
```
Nginx reverse-proxy configuration with TLS and rate limiting (the `onepersecond` zone must be declared with `limit_req_zone` in the `http` block):

```nginx
server {
    listen 443 ssl http2;
    server_name gpt4all.example.com;

    ssl_certificate     /etc/ssl/certs/gpt4all.example.com.crt;
    ssl_certificate_key /etc/ssl/private/gpt4all.example.com.key;

    location / {
        proxy_pass http://localhost:4891;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Rate limiting (requires a matching limit_req_zone in the http block)
        limit_req zone=onepersecond burst=5 nodelay;
    }
}
```
To require HTTP basic authentication, protect the proxied location with a credentials file (e.g. generated with `htpasswd`):

```nginx
location / {
    proxy_pass http://localhost:4891;

    # Basic auth
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;
}
```
Add a health check to the service definition in `docker-compose.yml` (this form requires `curl` to be present inside the image):

```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:4891/v1/models"]
  interval: 30s
  timeout: 10s
  retries: 3
```
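The same probe Docker runs can be reproduced as a small Python poller, which is handy in deploy scripts that wait for the server before sending traffic. The retry schedule mirrors the healthcheck above (interval and retries); the helper names and URL are illustrative:

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_healthy(url="http://localhost:4891/v1/models", interval=30.0, retries=3):
    """Poll until healthy or retries are exhausted, like the Compose healthcheck."""
    for attempt in range(retries):
        if probe(url):
            return True
        if attempt < retries - 1:
            time.sleep(interval)
    return False
```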
```shell
# View logs
docker logs gpt4all

# Follow logs
docker logs -f gpt4all

# Last 100 lines
docker logs --tail 100 gpt4all

# Check container stats
docker stats gpt4all
```
```shell
# Container will not start: check logs
docker logs gpt4all

# Check if port is in use
lsof -i :4891

# Restart container
docker restart gpt4all

# Check available models
curl http://localhost:4891/v1/models

# Check container logs for model errors
docker logs gpt4all | grep -i model

# Check NVIDIA GPU
docker exec gpt4all nvidia-smi

# Check Vulkan
docker exec gpt4all vulkaninfo

# Check container memory
docker stats gpt4all
```
To cap the container's memory, add limits to the service definition in `docker-compose.yml`:

```yaml
deploy:
  resources:
    limits:
      memory: 8G
```
Note that a running container keeps its original image; after pulling, the standalone container must be recreated (not just restarted) to pick up the update:

```shell
# Standalone container: pull the new image, then recreate the container
docker pull nomicai/gpt4all:latest
docker stop gpt4all && docker rm gpt4all
docker run -d \
  --name gpt4all \
  -p 4891:4891 \
  -v gpt4all-models:/app/models \
  nomicai/gpt4all:latest

# With Docker Compose
docker compose pull
docker compose up -d
```
```shell
# Export volume to a tarball
docker run --rm \
  -v gpt4all-models:/source \
  -v $(pwd):/backup \
  alpine tar czf /backup/gpt4all-models.tar.gz -C /source .

# Restore the tarball into the volume
docker run --rm \
  -v gpt4all-models:/target \
  -v $(pwd):/backup \
  alpine tar xzf /backup/gpt4all-models.tar.gz -C /target
```
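When the models live in a host directory (the bind-mount variant above), the same backup can be scripted without a helper container. A sketch using Python's `tarfile`, with paths as placeholders:

```python
import tarfile
from pathlib import Path


def backup_models(models_dir, archive):
    """Pack the models directory into a gzipped tarball, like the alpine tar command."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname="." stores paths relative to the directory, matching `tar -C /source .`
        tar.add(models_dir, arcname=".")


def restore_models(archive, models_dir):
    """Unpack a backup into the target directory."""
    Path(models_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(models_dir)
```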