Ollama should be configured with model artifact governance, API endpoint protection, and runtime resource control in mind. The environment variables below cover the key settings:
```shell
# Server binding - default: 127.0.0.1:11434
OLLAMA_HOST=127.0.0.1:11434
# Model storage path - default: ~/.ollama/models
OLLAMA_MODELS=/var/lib/ollama/models
# Model keep-alive duration - default: 5m
OLLAMA_KEEP_ALIVE=5m
# Maximum parallel requests - default: 1
OLLAMA_NUM_PARALLEL=1
# Maximum loaded models per GPU (0=auto) - default: 0
OLLAMA_MAX_LOADED_MODELS=0
# Maximum queued requests - default: 512
OLLAMA_MAX_QUEUE=512
# CORS allowed origins - default: localhost, 0.0.0.0, 127.0.0.1
OLLAMA_ORIGINS=https://webui.example.com
# Enable debug logging - default: false
OLLAMA_DEBUG=1
# Model load timeout - default: 5m
OLLAMA_LOAD_TIMEOUT=5m
# Disable cloud features for full privacy - default: false
OLLAMA_NO_CLOUD=1
# Flash attention for faster inference - default: false
OLLAMA_FLASH_ATTENTION=1
# KV cache quantization (f16, q8_0, q4_0) - default: f16
OLLAMA_KV_CACHE_TYPE=f16
```
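On a Linux install managed by systemd, these variables are typically applied through a drop-in override rather than a shell profile. A sketch, assuming the default `ollama.service` unit name (create the override with `sudo systemctl edit ollama`, then `sudo systemctl restart ollama`):

```ini
[Service]
Environment="OLLAMA_HOST=127.0.0.1:11434"
Environment="OLLAMA_MODELS=/var/lib/ollama/models"
Environment="OLLAMA_ORIGINS=https://webui.example.com"
Environment="OLLAMA_NO_CLOUD=1"
```

A drop-in keeps the settings tied to the service itself, so they survive package upgrades and are not affected by which user invokes `systemctl`.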
| Variable | Default | Description |
|---|---|---|
| OLLAMA_HOST | 127.0.0.1:11434 | IP address and port for the Ollama server |
| OLLAMA_MODELS | ~/.ollama/models | Path to the models directory |
| OLLAMA_KEEP_ALIVE | 5m | Duration models stay loaded in memory after the last request |
| OLLAMA_NUM_PARALLEL | 1 | Maximum number of parallel requests |
| OLLAMA_MAX_LOADED_MODELS | 0 (auto) | Maximum number of loaded models per GPU |
| OLLAMA_MAX_QUEUE | 512 | Maximum number of queued requests |
| OLLAMA_ORIGINS | localhost, 0.0.0.0, 127.0.0.1 | Comma-separated list of allowed CORS origins |
| OLLAMA_DEBUG | false | Show additional debug information |
| OLLAMA_LOAD_TIMEOUT | 5m | Model load timeout before giving up |
| OLLAMA_NO_CLOUD | false | Disable cloud features (remote inference, web search) |
| OLLAMA_FLASH_ATTENTION | false | Enable flash attention for faster inference |
| OLLAMA_KV_CACHE_TYPE | f16 | KV cache quantization type (f16, q8_0, q4_0) |
| OLLAMA_MULTIUSER_CACHE | false | Optimize prompt caching for multi-user scenarios |
Set OLLAMA_NO_CLOUD=1 for full data privacy in sensitive environments.

Default model storage locations by platform:

| Platform | Default Path |
|---|---|
| macOS | ~/.ollama/models |
| Linux (service) | /usr/share/ollama/.ollama/models |
| Linux (user) | ~/.ollama/models |
| Windows | C:\Users\%username%\.ollama\models |
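The paths above are only defaults: setting `OLLAMA_MODELS` overrides all of them. A minimal sketch of the resolution order as the server sees it (Linux/macOS user-install default shown):

```shell
#!/bin/sh
# Effective models directory: OLLAMA_MODELS wins when set; otherwise the
# per-platform default applies (user-install default shown here).
MODELS_DIR="${OLLAMA_MODELS:-$HOME/.ollama/models}"
echo "effective models dir: $MODELS_DIR"
```

When relocating the store (as in the `/var/lib/ollama/models` example above), make sure the directory is readable and writable by the account the Ollama service runs as.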
Back up metadata stores (DB/vector indexes), model and runtime configuration, and secret metadata. Validate restores with one prompt run and one retrieval/integration call.
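The archive-and-verify step can be sketched for the model store itself. This is a self-contained demo that uses a scratch directory as a stand-in for the real `OLLAMA_MODELS` path; all paths here are illustrative:

```shell
#!/bin/sh
set -eu
# Scratch stand-in for the real models directory; in production point
# MODELS_DIR at your actual OLLAMA_MODELS path instead.
WORK="$(mktemp -d)"
MODELS_DIR="$WORK/models"
mkdir -p "$MODELS_DIR/manifests"
echo "demo-manifest" > "$MODELS_DIR/manifests/demo"

# 1. Archive the models directory.
ARCHIVE="$WORK/models-backup.tar.gz"
tar -czf "$ARCHIVE" -C "$WORK" models

# 2. Restore into a scratch location and compare against the source;
#    diff exits non-zero (failing the script) on any mismatch.
RESTORE="$WORK/restore"
mkdir -p "$RESTORE"
tar -xzf "$ARCHIVE" -C "$RESTORE"
diff -rq "$MODELS_DIR" "$RESTORE/models" && echo "restore verified"
```

In production the same pattern would also cover the configuration and vector-index data mentioned above, followed by the prompt-run and retrieval checks against the restored instance.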