A guide to configuring LM Studio for optimal performance across common use cases.
LM Studio stores configuration in:
| Platform | Location |
|---|---|
| Windows | %APPDATA%\LM Studio |
| macOS | ~/Library/Application Support/LM Studio |
| Linux | ~/.config/LM Studio |
Configure how much of the model runs on GPU vs CPU:
| Setting | Description | Use Case |
|---|---|---|
| Max | Offload maximum layers to GPU | Best performance with dedicated GPU |
| Auto | Automatic GPU offload | Balanced approach |
| 0.0–1.0 | Fraction of layers offloaded to GPU | Fine-grained control |
| 0 | CPU only | No GPU or GPU incompatible |
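The fractional setting maps to a whole number of transformer layers when the model loads. As a rough sketch (the function name and rounding behavior here are assumptions for illustration, not LM Studio's actual logic):

```python
def layers_to_offload(total_layers: int, fraction: float) -> int:
    # Map a 0.0-1.0 GPU offload fraction to a whole number of layers.
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    return round(total_layers * fraction)

# A 32-layer model (e.g. an 8B-class Llama) at 0.8 offload:
print(layers_to_offload(32, 0.8))  # -> 26
```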
Controls how much text the model can “remember”:
| Context Length | RAM Usage | Use Case |
|---|---|---|
| 2048 | Low | Short conversations |
| 4096 | Medium | Standard conversations |
| 8192 | High | Long documents, code |
| 16384+ | Very High | Book-length context |
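RAM usage grows linearly with context length because the KV cache stores keys and values for every token in context. A back-of-the-envelope estimate (illustrative only; actual usage depends on the runtime and any KV-cache quantization):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    # Two tensors (keys and values) per layer, fp16 (2 bytes) by default.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Llama-3-8B-like shape (32 layers, 8 KV heads, head dim 128) at 8192 context:
print(kv_cache_bytes(32, 8, 128, 8192) / 2**30, "GiB")  # -> 1.0 GiB
```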
Enable multiple concurrent inference requests. Benefit: requests from multiple clients are handled simultaneously rather than queued, improving aggregate throughput.
Use a smaller “draft” model to speed up inference: the draft proposes several tokens ahead, and the main model verifies them in a single pass, keeping the longest prefix the two agree on.
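A toy sketch of the speculative-decoding idea (not LM Studio's implementation; the callables and token lists are made up for illustration):

```python
def speculative_step(draft_propose, target_next, prefix, k=4):
    # draft_propose(prefix, k) -> k candidate tokens from the draft model;
    # target_next(prefix) -> the main model's own next token.
    accepted = []
    for tok in draft_propose(prefix, k):
        if target_next(prefix + accepted) == tok:
            accepted.append(tok)  # main model agrees: keep the draft token
        else:
            # Disagreement: take the main model's token and stop this round.
            accepted.append(target_next(prefix + accepted))
            break
    return accepted

# Toy models over token lists: draft guesses "a b c d", target wants "a b x".
draft = lambda p, k: ["a", "b", "c", "d"][len(p):len(p) + k]
target = lambda p: ["a", "b", "x", "y"][len(p)]
print(speculative_step(draft, target, []))  # -> ['a', 'b', 'x']
```

Each round yields every accepted draft token plus one corrected token, so a good draft model amortizes several tokens per main-model pass.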
Settings → Server:
| Setting | Default | Description |
|---|---|---|
| Port | 1234 | API server listening port |
| CORS Origins | * | Allowed origins for CORS |
| Authentication | Disabled | Require API key |
| HTTPS | Disabled | Enable TLS |
LM Studio provides OpenAI-compatible endpoints:
Endpoints:
- POST /v1/chat/completions - Chat completions
- POST /v1/completions - Text completions
- GET /v1/models - List models
- POST /v1/embeddings - Generate embeddings

Example Request:
curl http://localhost:1234/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3-8b-instruct",
"messages": [
{"role": "system", "content": "You are helpful"},
{"role": "user", "content": "Hello!"}
]
}'
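The same request can be built from Python with only the standard library. The model name and prompt mirror the curl example above; since sending requires a running LM Studio server, the network call is left commented out:

```python
import json
import urllib.request

payload = {
    "model": "llama-3-8b-instruct",
    "messages": [
        {"role": "system", "content": "You are helpful"},
        {"role": "user", "content": "Hello!"},
    ],
}
req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# With LM Studio's server running, uncomment to send:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```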
LM Studio also supports Anthropic API format:
Endpoint: POST /v1/messages
Example:
curl http://localhost:1234/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: your-key" \
-d '{
"model": "llama-3-8b-instruct",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello!"}]
}'
Enable API key authentication for remote access:
Once enabled, clients pass the key in the Authorization header of each request:

`Authorization: Bearer YOUR_KEY`

Example:
curl http://localhost:1234/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{"model": "llama-3", "messages": [...]}'
Enable Developer Mode for per-model presets:
Settings → Developer Mode → Model Presets
Create custom configurations for specific models:
{
"llama-3-8b-instruct": {
"contextLength": 8192,
"gpuOffload": "max",
"temperature": 0.7,
"topP": 0.9,
"maxTokens": 2048
},
"qwen-2.5-72b": {
"contextLength": 4096,
"gpuOffload": 0.8,
"temperature": 0.6,
"topP": 0.85
}
}
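A small sketch for sanity-checking a presets blob before use. The field names follow the example above; treat them as illustrative, not a documented LM Studio schema:

```python
import json

def load_presets(text: str, required=("contextLength", "gpuOffload")) -> dict:
    # Parse the presets JSON and verify each model entry has the core fields.
    presets = json.loads(text)
    for model, cfg in presets.items():
        missing = [f for f in required if f not in cfg]
        if missing:
            raise ValueError(f"{model}: missing {missing}")
    return presets

cfg = load_presets('{"llama-3-8b-instruct":'
                   ' {"contextLength": 8192, "gpuOffload": "max"}}')
print(cfg["llama-3-8b-instruct"]["contextLength"])  # -> 8192
```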
Configure custom prompt templates per model:
Settings → Developer Mode → Prompt Templates
Example (ChatML format):
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
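Filling the template is plain string substitution. A minimal sketch (the function name is an assumption; LM Studio renders templates internally):

```python
def render_chatml(system_message: str, prompt: str) -> str:
    # The trailing assistant tag is left open so the model completes it.
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(render_chatml("You are helpful", "Hello!"))
```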
Configure RAG (Retrieval-Augmented Generation) settings:
Settings → RAG:
| Setting | Default | Description |
|---|---|---|
| Max Documents | 10 | Maximum attached documents |
| Chunk Size | 512 | Text chunk size for retrieval |
| Top K | 5 | Number of chunks to retrieve |
| Similarity Threshold | 0.7 | Minimum similarity score |
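How the chunk size, top-K, and threshold settings interact can be sketched in a few lines. This is a simplified stand-in (character-based chunking and precomputed similarity scores), not LM Studio's actual retrieval pipeline:

```python
def chunk_text(text: str, chunk_size: int = 512) -> list:
    # Fixed-size character chunks; real chunking is typically token-aware.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def top_k(scored_chunks, k: int = 5, threshold: float = 0.7) -> list:
    # Drop chunks below the similarity threshold, keep the k best-scoring.
    kept = sorted((s, c) for s, c in scored_chunks if s >= threshold)
    return [c for s, c in reversed(kept)][:k]

print(top_k([(0.9, "a"), (0.6, "b"), (0.8, "c")]))  # -> ['a', 'c']
```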
Choose a theme under Settings → Display → Theme.
Enable advanced options under Settings → Developer Mode.
Configure LM Link for remote instance connections:
Settings → LM Link:
| Setting | Description |
|---|---|
| Enable Remote Connections | Allow remote clients |
| Permission Keys | Generate client access keys |
| E2E Encryption | End-to-end encryption (via Tailscale) |
# Custom API endpoint
export LMSTUDIO_API_BASE=http://localhost:1234
# Authentication token
export LMSTUDIO_API_KEY=your-token
# Timeout settings
export LMSTUDIO_TIMEOUT=30
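A client script might read these variables like so. The variable names come from the snippet above; the fallback values are assumptions, not documented defaults:

```python
import os

def client_settings(env=None) -> dict:
    # Read the LM Studio client variables, with assumed fallbacks.
    env = os.environ if env is None else env
    return {
        "base": env.get("LMSTUDIO_API_BASE", "http://localhost:1234"),
        "key": env.get("LMSTUDIO_API_KEY"),   # None means no auth header
        "timeout": int(env.get("LMSTUDIO_TIMEOUT", "30")),
    }

print(client_settings({"LMSTUDIO_TIMEOUT": "60"}))
# -> {'base': 'http://localhost:1234', 'key': None, 'timeout': 60}
```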
The CLI (lms) uses LM Studio’s main configuration. No separate config needed.
Check the server from the CLI with `lms server status`.