This guide covers downloading, installing, and configuring LM Studio for running local large language models (LLMs) on your hardware.
Visit lmstudio.ai and download the installer for your platform:
| Platform | Installation |
|---|---|
| Windows | Download .exe installer |
| macOS | Download .dmg (Intel/Apple Silicon) |
| Linux | Download .AppImage or .deb |
Windows:

```sh
# Run the downloaded installer
LM-Studio-Setup.exe
```

macOS:

```sh
# Open the .dmg and drag LM Studio to the Applications folder
```

Linux:

```sh
# Make the AppImage executable and run it
chmod +x LM-Studio-*.AppImage
./LM-Studio-*.AppImage

# Or install the .deb package
sudo dpkg -i lm-studio_*.deb
```
The desktop application provides the full GUI experience with model discovery, chat interface, and settings.
Download: https://lmstudio.ai
For server/cloud/CI deployments without a GUI:

macOS/Linux:

```sh
curl -fsSL https://lmstudio.ai/install.sh | bash
```

Windows (PowerShell):

```powershell
irm https://lmstudio.ai/install.ps1 | iex
```
The `lms` command-line tool ships with the LM Studio desktop app. After installation:
```sh
# Verify the installation
lms --help

# Start an interactive chat
lms chat

# Download a model
lms get llama-3

# List downloaded models
lms ls

# Load a model into memory
lms load llama-3-8b-instruct
```
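Beyond the interactive commands above, LM Studio can also serve loaded models over a local OpenAI-compatible HTTP API (port 1234 by default). A minimal sketch, assuming a model is already loaded; the snippet is guarded so it is a no-op on machines where `lms` is not installed:

```sh
# Start LM Studio's local server (OpenAI-compatible API, default port 1234).
# Guarded so this does nothing on machines without lms installed.
if command -v lms >/dev/null 2>&1; then
  lms server start
  # List the models exposed by the server
  curl -s http://localhost:1234/v1/models
fi
```

Because the server speaks the OpenAI API format, existing OpenAI client libraries can be pointed at `http://localhost:1234/v1` without code changes.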
Minimum:

| Component | Requirement |
|---|---|
| OS | Windows 10, macOS 11+, Linux (Ubuntu 20.04+) |
| RAM | 16 GB |
| GPU | 4 GB VRAM (optional, CPU inference supported) |
| Disk | 20 GB free space |
Recommended:

| Component | Requirement |
|---|---|
| OS | Latest Windows/macOS/Linux |
| RAM | 32-64 GB |
| GPU | 12-24 GB VRAM (NVIDIA RTX 3060+, AMD RX 6800+, Apple M1+) |
| Disk | 100+ GB NVMe SSD |
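Before choosing a tier, it helps to check what the machine actually has. A quick sketch for Linux (macOS users can use `sysctl hw.memsize` instead of `free`); each command is guarded so the snippet degrades gracefully where a tool is missing:

```sh
# Report total system RAM (Linux)
if command -v free >/dev/null 2>&1; then
  free -h | awk '/^Mem:/ { print "Total RAM:", $2 }'
fi

# Report NVIDIA VRAM if the driver tools are present
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=memory.total --format=csv,noheader
fi
```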
| GPU Type | Support | Notes |
|---|---|---|
| NVIDIA | ✅ CUDA | RTX 20/30/40 series recommended |
| AMD | ✅ ROCm | RX 6000/7000 series |
| Intel | ✅ Arc | A750, A770 |
| Apple | ✅ Metal/MLX | M1/M2/M3 chips |
Start with these models:
| Model | Size | Quantization | RAM Required | Use Case |
|---|---|---|---|---|
| Llama 3.2 3B | 3B | Q4_K_M | 4 GB | Fast responses, low RAM |
| Llama 3.2 1B | 1B | Q4_K_M | 2 GB | Very fast, minimal RAM |
| Mistral 7B | 7B | Q4_K_M | 6 GB | Good balance |
| Gemma 2 9B | 9B | Q4_K_M | 8 GB | Quality responses |
For more capable hardware:

| Model | Size | Quantization | RAM Required | Use Case |
|---|---|---|---|---|
| Llama 3.1 8B | 8B | Q4_K_M | 6 GB | General purpose |
| Qwen 2.5 14B | 14B | Q4_K_M | 10 GB | Multilingual |
| Mistral Nemo 12B | 12B | Q4_K_M | 10 GB | Coding, reasoning |
| Llama 3.1 70B | 70B | Q4_K_M | 48 GB | Maximum quality |
| Quantization | Size | Quality | Speed |
|---|---|---|---|
| Q2_K | Smallest | Lower | Fastest |
| Q4_K_M | Medium | Good | Fast |
| Q5_K_M | Large | Better | Medium |
| Q6_K | Larger | Best | Slower |
| Q8_0 | Largest | Near-lossless | Slowest |
Recommendation: Q4_K_M offers the best balance for most users.
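The table above can be turned into a back-of-the-envelope file-size estimate: a quantized GGUF file is roughly parameters × bits-per-weight ÷ 8. A rough sketch; `est_size_gb` is an illustrative helper, and the ~4.5 bits/weight figure for Q4_K_M is an approximation:

```sh
# Rough GGUF file size: billions of params * avg bits per weight / 8 = GB
# (est_size_gb is a made-up helper; bits-per-weight values are approximate)
est_size_gb() { awk -v p="$1" -v bpw="$2" 'BEGIN { printf "%.1f\n", p * bpw / 8 }'; }

est_size_gb 7 4.5    # Mistral 7B at Q4_K_M  -> about 3.9
est_size_gb 70 4.5   # Llama 3.1 70B at Q4_K_M -> about 39.4
```

Actual RAM usage runs higher than the file size because of the context (KV cache) and runtime overhead, which is why the model tables above list more RAM than the raw download size.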
After installation, review these areas:

- Settings → Inference
- Settings → Server
- Settings → Display

Enable Developer Mode in Settings to access advanced options.
Any questions? Contact information is available on our contact page.