Checkmk is an IT monitoring solution based on Nagios Core, available as a free Raw Edition (CRE) and as paid editions (CEE, CME). This guide covers configuration for all Checkmk editions.
| File/Directory | Path | Purpose |
|---|---|---|
| Main configuration | /etc/checkmk/sites/[SITE_NAME]/etc/checkmk/main.mk | Core settings (WATO managed) |
| Host configuration | /etc/checkmk/sites/[SITE_NAME]/etc/checkmk/hosts.mk | Host definitions |
| Service configuration | /etc/checkmk/sites/[SITE_NAME]/etc/checkmk/services.mk | Service checks |
| Contact groups | /etc/checkmk/sites/[SITE_NAME]/etc/checkmk/contactgroups.mk | Alert recipients |
| Timeperiods | /etc/checkmk/sites/[SITE_NAME]/etc/checkmk/timeperiods.mk | Check schedules |
| Rules configuration | /etc/checkmk/sites/[SITE_NAME]/var/checkmk/web/[USER]/wato/rules.mk | WATO rules |
| Check plugins | /usr/lib/check_mk/plugins/ | Custom check plugins |
| Local checks | /etc/checkmk/sites/[SITE_NAME]/local/share/check_mk/local/ | Local check scripts |
| Log files | /var/log/checkmk/sites/[SITE_NAME]/ | System logs |
Checkmk uses a rule-based configuration system managed through WATO (Web Administration Tool). The configuration files can be edited directly, but WATO may overwrite manual changes.
# Folder: /etc/checkmk/sites/[SITE_NAME]/etc/checkmk/rules.mk
# Global settings
check_mk_conf['inventory_check_interval'] = 120
check_mk_conf['inventory_check_severity'] = 1
# Check intervals
check_rules['check_interval'] = [
('server-.*', {'value': 30}), # Servers every 30 seconds
('.*', {'value': 60}), # Default 60 seconds (catch-all, listed last)
]
# Notification settings
notification_rules = [
{
'description': 'Critical alerts to admins',
'contact_groups': ['admin'],
'time_period': '24x7',
'host_labels': [],
'service_labels': [],
'condition': {
'state': 'crit',
},
},
{
'description': 'Warning alerts via email',
'contact_groups': ['ops-team'],
'time_period': 'workhours',
'condition': {
'state': 'warn',
},
},
]
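Because the `.mk` files use plain Python syntax, a listing like the one above can be inspected from a script. The following is a minimal sketch, not Checkmk's own loader: the function name `load_mk` and the pre-seeded variables are assumptions made for illustration, and emptying the builtins is not a real sandbox, so it should only be run on files you trust.

```python
from typing import Optional


def load_mk(source: str, predefined: Optional[dict] = None) -> dict:
    """Execute .mk-style Python source and return the resulting variables.

    `predefined` seeds names the file expects to exist already
    (hypothetical here: check_mk_conf and check_rules).
    """
    namespace = dict(predefined or {})
    # Builtins are emptied to discourage side effects; this is NOT a sandbox.
    exec(compile(source, "<mk>", "exec"), {"__builtins__": {}}, namespace)
    return namespace


sample = """
check_mk_conf['inventory_check_interval'] = 120
check_rules['check_interval'] = [
    ('server-.*', {'value': 30}),
    ('.*', {'value': 60}),
]
"""

settings = load_mk(sample, {"check_mk_conf": {}, "check_rules": {}})
print(settings["check_mk_conf"])
print(settings["check_rules"]["check_interval"])
```

Reading the values this way leaves the file untouched, which avoids conflicts with changes written by WATO.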
# Folder: /etc/checkmk/sites/[SITE_NAME]/etc/checkmk/hosts.mk
# Define hosts with attributes
all_hosts = [
'webserver01.example.com',
'dbserver01.example.com',
'switch-core-01',
]
# Host attributes
host_attributes = {
'webserver01.example.com': {
'alias': 'Web Server 01',
'address': '192.168.1.10',
'tags': ['web', 'linux', 'production'],
'custom_attrs': {
'location': 'DC1',
'rack': 'A12',
},
},
'dbserver01.example.com': {
'alias': 'Database Server 01',
'address': '192.168.1.20',
'tags': ['database', 'linux', 'production'],
},
}
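A host list and its attribute dictionary can drift apart when edited by hand. The sketch below shows one way to cross-check them; `validate_hosts` is a hypothetical helper, and it assumes that `address`, when present, is a literal IP address rather than a DNS name.

```python
import ipaddress


def validate_hosts(all_hosts, host_attributes):
    """Return a list of human-readable problems found in the host data."""
    problems = []
    for name in all_hosts:
        attrs = host_attributes.get(name)
        if attrs is None:
            problems.append(f"{name}: no entry in host_attributes")
            continue
        address = attrs.get("address")
        if address is not None:
            try:
                ipaddress.ip_address(address)  # raises ValueError if malformed
            except ValueError:
                problems.append(f"{name}: invalid IP address {address!r}")
    for name in host_attributes:
        if name not in all_hosts:
            problems.append(f"{name}: has attributes but is not in all_hosts")
    return problems


all_hosts = ["webserver01.example.com", "dbserver01.example.com", "switch-core-01"]
host_attributes = {
    "webserver01.example.com": {"alias": "Web Server 01", "address": "192.168.1.10"},
    "dbserver01.example.com": {"alias": "Database Server 01", "address": "192.168.1.20"},
}

for problem in validate_hosts(all_hosts, host_attributes):
    print(problem)
```

With the sample data above, the only finding is that `switch-core-01` is listed without attributes.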
Configure how often checks are performed:
# In rules.mk
check_interval_rules = [
# Critical services checked more frequently
({'service_labels': [{'key': 'critical', 'value': 'true'}]}, 15),
# Low-priority checks
({'host_tags': ['low-priority']}, 300),
# Standard checks (catch-all; listed last so it does not shadow the rules above)
({}, 60),
]
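A minimal sketch of how a rule list of this shape might be resolved, assuming the first matching rule wins. Under that assumption a catch-all `{}` condition shadows every rule listed after it, so the sketch places it last. The helpers `matches` and `resolve_interval` are hypothetical names, not part of Checkmk.

```python
def matches(condition, host_tags, service_labels):
    """True if every tag and label required by the condition is present."""
    wanted_tags = condition.get("host_tags", [])
    wanted_labels = condition.get("service_labels", [])
    tags_ok = all(tag in host_tags for tag in wanted_tags)
    labels_ok = all(
        service_labels.get(label["key"]) == label["value"] for label in wanted_labels
    )
    return tags_ok and labels_ok


def resolve_interval(rules, host_tags, service_labels, default=60):
    """Return the interval of the first matching rule (first match wins)."""
    for condition, interval in rules:
        if matches(condition, host_tags, service_labels):
            return interval
    return default


rules = [
    ({"service_labels": [{"key": "critical", "value": "true"}]}, 15),
    ({"host_tags": ["low-priority"]}, 300),
    ({}, 60),  # catch-all, evaluated last
]

print(resolve_interval(rules, ["linux"], {"critical": "true"}))
print(resolve_interval(rules, ["low-priority"], {}))
print(resolve_interval(rules, ["linux"], {}))
```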
Automatic service discovery configuration:
# Inventory rules
inventory_checks = {
'check_mk': {
'inventory_check_interval': 120,
'inventory_check_severity': 1, # 0=OK, 1=Warning, 2=Critical
},
'df': { # Disk usage
'levels': (80.0, 90.0), # Warning at 80%, Critical at 90%
'show_used': True,
},
'memory': { # Memory usage
'levels': (80.0, 90.0),
},
'cpu': { # CPU usage
'levels': (80.0, 95.0),
},
}
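The `levels` tuples above pair a warning threshold with a critical threshold. As a sketch of how such a pair could map to the 0/1/2 states mentioned in the comment, assuming a threshold is breached when the value reaches it (`>=`); `evaluate_levels` is a hypothetical helper:

```python
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate_levels(value, levels):
    """Map a measured value to a state using (warn, crit) thresholds."""
    warn, crit = levels
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


disk_levels = (80.0, 90.0)  # same thresholds as the 'df' entry above
for used_percent in (42.0, 85.0, 93.5):
    print(used_percent, evaluate_levels(used_percent, disk_levels))
```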
Configure checkmk agents for monitored hosts:
# Agent rules in rules.mk
agent_rules = [
{
'condition': {'host_tags': ['linux']},
'value': {
'sections': ['cpu', 'memory', 'disk', 'network', 'services'],
'piggyback': True,
},
},
{
'condition': {'host_tags': ['windows']},
'value': {
'sections': ['cpu', 'memory', 'disk', 'services', 'updates'],
'piggyback': True,
},
},
]
# Host definition
host_attributes['linux-server-01'] = {
'alias': 'Linux Application Server',
'address': '192.168.1.50',
'tags': ['linux', 'application', 'production'],
'checkgroup_parameters': {
'memory': {'levels': (85.0, 95.0)},
'cpu': {'levels': (80.0, 95.0)},
'df': {'levels': (80.0, 90.0)},
},
}
# Service-specific checks
service_checks['linux-server-01'] = [
('CPU utilization', {'levels': (80, 95)}),
('Memory usage', {'levels': (85, 95)}),
('Disk /', {'levels': (80, 90)}),
('Disk /var', {'levels': (85, 95)}),
('Network eth0', {}),
('Systemd services', {'state_regex': ['failed']}),
]
# SNMP host configuration
host_attributes['switch-core-01'] = {
'alias': 'Core Switch',
'address': '192.168.1.1',
'tags': ['network', 'switch', 'cisco'],
'snmp_credentials': {
'credentials': ('public', '2c', None), # community, version, security name
'snmp_port': 161,
},
}
# Interface monitoring
service_checks['switch-core-01'] = [
('Interface GigabitEthernet0/1', {'levels': (80, 95)}),
('Interface GigabitEthernet0/2', {'levels': (80, 95)}),
('CPU utilization', {'levels': (70, 90)}),
('Memory usage', {'levels': (80, 95)}),
]
host_attributes['win-server-01'] = {
'alias': 'Windows Domain Controller',
'address': '192.168.1.100',
'tags': ['windows', 'domain-controller', 'production'],
'checkgroup_parameters': {
'windows_updates': {'levels': (7, 14)}, # Days since last update
},
}
service_checks['win-server-01'] = [
('CPU utilization', {'levels': (80, 95)}),
('Memory usage', {'levels': (85, 95)}),
('Disk C:', {'levels': (80, 90)}),
('Windows Services', {'state_regex': ['stopped']}),
('Windows Updates', {}),
('Event Log Errors', {'levels': (5, 10)}), # Count of recent errors
]
# contactgroups.mk
contactgroups = {
'admin': {
'alias': 'Administrators',
'members': ['admin1', 'admin2'],
},
'ops-team': {
'alias': 'Operations Team',
'members': ['ops1', 'ops2', 'ops3'],
},
'dba-team': {
'alias': 'Database Team',
'members': ['dba1', 'dba2'],
},
}
# Contact definitions
contacts = {
'admin1': {
'alias': 'John Admin',
'email': 'john@example.com',
'pager': '+1234567890',
},
'ops1': {
'alias': 'Jane Ops',
'email': 'jane@example.com',
},
}
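Contact groups refer to contacts by name, so a missing contact definition can silently leave a group without a reachable recipient. The following sketch lists group members that have no matching contact entry; `undefined_members` is a hypothetical helper written for this example.

```python
def undefined_members(contactgroups, contacts):
    """Return {group: [members without a contact definition]}."""
    missing = {}
    for group, spec in contactgroups.items():
        unknown = [m for m in spec.get("members", []) if m not in contacts]
        if unknown:
            missing[group] = unknown
    return missing


contactgroups = {
    "admin": {"alias": "Administrators", "members": ["admin1", "admin2"]},
    "ops-team": {"alias": "Operations Team", "members": ["ops1", "ops2", "ops3"]},
    "dba-team": {"alias": "Database Team", "members": ["dba1", "dba2"]},
}
contacts = {
    "admin1": {"alias": "John Admin", "email": "john@example.com"},
    "ops1": {"alias": "Jane Ops", "email": "jane@example.com"},
}

for group, names in undefined_members(contactgroups, contacts).items():
    print(group, names)
```

Run against the sample definitions above, it reports `admin2`, `ops2`, `ops3`, `dba1`, and `dba2` as undefined.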
# notification_rules.mk
notification_rules = [
# Critical alerts - immediate notification
{
'description': 'Critical - All Hosts',
'contact_groups': ['admin'],
'time_period': '24x7',
'condition': {
'state': 'crit',
'host': True,
},
'notification_plugin': 'mail',
'escalation': {
'first_notification': 1,
'last_notification': 10,
'notification_interval': 5,
},
},
# Warning alerts - business hours only
{
'description': 'Warning - Business Hours',
'contact_groups': ['ops-team'],
'time_period': 'workhours',
'condition': {
'state': 'warn',
},
'notification_plugin': 'mail',
},
# Database-specific alerts
{
'description': 'Database Critical',
'contact_groups': ['dba-team', 'admin'],
'time_period': '24x7',
'condition': {
'state': 'crit',
'service_labels': [{'key': 'type', 'value': 'database'}],
},
'notification_plugin': 'mail,sms',
},
]
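To see which contact groups a given event would reach, the rule list can be evaluated in a few lines. This is a simplified sketch under stated assumptions: every matching rule contributes recipients, only the `state` and `service_labels` conditions are checked (time periods, the `host` flag, and escalation are ignored), and a comma-separated `notification_plugin` string is split into individual plugins. The function name `recipients` is hypothetical.

```python
def recipients(rules, state, service_labels):
    """Return {contact_group: sorted plugin names} for an event."""
    result = {}
    for rule in rules:
        condition = rule.get("condition", {})
        if condition.get("state") != state:
            continue
        wanted = condition.get("service_labels", [])
        if not all(service_labels.get(l["key"]) == l["value"] for l in wanted):
            continue
        plugins = [p.strip() for p in rule.get("notification_plugin", "mail").split(",")]
        for group in rule["contact_groups"]:
            result.setdefault(group, set()).update(plugins)
    return {group: sorted(plugins) for group, plugins in result.items()}


rules = [
    {"contact_groups": ["admin"], "condition": {"state": "crit"},
     "notification_plugin": "mail"},
    {"contact_groups": ["ops-team"], "condition": {"state": "warn"},
     "notification_plugin": "mail"},
    {"contact_groups": ["dba-team", "admin"],
     "condition": {"state": "crit",
                   "service_labels": [{"key": "type", "value": "database"}]},
     "notification_plugin": "mail,sms"},
]

print(recipients(rules, "crit", {"type": "database"}))
print(recipients(rules, "warn", {}))
```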
# timeperiods.mk
timeperiods = {
'24x7': {
'alias': '24 Hours a Day, 7 Days a Week',
'monday': ['00:00-24:00'],
'tuesday': ['00:00-24:00'],
'wednesday': ['00:00-24:00'],
'thursday': ['00:00-24:00'],
'friday': ['00:00-24:00'],
'saturday': ['00:00-24:00'],
'sunday': ['00:00-24:00'],
},
'workhours': {
'alias': 'Work Hours (8AM-6PM)',
'monday': ['08:00-18:00'],
'tuesday': ['08:00-18:00'],
'wednesday': ['08:00-18:00'],
'thursday': ['08:00-18:00'],
'friday': ['08:00-18:00'],
},
'night': {
'alias': 'Night Hours',
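# A time range cannot wrap past midnight, so each night is split into two ranges
'monday': ['00:00-08:00', '18:00-24:00'],
'tuesday': ['00:00-08:00', '18:00-24:00'],
'wednesday': ['00:00-08:00', '18:00-24:00'],
'thursday': ['00:00-08:00', '18:00-24:00'],
'friday': ['00:00-08:00', '18:00-24:00'],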
'saturday': ['00:00-24:00'],
'sunday': ['00:00-24:00'],
},
}
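A time period definition can be tested against a timestamp before it is used in a notification rule. The sketch below assumes each range stays within a single day and uses a half-open interval (start inclusive, end exclusive), so an overnight window has to be written as two ranges; a range such as `18:00-08:00` would never match here. `is_active` and `to_minutes` are hypothetical helpers.

```python
from datetime import datetime

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def to_minutes(hhmm):
    """Convert 'HH:MM' (including '24:00') to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def is_active(period, moment):
    """True if `moment` falls inside one of the period's ranges for that weekday."""
    day = WEEKDAYS[moment.weekday()]  # Monday is 0
    now = moment.hour * 60 + moment.minute
    for time_range in period.get(day, []):
        start, end = time_range.split("-")
        if to_minutes(start) <= now < to_minutes(end):
            return True
    return False


workhours = {day: ["08:00-18:00"] for day in WEEKDAYS[:5]}
night = {day: ["00:00-08:00", "18:00-24:00"] for day in WEEKDAYS[:5]}

monday_morning = datetime(2024, 1, 1, 9, 30)   # 1 January 2024 was a Monday
monday_evening = datetime(2024, 1, 1, 19, 0)
print(is_active(workhours, monday_morning), is_active(night, monday_morning))
print(is_active(workhours, monday_evening), is_active(night, monday_evening))
```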
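# Activate changes via command line (run as the site user, e.g. after "omd su [SITE_NAME]")
# omd has no "activate" subcommand; cmk -O regenerates the core configuration and reloads the core
cmk -O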
# Or use the WATO interface:
# 1. Navigate to Setup > Changes
# 2. Review pending changes
# 3. Click "Activate on selected sites"
# Show the site's configuration settings (this does not validate .mk syntax)
omd config [SITE_NAME] show
# View configuration errors
tail -f /var/log/checkmk/sites/[SITE_NAME]/web.log
# Restart entire site
omd restart [SITE_NAME]
# Restart specific components (the service name follows the site name)
omd restart [SITE_NAME] cmc # Checkmk Micro Core
omd restart [SITE_NAME] mkeventd # Event Console
omd restart [SITE_NAME] dcd # Dynamic Configuration Daemon
# Reload configuration without full restart
omd reload [SITE_NAME]
# Check site status
omd status [SITE_NAME]
https://[SERVER]/[SITE_NAME]
# Check site configuration
omd config [SITE_NAME]
# List all hosts (run as the site user)
cmk --list-hosts
# Preview service discovery: report unmonitored or vanished services without changing anything
cmk --check-discovery hostname.example.com
# Test a specific check
cmk -v hostname.example.com
# Check notification configuration
cmk -n hostname.example.com
# Send test notification
cmk -n test@example.com --test-notification
# Check notification log
tail -f /var/log/checkmk/sites/[SITE_NAME]/notifications.log
# Verify contact configuration
cmk -C --contacts
# Run service discovery
cmk -I hostname.example.com
# Preview changes without applying
cmk --check-discovery hostname.example.com
# Remove vanished services by discarding the existing services and rediscovering
cmk -II hostname.example.com
Back up /etc/checkmk/sites/ regularly.