Effective error handling is crucial for maintaining robust and reliable Ansible playbooks. The current stable version is ansible-core 2.20.2 (released January 29, 2026). Here are the latest best practices for handling errors in Ansible:
Use ignore_errors Sparingly
While ignore_errors: yes can be useful, it should be used sparingly as it can mask underlying issues. Instead, consider using failed_when to provide more granular control over task failures.
    - name: Ensure the service is running
      service:
        name: apache2
        state: started
      register: result
      failed_when: result.state != 'started'
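A common case where failed_when replaces ignore_errors is a command whose non-zero exit code is not always an error. The sketch below assumes a hypothetical config file at /etc/myapp/app.conf and a hypothetical option name; only grep's exit codes (0 for a match, 1 for no match, 2 or higher for a real error) are standard behavior.

```yaml
- name: Check whether the config mentions a legacy option
  ansible.builtin.command: grep -q 'LegacyOption' /etc/myapp/app.conf  # hypothetical path and option
  register: grep_result
  # Exit code 1 only means "no match", so fail on 2 or higher
  failed_when: grep_result.rc > 1
  # A read-only check never changes the host
  changed_when: false

- name: Report the finding
  ansible.builtin.debug:
    msg: "Legacy option present: {{ grep_result.rc == 0 }}"
```

Unlike ignore_errors, this still stops the play if grep cannot read the file at all.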
Use block, rescue, and always
Ansible provides the block, rescue, and always keywords to handle errors more gracefully. This is the preferred approach for complex error handling.
    - name: Example of block, rescue, and always
      block:
        - name: Attempt to start the service
          service:
            name: apache2
            state: started
          register: service_result

        - name: Verify service is running
          uri:
            url: "http://localhost"
            method: GET
          register: health_check
          retries: 3
          delay: 5
          until: health_check.status == 200
      rescue:
        - name: Handle the failure
          debug:
            msg: "Failed to start the service: {{ ansible_failed_result.msg | default('Unknown error') }}"

        - name: Send notification about failure
          slack:
            token: "{{ slack_token }}"
            channel: "#alerts"
            msg: "Service start failed on {{ inventory_hostname }}"
            color: danger
      always:
        - name: Ensure the service status is logged
          debug:
            msg: "Service start attempted on {{ inventory_hostname }}"
Use failed_when and changed_when for Custom Logic
Define custom failure and change conditions to make your playbooks more resilient:
    - name: Run a command that might return non-zero for valid reasons
      command: /some/command
      register: cmd_result
      failed_when: "'ERROR' in cmd_result.stdout or cmd_result.rc > 1"
      changed_when: "'CHANGED' in cmd_result.stdout"
Use the until, retries, and delay parameters for tasks that might fail temporarily:
    - name: Wait for service to become available
      uri:
        url: "http://{{ target_host }}:{{ target_port }}/health"
        method: GET
      register: result
      until: result.status == 200
      retries: 10
      delay: 30
      failed_when: result.status != 200
Use async and poll for Long-Running Tasks
Handle long-running operations that might exceed SSH timeouts:
    - name: Run long-running task asynchronously
      command: /path/to/long_running_script.sh
      async: 3600  # Allow up to 1 hour
      poll: 0      # Start the task and move on without waiting
      register: async_task

    - name: Check on async task
      async_status:
        jid: "{{ async_task.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      # 360 checks x 10 seconds covers the full 1-hour async window
      retries: 360
      delay: 10
Ensure that all inputs and variables are validated early in your playbooks to prevent errors downstream:
    - name: Validate required variables at the beginning
      assert:
        that:
          - database_host is defined and database_host | length > 0
          - database_port is defined and database_port is number
          - database_port >= 1 and database_port <= 65535
          - backup_retention_days is defined and backup_retention_days is number
        fail_msg: "One or more required variables are invalid"
        success_msg: "All required variables are valid"
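One caveat with the "is number" test: values passed on the command line as key=value extra vars arrive as strings, so a valid port such as 5432 can fail the check. The sketch below is one possible way to handle that, by casting with the int filter and running the check in pre_tasks so it executes before roles and regular tasks. The play name and host pattern are placeholders.

```yaml
- name: Configure the database backup job   # placeholder play name
  hosts: all
  gather_facts: false
  pre_tasks:
    - name: Validate the port even when it arrives as a string
      ansible.builtin.assert:
        that:
          - database_port is defined
          # A non-numeric string casts to 0 and fails the range check
          - database_port | int >= 1
          - database_port | int <= 65535
        fail_msg: "database_port must be between 1 and 65535, got '{{ database_port | default('undefined') }}'"
        quiet: true
```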
Configure detailed logging to help with debugging and auditing:
    # ansible.cfg
    [defaults]
    log_path = /var/log/ansible.log
    debug = False
    bin_ansible_callbacks = True
    # callbacks_enabled replaces the removed callback_whitelist option
    callbacks_enabled = timer, profile_tasks

    [ssh_connection]
    ssh_args = -o ControlMaster=auto -o ControlPersist=60s
    pipelining = True
Make your tasks compatible with check mode (--check flag) when possible:
    - name: Perform action only when not in check mode
      command: /some/action
      when: not ansible_check_mode
      notify: restart_service

    - name: Show what would happen in check mode
      debug:
        msg: "Would perform action in check mode"
      when: ansible_check_mode
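Read-only tasks are a related case: the command module is skipped under --check, so any later task that depends on its registered output sees no data. A minimal sketch, assuming the command is safe to run at any time and that it accepts a --version flag (a hypothetical flag for illustration):

```yaml
- name: Read the current version (safe to run in check mode)
  ansible.builtin.command: /some/command --version  # hypothetical read-only invocation
  register: version_result
  # Run for real even when the playbook is invoked with --check
  check_mode: false
  # Reading a value never changes the host
  changed_when: false
```

Reserve check_mode: false for tasks that truly have no side effects, since they will execute during a dry run.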
Differentiate between different types of failures and handle them appropriately:
    - name: Install package with different error handling
      package:
        name: "{{ package_name }}"
        state: present
      register: pkg_result
      # Fail right away on any error except "package not found";
      # the exact message text depends on the package manager
      failed_when: >
        pkg_result is failed and
        ('No package matched' not in pkg_result.msg | default(''))

    - name: Handle specific error cases
      fail:
        msg: "Package {{ package_name }} not available"
      when:
        - pkg_result is succeeded
        - ('No package matched' in pkg_result.msg | default(''))
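Send failure notifications from a central location, such as the control node. Note that the ansible_failed_result variable used below is set only after a task inside a block with a rescue section has failed: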
    - name: Send notification to centralized system
      uri:
        url: "https://notification-system/api/alerts"
        method: POST
        body: |
          {
            "host": "{{ inventory_hostname }}",
            "status": "failed",
            "playbook": "{{ ansible_play_name }}",
            "timestamp": "{{ ansible_date_time.iso8601 }}"
          }
        body_format: json
      delegate_to: localhost
      run_once: true
      when: ansible_failed_result is defined
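One way to keep this logic in a single place is to store the notification task in a shared task file and include it from each rescue section. The sketch below assumes a hypothetical file named notify_failure.yml that contains the notification task above. Because a rescue section that completes successfully lets the play continue for that host, the final fail task marks the host as failed again after reporting.

```yaml
- name: Deploy with centralized failure reporting   # placeholder play name
  hosts: all
  tasks:
    - name: Deploy and report failures from one place
      block:
        - name: Run the deployment step
          ansible.builtin.command: /some/action
      rescue:
        - name: Report the failure through the shared task file
          ansible.builtin.include_tasks: notify_failure.yml  # hypothetical file name

        - name: Fail the host again after reporting
          ansible.builtin.fail:
            msg: "Task '{{ ansible_failed_task.name }}' failed on {{ inventory_hostname }}"
```

Drop the final fail task if the play should carry on after a reported failure.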
By following these best practices, you can improve the reliability, resilience, and maintainability of your Ansible playbooks in modern DevOps environments.