
Mumbling about computers

"strict mode" Ansible


I use Ansible quite a bit for infrastructure automation, both at home and at work. Ansible's main strength is its rich library of built-in modules.

I find everything else about Ansible bad; to list a few examples:

  • The inventory must gather all details from every host, even when running Ansible with --limit
  • The YAML DSL hits its limits pretty quickly when going beyond simple tasks
  • Tooling to deal with Ansible playbooks is extremely limited:
    • Linters catch very few of the errors that could be detected statically, such as undefined variables
    • Testing of playbooks is not really supported at a language level

In my experience, Ansible does not help you manage any kind of complexity, such as branching.

When complex code is hard to write correctly, the next best thing is to exercise it and write expectation tests for it, but Ansible does not support this.

There's a project, ansible-molecule, but it feels like insanity to me: Ansible's DSL is unable to deal with complexity, yet you are expected to validate the outcome of hundreds of steps by writing more YAML.

What has worked for us is avoiding all complex logic in YAML, and I mean all of it.

We achieved this by working mainly on three fronts:

  • Defining all variables via the inventory
  • Forcing all variables used in YAML to come from our inventory, via ansible-lint
  • Disabling logic in YAML, via ansible-lint

Defining all variables in the inventory

Ansible prides itself on being idempotent and declarative, yet most code written in Ansible reads the state of the target and makes decisions based on that state. Yes, that is idempotent, but it goes against the philosophy of being declarative.

What we've done is to embrace the declarative nature by declaring all of our intended state upfront, in the inventory for each host.

A host may end up looking like:

inventory:  # key names below are illustrative
  raid_arrays:
    - name: rootfs
      sizes: [800, 800]
      raid_level: 1
  users:
    - name: user1
      groups: [..]
  repos:
    - repo1
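For illustration only: if the inventory is produced by a dynamic inventory script, Ansible calls the script with --list and expects JSON whose _meta.hostvars maps each host to its variables. The sketch below assumes that setup and wraps each host's declared state in a single inventory variable, which is what lets playbooks write inventory.X. The host name db1, the group name servers and the keys (raid_arrays, users, repos, should_copy_example_file) are hypothetical.

```python
import json
import sys

# Hypothetical declared state per host: every decision is made here, up front,
# so playbooks only ever read inventory.<key>. All names and values are made up.
HOSTS = {
    "db1": {
        "raid_arrays": [{"name": "rootfs", "sizes": [800, 800], "raid_level": 1}],
        "users": [{"name": "user1", "groups": ["wheel"]}],
        "repos": ["repo1"],
        "should_copy_example_file": True,
    },
}


def build_inventory(hosts):
    """Return the JSON structure Ansible expects from `script --list`."""
    return {
        "servers": {"hosts": sorted(hosts)},
        # Wrapping each host's state in one "inventory" variable is what makes
        # expressions such as inventory.should_copy_example_file resolve.
        "_meta": {
            "hostvars": {name: {"inventory": state} for name, state in hosts.items()}
        },
    }


def main(argv):
    if argv[1:2] == ["--list"]:
        print(json.dumps(build_inventory(HOSTS)))
    else:
        # --host <name>: hostvars are already returned in _meta, so return {}.
        print(json.dumps({}))


if __name__ == "__main__":
    main(sys.argv)
```

Passing this executable script to ansible-playbook with -i would expose inventory.should_copy_example_file on db1; a static YAML inventory with the same shape works just as well.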

Forcing all variables to come from our inventory

We have a custom ansible-lint rule that only allows inventory.X variables in when clauses.

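This example does not pass linting, because its when clause (an illustrative one) does not read from inventory:

- ansible.builtin.copy:
    src: example
    dest: example
  when: should_copy_example_file  # illustrative: a variable not read from inventory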

We would instead decide in our inventory whether this file should be copied, and write:

- ansible.builtin.copy:
    src: example
    dest: example
  when: inventory.should_copy_example_file
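The rule itself isn't shown here. A minimal sketch of its core check is to parse the when clause and report every top-level name other than inventory. This assumes the clauses are simple enough to also parse as Python expressions (true of attribute lookups and comparisons, not of all Jinja syntax), and anything that doesn't parse is rejected. check_when and its messages are hypothetical, and wiring it into ansible-lint's custom-rule interface is left out.

```python
import ast

# Hypothetical rule setting: the only variable root a when clause may use.
ALLOWED_ROOT = "inventory"


def root_names(expression):
    """Return every bare name the expression reads, e.g. {"st"} for st.stat.pw_name."""
    tree = ast.parse(expression.strip(), mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def check_when(expression):
    """Return a list of problems with a when clause; an empty list means it passes.

    Assumes the clause also parses as a Python expression; anything that
    does not parse (e.g. Jinja's ~ operator) is rejected, not guessed at.
    """
    try:
        names = root_names(expression)
    except SyntaxError:
        return [f"cannot parse: {expression!r}"]
    return [
        f"{name!r} does not come from {ALLOWED_ROOT}.*"
        for name in sorted(names - {ALLOWED_ROOT})
    ]
```

On its own this still accepts inventory.a and not inventory.b; rejecting that kind of logic is the job of a separate rule.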

Disabling all complex logic in YAML

We disabled all complex logic in YAML statements: no and, no not, no filters, and so on.

This would fail linting:

- ansible.builtin.service:
    name: vmware_agent
    enabled: "{{ inventory.system_vendor == 'VMWare' }}"

Instead, we would write:

- ansible.builtin.service:
    name: vmware_agent
    enabled: "{{ inventory.should_have_vmware_agent }}"

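which also prevents "indirect" logic, where a decision (should this host run the VMware agent?) is inferred from a different property (the system vendor) instead of being declared directly.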

Removing filters is also interesting: since every variable is declared in the inventory, patterns such as variable | default(true) or variable is defined are never needed.
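A sketch of the stricter no-logic check, under the same assumption that the expressions also parse as Python: accept only a bare dotted path rooted at inventory and reject everything else. Python happens to parse a Jinja filter such as variable | default(true) as a bitwise-or, and a test such as variable is defined as an is comparison, so both are rejected along with and, not and comparisons. Both function names are hypothetical.

```python
import ast


def strip_template(value):
    """Turn '{{ expr }}' into 'expr'; other strings are returned unchanged."""
    value = value.strip()
    if value.startswith("{{") and value.endswith("}}"):
        return value[2:-2]
    return value


def is_plain_inventory_lookup(expression):
    """True only for a bare dotted path rooted at `inventory`, e.g. inventory.a.b.

    Operators (and, not, ==, !=), filters (parsed by Python as `|`), tests
    (`is defined`), calls and subscripts all produce a node other than a
    chain of ast.Attribute ending in ast.Name, so they are rejected.
    """
    try:
        node = ast.parse(expression.strip(), mode="eval").body
    except SyntaxError:
        return False
    if not isinstance(node, ast.Attribute):
        return False  # also rejects a bare `inventory` with no key
    # Walk down the chain: inventory.a.b -> inventory.a -> inventory
    while isinstance(node, ast.Attribute):
        node = node.value
    return isinstance(node, ast.Name) and node.id == "inventory"
```

Applied to every when clause and every templated module argument, this leaves plain inventory lookups as the only thing the YAML can express.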

Implementing complex logic

We implement all complex logic as Ansible plugins, which are written in Python and fully unit-tested.

This lets us handle the essential complexity of the task without also incurring the emergent complexity of Ansible's DSL.
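A minimal sketch of what "fully unit-tested" logic can look like: the decision the earlier YAML tried to make inline lives in a pure Python function with ordinary unittest tests. derive_flags, its input keys and the role-based rule for the example file are hypothetical; how the actual plugins are structured isn't described here.

```python
import unittest


def derive_flags(host):
    """Turn raw (hypothetical) host facts into the explicit booleans playbooks read.

    All branching lives here, so the YAML only ever sees inventory.<flag>.
    """
    return {
        "should_have_vmware_agent": host.get("system_vendor") == "VMWare",
        # Hypothetical rule: hosts with the "example" role get the example file.
        "should_copy_example_file": "example" in host.get("roles", []),
    }


class DeriveFlagsTest(unittest.TestCase):
    def test_vmware_guest_gets_agent(self):
        flags = derive_flags({"system_vendor": "VMWare"})
        self.assertTrue(flags["should_have_vmware_agent"])

    def test_other_vendor_has_no_agent(self):
        flags = derive_flags({"system_vendor": "Dell Inc."})
        self.assertFalse(flags["should_have_vmware_agent"])

    def test_example_file_follows_roles(self):
        self.assertTrue(derive_flags({"roles": ["example"]})["should_copy_example_file"])
        self.assertFalse(derive_flags({})["should_copy_example_file"])
```

Because the function has no Ansible imports, the tests run with plain python -m unittest, without any target host.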


When you can't write complex logic in YAML, you still leverage the strength of Ansible's existing modules, while dramatically reducing risk and tech debt.

We cannot write the following:

- ansible.builtin.stat:
    path: x
  register: st

- ansible.builtin.fail:
    msg: "File not owned by root"
  when: st.stat.pw_name != 'root'

Instead, we'd write a module and use it as:

- assert_ownership:
    path: x
    owner: root
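The module's source isn't included here. A sketch of how an assert_ownership module could be written, assuming Ansible's standard AnsibleModule helper: the ownership check is a plain function so it can be unit-tested without Ansible, and main() only handles arguments and results. Everything beyond the path and owner parameters shown above is illustrative.

```python
import os
import pwd


def check_ownership(path, owner):
    """Return (ok, actual_owner) for `path`; pure logic, easy to unit-test."""
    uid = os.stat(path).st_uid
    try:
        actual = pwd.getpwuid(uid).pw_name
    except KeyError:  # uid with no passwd entry: report the number instead
        actual = str(uid)
    return actual == owner, actual


def main():
    # Imported lazily so check_ownership() can be tested without Ansible installed.
    from ansible.module_utils.basic import AnsibleModule

    module = AnsibleModule(
        argument_spec=dict(
            path=dict(type="path", required=True),
            owner=dict(type="str", required=True),
        ),
        supports_check_mode=True,
    )
    path, owner = module.params["path"], module.params["owner"]
    if not os.path.exists(path):
        module.fail_json(msg=f"{path} does not exist")
    ok, actual = check_ownership(path, owner)
    if not ok:
        module.fail_json(msg=f"{path} is owned by {actual}, expected {owner}")
    # The module only inspects state, so it never reports a change.
    module.exit_json(changed=False, owner=actual)
```

To be picked up by Ansible, the file would be saved as library/assert_ownership.py next to the playbook and end with the usual if __name__ == "__main__": main() guard.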

Integration testing

Even with every module unit-tested, we still need to validate that the assumptions each task makes are upheld by the execution of the previous tasks.

We do this by running our playbooks against a Docker container, similar to ansible-molecule's Docker driver, but instead of asserting our desired end state via YAML insanity, we use testinfra, which lines up very nicely with our philosophy.