Ansible is an easily extensible framework for doing just about anything you want. It's somewhat opinionated in how you should structure elements and files, and I've modified the stock Ansible approach to create a straightforward, opinionated recipe for spinning up stacks of just about anything you want in AWS. I'm going to run through some of the playbooks I've built and discuss the best practices I've discovered along the way.

Note that in this article set I'll assume a moderate level of familiarity with Ansible. If this is your first time using the tool or with YAML, you should read the documentation for getting started and familiarize yourself with the basic concepts before proceeding.

But wait... Why not just use CloudFormation?

Frankly, because I don't like it. If you're an AWS evangelist or you really enjoy Ctrl+F-ing around a gigantic JSON blob, then great! It's not my cup of tea. Ansible provides an agentless way of orchestrating EC2 instances and the miscellaneous other AWS Lego bricks in a fairly powerful and concise way, and if you structure your playbooks, roles, and tasks well, you get a great deal of re-use.

Using Ansible is not without its drawbacks though.

First, there's missing support for some important pieces of the AWS stack, though that should be alleviated as the project moves forward with subsequent releases. Next, there's some inconsistency in common feature sets across modules--notably, some modules are missing the ability to tag the resources they create. Since I rely heavily on tags to create a structured grouping of the various bits in my stack, this is a little disappointing. The good news is that Ansible is under constant development, and it's open source: so I suppose I should stop complaining and go fix it already ;).

In this series of posts, I'll try to cover all the pain points I've run into while building the various groupings of popular components, along with the scenarios where Ansible really shines. I will also selfishly hit on my own organizational style for playbooks, tasks, handlers, and so forth. Feel free to inject your own style where applicable. I'll also describe how I think about the bits of AWS as we create those stacks, as when you start looking at scripting across availability zones inside a region it can get a little wonky.

Let's start with some of the basics around project organization.

Project Organization

Ansible's terminology is less catchy than Chef's cooking metaphor--but it's quite easy to understand. A playbook is a grouping of hosts associated with a set of roles that apply tasks to all the hosts in the playbook. There are two kinds of operations we perform against AWS and other IaaS/PaaS platforms: first we create resources, and then we operate on those resources however we want. The creational piece of the puzzle must be executed from localhost against the AWS APIs, while operational tasks are executed on the target hosts in an agentless fashion. I like that this forces us to separate creational and operational tasks into different host groups, but it can cause some repetition.
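
To make that split concrete, here's a minimal sketch of the two play types back to back. The webtier host group is a placeholder--in a real run it comes from a dynamic inventory or an add_host task, which we'll cover later:

---
# Creational play: talks to the AWS APIs from the control machine.
- hosts: localhost
  connection: local
  gather_facts: False
  roles:
    - role: create_webtier_ec2

# Operational play: runs on the instances themselves over SSH.
- hosts: webtier
  remote_user: ec2-user
  roles:
    - role: install_node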

I usually organize my playbooks into one playbook per stack, included in one global playbook. Think of the stack playbooks as functions, and the top-level playbook as your main(). The global entry point allows me to build roles and tasks for global AWS resources like VPCs, subnets, elastic IPs, and other networking bits to be used later inside stack-specific playbooks and roles. It's a good idea to group components into single playbooks based on the component type (like databases, or web boxes) and to break down creational and operational roles in such a way that they can be reused across hosts as much as possible. A notable exception, however, is that I prefer to duplicate creational logic for EC2 instances across, say, an instance destined to be a node in a MongoDB cluster and an instance destined to serve an application via Node.js. I do this because there are often AWS specifics for each stack type that need to be addressed. For example, I like to create security groups before I spin up any EC2 instances to ensure we have proper protection on ports and protocols not relevant to what the box does, and port requirements often differ based on what you're building.

Directory structure

Ansible has some opinionated documentation on how directory trees of Ansible files should be split up. The command line utility ansible-galaxy also comes with an init switch that will create this directory structure for any roles you want to build for your playbooks. I don't like the stock layout very much because I find it overly verbose--I'd rather have a clean and tidy structure that contains only the files and folders I need to do the job. So, a typical Ansible directory for my AWS projects looks like this:

- ansible
  +-keys        # for security, keys and whatnot.
    |-MyApp_MongoDB.pem
    |-MyApp_NodeJs.pem
  +-regions     # file per AWS region
    |-myapp_us-west-2_mean-stack.yml
    |-myapp_us-west-2_kafka-stack.yml
  +-roles       # structure per operational/creational action
    +-create_webtier_ec2
      +-tasks
        |-main.yml
    |-create_dbtier_ec2
    |-install_node
    |-install_mongo
  +-vars
    |-aws_config.yml
    |-mongo_config.yml
    |-node_config.yml
  |-ansible.cfg     # miscellaneous ansible stuff
  |-credentials.yml # encrypted AWS keys
  |-mean-stack.yml  # primary playbook, includes others
  |-webtier.yml     # web tier playbook
  |-dbtier.yml      # database tier playbook
  |-vault-password.txt
  |-key-password.txt
  |-unlock-keys.sh
  |-lock-keys.sh

Compared to the default Ansible layout this is short and sweet: we're lacking the myriad directories that a run of ansible-galaxy init produces. The prescribed way isn't wrong; I simply prefer to reduce the noise by including only the things I actually care about. A couple of things to note:

We'll cover the regions directory a little later in this post, but as you've already guessed, I place region-specific bits in these files and import them into the main playbook with a command line switch that looks something like this:

ansible-playbook --vault-password-file=vault-password.txt --extra-vars "aws_region=us-west-2 deployment_group=mean-stack" mean-stack.yml

The objective is to categorize bits of our stack into a deployment group and slice that by region. This allows us to re-use the same set of scripts for creating any number of components of the same type without stepping on one another. The details of why lie deep within how the Ansible ec2 module reacts to the exact_count parameter, and we'll get into that when we dissect a deployment in-depth.
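
As a preview, here's a rough sketch of the idea using the ec2 module's exact_count and count_tag parameters against the deployment blob we'll dissect later in this post. The tag names and the hard-coded instance type are illustrative, not the exact role we'll build later:

- name: Ensure exactly ensure_count web nodes exist in each zone
  ec2:
    region: "{{ deployment.region }}"
    zone: "{{ item.zone }}"
    image: "{{ deployment.nodejs.ec2_ami_id }}"
    instance_type: m3.medium          # illustrative; pull this from a vars file in practice
    exact_count: "{{ item.ensure_count }}"
    count_tag:                        # only instances matching these tags are counted
      deployment_group: "{{ deployment.group }}"
      zone: "{{ item.zone }}"
    instance_tags:                    # stamped on any instances the module creates
      deployment_group: "{{ deployment.group }}"
      zone: "{{ item.zone }}"
  with_items: "{{ deployment.nodejs.availability_zones }}"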

I prefer placing variables that do not change at all, or change very infrequently, inside cleverly named files within the vars directory: one file per component. The general rule of thumb is that any attribute tied to a region should be placed in the region's deployment blob, and anything tied directly to the stack itself should live inside one of these config files. A great example of something that changes very infrequently is the EC2 instance type for a given node (ec2_node_size below). Other examples include port numbers, component-specific directories, Git repository URLs, etcetera.

aws_config.yml

---
ec2_ssh_user: ec2-user

mongo_config.yml

---
mongo_spec:
  ssh_keypair_name: MyApp_MongoDB.pem
  port: 27017
  ec2_node_size: m3.large # swap to t2.micro for dev
  ec2_instance_monitoring: no
  ec2_assign_public_ip: yes
  ec2_wait: yes
  ec2_wait_timeout: 300
  ebs_volumes:
    - device_name: /dev/xvda
      root_volume: true
      volume_size: 8 # in GB
      type: gp2 # standard vanilla SSD
      delete_on_termination: true

The roles directory contains a sub-folder for each step in the process of creating stacks. Each sub-folder can have several additional folders for various Ansible features, but let's start simple with a tasks directory containing a main.yml file. As mentioned earlier, I conceptually divide all roles into creational and operational categories. As you can see above, any directory starting with create_ is usually a creational role, and any directory starting with install_ is an operational role; there are bound to be exceptions, however.
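
To give you a taste of an operational role, here's a minimal sketch of what roles/install_mongo/tasks/main.yml might contain. The mongodb-org.repo file is an assumption on my part--it would live in the role's files directory:

---
# roles/install_mongo/tasks/main.yml
- name: Add the MongoDB yum repository
  copy:
    src: mongodb-org.repo   # assumed to exist in roles/install_mongo/files/
    dest: /etc/yum.repos.d/mongodb-org.repo
  become: yes

- name: Install MongoDB
  yum:
    name: mongodb-org
    state: present
  become: yes

- name: Ensure mongod is started and enabled
  service:
    name: mongod
    state: started
    enabled: yes
  become: yes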

The ansible.cfg file can contain a myriad of options that I won't cover in-depth, but you can read more about them in the Ansible documentation. A few interesting settings I have on by default are below.

ansible.cfg

# because those prompts irritate me.
[defaults]
host_key_checking = False

# For bastion hosts
[ssh_connection]
control_path = %(directory)s/%%h-%%r

I like to dump the AWS key information for a single IAM user built for AWS automation into a credentials.yml variables file encrypted with ansible-vault. Obviously this user should only have the policies appropriate for orchestrating the kinds of things you're building: often the managed AmazonEC2FullAccess policy is good enough. Depending on how complex you get with IAM users and permissions, you may need to store quite a few keys in this file, and I'd recommend devising a naming system along with designing parameterized roles.

credentials.yml

---
aws_key: MY_KEY
aws_secret_key: MY_SECRET_KEY
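
If you haven't used the vault before, encrypting the file (and editing it later) are one-liners:

ansible-vault encrypt credentials.yml --vault-password-file=vault-password.txt
ansible-vault edit credentials.yml --vault-password-file=vault-password.txt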

The keys directory contains SSH keys--one per stack type--that Ansible uses to remote into the machines it's orchestrating. If you'd like to avoid public IPs and you're not in a Direct Connect scenario, you can configure the framework to use a bastion host pattern; however, that's fairly complex and we'll cover it in-depth in another post. I like to encrypt the PEM files and store them alongside the Ansible scripts in Git, so I wrote a couple of scripts for encrypting and decrypting my keys using OpenSSL.

lock-keys.sh

#!/bin/bash
echo "+---------------------------------------------------------------+"
echo "| This script requires all keys to be in the keys directory.    |"
echo "| Enter a password to encrypt all unencrypted PEM files in the  |"
echo "| keys directory. Unencrypted PEM files will be deleted after   |"
echo "| encryption. You may check in encrypted PEM files.             |"
echo "+---------------------------------------------------------------+"

if ! ls keys/*.pem > /dev/null 2>&1
  then
    echo "Target directory has no PEM files, cannot continue."
    exit 1
fi

read -s -p "Password: " PASSWORD
echo

# fail the whole script if openssl bails
set -e
for f in keys/*.pem
do
  # encrypt the key alongside the original
  echo "Encrypting $f..."
  openssl aes-256-cbc -salt -a -e -in "$f" -out "$f.encrypted" -k "$PASSWORD"

  # remove the plain text key now that the encrypted copy exists
  if [ -f "$f.encrypted" ]
    then
      rm -f "$f"
  fi
done
set +e

unlock-keys.sh

#!/bin/bash
echo "+-------------------------------------------------------------+"
echo "| This script requires all keys to be in the keys directory.  |"
echo "| After entering the password, this script will decrypt all   |"
echo "| PEM files in the keys directory. Please do NOT check these  |"
echo "| files into source control.                                  |"
echo "+-------------------------------------------------------------+"

if ! ls keys/*.pem.encrypted > /dev/null 2>&1
  then
    echo "Target directory has no encrypted PEM files, cannot continue."
    exit 1
fi

read -s -p "Password: " PASSWORD
echo

# handle openssl failures ourselves so we can clean up after a bad password
for f in keys/*.pem.encrypted
do
  # decrypt the key alongside its encrypted copy
  echo "Decrypting $f..."
  FILENAME="${f%.encrypted}"
  openssl aes-256-cbc -salt -a -d -in "$f" -out "$FILENAME" -k "$PASSWORD"

  if [ $? -ne 0 ]
    then
      echo "Failed to decrypt $f, exiting..."

      # Clean up the bad file
      if [ -f "$FILENAME" ]
        then
          rm -f "$FILENAME"
      fi

      # Bail
      exit 1
  fi

  # Remove the encrypted copy; happens only if decryption succeeded
  if [ -f "$FILENAME" ]
    then
      rm -f "$f"
  fi
done

Both key-password.txt and vault-password.txt contain plain text passwords for unlocking the keys and the ansible-vault encrypted bits, respectively. If you're paranoid about storing plain text passwords on your local machine, you can always memorize them and pass the --ask-vault-pass option to Ansible during a playbook run. As you've likely noticed, my encryption utility scripts both require manual keying of the password.

Playbooks

In the sample directory structure there are three playbooks: a base mean-stack.yml, one for our web tier (webtier.yml), and one for our database tier (dbtier.yml). It's super easy to import the others like so:

mean-stack.yml

---
# First, let's create the stacks in AWS. This includes building
# out all the required EC2 instances, load balancers, jump boxes
# VPCs, etc.
- hosts: localhost
  gather_facts: False
  connection: local
  vars_files:
    - vars/aws_config.yml
    - credentials.yml
    - "regions/myapp_{{ aws_region }}_{{ deployment_group }}.yml"
  roles:
    - role: initialize_aws

# include configuration and building of each stack
- include: webtier.yml
- include: dbtier.yml

Each individual playbook should import the variable files it needs to run. In the examples below we import the relevant variable files from the vars directory along with our deployment blob and credentials.yml. Ansible will automagically decrypt the credentials file (given the vault password), so don't worry about running any ansible-vault commands before you run the playbook.

webtier.yml

---
- hosts: localhost
  gather_facts: False
  connection: local
  vars_files:
    - vars/aws_config.yml
    - vars/node_config.yml
    - credentials.yml
    - "regions/myapp_{{ aws_region }}_{{ deployment_group }}.yml"
  roles:
    - role: create_webtier_ec2

# configure the web hosts.
...

dbtier.yml

---
- hosts: localhost
  gather_facts: False
  connection: local
  vars_files:
    - vars/aws_config.yml
    - vars/mongo_config.yml
    - credentials.yml
    - "regions/myapp_{{ aws_region }}_{{ deployment_group }}.yml"
  roles:
    - role: create_dbtier_ec2

# configure the mongo hosts
...

Deployment blobs

A typical deployment blob looks like this:

myapp_us-west-2_mean-stack.yml

---
deployment:
  group: "{{ deployment_group }}"
  region: "{{ aws_region }}"
  vpc_cidr: 172.23.0.0/16
  nodejs:
    ec2_ami_id: ami-31490d51
    availability_zones:
      - zone: "us-west-2b"
        ensure_count: 2
        subnet_cidr: 172.23.1.0/24
      - zone: "us-west-2c"
        ensure_count: 2
        subnet_cidr: 172.23.2.0/24
  mongodb:
    ec2_ami_id: ami-31490d51
    replica_set: "myapp_{{ aws_region }}_replica"
    leader:
      zone: "us-west-2b"
      subnet_cidr: 172.23.3.0/24
    availability_zones:
      - zone: "us-west-2b"
        ensure_count: 0
        subnet_cidr: 172.23.3.0/24

The group attribute is a way to identify this particular deployment as explained earlier; I usually pass this in on the command line.

The region attribute is there to maintain consistency inside consuming playbooks. You could just as easily pull from the aws_region variable.

Each component has its own special set of instructions describing the bits unique to the software we're installing. Mongo, for example, requires a designated leader node if we want a replica set--so I've included that in the relevant section. One piece that should be common to all components is the availability_zones list. Elements inside the list are fairly concise, containing the zone name, the total number of instances you want for the target component inside that zone, and a way of identifying the subnet all the instances should be assigned to.
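
To give you a feel for how a creational role consumes the blob, here's a hedged sketch that loops over the web tier's zones and ensures a subnet exists in each, using the ec2_vpc_subnet module from newer Ansible releases. The vpc_id variable is an assumption--it would be registered by an earlier VPC creation task:

- name: Ensure a subnet exists for each web tier zone
  ec2_vpc_subnet:
    region: "{{ deployment.region }}"
    vpc_id: "{{ vpc_id }}"        # assumed registered by an earlier VPC creation task
    az: "{{ item.zone }}"
    cidr: "{{ item.subnet_cidr }}"
    state: present
  with_items: "{{ deployment.nodejs.availability_zones }}"
  register: webtier_subnets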

If you'd rather create your networking by hand instead of using the subnet_cidr approach in each zone definition, you can replace those entries with subnet_id blocks and push the values into the various fields required by Ansible's built-in AWS modules. This saves you from scripting the more static blocks of your AWS architecture and nets you slightly faster playbook execution. Alternatively, you can script all the networking pieces of your VPC into a static playbook that you run, say, once per region as a pre-deploy step.
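
If you go the subnet_id route, a zone definition from the blob above shrinks to something like this (the subnet ID is a placeholder):

availability_zones:
  - zone: "us-west-2b"
    ensure_count: 2
    subnet_id: subnet-0123abcd   # placeholder; built by hand or by a one-off networking playbook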

Summary

  • Encrypt your EC2 keys and store them alongside your project. There are sample scripts above you can take if you'd like, and storing these keys alongside the code makes it faster for other developers to work alongside you. Obviously, if your security team doesn't like this, do what they say and ignore me.
  • Store region specific deployment bits and parameterize your scripts accordingly. This fits in nicely with AWS's region and availability zone concepts. If you build your scripts assuming they can be executed and reproduced per region you will have a more encapsulated approach--and that's a Good Thing™.
  • Place static, component-specific bits in files inside the vars directory, one per component, and name them well; trust me, it's important. I prefer this over defaults per task, unless your task is a special case and truly needs defaults. Don't use a feature for the sake of using it.
  • Because of the way Ansible uses tags to identify instances, introduce another level of grouping around your components or you'll find you've scaled or nuked instances you didn't mean to. I call it the deployment group and store it inside the deployment blob per region. I'll show you how to use this when we go in-depth on stack creation in another post.
  • Break playbooks up into a single entry point main() and import other, parameterized playbooks to perform the tasks you need. This will encourage re-use in your project, and you may eventually find some common themes across many projects. Don't Repeat Yourself--well mostly anyway.