Blog

Multi-Cloud Docker Swarm with Terraform and Ansible

Since the release of Docker 1.12, there’s a new Swarm mode that is baked into the Docker engine. I wanted to spend some time, after months of Kubernetes-only work, to check out how Swarm was doing things and to see how easy it was to get started. Building a quick cluster on your laptop or on a single provider seemed to be straight forward, but I couldn’t readily find a no nonsense way to spin one up across multiple clouds. So, you know, I went ahead and built one.

Today, we’ll walk through how you can create a multi-cloud Swarm on AWS and GCE. We will use Terraform and Ansible to complete the bootstrap process, which is surprisingly straightforward. You can go directly to the Github repo where I stashed the code by clicking here.

I’ll give an early preface and say that I’ve only used this for some testing and learning experience. It’s in no way prod. ready or as robust as it should be to accept lots of different configurations.

Outline

The deployment of our cluster will occur in the following order:

  • AWS infrastructure is provisioned (security groups and instances)
  • GCE infrastructure is provisioned (firewall rules and instances)
  • An Ansible inventory file is created in the current working directory
  • Docker is installed and Swarm is initialized

Terraform Scripts

In order to create our infrastructure, we want to create three terraform scripts and a variable file. This will provide all of the necessary information that Terraform needs to do it’s thing.

  • Create four files: touch 00-aws-infra.tf 01-gce-infra.tf 02-create-inv.tf variables.tf
  • Open variables.tf for editing. We’ll populate this file with all of the configurable options that we will use for each cloud, as well as some general info that the instances have in common, regardless of cloud. Populate the file with the following:
  • You can update these defaults if you desire, but also know that you can override these at runtime with the -var flag to terraform. See here for details.

  • Now that we’ve got the variables we need, let’s work on creating our AWS infrastructure. Open 00-aws-infra.tf and put in the following:

Walking through this file, we can see a few things happen. If you’ve seen Terraform scripts before it’s pretty straight forward.

  • First, we simply configure a bit of info to tell Terraform to talk to our desired region that’s specified in the variables file.
  • Next, we create a security group called swarm_sg. This security group allows ingress from all of the ports listed here.
  • Finally, we’ll create all of the nodes that we plan to use in AWS. We’ll create the master instance first, simply because it’s tagged differently, then we’ll create the workers. Notice the use of ${var... everywhere. This is how variables are passed from the vars file into the desired configuration of our nodes.

It’s now time to create our GCE infrastructure.

  • Open 01-gce-infra.tf and paste the following:

Taking a read through this file, you’ll notice we’re essentially doing the same thing we did with AWS:

  • Configure some basic info to connect to GCE.
  • Create firewall rules in the default network to allow ingresses for Swarm.
  • Create the Swarm members in GCE.

We’re almost done with Terraform! The last bit is we need to take the infrastructure that gets created and create an inventory file that Ansible can use to provision the actual Docker bits.

  • Populate 02-create-inv.tf:

This file simply tells Terraform, after all infrastructure has been created, to drop a file locally called swarm-inventory. The file that’s dropped should look like (real IPs redacted):

Ansible Time!

Okay, now that we’ve got the Terraform bits ready to deploy the infrastructure, we need to be able to actually bootstrap the cluster once the nodes are online. We’ll create two files here: swarm.yml and swarm-destroy.yml.

  • Create swarm.yml with:

This Ansible playbook does a few things:

  • Bootstraps all nodes with the necessary packages for Ansible to run properly.
  • Installs Docker prerequisites and then installs Docker.
  • On the master, initializes the swarm and grabs the key necessary to join.
  • On the nodes, simply joins the swarm.

Now, that’s really all we need. But while we’re here, let’s make sure we can tear our Swarm down as well.

  • Create swarm-destroy.yml:

That one’s really easy. It really just goes to each node and tells it to leave the Swarm, no questions asked.

Create Swarm

Okay, now that we’ve got all the bits in place, let’s create our swarm.

  • First source AWS API keys with source /path/to/awscreds.sh or export ....
  • Create the infrastructure with terraform apply. Keep in mind that you may also want to pass in the -var flag to override defaults.
  • Once built, issue cat swarm-inventory to ensure master and workers are populated.
  • Bootstrap the Swarm cluster with ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -b -i swarm-inventory swarm.yml.

In the just a couple of minutes, these steps should have been completed successfully. If all looks like it went okay, SSH into the master node.

  • Issue docker node ls and view all the nodes in the Swarm. You’ll notice different hostnames between AWS and GCE instances:

Test It Out

Now that we’ve got our Swarm up, let’s create a scaled service and we’ll see it show up on different environments.

  • Issue docker service create --replicas 5 --name helloworld alpine ping google.com on the master.
  • Find where the pods are scheduled with docker service ps helloworld:
  • SSH into the GCE worker and find the containers running there with docker ps.
  • Show that the containers are pinging Google as expected with docker logs <CONTAINER_ID>
  • Do the same with the AWS nodes.

Teardown

Once we’re done with our test cluster it’s time to trash it.

  • You can tear down just the Swarm, while leaving the infrastructure with ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -b -i swarm-inventory swarm-destroy.yml
  • Tear down all the things with a simple terraform destroy

That’s it! I was happy to get a cross-cloud Swarm running pretty quickly. Over the next few weeks, I’ll probably come back to revisit my Swarm deployment and make sure some of the more interesting things are possible, like creating networks and scheduling webservers. Stay tuned!

Author: Spencer Smith