Terraform is one of the most important infrastructure-as-code tools in modern DevOps tooling, with broad cross-platform cloud provider support, as well as a considerable body of provisioning tools for services like Postgres, monitoring and logging, and numerous others.
It’s an incredible resource.
I use Terraform predominately with AWS, including into production with our clients. For us, it’s the right tool to manage infrastructure-as-code.
But, and there’s always a but, Terraform is not perfect.
Sharp Edges
As part of good, modern development practice Terraform supports code reuse in the form of modules, providing a means to abstract away concepts and ideas to make it theoretically easy to share and reuse.
And it’s that “theoretically” that’s important.
Terraform directly implements the AWS API and it is bound by the limitations and design principles of those APIs, bound by the history of design decisions made a decade or more ago.
As a result, it can be quite difficult to build generic modules in Terraform, code that provides a composed building block for more complex infrastructure. I’ve run into this over and over again, from load balancers to autoscaling groups to other edge cases.
These pieces often resist being put in a module by requiring too many specific customisation options and too many environment-specific settings. There’s no way to make them generic.
Building Better Tools
So what I’ve realised is that trying to make these components into generic modules is the problem. How I want to design infrastructure, from IAM roles to VPC layout to how to configure a load balancer is driven by my understanding of solutions architecture and good security principles.
By trying to make modules generic, I was undermining myself, preventing myself from building reusable infrastructure-as-code that fit my opinions and design principles. By trying to build smaller parts, I was prevented from building bigger and more important pieces easily and effectively.
So, this is what I’ve learned from writing infrastructure-as-code with Terraform.
Be Extremely Opinionated
Your environment isn’t mine, and can’t be mine. How you think and design, while informed by similar best practises and choices, cannot be how I design.
The things you need will not be the things I need, and should not try to be.
The modules that I am now building assert that Lambda functions should always have logging enabled and always read their payloads from S3. That my S3 bucket policies should always be role-based, instead of user-based.
That security groups between resources is based not on IPs, but on access labels that are easily changed or revoked.
These opinions come from a place of should, not can, and that is the driving force of my module designs. By providing not capability but opinions, I make it easier for myself to use these pieces, and to extend them in ways that are reasonable for me and my environments.
Build Complex Primitives
The primitives that Terraform provides are extremely low-level, being nearly direct implementations of the AWS API. This makes it easy to fall into the idea that our goal will be to build small, generic higher-level primitives out of these very low-level primitives.
This has been a source of pain for me, numerous times.
Instead, build more complex components. Should a VPC always have a bastion, and a public and private subnets? Then your VPC module should bake that in, and not provide a means to turn it off.
Should an autoscale group always have metrics? Then your module should define those, and configure them appropriately.
Should ELBs always log? And Cloudwatch Events? Cloudtrail? Your modules should define these things, the S3 buckets and the read policies, because these are your opinions.
A Single Repo is Useful
There’s some advanced tricks you can pull with Terraform if all your helper modules are in the same repository.
The overrides feature, when combined with symlinks, allows for complex layering of modules, by re-opening and merging in changes to existing definitions. I’ve already used this to define an ECS helper module, and an ELBv2-configured helper module, based off the original ECS helper.
I can now add better logging and new concepts to the base module and know that they will move forwards as expected.
Layering my opinions in this way makes it easier to define production versus non-production modules, having production with more logging, more connections and more capabilities, without breaking the core interface contract exported by the module.
Use Zero Counts
As a very strict declarative language, Terraform lacks a lot of the niceties of imperative programming, like if
statements and loops.
There’s some minor capability to create multiple resources of a single type, using the count =
argument. This can be extremely useful in certain ways, by enabling and disabling aspects of a module as needed.
This feature is limited and needs to be used with care, especially when resources refer to other resources that may not have been created.
Try New Things
Terraform can be very limiting in some ways but there’s a lot of power present in the tool, a great many capabilities that you can use.
Our terraform_lambda_zip
module, for instance, makes extensive use of external data sources in the form of shell scripts, and local provisioners as shell scripts to handle building a complete stable archive for use with S3.
While this is perhaps beyond the normal scope of Terraform, I was able to build opinions on how a Python Lambda function should be built into the deployment tooling. This module can now be spun out into a CI/CD process, making it straightforward to build compatible, repeatable builds, with all of my opinions and goals.
Throw it Away if you Need To
I’ve been taught to be hesitant to discard code or ideas that aren’t working for me, or aren’t keeping up with my new design ideals and necessities.
Fortunately, Terraform has made it relatively easy to rebuild what I’ve built before, to build new abstractions with new goals in mind.
If the design you tried doesn’t work, that’s okay. Rewrite it piece by piece, try again in your test environment, and migrate as you make better designs. Because Terraform is meant to be an iterative tool, this workflow is well-supported and catered to.
Better Infrastructure Today
This is how I approach designing Terraform modules today. How does this encode my opinions? How should this work, to support my goals?
How can I use this in CI/CD, or with remote state
, or any number of excellent powers that Terraform gives me? How should I make my interfaces easy to use by other programmers?
What are the implications of my design choices?
By asking ourselves these questions as we design, we build what works for us, works for our environments, and, crucially, work to make tomorrow easier than today.