One of the core aspects of the modern DevOps process is the CI/CD pipeline, where newly built deployment artefacts can be pushed into our various environments with ease, automating away the tedium of deployment and going live.
In terms of convenience, this is great: complicated deployment procedures become turn-key automation, getting new software into testing or production is quick, and feedback on our work has never been easier to come by.
However, when we’re building these systems, security is often a secondary thought, or not considered at all, and it’s all too easy to use highly-privileged credentials to set up these deployment roles.
We recently built out a deployment system for AWS Lambda using Terraform, and it took real thought to work out exactly what our deployment role should be able to do, and what we trade away when we limit it.
The Need
AWS Lambda is easy to deploy and easy to work with, and there’s a myriad of opinions on how to deploy code to it.
We wanted to be able to use our standard TravisCI-based build process for pushing new versions of a codebase to AWS Lambda, by using the industry-standard tool Terraform to manage the deployment process.
As Terraform requires AWS API access, the fastest way to achieve this goal is simply to create standard AWS access keys and insert them into the TravisCI UI. While this is largely safe, in that TravisCI is well-defended and manages its own breach detection, including highly-privileged access keys here is still a risk if someone breaches your source code repository or one of your developers’ workstations.
Because TravisCI has to be trusted, those access keys are available to it in the clear, and so anyone with access to your source code repository effectively has access to those keys. This would be the same with any CI provider, because any CI provider sits in the same position of high trust.
So what should we be doing?
Principle of Least Access
In this case, as in every case, we should be thinking about how little access we need to get the job done.
When we built our system to deploy into AWS Lambda, we asked what we needed and how we should be thinking about deploying functions.
We decided on using a versioned S3 bucket to provide a historic record of function payloads, and a limited execution role to be passed to the Lambda itself.
From those decisions, the smallest set of permissions needed to deploy a function would be:

- Write to an S3 bucket
- List, create and delete functions
- Pass in an IAM role for execution
It’s important to note that our deployment isn’t going to be creating the S3 bucket, or the IAM role needed for execution.
S3 Bucket
The S3 bucket permission is fairly straightforward: we need to be able to upload payload files, and to know where to upload them. This part of our role won’t need to delete or modify existing files, as we’re focussing solely on letting S3 manage the historic record.
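Since the deployment role doesn’t create the bucket, it has to be provisioned ahead of time. A minimal sketch of what that might look like, assuming the example bucket name used in the policies below, with versioning enabled to keep the historic record:

# The payloads bucket, provisioned separately from the deployment role
resource "aws_s3_bucket" "deployment" {
  bucket = "example-deployment-bucket"
  acl    = "private"

  # Versioning retains every uploaded payload, giving us
  # the historic record without granting CI any delete access
  versioning {
    enabled = true
  }
}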
List, Create, Modify
The second piece is being able to modify functions. Because the CI system is authoritative when it comes to the deployment of functions, it needs to have the permissions to make these modifications.
However, the CI system should only be authoritative over its own functions. In our design, we implemented this by requiring a CI-specific prefix for the function names, ensuring that functions created through other means couldn’t be touched.
Pass a Role
Finally, a newly-created function needs to have an execution role associated. This role determines what a function is capable of doing, and this is probably the most important aspect of ensuring a consistent security profile when building this sort of CI service.
In order to create a function, the deployment role needs to be able to pass a role. In general, this could be any role, from the most basic AWSLambdaBasicExecutionRole to the core admin role.
In order to ensure we’re not able to assign an admin role to our function, we set up our deployment role so that it can assign only a single, pre-determined execution role.
By doing this we can be assured that our functions can’t leak elevated access, and that functions can never do anything more than we expect.
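For completeness, the pre-determined execution role might be provisioned something like this. This is a sketch: the role name is our own invention, and we attach only AWS’s managed AWSLambdaBasicExecutionRole policy, which lets the function write its logs to CloudWatch and nothing more:

# The single execution role the deployment role is allowed to pass
resource "aws_iam_role" "lambda_execution" {
  name = "ci_lambda_execution"

  # Only the Lambda service can assume this role
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Grants just enough access for the function to write CloudWatch logs
resource "aws_iam_role_policy_attachment" "lambda_basic_execution" {
  role       = "${aws_iam_role.lambda_execution.name}"
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}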
Implications
Limiting our deployment role to this extent does come with a major potential downside, in that it introduces a gatekeeping stage every time we need to deploy functions which require different levels of access to our AWS accounts.
This kind of gatekeeping can be one of the major drivers behind creating a shadow IT environment, because it interferes with rapid testing and iteration.
This level of deployment role lockdown may not be appropriate for your environment, but it is worth having the conversation in your team about its necessity and impact, as well as the impact of not implementing these ideas.
Code Examples
So what does the setup for this deployment role look like in practice? Let’s look at some Terraform code to set it up:
The Policies
This first block of code defines the IAM policies required to create AWS Lambda functions. These policies are the core of what our CI role is able to do.
data "aws_iam_policy_document" "lambda_create" {
# So that CI can bind the Lambda to the execution role
statement {
actions = [
"iam:PassRole",
"iam:GetRole",
]
# Allows us to only provide one role to our lambda function
resources = [
"arn:aws:iam::account-id:role/role-name",
]
}
# So that we can create and modify all the Lambdas
statement {
actions = [
"lambda:CreateAlias",
"lambda:CreateFunction",
"lambda:GetPolicy",
"lambda:DeleteFunction",
"lambda:GetFunction*",
"lambda:ListFunctions",
"lambda:ListVersionsByFunction",
"lambda:PublishVersion",
"lambda:UpdateAlias",
"lambda:UpdateFunctionCode",
"lambda:UpdateFunctionConfiguration",
]
# But only the ones with the prefix
resources = [
"arn:aws:lambda:region:account-id:function:cicd_prefix_*",
]
}
}
data "aws_iam_policy_document" "s3" {
# Allows our CI provider to list the payloads bucket
statement {
actions = [
"s3:ListBucket",
"s3:GetBucketLocation",
]
resources = [
"arn:aws:s3:::example-deployment-bucket",
]
}
statement {
# So we can update files in our own bucket, but not get or delete them.
actions = [
"s3:PutObject",
"s3:PutObjectAcl",
]
resources = [
"arn:aws:s3:::example-deployment-bucket/*",
]
}
}
The User
The second part is the user itself, and its group membership. By creating the user in this way and isolating it from the existing roles in AWS, we’re able to strongly control the role’s capabilities, and we provide a skeleton for creating new deployment roles in the future.
# Creates an AWS user to hold the CI deployment role
resource "aws_iam_user" "ci" {
  name = "ci_user"
}

resource "aws_iam_group" "ci_group" {
  name = "ci"
  path = "/ci/"
}

resource "aws_iam_group_membership" "ci_group_membership" {
  name = "ci-group-membership"

  users = [
    "${aws_iam_user.ci.name}",
  ]

  group = "${aws_iam_group.ci_group.name}"
}

# Create the policy that permits creating Lambda functions
resource "aws_iam_policy" "allow_create_lambda" {
  name        = "allow_create_lambda"
  path        = "/ci/"
  description = "allows limited lambda source creation and modification"
  policy      = "${data.aws_iam_policy_document.lambda_create.json}"
}

# Create the policy that permits uploading Lambda functions to S3
resource "aws_iam_policy" "allow_ci_S3" {
  name        = "allow_ci_S3"
  path        = "/ci/"
  description = "allows limited S3 access for CI"
  policy      = "${data.aws_iam_policy_document.s3.json}"
}

# Connect the policies to the group that our CI user is a part of
resource "aws_iam_group_policy_attachment" "lambda_attach" {
  group      = "${aws_iam_group.ci_group.name}"
  policy_arn = "${aws_iam_policy.allow_create_lambda.arn}"
}

resource "aws_iam_group_policy_attachment" "s3_allow" {
  group      = "${aws_iam_group.ci_group.name}"
  policy_arn = "${aws_iam_policy.allow_ci_S3.arn}"
}

# Creates credentials that can be used in your CI provider of choice
resource "aws_iam_access_key" "ci_credentials" {
  user = "${aws_iam_user.ci.name}"
}
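To get these credentials out of Terraform and into the TravisCI UI, one option is to expose them as outputs; a sketch, with output names of our own choosing:

# Surface the generated credentials so they can be
# copied into the CI provider's settings
output "ci_access_key_id" {
  value = "${aws_iam_access_key.ci_credentials.id}"
}

output "ci_secret_access_key" {
  value = "${aws_iam_access_key.ci_credentials.secret}"
}

Both values can then be read back with terraform output after an apply.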
Implications
Pulling the credentials as we are in aws_iam_access_key does have the implication that these credentials are written into the Terraform state file, which may be inappropriate for your threat model.
If it is inappropriate, generating the access keys from the AWS console is a better option, and should be explored instead.
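A middle ground worth knowing about: the aws_iam_access_key resource also accepts a pgp_key argument, which encrypts the secret before it lands in the state file. In that case the plain secret attribute (and the output above) is no longer available; instead you decrypt the encrypted_secret attribute with the corresponding PGP key. A sketch, using a placeholder Keybase username:

resource "aws_iam_access_key" "ci_credentials" {
  user = "${aws_iam_user.ci.name}"

  # Encrypts the generated secret with this Keybase user's public key,
  # so only the PGP-encrypted form is stored in state
  pgp_key = "keybase:example_user"
}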