OpenStack-Specific Configuration for Mastodon

Some specific settings for running Mastodon on an OpenStack-powered cloud

Aurynn Shaw
July 02, 2020

Getting Cloud Island launched was a solid 6 weeks of effort, developing the infrastructure-as-code that we felt was necessary for a modern and robust web deployment.

We’ve talked about the design of Cloud Island and about how we think of infrastructure, the three-tier architecture and the benefits that it provides, and why we went to so much extra effort to build beyond the provided docker-compose.yml file when we built out our deployment.

So now I’d like to talk about everything that we learned for our deployment on OpenStack, beyond the conventional architecture.

Performance

The initial deployment of Cloud Island is:

  • 1 web server, running 2 CPUs and 2 GB of memory,
  • 1 Redis server, running 1 CPU and 1 GB of memory,
  • 1 Postgres server, running 1 CPU and 1 GB of memory, and
  • 1 Elasticsearch server, running 1 CPU and 2 GB of memory.

This breakdown is to ensure that we can scale each piece as performance bottlenecks present themselves, without having to drastically re-plan how the infrastructure is designed.

That single web server runs the main Web software, the API streaming software, and two Sidekiq background processors. While this has been enough so far, it is only just enough.

The scaling plan going forwards is to split the web service tier into three servers:

  • the core web server,
  • an API streaming server, and
  • a Sidekiq worker server.

Sidekiq workers will need to be split out, as they manage all of federation. As Cloud Island grows and we federate with more servers, this layer is going to be more and more critical to ensuring high performance for end users.

The API streaming server is the front-end for users: once the web UI has loaded, it provides all the content that users see. It is currently running a single thread, and as the instance grows it will desperately need to be scaled.
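
For scaling within a single server before that split, Mastodon exposes environment variables that control how many processes and threads each service runs. The exact names can change between releases, so treat this as a sketch and check the documentation for your Mastodon version:

# .env.production (names as of Mastodon 3.x; verify against your release)
STREAMING_CLUSTER_NUM=2   # streaming API worker processes
WEB_CONCURRENCY=2         # Puma worker processes for the web service
MAX_THREADS=5             # threads per Puma worker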

Recommendations

If you’re setting up your own Mastodon instance and want to follow a similar design pattern and scaling story, I’d stick with a single initial web server, and consider preparing a replacement Sidekiq server for later.

The nature of Sidekiq’s interaction with Mastodon means that it’s the easiest component to move to its own system and scale: the move won’t require any significant changes to your infrastructure, and can be done without an outage.
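
As a rough sketch of that move, assuming the service names from the upstream docker-compose.yml: the new host runs the same Mastodon image, with .env.production pointing DB_HOST and REDIS_HOST at the existing Postgres and Redis servers, and starts only the Sidekiq service.

# On the dedicated Sidekiq host
docker-compose up -d sidekiq

# If one container isn't enough, run more on the same host
docker-compose up -d --scale sidekiq=2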

Moving to a dedicated API server is going to require changes at your nginx layer, to map requests to the new server. Your nginx configuration will already be partially set up for this, but the change may require an outage.

Media Storage

Out of the box, Mastodon supports Swift, the OpenStack object storage system[1], making it trivial to handle user uploads without having to manage the disk space ourselves.

Finding the right configuration settings for Swift at the Mastodon layer took a bunch of trial and error before we hit on the right configuration for Catalyst Cloud. Your OpenStack cloud might require other options to be set, or use different formatting for the configuration.

Our basic Swift settings look like this:

SWIFT_ENABLED=true
SWIFT_USERNAME=<user object upload swift username>
SWIFT_PASSWORD=<user object upload password>
SWIFT_PROJECT_ID=<project_id>
SWIFT_AUTH_URL=https://api.nz-hlz-1.catalystcloud.io:5000/v3
SWIFT_CONTAINER=<container name>
SWIFT_OBJECT_URL=https://object-storage.nz-hlz-1.catalystcloud.io/v1/AUTH_<project_id>/<container name>

There’s also a SWIFT_CACHE_TTL= setting present in the configuration file, and I have, as of this writing, been unable to figure out what changing it does. If you know what it does, please reach out and let me know!

Users and permissions

As part of good security planning and the Principle of Least Privilege, we’ve created a specific user to manage uploading images to Swift. This user doesn’t have access to anything by default, and only has access to upload and delete images from the image store.

The read and write ACLs that we’ve set up for our container are:

read_acl  | .r:*
write_acl | <project_id>:<user uploads username>

This says that anyone in the world can read images out of the image store, which is needed so that images can be shown, and that the image upload user that we’ve created is also able to upload and delete images.

This also disables the ability for anyone to list out the image storage container, which makes it more difficult to discover images indirectly.
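
If you want to verify what’s been applied, or set the same ACLs by hand rather than through Terraform, the standard clients can do it; the container and user names here are placeholders:

# Show the current ACLs on the container
openstack container show <container name>

# Set the ACLs directly with the swift client
swift post \
  --read-acl '.r:*' \
  --write-acl '<project_id>:<user uploads username>' \
  <container name>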

The Terraform code we’re using to create this container is:

resource "openstack_objectstorage_container_v1" "user_uploads" {
  name = "${var.name}-user-uploads"

  container_read  = ".r:*"
  container_write = "${data.openstack_identity_auth_scope_v3.current.project_id}:${var.mastodon_swift_user}"
}

CORS

CORS, or Cross-Origin Resource Sharing, is a security feature of modern browsers to restrict scripts from accessing resources on other domains.

Since we’re using another domain, in this case the OpenStack Swift object storage, we need to add CORS headers to the container to ensure that Mastodon and user browsers can read uploaded media correctly.

This is set by adding the metadata tag X-Container-Meta-Access-Control-Allow-Origin, which we can do when we declare the container.

This is a simple change to the container definition, and the final declaration is:

resource "openstack_objectstorage_container_v1" "user_uploads" {
  name = "${var.name}-user-uploads"

  container_read  = ".r:*"
  container_write = "${data.openstack_identity_auth_scope_v3.current.project_id}:${var.mastodon_swift_user}"

  metadata = {
    X-Container-Meta-Access-Control-Allow-Origin = "https://cloudisland.nz"
  }
}
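
Once the metadata is in place, it’s worth checking that Swift actually returns the CORS headers. A quick check with curl against any public object in the container (the URL here is illustrative):

# Simulate a cross-origin request from the Mastodon domain
curl -s -D - -o /dev/null \
  -H "Origin: https://cloudisland.nz" \
  https://object-storage.nz-hlz-1.catalystcloud.io/v1/AUTH_<project_id>/<container name>/<object name>

# The response headers should include
# Access-Control-Allow-Origin: https://cloudisland.nz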

Alternatives

An alternative we explored for the image storage tier was MinIO, an AWS S3-compatible object store that can proxy Swift. This service has a number of exciting features (like callbacks on object changes) but we ultimately decided that working with Swift natively was the right call for our initial launch.

Part of this decision was that adding another server to the deployment was a new level of operational maintenance we weren’t ready to take on, especially since we’d need to set up another nginx server to act as the media URL, such as media.cloudisland.nz.

The rest was that for now we don’t really need the extra control or capabilities that using MinIO would have offered. The technology behind MinIO is really exciting, and I’ll be exploring it again in the future.

Database and Backups

Without backups, we don’t have a service. We can’t ensure that the service will be around tomorrow, and we can’t ensure that we’re able to recover in the event of disaster.

Good, tested backups are beyond critical.

Database storage for Cloud Island is managed as a high-speed SSD volume attached to the database instance, so that we can easily resize it as needed. The backups themselves are generated and uploaded with the tool swiftbackmeup, which handles the backup and upload stages for us.

swiftbackmeup uses pg_dump to handle the backups behind the scenes, which means we can use standard Postgres restoration procedures when we need to recover.
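
When a restore is needed, the process is the reverse: pull the dump back out of the backup container and feed it to Postgres. A minimal sketch, assuming plain-SQL dumps and placeholder names:

# Fetch the most recent dump from the backups container
swift download <backups container> <backup file name>

# Replay a plain pg_dump file into a fresh database
createdb mastodon_production
psql -d mastodon_production -f <backup file name>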

To store the database backups we use two object storage containers, the primary upload container and a second, mirrored container.

We’ve split it up like this so that, in the event of a breach that gains shell access on the database server, the backups themselves can’t easily be deleted.

Like the image upload container, we created a specific user to manage database uploads that has no access by default.

The Terraform code that creates these containers looks like this:

## Create the versions backup container

resource "openstack_objectstorage_container_v1" "version_storage" {
  name = "${var.name}-version-storage"
}

## Create the backups Swift container

resource "openstack_objectstorage_container_v1" "postgres_backups" {
  name = "${var.name}-backups"

  container_read  = ".r:-${var.postgres_backup_user}"
  container_write = "${data.openstack_identity_auth_scope_v3.current.project_id}:${var.postgres_backup_user}"

  versioning {
    type     = "versions"
    location = openstack_objectstorage_container_v1.version_storage.name
  }
}

The version container has no read or write permissions applied, so none of the utility users for Cloud Island can read or modify objects.

The backup container explicitly denies read access to the Postgres backup user, to mitigate the potential for a breach to exfiltrate older backups that may have data from now-deleted users.

As Swift does not allow more complex ACLs than read or write, the versions style of versioning acts to prevent objects from being deleted from the main backups container, as deleted objects will be automatically restored from the versions container.

Going Forwards

Deployment Automation

While the entirety of the Cloud Island infrastructure is deployed through infrastructure-as-code, the Mastodon Docker image and the relevant docker-compose.yml is currently deployed and managed by hand.

While this isn’t ideal, designing a complete CI/CD system to automatically deploy Mastodon would have delayed launch of Cloud Island by several months.

We’re planning on using GitLab or another self-hosted Git repository and CI service, in keeping with our goal of keeping services in New Zealand.

This will require planning for:

  • Where to host the Gitlab instance,
  • Where to host the built Docker images,
  • Where to host the Terraform code for the infrastructure,
  • Planning for development and testing environments,
  • Automated Rails migrations, and
  • Any number of other things.

A lot goes into a good continuous integration and deployment setup!

Autoscaling

Given the importance of the Sidekiq background processing in maintaining federation, this is the obvious starting point for introducing autoscaling.

Once we move Sidekiq to its own server, we’ll be able to add it to an autoscaling pool in OpenStack, and scale up and down based on the Redis queue depth.
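
Sidekiq keeps each queue as a Redis list, so the depth we’d scale on is cheap to read. For example, for the queues Mastodon uses for local and federation work:

# Current depth of the main Sidekiq queues
redis-cli llen queue:default
redis-cli llen queue:push
redis-cli llen queue:pull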

The next target for autoscaling will be the API streaming service, which will scale based on the number of concurrent users. However, this is less pressing, as moving the API service to its own server should alleviate performance concerns for the time being.

Building autoscaling will require:

  • A tool to monitor the Redis queue (already part of our monitoring deployment),
  • A callback for when the Redis queue depth exceeds our thresholds, which updates the autoscaler “minimum” count, and
  • A callback for when the Redis queue depth has been under the threshold for a length of time, which updates the “minimum” count back down.

Presently, OpenStack doesn’t support arbitrary inputs as part of its autoscale pipeline, so we’ll need to develop and deploy our own there, as well.
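
As a sketch of what those callbacks might do, assuming the Sidekiq pool ends up as a Heat autoscaling stack with a min_size-style parameter (the stack and parameter names here are hypothetical):

# Raise the floor of the Sidekiq pool when the queue depth crosses the threshold
openstack stack update --existing \
  --parameter sidekiq_min_size=3 \
  mastodon-sidekiq-pool

# Drop it back down once the queue has stayed below the threshold
openstack stack update --existing \
  --parameter sidekiq_min_size=1 \
  mastodon-sidekiq-pool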

Final Thoughts

OpenStack makes a great platform for running a modern scalable Mastodon instance, and I hope that these extra details on the specific changes are the push you need to start your own instance.


  1. An object storage system provides effectively infinite capacity, without worrying about underlying hard drives. The AWS equivalent would be S3. 


practise devops OpenStack Catalyst Cloud

© Eiara Limited