In this post I’m going to show how to set up, with Terraform, a Buildkite-based CI system that uses your own workers running on GCP. For reference, the complete Terraform configuration for this post is available in this repository.
- The setup gives you complete control over how fast your workers are.
- The workers come with Nix pre-installed, so you won’t need to spend time downloading the same Docker container again and again on every push, as would usually happen with most cloud CI providers.
- The workers come with a distributed Nix cache set up, so authors of CI scripts won’t have to think about caching at all.
Secrets
We are going to need to import two secret resources (the secret_resource type is provided by the terraform-provider-secret plugin):
resource "secret_resource" "buildkite_agent_token" {}
resource "secret_resource" "nix_signing_key" {}
To initialize the resources, execute the following from the root directory of your project:
$ terraform import secret_resource.<name> <value>
where:
- buildkite_agent_token is obtained from the Buildkite site.
- nix_signing_key can be generated by running:
  nix-store --generate-binary-cache-key <your-key-name> key.private key.public
  The key.private file will contain the value for the signing key. I’ll explain later in the post how to use the contents of the key.public file.
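For example, assuming you have just generated key.private, the two imports could look like this (the token value is a placeholder copied from Buildkite’s agent setup page):
$ terraform import secret_resource.buildkite_agent_token <your-agent-token>
$ terraform import secret_resource.nix_signing_key "$(cat key.private)"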
Custom NixOS image
The next step is to use the nixos_image_custom module to create a NixOS image with custom configuration.
resource "google_storage_bucket" "nixos_image" {
name = "buildkite-nixos-image-bucket-name"
location = "EU"
}
module "nixos_image_custom" {
source = "git::https://github.com/tweag/terraform-nixos.git//google_image_nixos_custom?ref=40fedb1fae7df5bd7ad9defdd71eb06b7252810f"
bucket_name = "${google_storage_bucket.nixos_image.name}"
nixos_config = "${path.module}/nixos-config.nix"
}
The snippet above first creates a bucket nixos_image where the generated image will be uploaded, then it uses the nixos_image_custom module, which handles generation of the image using the configuration from the nixos-config.nix file. The file is assumed to be in the same directory as the Terraform configuration, hence the ${path.module}/ prefix.
Service account and cache bucket
To control access to different resources we will also need a service account:
resource "google_service_account" "buildkite_agent" {
account_id = "buildkite-agent"
display_name = "Buildkite agent"
}
We can use it to set access permissions for the storage bucket that will contain the Nix cache:
resource "google_storage_bucket" "nix_cache_bucket" {
name = "nix-cache-bucket-name"
location = "EU"
force_destroy = true
retention_policy {
retention_period = 7889238 # three months
}
}
resource "google_storage_bucket_iam_member" "buildkite_nix_cache_writer" {
bucket = "${google_storage_bucket.nix_cache_bucket.name}"
role = "roles/storage.objectAdmin"
member = "serviceAccount:${google_service_account.buildkite_agent.email}"
}
resource "google_storage_bucket_iam_member" "buildkite_nix_cache_reader" {
bucket = "${google_storage_bucket.nix_cache_bucket.name}"
role = "roles/storage.objectViewer"
member = "allUsers"
}
The bucket is configured to automatically delete objects that are older than three months. We give the service account the ability to write to and read from the bucket (the roles/storage.objectAdmin role). The rest of the world gets the ability to read from the bucket (the roles/storage.objectViewer role).
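Once these resources are applied, you can sanity-check the resulting policy from any machine with the Cloud SDK installed; a quick check, assuming the bucket name from the snippet above:
$ gsutil iam get gs://nix-cache-bucket-name
The output should list the agent’s service account under roles/storage.objectAdmin and allUsers under roles/storage.objectViewer.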
NixOS configuration
Here is the content of my nixos-config.nix. This NixOS configuration can serve as a starting point for writing your own. The numbered points refer to the notes below.
{ modulesPath, pkgs, ... }:
{
imports = [
"${modulesPath}/virtualisation/google-compute-image.nix"
];
virtualisation.googleComputeImage.diskSize = 3000;
virtualisation.docker.enable = true;
services = {
buildkite-agents.agent = {
enable = true;
extraConfig = ''
tags-from-gcp=true
'';
tags = {
os = "nixos";
nix = "true";
};
tokenPath = "/run/keys/buildkite-agent-token"; # (1)
runtimePackages = with pkgs; [
bash
curl
gcc
gnutar
gzip
ncurses
nix
python3
xz
# (2) extend as necessary
];
};
nix-store-gcs-proxy = {
nix-cache-bucket-name = { # (3)
address = "localhost:3000";
};
};
};
nix = {
binaryCaches = [
"https://cache.nixos.org/"
"https://storage.googleapis.com/nix-cache-bucket-name" # (4)
];
binaryCachePublicKeys = [
"cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY="
"<insert your public signing key here>" # (5)
];
extraOptions = ''
post-build-hook = /etc/nix/upload-to-cache.sh # (6)
'';
};
security.sudo.enable = true;
security.sudo.wheelNeedsPassword = false;
services.openssh.passwordAuthentication = false;
}
Notes:
1. This file will be created later by the startup script (see below).
2. The collection of packages that are available to the Buildkite script can be edited here.
3. Replace nix-cache-bucket-name with the name of the bucket used for the Nix cache.
4. Similarly to (3), replace nix-cache-bucket-name in the URL.
5. Insert the contents of the key.public file you generated earlier; it is a single line of the form <your-key-name>:<base64-encoded-key>.
6. The file will be created later by the startup script.
Compute instances and startup script
The following snippet sets up an instance group manager that controls multiple Buildkite agents (three in this example). The numbered points refer to the notes below.
data "template_file" "buildkite_nixos_startup" { # (1)
template = "${file("${path.module}/files/buildkite_nixos_startup.sh")}"
vars = {
buildkite_agent_token = "${secret_resource.buildkite_agent_token.value}"
nix_signing_key = "${secret_resource.nix_signing_key.value}"
}
}
resource "google_compute_instance_template" "buildkite_nixos" {
name_prefix = "buildkite-nixos-"
machine_type = "n1-standard-8"
disk {
boot = true
disk_size_gb = 100
source_image = "${module.nixos_image_custom.self_link}"
}
metadata_startup_script = "${data.template_file.buildkite_nixos_startup.rendered}"
network_interface {
network = "default"
access_config {}
}
metadata {
enable-oslogin = true
}
service_account {
email = "${google_service_account.buildkite_agent.email}"
scopes = [
"compute-ro",
"logging-write",
"storage-rw",
]
}
scheduling {
automatic_restart = false
on_host_maintenance = "TERMINATE"
preemptible = true # (2)
}
lifecycle {
create_before_destroy = true
}
}
resource "google_compute_instance_group_manager" "buildkite_nixos" {
provider = "google-beta"
name = "buildkite-nixos"
base_instance_name = "buildkite-nixos"
target_size = "3" # (3)
zone = "<your-zone>" # (4)
version {
name = "buildkite_nixos"
instance_template = "${google_compute_instance_template.buildkite_nixos.self_link}"
}
update_policy {
type = "PROACTIVE"
minimal_action = "REPLACE"
max_unavailable_fixed = 1
}
}
Notes:
1. The file files/buildkite_nixos_startup.sh is shown below.
2. Because of the remote Nix cache, the nodes can be preemptible (short-lived, never lasting longer than 24 hours), which results in much lower GCP costs.
3. Changing target_size allows you to scale the system; it is the number of instances that are controlled by the instance group manager. A sketch of making it tunable follows these notes.
4. Insert your desired zone here.
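If you expect to resize the pool often, target_size can read from a variable instead of being hardcoded. A minimal sketch, where the variable name buildkite_agent_count is my own:
variable "buildkite_agent_count" {
  description = "Number of Buildkite agent instances"
  default     = 3
}

# then, in google_compute_instance_group_manager.buildkite_nixos:
#   target_size = "${var.buildkite_agent_count}"
You can then scale with terraform apply -var buildkite_agent_count=5 without editing the configuration.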
Finally, here is the startup script:
# workaround https://github.com/NixOS/nixpkgs/issues/42344
chown root:keys /run/keys
chmod 750 /run/keys
umask 037
echo "${buildkite_agent_token}" > /run/keys/buildkite-agent-token
chown root:keys /run/keys/buildkite-agent-token
umask 077
echo '${nix_signing_key}' > /run/keys/nix_signing_key
chown root:keys /run/keys/nix_signing_key
cat <<EOF > /etc/nix/upload-to-cache.sh
#!/bin/sh
set -eu
set -f # disable globbing
export IFS=' '
echo "Uploading paths" $OUT_PATHS
exec nix copy --to http://localhost:3000?secret-key=/run/keys/nix_signing_key \$OUT_PATHS
EOF
chmod +x /etc/nix/upload-to-cache.sh
This script uses the Nix post-build hook approach for uploading to the cache without polluting the CI script: Nix invokes the hook after every successful build, passing the resulting store paths in the OUT_PATHS environment variable.
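To confirm that paths actually land in the cache, you can query it from any machine; a quick check, where the store path is a placeholder for one produced by your build:
$ nix path-info --store https://storage.googleapis.com/nix-cache-bucket-name \
    /nix/store/<hash>-<name>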
Conclusion
The setup allows us to run Nix builds in an environment where Nix tooling is available. It also provides a remote Nix cache which does not require that the authors of CI scripts set it up, or even be aware of it at all. We use this setup on many of Tweag’s projects and have found that both mental and performance overheads are minimal. A typical CI script looks like this:
steps:
- label: Build and test
command: nix-build -A distributed-closure --no-out-link
When the cache is up to date and nothing needs to be rebuilt, a build may finish in literally one second.