This is the second post in our series about building polyglot projects using Bazel. You might find it useful to start with our first installment.
A Makefile
is an extraordinary thing. You specify a set of targets,
you tell Make what the dependencies are between these targets, and
Make figures out how to create these in the right order, every time.
foo.o: foo.c foo.h
	gcc -c foo.c

bar.o: bar.c foo.h
	gcc -c bar.c

libhello.so: foo.o bar.o
	gcc -shared -o libhello.so foo.o bar.o
The number one problem with this kind of build system today?
Reproducibility. After an initial git clone
of a new project, it’s
all too common to have to install a long list of “build requirements”
and plod through multiple steps of “setup”, only to find that, yes
indeed, the build did fail. Yet it worked just fine for your
colleague! Now you have to find out why. One reason might be that the
compiler toolchain on your system is different from that of your
colleague. Technically, gcc itself is a dependency of each of the above targets, but developers don’t say so in their Makefile. Instead of building GCC from scratch, they implicitly reuse whatever is part of their system’s state.
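To make this concrete: an honest Makefile would have to treat the toolchain itself as a target, along the lines of the sketch below, where the toolchain/ prefix and the build-gcc.sh script are made-up stand-ins for actually bootstrapping a compiler.

# Hypothetical: the compiler is itself a build target.
CC := toolchain/bin/gcc

foo.o: foo.c foo.h $(CC)
	$(CC) -c foo.c

$(CC):
	./build-gcc.sh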
In fact, the same problem frequently shows up with system upgrades. The command apt-get install does not reinstall your system from scratch every time you call it. If it did, chances would be high that apt-get would systematically succeed, because “nothing-at-all” is a known good starting state for your system, so following a deterministic sequence of steps starting from that state will get your system, as well as everyone else’s system, to the same final state every time.
That’s why some folks today use Nix instead. Nix is a package manager just like apt-get. But unlike apt-get and other package managers, Nix is the only tool that will reinstall your system from scratch, every time. Sounds crazy? Reinstalling from scratch is actually much faster than you might think, sometimes even faster than partial installs, for reasons we’ll discuss below. The point is that both build tools and package management tools sometimes have poor reproducibility, in both cases because they don’t start completely from scratch each and every time. Nix solves the problem for package management, but what about for builds? If a remedy works for Bob, could the same one work for Alice? That’s what we’ll explore in this post.
But let’s not bury the lede too far down. We’ll argue that you want to use Nix to “build” your entire compiler toolchain and system libraries, but use Bazel to build your code base to achieve fast, correct and incremental rebuilds.
Hermeticity
Truly starting a build from scratch effectively means making the build entirely self-contained. It means that building your project requires nothing in /usr/bin, /usr/lib, /usr/include or anywhere else on your system, apart from the build command. In a self-contained build, we can’t just grab the compiler from the PATH. The only possible way forward is to make the compiler toolchain itself one of the targets in the build, and then make all targets that need a compiler depend on it. We can’t just include whatever headers we find in the filesystem. We need to supply our own. We can’t link against system libraries. We need to build those libraries, then link against those.
That sounds like a lot of work. So why do it? Because this way, we can precisely control the version of the compiler toolchain, ensure header files are byte-for-byte identical to what we expect, and use system libraries that we know for sure won’t cause linking issues.
It also sounds like a long time. If you have to build yourself an entire compiler before you can even get started in earnest, you might need a lot of coffee breaks. But the good news is that hermetic build targets are easy to cache in globally available storage. Because hermeticity means, by definition, that targets don’t depend on anything outside the build, anything produced as part of the build is a closed artifact that can be copied across different systems easily. Build artifacts aren’t affected by the differing environments on different systems, because they don’t use them.
That’s where Nix and the Nixpkgs project come into play: Nixpkgs defines a great many compiler toolchains, header files and system libraries. You can reuse these definitions as snippets of your project’s build description. If you do, then you’re sharing the same snippets as everyone else, so “building” them can be very quick indeed: just check whether the target is already in the remote cache, because someone else already built it, and if so, download it. And if you’ve already built it yourself, just get the version from your local cache.
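For example, asking Nix to “build” zlib on a machine that has never built it will typically just fetch it from the public binary cache rather than compile anything. A sketch of what that looks like; exact versions, sizes and store hashes will differ:

$ nix-build '<nixpkgs>' -A zlib
these paths will be fetched (0.07 MiB download, 0.13 MiB unpacked):
  /nix/store/…-zlib-1.2.11
/nix/store/…-zlib-1.2.11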
We could bring our own build definitions for standard components like glibc, GCC, OpenJDK and zlib, but the benefit of reusing the definitions from Nixpkgs is that
- we don’t have to painstakingly put together these build definitions and
- we get the benefit of the public Nixpkgs build caches.
Incremental rebuilds
Could we perhaps write our entire project’s build description as an extension to Nixpkgs, using the Nix language? Unfortunately, we’d run afoul of two problems:
- Nixpkgs is designed to describe packages. Any time any of the inputs to a package change, the entire package must be rebuilt, because one target = one package. In the case of Haskell packages, this could mean rebuilding as much as all of lens, all of Agda, or all of any of your proprietary Cabal packages, at any change anywhere. Clearly, the granularity of Nixpkgs is too coarse for an iterative development workflow.
- The cache of the Nix system, as currently implemented, invalidates build artifacts not when the artifacts themselves change, but rather whenever their inputs change, or indeed if any of the inputs of any of their dependencies change. This model has a number of advantages for the Nix implementation, but for a build tool it is a massive pessimization. Consider our Makefile from earlier. If we converted that entire description to Nix, we could have to recompile libhello.so, along with anything that depends on it, even when we’re just adding a comment at the top of the header file. In Java source code, we would have to recompile downstream classes even when just the code of a private method changed; similarly in Haskell.
Quite simply, Nix in its current form is designed to support the package management use case, not the build system use case. So to get fast incremental rebuilds, where only the strict minimum necessary is rebuilt on any source code change, we turned to Bazel, a polyglot build tool designed for large monorepos. Not only do we get good incremental rebuilds and granular caching, but we also get to reuse all of Bazel’s built-in knowledge and best practices for building C/C++, Java, Scala, Rust, Go (and now Haskell)!
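To give a flavor of what this granularity buys us, imagine carving a project into small targets, as in the hypothetical sketch below (using the Haskell rules introduced in the next section): editing Utils.hs would rebuild only :utils and the targets that depend on it, leaving everything else cached.

haskell_library(
    name = "utils",
    srcs = ["Utils.hs"],
    prebuilt_dependencies = ["base"],
)

haskell_binary(
    name = "app",
    srcs = ["Main.hs"],
    deps = [":utils"],
    prebuilt_dependencies = ["base"],
)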
Example Nix+Bazel project
Let’s build a simple Haskell project that depends on zlib. We have a single source file, called Main.hs, that makes a dummy call to a zlib function:
module Main where

import Foreign.Ptr
import Foreign.C.Types

-- Dummy binding to zlib's crc32(): we only care that the C symbol links.
foreign import ccall crc32 :: CLong -> Ptr () -> CInt -> IO ()

main :: IO ()
main = crc32 0 nullPtr 0
At the root of any project buildable by Bazel lives a WORKSPACE
file:
# Import the Bazel rules for Haskell.
http_archive(
name = "io_tweag_rules_haskell",
strip_prefix = "rules_haskell-0.4",
urls = ["https://github.com/tweag/rules_haskell/archive/v0.4.tar.gz"],
)
# Recursively import Haskell rules' dependencies.
load("@io_tweag_rules_haskell//haskell:repositories.bzl", "haskell_repositories")
haskell_repositories()
# Import and load the Bazel rules to build Nix packages.
http_archive(
name = "io_tweag_rules_nixpkgs",
strip_prefix = "rules_nixpkgs-0.2",
urls = ["https://github.com/tweag/rules_nixpkgs/archive/v0.2.tar.gz"],
)
load("@io_tweag_rules_nixpkgs//nixpkgs:nixpkgs.bzl", "nixpkgs_package")
In Bazel, all build files are written in a very simple subset of Python. Bazel projects are made of fine-grained “packages”, each with its own BUILD file. We’ll have one top-level package only, containing two targets:
load("@io_tweag_rules_haskell//haskell:haskell.bzl",
"haskell_binary",
"haskell_cc_import",
)
# Make zlib library available to Haskell targets.
haskell_cc_import(
name = "zlib",
shared_library = "@zlib//:lib"
)
haskell_binary(
name = "hello",
srcs = ["Main.hs"],
deps = [":zlib"],
prebuilt_dependencies = ["base"],
)
The Haskell binary has two dependencies. First, it depends on the zlib library. Second, it depends on Haskell’s base library. Bazel can’t build that one, because it ships built-in with the compiler.
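With the BUILD file in place, we can already ask Bazel to enumerate the targets it knows about. The output below is a sketch; the exact list depends on the rules’ internals:

$ bazel query //...
//:hello
//:zlib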
The API reference documentation for these rules is available here.
Speaking of compilers, how do we build this? Where does the compiler come from? It comes from Nixpkgs, just like zlib and any other system libraries we would need for our project. To import these into Bazel, we add the following to our aforementioned WORKSPACE file:
nixpkgs_package(
name = "ghc",
attribute_path = "haskell.compiler.ghc822",
)
nixpkgs_package(name = "zlib")
This says: import the project to build GHC, and also import the project that builds the zlib library. Bazel knows how to call out to Nix to build each Nixpkgs package, thanks to the rules we loaded earlier. Under the hood, Bazel evaluates a Nix expression for the named package, resolving it to a path in the Nix store, which is a local cache of all Nix-built artifacts.
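You can see the same resolution happen by invoking Nix directly on that attribute path (the store hash is elided here):

$ nix-build '<nixpkgs>' -A haskell.compiler.ghc822
/nix/store/…-ghc-8.2.2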
All that remains is to explicitly tell Bazel to use exactly the GHC version that was built using Nix. We could add GHC as an extra dependency in each Haskell target, but that would be tiresome. For toolchains like this that are used pervasively, Bazel has a “toolchain registration” mechanism, telling Bazel to add our GHC as an extra dependency to each Haskell target for us.
register_toolchains("//ghc")
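The //ghc label has to point at an actual toolchain definition. Here’s a sketch of what that target could look like, assuming the haskell_toolchain macro from rules_haskell 0.4 (attribute names may differ in other releases):

load("@io_tweag_rules_haskell//haskell:haskell.bzl", "haskell_toolchain")

# A GHC toolchain backed by the Nix-built compiler imported above.
haskell_toolchain(
    name = "ghc",
    version = "8.2.2",
    tools = "@ghc//:bin",
)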
Unlike with a Makefile, we don’t implicitly depend on whatever happens to be around without being honest about it.
What now?
We can build the whole thing with
$ bazel build //...
i.e. “Bazel, please build all targets at the root of the project and below”. When this happens, Bazel will hermetically build our hello binary, using only inputs we specified exactly along the way, and which were themselves targets of the build. Unlike pure Nix solutions, Bazel will incrementally recompile only the strict minimum number of targets necessary when source files change.
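Since hello is an executable target, Bazel can also build and immediately run it in one step:

$ bazel run //:hello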
For more examples of using Bazel to build Haskell and other code, have a look here and here.
In short,
- we imported build recipes for GHC and zlib from Nixpkgs because Nixpkgs already has these,
- we used Bazel to build our actual Haskell source because Bazel already knows how to do this. When we eventually add C/C++ or Java to our project, Bazel will already know how to build those too.
Did we really need Nix here to achieve a fully hermetic build? We did have alternatives. We could just as well have achieved similar levels of hermeticity using Docker to supply a container with the versions of the toolchain we want. But that’s a topic for another post.