This is the second post in our series about building polyglot projects using Bazel. You might find it useful to start with our first installment.
A Makefile
is an extraordinary thing. You specify a set of targets,
you tell Make what the dependencies are between these targets, and
Make figures out how to create these in the right order, every time.
foo.o: foo.c foo.h
	gcc -c foo.c

bar.o: bar.c foo.h
	gcc -c bar.c

libhello.so: foo.o bar.o
	gcc -shared -o libhello.so foo.o bar.o
The number one problem with this kind of build system today?
Reproducibility. After an initial git clone
of a new project, it’s
all too common to have to install a long list of “build requirements”
and plod through multiple steps of “setup”, only to find that, yes
indeed, the build did fail. Yet it worked just fine for your
colleague! Now you have to find out why. One reason might be that the
compiler toolchain on your system is different from that of your
colleague. Technically, gcc itself is a dependency of each of the above targets, but developers don’t say so in their Makefile. Instead of building GCC from scratch, they implicitly reuse whatever is part of their system’s state.
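To make this concrete: an honest Makefile would have to treat the toolchain itself as a target, along the lines of the sketch below, where the toolchain/ prefix and the build-gcc.sh script are made-up stand-ins for actually bootstrapping a compiler.

# Hypothetical: the compiler is itself a build target.
CC := toolchain/bin/gcc

foo.o: foo.c foo.h $(CC)
	$(CC) -c foo.c

$(CC):
	./build-gcc.sh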
In fact, the same problem frequently shows up with system upgrades. The command apt-get install does not reinstall your system from scratch every time you call it. If it did, chances would be high that apt-get would systematically succeed, because “nothing-at-all” is a known good starting state for your system, so following a deterministic sequence of steps starting from that state will get your system, as well as everyone else’s system, to the same final state every time.
That’s why some folks today use Nix instead. Nix is a package manager just like apt-get. But unlike apt-get and other package managers, Nix is the only tool that will reinstall your system from scratch, every time. Sounds crazy? Reinstalling from scratch is actually much faster than you might think, sometimes even faster than partial installs, for reasons we’ll discuss below. The point is that both build tools and package management tools sometimes have poor reproducibility, in both cases because they don’t start completely from scratch each and every time. Nix solves the problem for package management, but what about for builds? If a remedy works for Bob, could the same one work for Alice? That’s what we’ll explore in this post.
But let’s not bury the lede too far down. We’ll argue that you want to use Nix to “build” your entire compiler toolchain and system libraries, but use Bazel to build your code base to achieve fast, correct and incremental rebuilds.
Hermeticity
Truly starting a build from scratch effectively means making the build entirely self-contained. It means that building your project requires nothing in /usr/bin, /usr/lib, /usr/include or anywhere else on your system, apart from the build command. In a self-contained build, we can’t just grab the compiler from the PATH. The only possible way forward is to make the compiler toolchain itself one of the targets in the build, and then make all targets that need a compiler depend on it. We can’t just include whatever headers we find in the filesystem. We need to supply our own. We can’t link against system libraries. We need to build those libraries, then link against those.
That sounds like a lot of work. So why do it? Because this way, we can precisely control the version of the compiler toolchain, ensure header files are byte-for-byte identical to what we expect, and use system libraries that we know for sure won’t cause linking issues.
It also sounds like a long time. If you have to build yourself an entire compiler before you can even get started in earnest, you might need a lot of coffee breaks. But the good news is that hermetic build targets are easy to cache in globally available storage. Because hermeticity means, by definition, that targets don’t depend on anything outside the build, anything produced as part of the build is a closed artifact that can be copied across different systems easily. Build artifacts aren’t affected by the differing environments on different systems, because they don’t use them.
That’s where Nix and the Nixpkgs project come into play: Nixpkgs defines a great many compiler toolchains, header files and system libraries. You can reuse these definitions as snippets of your project’s build description. If you do, then you’re sharing the same snippets as everyone else, so “building” them can be very quick indeed: just check whether the target is already in the remote cache, because someone else already built it, and if so, download it. And if you’ve already built it yourself, just get the version from your local cache.
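For example, asking Nix to “build” zlib on a machine that has never built it will typically just fetch it from the public binary cache rather than compile anything. A sketch of what that looks like; exact versions, sizes and store hashes will differ:

$ nix-build '<nixpkgs>' -A zlib
these paths will be fetched (0.07 MiB download, 0.13 MiB unpacked):
  /nix/store/…-zlib-1.2.11
/nix/store/…-zlib-1.2.11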
We could bring our own build definitions for standard components like glibc, GCC, OpenJDK and zlib, but the benefit of reusing the definitions from Nixpkgs is that
- we don’t have to painstakingly put together these build definitions and
- we get the benefit of the public Nixpkgs build caches.
Incremental rebuilds
Could we perhaps write our entire project’s build description as an extension to Nixpkgs, using the Nix language? Unfortunately, we’d run afoul of two problems:
- Nixpkgs is designed to describe packages. Any time any of the inputs to a package change, the entire package must be rebuilt, because one target = one package. In the case of Haskell packages, this could mean rebuilding as much as all of lens, all of Agda, or all of any of your proprietary Cabal packages, at any change anywhere. Clearly, the granularity of Nixpkgs is too coarse for an iterative development workflow.
- The cache of the Nix system, as currently implemented, invalidates build artifacts not when the artifacts themselves change, but rather whenever their inputs change, or indeed if any of the inputs of any of their dependencies change. This model has a number of advantages for the Nix implementation, but for a build tool it is a massive pessimization. Consider our Makefile from earlier. If we converted that entire description to Nix, we could have to recompile libhello.so, along with anything that depends on it, even when we’re just adding a comment at the top of the header file. In Java source code, we would have to recompile downstream classes even when just the code of a private method changed; similarly in Haskell.
Quite simply, Nix in its current form is designed to support the package management use case, not the build system use case. So to get fast incremental rebuilds, where only the strict minimum necessary is rebuilt on any source code change, we turned to Bazel, a polyglot build tool designed for large monorepos. Not only do we get good incremental rebuilds and granular caching, but we also get to reuse all of Bazel’s built-in knowledge and best practices for building C/C++, Java, Scala, Rust, Go (and now Haskell)!
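To give a flavor of what this granularity buys us, imagine carving a project into small targets, as in the hypothetical sketch below (using the Haskell rules introduced in the next section): editing Utils.hs would rebuild only :utils and the targets that depend on it, leaving everything else cached.

haskell_library(
    name = "utils",
    srcs = ["Utils.hs"],
    prebuilt_dependencies = ["base"],
)

haskell_binary(
    name = "app",
    srcs = ["Main.hs"],
    deps = [":utils"],
    prebuilt_dependencies = ["base"],
)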
Example Nix+Bazel project
Let’s build a simple Haskell project that depends on zlib. We have a single source file, called Main.hs, that makes a dummy call to a zlib function:
module Main where

import Foreign.Ptr
import Foreign.C.Types

-- Dummy binding to zlib's crc32(): we only care that the C symbol links.
foreign import ccall crc32 :: CLong -> Ptr () -> CInt -> IO ()

main :: IO ()
main = crc32 0 nullPtr 0
At the root of any project buildable by Bazel lives a WORKSPACE
file:
# Import the Bazel rules for Haskell.
http_archive(
name = "io_tweag_rules_haskell",
strip_prefix = "rules_haskell-0.4",
urls = ["https://github.com/tweag/rules_haskell/archive/v0.4.tar.gz"],
)
# Recursively import Haskell rules' dependencies.
load("@io_tweag_rules_haskell//haskell:repositories.bzl", "haskell_repositories")
haskell_repositories()
# Import and load the Bazel rules to build Nix packages.
http_archive(
name = "io_tweag_rules_nixpkgs",
strip_prefix = "rules_nixpkgs-0.2",
urls = ["https://github.com/tweag/rules_nixpkgs/archive/v0.2.tar.gz"],
)
load("@io_tweag_rules_nixpkgs//nixpkgs:nixpkgs.bzl", "nixpkgs_package")
In Bazel, all build files are written in a very simple subset of Python. Bazel projects are made of fine-grained “packages”, each with its own BUILD file. We’ll have one top-level package only, containing two targets:
load("@io_tweag_rules_haskell//haskell:haskell.bzl",
"haskell_binary",
"haskell_cc_import",
)
# Make zlib library available to Haskell targets.
haskell_cc_import(
name = "zlib",
shared_library = "@zlib//:lib"
)
haskell_binary(
name = "hello",
srcs = ["Main.hs"],
deps = [":zlib"],
prebuilt_dependencies = ["base"],
)
The Haskell binary has two dependencies. First, it depends on the zlib library. Second, it depends on Haskell’s base library. Bazel can’t build that one, because it ships built-in with the compiler.
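With the BUILD file in place, we can already ask Bazel to enumerate the targets it knows about. The output below is a sketch; the exact list depends on the rules’ internals:

$ bazel query //...
//:hello
//:zlib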
The API reference documentation for these rules is available here.
Speaking of compilers, how do we build this? Where does the compiler come from? It comes from Nixpkgs, just like zlib and any other system libraries we would need for our project. To import these into Bazel, we add the following to our aforementioned WORKSPACE file:
nixpkgs_package(
name = "ghc",
attribute_path = "haskell.compiler.ghc822",
)
nixpkgs_package(name = "zlib")
This says: import the project to build GHC, and also import the project that builds the zlib library. Bazel knows how to call out to Nix to build each Nixpkgs package, thanks to the rules we loaded earlier. Under the hood, Bazel evaluates a Nix expression for the named package, resolving it to a path in the Nix store, which is a local cache of all Nix-built artifacts.
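You can see the same resolution happen by invoking Nix directly on that attribute path (the store hash is elided here):

$ nix-build '<nixpkgs>' -A haskell.compiler.ghc822
/nix/store/…-ghc-8.2.2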
All that remains is to explicitly tell Bazel to use exactly the GHC version that was built using Nix. We could add GHC as an extra dependency in each Haskell target, but that would be tiresome. For toolchains like this that are used pervasively, Bazel has a “toolchain registration” mechanism, telling Bazel to add our GHC as an extra dependency to each Haskell target for us.
register_toolchains("//ghc")
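The //ghc label has to point at an actual toolchain definition. Here’s a sketch of what that target could look like, assuming the haskell_toolchain macro from rules_haskell 0.4 (attribute names may differ in other releases):

load("@io_tweag_rules_haskell//haskell:haskell.bzl", "haskell_toolchain")

# A GHC toolchain backed by the Nix-built compiler imported above.
haskell_toolchain(
    name = "ghc",
    version = "8.2.2",
    tools = "@ghc//:bin",
)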
Unlike with a Makefile, we don’t implicitly depend on whatever happens to be around without being honest about it.
What now?
We can build the whole thing with
$ bazel build //...
i.e. “Bazel, please build all targets at the root of the project and below”. When this happens, Bazel will hermetically build our hello binary, using only inputs we specified exactly along the way, and which were themselves targets of the build. Unlike pure Nix solutions, Bazel will incrementally recompile only the strict minimum number of targets necessary when source files change.
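Since hello is an executable target, Bazel can also build and immediately run it in one step:

$ bazel run //:hello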
For more examples of using Bazel to build Haskell and other code, have a look here and here.
In short,
- we imported build recipes for GHC and zlib from Nixpkgs because Nixpkgs already has these,
- we used Bazel to build our actual Haskell source because Bazel already knows how to do this. When we eventually add C/C++ or Java to our project, Bazel will already know how to build those too.
Did we really need Nix here to achieve a fully hermetic build? We did have alternatives. We could just as well have achieved similar levels of hermeticity using Docker to supply a container with the versions of the toolchain we want. But that’s a topic for another post.