In a previous post I explained why we were eagerly trying to change the Nix store model to allow for content-addressed derivations. I also handwaved that this was a real challenge, but without giving any hint at why this could be tricky. So let’s dive a bit into the gory details and understand some of the conceptual pain points with content-addressability in Nix, which forced us to some trade-offs in how we handle content-addressed paths.
What are self-references?
This is a self-reference
— Théophane Hufschmitt, This very article
A very trivial Nix derivation might look like this:
with import <nixpkgs> {};
writeScript "hello" ''
#!${bash}/bin/bash
${hello}/bin/hello
''
The result of this derivation will be an executable file containing a script that will run the hello
program.
It will depend on the bash
and hello
derivations as we refer to them in the file.
We can build this derivation and execute it:
$ nix-build hello.nix
$ ./result
Hello, world!
So far, so good.
Let’s now change our derivation to change the prompt of hello
to something more personalized:
with import <nixpkgs> {};
writeScript "hello-its-me" ''
#!${bash}/bin/bash
echo "Hello, world! This is ${placeholder "out"}"
''
where ${placeholder "out"}
is a magic value that will be replaced by the output path of the derivation during the build.
We can build this and run the result just fine
$ nix-build hello-its-me.nix
$ ./result
Hello, world! This is /nix/store/c0qw0gbp7rfyzm7x7ih279pmnzazg86p-hello-its-me
And we can check that the file is indeed who it claims to be:
$ /nix/store/c0qw0gbp7rfyzm7x7ih279pmnzazg86p-hello-its-me
Hello, world! This is /nix/store/c0qw0gbp7rfyzm7x7ih279pmnzazg86p-hello-its-me
While the hello
derivation depends on bash
and hello
, hello-its-me
depends on bash
and… itself.
This is something rather common in Nix.
For example, it’s rather natural for a C program to have /nix/store/xxx-foo/bin/foo
depend of /nix/store/xxx-foo/lib/libfoo.so
.
Self references and content-addressed paths
How do we build a content-addressed derivation foo
in Nix? The recipe is rather simple:
- Build the derivation in a temporary directory
/some/where/
- Compute the hash
xxx
of that/some/where/
directory - Move the directory under
/nix/store/xxx-foo/
You might see where things will go wrong with self-references: the reference will point to /some/where
rather than /nix/store/xxx-foo
, and so will be wrong (in addition to leak a path to what should just be a temporary directory).
To work around that, we would need to compute this xxx
hash before the build, but that’s quite impossible as the hash depends on the content of the directory, including the value of the self-references.
However, we can hack our way around it in most cases by allowing ourselves a bit of heuristic.
The only assumption that we need to make is that all the
self-references will appear textually (i.e. running strings
on a
file that contains self-references will print all the self-references out).
Under that assumption, we can:
- Build the derivation in our
/some/where
directory - Replace all the occurrences of a self-reference by a magic value
- Compute the hash of the resulting path to determine the final path
- Replace all the occurrences of the magic value by the final path
- Move the resulting store path to its final path
Now you might think that this is a crazy hack − there’s so many ways it could break. And in theory you’ll be right. But, surprisingly, this works remarkably well in practice. You might also notice that pedantically speaking this scheme isn’t exactly content-addressing because of the “modulo the final hash” part. But this is close-enough to keep all the desirable properties of proper content addressing, while also enabling self-references, which wouldn’t be possible otherwise. For example, the Fugue cloud deployment system used a generalisation of this technique which not only deals with self-references, but with reference cycles of arbitrary length.
However, there’s a key thing that’s required for this to work: patching strings in binaries is generally benign, but the final string must have the same length as the original one.
But we can do that: we don’t know what the final xxx
hash will be, but we know its length (because it’s a fixed-length hash), so we can just choose a temporary directory that has the right length (like a temporary store path with the same name), and we’re all set!
The annoying thing is that there’s no guarantee that there are no self-references hidden in such a way that a textual replacement won’t catch it (for example inside a compressed zip file). This is the main reason why content-addressability will not be the default in Nix, at first at least.
Non-deterministic builds − the diamond problem strikes back
No matter how hard Nix tries to isolate the build environment, some actions will remain inherently non-deterministic − anything that can yield a different output depending on the order in which concurrent tasks will be executed for example. This is an annoyance as it might prevent early cutoff (see our previous article on the subject in case you missed it).
But more than limiting the efficiency of the cache, this could also hurt the correctness of Nix if we’re not careful enough.
For example, consider the following dependency graph:
Alice wants to get foo
installed.
She already built lib0
and lib1
locally.
Let’s call them lib0_a
and lib1_a
.
The binary cache contains builds of lib0
and lib2
.
Let’s call them lib0_b
and lib2_b
.
Because the build of lib0
is not deterministic, lib0_a
and lib0_b
are different — and so have a different hash.
In a content-addressed word, that means they will be stored in different paths.
A simple cache implementation would want to fetch lib2_b
from the cache and use it to build foo
.
This would also pull lib0_b
, because it’s a dependency of lib2_b
.
But that would mean that foo
would depend on both lib0_a
and lib0_b
.
In the happy case this would just be a waste of space − the dependency is duplicated, so we use twice as much memory to store it.
But in many cases this would simply blow-up at some point — for example if lib0
is a shared library, the C linker will fail because of the duplicated symbols.
Besides that, this breaks down the purity of the build as we get a different behavior depending on what’s already in the store at the start of the build.
Getting out of this
Nix’s foundational paper shows a way out of this by rewriting hashes in substituted paths.
This is however quite complex to implement for a first version, so the current implementation settles down on a simpler (though not optimal) behavior where we only allow one build for each derivation.
In the example above, lib0
has already been instantiated (as lib0_a
), so we don’t allow pulling in lib0_b
(nor lib1_b
) and we rebuild both lib1
and foo
.
While not optimal − we’ll end-up rebuilding foo
even if it’s already in the binary cache − this solution has the advantage of preserving correctness while staying conceptually and technically simple.
What now?
Part of this has already been implemented but there’s still quite a long way forward.
I hope for it to be usable (though maybe still experimental) for Nix 3.0.
And in the meantime stay tuned with our regular updates on discourse. Or wait for the next blog post that will explain another change that will be necessary — one that is less fundamental, but more user-facing.