[Nix-dev] Guarantee Consistent Builds and Obsolete overrideScope

Peter Simons simons at cryp.to
Thu Oct 15 20:52:22 CEST 2015


The Problem
-----------

A lot of effort goes into curated package sets like Stackage, but even
so we can compile only ~50% of the packages available from Hackage. It
appears to be the nature of the game: when lens 2.x comes out with a
fundamentally new API, then some packages will adopt the new version and
others won't. A consistent package set must choose on which side of the
fence it wants to live. Either those packages that depend on lens 1.x
compile or those that depend on 2.x compile --- but not both.

Now, Nix is not limited to one particular version of lens --- we can
have both versions available at the same time. But it's difficult to
take advantage of that feature, because once you start mixing lens 1.x
and 2.x in the same package set, you risk inconsistent builds, i.e.
builds where some part of the dependency tree refers to lens 1.x and
another part refers to lens 2.x. It's a bad idea to try and link those
two trees together into one executable; we are fortunate that Cabal
detects this error during the configure phase and aborts the build!

We need a mechanism that can mix multiple package versions within a
package set, but that also guarantees consistency for every single
build. We hoped overrideScope would be that mechanism, but somehow it
hasn't quite lived up to the promise, mostly because it is hard to
understand.

The Situation Today
-------------------

Consider an executable package foobar that depends on the libraries foo
and bar, each of which depends on lens. The corresponding definitions in
hackage-packages.nix --- stripped down to the relevant bits --- look as
follows:

    "lens"     = ... lens version 1.x ...;
    "lens_2_0" = ... lens version 2.x ...;

    "foo" = callPackage
        ({ mkDerivation, lens }:
        mkDerivation {
          pname = "foo";
          libraryHaskellDepends = [lens];
        }) {};

    "bar" = callPackage
        ({ mkDerivation, lens }:
        mkDerivation {
          pname = "bar";
          libraryHaskellDepends = [lens];
        }) {};

    "foobar" = callPackage
        ({ mkDerivation, foo, bar }:
        mkDerivation {
          pname = "foobar"; [...]
          libraryHaskellDepends = [foo bar];
        }) {};

Let's assume that foo won't compile in that setup because it requires
lens version 2.x. We can remedy that by adding an override to
configuration-common.nix that says:

    foo = super.foo.override { lens = self.lens_2_0; };
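
In configuration-common.nix, that line sits inside an overlay-style
overrides function which receives the final package set (self) and the
unmodified set (super) --- the same pattern the overrideScope examples
below use. A minimal sketch of that context (the surrounding file
layout is simplified here):

    # Sketch: the overrides function the line above lives in. `self` is
    # the final (fixed-point) package set, `super` the set before these
    # overrides are applied.
    self: super: {
      foo = super.foo.override { lens = self.lens_2_0; };
    }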

That change fixes the build of foo, but foobar remains broken, because
now it pulls in both lens 1.x and 2.x simultaneously through its
dependencies. If bar works only with lens 1.x, then there is nothing we
can do: the version constraints conflict and we cannot compile foobar.
If bar *does* support lens 2.x, however, then we can just switch it to
the newer version with:

    bar = super.bar.override { lens = self.lens_2_0; };

Now we can compile foobar! Unfortunately, that change may break other
builds. There is a reason why lens 1.x is our default choice. If any
other package depends on bar as well as lens 1.x (directly or
indirectly), then it will no longer compile after that change.
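
To illustrate, consider a hypothetical package quux that depends on bar
and, directly, on the default lens. After the change above, quux would
mix lens 1.x (its own input) with lens 2.x (pulled in through bar):

    "quux" = callPackage
        ({ mkDerivation, bar, lens }:
        mkDerivation {
          pname = "quux";
          libraryHaskellDepends = [bar lens];  # bar now drags in lens 2.x
        }) {};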

We can avoid that side-effect by localizing the override to foobar:

    foobar = super.foobar.override {
      bar = self.bar.override { lens = self.lens_2_0; };
    };

That approach allows us to compile foobar, while still leaving the
default version of bar at lens 1.x, which is what most of our packages require.
Overriding build inputs this way works fine, and we have used this
technique for many years to fix builds that require non-default versions
to compile. The downside of these nested overrides is that they tend
to become freakishly complicated when the package that needs
overriding sits sufficiently deep in the dependency tree. The GHC
7.8.4 package set, for
example, needed many such overrides because its default version of mtl
was stuck at version 2.1.x all the while large parts of Hackage had
moved on to mtl 2.2.x. Since mtl is a rather fundamental package, we had
nested overrides 3-4 levels deep that were highly repetitious, too. It
was a mess.
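
For illustration, the pattern looked roughly like this (the package
names below are made up, and mtl_2_2_1 merely stands in for whatever
mtl 2.2.x attribute was current at the time):

    # Hypothetical sketch of the nested overrides described above: every
    # layer of the dependency chain that reaches mtl has to repeat the
    # same substitution by hand.
    some-app = super.some-app.override {
      mtl = self.mtl_2_2_1;
      some-web-lib = self.some-web-lib.override {
        mtl = self.mtl_2_2_1;
        some-server-lib = self.some-server-lib.override {
          mtl = self.mtl_2_2_1;
        };
      };
    };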

Haskell NG improved on that situation by adding overrideScope. That
function changes the package set ("scope") in which Nix evaluates a
build expression. The override

    foobar = super.foobar.overrideScope (self: super: { lens = self.lens_2_0; });

creates a new temporary package set, replaces lens with lens_2_0 in it,
and then evaluates foobar. The callPackage function picks up the
re-written lens attribute, which means that there's no need to override
that choice explicitly in all dependencies of foobar. One could say that
overrideScope implements "deep overriding", i.e. it applies an override
to the given derivation as well as all sub-derivations that it refers
to.
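
In other words, that single overrideScope line has roughly the same
effect as spelling out the substitution for every affected dependency
by hand:

    # Rough sketch of the equivalent plain nested overrides.
    foobar = super.foobar.override {
      foo = self.foo.override { lens = self.lens_2_0; };
      # (foo already defaults to lens_2_0 here because of the earlier
      # configuration-common.nix override, but is shown for completeness)
      bar = self.bar.override { lens = self.lens_2_0; };
    };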

Unfortunately, we lack a proper understanding of how expensive
overrideScope is in terms of memory and performance. In the past,
we've occasionally crashed Nix with this kind of stuff --- keep in
mind that the interpreter creates a whole new package set for every
build that uses this mechanism --- but when used sparingly,
overrideScope seems to work okay.

In some cases, overrideScope won't work at all, namely when it is
confronted with builds whose arguments have been passed explicitly.
For example, let's say that lens 3.x comes out and we try to compile
foobar like this:

    foobar = super.foobar.overrideScope (self: super: { lens = self.lens_3_0; });

That build will fail, because we added an explicit override for foo
earlier that committed the build to lens_2_0, and overrideScope will not
affect that choice since that build input is not picked up with
callPackage. So foobar will pull in both lens 2.x and 3.x despite the
use of overrideScope.
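
One way to make the new scope take effect (a sketch, not necessarily a
recommendation) is to repeat the explicit override inside the scope so
that foo's lens argument follows the scope's choice again:

    # Sketch: re-override foo inside the temporary scope so that its
    # explicitly passed lens argument is replaced by lens 3.x, too.
    foobar = super.foobar.overrideScope (self: super: {
      lens = self.lens_3_0;
      foo  = super.foo.override { lens = self.lens_3_0; };
    });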


Possible Improvements
---------------------

We generate Haskell build expressions automatically with cabal2nix, and
that tool knows the complete dependency tree for every package. So it
would be possible to generate builds that expect as function arguments
not just their immediate dependencies but the transitive closure of all
dependencies. Build expressions would then call their direct
dependencies, passing in appropriate versions of their respective
dependencies, etc. For example:

    "foobar" = callPackage
        ({ mkDerivation, foo, bar, many, other, inputs, of, lens }:
        let lens' = lens.override { inherit many other inputs of lens; };
            foo'  = foo.override { lens = lens'; };
            bar'  = bar.override { lens = lens'; };
        in
        mkDerivation {
          pname = "foobar"; [...]
          libraryHaskellDepends = [foo' bar'];
        }) {};

Now, foobar expects every single package that occurs anywhere inside of
its dependency tree as an argument, and it constructs the dependency
tree using those arguments. So the build must be consistent. It's
impossible for foobar to refer to two incompatible versions of lens,
because its inputs always use the same version.

Consequently,

    foobar.override { mtl = self.mtl_2_4_0; }

gives us a version of foobar that has its entire dependency tree
built with mtl 2.4.x. We could even get rid of the override altogether
if we adopt the suggestions from "Use Function Application To Escape
Override Hell" and remove callPackage from hackage-packages.nix. We'd
define all builds as straight functions

    "foobar" =
        { mkDerivation, foo, bar, many, other, inputs, of, lens }:
        let lens' = lens { inherit many other inputs of lens; };
            foo'  = foo { lens = lens'; };
            bar'  = bar { lens = lens'; };
        in
        mkDerivation {
          pname = "foobar"; [...]
          libraryHaskellDepends = [foo' bar'];
        };

and invoke them from inside of a package set with:

    callPackage foobar { mtl = self.mtl_2_4_0; }

This would give us a guarantee for consistent builds without any
overrides.
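
For completeness, here is a sketch of how a package set could wire
these plain functions together; the hackagePackages attribute set that
is assumed to hold the functions, as well as the mtl_2_4_0 attribute,
are illustrative names only:

    # Sketch: a package set invoking the plain build functions. Picking
    # a different mtl for the whole dependency tree is now a single,
    # consistent substitution. (hackagePackages is assumed to hold the
    # plain functions from hackage-packages.nix.)
    self: super: {
      foobar = self.callPackage hackagePackages.foobar { };
      foobar-with-new-mtl = self.callPackage hackagePackages.foobar {
        mtl = self.mtl_2_4_0;
      };
    }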


