[Nix-dev] No more multi-threaded builds for Haskell libraries (was: Please help generating data about GHC's non-deterministic library ID bug)

Peter Simons simons at cryp.to
Sat Jun 6 13:53:37 CEST 2015


Fellow Nix'ers,

friendly supporters of the cause from all over the world have run a few
thousand Haskell builds in the name of learning more about the effects
of our favorite GHC bug [2]. It's fair to say that the CPU cycles
invested into this endeavor have been well spent. Here are the results.

In our current Nixpkgs setup [3], GHC 7.10.1 generates a correct
library ID in 85% of all builds:

    |--------+---------+------|
    | builds | correct |    % |
    |--------+---------+------|
    |   3205 |    2727 | 85.1 |
    |--------+---------+------|

The meaning of "correct" is vague, of course, because we don't know
what makes a library ID "correct" and what not. Therefore, we assume
the ID generated by the majority of builds to be correct for the
purposes of this experiment. So, a more accurate way of phrasing the
result is that 15% of all builds generate an ID other than the one
produced by the remaining 85% of the builds.
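
A minimal sketch of that majority rule, in case it helps to see it spelled
out. The function name and the sample IDs below are made up for
illustration; they are not taken from the data files, and the real files
may well be laid out differently.

```python
from collections import Counter

def majority_share(ids):
    """Return (majority_id, matching_builds, percentage) for a list of IDs.

    The most frequent ID is treated as the "correct" one, which is an
    assumption of the experiment, not a property GHC guarantees.
    """
    counts = Counter(ids)
    majority_id, matching = counts.most_common(1)[0]
    return majority_id, matching, round(100.0 * matching / len(ids), 1)

# Hypothetical sample: 20 builds, 17 of which agree on one ID.
sample = ["aaa"] * 17 + ["bbb"] * 2 + ["ccc"]
print(majority_share(sample))
```

Note that when no ID has a clear majority, the "correct" ID picked this
way is fairly arbitrary.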

Further investigation suggests that the severity of the divergence
depends on the library GHC compiles:

    |---------------+--------+---------+-------|
    | package       | builds | correct |     % |
    |---------------+--------+---------+-------|
    | mtl-2.2.1     |    700 |     700 | 100.0 |
    | text-1.2.0.4  |   1655 |    1585 |  95.8 |
    | aeson-0.8.1.0 |    850 |     442 |  52.0 |
    |---------------+--------+---------+-------|

"mtl" is a relatively simple library code-wise, and we've had no
diverging IDs for that package at all. The "text" and "aeson"
libraries are of a different caliber, however. "aeson" in particular
has so many different IDs that it's becoming hard to decide which
one we should treat as the correct one!
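
As a quick sanity check, the per-package rows add up to the overall
figures in the first table. The sketch below only redoes that arithmetic
with the numbers quoted above; the variable names are my own.

```python
# (builds, correct) per package, copied from the table above.
per_package = {
    "mtl-2.2.1": (700, 700),
    "text-1.2.0.4": (1655, 1585),
    "aeson-0.8.1.0": (850, 442),
}

builds = sum(b for b, _ in per_package.values())
correct = sum(c for _, c in per_package.values())
overall = round(100.0 * correct / builds, 1)

# Per-package percentage of builds that produced the majority ID.
rates = {name: round(100.0 * c / b, 1) for name, (b, c) in per_package.items()}

print(builds, correct, overall)
print(rates)
```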

Those "aeson" builds provide an important insight into the nature
of the problem. The following table shows the number of distinct
library IDs assigned to "aeson" per build machine:

    |-------------------------------+--------+-----|
    | machine                       | builds | ids |
    |-------------------------------+--------+-----|
    | abbradar.net                  |    100 |  78 |
    | leroy.geek.nz                 |    100 |  78 |
    | mango.local                   |    100 |  71 |
    | work.cryp.to                  |     50 |  34 |
    | mobile.cryp.to                |     25 |  20 |
    | mono.rycee.net                |     25 |  17 |
    | archachatina.mtlaa.gebner.org |    100 |   1 |
    | c-cube.bennofs                |     25 |   1 |
    | jude.bio                      |     25 |   1 |
    | lin.wiwaxia.se                |    100 |   1 |
    | m-nix.wiwaxia.se              |    100 |   1 |
    | phreedom                      |    100 |   1 |
    |-------------------------------+--------+-----|

Some machines are all over the place and others generate the same ID
every time they compile. It turns out the difference between those
machines is the ability to do multi-threaded Haskell builds, i.e.
machines that use more than one CPU core appear far more likely to
generate diverging library IDs than those compiling the package with
one CPU core only.
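
The per-machine counts above boil down to grouping builds by machine and
counting distinct IDs. Here is a rough sketch of that grouping, assuming
each build is available as a (machine, library ID) pair; that record
shape, the machine names, and the IDs are hypothetical.

```python
from collections import defaultdict

def ids_per_machine(records):
    """Map each machine to (number of builds, number of distinct IDs)."""
    seen = defaultdict(list)
    for machine, library_id in records:
        seen[machine].append(library_id)
    return {m: (len(ids), len(set(ids))) for m, ids in seen.items()}

# Hypothetical records: one machine with diverging IDs, one without.
records = [
    ("multi.example", "id-a"),
    ("multi.example", "id-b"),
    ("multi.example", "id-a"),
    ("single.example", "id-a"),
    ("single.example", "id-a"),
]
print(ids_per_machine(records))
```

A machine that reports exactly one distinct ID here corresponds to the
rows with "1" in the ids column.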

To verify that hypothesis, I've run another set of tests on a machine
with 8 cores, but with multi-threaded Haskell builds disabled. The
result is quite clear:

    |---------------+--------------+--------+---------+-----|
    | package       | system       | builds | correct |   % |
    |---------------+--------------+--------+---------+-----|
    | aeson-0.8.1.0 | x86_64-linux |   1500 |    1500 | 100 |
    | text-1.2.0.4  | x86_64-linux |    500 |     500 | 100 |
    |---------------+--------------+--------+---------+-----|

I've thus committed a change [4] that disables multi-threading in all
Haskell library builds. Executables are still compiled utilizing more
than one core per Nix build, though. Let's hope the number of broken
builds we observe because of this bug henceforth goes down noticeably.

The data files used to compute those numbers are available at [1].
Thanks to everyone who contributed to the effort! I believe it's
been worthwhile. Let's do this again some time, e.g. when 7.10.2
comes out. :-)

Best regards,
Peter



[1] https://github.com/peti/ghc-library-id-bug
[2] https://ghc.haskell.org/trac/ghc/ticket/4012
[3] https://github.com/NixOS/nixpkgs-channels/commit/f93a8ee1105f4cc3770ce339a8c1a4acea3b2fb6
[4] https://github.com/NixOS/nixpkgs/commit/7e04b7319c54bf0a4c0b6b55caca80a3b7434a87


