[Nix-dev] Nix and Plash

Mark Seaborn mrs at mythic-beasts.com
Mon Mar 17 23:59:29 CET 2008


Here is a conversation between me and Ludovic Courtès about Nix
(http://nixos.org) and Plash.  I'm CC'ing the Zero-Install list in
case the comparisons with Zero-Install are of interest.

Mark

-----------------------------------------------------------------------

Subject: Re: Plash source URL
From: Mark Seaborn <mrs at mythic-beasts.com>
To: ludovic.courtes at inria.fr
Date: Wed, 12 Mar 2008 22:28:33 +0000 (GMT)

ludovic.courtes at inria.fr (Ludovic Courtès) wrote:

> > I could make the source link more prominent but unfortunately it is
> > not easy to build because of the need to build a custom glibc.  Do you
> > think that matters?
> 
> I'm planning to package Plash for NixOS (http://nixos.org/), which is
> why I needed access to the raw source.

Ah, I find Nix very interesting.  My understanding of Nix is that
every Nix expression is given a hash, and when you build a package the
hash for a Nix value gets substituted into strings so that the built
executables refer to filenames containing hashes.  If that's right,
how can you change a library without having to rebuild all the
packages that depend on it?  How would programs use PlashGlibc instead
of the normal glibc?

Regards,
Mark

-----------------------------------------------------------------------

Subject: Re: Plash source URL
From: ludovic.courtes at inria.fr (Ludovic Courtès)
To: Mark Seaborn <mrs at mythic-beasts.com>
Date: Thu, 13 Mar 2008 09:55:41 +0100

Hi,

Mark Seaborn <mrs at mythic-beasts.com> writes:

> Ah, I find Nix very interesting.  My understanding of Nix is that
> every Nix expression is given a hash, and when you build a package the
> hash for a Nix value gets substituted into strings so that the built
> executables refer to filenames containing hashes.

Exactly.  There's no `/lib', `/usr', and `/bin' only contains `sh'.  :-)

In a way, it removes the ambient authority that stems from these
catch-all directories.  One thing is that `pola-run''s coarse-grained
`-B' option could be substituted by something more accurate that would
map only the dependencies of the given executable into its file system
(dependencies can be obtained using "nix-store -q --references
/nix/store/the-hash-path").

Overall, I think there's interesting stuff to be done by integrating
Plash and NixOS.

> If that's right, how can you change a library without having to
> rebuild all the packages that depend on it?

You can't, and that's the very purpose of Nix: if two Nix expressions
are different (e.g., because they download a different source tarball),
they yield a different hash (or "store path" in Nix terms).  The intent
is to have fully deterministic and reproducible builds, thanks to these
unambiguous dependency specifications.

Nix also provides garbage collection of store paths.

> How would programs use PlashGlibc instead of the normal glibc?

I haven't looked at this in detail yet, and it will need some thought.  For
instance, on NixOS I have:

  $ ldd `which ls`
  linux-gate.so.1 =>  (0xffffe000)
  librt.so.1 => /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/librt.so.1 (0xb7ef7000)
  libc.so.6 => /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libc.so.6 (0xb7dbf000)
  libpthread.so.0 => /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0 (0xb7da7000)
  /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/ld-linux.so.2 (0xb7f0e000)

So extra machinery will be needed so that Plash maps the libc and loader
in the right place in the chroot.  It will boil down to using this:

  $ nix-store -q --references `which ls`
  /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7
  /nix/store/cwf53pwwkfjnay3bidd0wy6spgb7cw6q-coreutils-6.10

Of course, details on how to achieve this are more than welcome.  :-)
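
A minimal sketch, in Python, of collecting everything that would have to
be mapped into the sandbox for one executable.  It assumes the direct
references of each store path are already known (for example, gathered
one path at a time with the `nix-store -q --references' query shown
above); the REFERENCES table below is made up for illustration, and how
the resulting list would be handed to Plash is deliberately left open.

```python
from collections import deque

GLIBC = "/nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7"
COREUTILS = "/nix/store/cwf53pwwkfjnay3bidd0wy6spgb7cw6q-coreutils-6.10"

# Assumed for illustration: direct references of each store path.
# A path may refer to itself; glibc's own references are left empty here.
REFERENCES = {
    COREUTILS: [GLIBC, COREUTILS],
    GLIBC: [],
}


def closure(roots, references):
    """Return every store path reachable from `roots`, sorted."""
    seen = set()
    queue = deque(roots)
    while queue:
        path = queue.popleft()
        if path in seen:
            continue  # already visited (also handles self-references)
        seen.add(path)
        queue.extend(references.get(path, ()))
    return sorted(seen)


paths_to_map = closure([COREUTILS], REFERENCES)
```

Only the paths in `paths_to_map' would need to be visible inside the
sandbox, rather than a whole catch-all directory.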

Thanks,
Ludovic.

-----------------------------------------------------------------------

Subject: Re: Plash source URL
From: Mark Seaborn <mrs at mythic-beasts.com>
To: ludovic.courtes at inria.fr
Date: Fri, 14 Mar 2008 19:29:37 +0000 (GMT)

ludovic.courtes at inria.fr (Ludovic Courtès) wrote:

> Hi,
> 
> Mark Seaborn <mrs at mythic-beasts.com> writes:
> 
> > Ah, I find Nix very interesting.  My understanding of Nix is that
> > every Nix expression is given a hash, and when you build a package the
> > hash for a Nix value gets substituted into strings so that the built
> > executables refer to filenames containing hashes.
> 
> Exactly.  There's no `/lib', `/usr', and `/bin' only contains `sh'.  :-)

Hmm, how does /bin/sh get versioned?

> In a way, it removes the ambient authority that stems from these
> catch-all directories.  One thing is that `pola-run''s coarse-grained
> `-B' option could be substituted by something more accurate that would
> map only the dependencies of the given executable into its file system
> (dependencies can be obtained using "nix-store -q --references
> /nix/store/the-hash-path").

Providing an alternative to the -B option is definitely something I am
aiming towards.  There is a package system for Plash which is based on
Debian packages (see http://plash.beasts.org/wiki/PackageSystem).  It
will follow the dependencies of Debian packages and create a
chroot-like environment containing the packages' files.  It currently
works by hard-linking files.  You could try that, or implement a
user-space directory object for the /nix/store directory.  It's
possible to implement Plash directory objects in Python.

I suppose one advantage that Plash would give you is that you wouldn't
need root access to create and populate a /nix/store directory.

How does Nix work out what the dependencies for an executable or a
package are?  Is it simply a worst-case estimate, so that, for
example, it requires the glibc source to be present in /nix/store to
run most packages?


> Overall, I think there's interesting stuff to be done by integrating
> Plash and NixOS.
> 
> > If that's right, how can you change a library without having to
> > rebuild all the packages that depend on it?
> 
> You can't, and that's the very purpose of Nix: if two Nix expressions
> are different (e.g., because they download a different source tarball),
> they yield a different hash (or "store path" in Nix terms).  The intent
> is to have fully deterministic and reproducible builds, thanks to these
> unambiguous dependency specifications.
> 
> Nix also provides garbage collection of store paths.
> 
> > How would programs use PlashGlibc instead of the normal glibc?

I think Nix's architecture is going to make this awkward.  You would
have to rebuild everything that you want to run under Plash, and you
would probably have to build it under Plash.

That would make it very difficult to develop PlashGlibc under Nix.  In
the worst case, rebuilding everything could take hours or even a day.

Maybe Nix is not intended to be used for normal development?  That
would be fair enough, because Debian packages are very awkward for
development too.  However, my ideal is a packaging system that can be
used during normal development as well as real deployment, so that the
development system is as close as possible to the real system, to make
sure that deployment problems can be caught as soon as possible.  (In
my day job, our development environment is subtly different from the
deployment environment in terms of environment variables and files in
/etc, which occasionally causes problems that we should really be
finding much earlier.)

I think Nix is preventing something that is legitimate here, and I
don't think this constraint is necessary for reproducible builds.
It's legitimate to build an executable E against one version of a
library, L1 (header files + .so file), and then dynamically link it
against another version of the library, L2 (just a .so file), when the
ABI is compatible.  You can record that E was produced by building
against L1.  When you produce a deployment image, you can specify that
E is to be linked against L2.  That is all reproducible.

I believe Zero-Install allows this information to be recorded.  It
also allows it to be omitted, which means it can avoid an infinite
regress / bootstrap problem.  That is, the process by which a version
of a C compiler got built involves a chain of C (or other) compilers
stretching back into time, which we can never specify completely.
(One of which might contain Ken Thompson's compiler backdoor -- it's
hard to tell. :-) )

Nix however doesn't allow L1 != L2.  I'm not entirely sure of the
exact mechanics through which hash paths get included in executables
and libraries, whether

  a. It falls out as a natural consequence of how Unix build tools
  usually work.  e.g. You pass in a pathname (containing a hash)
  referencing L1 to ./configure, which naturally gets included in the
  built executables.  or:

  b. The path gets added in by some explicit Nix packaging step.  I
  saw some reference to changing executables to add rpath fields (not
  sure of the details here).


> I haven't looked at this in detail yet, and it will need some thought.  For
> instance, on NixOS I have:
> 
>   $ ldd `which ls`

How does PATH get set up to include "ls"?  Is the same mechanism used
to set up other variables such as PYTHONPATH?  Will packages depend on
executables being in PATH or are they supposed to refer to them by
their hashes?

If the answers are in the docs or the papers feel free to tell me to
stop being lazy and look there. :-)

Cheers,
Mark

-----------------------------------------------------------------------

Subject: Re: Plash source URL
From: ludovic.courtes at inria.fr (Ludovic Courtès)
To: Mark Seaborn <mrs at mythic-beasts.com>
Date: Sat, 15 Mar 2008 12:01:24 +0100

Hi Mark,

Mark Seaborn <mrs at mythic-beasts.com> writes:

> Hmm, how does /bin/sh get versioned?

It's a symlink:

  $ ls -l /bin/sh 
  lrwxrwxrwx 1 root root 63 2008-03-15 11:19 /bin/sh -> /nix/store/9wjz27czpbjg09w7brfngfnkc4zg23f0-bash-3.2-p33/bin/sh

> Providing an alternative to the -B option is definitely something I am
> aiming towards.  There is a package system for Plash which is based on
> Debian packages (see http://plash.beasts.org/wiki/PackageSystem).  It
> will follow the dependencies of Debian packages and create a
> chroot-like environment containing the packages' files.  It currently
> works by hard-linking files.  You could try that, or implement a
> user-space directory object for the /nix/store directory.  It's
> possible to implement Plash directory objects in Python.

Yes.  A simple shell script would do as well (I don't usually write
Python code ;-)).

> I suppose one advantage that Plash would give you is that you wouldn't
> need root access to create and populate a /nix/store directory.

We already don't need it.  :-)

Every user can install a package in its "environment".  User
environments are basically a collection of symlinks under
`~/.nix-profile', which aggregates the contents of the various packages
(a bit à la GNU Stow).
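
A rough sketch, in Python, of the Stow-like idea: a profile directory
that contains nothing but symlinks into per-package directories.  The
directory names and the `build_profile' helper are hypothetical, and the
real layout of a Nix profile differs in detail; this only illustrates
the aggregation.

```python
import os
import tempfile


def build_profile(profile_dir, package_dirs):
    """Symlink every file of every package into profile_dir."""
    for pkg in package_dirs:
        for dirpath, _dirnames, filenames in os.walk(pkg):
            rel = os.path.relpath(dirpath, pkg)
            target_dir = os.path.normpath(os.path.join(profile_dir, rel))
            os.makedirs(target_dir, exist_ok=True)
            for name in filenames:
                link = os.path.join(target_dir, name)
                if not os.path.lexists(link):  # first package wins on a clash
                    os.symlink(os.path.join(dirpath, name), link)


# Two fake "packages" in a scratch directory (names are made up).
root = tempfile.mkdtemp()
pkg_a = os.path.join(root, "store", "aaaa-hello-1.0")
pkg_b = os.path.join(root, "store", "bbbb-world-2.0")
for pkg, tool in ((pkg_a, "hello"), (pkg_b, "world")):
    os.makedirs(os.path.join(pkg, "bin"))
    with open(os.path.join(pkg, "bin", tool), "w") as f:
        f.write("#!/bin/sh\n")

profile = os.path.join(root, "profile")
build_profile(profile, [pkg_a, pkg_b])
```

Putting `profile/bin' in PATH would then make both tools visible, while
the files themselves stay in their own package directories.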

Packages that are installed are either (i) substituted (i.e., you get to
install a pre-compiled binary from the source of your choice), or (ii)
built by a system-wide daemon on behalf of the requesting user.  The
build daemon is part of the TCB, so that all users can trust that the
output it produces really corresponds to the given input Nix expression.

There's a nice paper on this topic:

  http://www.cs.uu.nl/~eelco/pubs/secsharing-ase2005-final.pdf

> How does Nix work out what the dependencies for an executable or a
> package are?  Is it simply a worst-case estimate, so that, for
> example, it requires the glibc source to be present in /nix/store to
> run most packages?

First, one has to specify all the dependencies of a package in the Nix
expression that describes how to build it (the builder daemon can build
in a chroot, to make sure you don't inadvertently get to use a
dependency that was not declared).

Once the package is built, it is installed and scanned for
references to `/nix/store'---just like what a GC does.  All these
references are marked as dependencies of the store path.
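
A minimal sketch of such a scan, in Python.  It assumes the scanner is
given the list of declared build inputs and simply looks for the hash
part of each one in the raw bytes of the output; the second candidate
path and the fake file contents are invented for the example.

```python
STORE_PREFIX = "/nix/store/"

GLIBC = "/nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7"
# Made-up store path standing in for a declared but unused build input.
UNUSED = "/nix/store/00000000000000000000000000000000-unused-1.0"


def hash_part(store_path):
    """Return the hash component that follows /nix/store/ in a store path."""
    return store_path[len(STORE_PREFIX):].split("-", 1)[0]


def scan_references(output_bytes, candidate_paths):
    """Return the candidates whose hash occurs anywhere in output_bytes."""
    return [
        path
        for path in candidate_paths
        if hash_part(path).encode("ascii") in output_bytes
    ]


# Pretend binary that embeds the path of its C library.
fake_binary = b"\x7fELF\x00" + GLIBC.encode("ascii") + b"/lib/libc.so.6\x00"

found = scan_references(fake_binary, [GLIBC, UNUSED])
```

The scan does not parse the file format at all, which is what makes it
resemble a conservative garbage collector.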

Glibc is just a package among many other packages, without any special
treatment.

> I think Nix's architecture is going to make this awkward.  You would
> have to rebuild everything that you want to run under Plash, and you
> would probably have to build it under Plash.
>
> That would make it very difficult to develop PlashGlibc under Nix.  In
> the worst case, rebuilding everything could take hours or even a day.

I suppose we can have one PlashGlibc per Glibc.  I need to think more
about it.

> I think Nix is preventing something that is legitimate here, and I
> don't think this constraint is necessary for reproducible builds.
> It's legitimate to build an executable E against one version of a
> library, L1 (header files + .so file), and then dynamically link it
> against another version of the library, L2 (just a .so file), when the
> ABI is compatible.  You can record that E was produced by building
> against L1.  When you produce a deployment image, you can specify that
> E is to be linked against L2.  That is all reproducible.

Nix aims for exact reproducibility.  If two libraries are different at
the bit level, then they probably lead to different behaviors; IOW, L1
and L2 might be ABI-compatible, but their implementations still
differ.

Nix takes a radical approach where changing a single bit in a component
yields a "different" component, and allows dependencies to be expressed
this accurately.

> I believe Zero-Install allows this information to be recorded.  It
> also allows it to be omitted, which means it can avoid an infinite
> regress / bootstrap problem.  That is, the process by which a version
> of a C compiler got built involves a chain of C (or other) compilers
> stretching back into time, which we can never specify completely.
> (One of which might contain Ken Thompson's compiler backdoor -- it's
> hard to tell. :-) )

NixOS addresses this by pre-compiling a bootstrap environment (with
Linux, GCC, Bash, etc.), which is part of the installation CD for
instance.

> Nix however doesn't allow L1 != L2.  I'm not entirely sure of the
> exact mechanics through which hash paths get included in executables
> and libraries, whether

An installed component is named by its hash (not the hash of its
source), modulo self-references.

>   a. It falls out as a natural consequence of how Unix build tools
>   usually work.  e.g. You pass in a pathname (containing a hash)
>   referencing L1 to ./configure, which naturally gets included in the
>   built executables.  or:

Yes.

>   b. The path gets added in by some explicit Nix packaging step.  I
>   saw some reference to changing executables to add rpath fields (not
>   sure of the details here).

No.  But...

Since you can't know the output hash beforehand, i.e., when you invoke
`configure', it goes like this:

  1. You run `./configure --prefix=/nix/store/some-random-value'

  2. `make install'

  3. Compute hash of the output (all files that got installed) modulo
     references to `some-random-value'.

  4. Replace references to `some-random-value' in all files with the
     actual hash computed in Step 3.
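
Steps 3 and 4 can be sketched in Python as follows.  This is only an
illustration under stated assumptions: SHA-256 truncated to the length
of the placeholder stands in for whatever hash and encoding Nix really
uses, "modulo references" is modelled by zeroing the placeholder before
hashing, and `fake_build' is a stand-in for `configure' and `make
install'.

```python
import hashlib


def finalize(files, placeholder):
    """files maps names to bytes; return (final_hash, rewritten_files)."""
    zeros = b"0" * len(placeholder)
    h = hashlib.sha256()
    for name in sorted(files):
        # Step 3: hash the contents with the placeholder blanked out.
        h.update(name.encode("utf-8") + b"\0")
        h.update(files[name].replace(placeholder, zeros) + b"\0")
    # Same length as the placeholder, so file offsets are unchanged.
    final = h.hexdigest()[: len(placeholder)].encode("ascii")
    # Step 4: substitute the real hash for the placeholder everywhere.
    rewritten = {n: d.replace(placeholder, final) for n, d in files.items()}
    return final, rewritten


def fake_build(token):
    """Pretend build output that refers to its own (temporary) prefix."""
    return {
        "bin/hello": b"#!/bin/sh\nexec /nix/store/" + token
        + b"-hello/libexec/hello-real\n",
        "libexec/hello-real": b"\x7fELF pretend binary",
    }


token_a = b"a" * 32
token_b = b"b" * 32
hash_a, out_a = finalize(fake_build(token_a), token_a)
hash_b, out_b = finalize(fake_build(token_b), token_b)
```

Two builds started with different random values end up with the same
final hash and byte-identical files, which is the point of hashing
modulo the self-reference.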

There's a paper explaining this better than I do, and surely Eelco's
thesis describes it:

  http://www.cs.uu.nl/~eelco/pubs/phd-thesis.pdf

> How does PATH get set up to include "ls"?  Is the same mechanism used
> to set up other variables such as PYTHONPATH?  Will packages depend on
> executables being in PATH or are they supposed to refer to them by
> their hashes?

Nope, each user just has its `~/.nix-profile' in the PATH (as well as a
system-wide profile).

> If the answers are in the docs or the papers feel free to tell me to
> stop being lazy and look there. :-)

I can just invite you to browse through
http://nix.cs.uu.nl/docs/papers.html .  There are a lot of well written
papers on the topic.  :-)

Thanks,
Ludovic.

-----------------------------------------------------------------------

Subject: Re: Plash source URL
From: Mark Seaborn <mrs at mythic-beasts.com>
To: ludovic.courtes at inria.fr
Date: Sun, 16 Mar 2008 12:41:30 +0000 (GMT)

ludovic.courtes at inria.fr (Ludovic Courtès) wrote:

> > I suppose one advantage that Plash would give you is that you wouldn't
> > need root access to create and populate a /nix/store directory.
> 
> We already don't need it.  :-)

How do you create a /nix directory (in the root filesystem) without
root access?  Even if this directory is populated by a non-root
process, that process would be part of the TCB for all users that use
the /nix directory.

> Every user can install a package in its "environment".  User
> environments are basically a collection of symlinks under
> `~/.nix-profile', which aggregates the contents of the various packages
> (a bit à la GNU Stow).
> 
> Packages that are installed are either (i) substituted (i.e., you get to
> install a pre-compiled binary from the source of your choice), or (ii)
> built by a system-wide daemon on behalf of the requesting user.  The
> build daemon is part of the TCB, so that all users can trust that the
> output it produces really corresponds to the given input Nix expression.

I would have thought a build daemon would only be in the TCB for those
users who use its output.  And if you run code with minimal authority
the build daemons don't have to be in the TCB at all.


> > I think Nix is preventing something that is legitimate here, and I
> > don't think this constraint is necessary for reproducible builds.
> > It's legitimate to build an executable E against one version of a
> > library, L1 (header files + .so file), and then dynamically link it
> > against another version of the library, L2 (just a .so file), when the
> > ABI is compatible.  You can record that E was produced by building
> > against L1.  When you produce a deployment image, you can specify that
> > E is to be linked against L2.  That is all reproducible.
> 
> Nix aims for exact reproducibility.  If two libraries are different at
> the bit level, then they probably lead to different behaviors; IOW, L1
> and L2 might be ABI-compatible, but their implementations still
> differ.

But as I say, if you always get E by building against L1 and always
run E by linking it against L2, you should always get the same result.
I think you are mixing up reproducibility with something else.  It
might be a good idea to link executables with the same library version
they were built against, to avoid accidental ABI incompatibility, but
I don't see why we should limit ourselves to always doing that.

Does Nix do anything about dependencies on kernel versions?  Can you
upgrade the kernel without rebuilding all your software?


> > Nix however doesn't allow L1 != L2.  I'm not entirely sure of the
> > exact mechanics through which hash paths get included in executables
> > and libraries, whether
> 
> An installed component is named by its hash (not the hash of its
> source), modulo self-references.

OK.  I think this has changed from when I first looked at Nix (perhaps
a year ago), when Nix used the build input hash rather than the build
output hash.


> >   b. The path gets added in by some explicit Nix packaging step.  I
> >   saw some reference to changing executables to add rpath fields (not
> >   sure of the details here).
> 
> No.  But...
> 
> Since you can't know the output hash beforehand, i.e., when you invoke
> `configure', it goes like this:
> 
>   1. You run `./configure --prefix=/nix/store/some-random-value'
> 
>   2. `make install'
> 
>   3. Compute hash of the output (all files that got installed) modulo
>      references to `some-random-value'.
> 
>   4. Replace references to `some-random-value' in all files with the
>      actual hash computed in Step 3.

Rewriting hashes inside files without regard to the file format
strikes me as a bit dodgy.  I would not like to rely on it.

On the other hand, you could use hash rewriting to substitute
PlashGlibc for the normal glibc, but it's not clear to me how that
would fit with the rest of the system.


> > How does PATH get set up to include "ls"?  Is the same mechanism used
> > to set up other variables such as PYTHONPATH?  Will packages depend on
> > executables being in PATH or are they supposed to refer to them by
> > their hashes?
> 
> Nope, each user just has its `~/.nix-profile' in the PATH (as well as a
> system-wide profile).

How does Nix handle Python modules?  (Googling for "Python" under
nix.cs.uu.nl didn't turn up anything.)

The manual (http://nix.cs.uu.nl/dist/nix/nix-0.11/manual/) shows a
build script setting PATH:

PATH=$perl/bin:$PATH

It's not clear to me how often setting env vars has to be done with
Nix.  The manual says that the environment is cleared out when running
builders, but I suppose the PATH containing ~/.nix-profile is
inherited when running applications normally?

Regards,
Mark

-----------------------------------------------------------------------

Subject: Re: Plash source URL
From: ludovic.courtes at inria.fr (Ludovic Courtès)
To: Mark Seaborn <mrs at mythic-beasts.com>
Date: Sun, 16 Mar 2008 21:31:04 +0100

Hi Mark,

Mark Seaborn <mrs at mythic-beasts.com> writes:

> How do you create a /nix directory (in the root filesystem) without
> root access?

It's created and populated by the build daemon on behalf of users.

> Even if this directory is populated by a non-root process, that
> process would be part of the TCB for all users that use the /nix
> directory.

Yes, the builder daemon is part of the TCB.

> I would have thought a build daemon would only be in the TCB for those
> users who use its output.

Well, every user uses its output.  The alternative is to build packages
by hand and install them under `$HOME'.

> And if you run code with minimal authority
> the build daemons don't have to be in the TCB at all.

The main reason why the build daemon has to be in the TCB is that it
accesses a global name space, the `/nix/store' directory, and we want to
make sure that the hash that's in a path really is the hash of its
contents.  This guarantee allows sharing among users of the system (see
"Secure Sharing...", whose reference I gave earlier).

I'm not sure what you meant above.

> But as I say, if you always get E by building against L1 and always
> run E by linking it against L2, you should always get the same result.
> I think you are mixing up reproducibility with something else.  It
> might be a good idea to link executables with the same library version
> they were built against, to avoid accidental ABI incompatibility, but
> I don't see why we should limit ourselves to always doing that.

NixOS' goal is to be able to allow the exact reproduction of whole
system configurations.  It does so by listing precisely and
unambiguously the dependencies of all aspects of the configuration.  The
underlying assumption is that the behavior of a software component is
determined by all its build inputs.

Debian's approach (and that of most other distros) is to say, roughly,
"if L1 and L2 are ABI-equivalent, then they are the same".  Thus,
packages of L1 and L2 have the same name, unlike on NixOS.  Now, assume
you once had a version of `evince' that worked fine with `libpoppler0';
suddenly, `libpoppler0' is upgraded and `evince' no longer works, but
it's hard to tell what was the previous configuration that did work, let
alone rolling back to it.

> Does Nix do anything about dependencies on kernel versions?  Can you
> upgrade the kernel without rebuilding all your software?

Yes.  In other words, the kernel is not a build input of any
user-space component (glibc depends on the kernel headers only, which
are not necessarily exactly those of the currently installed kernel).

Hmm, as I write this, I realize that this is a "breach" in the model.
Probably worth debating on `nix-dev'.  ;-)

> Rewriting hashes inside files without regard to the file format
> strikes me as a bit dodgy.  I would not like to rely on it.

Actually the authors had the same reaction before they experimented with
it.  ;-)  There's a paper explaining this story, too.

> How does Nix handle Python modules?  (Googling for "Python" under
> nix.cs.uu.nl didn't turn up anything.)

Python modules, Perl modules, Guile modules, and executables (for shell
scripts) are all instances of the same problem, so let's illustrate it
with a shell script.

The `dhclient-script' shell script, for instance, relies on `ifconfig'
and other executables from net-tools.  However, there's nothing
guaranteeing that ${nettools}/bin is in the PATH when that script is
invoked.  Thus, at installation time, instead of installing the raw
script, a wrapper is installed that does:

  PATH=${nettools}/bin:$PATH
  exec the-real-dhclient-script

This ensures that the script always finds the executables it's referring
to.

The same is done with `PYTHONPATH', `PERL5LIB', `GUILE_LOAD_PATH', etc.
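
A small sketch, in Python, of generating such a wrapper for any of
these variables.  The `make_wrapper' helper and the store path with the
literal word HASH in it are hypothetical; this is not the code Nix
itself uses.

```python
import shlex


def make_wrapper(real_script, var, extra_dirs):
    """Return a /bin/sh wrapper that prepends extra_dirs to var."""
    prefix = ":".join(extra_dirs)
    return (
        "#!/bin/sh\n"
        f"{var}={shlex.quote(prefix)}:${var}\n"
        f"export {var}\n"
        f"exec {shlex.quote(real_script)} \"$@\"\n"
    )


# HASH is a placeholder, not a real store hash.
wrapper = make_wrapper(
    "the-real-dhclient-script",
    "PATH",
    ["/nix/store/HASH-net-tools/bin"],
)
```

Calling the same helper with `PYTHONPATH' or `PERL5LIB' as the variable
name gives the equivalent wrapper for a Python or Perl program.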


I'm writing in a hurry, so I think the papers and the mailing list may
be able to provide better explanations.  ;-)

Thanks,
Ludovic.

-----------------------------------------------------------------------

Subject: Re: Plash source URL
From: Mark Seaborn <mrs at mythic-beasts.com>
To: ludovic.courtes at inria.fr
Date: Mon, 17 Mar 2008 14:24:48 +0000 (GMT)

ludovic.courtes at inria.fr (Ludovic Courtès) wrote:

> Mark Seaborn <mrs at mythic-beasts.com> writes:
> 
> > How do you create a /nix directory (in the root filesystem) without
> > root access?
> 
> It's created and populated by the build daemon on behalf of users.
> 
> > Even if this directory is populated by a non-root process, that
> > process would be part of the TCB for all users that use the /nix
> > directory.
> 
> Yes, the builder daemon is part of the TCB.
> 
> > I would have thought a build daemon would only be in the TCB for those
> > users who use its output.
> 
> Well, every user uses its output.  The alternative is to build packages
> by hand and install them under `$HOME'.

So I could use "/home/foo/nix" instead of "/nix"?

> > And if you run code with minimal authority
> > the build daemons don't have to be in the TCB at all.
> 
> The main reason why the build daemon has to be in the TCB is that it
> accesses a global name space, the `/nix/store' directory, and we want to
> make sure that the hash that's in a path really is the hash of its
> contents.  This guarantee allows sharing among users of the system (see
> "Secure Sharing...", whose reference I gave earlier).
> 
> I'm not sure what you meant above.

It sounds like the build daemon is doing two things:
 * populating the system-wide "/nix" directory
 * carrying out the builds

What I'm getting at is that these tasks could be carried out by two
components (which is what must happen when using the result of a build
daemon on another machine anyway).

The component that populates the system's "/nix" directory only has to
verify hashes of files.  The users on the system only need to trust it
to verify hashes.  Since users share a single system-wide (or at
least, per-chroot) "/nix" directory, it needs to be populated by a
component that all the users can trust.  That component needs to
acquire the right to write into "/nix", which requires it to be given
that authority by the root user.  But if you were using Plash, each
user could have their own "/nix" directory (since the file namespace
is virtualised).  Each user could run their own process to populate
their private "/nix" directory and that wouldn't require root access.
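
A minimal sketch, in Python, of the "only verifies hashes" component.
It assumes entries are named by a plain SHA-256 of their contents, which
is a simplification, and the `Store' class is hypothetical; the point is
only that the trusted part can be this small, whoever did the build.

```python
import hashlib


class Store:
    """Admits content only if it matches the hash it is filed under."""

    def __init__(self):
        self.entries = {}

    def add(self, claimed_hash, content):
        actual = hashlib.sha256(content).hexdigest()
        if actual != claimed_hash:
            # An untrusted builder lied, or the data was corrupted.
            raise ValueError("content does not match claimed hash")
        self.entries[claimed_hash] = content
        return claimed_hash


store = Store()
good = b"output of some build"
good_hash = hashlib.sha256(good).hexdigest()
store.add(good_hash, good)

try:
    store.add(good_hash, b"tampered output")
    rejected = False
except ValueError:
    rejected = True
```

The builder that produced `good' never needs write access to the store;
it only hands over bytes and a claimed name.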

A process is in my TCB if I am trusting it with all my authority.  If
I run a binary package for Firefox in a sandbox such that I grant it
minimal authority, then the build daemon that built Firefox is not
necessarily in my TCB.  But the build daemon would have been in my TCB
if I had run Firefox unsandboxed, with all my authority.


> > But as I say, if you always get E by building against L1 and always
> > run E by linking it against L2, you should always get the same result.
> > I think you are mixing up reproducibility with something else.  It
> > might be a good idea to link executables with the same library version
> > they were built against, to avoid accidental ABI incompatibility, but
> > I don't see why we should limit ourselves to always doing that.
> 
> NixOS' goal is to be able to allow the exact reproduction of whole
> system configurations.  It does so by listing precisely and
> unambiguously the dependencies of all aspects of the configuration.  The
> underlying assumption is that the behavior of a software component is
> determined by all its build inputs.
> 
> Debian's approach (and that of most other distros) is to say, roughly,
> "if L1 and L2 are ABI-equivalent, then they are the same".  Thus,
> packages of L1 and L2 have the same name, unlike on NixOS.  Now, assume
> you once had a version of `evince' that worked fine with `libpoppler0';
> suddenly, `libpoppler0' is upgraded and `evince' no longer works, but
> it's hard to tell what was the previous configuration that did work, let
> alone rolling back to it.

You're missing my point.  I'll try to explain it differently.

Obviously the Debian packaging system has problems.  However, it does
provide one way to specify a version of "libpoppler0" that "evince"
works with: the "Packages" file, which lists a set of .deb packages
and includes their hashes.  If I create a Debian package repository
based on that Packages file, debootstrap from it and "apt-get install
evince" inside the resulting chroot, I will get a reproducible set of
packages installed.  This is awkward, yes, but it is reproducible.
The Plash package tools make it easier to set up an environment just
to run Evince without having to use chroots.

Library versions L1 and L2 don't have the same name from the point of
view of a Packages file, which names them by hash.  The fact that L1
and L2 can't be installed at the same time in the same chroot is a
limitation which is unrelated to reproducibility.

As I see it, there are two issues for reproducibility:
 * Recording sets of binary packages so that a deployment setup can be
   reproduced.  With Debian you can do this with a Packages file.
 * Recording how a binary package was built so that the binary package
   can be reproduced.  Debian does not provide a way to do this, but
   it could do this by referring back to an immutable Packages file
   listing the packages that were installed when the package was
   built.


> > Does Nix do anything about dependencies on kernel versions?  Can you
> > upgrade the kernel without rebuilding all your software?
> 
> Yes.  In other words, the kernel is not a build input of any
> user-space component (glibc depends on the kernel headers only, which
> are not necessarily exactly those of the currently installed kernel).
> 
> Hmm, as I write this, I realize that this is a "breach" in the model.
> Probably worth debating on `nix-dev'.  ;-)

Right, that's what I was getting at.  I think Nix's architecture is
neither necessary nor sufficient for 100% reproducible builds.  This
is an example of why it's not sufficient.

I'll send a copy of our discussion to nix-dev and CC the Plash and
Zero-Install mailing lists, if that's okay with you?


> > How does Nix handle Python modules?  (Googling for "Python" under
> > nix.cs.uu.nl didn't turn up anything.)
> 
> Python modules, Perl modules, Guile modules, and executables (for shell
> scripts) are all instances of the same problem, so let's illustrate it
> with a shell script.
> 
> The `dhclient-script' shell script, for instance, relies on `ifconfig'
> and other executables from net-tools.  However, there's nothing
> guaranteeing that ${nettools}/bin is in the PATH when that script is
> invoked.  Thus, at installation time, instead of installing the raw
> script, a wrapper is installed that does:
> 
>   PATH=${nettools}/bin:$PATH
>   exec the-real-dhclient-script

OK.  This is quite similar to how Zero-Install does things, except
that it is a built-in part of Zero-Install.

Zero-Install handles libraries this way by setting LD_LIBRARY_PATH.
If Nix did the same, substituting hashes inside binary files might
become unnecessary.
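
To make the wrapper idea concrete, here is a minimal sketch of a
generator for such a wrapper.  It is an illustration only: the function
name `make_wrapper' and the store paths are hypothetical, and this is
not how Nix or Zero-Install actually implement it.

```python
import shlex


def make_wrapper(real_script: str, extra_bin_dirs: list[str]) -> str:
    """Return the text of a POSIX sh wrapper that prepends
    extra_bin_dirs to PATH and then execs the real script."""
    prefix = ":".join(extra_bin_dirs)
    return (
        "#!/bin/sh\n"
        f"PATH={shlex.quote(prefix)}:$PATH\n"
        "export PATH\n"
        f'exec {shlex.quote(real_script)} "$@"\n'
    )


# Hypothetical store paths, for illustration only.
wrapper = make_wrapper(
    "/nix/store/bbbb-dhcp/sbin/the-real-dhclient-script",
    ["/nix/store/aaaa-nettools/bin"],
)
print(wrapper)
```

The same shape would apply to LD_LIBRARY_PATH: only the variable name
and the directories change.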

Regards,
Mark

-----------------------------------------------------------------------

Subject: Re: Plash source URL
From: ludovic.courtes at inria.fr (Ludovic Courtès)
To: Mark Seaborn <mrs at mythic-beasts.com>
Date: Mon, 17 Mar 2008 16:14:02 +0100

Hi Mark,

Mark Seaborn <mrs at mythic-beasts.com> writes:

> ludovic.courtes at inria.fr (Ludovic Courtès) wrote:

>> Well, every user uses its output.  The alternative is to build packages
>> by hand and install them under `$HOME'
>
> So I could use "/home/foo/nix" instead of "/nix"?

No no no.  By "by hand", I really mean: wget foo.tar.gz, tar xzvf
foo.tar.gz, cd foo && ./configure --prefix=$HOME/local && make install.
IOW, without support from Nix.

> It sounds like the build daemon is doing two things:
>  * populating the system-wide "/nix" directory
>  * carrying out the builds

Exactly.

> What I'm getting at is that these tasks could be carried out by two
> components (which is what must happen when using the result of a build
> daemon on another machine anyway).

Yes.

But the daemon actually does another (optional) thing: build within a
chroot whose file system contains only the declared build inputs of the
package being built.  And `chroot(2)' is a privileged operation.

Admittedly, Plash would remove the need for this part of the build
daemon---or rather, we'd remove that part of the build daemon from the
TCB and add Plash instead.

> The component that populates the system's "/nix" directory only has to
> verify hashes of files.  The users on the system only need to trust it
> to verify hashes.  Since users share a single system-wide (or at
> least, per-chroot) "/nix" directory, it needs to be populated by a
> component that all the users can trust.  That component needs to
> acquire the right to write into "/nix", which requires it to be given
> that authority by the root user.  But if you were using Plash, each
> user could have their own "/nix" directory (since the file namespace
> is virtualised).  Each user could run their own process to populate
> their private "/nix" directory and that wouldn't require root access.

In practice, you want to avoid having one Nix store per user in order to
maximize sharing: you definitely don't want to have a copy of Glibc,
OpenOffice.org, etc., for each user.

Having a global, shared Nix store also provides single-instance storage
and, alongside the trusted build daemon, secure sharing.

>> NixOS' goal is to be able to allow the exact reproduction of whole
>> system configurations.  It does so by listing precisely and
>> unambiguously the dependencies of all aspects of the configuration.  The
>> underlying assumption is that the behavior of a software component is
>> determined by all its build inputs.
>> 
>> Debian's approach (and that of most other distros) is to say, roughly,
>> "if L1 and L2 are ABI-equivalent, then they are the same".  Thus,
>> packages of L1 and L2 have the same name, unlike on NixOS.  Now, assume
>> you once had a version of `evince' that worked fine with `libpoppler0';
>> suddenly, `libpoppler0' is upgraded and `evince' no longer works, but
>> it's hard to tell what the previous working configuration was, let
>> alone roll back to it.
>
> You're missing my point.  I'll try to explain it differently.
>
> Obviously the Debian packaging system has problems.  However, it does
> provide one way to specify a version of "libpoppler0" that "evince"
> works with: the "Packages" file, which lists a set of .deb packages
> and includes their hashes.  If I create a Debian package repository
> based on that Packages file, debootstrap from it and "apt-get install
> evince" inside the resulting chroot, I will get a reproducible set of
> packages installed.  This is awkward, yes, but it is reproducible.
> The Plash package tools make it easier to set up an environment just
> to run Evince without having to use chroots.
>
> Library versions L1 and L2 don't have the same name from the point of
> view of a Packages file, which names them by hash.  The fact that L1
> and L2 can't be installed at the same time in the same chroot is a
> limitation which is unrelated to reproducibility.

He he, sure you can do anything with `.debs', provided you are careful
enough.  The thing is, a Debian `control' file names dependencies using
human-readable and human-assigned names, such as `libpoppler0'---it does
*not* name dependencies using hashes.  Thus, a `control' file alone is
ambiguous because it depends on a naming context provided by the
`Packages' file.

Conversely, the Nix function that builds a package takes its
dependencies as explicitly passed arguments: looking at the invocation
of a build function tells you unambiguously all about its build
environment.
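
A rough sketch of that idea, under loud assumptions: the function
`store_path', the hashing recipe, and the package names below are all
hypothetical and much simpler than what Nix really does.  The point is
only that the output name is a function of every explicitly passed
input, so changing one dependency changes the name of everything built
on top of it.

```python
import hashlib


def store_path(name: str, builder: str, inputs: dict[str, str]) -> str:
    """Derive an output path from the package name, the builder, and
    the paths of all explicitly declared inputs (simplified scheme)."""
    h = hashlib.sha256()
    h.update(name.encode())
    h.update(b"\0" + builder.encode())
    for key in sorted(inputs):  # order-independent over input names
        h.update(b"\0" + key.encode() + b"=" + inputs[key].encode())
    return f"/nix/store/{h.hexdigest()[:32]}-{name}"


# Two hypothetical builds of the same library from different sources.
poppler_a = store_path("poppler", "build.sh", {"src": "poppler-A.tar.gz"})
poppler_b = store_path("poppler", "build.sh", {"src": "poppler-B.tar.gz"})

# The dependent package is named after whichever library it was given.
evince_a = store_path("evince", "build.sh", {"poppler": poppler_a})
evince_b = store_path("evince", "build.sh", {"poppler": poppler_b})

print(evince_a)
print(evince_b)
```

With a `control' file, by contrast, both builds would simply say
`libpoppler0' and rely on the surrounding `Packages' file to say which
one is meant.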

> As I see it, there are two issues for reproducibility:
>  * Recording sets of binary packages so that a deployment setup can be
>    reproduced.  With Debian you can do this with a Packages file.

That's the build-time issue that I described above.

>  * Recording how a binary package was built so that the binary package
>    can be reproduced.  Debian does not provide a way to do this, but
>    it could do this by referring back to an immutable Packages file
>    listing the packages that were installed when the package was
>    built.

That's the run-time dual of the first aspect.  Under Debian, `evince' is
linked against `libpoppler.so.0'; at run-time, whatever library under
`/usr/lib' whose name matches is picked up and used as `libpoppler0',
regardless of whether this shared object was the one used at
build-time.  Here Debian relies, again, on human-maintained library
naming: upstream developers are expected to change SO names when there's
ABI breakage.

Under NixOS, `evince' refers unambiguously to the very `libpoppler.so.0'
that was used at build-time (through its `RPATH').  The corollary is
that NixOS does not exploit ABI-compatibility at all: when `libpoppler'
is upgraded, `evince' keeps using the old `libpoppler', or you have to
rebuild `evince' to use the new one---but that is a deliberate choice.
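
The difference can be shown with a toy model of library lookup.  This
is a sketch only: `resolve' is a hypothetical helper, the file system
is a plain dictionary, the store paths are made up, and the real
dynamic linker consults more than a single list of directories.

```python
def resolve(soname, search_dirs, files):
    """Return (path, contents) of the first match for soname in
    search_dirs, or None if no directory contains it."""
    for directory in search_dirs:
        path = f"{directory}/{soname}"
        if path in files:
            return path, files[path]
    return None


SONAME = "libpoppler.so.0"

# Debian-like layout: one global directory; an upgrade replaces the file.
debian_files = {f"/usr/lib/{SONAME}": "poppler-A"}
debian_before = resolve(SONAME, ["/usr/lib"], debian_files)
debian_files[f"/usr/lib/{SONAME}"] = "poppler-B"  # upgrade in place
debian_after = resolve(SONAME, ["/usr/lib"], debian_files)

# NixOS-like layout: the binary's RPATH names one specific store directory;
# an upgrade adds a new directory and leaves the old one untouched.
nix_files = {f"/nix/store/aaaa-poppler-A/lib/{SONAME}": "poppler-A"}
evince_rpath = ["/nix/store/aaaa-poppler-A/lib"]
nix_before = resolve(SONAME, evince_rpath, nix_files)
nix_files[f"/nix/store/bbbb-poppler-B/lib/{SONAME}"] = "poppler-B"
nix_after = resolve(SONAME, evince_rpath, nix_files)

print(debian_before[1], "->", debian_after[1])
print(nix_before[1], "->", nix_after[1])
```

In the first case the same name silently starts pointing at different
contents; in the second, `evince' keeps its library until it is rebuilt
with a new `RPATH'.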

>> Yes.  In other words, the kernel is not a build input of any
>> user-space component (glibc depends on the kernel headers only, which
>> are not necessarily exactly those of the currently installed kernel).
>> 
>> Hmm, as I write this, I realize that this is a "breach" in the model.
>> Probably worth debating on `nix-dev'.  ;-)
>
> Right, that's what I was getting at.  I think Nix's architecture is
> neither necessary nor sufficient for 100% reproducible builds.  This
> is an example of why it's not sufficient.

Well, not 100% sufficient, but now we're down to a single degree of
variation: the kernel.  That's a huge improvement compared to other
distros.

> I'll send a copy of our discussion to nix-dev and CC the Plash and
> Zero-Install mailing lists, if that's okay with you?

Sure!

> Zero-Install handles libraries this way by setting LD_LIBRARY_PATH.

Nix uses the `RPATH' for libraries.

> If Nix did the same, substituting hashes inside binary files might
> become unnecessary.

No: hash substitution has nothing to do with the fact that Nix uses
`RPATH'.  As I mentioned, hash substitution is used to rewrite
self-references in files resulting from a build with the actual hash of
the output modulo self-references.  See, e.g., Section 6.3.2 of Eelco's
thesis:

  http://www.cs.uu.nl/~eelco/pubs/phd-thesis.pdf
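
As a rough illustration of "hash modulo self-references" (a simplified
sketch, not the scheme from the thesis: the helper names, the 32-byte
placeholder, and the zero-filling rule are all assumptions), the output
is hashed with its self-references blanked out, and the placeholder is
then rewritten to the resulting hash:

```python
import hashlib

HASH_LEN = 32  # assumed length of the hash part of a path


def hash_modulo_self(content: bytes, self_hash: bytes) -> str:
    """Hash content with every occurrence of its own hash zeroed out."""
    zeroed = content.replace(self_hash, b"0" * len(self_hash))
    return hashlib.sha256(zeroed).hexdigest()[:HASH_LEN]


def rewrite_self_references(content: bytes, placeholder: bytes):
    """Replace the build-time placeholder with the final hash.
    Returns (rewritten content, final hash)."""
    final = hash_modulo_self(content, placeholder)
    return content.replace(placeholder, final.encode()), final


# Hypothetical build output that refers to its own installation path.
placeholder = b"p" * HASH_LEN
built = b"prefix=/nix/store/" + placeholder + b"-foo\nsome payload\n"

rewritten, final = rewrite_self_references(built, placeholder)

# The rewritten output can be checked against its own name.
print(hash_modulo_self(rewritten, final.encode()) == final)
```

Because the placeholder and the final hash have the same length, the
rewrite does not shift any offsets inside the file.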

Thanks,
Ludovic.


