[Nix-dev] nix on compute cluster?

Wout Mertens wout.mertens at gmail.com
Fri Oct 10 15:32:52 CEST 2014


I think you could do this. You would set it up so that the Nix server does
the builds and the grid runs distcc. See the wiki: the Raspberry Pi page
explains how to use distcc.

Note that only one node can write to the Nix store at a time, because of
the database.

Another option is to have a private Nix store on each node, but NFS-mount
all of them under the remote stores directory. That way nix-store will fetch
missing packages from the remotes and store them locally. At least, that's
my understanding.

As for the Intel compiler, that could be a challenge, but Nixpkgs already
has several GCC versions and Clang, so it's not impossible. You can decide
on a per-package basis which compiler to use.
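
To make the per-package choice concrete, here is a minimal sketch of a
per-user Nixpkgs configuration that rebuilds one package with Clang instead
of the default GCC. It assumes the `packageOverrides` hook in
`~/.nixpkgs/config.nix` and the `clangStdenv` attribute from Nixpkgs; `hello`
is only a stand-in package and `helloClang` is a made-up attribute name.

```nix
# ~/.nixpkgs/config.nix (per-user Nixpkgs configuration)
{
  packageOverrides = pkgs: {
    # Hypothetical attribute name: GNU hello rebuilt with the Clang-based
    # stdenv instead of the default GCC-based one.
    helloClang = pkgs.hello.override { stdenv = pkgs.clangStdenv; };
  };
}
```

With that in place, something like `nix-env -iA nixpkgs.helloClang` should
install the Clang-built variant (the `nixpkgs.` prefix depends on the channel
name). The Intel compiler would need a comparable stdenv of its own, which is
presumably the challenging part.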

Not sure how MPI would influence state; can you elaborate?

The rest is totally doable.

Wout.
(on phone, sorry for brevity)
On Oct 10, 2014 1:34 PM, "Andreas Herrmann" <andreash87 at gmx.ch> wrote:

> Hi,
>
> How would you go about bringing the benefits of Nix to the users of a
> compute cluster?
>
> Assume the following cluster: A login node, a file-system node, and a
> number of compute nodes. All nodes run on a recent CentOS and are fairly
> homogeneous. The fs node holds all user data and some common libraries. Its
> storage is nfs mounted on all other nodes.
>
> Users ssh into the login node, write and compile some code, then use the
> Sun Grid Engine (SGE) to submit compute jobs. Once these are finished,
> they copy the results to their workstations and are happy.
>
> There are subgroups of users with fairly exotic software requirements.
> The software they need is not available in any package repository, and the
> cluster admin doesn't have the time to install and maintain it. So,
> currently, most of these users just compile everything themselves in their
> home directories, which is a huge waste of time and storage space.
>
> I would like to suggest Nix to the admin as a way to let these user
> subgroups manage their own packages in a well-organized manner that avoids
> redundant work and storage. But I'm not sure how exactly that should
> work.
>
> There are a few constraints:
>
>   1. Unfortunately, NixOS/nixops is not an option. This will have to work
> with the currently installed cluster OS.
>   2. Compilation should not put too much load on the login node. Ideally,
> build jobs would be referred to the compute nodes.
>   3. Build jobs on the compute nodes should be managed by the sge.
>   4. (Some) users should be allowed to initiate builds and to use their own
> package overrides and extra packages.
>   5. Some impurity is necessary, be it for things that are hard to package
> (e.g. the Intel compiler) or for global state (MPI jobs).
>
> My question to you: Do you think this is possible to achieve (within a
> reasonable time-frame), and how would you do it?
>
> Here's what I have in mind so far (please feel free to take it apart if
> you think there is a better way):
>
> Have a Nix store on the file server and NFS-mount it on all nodes (cached).
> The login node runs the nix-daemon. Builds are deferred to the grid engine
> (how?), run on the compute nodes, and store their results in the
> NFS-mounted Nix store. Users would use `nix-env` on the login node to
> install software into their profile. This profile should be visible on all
> nodes, so that jobs can use those libraries and tools in the nix-profile.
> Things like myEnvFun should allow running jobs in different software
> environments simultaneously.
>
> Best,
>
> Andreas