[Nix-dev] nix on compute cluster?

Andreas Herrmann andreash87 at gmx.ch
Fri Oct 10 13:34:43 CEST 2014


Hi,

How would you go about bringing the benefits of Nix to the users of a compute cluster?

Assume the following cluster: a login node, a file-system node, and a number of compute nodes. All nodes run a recent CentOS and are fairly homogeneous. The fs node holds all user data and some common libraries. Its storage is NFS-mounted on all other nodes.

Users ssh into the login node, write and compile some code, then use the Sun Grid Engine (SGE) to submit compute jobs. Once these are finished, they copy the results to their workstations and are happy.

There are subgroups of users with fairly exotic software requirements. The software they need is not available in any package repository, and the cluster admin doesn't have the time to install and maintain it. So, currently, most of these users just compile everything themselves in their home directories, which is a huge waste of time and storage space.

I would like to suggest Nix to the admin as a way to let these user subgroups manage their own packages in a well-organized manner that avoids redundant work and redundant storage. But I'm not sure how exactly that should work.

There are a few constraints:

  1. Unfortunately, NixOS/NixOps is not an option. This will have to work with the currently installed cluster OS.
  2. Compilation should not put too much load on the login node. Ideally, build jobs would be delegated to the compute nodes.
  3. Build jobs on the compute nodes should be managed by SGE.
  4. (Some) users should be allowed to initiate builds, and to use their own package overrides and extra packages.
  5. Some impurity is necessary, be it for things that are hard to package (e.g. the Intel compiler) or for global state (MPI jobs).

My question to you: Do you think this is possible to achieve (within a reasonable time-frame), and how would you do it?

Here's what I have in mind so far (please feel free to take it apart if you think there is a better way):

Keep a Nix store on the file server and NFS-mount it on all nodes (cached). The login node runs the nix-daemon. Builds are delegated to the grid engine (how?), executed on the compute nodes, and their results are stored in the NFS-mounted Nix store. Users run `nix-env` on the login node to install software into their profile. This profile should be visible on all nodes, so that jobs can use the libraries and tools in it. Something like myEnvFun should allow running jobs in different software environments simultaneously.
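
To make the "(how?)" a bit more concrete, here is a rough, untested sketch of the simplest thing I can imagine: wrapping a `nix-build` call in a blocking `qsub` submission. It only assembles the command line. The function name, the default job name, and the queue name in the test are made up for illustration. It also assumes that `nix-build` on a compute node can actually write to the store (or reach a nix-daemon), which is exactly the part I don't know how to solve.

```python
import shlex


def sge_build_command(nix_expr, attr, job_name="nix-build", queue=None):
    """Return an argv list that submits `nix-build` as a blocking SGE job.

    Hypothetical helper: nothing here talks to SGE or Nix, it only
    assembles the command line that would be run on the login node.
    """
    argv = [
        "qsub",
        "-b", "y",     # the command is a binary, not a job script
        "-sync", "y",  # wait until the job has finished before returning
        "-cwd",        # run the job in the current working directory
        "-N", job_name,
    ]
    if queue is not None:
        argv += ["-q", queue]  # optionally pin the build to a specific queue
    # Assumption: the compute node can write to the NFS-mounted store,
    # or can reach a nix-daemon that does so on its behalf.
    argv += ["nix-build", nix_expr, "-A", attr]
    return argv


if __name__ == "__main__":
    cmd = sge_build_command("default.nix", "hello")
    print(" ".join(shlex.quote(part) for part in cmd))
```

Whether something like this could be hooked into Nix itself, so that users never call `qsub` by hand for builds, is part of what I am asking.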

Best,

Andreas

