[Nix-dev] Use Haskell for Shell Scripting

Sat Jan 31 13:22:09 CET 2015

> At this current point in time, GHC is packaged in a poor manner, with
> GHC being unbelievably huge. Dynamic linking is the answer, which
> isn't done by default.

I have actually experimented with using Haskell (and a few other FP
languages) as a substitute for shells.  It is feasible if you disable
dynamic linking of the Haskell libraries.  The non-Haskell libraries
are still linked dynamically, but the reference to the GHC derivation
is then gone.  This brings the closure of a Haskell hello-world
"script" from a huge 1.1 GiB down to a mere 131 MiB (on my x86_64
system), which puts it on par with shell scripts.

However, static linking is probably not a good idea.  The resulting
"scripts" are each on the order of megabytes and can quickly grow to a
few tens of megabytes.  To really fix this and make Haskell viable as a
shell substitute, we need to split the GHC derivation.  There should be
a pure library derivation and a separate compiler derivation.  The
former should be as small as possible.  Ideally there would be one
derivation per library.

The other languages I have tried are Scheme (via Chicken), Curry (via
PAKCS), SML (via mlton) and Idris.

Before I present my results, let me clarify what I think a "script" is:
It is a string that I can run through a simple Nix function, which gives
me a derivation that contains a runnable version of that string, either
binary or shebanged.  This derivation pulls a reasonably sized closure
along with it.  I can choose to combine many such runnable scripts into a
single derivation using buildEnv, which is often very useful.  In other
words:  For the language "blah" there is a simple, deterministic,
unconfigurable function that would have the following signature in a
hypothetical typed Nix:

    blahScript : String -> Derivation

This function can be a special case of a slightly more powerful function
that takes a directory and a main entry point, because if we choose to
use a better language, we might as well choose to utilise its module
system, if it has one, for some of our larger scripts.
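
To make the "special case" relationship concrete, here is a minimal
sketch in Haskell rather than Nix.  Everything in it is hypothetical:
`SourceTree`, `Derivation`, `blahProgram` and the file name `main.blah`
are stand-ins I made up for illustration, and the "derivation" is just a
record describing what would be built, not anything Nix actually
produces.

```haskell
module Main (main) where

import qualified Data.Map as Map

-- Hypothetical stand-in: a source tree maps file names to their contents.
type SourceTree = Map.Map FilePath String

-- Hypothetical stand-in: a "derivation" is only a description of the build.
data Derivation = Derivation
  { drvSources :: SourceTree  -- all files that belong to the script
  , drvEntry   :: FilePath    -- the file that is run
  } deriving (Show, Eq)

-- The more powerful function: a directory plus a main entry point.
blahProgram :: SourceTree -> FilePath -> Derivation
blahProgram = Derivation

-- The string-only function is the special case of a one-file directory.
blahScript :: String -> Derivation
blahScript src = blahProgram (Map.singleton "main.blah" src) "main.blah"

main :: IO ()
main = print (blahScript "print hello")
```

The point is only that nothing is lost by offering the simple function:
it can always be defined in terms of the directory-based one.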

Now to my results:  All of the above languages, except Curry, work more
or less well if all you need to do is start programs or move files
around.  As soon as you need to do operating-system-specific stuff
(e.g. `unshare` on Linux) it gets less juicy, because unless someone has
written a nice high-level library, you need to touch the FFI.
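
For the easy part (starting programs and moving files around), a
Haskell "script" might look roughly like the sketch below.  It only
uses the `process` and `directory` packages that ship with GHC; the
command `uname -s` and the file names `report.txt` and `out/` are
arbitrary choices for illustration.

```haskell
module Main (main) where

import System.Directory (createDirectoryIfMissing, renameFile)
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

main :: IO ()
main = do
  -- Start an external program and capture its standard output.
  (code, out, _err) <- readProcessWithExitCode "uname" ["-s"] ""
  case code of
    ExitSuccess   -> putStr ("kernel: " ++ out)
    ExitFailure n -> putStrLn ("uname failed with exit code " ++ show n)

  -- Move a file around, creating the target directory first.
  createDirectoryIfMissing True "out"
  writeFile "report.txt" out
  renameFile "report.txt" "out/report.txt"
```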

Chicken Scheme worked best for that, because rather than trying to model
the syscall in the language, you can just dump C code into it.  Not a
nice and clean solution, but a working one for the many cases when you
just need to -- you know -- get stuff done.

Haskell works, because lots of the OS bindings can be found on Hackage,
including Linux-specific libraries.  But it does require a slightly more
expressive 'haskellScriptWith' function.  You need to be able to tell it
what you depend on.
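
When there is no ready-made binding, touching the FFI directly is not
too bad for simple calls.  The following is a minimal sketch for
`unshare` on Linux, assuming glibc exports the symbol under that name.
The wrapper `unshareUts` is a hypothetical helper of mine, and the
constant is the value of CLONE_NEWUTS from <sched.h>.  Creating a new
UTS namespace normally needs CAP_SYS_ADMIN, so as an unprivileged user
you should expect the error branch.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Exception (try)
import Foreign.C.Error (throwErrnoIfMinus1_)
import Foreign.C.Types (CInt (..))

-- Raw binding to unshare(2) from the C library (Linux only).
foreign import ccall unsafe "unshare"
  c_unshare :: CInt -> IO CInt

-- CLONE_NEWUTS from <sched.h>: request a new UTS (hostname) namespace.
cloneNewUts :: CInt
cloneNewUts = 0x04000000

-- Hypothetical wrapper: turn a -1 return value into an IOError built
-- from errno, so callers do not have to check return codes by hand.
unshareUts :: IO ()
unshareUts = throwErrnoIfMinus1_ "unshare" (c_unshare cloneNewUts)

main :: IO ()
main = do
  result <- try unshareUts :: IO (Either IOError ())
  case result of
    Left err -> putStrLn ("unshare failed: " ++ show err)
    Right () -> putStrLn "now running in a new UTS namespace"
```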

SML works and produces surprisingly small executables.  It loses at the
library end, because there aren't many OS-specific libraries around (or
I couldn't find them).  Also some of the advanced FFI tooling that I'm
used to from Haskell seems to be missing.  Finally I would say that the
syntax is too verbose for quick scripting (but that's subjective -- I
have seen people use VB.NET for scripting).

You might wonder why Curry didn't work.  Simple: I couldn't
figure out how to write a program.  Actually I went through the whole
tutorial, did all the exercises (they aren't really difficult to a
Haskell programmer) and then skimmed through the whole PAKCS manual.  I
could write extremely elegant algorithmic code and was quite amazed at
the beauty of this language, even compared to Haskell.  But in the end I
still didn't know how to turn all this beautiful Curry code into an
executable file that I can run without invoking PAKCS explicitly.
Something with a shebang or ideally something binary.  It would probably
be possible to write wrapper scripts, but let's just wait until one of
the implementations becomes mature enough for systems programming.

Finally there is Idris.  It is a beautiful language that comes with
reasonable editor integration and a lightweight syntax.  It compiles to
executable binary code and has a carefully designed yet useful FFI.
Sounds good for scripting.  On the other hand it is very young and
documentation is far from mature.  Not that I would mind its youth, but
I do mind the barrier to entry at this point.  At the very least, when
other authors don't understand my code, it should be reasonably obvious
where to look for answers.  Also the library landscape is very sparse,
so bootstrapping might take up most of your time if you choose to use
Idris for systems-level scripting at this point.

The most viable options seem to be Chicken Scheme and Haskell.  Both are
well documented and have a usable FFI.  Chicken produces much smaller
executables, and programs are very memory-efficient.  By design it
compiles via C, so instead of providing a carefully designed FFI it
simply allows you to dump C code into your program, in the spirit of
inline assembly.  This may seem crude, but it is very useful in
practice for systems programming, because even in 2015 our operating
systems are very C-centric.

Haskell performs better on the large scale.  It comes with lots of
well-designed and safe abstractions, usually gets along with shorter
code, has a good run-time system (e.g. for concurrency), etc.
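
As a small illustration of the run-time system point, here is a sketch
that runs several external commands in lightweight threads and collects
their output.  It uses only `base` and `process`; the helper `spawn` and
the `echo` commands are made up for the example.  Compile with
-threaded so that a blocking wait in one thread does not stall the
others.

```haskell
module Main (main) where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)
import System.Process (readProcess)

-- Hypothetical helper: run one command in its own thread and return an
-- action that waits for its output.  If the command fails, readProcess
-- throws inside the thread and the MVar is never filled, so a real
-- script would want to catch that and report it.
spawn :: String -> [String] -> IO (IO String)
spawn cmd args = do
  box <- newEmptyMVar
  _ <- forkIO (readProcess cmd args "" >>= putMVar box)
  return (takeMVar box)

main :: IO ()
main = do
  -- Start all commands first, then wait for them in list order.
  waits <- forM [["one"], ["two"], ["three"]] (spawn "echo")
  outputs <- sequence waits
  mapM_ putStr outputs
```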

All in all, while I would use Scheme for small quick-and-dirty batch
scripts, I would use Haskell for larger scripts or services that
potentially run for a long time.  But there is no formal line to help
with this choice.  It would take some more experimentation to give a
better-informed answer on when to use which language.