[Nix-dev] Nix interpreter optimisation

Fri Jul 3 18:01:34 CEST 2009

Hi nixers,

Currently, the Nix interpreter posses a caching system which is highly
important in our usage of Nix.  When you run nix commands with
NIX_SHOW_STATS=1, you will see that the current caching system is
really efficient because it can cache ~40% of the evaluated
expressions.  This system is extremely nice and efficient.

This caching system does not change the semantic of the language
because Nix is pure.  This implies that all functions are reentrant
and that a caching system can be implemented safely.  Nix also
provides function to read and write files from/into the file system
and such operation cannot be cached except if you consider that these
operation will return the same result.  This means that the
interpreter cannot read two different states of one file (which is
safe).

My idea is to optimize such caching based on the idea that Nix is
pure.  So why not extending the caching through multiple evaluations?
The problem is that files and environment variables can change between
executions.

The solution is to prune all normal forms result computed on the path
between the root and the impure statement (impure = session dependent
result).  Thus only result which are not dependent on the evaluation
session can be cached through evaluations.  To implement my solution
you have to consider that all evaluations verify one hypothesis: No
evaluation of a node modify its siblings in the tree.  This statement
is verified in the Nix interpreter.  Then you can draw a tree where
each node represents a statement of your program and each edge
represents the call to the evalExpr function.  Draw a X on the right
side of each node which is an impure statement.

The algorithm that I am suggesting require to add a new variable
inside the state which is part of the evaluation session.  This
variable is a depth counter which is reset to 0 each time you evaluate
an impure statement.  The depth counter is decremented only when it is
not zero.  When this depth counter is zero, this means that the result
of the evaluated expression depends on the session.  Otherwise, the
result is independent of the session and can be cached for the next
evaluation.  Except a little problem caused by blobs (ATerm hidden
memory) which are used to refer to prim-op's functions when they are
stored in the tree.  This issue is solved with a boolean value which
is reset after the prim-op call.  This correspond to declare some
branch / leaves of the tree to be session dependent where the result
is still session independent.

To see how efficient such result can be, I have implemented this
algorithm (without the load/save of the caching system) and the result
are amazing:

$ NIX_SHOW_STATS=1 /tmp/nix-global-cache/bin/nix-build
/etc/nixos/nixos -A system --dry-run
evaluated 1662185 expressions, 669000 cache hits, 40.2482% efficiency,
used 55438984 ATerm bytes, used 211600 bytes of stack space,
981449 session independent reduction,
11736 session dependent statement.

-> 981449 / (981449 + 11736) = 98,8 %

$ NIX_SHOW_STATS=1 /tmp/nix-global-cache/bin/nix-env -i firefox --dry-run
evaluated 1819594 expressions, 715449 cache hits, 39.3192% efficiency,
used 70293760 ATerm bytes, used 216368 bytes of stack space,
1088461 session independent reduction,
15606 session dependent reduction.

-> 1088461 / (1088461 + 15606) = 98,5 %

(all time bench are made on the same computer which is a 800 MHz with
an old hard drive)

To go further, I've implemented the loading & saving of the cache into
a file.  The main issue is the size of the cache.  Loading and saving
1088461 key-value pairs is not an efficient way to save disk-space as
well as loading time.  So;
- I tried to serialized ATermMap into multiple ATerms (seen as
strings).  Unfortunately this takes more than 10 minutes to save the
cache.  So I've have to use ATerms maximal sharing to avoid redundant
content.
- I converted the ATermMap into a list of key-value pairs but the
ATerm library (that we are using) has some non-optimized tail
recursion on list serialization.
- I converted the ATermMap into a tree structure where each node has
an arity of 128 (because the ATerm library limits the arity of nodes)
and all leaves are either empty nodes or have an arity of 2 when a key
is map to a value.

Such serialization on the million of key-value pairs generate a 25 MB
file (thanks to the maximal sharing).  Unfortunately this takes almost
20 seconds to be read which is still less efficient than evaluating
the command without this feature.  So I've decided to add an
""heuristic"" based on nothing which has to reduce the number of saved
key-value pairs.  The heuristic should probably be based on figures
which are made on larger samples.  I've succeed to reduce the size to
8 MB (250000 key-value pairs), the evaluation time of 27% (from 22,6s
to 16,4s), and the ATerm memory of 17% (from 70 MB to 58 MB) on the
command "nix-env -i firefox --dry-run" with a simple heuristic which
only keeps evaluations which are taking more than 100 call of evalExpr
function.

Another future optimization, should be to rewrite the ATerm library to
handle the serialization of an ATermMap as a dump of the internal
memory.  Moreover, re-writing the ATerm library may help to get rid of
the garbage collector, which is not used because all terms are cached
for the implementation of the re-entrance, and also to support
concurrent evaluations, for multiple core CPU.

In a few days, I will create a branch named global-cache inside the
Nix repository in order to clean-up the interface, ensure the validity
of the evaluation result, and find better heuristics.

Cheers,

-- 
Nicolas Pierron
http://www.linkedin.com/in/nicolasbpierron