[Nix-dev] Re: [Nix-commits] SVN commit: nix - 19980 - viric -nix/branches

Thu Feb 25 11:30:52 CET 2010

Hi,

Nicolas Pierron wrote:

> 2010/2/14 Sander van der Burg - EWI <S.vanderBurg at tudelft.nl>:
>> So, what are you guys going to do? Implementing a clean room implementation
>> of ATerm with the same interfaces or forking ATerm? And if so, why should
>> this ATerm version developed inside the Nix repository?
> 
> Nix does not need ATerm, Nix need maximal sharing to implements
> maximal laziness.

Having given this some more thought, I think that creating another term library
to replace ATerm won't solve our problems at all (and it's rather reinventing
the wheel).  This is because those problems are caused not by the ATerm library
but by the evaluation strategy of the Nix expression evaluator.

This evaluation strategy is very simple, since it's entirely based on
substituting terms in other terms.  For instance, a beta redex ((x: E1) E2) is
rewritten to (E1 [x := E2]) (i.e. every occurrence of `x' in E1 is replaced by
E2).  Such a strategy is normally very slow because it leads to work
duplication: if `x' is used multiple times, then E2 will be evaluated multiple
times.  However, thanks to the normal form cache, we can efficiently see whether
we've already evaluated E2.  This is very simple to implement (just a few lines
of code), but it doesn't scale because the normal form cache prevents garbage
collection.  Using a different term implementation doesn't fix this:

- The normal form cache will still grow without bounds.

- You still spend lots of time doing substitutions.  (According to Valgrind, Nix
spends up to 50% of its time in the substitute() function, and 30% or so of the
allocations originate there.)

A more conventional way to interpret a functional language is to have an
explicit environment mapping variables in scope to their values.  I.e., when you
evaluate ((x: E1) E2), you add [x := E2] to the environment and evaluate E1.  If
`x' is needed, you fetch its value E2 from the environment, evaluate it to its
normal form E2', and update the environment with [x := E2'] so that you don't
need to evaluate `x' again.  Or you compile to some byte-code representation
where variables point to closures in the heap that are overwritten by their
normal forms when entered.  I'm sort of leaning towards that approach (compile
to Parrot or Guile or whatever).

-- 
Eelco Dolstra | http://www.st.ewi.tudelft.nl/~dolstra/