[Nix-dev] Dependency Semantics

Mathijs Kwik mathijs at bluescreen303.nl
Thu Jul 12 20:40:18 CEST 2012


Bryce L Nordgren <bnordgren at gmail.com> writes:

>     I'm not saying you shouldn't. But the current build system finds out
>     about the breakage, so the maintainer can investigate and put the extra
>     dependency there. You have to understand that most package expressions
>     are probably created as a bare minimum, just adding the stuff the
>     builder/installer complains about. This works fine most of the time.
>     So in this case, where (up till now) just depending on python sufficed,
>     there's probably only python in the dependency list. For all we know,
>     the python maintainer will take care that all the optional parts get built
>     anyway. If this changes in the future, I would like to find out
>     automatically.
>
> Well, I think two different cases are getting mixed up here. I'm talking about a rebuild of the exact same python Nix expression, and you're
> talking about the python Nix expression evolving over time. It's certainly possible to treat the two cases differently, probably even
> desirable.

That's not what I was talking about: the python expression didn't change
in my example, just a (made-up) underlying graphics library.
That change triggers a python rebuild, which can't use the new gfx
version, so python builds without bindings for it.
Then a package on top of that is triggered to rebuild, and its
upstream configure/build script detects the missing bindings and fails.

If that package had a "weak" dependency on python, it wouldn't rebuild,
so its build scripts would not alert us that something is wrong.
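
To make that concrete, here is a sketch with made-up names (libgfx and
pyapp don't exist, and the tarball paths are hypothetical):

    { pkgs ? import <nixpkgs> {} }:

    rec {
      # The underlying graphics library from the example; bumping it
      # gives it a new output hash.
      libgfx = pkgs.stdenv.mkDerivation {
        name = "libgfx-2.0";            # was 1.x before the bump
        src = ./libgfx-2.0.tar.gz;      # hypothetical tarball
      };

      # python lists libgfx as an input, so libgfx's hash is part of
      # python's build description: bumping libgfx rebuilds python.
      # If the new libgfx is incompatible, upstream's configure just
      # skips the bindings silently.
      python = pkgs.stdenv.mkDerivation {
        name = "python-2.7";
        src = ./python-2.7.tar.gz;      # hypothetical tarball
        buildInputs = [ libgfx ];
      };

      # pyapp depends on python, so it rebuilds as well, and its own
      # configure script is what finally notices the missing bindings
      # and fails the build.
      pyapp = pkgs.stdenv.mkDerivation {
        name = "pyapp-1.0";
        src = ./pyapp-1.0.tar.gz;       # hypothetical tarball
        buildInputs = [ python ];
      };
    }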

>
> For instance, python code could have a "recompile only when the python version is changed" semantics. Or "recompile whenever the python nix
> expression changes in any way". Either of these would put a halt to cascading pointlessness when a third-generation dependency of the python
> interpreter causes the same interpreter to be rebuilt; both would provide the automatic checks you desire when something important changes
> about the environment.

That doesn't help in my example case, as the python expression itself
isn't upgraded or changed. The semantics would then need to become "if
python, or any of its inputs, changes", which sounds a lot like current
behaviour.
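
And that is precisely what Nix already implements: a store path is a
hash over the complete build description, inputs included, so a change
anywhere below bubbles all the way up. Schematically (store paths and
hashes invented for illustration):

    /nix/store/aaaa...-libgfx-2.0    # new hash after the bump
    /nix/store/bbbb...-python-2.7    # changes too: its inputs changed
    /nix/store/cccc...-pyapp-1.0     # and so does everything on top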

>
>     This sounds like a problem to me, one that the current semantics solve
>     by saying "if any inputs change, we want to make sure everything is
>     still fine (up to the level that upstream build/test scripts would
>     accept)."
>
> The new semantics really only apply to these environments intended to serve as an isolation layer between the host system and code which
> executes in the environment. I realize none of them will provide perfect isolation and you can all find exceptions to the rule. However, many
> codes can run unchanged on Windows/Linux/Mac: in Java's case, many compiled binaries can run unchanged on Linux/Windows/Mac and it is actually
> pretty rare to rebuild from source. A finished Java app is commonly compiled by a variety of JDK versions on a variety of different platforms.
> In any case, the exact version of glibc (and a host of other 2nd, 3rd ... generation dependencies) on the system is pretty irrelevant--Windows
> doesn't even have glibc at all!

I agree on Java. I think Java builds a solid abstraction which really
shields the packages running on it from the host system.

But I don't believe the same can be said about any other environment.

>
> The relevant question is: What is more common--that the isolation is effective or that it isn't? So, let me quote you:
>
> "You have to understand that most package expressions are probably created as a bare minimum, just adding the stuff the builder/installer
> complains about." 
>
> I would suggest that the "bare minimum" is to declare that a python package depends on python with "recompile when the python version number
> changes" semantics. If a particular package proves problematic, strengthen the dependency semantics for that package to depend on a specific
> python build. Then it's always rebuilt.

Nope, the bare minimum is to just leave all detection to _upstream_
scripts.
Choosing a "weak by default" path basically means bypassing all the work
done in those scripts (optimizing, using workarounds for certain
versions of underlying deps) and saying "I know better, this package
won't run any differently on these new inputs".
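
A bare-minimum expression, extending the rec-set sketch above, does
little more than hand the inputs to upstream's own ./configure and let
stdenv's default phases run it; all the detection and workaround logic
stays upstream (made-up names again):

    pyfoo = pkgs.stdenv.mkDerivation {
      name = "pyfoo-0.1";
      src = ./pyfoo-0.1.tar.gz;    # hypothetical tarball
      buildInputs = [ python ];    # no flags, no feature lists:
                                   # upstream's configure script probes
                                   # what is available and decides what
                                   # to build
    };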

As I argued, weakening should only be applied when you (as a maintainer)
know for certain that it is safe: you know all the details of the
upstream build system and the checks it performs, all (implicit)
dependencies, and you check the changelog for changes before bumping a
version.

Weak dependency == manually managing dependencies.

It should not be the default in any case or template (except maybe for
Java, or binary proprietary packages) because I'm pretty sure most
maintainers don't (and don't want to) investigate their package's build
scripts and dependencies at that level of depth.
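
If weak dependencies existed, they would have to look something like
this (a purely hypothetical attribute; nothing like it exists in Nix),
and that one line is a promise the maintainer personally has to keep:

    pyfoo = pkgs.stdenv.mkDerivation {
      name = "pyfoo-0.1";
      src = ./pyfoo-0.1.tar.gz;    # hypothetical tarball
      # Hypothetical "weak" input: only rebuild when python's version
      # string changes, never when python's own inputs change.  The
      # maintainer is now claiming to know every build-time check
      # upstream performs against python and its entire closure.
      weakInputs = [ python ];
    };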

>
> You still get a full rebuild every time the environment is upgraded, but you don't rebuild everything whenever the same environment is
> rebuilt. 

The same environment is never rebuilt. When something rebuilds, it means
underlying inputs changed. I know that in a lot of cases the change is
small and isn't going to affect anything (like adding an extra man
page), but in other cases, when (parent/grandparent) dependencies
change, we really want to evaluate the effects they have on everything
built on top of them.

The problem with weak semantics is that you just cannot tell which
underlying changes _are_ important and which aren't.

Think of when glibc added support for IPv6...
With weak semantics, all packages that run _on_ python or _on_ ruby
wouldn't rebuild, because python and ruby themselves didn't get a
version bump and their expressions stayed the same.
Maybe packages check at runtime for IPv6 support, but it's perfectly
reasonable for a package to not build/install its IPv6 parts (like a gui
for entering addresses) when it detects the support isn't available at
install time.

So this would mean either adding an exception for glibc (force-trigger
rebuilds of weak dependents whenever glibc is involved), or having
maintainers specify that their package might use IPv6 and flagging glibc
versions as "ipv6 capable" so as to trigger just those rebuilds. That
sounds like a lot of work to me, and stuff will probably slip through.
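
Spelled out, that bookkeeping would look something like this, still
extending the sketch above (entirely hypothetical attributes once more):

    glibc = pkgs.stdenv.mkDerivation {
      name = "glibc-2.16";
      src = ./glibc-2.16.tar.gz;   # hypothetical tarball
      meta.features = [ "ipv6" ];  # maintainer flags the capability
    };

    someGui = pkgs.stdenv.mkDerivation {
      name = "some-gui-1.0";
      src = ./some-gui-1.0.tar.gz; # hypothetical tarball
      # Maintainer declares which underlying features matter, so a
      # weak dependency still triggers a rebuild when one of them
      # appears: for every package, for every feature, forever.
      rebuildOn = [ "ipv6" ];
    };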

This, too, is just an example; I'm not saying it has been spotted "in
the wild", but the problem is that I can come up with many more
examples that sound reasonable. It means maintainers would need to
watch carefully for whatever might influence their package (like IPv6
support in glibc) and meta-decorate their package with all the cases
that warrant a rebuild even over a weak dependency.

And even then, there are always things you don't think of up front.
The only way to know for sure is to find out which checks, decisions
and macros upstream applies during build/installation.

> Also anything with "native code" (linking directly against a system library, subverting the isolation layer) should have a dependency
> on the specific build of the library it links to, causing a rebuild of that module as well as any other python modules which depend on it. As
> you said, the Nix builder will fail if the system library is missing,
> so the packager will be forced to add it.

As my new example made clear, it's not just libraries at play here.
Many decisions are taken at build time, and it's a hell of a job to
find out what and why for every package, just to specify when a rebuild
can be skipped.

Mathijs

>
> Bryce