[Nix-dev] cabal install vs. libgcc_s.so.1 must be installed for pthread_cancel to work

Gergely Risko gergely at risko.hu
Tue Aug 12 03:53:03 CEST 2014


Hi,

Sorry for the long email; this is a somewhat complicated topic.

I hope both Eelco and Peter will find the time to read it through,
though.  Thanks! :)

I started to get "libgcc_s.so.1 must be installed for pthread_cancel to
work" errors randomly with the new parallel cabal install.

This is only an annoyance right now: the failure is non-deterministic,
so I can just rerun the command and it will succeed sooner or later.
I started to dig nevertheless.

It's not apparent to me that pthread_cancel is ever called by the GHC
runtime itself; most probably a third-party library is issuing the
call.

Take this code as an example...

ctest.c:
#include <pthread.h>
#include <unistd.h>

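/* Thread body: sleep long enough that the thread is still alive when it
   gets cancelled. */
static void *thread_func(void *ignored_argument) {
  sleep(100);
  return NULL;
}

/* Called from Haskell: start a thread and cancel it right away. */
void x(void) {
  pthread_t thr;
  pthread_create(&thr, NULL, &thread_func, NULL);
  pthread_cancel(thr);
}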

test.hs:
{-# LANGUAGE ForeignFunctionInterface #-}

foreign import ccall "x" x :: IO ()

main :: IO ()
main = x >> putStrLn "end"

Compile with:
gcc -Wall -c ctest.c
/nix/store/8sdpn4z5skf3hpmaihg926v1ma1sw9zq-ghc-7.8.3/bin/ghc --make -fforce-recomp -threaded test ctest.o

Or, if you have http://github.com/nilcons/ceh installed:
gcc -Wall -c ctest.c
/opt/ceh/bin/ghc --make -fforce-recomp -threaded test ctest.o

With the first version, when we run the executable, we have:
$ ./test
libgcc_s.so.1 must be installed for pthread_cancel to work
Aborted
$ strace -e file ./test 2>&1 | tail -n 5
libgcc_s.so.1 must be installed for pthread_cancel to work
open("/nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/dev/tty", O_RDWR|O_NOCTTY|O_NONBLOCK) = 10
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=24640, si_uid=1000} ---
+++ killed by SIGABRT +++

So libgcc_s.so.1 is looked up under glibc-2.19/lib and not in the gcc
closure, where the library actually lives.
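
The error message comes from glibc loading libgcc_s.so.1 lazily, the
first time thread cancellation needs the unwinder.  A minimal sketch of
a similar probe from application code follows.  The function name
probe_unwinder is made up for illustration, and checking only for
_Unwind_ForcedUnwind is an assumption about which symbol matters.  A
plain dlopen from the application may also consult different search
paths than glibc's internal call, so a successful probe is only a hint.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: returns 0 if `soname` can be loaded and exports
   _Unwind_ForcedUnwind, -1 otherwise. */
int probe_unwinder(const char *soname)
{
    void *handle = dlopen(soname, RTLD_LAZY);
    if (handle == NULL) {
        const char *why = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n", soname,
                why != NULL ? why : "unknown error");
        return -1;
    }

    /* Assumption: the forced-unwind entry point is what cancellation
       needs from this library. */
    int ok = dlsym(handle, "_Unwind_ForcedUnwind") != NULL;
    if (!ok)
        fprintf(stderr, "%s lacks _Unwind_ForcedUnwind\n", soname);

    dlclose(handle);
    return ok ? 0 : -1;
}
```

Calling probe_unwinder("libgcc_s.so.1") early in main would surface a
missing library at startup instead of at some random later
cancellation.  On older glibc versions this needs -ldl at link time.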

With the second version:
$ ./test
end
$ strace -e file ./test 2>&1 | grep libgcc_s
open("/nix/store/dsfs84981xvlilg0kzia7rgab69w15mc-gmp-5.1.3/lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/tls/i686/sse2/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/tls/i686/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/tls/sse2/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/tls/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/i686/sse2/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/i686/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/sse2/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
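
The probes for the gcc directory in the trace above follow a simple
pattern: every combination of the subdirectories tls, i686 and sse2 is
tried, most specific first, before the plain lib directory.  A minimal
sketch of that enumeration is below.  The names candidate_path and caps
are purely illustrative, and the ordering is read off the trace rather
than taken from glibc's sources.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the idx-th candidate path (idx 0 is the
   most specific) into out.  caps[] lists the subdirectory names from
   most to least significant.  Returns 0 on success, -1 if idx is out of
   range or the buffer is too small. */
int candidate_path(const char *dir, const char *const caps[], int ncaps,
                   const char *file, int idx, char *out, size_t outlen)
{
    if (ncaps < 0 || ncaps > 16)
        return -1;
    int total = 1 << ncaps;
    if (idx < 0 || idx >= total)
        return -1;

    /* idx 0 selects every subdirectory, the last idx selects none. */
    unsigned mask = (unsigned)(total - 1 - idx);

    int n = snprintf(out, outlen, "%s", dir);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    size_t used = (size_t)n;

    for (int i = 0; i < ncaps; i++) {
        /* caps[0] corresponds to the highest bit of the mask. */
        if (mask & (1u << (ncaps - 1 - i))) {
            n = snprintf(out + used, outlen - used, "/%s", caps[i]);
            if (n < 0 || (size_t)n >= outlen - used)
                return -1;
            used += (size_t)n;
        }
    }

    n = snprintf(out + used, outlen - used, "/%s", file);
    if (n < 0 || (size_t)n >= outlen - used)
        return -1;
    return 0;
}
```

With caps set to {"tls", "i686", "sse2"}, indices 0 through 7 reproduce
the eight gcc-4.8.3 paths in the order they appear in the trace.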

Now libgcc_s is searched for in all the right places and is found in
gcc-4.8.3/lib.  This is because:
$ ldd ./test
	linux-gate.so.1 (0xf776d000)
	libgmp.so.10 => /nix/store/dsfs84981xvlilg0kzia7rgab69w15mc-gmp-5.1.3/lib/libgmp.so.10 (0xf76e8000)
	libm.so.6 => /nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/lib/libm.so.6 (0xf76a6000)
	librt.so.1 => /nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/lib/librt.so.1 (0xf769e000)
	libdl.so.2 => /nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/lib/libdl.so.2 (0xf769a000)
	libpthread.so.0 => /nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/lib/libpthread.so.0 (0xf7680000)
	libgcc_s.so.1 => /nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/libgcc_s.so.1 (0xf7664000)
	libc.so.6 => /nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/lib/libc.so.6 (0xf74ce000)
	/nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xf776e000)
libgcc_s.so.1 is now a direct shared-library dependency of the binary.

This is done by the hack here:
https://github.com/nilcons/ceh/blob/master/lib/Packages/GHC.pm#L29

Basically we add -lgcc_s to every linker command when running GHC in
Ceh.  A further complication is that there is no libgcc_s.a, so we have
to create a fake one; see the ceh repository for the details if you're
interested.
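
Whether the extra -lgcc_s took effect can also be checked from inside a
running process, as a runtime counterpart to the ldd output above.  A
minimal, Linux-only sketch follows.  The helper name object_is_mapped is
made up for illustration, and a plain substring match on
/proc/self/maps is a simplification.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if some mapping in /proc/self/maps
   mentions `needle`, 0 if none does, and -1 if the file cannot be read
   (for example when not running on Linux). */
int object_is_mapped(const char *needle)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[4096];
    int found = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        /* Each line ends with the path of the mapped file, if any. */
        if (strstr(line, needle) != NULL) {
            found = 1;
            break;
        }
    }
    fclose(maps);
    return found;
}
```

In a binary linked with the hack, object_is_mapped("libgcc_s.so") should
report 1 right at startup, before any thread is cancelled.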

We ship this hack in Ceh because we've seen this problem before with
our in-house proprietary, long-running, threaded Haskell applications.
But now apparently this is happening in cabal-install itself too, so
maybe we should fix this in Nix itself once and for all.

I see the following possible solutions to this problem:
  - adapt the hack from Ceh for the GHC wrapper in some form,
  - ship libgcc_s.so with glibc (copy it there from the gcc distribution
    that we use to compile the glibc in the first place),
  - patch glibc to look for libgcc_s.so everywhere where the normal
    dynamic linker would look (LD_LIBRARY_PATH, current binary's rpath,
    etc.).

Option #1 seems kludgy and will only help Haskell apps, not the whole
NixOS ecosystem.

The easiest and cleanest is most probably #2.

#3 is somewhat backwards, as gcc depends on glibc and not the other way
around, so this would mean depending on the libgcc_s that was installed
in the gcc that compiled the glibc.  But this would keep that stage 0
gcc from being garbage collected after compiling the glibc, which seems
silly for just this one libgcc_s.so.  And if we copy the library out
instead of keeping the whole gcc, we are back at option #2.

So I propose option #2.

Should I go ahead and try to create the pull request for that, so it can
be pushed to stdenv-changes and then to master with the next merge?

Any other ideas, comments, concerns, opinions?

Thanks,
Gergely


