[Nix-dev] cabal install vs. libgcc_s.so.1 must be installed for pthread_cancel to work
Gergely Risko
gergely at risko.hu
Tue Aug 12 03:53:03 CEST 2014
Hi,
Sorry for the long email; this is a somewhat complicated topic.
I hope both Eelco and Peter will find the time to read through it,
though. Thanks! :)
With the new parallel cabal install I have started to get random
"libgcc_s.so.1 must be installed for pthread_cancel to work" errors.
Right now this is only an annoyance: the failure is non-deterministic,
so I can rerun the command until it succeeds. I started to dig
nevertheless.
It's not apparent to me that pthread_cancel is ever called by the GHC
runtime itself; most probably a third-party library is issuing the
call.
Take this code as an example...
ctest.c:
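#include <pthread.h>
#include <unistd.h>

/* Sleeps long enough that the thread is still alive when it is cancelled. */
static void *thread_func(void *ignored_argument) {
    sleep(100);
    return NULL;
}

/* Called from Haskell via the FFI: start a thread and cancel it right away. */
void x(void) {
    pthread_t thr;
    pthread_create(&thr, NULL, &thread_func, NULL);
    pthread_cancel(thr);
}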
test.hs:
{-# LANGUAGE ForeignFunctionInterface #-}
foreign import ccall "x" x :: IO ()
main :: IO ()
main = x >> putStrLn "end"
Compile with:
gcc -Wall -c ctest.c
/nix/store/8sdpn4z5skf3hpmaihg926v1ma1sw9zq-ghc-7.8.3/bin/ghc --make -fforce-recomp -threaded test ctest.o
Or you can also do:
gcc -Wall -c ctest.c
/opt/ceh/bin/ghc --make -fforce-recomp -threaded test ctest.o
if you have http://github.com/nilcons/ceh installed.
With the first version, when we run the executable, we have:
$ ./test
libgcc_s.so.1 must be installed for pthread_cancel to work
Aborted
$ strace -e file ./test 2>&1 | tail -n 5
libgcc_s.so.1 must be installed for pthread_cancel to work
open("/nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/dev/tty", O_RDWR|O_NOCTTY|O_NONBLOCK) = 10
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=24640, si_uid=1000} ---
+++ killed by SIGABRT +++
So libgcc_s.so.1 is searched for in glibc-2.19/lib/libgcc_s.so.1 instead
of searching in the gcc closure.
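To check from inside a process whether the loader can resolve libgcc_s.so.1 at run time, a small probe along these lines may help. It is only a sketch: can_load is a hypothetical helper name, and it goes through the public dlopen, which is assumed here to approximate the lookup glibc performs internally for pthread_cancel. The two may not consult exactly the same search paths.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the dynamic loader can resolve `soname`
   from this process, 0 otherwise. On failure the loader's own error
   message is printed to stderr. */
int can_load(const char *soname) {
    void *handle = dlopen(soname, RTLD_NOW);
    if (handle == NULL) {
        const char *msg = dlerror();
        fprintf(stderr, "%s\n", msg ? msg : "dlopen failed");
        return 0;
    }
    dlclose(handle);
    return 1;
}
```

Calling can_load("libgcc_s.so.1") from a binary built the same way as ./test should then show whether the library is reachable through that binary's search path. On older glibc versions the program has to be linked with -ldl.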
With the second version:
$ ./test
end
$ strace -e file ./test 2>&1 | grep libgcc_s
open("/nix/store/dsfs84981xvlilg0kzia7rgab69w15mc-gmp-5.1.3/lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/tls/i686/sse2/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/tls/i686/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/tls/sse2/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/tls/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/i686/sse2/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/i686/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/sse2/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
Now libgcc_s is searched for everywhere correctly. This is because:
$ ldd ./test
linux-gate.so.1 (0xf776d000)
libgmp.so.10 => /nix/store/dsfs84981xvlilg0kzia7rgab69w15mc-gmp-5.1.3/lib/libgmp.so.10 (0xf76e8000)
libm.so.6 => /nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/lib/libm.so.6 (0xf76a6000)
librt.so.1 => /nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/lib/librt.so.1 (0xf769e000)
libdl.so.2 => /nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/lib/libdl.so.2 (0xf769a000)
libpthread.so.0 => /nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/lib/libpthread.so.0 (0xf7680000)
libgcc_s.so.1 => /nix/store/7whfi1pd7bqcy5w9s07ak93348xlcg9h-gcc-4.8.3/lib/libgcc_s.so.1 (0xf7664000)
libc.so.6 => /nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/lib/libc.so.6 (0xf74ce000)
/nix/store/jllh2r8dbjhl513ljgips79yld9mnf0h-glibc-2.19/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xf776e000)
libgcc_s is now linked into the binary.
This is done by the hack here:
https://github.com/nilcons/ceh/blob/master/lib/Packages/GHC.pm#L29
Basically we add -lgcc_s to every linker command when running GHC in
Ceh. A further complication is that there is no libgcc_s.a, so we have
to create a fake one; see the ceh repository for the details if you're
interested.
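When trying out a change like this, a variant of ctest.c that reports the outcome instead of returning silently can be handy. The following is a minimal sketch, not part of Ceh or Nix; the names sleeper and cancel_roundtrip are made up for illustration. It cancels a sleeping thread, joins it, and checks that the join result is PTHREAD_CANCELED.

```c
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Thread body: sleep() is a cancellation point, so the thread can be
   cancelled while it is blocked here. */
static void *sleeper(void *unused) {
    (void)unused;
    sleep(100);
    return NULL;
}

/* Hypothetical helper: returns 1 if the thread was cancelled, 0 if it
   finished some other way, and -1 if a pthread call reported an error. */
int cancel_roundtrip(void) {
    pthread_t thr;
    void *result = NULL;

    if (pthread_create(&thr, NULL, sleeper, NULL) != 0) return -1;
    if (pthread_cancel(thr) != 0) return -1;
    if (pthread_join(thr, &result) != 0) return -1;
    return result == PTHREAD_CANCELED ? 1 : 0;
}
```

Built with gcc -Wall -pthread and called from a small main (or through the FFI, like x above), this should return 1 where cancellation works. On a system showing the error from this mail, the process would presumably abort before the function returns.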
We ship this hack in Ceh because we've seen this problem before with
our in-house, proprietary, long-running threaded Haskell applications.
But now this is apparently happening in cabal-install itself too, so
maybe we should fix this in Nix itself once and for all.
I see the following possible solutions to this problem:
- adapt the hack from Ceh to the GHC wrapper in some form,
- ship libgcc_s.so with glibc (copy it there from the gcc distribution
that we use to compile the glibc in the first place),
- patch glibc to look for libgcc_s.so everywhere where the normal
dynamic linker would look (LD_LIBRARY_PATH, current binary's rpath,
etc.).
Option #1 seems kludgy and will only help Haskell apps, not the whole
NixOS ecosystem.
Easiest and cleanest is most probably #2 from the above.
#3 is somewhat backwards, as gcc depends on glibc and not the other way
around, so this would mean depending on the libgcc_s that was installed
in the gcc that compiled the glibc. But this would keep that stage 0
gcc from being garbage collected after compiling the glibc, which seems
silly for just this one libgcc_s.so. If we copy it out instead of
keeping the whole gcc, we are back at option #2.
So I propose option #2.
Should I go ahead and try to create the pull request for that, so it can
be pushed to stdenv-changes and then to master with the next merge?
Any other ideas, comments, concerns, opinions?
Thanks,
Gergely