From: Michael B Allen on 10 Apr 2008 17:03

Hello CUP,

One of my clients is seeing an FPE in ld-linux-x86-64.so when loading a custom dso in Apache (actually httpd is dlopen-ing mod_php5.so which is dlopen-ing my dso which is dynamically linked with some other libs). The backtrace is inlined below. At least I assume it's blowing up loading my dso - with my dso disabled, httpd loads and runs ok.

# gdb /usr/sbin/httpd2-prefork
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...
(no debugging symbols found)
Using host libthread_db library "/lib64/libthread_db.so.1".
(gdb) run -X
Starting program: /usr/sbin/httpd2-prefork -X
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[snipped]
[Thread debugging using libthread_db enabled]
[New Thread 47783686579872 (LWP 11366)]
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[snipped]
Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread 47783686579872 (LWP 11366)]
0x00002b758030468f in do_lookup_x () from /lib64/ld-linux-x86-64.so.2
(gdb) bt
#0  0x00002b758030468f in do_lookup_x () from /lib64/ld-linux-x86-64.so.2
#1  0x00002b7580304a77 in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
#2  0x00002b7580306028 in _dl_relocate_object () from /lib64/ld-linux-x86-64.so.2
#3  0x00002b758030c2a5 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#4  0x00002b75803081f6 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#5  0x00002b758030bacb in _dl_open () from /lib64/ld-linux-x86-64.so.2
#6  0x00002b75811961fa in dlopen_doit () from /lib64/libdl.so.2
#7  0x00002b75803081f6 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#8  0x00002b758119658d in _dlerror_run () from /lib64/libdl.so.2
#9  0x00002b7581196171 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#10 0x00002b7582edf486 in php_dl () from /usr/lib64/apache2/mod_php5.so
#11 0x00002b7582f3dfe3 in ?? () from /usr/lib64/apache2/mod_php5.so
#12 0x00002b7582f6e937 in zend_llist_apply () from /usr/lib64/apache2/mod_php5.so
#13 0x00002b7582f3dfaa in php_ini_register_extensions () from /usr/lib64/apache2/mod_php5.so
#14 0x00002b7582f388bb in php_module_startup () from /usr/lib64/apache2/mod_php5.so
#15 0x00002b7582ff5825 in ?? () from /usr/lib64/apache2/mod_php5.so
#16 0x00002b7582ff58ad in ?? () from /usr/lib64/apache2/mod_php5.so
#17 0x000055555558c93c in ap_run_post_config () from /usr/sbin/httpd2-prefork
#18 0x0000555555579fd7 in main () from /usr/sbin/httpd2-prefork
(gdb) q
The program is running. Exit anyway? (y or n) y

The client is running SUSE server on "big" hardware. I have an OpenSUSE 10.3 install here and the extension loads and runs perfectly.

I have many other people using the extension and no one has reported this problem. I have recompiled *everything* with no change. It seems to be this one particular installation.
Running file(1) on all binaries yields the expected "ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.4, dynamically linked (uses shared libs), for GNU/Linux 2.6.4, stripped" with the exception of my extension, which is "not stripped".

My dso is very small but it is linked with a largish (5MB) .so created from numerous .a archives of -fpic compiled code. Also, the big lib the dso is linked with is created using a version script so that only the handful of symbols used by the extension are exported.

So httpd loads mod_php5.so, which calls dlopen on my dso, and at that point an FPE occurs in do_lookup_x (called from _dl_lookup_symbol_x).

Could it be that the loader is trying to relocate something that is not relocatable because of how things are linked?

Has anyone seen anything like this before? Can anyone recommend a method for debugging this issue? Could this be an issue with the Linux loader?

Any help would be appreciated,
Mike
From: Charles Coldwell on 10 Apr 2008 21:08

Michael B Allen <miallen(a)ioplex.com> writes:

> Hello CUP,
>
> One of my clients is seeing an FPE in ld-linux-x86-64.so when loading a
> custom dso in Apache (actually httpd is dlopen-ing mod_php5.so which is
> dlopen-ing my dso which is dynamically linked with some other libs). The
> backtrace is inlined below.

I ran into something similar that turned out to be an ABI incompatibility across Linux/glibc versions. I think the specific issue is the change from a DT_HASH table to a DT_GNU_HASH table. Did you build your DSO on precisely the same architecture/OS version that they are loading it on?

> The client is running SUSE server on "big" hardware. I have an OpenSUSE
> 10.3 install here and the extension loads and runs perfectly.

Which SuSE is the client running?

> I have many other people using the extension and no one has reported
> this problem. I have recompiled *everything* with no change.

Recompiled on which platform?

> Could it be that the loader is trying to relocate something that is not
> relocatable because of how things are linked?

Most likely, the dynamic loader is puking when it doesn't find DT_HASH.

Chip

--
Charles M. "Chip" Coldwell
"Turn on, log in, tune out"
GPG Key ID: 852E052F
GPG Key Fingerprint: 77E5 2B51 4907 F08A 7E92 DE80 AFA9 9A8F 852E 052F
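
A note on why a missing DT_HASH would plausibly surface as SIGFPE rather than as a clean "symbol not found" error: a SysV-style symbol lookup picks a hash chain with hash % nbuckets. If a loader that does not understand DT_GNU_HASH finds no DT_HASH in the object, the bucket count it works with can be left at zero, and on x86 an integer division by zero is delivered as SIGFPE. The Python sketch below only illustrates that arithmetic. It is not glibc's code, and bucket_index is a hypothetical helper named for this example; the hash function is the standard one from the System V ABI.

```python
def sysv_elf_hash(name: bytes) -> int:
    """Classic System V ABI ELF hash, the one used with DT_HASH tables."""
    h = 0
    for byte in name:
        h = (h << 4) + byte
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g
    return h & 0xFFFFFFFF


def bucket_index(name: bytes, nbuckets: int) -> int:
    """Pick the hash bucket for a symbol name (hypothetical helper).

    With nbuckets == 0 the modulo divides by zero. Python raises
    ZeroDivisionError here; the same operation in native x86 code
    traps and arrives as SIGFPE.
    """
    return sysv_elf_hash(name) % nbuckets
```

Calling bucket_index(b"printf", 0) raises ZeroDivisionError, which is the Python analogue of the arithmetic exception in the backtrace, assuming this really is the mechanism at work on the client's machine.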
From: Michael B Allen on 10 Apr 2008 23:05
On Fri, 11 Apr 2008 01:08:41 GMT Charles Coldwell <coldwell(a)gmail.com> wrote:

> Michael B Allen <miallen(a)ioplex.com> writes:
>
> > Hello CUP,
> >
> > One of my clients is seeing an FPE in ld-linux-x86-64.so when loading a
> > custom dso in Apache (actually httpd is dlopen-ing mod_php5.so which is
> > dlopen-ing my dso which is dynamically linked with some other libs). The
> > backtrace is inlined below.
>
> I ran into something similar that turned out to be an ABI
> incompatibility across Linux/glibc versions. I think the specific issue
> is the change from a DT_HASH table to a DT_GNU_HASH table. Did you
> build your DSO on precisely the same architecture/OS version that they
> are loading it on?

Hi Chip,

Yup. Right after posting this I googled on "SIGFPE do_lookup_x" and saw that a number of people ran into the .gnu.hash issue. I have relinked with -Wl,--hash-style=both and verified using objdump -h that the shared lib and dso now contain the .hash section as well as the .gnu.hash section.

I'm waiting for the client to try the new package.

> > The client is running SUSE server on "big" hardware. I have an OpenSUSE
> > 10.3 install here and the extension loads and runs perfectly.
>
> Which SuSE is the client running?

Not sure. It's 10.something.

> > I have many other people using the extension and no one has reported
> > this problem. I have recompiled *everything* with no change.
>
> Recompiled on which platform?

CentOS 5.0 on x86_64, which uses .gnu.hash by default (contrary to what the ld man page says).

I'll follow up with the definitive result.

Thanks,
Mike
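
The objdump -h check described above can also be scripted, which is handy when a package contains several shared objects. The following is a minimal sketch in Python, assuming a little-endian ELF64 file (as on x86_64) with an intact section header table; section_names and hash_styles are names made up for this example. It looks for the section names .hash and .gnu.hash, which is how the two hash tables are spelled inside the file.

```python
import struct

# ELF64 file header and section header layouts (little-endian).
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
ELF64_SHDR = struct.Struct("<IIQQQQIIQQ")


def section_names(data: bytes) -> list:
    """Return the section names of a little-endian ELF64 image, in order."""
    if len(data) < ELF64_EHDR.size or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("only little-endian ELF64 is handled in this sketch")
    ehdr = ELF64_EHDR.unpack_from(data, 0)
    shoff, shentsize, shnum, shstrndx = ehdr[6], ehdr[11], ehdr[12], ehdr[13]
    if shnum == 0:
        return []  # no section headers (e.g. a fully stripped image)
    headers = [
        ELF64_SHDR.unpack_from(data, shoff + i * shentsize)
        for i in range(shnum)
    ]
    # The section-name string table is itself a section: offset and size
    # are fields 4 and 5 of its header.
    str_off, str_size = headers[shstrndx][4], headers[shstrndx][5]
    strtab = data[str_off:str_off + str_size]
    names = []
    for hdr in headers:
        start = hdr[0]  # sh_name: offset of the name inside strtab
        end = strtab.index(b"\0", start)
        names.append(strtab[start:end].decode("ascii"))
    return names


def hash_styles(data: bytes) -> dict:
    """Report which symbol hash sections are present in the image."""
    names = set(section_names(data))
    return {"sysv": ".hash" in names, "gnu": ".gnu.hash" in names}
```

For a real library, something like hash_styles(open(path, "rb").read()) should agree with what objdump -h shows. An object linked with --hash-style=both would be expected to report both entries as True, while one built with the GNU-only default would report "sysv" as False.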