From: Andrew Poelstra on
On 2010-01-12, cerr <ron.eggler(a)gmail.com> wrote:
> On Jan 12, 2:07 pm, cerr <ron.egg...(a)gmail.com> wrote:
>> On Jan 12, 2:02 pm, John Gordon <gor...(a)panix.com> wrote:
>> > In <e304a543-a511-4c8a-9179-be00acaaf...(a)k17g2000yqh.googlegroups.com> cerr <ron.egg...(a)gmail.com> writes:
>>
>> > > I just saw that I got a SIGABRT - but that twice only... :o may this
>> > > be a clue?
>> > > How would my process be getting a SIGABRT? Any clues? :o
>>
>> > SIGABRT is raised by calling the abort() library function, or when an
>> > assert() evaluates to false (a failed assert() calls abort() itself).
>>
>> > Does your program contain any abort() or assert() calls?
>>
>> Yes, there's abort() calls in a file called memwatch.c - I guess I
>> should have a look at this one...
>
> Oh look there, it's a huge file and in the header it says:
>
> ** MEMWATCH.C
> ** Nonintrusive ANSI C memory leak / overwrite detection
> ** Copyright (C) 1992-2001 Johan Lindh
> ** All rights reserved.
> ** Version 2.67
>
> So after all I probably sent myself these signals... Does anyone know
> anything about this memwatch.c file? Gotta make myself smart first :)

I don't know anything about memwatch.c but I bet if you added a printf
before each abort() call, you'd be able to learn something about what's
going wrong.
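
A minimal sketch of that idea, assuming you are free to edit the abort()
call sites. The names ABORT_WITH_TRACE and format_abort_site are made up
for this example and are not part of memwatch; the fprintf could just as
well be a syslog() call.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of memwatch): build a message naming the
 * call site, so it can be printed or logged right before abort(). */
static int format_abort_site(char *buf, size_t len, const char *file, int line)
{
    return snprintf(buf, len, "abort() reached at %s:%d", file, line);
}

/* Hypothetical drop-in for a bare abort(): report where we are on stderr,
 * flush so the message is not lost, then abort as before. */
#define ABORT_WITH_TRACE()                                          \
    do {                                                            \
        char msg_[256];                                             \
        format_abort_site(msg_, sizeof msg_, __FILE__, __LINE__);   \
        fprintf(stderr, "%s\n", msg_);                              \
        fflush(stderr);                                             \
        abort();                                                    \
    } while (0)
```

Replacing each abort() with ABORT_WITH_TRACE() should then show which of
the calls actually fires.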

From: cerr on
On Jan 12, 2:16 pm, Andrew Poelstra <apoels...(a)localhost.localdomain>
wrote:
> On 2010-01-12, cerr <ron.egg...(a)gmail.com> wrote:
>
> > On Jan 12, 2:07 pm, cerr <ron.egg...(a)gmail.com> wrote:
> >> On Jan 12, 2:02 pm, John Gordon <gor...(a)panix.com> wrote:
> >> > In <e304a543-a511-4c8a-9179-be00acaaf...(a)k17g2000yqh.googlegroups.com> cerr <ron.egg...(a)gmail.com> writes:
>
> >> > > I just saw that I got a SIGABRT - but that twice only... :o may this
> >> > > be a clue?
> >> > > How would my process be getting a SIGABRT? Any clues? :o
>
> >> > SIGABRT is raised by calling the abort() library function, or when an
> >> > assert() evaluates to false (a failed assert() calls abort() itself).
>
> >> > Does your program contain any abort() or assert() calls?
>
> >> Yes, there's abort() calls in a file called memwatch.c - I guess I
> >> should have a look at this one...
>
> > Oh look there, it's a huge file and in the header it says:
>
> > ** MEMWATCH.C
> > ** Nonintrusive ANSI C memory leak / overwrite detection
> > ** Copyright (C) 1992-2001 Johan Lindh
> > ** All rights reserved.
> > ** Version 2.67
>
> > So after all I probably sent myself these signals... Does anyone know
> > anything about this memwatch.c file? Gotta make myself smart first :)
>
> I don't know anything about memwatch.c but I bet if you added a printf
> before each abort() call, you'd be able to learn something about what's
> going wrong.

Yup, I put a couple of syslog calls in there, but there's probably
still something else going on, because I can't find any kill statements
or anything... :(
From: Ersek, Laszlo on
In article <cb714bed-32c7-4ed6-a3c7-b6f8a02c9c8f(a)j24g2000yqa.googlegroups.com>, cerr <ron.eggler(a)gmail.com> writes:

> This GDB was configured as "i586-linux-uclibc".

> [root(a)DEVNEMS logrecord]# ldd prs
> libpthread.so.0 => /lib/libpthread.so.0 (0xb7f58000)
> libssl.so.0.9.7 => /usr/lib/libssl.so.0.9.7 (0xb7f31000)
> librt.so.0 => /lib/librt.so.0 (0xb7f2f000)
> libstdc++.so.6 => /lib/libstdc++.so.6 (0xb7ebc000)
> libm.so.0 => /lib/libm.so.0 (0xb7eae000)
> libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7ea6000)
> libc.so.0 => /lib/libc.so.0 (0xb7e5a000)
> libcrypto.so.0.9.7 => /usr/lib/libcrypto.so.0.9.7 (0xb7d8b000)
> libdl.so.0 => /lib/libdl.so.0 (0xb7d88000)
> ld-uClibc.so.0 => /lib/ld-uClibc.so.0 (0xb7f6d000)
>
> I'm not quite certain what this would tell us tho... :(

At least it allows for some wild speculation :)

First, memwatch:

http://www.linkdata.se/memwatch

I'd risk after a very superficial look at memwatch that it does some
nifty signals hacking. If your program is multi-threaded, that's not
very easy. The memwatch USING itself says:

Is this stuff thread-safe?

I doubt it. As of version 2.66, there is rudimentary support
for threads, if you happen to be using Win32 or if you have
pthreads. Define WIN32 or MW_PTHREADS to signify this fact.

This will cause a global mutex to be created, and memwatch
will lock it when accessing the global memory chain, but it's
still far from certified threadsafe.

Second, you use uclibc. I have no idea whether memwatch was developed
for / tested with uclibc. I'd say try building the app with memwatch
disabled. I don't know if uclibc ships its own pthreads implementation,
but if so, it may use some signals internally. (At least before NPTL,
glibc used LinuxThreads which utilized some realtime (queued) signals.
Or so I remember.)

Third, before you start the application in gdb, set breakpoints at
pthread_kill(), kill(), and raise(). (You may need system library debug
symbols for this.) Whenever you stop in one of them, get a backtrace.
Some system library (eg. the pthreads implementation) might detect such
a mess that it has no choice but to kill the process.
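
If breakpoints are awkward on the target, a handler installed with
SA_SIGINFO can record where a catchable signal came from. This is only a
sketch: the function and variable names are invented for the example, and
it does not help with SIGKILL, which cannot be caught (sigaction() refuses
it).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical globals for this sketch, filled in by the handler. */
static volatile sig_atomic_t last_signo = 0;
static volatile sig_atomic_t last_code  = 0;
static volatile sig_atomic_t last_pid   = 0;

/* Record the signal number, its origin code and the sender's pid.
 * si_code is SI_USER for kill() and SI_QUEUE for sigqueue(); si_pid is
 * only meaningful for signals sent by a process. */
static void origin_handler(int signo, siginfo_t *info, void *ctx)
{
    static const char msg[] = "signal caught, origin recorded\n";

    (void)ctx;
    last_signo = signo;
    last_code  = info->si_code;
    last_pid   = (sig_atomic_t)info->si_pid;

    /* write() is async-signal-safe; printf() and syslog() are not. */
    (void)write(STDERR_FILENO, msg, sizeof msg - 1);
}

/* Install the handler for one signal; returns 0 on success, -1 on error
 * (for example when signo is SIGKILL or SIGSTOP). */
static int install_origin_logger(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = origin_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

Calling install_origin_logger(SIGABRT) early in main() and inspecting
last_code and last_pid (or a core dump) afterwards should tell you whether
the signal came from inside the process or from another one.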

Fourth, are you sure your kernel and syslog are configured for maximum
verbosity? Did you check all syslog files, dmesg etc?

Did the app always behave like this? Didn't you change platforms
recently or so? Did you go multi-threaded recently?

Good luck,
lacos
From: guenther on
On Jan 12, 10:25 am, sc...(a)slp53.sl.home (Scott Lurndal) wrote:
> "guent...(a)gmail.com" <guent...(a)gmail.com> writes:
....
> >There are situations under which the kernel will send SIGKILL to a
> >process.  Others have mentioned the Linux OOM killer; a more rarely
> >seen one is if you have a CPU-time resource hard limit set (such as
> >via the ulimit shell-builtin) then the kernel will send the process a
> >SIGKILL when the limit is reached.
>
> I think the cpu hard limit sends SIGXCPU, not SIGKILL.

SIGXCPU is sent when you reach the *soft* limit; SIGKILL when you reach
the hard limit.

At least that's what the setrlimit() manpage and kernel sources say on
the RHEL5 system I'm looking at.
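
A small test program along these lines should show the difference on
Linux. It is a sketch: the 1 s / 2 s limits and the function names are
arbitrary choices for the example. The child burns CPU under a soft limit
of 1 second and a hard limit of 2 seconds, and the parent looks at how it
ended.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_xcpu = 0;

static void on_xcpu(int signo)
{
    (void)signo;
    got_xcpu = 1;
}

/* Hypothetical helper: fork a child that spins under RLIMIT_CPU
 * (soft 1 s, hard 2 s). If exit_on_xcpu is nonzero the child exits with
 * status 0 as soon as it sees SIGXCPU; otherwise it keeps spinning until
 * the kernel stops it. Returns the raw wait status, or -1 on error. */
static int run_child_with_cpu_limit(int exit_on_xcpu)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct sigaction sa;
        struct rlimit rl;

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_xcpu;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGXCPU, &sa, NULL) != 0)
            _exit(2);

        rl.rlim_cur = 1;    /* soft limit, in seconds of CPU time */
        rl.rlim_max = 2;    /* hard limit */
        if (setrlimit(RLIMIT_CPU, &rl) != 0)
            _exit(2);

        for (;;) {          /* burn CPU; got_xcpu is volatile */
            if (exit_on_xcpu && got_xcpu)
                _exit(0);
        }
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

With exit_on_xcpu set, WIFEXITED() on the returned status should be true;
with it cleared, WIFSIGNALED() should be true and WTERMSIG() should report
SIGKILL.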


Philip Guenther
From: cerr on
On Jan 12, 5:18 pm, la...(a)ludens.elte.hu (Ersek, Laszlo) wrote:
> In article <cb714bed-32c7-4ed6-a3c7-b6f8a02c9...(a)j24g2000yqa.googlegroups..com>, cerr <ron.egg...(a)gmail.com> writes:
>
> > This GDB was configured as "i586-linux-uclibc".
> > [root(a)DEVNEMS logrecord]# ldd prs
> >         libpthread.so.0 => /lib/libpthread.so.0 (0xb7f58000)
> >         libssl.so.0.9.7 => /usr/lib/libssl.so.0.9.7 (0xb7f31000)
> >         librt.so.0 => /lib/librt.so.0 (0xb7f2f000)
> >         libstdc++.so.6 => /lib/libstdc++.so.6 (0xb7ebc000)
> >         libm.so.0 => /lib/libm.so.0 (0xb7eae000)
> >         libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7ea6000)
> >         libc.so.0 => /lib/libc.so.0 (0xb7e5a000)
> >         libcrypto.so.0.9.7 => /usr/lib/libcrypto.so.0.9.7 (0xb7d8b000)
> >         libdl.so.0 => /lib/libdl.so.0 (0xb7d88000)
> >         ld-uClibc.so.0 => /lib/ld-uClibc.so.0 (0xb7f6d000)
>
> > I'm not quite certain what this would tell us tho... :(
>
> At least it allows for some wild speculation :)

aha, hehe :)

> First, memwatch:
>
> http://www.linkdata.se/memwatch
>
> I'd risk after a very superficial look at memwatch that it does some
> nifty signals hacking. If your program is multi-threaded, that's not
> very easy. The memwatch USING itself says:
>
> Is this stuff thread-safe?
>
>         I doubt it. As of version 2.66, there is rudimentary support
>         for threads, if you happen to be using Win32 or if you have
>         pthreads. Define WIN32 or MW_PTHREADS to signify this fact.
>
>         This will cause a global mutex to be created, and memwatch
>         will lock it when accessing the global memory chain, but it's
>         still far from certified threadsafe.

I'm using multiple threads, yes, and I am using pthreads on Linux. So
the global mutex should be able to lock these things, but then again,
it certainly is suspicious that it started receiving SIGKILLs when I
added another thread that does lots of dynamic memory allocation...

>
> Second, you use uclibc. I have no idea whether memwatch was developed
> for / tested with uclibc. I'd say try building the app with memwatch
> disabled. I don't know if uclibc ships its own pthreads implementation,
> but if so, it may use some signals internally. (At least before NPTL,
> glibc used LinuxThreads which utilized some realtime (queued) signals.
> Or so I remember.)

That would be the next step: to build it without memwatch, yes! As I
wrote these replies from the bottom up, I would like to see if my
breakpoints kick in successfully first, because if I receive SIGKILLs
without the breakpoints kicking in, we can safely say it comes from
something else, right?

> Third, before you start the application in gdb, set breakpoints at
> pthread_kill(), kill(), and raise(). (You may need system library debug
> symbols for this.) Whenever you stop in one of them, get a backtrace.
> Some system library (eg. the pthreads implementation) might detect such
> a mess that it has no choice but to kill the process.

Well, I connected to the remote target, hit continue and Ctrl-C'ed
back to the (gdb) prompt in order to add breakpoints like:
(gdb) break pthread_kill()
Function "pthread_kill()" not defined.
Make breakpoint pending on future shared library load? (y or [n])
(gdb) break kill()
Function "kill()" not defined.
Make breakpoint pending on future shared library load? (y or [n])
(gdb) break raise()
Function "raise()" not defined.
Make breakpoint pending on future shared library load? (y or [n])
(gdb) continue
Continuing.

I have very little experience with gdb and hope I did this
correctly...?

> Fourth, are you sure your kernel and syslog are configured for maximum
> verbosity? Did you check all syslog files, dmesg etc?

I don't know about the kernel, but syslog-ng.conf
only includes one filter: filter f_notice {not level (info); };

> Did the app always behave like this? Didn't you change platforms
> recently or so? Did you go multi-threaded recently?

No, but I added another thread to the app that's responsible for
sending off the log lines, rather than just using syslog-ng with a
defined remote target, because we want to verify that the log server
sends a layer 7 acknowledgement back.