From: Scott Lurndal on
"guenther(a)gmail.com" <guenther(a)gmail.com> writes:
>On Jan 11, 11:28 am, cerr <ron.egg...(a)gmail.com> wrote:
>> My application sometimes randomly receives a SIGKILL signal, gdb would
>> say something like:
>> Program terminated with signal SIGKILL, Killed.
>> The program no longer exists.
>> (gdb)
>> And I have no clue why. When does the system send a SIGKILL? There is
>> no third application sending anything to mine....
>
>There are situations under which the kernel will send SIGKILL to a
>process. Others have mentioned the Linux OOM killer; a more rarely
>seen one is if you have a CPU-time resource hard limit set (such as
>via the ulimit shell-builtin) then the kernel will send the process a
>SIGKILL when the limit is reached.

I think SIGXCPU is what the CPU soft limit sends; SIGKILL only follows once
the hard limit is reached.

scott
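
For what it's worth, the Linux setrlimit(2) man page describes RLIMIT_CPU as
delivering SIGXCPU when the soft limit is reached and SIGKILL when the hard
limit is reached. A minimal sketch to check this on your own system follows;
the helper name signal_at_cpu_limit() is made up for illustration, and the
SIGKILL case is an assumption about Linux behaviour, since POSIX only
specifies SIGXCPU at the soft limit.

```c
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that burns CPU under the given
 * RLIMIT_CPU soft/hard limits (in seconds). Returns the number of the
 * signal that terminated the child, 0 if it was not killed by a signal,
 * or -1 on error. */
int signal_at_cpu_limit(rlim_t soft, rlim_t hard)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct rlimit nocore = { .rlim_cur = 0, .rlim_max = 0 };
        struct rlimit cpu = { .rlim_cur = soft, .rlim_max = hard };
        volatile unsigned long spin = 0;

        signal(SIGXCPU, SIG_DFL);           /* make sure it is not ignored */
        setrlimit(RLIMIT_CORE, &nocore);    /* SIGXCPU would dump core */
        if (setrlimit(RLIMIT_CPU, &cpu) != 0)
            _exit(1);

        for (;;)                            /* burn CPU until the kernel steps in */
            ++spin;
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling signal_at_cpu_limit(1, 3) should report SIGXCPU after about one second
of CPU time. With both limits equal, which is what "ulimit -t" normally sets,
signal_at_cpu_limit(1, 1) reports SIGKILL on the Linux kernels I am aware of.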
From: Scott Lurndal on
John Gordon <gordon(a)panix.com> writes:
>In <slrnhkpd4v.nvv.apoelstra(a)localhost.localdomain> Andrew Poelstra <apoelstra(a)localhost.localdomain> writes:
>
>> If you try to write to memory you don't own, and that memory happens
>> to be in the stack
>
>But if the memory contains stack information, then by definition you
>*do* own it, right? Because it was allocated by your process.
>
>If you are sufficiently unlucky, you could be corrupting memory in such
>a way that your stack gets hosed but you don't see the SEGV until later.
>

A runaway loop initializing an array declared as an automatic (stack)
variable will eventually hit the stack resource limit and cause a SIGSEGV.

If a handler is established but no alternate stack has been set up with
sigaltstack(), the stack will not be valid at that point: GDB will not be
able to generate a backtrace and the handler will not be called (this might
result in SIGKILL on some systems).

If the same array is accessed with negative indices (for example, if a
signed index overflows), then the stack frame itself could also become
corrupt.
From: cerr on
On Jan 12, 8:36 am, John Gordon <gor...(a)panix.com> wrote:
> In <61358e14-656c-4ae6-b57c-5723dd1d2...(a)j5g2000yqm.googlegroups.com> cerr <ron.egg...(a)gmail.com> writes:
>
> > I'm root and I also - just to make sure no unknown external process
> > would send anything to my device - disconnected the network cable -
> > same behaviour -> SIGKILL
>
> How long does the application run before it gets killed?
There's no fixed time base. Sometimes after a few seconds, other times
after a few minutes or even half an hour... it's always
different...

From: Lew Pitcher on
On January 12, 2010 11:42, in comp.unix.programmer, ron.eggler(a)gmail.com
wrote:

> On Jan 11, 5:16 pm, sc...(a)slp53.sl.home (Scott Lurndal) wrote:
>> cerr <ron.egg...(a)gmail.com> writes:
>> >On Jan 11, 3:09 pm, Ben Finney <ben+u...(a)benfinney.id.au> wrote:
>> >The memory seems to be fine with "free" but I just caught a seg fault
>> >now:
>> >Program received signal SIGSEGV, Segmentation fault.
>> >0x0804cb4c in ?? ()
>> >Then I tried to get a backtrace with bt but I only got:
>> >(gdb) bt
>> >#0  0x0804cb4c in ?? ()
>> >How come? I am in the source directory and I did compile the binary
>> >with -ggdb3 ... any clues?
>>
>> If your SEGV resulted in corrupting the stack, GDB will not be able to
>> produce a stack traceback.
>>
>> scott
>
> Mh, why would a segfault happen on the stack unless I declare a
> pointer and explicitly free() it before I try to access it? Variables
> on the stack get discarded at the end of the function and then they're
> out of scope anyway, eh?
> Not sure if I'm missing something here...

If you overwrite an automatic variable with a longer value than it can hold
(typically done with arrays of some sort), you can corrupt memory so as to
destroy not only that automatic variable (and other automatics) but also the
function's activation record (all typically held on a stack of some sort),
and get a SIGSEGV when the function returns (or perhaps SIGFPE, if you
perform math on automatic floating-point variables).

Witness the effect of this code....

/*********** code begins ***************/

/*
cause SIGSEGV
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

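int main(void)
{
    char buffer[2];   /* room for one character plus the terminating NUL */
    int count;

    /* Deliberate overflow: copies 37 bytes into a 2-byte automatic array */
    strcpy(buffer,"this might clobber some stack values");

    for (count = 0; count < 10; ++count)
        printf("Count = %d\n",count);

    /* The damaged activation record is only used here, on return */
    return EXIT_SUCCESS;
}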

/*********** code ends ***************/

$ cc -o segv segv.c
$ ./segv
Count = 0
Count = 1
Count = 2
Count = 3
Count = 4
Count = 5
Count = 6
Count = 7
Count = 8
Count = 9
Segmentation fault
$

The strcpy() clobbered memory, both the storage following buffer[] /and/ the
activation record for main(). The SIGSEGV only occurred when the main()
function terminated with a return, (logically) long after the corruption
occurred.
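
For contrast, a minimal sketch of a bounded copy that cannot run past the end
of the destination. The helper name copy_bounded() is hypothetical; it only
uses snprintf() from standard C and reports truncation instead of overflowing.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, which has room for dst_size
 * bytes (dst_size must be > 0). The result is always NUL-terminated.
 * Returns 0 if the whole string fit, -1 if it was truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    int needed = snprintf(dst, dst_size, "%s", src);

    if (needed < 0 || (size_t)needed >= dst_size)
        return -1;      /* output error, or src did not fit */
    return 0;
}
```

Called as copy_bounded(buffer, sizeof buffer, "this might clobber some stack
values") on the two-byte buffer above, it stores just "t" plus the NUL and
returns -1, leaving the rest of the stack alone. Compilers such as GCC also
offer -fstack-protector, which typically turns an overwrite like the strcpy()
one into an immediate abort rather than a crash at return.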

HTH
--
Lew Pitcher
Master Codewright & JOAT-in-training | Registered Linux User #112576
Me: http://pitcher.digitalfreehold.ca/ | Just Linux: http://justlinux.ca/
---------- Slackware - Because I know what I'm doing. ------


From: cerr on
On Jan 11, 5:43 pm, William Ahern <will...(a)wilbur.25thandClement.com>
wrote:
> John Gordon <gor...(a)panix.com> wrote:
> > In <729a2645-1597-4941-acff-e6aad7db8...(a)p24g2000yqm.googlegroups.com> cerr <ron.egg...(a)gmail.com> writes:
>
> > > My application sometimes randomly receives a SIGKILL signal, gdb would
> > > say something like:
> > > Program terminated with signal SIGKILL, Killed.
> > > The program no longer exists.
> > > (gdb)
> > > And I have no clue why. When does the system send a SIGKILL? There is
> > > no third application sending anything to mine....
>
> > Have you asked the system administrator if they're sending the kill signal?
>
> My thoughts, too. Some systems will kill processes owned by a user not
> currently logged into the system. Or perhaps user processes not attached to
> a TTY. Whatever the condition, I've been on systems that effectively
> prevented ordinary users from running daemons for any significant length of
> time.

Nah, I'm root and there's no one else on this system.... :(