fs: limit maximum concurrent coredumps [Kernel]

Prev: [PATCH 2/5] isdn/gigaset: correct CAPI voice connection encoding
Next: [PATCH net-next 2/2] cxgb3: request 7.10 firmware

From: KAMEZAWA Hiroyuki on 22 Jun 2010 05:00

On Mon, 21 Jun 2010 18:41:16 -0700
Andrew Morton <akpm(a)linux-foundation.org> wrote:

> On Mon, 21 Jun 2010 18:23:03 -0700 (PDT) Roland McGrath <roland(a)redhat.com> wrote:

> > That won't make your crashers each complete quickly, but it will prevent
> > the thrashing. Instead of some crashers suddenly not producing dumps at
> > all, they'll just all queue up waiting to finish crashing but not using any
> > CPU or IO resources. That way you don't lose any core dumps unless you
> > want to start SIGKILL'ing things (which oom_kill might do if need be),
> > you just don't die in flames trying to do nothing but dump cores.
>
> A global knob is a bit old-school. Perhaps it should be a per-memcg
> knob or something.
>
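Roland's alternative is to let crashers queue instead of refusing them. It can be modelled in userspace as a counting gate. This is a hypothetical sketch, not kernel code: `struct dump_gate` and its functions are illustrative names, and a pthread mutex plus condition variable stand in for whatever sleeping primitive a kernel version would use. `dump_gate_try_enter()` mirrors the skip-on-overflow behaviour of the posted patch, with 0 meaning unlimited. `dump_gate_enter()` instead sleeps until a slot is free.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a global concurrent-coredump limit. */
struct dump_gate {
	pthread_mutex_t lock;
	pthread_cond_t  freed;   /* signalled whenever a dump slot is released */
	unsigned int    active;  /* dumps currently in progress */
	unsigned int    max;     /* 0 = unlimited, as in the quoted patch */
};

static void dump_gate_init(struct dump_gate *g, unsigned int max)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->freed, NULL);
	g->active = 0;
	g->max = max;
}

/* Caller must hold g->lock. */
static bool gate_full(const struct dump_gate *g)
{
	return g->max != 0 && g->active >= g->max;
}

/* Posted patch's behaviour: refuse the dump when over the limit. */
static bool dump_gate_try_enter(struct dump_gate *g)
{
	bool ok;

	pthread_mutex_lock(&g->lock);
	ok = !gate_full(g);
	if (ok)
		g->active++;
	pthread_mutex_unlock(&g->lock);
	return ok;
}

/* Roland's suggestion: sleep (no CPU or I/O) until a slot frees up. */
static void dump_gate_enter(struct dump_gate *g)
{
	pthread_mutex_lock(&g->lock);
	while (gate_full(g))
		pthread_cond_wait(&g->freed, &g->lock);
	g->active++;
	pthread_mutex_unlock(&g->lock);
}

static void dump_gate_leave(struct dump_gate *g)
{
	pthread_mutex_lock(&g->lock);
	assert(g->active > 0);
	g->active--;
	pthread_cond_signal(&g->freed);  /* wake one queued crasher */
	pthread_mutex_unlock(&g->lock);
}
```

Andrew's per-memcg knob could be modelled by keeping one such gate per group instead of a single global one. How the group would be looked up is left open here.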

Hmm, on my desktop, it seems a coredump from a task in a cgroup is charged against
the root cgroup (not against the group the task belongs to).
This seemed strange, so I spent 2 hours chasing down why. I noticed:

==
[root(a)bluextal kamezawa]# cat /proc/sys/kernel/core_pattern
|/usr/libexec/abrt-hook-ccpp /var/cache/abrt %p %s %u %c
==
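This is Fedora 12. With a "|" core_pattern, the core is piped to a usermode
helper (abrt-hook-ccpp) that the kernel starts outside the crashing task's
cgroup, so the dump is charged to the root cgroup.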
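For reference, here is a minimal sketch of how a pipe helper might read the arguments produced by the core_pattern shown above. It assumes the kernel expands the specifiers and splits the line on spaces into argv, so that argv[1] is /var/cache/abrt and argv[2] to argv[5] carry %p, %s, %u and %c. `struct hook_args` and `parse_hook_args()` are hypothetical names; this is not abrt's actual code.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

/* What a helper registered as
 *   |/usr/libexec/abrt-hook-ccpp /var/cache/abrt %p %s %u %c
 * finds in argv once the kernel has expanded the specifiers. */
struct hook_args {
	const char        *dump_dir;   /* argv[1]: literal text from core_pattern */
	pid_t              pid;        /* %p: PID of the dumped process */
	int                signo;      /* %s: signal that caused the dump */
	uid_t              uid;        /* %u: real UID of the dumped process */
	unsigned long long core_limit; /* %c: its RLIMIT_CORE soft limit */
};

/* Parse an unsigned decimal field; reject empty input, signs, junk, overflow. */
static bool parse_ull(const char *s, unsigned long long *out)
{
	char *end;

	if (!isdigit((unsigned char)s[0]))
		return false;
	errno = 0;
	*out = strtoull(s, &end, 10);
	return errno == 0 && *end == '\0';
}

/* Fill *a from argv; returns false if argv does not have the expected shape. */
static bool parse_hook_args(int argc, char *argv[], struct hook_args *a)
{
	unsigned long long pid, sig, uid, lim;

	assert(a != NULL);
	if (argc != 6)
		return false;
	if (!parse_ull(argv[2], &pid) || !parse_ull(argv[3], &sig) ||
	    !parse_ull(argv[4], &uid) || !parse_ull(argv[5], &lim))
		return false;
	a->dump_dir   = argv[1];
	a->pid        = (pid_t)pid;
	a->signo      = (int)sig;
	a->uid        = (uid_t)uid;
	a->core_limit = lim;
	return true;
}
```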

So, on recent distros, writing a core dump with limited resources may be a job
for the abrt program. It can make use of the I/O cgroup plus direct I/O.
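Here is a minimal sketch of the copy loop such a helper could use. It assumes the helper reads the core image from stdin, where the kernel writes it, and stops writing at the crashing process's RLIMIT_CORE as passed via %c. `copy_core_bounded()` is a hypothetical name. It uses plain read()/write(); running the helper inside an I/O cgroup or opening the output with O_DIRECT, as suggested here, would be layered on top and is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy a core image from in_fd (the helper's stdin, fed by the kernel) to
 * out_fd, writing at most `limit` bytes. Input past the limit is still read
 * and discarded so the whole image is consumed. Returns the number of bytes
 * written, or -1 on a read/write error. */
static long long copy_core_bounded(int in_fd, int out_fd, unsigned long long limit)
{
	char buf[65536];
	unsigned long long written = 0;

	for (;;) {
		ssize_t n = read(in_fd, buf, sizeof(buf));

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break;                  /* EOF: the dump is complete */

		size_t keep = (size_t)n;
		if (limit - written < keep)
			keep = (size_t)(limit - written);

		for (size_t off = 0; off < keep; ) {
			ssize_t w = write(out_fd, buf + off, keep - off);

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += (size_t)w;
		}
		written += keep;
		assert(written <= limit);
	}
	return (long long)written;
}
```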

If kernel help is necessary, maybe this helper should run in the caller's
cgroup.

Regards,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Roland McGrath on 22 Jun 2010 05:00

> But priority settings no longer apply to a core-dumping process, do
> they?

Why wouldn't they?
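Roland's point is that a dumping task is still an ordinary task, subject to its scheduling and I/O priority. As an illustration only, a pipe helper could lower its own priorities before copying the core. This sketch assumes Linux. `lower_dump_priority()` is a hypothetical name. Because glibc has no ioprio_set() wrapper, the syscall is invoked through syscall(2), with the IOPRIO_* constants copied by hand from the kernel ABI.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

/* ioprio_set(2) ABI values (glibc provides no wrapper or constants). */
#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_CLASS_IDLE  3
#define IOPRIO_WHO_PROCESS 1

/* Drop the caller to nice 19 and, where permitted, to the idle I/O class.
 * Returns setpriority()'s result; *io_ok is set to 1 if ioprio_set()
 * succeeded and 0 otherwise (for example if a seccomp filter rejects it). */
static int lower_dump_priority(int *io_ok)
{
	assert(io_ok != NULL);
	*io_ok = syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
			 IOPRIO_CLASS_IDLE << IOPRIO_CLASS_SHIFT) == 0;
	return setpriority(PRIO_PROCESS, 0, 19);
}
```

Whether the idle I/O class actually slows the writes depends on the I/O scheduler in use.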

From: Oleg Nesterov on 23 Jun 2010 12:10

On 06/21, Edward Allcutt wrote:
>
> The ability to limit concurrent coredumps allows dumping core to be safely
> enabled in these situations without affecting responsiveness of the system
> as a whole.

OK, but please note that the patch is not right,

> @@ -1844,6 +1845,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
> int retval = 0;
> int flag = 0;
> int ispipe;
> + int dump_count = 0;
> static atomic_t core_dump_count = ATOMIC_INIT(0);
> struct coredump_params cprm = {
> .signr = signr,
> @@ -1865,6 +1867,14 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
> if (!__get_dumpable(cprm.mm_flags))
> goto fail;
>
> + dump_count = atomic_inc_return(&core_dump_count);
> + if (core_max_concurrency && (core_max_concurrency < dump_count)) {
> + printk(KERN_WARNING "Pid %d(%s) over core_max_concurrency\n",
> + task_tgid_vnr(current), current->comm);
> + printk(KERN_WARNING "Skipping core dump\n");
> + goto fail;
> + }
> +

We can't return here. We should kill other threads which share the same
->mm in any case.

Suppose that core_dump_count > core_max_concurrency, and we send, say,
SIGQUIT to the process. With this patch, SIGQUIT suddenly kills only a
single thread; this must not happen.

If you change the patch to sleep until core_dump_count < core_max_concurrency,
then, again, we should kill other threads first.
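The ordering Oleg asks for can be modelled in userspace. First, make sure no other thread sharing the memory can run. Only then sleep for a free slot. This is a hypothetical model, not kernel code. The "threads sharing ->mm" are pthreads polling a stop flag, and joining them stands in for the kernel killing them. The limit is a mutex/condition-variable counter. `struct group`, `struct limit` and `dump_with_limit()` are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SIBLINGS 4

/* Hypothetical dump limit: a counter guarded by a mutex/condvar. */
struct limit {
	pthread_mutex_t lock;
	pthread_cond_t  freed;
	unsigned int    active;
	unsigned int    max;           /* 0 = unlimited, as in the quoted patch */
};

/* Hypothetical stand-in for "other threads sharing ->mm". */
struct group {
	atomic_bool stop;              /* set when the siblings are being killed */
	atomic_int  nr_stopped;
	pthread_t   sibling[MAX_SIBLINGS];
	int         nr_siblings;
};

static void limit_init(struct limit *l, unsigned int max)
{
	pthread_mutex_init(&l->lock, NULL);
	pthread_cond_init(&l->freed, NULL);
	l->active = 0;
	l->max = max;
}

static void *sibling_fn(void *arg)
{
	struct group *g = arg;

	while (!atomic_load(&g->stop))
		sched_yield();         /* stands in for user code using the shared mm */
	atomic_fetch_add(&g->nr_stopped, 1);
	return NULL;
}

static int start_group(struct group *g, int n)
{
	assert(n >= 0 && n <= MAX_SIBLINGS);
	atomic_init(&g->stop, false);
	atomic_init(&g->nr_stopped, 0);
	g->nr_siblings = 0;
	for (int i = 0; i < n; i++) {
		if (pthread_create(&g->sibling[i], NULL, sibling_fn, g) != 0)
			return -1;
		g->nr_siblings++;
	}
	return 0;
}

/* Stop every sibling first, then sleep for a free slot, then dump.
 * Returns how many siblings had stopped by the time the dump started. */
static int dump_with_limit(struct group *g, struct limit *l,
			   void (*write_core)(void *), void *ctx)
{
	int stopped;

	atomic_store(&g->stop, true);
	for (int i = 0; i < g->nr_siblings; i++)
		pthread_join(g->sibling[i], NULL);

	pthread_mutex_lock(&l->lock);
	while (l->max != 0 && l->active >= l->max)
		pthread_cond_wait(&l->freed, &l->lock);  /* sleeps, using no CPU */
	l->active++;
	pthread_mutex_unlock(&l->lock);

	stopped = atomic_load(&g->nr_stopped);
	if (write_core)
		write_core(ctx);       /* no sibling can modify the memory being dumped */

	pthread_mutex_lock(&l->lock);
	l->active--;
	pthread_cond_signal(&l->freed);
	pthread_mutex_unlock(&l->lock);
	return stopped;
}
```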

Oleg.


From: Oleg Nesterov on 23 Jun 2010 12:10

On 06/23, Oleg Nesterov wrote:
>
> On 06/21, Edward Allcutt wrote:
> >
> > The ability to limit concurrent coredumps allows dumping core to be safely
> > enabled in these situations without affecting responsiveness of the system
> > as a whole.
>
> OK, but please note that the patch is not right,

Oops, sorry, I was not exactly right either.

> > @@ -1844,6 +1845,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
> > int retval = 0;
> > int flag = 0;
> > int ispipe;
> > + int dump_count = 0;
> > static atomic_t core_dump_count = ATOMIC_INIT(0);
> > struct coredump_params cprm = {
> > .signr = signr,
> > @@ -1865,6 +1867,14 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
> > if (!__get_dumpable(cprm.mm_flags))
> > goto fail;
> >
> > + dump_count = atomic_inc_return(&core_dump_count);
> > + if (core_max_concurrency && (core_max_concurrency < dump_count)) {
> > + printk(KERN_WARNING "Pid %d(%s) over core_max_concurrency\n",
> > + task_tgid_vnr(current), current->comm);
> > + printk(KERN_WARNING "Skipping core dump\n");
> > + goto fail;
> > + }
> > +
>
> We can't return here. We should kill other threads which share the same
> ->mm in any case.
>
> Suppose that core_dump_count > core_max_concurrency, and we send, say,
> SIGQUIT to the process. With this patch, SIGQUIT suddenly kills only a
> single thread; this must not happen.

Well, the caller does do_group_exit() after do_coredump(), and this kills
the sub-threads.

However, this doesn't kill other CLONE_VM tasks. Perhaps this is fine,
but I am not sure.
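The thread-group behaviour described above can be observed from userspace. When a single thread receives a core-dumping signal such as SIGQUIT, the whole process dies, even when no core is written. The sketch below is Linux-specific, and `quit_self()` and `sigquit_from_one_thread()` are illustrative names. It marks the child non-dumpable with prctl(PR_SET_DUMPABLE, 0), so the dump bails out at the `!__get_dumpable()` check visible in the quoted hunk, the same kind of early exit the patch's limit takes. The parent then checks that SIGQUIT terminated the child. It says nothing about other CLONE_VM tasks, which are separate processes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs in a non-main thread and directs SIGQUIT at this thread only. */
static void *quit_self(void *unused)
{
	(void)unused;
	pthread_kill(pthread_self(), SIGQUIT);
	for (;;)
		pause();                /* wait here in case delivery is deferred */
}

/* Fork a child in which one non-main thread receives SIGQUIT. Returns the
 * signal that terminated the child, or -1 if it ended some other way. */
static int sigquit_from_one_thread(void)
{
	pid_t pid = fork();

	assert(pid >= 0);
	if (pid == 0) {
		sigset_t set;
		pthread_t t;

		/* Make sure SIGQUIT has its default (core-dumping) action. */
		signal(SIGQUIT, SIG_DFL);
		sigemptyset(&set);
		sigaddset(&set, SIGQUIT);
		sigprocmask(SIG_UNBLOCK, &set, NULL);
		/* Not dumpable: the core-dump path skips writing anything. */
		prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);
		if (pthread_create(&t, NULL, quit_self, NULL) != 0)
			_exit(1);
		pthread_join(t, NULL);  /* never returns: the main thread dies too */
		_exit(0);
	}

	int status;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```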

> If you change the patch to sleep until core_dump_count < core_max_concurrency,
> then, again, we should kill other threads first.

Yes, this is true. If we are going to sleep, we shouldn't allow other
threads to run.

Oleg.

