From: David Rientjes on
On Tue, 30 Mar 2010, Oleg Nesterov wrote:

> ->siglock is no longer needed to access task->signal, change
> oom_adjust_read() and oom_adjust_write() to read/write oom_adj
> lockless.
>
> Yes, this means that "echo 2 >oom_adj" and "echo 1 >oom_adj"
> can race and the second write can win, but I hope this is OK.
>

Ok, but could you base this on -mm at
http://userweb.kernel.org/~akpm/mmotm/ since an additional tunable has
been added (oom_score_adj), which does the same thing?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Oleg Nesterov on
On 03/30, David Rientjes wrote:
>
> On Tue, 30 Mar 2010, Oleg Nesterov wrote:
>
> > ->siglock is no longer needed to access task->signal, change
> > oom_adjust_read() and oom_adjust_write() to read/write oom_adj
> > lockless.
> >
> > Yes, this means that "echo 2 >oom_adj" and "echo 1 >oom_adj"
> > can race and the second write can win, but I hope this is OK.
> >
>
> Ok, but could you base this on -mm at
> http://userweb.kernel.org/~akpm/mmotm/ since an additional tunable has
> been added (oom_score_adj), which does the same thing?

Ah, OK, will do.

Thanks David.

Oleg.

From: Oleg Nesterov on
On 03/30, David Rientjes wrote:
>
> On Tue, 30 Mar 2010, Oleg Nesterov wrote:
>
> > ->siglock is no longer needed to access task->signal, change
> > oom_adjust_read() and oom_adjust_write() to read/write oom_adj
> > lockless.
> >
> > Yes, this means that "echo 2 >oom_adj" and "echo 1 >oom_adj"
> > can race and the second write can win, but I hope this is OK.
>
> Ok, but could you base this on -mm at
> http://userweb.kernel.org/~akpm/mmotm/ since an additional tunable has
> been added (oom_score_adj), which does the same thing?

David, I just can't understand why
oom-badness-heuristic-rewrite.patch
duplicates the related code in fs/proc/base.c and why it preserves
the deprecated signal->oom_adj.

OK. Please forget about lock_task_sighand/signal issues. Can't we kill
signal->oom_adj and create a single helper for both
/proc/pid/{oom_adj,oom_score_adj} ?

static ssize_t oom_any_adj_write(struct file *file, const char __user *buf,
				 size_t count, bool deprecated_mode)
{
	struct task_struct *task;
	char buffer[PROC_NUMBUF];
	unsigned long flags;
	long oom_score_adj;
	int err;

	memset(buffer, 0, sizeof(buffer));
	if (count > sizeof(buffer) - 1)
		count = sizeof(buffer) - 1;
	if (copy_from_user(buffer, buf, count))
		return -EFAULT;

	err = strict_strtol(strstrip(buffer), 0, &oom_score_adj);
	if (err)
		return -EINVAL;

	if (depraceted_mode) {
		if (oom_score_adj == OOM_ADJUST_MAX)
			oom_score_adj = OOM_SCORE_ADJ_MAX;
		else
			oom_score_adj = (oom_score_adj * OOM_SCORE_ADJ_MAX) /
					-OOM_DISABLE;
	}

	if (oom_score_adj < OOM_SCORE_ADJ_MIN ||
	    oom_score_adj > OOM_SCORE_ADJ_MAX)
		return -EINVAL;

	task = get_proc_task(file->f_path.dentry->d_inode);
	if (!task)
		return -ESRCH;
	if (!lock_task_sighand(task, &flags)) {
		put_task_struct(task);
		return -ESRCH;
	}
	if (oom_score_adj < task->signal->oom_score_adj &&
	    !capable(CAP_SYS_RESOURCE)) {
		unlock_task_sighand(task, &flags);
		put_task_struct(task);
		return -EACCES;
	}

	task->signal->oom_score_adj = oom_score_adj;

	unlock_task_sighand(task, &flags);
	put_task_struct(task);
	return count;
}

This is just the current oom_score_adj_write() + "if (depraceted_mode)"
which does the oom_adj -> oom_score_adj conversion.
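
As a rough illustration of that conversion, the scaling can be pulled out into
a standalone user-space sketch. The helper name oom_adj_to_score_adj() is
hypothetical, and the constants are assumed to match include/linux/oom.h in
the tree under discussion (OOM_DISABLE = -17, OOM_ADJUST_MAX = 15,
OOM_SCORE_ADJ_MAX = 1000). It shows why the maximum is special-cased: plain
integer scaling of 15 would stop short of 1000.

```c
/* Assumed values, mirroring include/linux/oom.h in the -mm tree. */
#define OOM_DISABLE        (-17)
#define OOM_ADJUST_MIN     (-16)
#define OOM_ADJUST_MAX     15
#define OOM_SCORE_ADJ_MIN  (-1000)
#define OOM_SCORE_ADJ_MAX  1000

/*
 * Hypothetical helper: scale a legacy oom_adj value (-17..15) into the
 * oom_score_adj range (-1000..1000). No range checking is done here.
 */
static long oom_adj_to_score_adj(long oom_adj)
{
	/*
	 * 15 * 1000 / 17 truncates to 882, so the legacy maximum is mapped
	 * explicitly to keep "15" meaning "most likely to be killed".
	 */
	if (oom_adj == OOM_ADJUST_MAX)
		return OOM_SCORE_ADJ_MAX;

	/* -OOM_DISABLE is 17, so -17 maps exactly to -1000. */
	return (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;
}
```

With these assumptions, -17 maps to -1000, 0 maps to 0, and intermediate
values are truncated toward zero (for example -16 becomes -941).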

Now,

static ssize_t oom_adjust_write(...)
{
	printk_once(KERN_WARNING "... deprecated ...\n");

	return oom_any_adj_write(..., true);
}

static ssize_t oom_score_adj_write(...)
{
	return oom_any_adj_write(..., false);
}

The same for oom_xxx_read().
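
For the read side, a comparable sketch would scale the stored oom_score_adj
back into the legacy range. This is only an assumption about how such a
helper might look; score_adj_to_oom_adj() is a made-up name and the constants
are again assumed from include/linux/oom.h.

```c
/* Assumed values, mirroring include/linux/oom.h in the -mm tree. */
#define OOM_DISABLE        (-17)
#define OOM_ADJUST_MAX     15
#define OOM_SCORE_ADJ_MAX  1000

/*
 * Hypothetical helper: derive the legacy oom_adj value reported by
 * /proc/pid/oom_adj from a stored oom_score_adj.
 */
static long score_adj_to_oom_adj(long oom_score_adj)
{
	/* Mirror the special case used when writing the legacy maximum. */
	if (oom_score_adj == OOM_SCORE_ADJ_MAX)
		return OOM_ADJUST_MAX;

	/* Integer division truncates toward zero, so precision is lost. */
	return (oom_score_adj * -OOM_DISABLE) / OOM_SCORE_ADJ_MAX;
}
```

One consequence worth noting: with pure integer scaling the round trip is
lossy. Writing -16 to oom_adj would store -941, and reading it back through
this helper yields -15 rather than -16.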

What is the point of keeping signal->oom_adj?

Oleg.

From: David Rientjes on
On Wed, 31 Mar 2010, Oleg Nesterov wrote:

> David, I just can't understand why
> oom-badness-heuristic-rewrite.patch
> duplicates the related code in fs/proc/base.c and why it preserves
> the deprecated signal->oom_adj.
>

You could combine the two write functions together and then two read
functions together if you'd like.

> OK. Please forget about lock_task_sighand/signal issues. Can't we kill
> signal->oom_adj and create a single helper for both
> /proc/pid/{oom_adj,oom_score_adj} ?
>
> static ssize_t oom_any_adj_write(struct file *file, const char __user *buf,
> 				 size_t count, bool deprecated_mode)
> {
> 	struct task_struct *task;
> 	char buffer[PROC_NUMBUF];
> 	unsigned long flags;
> 	long oom_score_adj;
> 	int err;
>
> 	memset(buffer, 0, sizeof(buffer));
> 	if (count > sizeof(buffer) - 1)
> 		count = sizeof(buffer) - 1;
> 	if (copy_from_user(buffer, buf, count))
> 		return -EFAULT;
>
> 	err = strict_strtol(strstrip(buffer), 0, &oom_score_adj);
> 	if (err)
> 		return -EINVAL;
>
> 	if (depraceted_mode) {
> 		if (oom_score_adj == OOM_ADJUST_MAX)
> 			oom_score_adj = OOM_SCORE_ADJ_MAX;

???

> 		else
> 			oom_score_adj = (oom_score_adj * OOM_SCORE_ADJ_MAX) /
> 					-OOM_DISABLE;
> 	}
>
> 	if (oom_score_adj < OOM_SCORE_ADJ_MIN ||
> 	    oom_score_adj > OOM_SCORE_ADJ_MAX)

That doesn't work for depraceted_mode (sic), you'd need to test for
OOM_ADJUST_MIN and OOM_ADJUST_MAX in that case.

> 		return -EINVAL;
>
> 	task = get_proc_task(file->f_path.dentry->d_inode);
> 	if (!task)
> 		return -ESRCH;
> 	if (!lock_task_sighand(task, &flags)) {
> 		put_task_struct(task);
> 		return -ESRCH;
> 	}
> 	if (oom_score_adj < task->signal->oom_score_adj &&
> 	    !capable(CAP_SYS_RESOURCE)) {
> 		unlock_task_sighand(task, &flags);
> 		put_task_struct(task);
> 		return -EACCES;
> 	}
>
> 	task->signal->oom_score_adj = oom_score_adj;
>
> 	unlock_task_sighand(task, &flags);
> 	put_task_struct(task);
> 	return count;
> }
>

There have been efforts to reuse as much of this code as possible for
other sysctl handlers as well, you might be better off looking for other
users of the common read and write code and then merging them first
(comm_write, proc_coredump_filter_write, etc).
From: Oleg Nesterov on
On 03/31, David Rientjes wrote:
>
> On Wed, 31 Mar 2010, Oleg Nesterov wrote:
>
> > David, I just can't understand why
> > oom-badness-heuristic-rewrite.patch
> > duplicates the related code in fs/proc/base.c and why it preserves
> > the deprecated signal->oom_adj.
>
> You could combine the two write functions together and then two read
> functions together if you'd like.

Yes,

> > static ssize_t oom_any_adj_write(struct file *file, const char __user *buf,
> > 				 size_t count, bool deprecated_mode)
> > {
> >
> > 	if (depraceted_mode) {
> > 		if (oom_score_adj == OOM_ADJUST_MAX)
> > 			oom_score_adj = OOM_SCORE_ADJ_MAX;
>
> ???

What?

> > 		else
> > 			oom_score_adj = (oom_score_adj * OOM_SCORE_ADJ_MAX) /
> > 					-OOM_DISABLE;
> > 	}
> >
> > 	if (oom_score_adj < OOM_SCORE_ADJ_MIN ||
> > 	    oom_score_adj > OOM_SCORE_ADJ_MAX)
>
> That doesn't work for depraceted_mode (sic), you'd need to test for
> OOM_ADJUST_MIN and OOM_ADJUST_MAX in that case.

Yes, probably "if (depraceted_mode)" should do more checks, I didn't try
to verify that MIN/MAX are correctly converted. I showed this code to explain
what I mean.
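
To make the range-check problem concrete, a minimal user-space sketch follows.
It is not the actual patch: oom_any_adj_check() is a hypothetical name, the
constants are assumed from include/linux/oom.h, and the locking and capability
checks are left out. The idea is to validate a legacy value against
OOM_ADJUST_MIN/OOM_ADJUST_MAX (plus OOM_DISABLE) before scaling it, because
after scaling an out-of-range input such as 16 becomes 941 and would slip
through the OOM_SCORE_ADJ_MIN/OOM_SCORE_ADJ_MAX test.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed values, mirroring include/linux/oom.h in the -mm tree. */
#define OOM_DISABLE        (-17)
#define OOM_ADJUST_MIN     (-16)
#define OOM_ADJUST_MAX     15
#define OOM_SCORE_ADJ_MIN  (-1000)
#define OOM_SCORE_ADJ_MAX  1000

/*
 * Hypothetical helper: validate a parsed value for either proc file and
 * store the resulting oom_score_adj in *score_adj.
 * Returns 0 on success or -EINVAL if the value is out of range.
 */
static int oom_any_adj_check(long value, bool deprecated_mode, long *score_adj)
{
	if (deprecated_mode) {
		/*
		 * Check the legacy range first. OOM_DISABLE lies below
		 * OOM_ADJUST_MIN but is still a legal oom_adj value.
		 */
		if ((value < OOM_ADJUST_MIN || value > OOM_ADJUST_MAX) &&
		    value != OOM_DISABLE)
			return -EINVAL;

		/* Then scale into the oom_score_adj range. */
		if (value == OOM_ADJUST_MAX)
			value = OOM_SCORE_ADJ_MAX;
		else
			value = (value * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;
	} else if (value < OOM_SCORE_ADJ_MIN || value > OOM_SCORE_ADJ_MAX) {
		return -EINVAL;
	}

	*score_adj = value;
	return 0;
}
```

Under these assumptions, 16 written to oom_adj is rejected with -EINVAL, while
16 written to oom_score_adj is accepted unchanged.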

> There have been efforts to reuse as much of this code as possible for
> other sysctl handlers as well, you might be better off looking for

David, sorry ;) Right now I'd better try to stop the overloading of
->siglock. And I'd like to shrink signal_struct if possible, but this
is minor.

Oleg.
