From: Oleg Nesterov on
On 06/16, Louis Rilling wrote:
>
> Detached tasks are not seen by zap_pid_ns_processes()->sys_wait4(), so
> that release_task()->proc_flush_task() of container init can be called
> before it is for some detached tasks in the namespace.
>
> Pin proc_mnt's in copy_process(), so that proc_flush_task() becomes safe
> whatever the ordering of tasks.

I must have missed something, but can't we just move mntput() ?

Oleg.

--- x/kernel/pid_namespace.c
+++ x/kernel/pid_namespace.c
@@ -110,6 +110,9 @@ static void destroy_pid_namespace(struct
{
int i;

+ if (ns->proc_mount)
+ mntput(ns->proc_mount);
+
for (i = 0; i < PIDMAP_ENTRIES; i++)
kfree(ns->pidmap[i].page);
kmem_cache_free(pid_ns_cachep, ns);
--- x/fs/proc/base.c
+++ x/fs/proc/base.c
@@ -2745,10 +2745,6 @@ void proc_flush_task(struct task_struct
proc_flush_task_mnt(upid->ns->proc_mnt, upid->nr,
tgid->numbers[i].nr);
}
-
- upid = &pid->numbers[pid->level];
- if (upid->nr == 1)
- pid_ns_release_proc(upid->ns);
}

static struct dentry *proc_pid_instantiate(struct inode *dir,

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Oleg Nesterov on
On 06/17, Eric W. Biederman wrote:
>
> The task->children isn't changed until __unhash_process() which runs
> after flush_proc_task().

Yes. But this is only the current implementation detail.
It would be nice to cleanup the code so that EXIT_DEAD tasks are
never sit in ->children list.

> So we should be able to come up with
> a variant of do_wait() that zap_pid_ns_processes can use that does
> what we need.

See above...

Even if we modify do_wait() or add the new variant, how the caller
can wait for EXIT_DEAD tasks? I don't think we want to modify
release_task() to do __wake_up_parent() or something similar.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Louis Rilling on
On 17/06/10 23:20 +0200, Oleg Nesterov wrote:
> On 06/16, Louis Rilling wrote:
> >
> > Detached tasks are not seen by zap_pid_ns_processes()->sys_wait4(), so
> > that release_task()->proc_flush_task() of container init can be called
> > before it is for some detached tasks in the namespace.
> >
> > Pin proc_mnt's in copy_process(), so that proc_flush_task() becomes safe
> > whatever the ordering of tasks.
>
> I must have missed something, but can't we just move mntput() ?

See the log of the commit introducing pid_ns_release_proc() (6f4e6433):

Sice the namespace holds the vfsmnt, vfsmnt holds the superblock and the
superblock holds the namespace we must explicitly break this circle to destroy
all the stuff. This is done after the init of the namespace dies. Running a
few steps forward - when init exits it will kill all its children, so no
proc_mnt will be needed after its death.

Thanks,

Louis

>
> Oleg.
>
> --- x/kernel/pid_namespace.c
> +++ x/kernel/pid_namespace.c
> @@ -110,6 +110,9 @@ static void destroy_pid_namespace(struct
> {
> int i;
>
> + if (ns->proc_mount)
> + mntput(ns->proc_mount);
> +
> for (i = 0; i < PIDMAP_ENTRIES; i++)
> kfree(ns->pidmap[i].page);
> kmem_cache_free(pid_ns_cachep, ns);
> --- x/fs/proc/base.c
> +++ x/fs/proc/base.c
> @@ -2745,10 +2745,6 @@ void proc_flush_task(struct task_struct
> proc_flush_task_mnt(upid->ns->proc_mnt, upid->nr,
> tgid->numbers[i].nr);
> }
> -
> - upid = &pid->numbers[pid->level];
> - if (upid->nr == 1)
> - pid_ns_release_proc(upid->ns);
> }
>
> static struct dentry *proc_pid_instantiate(struct inode *dir,
>

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
From: Louis Rilling on
On 17/06/10 23:36 +0200, Oleg Nesterov wrote:
> On 06/17, Eric W. Biederman wrote:
> >
> > The task->children isn't changed until __unhash_process() which runs
> > after flush_proc_task().
>
> Yes. But this is only the current implementation detail.
> It would be nice to cleanup the code so that EXIT_DEAD tasks are
> never sit in ->children list.
>
> > So we should be able to come up with
> > a variant of do_wait() that zap_pid_ns_processes can use that does
> > what we need.
>
> See above...
>
> Even if we modify do_wait() or add the new variant, how the caller
> can wait for EXIT_DEAD tasks? I don't think we want to modify
> release_task() to do __wake_up_parent() or something similar.

Indeed, I was thinking about calling __wake_up_parent() from release_task()
once parent->children becomes empty.

Not sure about the performance impact though. Maybe some WAIT_NO_CHILDREN flag
in parent->signal could limit it. But if EXIT_DEAD children are removed from
->children before release_task(), I'm afraid that this becomes impossible.

Thanks,

Louis

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
From: Oleg Nesterov on
On 06/18, Louis Rilling wrote:
>
> On 17/06/10 23:20 +0200, Oleg Nesterov wrote:
> > On 06/16, Louis Rilling wrote:
> > >
> > > Detached tasks are not seen by zap_pid_ns_processes()->sys_wait4(), so
> > > that release_task()->proc_flush_task() of container init can be called
> > > before it is for some detached tasks in the namespace.
> > >
> > > Pin proc_mnt's in copy_process(), so that proc_flush_task() becomes safe
> > > whatever the ordering of tasks.
> >
> > I must have missed something, but can't we just move mntput() ?
>
> See the log of the commit introducing pid_ns_release_proc() (6f4e6433):
>
> Sice the namespace holds the vfsmnt, vfsmnt holds the superblock and the
> superblock holds the namespace we must explicitly break this circle to destroy
> all the stuff. This is done after the init of the namespace dies.

I see thanks.

Besides, put_pid_ns() can be called in any context...

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
First  |  Prev  |  Next  |  Last
Pages: 1 2 3 4 5
Prev: [RFC][PATCH 0/8] perf pmu interface
Next: $100,000 OFFER