From: Sukadev Bhattiprolu on
Oleg Nesterov [oleg(a)redhat.com] wrote:
| This is mostly cleanup and optimization, but also fixes the bug.
|
| proc_flush_task() checks upid->nr == 1 to detect the case when
| a sub-namespace exits. However, this doesn't work in case when
| a multithreaded init execs and calls release_task(old_leader),
| the old leader has the same pid 1.
|
| Move pid_ns_release_proc() to zap_pid_ns_processes(), it is called
| when we know for sure that init is exiting.
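
To make the failure of the upid->nr == 1 check concrete, here is a minimal
userspace sketch, not kernel code. All names ending in _model, and the
ns_is_exiting field, are hypothetical and exist only for illustration; the one
thing taken from the description above is that the old leader of a
multithreaded init still carries pid 1 when release_task() runs on it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct upid: only the nr field. */
struct upid_model {
    int nr;
};

/* Hypothetical task: its pid number plus the ground truth the check
 * is trying to infer (is the sub-namespace really going away?). */
struct task_model {
    struct upid_model upid;
    bool ns_is_exiting;
};

/* The heuristic from the description: nr == 1 means "namespace exits". */
static bool heuristic_ns_exit(const struct task_model *t)
{
    return t->upid.nr == 1;
}

/* Case 1: the container-init really exits. */
static struct task_model model_exiting_init(void)
{
    struct task_model t = { .upid = { .nr = 1 }, .ns_is_exiting = true };
    return t;
}

/* Case 2: a multithreaded init execs; the old leader is released with
 * pid 1, but the namespace lives on under the new leader. */
static struct task_model model_old_leader_after_exec(void)
{
    struct task_model t = { .upid = { .nr = 1 }, .ns_is_exiting = false };
    return t;
}

/* True when the heuristic fires although the namespace is not exiting. */
static bool heuristic_misfires(const struct task_model *t)
{
    return heuristic_ns_exit(t) && !t->ns_is_exiting;
}
```

Calling heuristic_misfires() on the value returned by
model_old_leader_after_exec() yields true, which is the case the patch
description is about.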

Hmm, I almost agreed, but have a question :-)

Yes, we know that the container-init is exiting. But if its parent (in
the parent ns) waits on it and calls release_task(), won't we call
proc_flush_task_mnt() on this container-init? This would happen after
dropping the mnt in zap_pid_ns_processes(), no?

At the time zap_pid_ns_processes() is called, the container-init is still
not in EXIT_ZOMBIE state, right? (Or does your statement below include
EXIT_DEAD and EXIT_ZOMBIE tasks?)
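
The ordering concern can be sketched in userspace C. This is a toy model under
stated assumptions, not the kernel implementation: the model_ functions and the
single-reference count are hypothetical, and the real mount may hold other
references. It only shows that if the proc mount reference is dropped in
zap_pid_ns_processes(), a later proc_flush_task_mnt() for the container-init
(triggered by the parent's release_task()) runs against a mount that has
already been released.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a pid namespace: just a count of references
 * held on its proc mount. Assumption: it starts at exactly one. */
struct pid_ns_model {
    int proc_mnt_refs;
};

/* Models pid_ns_release_proc(): drop the namespace's mount reference. */
static void model_pid_ns_release_proc(struct pid_ns_model *ns)
{
    ns->proc_mnt_refs--;
}

/* Models proc_flush_task_mnt(): returns true if the mount is still
 * referenced when the flush touches it, false if it was already put. */
static bool model_proc_flush_task_mnt(const struct pid_ns_model *ns)
{
    return ns->proc_mnt_refs > 0;
}

/* Ordering under the proposed patch: the mount is released while init
 * is exiting, and init itself is flushed only later by its parent. */
static bool model_patched_order(void)
{
    struct pid_ns_model ns = { .proc_mnt_refs = 1 };

    model_pid_ns_release_proc(&ns);        /* in zap_pid_ns_processes() */
    return model_proc_flush_task_mnt(&ns); /* parent: release_task(init) */
}

/* Ordering where init is flushed first and the mount is released after. */
static bool model_flush_then_release_order(void)
{
    struct pid_ns_model ns = { .proc_mnt_refs = 1 };
    bool mount_was_live = model_proc_flush_task_mnt(&ns);

    model_pid_ns_release_proc(&ns);
    return mount_was_live;
}
```

In this model, model_patched_order() returns false (the flush sees a released
mount) while model_flush_then_release_order() returns true.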

|
| Note: with or without this change this mntput() can happen before the
| EXIT_DEAD tasks not visible to do_wait() have passed proc_flush_task().
| We need more fixes.
|

Sukadev
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Oleg Nesterov on
On 06/24, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg(a)redhat.com> writes:
>
> > This is mostly cleanup and optimization, but also fixes the bug.
>
> Oleg with respect to your other patches I think they are some of
> the best ones we have on the table.
>
> > proc_flush_task() checks upid->nr == 1 to detect the case when
> > a sub-namespace exits. However, this doesn't work in case when
> > a multithreaded init execs and calls release_task(old_leader),
> > the old leader has the same pid 1.
> >
> > Move pid_ns_release_proc() to zap_pid_ns_processes(), it is called
> > when we know for sure that init is exiting.
>
> This actually guarantees a use after free for the namespace init:

Yes, thanks. I am stupid.

Please ignore the patch.

Oleg.

From: Oleg Nesterov on
On 06/23, Sukadev Bhattiprolu wrote:
>
> Oleg Nesterov [oleg(a)redhat.com] wrote:
> | This is mostly cleanup and optimization, but also fixes the bug.
> |
> | proc_flush_task() checks upid->nr == 1 to detect the case when
> | a sub-namespace exits. However, this doesn't work in case when
> | a multithreaded init execs and calls release_task(old_leader),
> | the old leader has the same pid 1.
> |
> | Move pid_ns_release_proc() to zap_pid_ns_processes(), it is called
> | when we know for sure that init is exiting.
>
> Hmm, I almost agreed, but have a question :-)
>
> Yes, we know that the container-init is exiting. But if its parent (in
> the parent ns) waits on it and calls release_task(), won't we call
> proc_flush_task_mnt() on this container-init? This would happen after
> dropping the mnt in zap_pid_ns_processes(), no?

Indeed. Thanks!

Somehow I forgot that init itself has not passed proc_flush_task().

Oleg.
