From: Peter Zijlstra on
On Thu, 2010-07-01 at 14:34 +0200, Peter Zijlstra wrote:
> On Thu, 2010-07-01 at 15:23 +0300, Michael S. Tsirkin wrote:
> >
> > The patch using this is here:
> > http://www.mail-archive.com/kvm(a)vger.kernel.org/msg35411.html
> >
> > It simply copies the affinity from the parent when thread is created.
>
> Sounds like policy, not something the kernel should do..

The alternative would be using clone() instead of thread_create() and
inherit everything from the creating task. Inheriting from kthreadd and
then undoing some aspects just sounds like daft policy that really ought
to be in userspace.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Michael S. Tsirkin on
On Thu, Jul 01, 2010 at 02:32:43PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-07-01 at 14:55 +0300, Michael S. Tsirkin wrote:
>
> > > - why can't it set the kernel thread's affinity too?
> >
> > It can. However: the threads are started internally by the driver
> > when qemu does an ioctl. What we want to do is give it a sensible
> > default affinity. management tool can later tweak it if it wants to.
>
> So have that ioctl return the tid of that new fancy thread and then set
> its affinity, stuff it in cgroup, whatever you fancy.
>
> > > - what happens if someone changes the tasks' affinity?
> >
> > We would normally create a cgroup including all internal
> > tasks, making it easy to find and change affinity for
> > them all if necessary.
>
> And to stuff them in a cgroup you also need the tid, at which point it
> might as well set the affinity from userspace, right?

We also put it in a cgroup transparently. I think that it's actually
important to do it on thread creation: if we don't, malicious userspace
can create large amount of work exceeding the cgroup limits.

And the same applies so the affinity, right? If the qemu process
is limited to a set of CPUs, isn't it important to make
the kernel thread that does work our behalf limited to the same
set of CPUs?

--
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Thu, 2010-07-01 at 15:50 +0300, Michael S. Tsirkin wrote:
> On Thu, Jul 01, 2010 at 02:32:43PM +0200, Peter Zijlstra wrote:
> > On Thu, 2010-07-01 at 14:55 +0300, Michael S. Tsirkin wrote:
> >
> > > > - why can't it set the kernel thread's affinity too?
> > >
> > > It can. However: the threads are started internally by the driver
> > > when qemu does an ioctl. What we want to do is give it a sensible
> > > default affinity. management tool can later tweak it if it wants to.
> >
> > So have that ioctl return the tid of that new fancy thread and then set
> > its affinity, stuff it in cgroup, whatever you fancy.
> >
> > > > - what happens if someone changes the tasks' affinity?
> > >
> > > We would normally create a cgroup including all internal
> > > tasks, making it easy to find and change affinity for
> > > them all if necessary.
> >
> > And to stuff them in a cgroup you also need the tid, at which point it
> > might as well set the affinity from userspace, right?
>
> We also put it in a cgroup transparently. I think that it's actually
> important to do it on thread creation: if we don't, malicious userspace
> can create large amount of work exceeding the cgroup limits.
>
> And the same applies so the affinity, right? If the qemu process
> is limited to a set of CPUs, isn't it important to make
> the kernel thread that does work our behalf limited to the same
> set of CPUs?

I'm not sure we have anything like this, but I wouldn't think so, if a
driver creates a kthread and manages to inject tons of work its not
typically limited to whatever limits apply to the task that supplied the
work.

Take the encryption threads for example, those don't run in the context
of whoever provides the data to be encrypted (file,net whatever) and
thus the task responsible could consume heaps more resources than when
it would have to do the encryption itself.

That's how kthreads work.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Michael S. Tsirkin on
On Thu, Jul 01, 2010 at 02:46:35PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-07-01 at 14:34 +0200, Peter Zijlstra wrote:
> > On Thu, 2010-07-01 at 15:23 +0300, Michael S. Tsirkin wrote:
> > >
> > > The patch using this is here:
> > > http://www.mail-archive.com/kvm(a)vger.kernel.org/msg35411.html
> > >
> > > It simply copies the affinity from the parent when thread is created.
> >
> > Sounds like policy, not something the kernel should do..
>
> The alternative would be using clone() instead of thread_create() and
> inherit everything from the creating task.
> Inheriting from kthreadd and then undoing some aspects just sounds
> like daft policy that really ought to be in userspace.

Yes, that's basically what this patchset is trying to do:
create a workqueue inheriting everything from the creating task.
Sridhar started with an API to do exactly this:
http://linux.derkeiler.com/Mailing-Lists/Kernel/2010-05/msg07478.html

Then we switched to raw kthread to avoid stepping on cwq toes.
Maybe it makes sense to add kthread_clone (in addition to
kthread_create) that would do what you suggest?
If yes, any hints on an implementation?


--
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Michael S. Tsirkin on
On Thu, Jul 01, 2010 at 03:07:26PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-07-01 at 15:50 +0300, Michael S. Tsirkin wrote:
> > On Thu, Jul 01, 2010 at 02:32:43PM +0200, Peter Zijlstra wrote:
> > > On Thu, 2010-07-01 at 14:55 +0300, Michael S. Tsirkin wrote:
> > >
> > > > > - why can't it set the kernel thread's affinity too?
> > > >
> > > > It can. However: the threads are started internally by the driver
> > > > when qemu does an ioctl. What we want to do is give it a sensible
> > > > default affinity. management tool can later tweak it if it wants to.
> > >
> > > So have that ioctl return the tid of that new fancy thread and then set
> > > its affinity, stuff it in cgroup, whatever you fancy.
> > >
> > > > > - what happens if someone changes the tasks' affinity?
> > > >
> > > > We would normally create a cgroup including all internal
> > > > tasks, making it easy to find and change affinity for
> > > > them all if necessary.
> > >
> > > And to stuff them in a cgroup you also need the tid, at which point it
> > > might as well set the affinity from userspace, right?
> >
> > We also put it in a cgroup transparently. I think that it's actually
> > important to do it on thread creation: if we don't, malicious userspace
> > can create large amount of work exceeding the cgroup limits.
> >
> > And the same applies so the affinity, right? If the qemu process
> > is limited to a set of CPUs, isn't it important to make
> > the kernel thread that does work our behalf limited to the same
> > set of CPUs?
>
> I'm not sure we have anything like this, but I wouldn't think so, if a
> driver creates a kthread and manages to inject tons of work its not
> typically limited to whatever limits apply to the task that supplied the
> work.
>
> Take the encryption threads for example, those don't run in the context
> of whoever provides the data to be encrypted (file,net whatever) and
> thus the task responsible could consume heaps more resources than when
> it would have to do the encryption itself.
>
> That's how kthreads work.

Right. And IMHO ideally all such work would run on the appropriate
CPU and be accounted to. It's just that with virt people seem to
run untrusted applications and expect the damage to be contained.
So we came up with a simple approach that seems to do the
just just for us.

--
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/