From: Paul Jackson
Max wrote:
> Also "CPU sets" seem to mostly deal with the scheduler domains.

True - though the "cpusets" (no space ;) sched_load_balance flag can
be used to ensure that some CPUs are not in any scheduler domain,
which is effectively the same as not having the scheduler balance
anything onto them.
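
For example, something along these lines (a minimal sketch only, assuming
a 4-CPU single-node box, the legacy cpuset filesystem mounted at
/dev/cpuset, and CPU 1 as the one being pulled out of the sched domains;
a cgroup-style mount prefixes the control files with "cpuset."):

/*
 * Turn off load balancing across the root cpuset, keep balancing among
 * CPUs 0,2-3 in a "system" cpuset, and give CPU 1 its own non-balanced
 * cpuset.  CPU 1 then sits in no sched domain at all.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

static void write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f || fputs(val, f) == EOF || fclose(f) == EOF) {
		perror(path);
		exit(1);
	}
}

int main(void)
{
	/* No balancing over the root cpuset as a whole. */
	write_str("/dev/cpuset/sched_load_balance", "0");

	/* The "system" half keeps load balancing among CPUs 0,2-3. */
	mkdir("/dev/cpuset/system", 0755);
	write_str("/dev/cpuset/system/cpus", "0,2-3");
	write_str("/dev/cpuset/system/mems", "0");
	write_str("/dev/cpuset/system/sched_load_balance", "1");

	/* CPU 1 ends up in no load-balanced cpuset -> no sched domain. */
	mkdir("/dev/cpuset/rt", 0755);
	write_str("/dev/cpuset/rt/cpus", "1");
	write_str("/dev/cpuset/rt/mems", "0");
	write_str("/dev/cpuset/rt/sched_load_balance", "0");
	return 0;
}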

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj(a)sgi.com> 1.940.382.4214
From: Peter Zijlstra

On Mon, 2008-01-28 at 14:00 -0500, Steven Rostedt wrote:
>
> On Mon, 28 Jan 2008, Max Krasnyanskiy wrote:
> > >> [PATCH] [CPUISOL] Support for workqueue isolation
> > >
> > > The thing about workqueues is that they should only be woken on a CPU if
> > > something on that CPU accessed them. IOW, the workqueue on a CPU handles
> > > work that was called by something on that CPU. Which means that
> > > something the high prio task did triggered a workqueue to do some work.
> > > But this can also be triggered by interrupts, so by keeping interrupts
> > > off the CPU no workqueue should be activated.
>
> > No no no. That's what I thought too ;-). The problem is that things like NFS and friends
> > expect _all_ their workqueue threads to report back when they do certain things like
> > flushing buffers and such. The reason I added this is that my machines were getting
> > stuck: CPU0 was waiting for CPU1 to run the NFS workqueue threads even though no IRQs
> > or anything else were running on it.
>
> This sounds more like we should fix NFS than add this for all workqueues.
> Again, we want workqueues to run on behalf of whatever is running on
> that CPU, including those tasks that are running on an isolcpu.

Agreed. Looking at my top output (and not the nfs code), it looks like
it just spawns a configurable number of kernel threads which are not
bound to any particular CPU. I think just removing the isolated CPUs
from their allowed mask should take care of them.
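
Roughly what I have in mind, as a sketch against 2.6.24-ish APIs and not
the actual NFS code (the thread name and flush_fn are made up, and having
cpu_isolated_map visible here is part of the assumption):

#include <linux/kthread.h>
#include <linux/cpumask.h>
#include <linux/sched.h>
#include <linux/err.h>

extern cpumask_t cpu_isolated_map;	/* assumed visible for this sketch */

static struct task_struct *flush_thread;	/* hypothetical NFS-style worker */

static int flush_fn(void *data)
{
	/* placeholder work loop: sleep until someone stops us */
	set_current_state(TASK_INTERRUPTIBLE);
	while (!kthread_should_stop()) {
		schedule();
		set_current_state(TASK_INTERRUPTIBLE);
	}
	__set_current_state(TASK_RUNNING);
	return 0;
}

static int start_flush_thread(void)
{
	cpumask_t allowed;

	flush_thread = kthread_run(flush_fn, NULL, "flushd");
	if (IS_ERR(flush_thread))
		return PTR_ERR(flush_thread);

	/* Let the thread run anywhere except the isolated CPUs. */
	cpus_andnot(allowed, cpu_online_map, cpu_isolated_map);
	set_cpus_allowed(flush_thread, allowed);
	return 0;
}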

>
> >
> > >> [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"
> > >
> > > This I find very dangerous. We are making an assumption that tasks on an
> > > isolated CPU won't be doing things that stop_machine requires. What stops
> > > a task on an isolated CPU from calling something into the kernel that
> > > stop_machine requires to halt?
>
> > I agree in general. The thing is, though, that stop_machine just kills any kind of latency
> > guarantees. Without the patch the machine just hangs waiting for the stop-machine to run
> > when a module is inserted or removed. And running without dynamic module loading is not very
> > practical on general purpose machines. So I'd rather have an option with a big red warning
> > than no option at all :).
>
> Well, that's something one of the greater powers (Linus, Andrew, Ingo)
> must decide. ;-)

I'm in favour of a better engineered method; that is, we really should try
to solve these problems in a proper way. Hacks like this might be fine
for custom kernels, but I think we should hold ourselves to a higher standard when it
comes to upstream - we all have to live for many years with whatever we put
in there, so we'd better think it through carefully.


From: Max Krasnyanskiy
Peter Zijlstra wrote:
> On Mon, 2008-01-28 at 14:00 -0500, Steven Rostedt wrote:
>> On Mon, 28 Jan 2008, Max Krasnyanskiy wrote:
>>>>> [PATCH] [CPUISOL] Support for workqueue isolation
>>>> The thing about workqueues is that they should only be woken on a CPU if
>>>> something on that CPU accessed them. IOW, the workqueue on a CPU handles
>>>> work that was called by something on that CPU. Which means that
>>>> something the high prio task did triggered a workqueue to do some work.
>>>> But this can also be triggered by interrupts, so by keeping interrupts
>>>> off the CPU no workqueue should be activated.
>>> No no no. That's what I thought too ;-). The problem is that things like NFS and friends
>>> expect _all_ their workqueue threads to report back when they do certain things like
>>> flushing buffers and such. The reason I added this is that my machines were getting
>>> stuck: CPU0 was waiting for CPU1 to run the NFS workqueue threads even though no IRQs
>>> or anything else were running on it.
>> This sounds more like we should fix NFS than add this for all workqueues.
>> Again, we want workqueues to run on behalf of whatever is running on
>> that CPU, including those tasks that are running on an isolcpu.
>
> Agreed. Looking at my top output (and not the nfs code), it looks like
> it just spawns a configurable number of kernel threads which are not
> bound to any particular CPU. I think just removing the isolated CPUs
> from their allowed mask should take care of them.

Actually NFS was just one example. I cannot remember off the top of my head what else was there,
but there are definitely other users of workqueues that expect all of their threads to run at
some point in time.
Also, if you think about it, the patch does _exactly_ what you propose: it removes workqueue
threads from isolated CPUs. But instead of doing it just for NFS and/or other subsystems
separately, it does it in a generic way by simply not starting those threads in the first
place.
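
Purely to illustrate the shape of it - this is not the actual [CPUISOL]
patch, just a made-up per-cpu worker example against 2.6.24-era APIs,
again with cpu_isolated_map assumed visible:

#include <linux/kthread.h>
#include <linux/cpumask.h>
#include <linux/sched.h>
#include <linux/err.h>

extern cpumask_t cpu_isolated_map;	/* assumed visible for this sketch */

static int worker_fn(void *data)
{
	/* placeholder work loop: sleep until someone stops us */
	set_current_state(TASK_INTERRUPTIBLE);
	while (!kthread_should_stop()) {
		schedule();
		set_current_state(TASK_INTERRUPTIBLE);
	}
	__set_current_state(TASK_RUNNING);
	return 0;
}

static int start_per_cpu_workers(void)
{
	int cpu;

	for_each_online_cpu(cpu) {
		struct task_struct *t;

		/* Isolated CPUs simply never get a per-cpu thread, so
		 * nothing can end up waiting for one to run there. */
		if (cpu_isset(cpu, cpu_isolated_map))
			continue;

		t = kthread_create(worker_fn, NULL, "worker/%d", cpu);
		if (IS_ERR(t))
			return PTR_ERR(t);
		kthread_bind(t, cpu);
		wake_up_process(t);
	}
	return 0;
}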

>>>>> [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"
>>>> This I find very dangerous. We are making an assumption that tasks on an
>>>> isolated CPU won't be doing things that stop_machine requires. What stops
>>>> a task on an isolated CPU from calling something into the kernel that
>>>> stop_machine requires to halt?
>>> I agree in general. The thing is, though, that stop_machine just kills any kind of latency
>>> guarantees. Without the patch the machine just hangs waiting for the stop-machine to run
>>> when a module is inserted or removed. And running without dynamic module loading is not very
>>> practical on general purpose machines. So I'd rather have an option with a big red warning
>>> than no option at all :).
>> Well, that's something one of the greater powers (Linus, Andrew, Ingo)
>> must decide. ;-)
>
> I'm in favour of a better engineered method; that is, we really should try
> to solve these problems in a proper way. Hacks like this might be fine
> for custom kernels, but I think we should hold ourselves to a higher standard when it
> comes to upstream - we all have to live for many years with whatever we put
> in there, so we'd better think it through carefully.

100% agree. That's why I mentioned that this patch is controversial in the first place.
Right now, short of rewriting module loading to not use stop_machine, there is no other
option. I'll think some more about it. If you guys have other ideas please drop me a note.

Thanx
Max
From: Max Krasnyanskiy
Paul Jackson wrote:
> Max wrote:
>> So far it seems that extending cpu_isolated_map
>> is a more natural way of propagating this notion to the rest of the kernel,
>> since it's very similar to the cpu_online_map concept and it's easy to integrate
>> with the code that already uses it.
>
> If it were just realtime support, then I suspect I'd agree that
> extending cpu_isolated_map makes more sense.
>
> But some people use realtime on systems that are also heavily
> managed using cpusets. The two have to work together. I have
> customers with systems running realtime on a few CPUs, at the
> same time that they have a large batch scheduler (which is layered
> on top of cpusets) managing jobs on a few hundred other CPUs.
> Hence, with the cpuset 'sched_load_balance' flag, I think I've already
> done one part of what your patches achieve by extending
> the cpu_isolated_map.
>
> This is a common situation with "resource management" mechanisms such
> as cpusets (and more recently cgroups and the subsystem modules it
> supports.) They cut across existing core kernel code that manages such
> key resources as CPUs and memory. As best we can, they have to work
> with each other.

Thanks for the info Paul. I'll definitely look into using this flag instead
and reply with pros and cons (if any).

Max


From: Daniel Walker

On Mon, 2008-01-28 at 10:32 -0800, Max Krasnyanskiy wrote:
> Just these patches. The RT patches cannot achieve what I needed. Even RTAI/Xenomai can't do that.
> For example I have separate tasks with hard deadlines that must be enforced in the 50 usec
> range and basically no idle time whatsoever. Just to give more background, it's a wireless
> basestation with a SW MAC/Scheduler. Another requirement is for the SW to know precise timing
> because SW. For example there is no way we can do predictable 1-2 usec sleeps.
> So I wrote a user-space engine that does all this; it requires full control of the CPU, i.e. minimal
> overhead from the kernel - just IPIs for memory management and that's basically it. When my legal
> department lets me I'll do a presentation on this stuff at a Linux RT conference or something.

What kind of hardware are you doing this on? Also, I should note that there is
HRT (high resolution timers) support, which provides microsecond-level
granularity.
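
What I mean concretely: on a kernel with CONFIG_HIGH_RES_TIMERS,
clock_nanosleep() against CLOCK_MONOTONIC is backed by hrtimers. A
minimal userspace sketch (nothing here is specific to your setup):

#include <time.h>

/* Sleep until an absolute deadline 'us' microseconds from now, using the
 * hrtimer-backed CLOCK_MONOTONIC clock. */
static void sleep_us(long us)
{
	struct timespec t;

	clock_gettime(CLOCK_MONOTONIC, &t);
	t.tv_nsec += us * 1000;
	while (t.tv_nsec >= 1000000000L) {
		t.tv_sec++;
		t.tv_nsec -= 1000000000L;
	}
	clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &t, NULL);
}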

Daniel
