From: Max Krasnyanskiy
Hi Peter,

Peter Zijlstra wrote:
> [ You really ought to CC people :-) ]
I was not sure who, though :)
Do we have a mailing list for scheduler development, btw? Or is it just the folks that you included in the CC?
Some of the latest scheduler patches break things that I'm doing, and I'd like to make
them configurable (RT watchdog, etc).

> On Sun, 2008-01-27 at 20:09 -0800, maxk(a)qualcomm.com wrote:
>> The following patch series extends CPU isolation support. Yes, most people want to virtualize
>> CPUs these days, and I want to isolate them :).
>> The primary idea here is to be able to use some CPU cores as dedicated engines for running
>> user-space code with minimal kernel overhead/intervention, think of it as an SPE in the
>> Cell processor.
>>
>> We've had scheduler support for CPU isolation ever since the O(1) scheduler went in.
>> I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.
>> In fact, that's the primary distinction that I'm making between, say, "CPU sets" and
>> "CPU isolation". "CPU sets" let you manage user-space load, while "CPU isolation" provides
>> a way to isolate a CPU as much as possible (including kernel activities).
>
> Ok, so you're aware of CPU sets, miss a feature, but instead of
> extending it to cover your needs you build something new entirely?
It's not really new. The CPU isolation bits just have not been exported before, that's all.
Also, "CPU sets" seem to mostly deal with scheduler domains. I'll reply to Paul's
proposal to use that instead.
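For concreteness, here is a rough sketch (not the actual patch) of what exporting the
isolation bits means: the scheduler already keeps a cpu_isolated_map built from the
isolcpus= boot parameter, and the idea is to expose it with an accessor in the style of
cpu_online(). The cpu_isolated() helper name below is illustrative:

/* Sketch only; mirrors the cpu_online_map / cpu_online() pattern. */
extern cpumask_t cpu_isolated_map;      /* built from isolcpus= at boot */

#define cpu_isolated(cpu)       cpu_isset((cpu), cpu_isolated_map)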

>> I'm personally using this for hard realtime purposes. With CPU isolation it's very easy to
>> achieve single-digit usec worst-case and around 200 nsec average response times on off-the-shelf
>> multi-processor/core systems under extreme system load. I'm working with legal folks on releasing
>> a hard RT user-space framework for that.
>> I can also see other applications, like simulators and such, that could benefit from this.
>
> have you been using just this, or in combination with the -rt effort?
Just these patches. The RT patches cannot achieve what I needed; even RTAI/Xenomai can't do that.
For example, I have separate tasks with hard deadlines that must be enforced in the 50 usec
range, with basically no idle time whatsoever. To give more background: it's a wireless
basestation with a SW MAC/scheduler. Another requirement is for the SW to have precise timing;
for example, there is no way we can do predictable 1-2 usec sleeps.
So I wrote a user-space engine that does all this. It requires full control of the CPU, i.e. minimal
overhead from the kernel: just IPIs for memory management, and that's basically it. When my legal
department lets me, I'll do a presentation on this stuff at a Linux RT conference or something.
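In the meantime, to give a feel for the user-space side, here is a minimal sketch of how
such an engine might claim an isolated CPU (assumptions: CPU 1 was isolated via the
isolcpus=1 boot parameter, the priority value is arbitrary, and the busy loop stands in
for the real engine):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
        cpu_set_t set;
        struct sched_param sp = { .sched_priority = 50 };

        CPU_ZERO(&set);
        CPU_SET(1, &set);                       /* the isolated core */
        if (sched_setaffinity(0, sizeof(set), &set))
                perror("sched_setaffinity");
        if (sched_setscheduler(0, SCHED_FIFO, &sp))
                perror("sched_setscheduler");

        for (;;)
                ;       /* engine main loop: busy-poll, never sleep */
        return 0;
}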

Max
From: Max Krasnyanskiy
Steven Rostedt wrote:
> On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:
>> Thanks for the CC, Peter.
>
> Thanks from me too.
>
>> Max wrote:
>>> We've had scheduler support for CPU isolation ever since the O(1) scheduler went in.
>>> I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.
>> I recently added the per-cpuset flag 'sched_load_balance' for some
>> other realtime folks, so that they can disable the kernel scheduler
>> load balancing on isolated CPUs. It essentially allows for dynamic
>> control of which CPUs are isolated by the scheduler, using the cpuset
>> hierarchy, rather than enhancing the 'isolated_cpus' mask. That
>> 'isolated_cpus' mask remained a minimal kernel boottime parameter.
>> I believe this went to Linus's tree about Oct 2007.
>>
>> It looks like you have three additional tweaks for realtime in this
>> patch set, with your patches:
>>
>> [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot
>
> I didn't know we still routed IRQs to isolated CPUs. I guess I need to
> look deeper into the code on this one. But I agree that isolated CPUs
> should not have IRQs routed to them.
Also note that it's just a convenience feature. In other words, it's not that with this patch
we'll never route IRQs to those CPUs. They can still be explicitly routed by writing to
/proc/irq/N/smp_affinity.
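For example, to deliberately steer an IRQ to an isolated CPU from user space (IRQ 16 and
the CPU 1 mask 0x2 are made-up values here, and this needs root):

#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/irq/16/smp_affinity", "w");

        if (!f) {
                perror("fopen");
                return 1;
        }
        fprintf(f, "2\n");      /* bitmask: CPU 1 only */
        fclose(f);
        return 0;
}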

>> [PATCH] [CPUISOL] Support for workqueue isolation
>
> The thing about workqueues is that they should only be woken on a CPU if
> something on that CPU accessed them. IOW, the workqueue on a CPU handles
> work that was called by something on that CPU. Which means that
> something that the high-prio task did triggered a workqueue to do some work.
> But this can also be triggered by interrupts, so by keeping interrupts
> off the CPU no workqueue should be activated.
No no no. That's what I thought too ;-). The problem is that things like NFS and friends
expect _all_ their workqueue threads to report back when they do certain things, like
flushing buffers. The reason I added this is that my machines were getting
stuck because CPU0 was waiting for CPU1 to run the NFS workqueue threads, even though no IRQs
or other things were running on it.
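A simplified sketch of the stall, in case it helps (this is not the actual
kernel/workqueue.c code and the helper name is illustrative): flushing a multi-threaded
workqueue waits on the per-CPU thread of every online CPU, including an isolated one that
may never get to run it:

/* Sketch only. */
void flush_workqueue_sketch(struct workqueue_struct *wq)
{
        int cpu;

        for_each_online_cpu(cpu)                /* isolated CPUs included */
                flush_cpu_workqueue(wq, cpu);   /* blocks until that CPU's
                                                   workqueue thread runs */
}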

>> [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"
>
> This I find very dangerous. We are making an assumption that tasks on an
> isolated CPU won't be doing things that stopmachine requires. What stops
> a task on an isolated CPU from calling something into the kernel that
> stop_machine requires to halt?
I agree in general. The thing is, though, that stop machine just kills any kind of latency
guarantees. Without the patch the machine just hangs, waiting for stop-machine to run,
whenever a module is inserted or removed. And running without dynamic module loading is not very
practical on general-purpose machines. So I'd rather have an option with a big red warning
than no option at all :).
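To show roughly what the patch amounts to (a sketch, not the actual diff; cpu_isolated()
is the illustrative helper sketched earlier in the thread): isolated CPUs are simply left
out of the stop-machine rendezvous:

/* Sketch only; the real code lives in kernel/stop_machine.c. */
for_each_online_cpu(cpu) {
        if (cpu_isolated(cpu))  /* skip, behind a big-red-warning option */
                continue;
        /* ... park the high-priority stop-machine thread on this CPU ... */
}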

Thanx
Max
From: Max Krasnyanskiy
Peter Zijlstra wrote:
> On Mon, 2008-01-28 at 11:34 -0500, Steven Rostedt wrote:
>> On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:
>>> Thanks for the CC, Peter.
>> Thanks from me too.
>>
>>> Max wrote:
>>>> We've had scheduler support for CPU isolation ever since the O(1) scheduler went in.
>>>> I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.
>>> I recently added the per-cpuset flag 'sched_load_balance' for some
>>> other realtime folks, so that they can disable the kernel scheduler
>>> load balancing on isolated CPUs. It essentially allows for dynamic
>>> control of which CPUs are isolated by the scheduler, using the cpuset
>>> hierarchy, rather than enhancing the 'isolated_cpus' mask. That
>>> 'isolated_cpus' mask remained a minimal kernel boottime parameter.
>>> I believe this went to Linus's tree about Oct 2007.
>>>
>>> It looks like you have three additional tweaks for realtime in this
>>> patch set, with your patches:
>>>
>>> [PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot
>> I didn't know we still routed IRQs to isolated CPUs. I guess I need to
>> look deeper into the code on this one. But I agree that isolated CPUs
>> should not have IRQs routed to them.
>
> While I agree with this in principle, I'm not sure flat out denying all
> IRQs to these cpus is a good option. What about the case where we want
> to service just this one specific IRQ on this CPU and no others?
>
> Can't this be done by userspace irq routing as used by irqbalanced?
Peter, I think you missed the point of this patch. It's just a convenience feature.
It simply excludes isolated CPUs from the IRQ smp_affinity masks. That's all. What did you
mean by "flat out denying all IRQs to these cpus"? IRQs can still be routed to them
by writing to /proc/irq/N/smp_affinity.

Also, this happens naturally when we bring a CPU offline and then bring it back online;
i.e. when a CPU comes back online, it's excluded from the IRQ smp_affinity masks even without
my patch.
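To make the scope concrete, the patch is roughly equivalent to this sketch (illustrative
mask handling, not the actual diff): isolated CPUs are removed from the default affinity
mask that IRQs start with, and explicit /proc writes still override it:

/* Sketch only. */
cpumask_t usable;

cpus_andnot(usable, cpu_online_map, cpu_isolated_map);
irq_desc[irq].affinity = usable;        /* /proc/irq/N/smp_affinity writes
                                           can still override this */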

>>> [PATCH] [CPUISOL] Support for workqueue isolation
>> The thing about workqueues is that they should only be woken on a CPU if
>> something on that CPU accessed them. IOW, the workqueue on a CPU handles
>> work that was called by something on that CPU. Which means that
>> something that the high-prio task did triggered a workqueue to do some work.
>> But this can also be triggered by interrupts, so by keeping interrupts
>> off the CPU no workqueue should be activated.
>
> Quite so, if nobody uses it, there is no harm in having them around. If
> they are used, it's by someone already allowed on the cpu.

No no no. I just replied to Steven about that. The problem is that things like NFS and
friends expect _all_ their workqueue threads to report back when they do certain things,
like flushing buffers. The reason I added this is that my machines were
getting stuck because CPU0 was waiting for CPU1 to run the NFS workqueue threads, even though
no IRQs, softirqs or other things were running on it.

>>> [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"
>> This I find very dangerous. We are making an assumption that tasks on an
>> isolated CPU won't be doing things that stopmachine requires. What stops
>> a task on an isolated CPU from calling something into the kernel that
>> stop_machine requires to halt?
>
> Very dangerous indeed!
Please see my reply to Steven. I agree it's somewhat dangerous. What we could do is make it
configurable, with a big fat warning. In other words, I'd rather have an option than a blanket
"do not use dynamic module loading" rule on those systems.

Max
From: Steven Rostedt


On Mon, 28 Jan 2008, Max Krasnyanskiy wrote:
> >> [PATCH] [CPUISOL] Support for workqueue isolation
> >
> > The thing about workqueues is that they should only be woken on a CPU if
> > something on that CPU accessed them. IOW, the workqueue on a CPU handles
> > work that was called by something on that CPU. Which means that
> > something that the high-prio task did triggered a workqueue to do some work.
> > But this can also be triggered by interrupts, so by keeping interrupts
> > off the CPU no workqueue should be activated.

> No no no. That's what I thought too ;-). The problem is that things like NFS and friends
> expect _all_ their workqueue threads to report back when they do certain things, like
> flushing buffers. The reason I added this is that my machines were getting
> stuck because CPU0 was waiting for CPU1 to run the NFS workqueue threads, even though no IRQs
> or other things were running on it.

This sounds more like we should fix NFS than add this for all workqueues.
Again, we want workqueues to run on behalf of whatever is running on
that CPU, including those tasks that are running on an isolated CPU.


>
> >> [PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"
> >
> > This I find very dangerous. We are making an assumption that tasks on an
> > isolated CPU won't be doing things that stopmachine requires. What stops
> > a task on an isolated CPU from calling something into the kernel that
> > stop_machine requires to halt?

> I agree in general. The thing is, though, that stop machine just kills any kind of latency
> guarantees. Without the patch the machine just hangs, waiting for stop-machine to run,
> whenever a module is inserted or removed. And running without dynamic module loading is not very
> practical on general-purpose machines. So I'd rather have an option with a big red warning
> than no option at all :).

Well, that's something one of the greater powers (Linus, Andrew, Ingo)
must decide. ;-)


-- Steve

From: Paul Jackson
Max wrote:
> So far it seems that extending cpu_isolated_map
> is a more natural way of propagating this notion to the rest of the kernel,
> since it's very similar to the cpu_online_map concept and is easy to integrate
> with the code that already uses it.

If it were just realtime support, then I suspect I'd agree that
extending cpu_isolated_map makes more sense.

But some people use realtime on systems that are also heavily
managed using cpusets. The two have to work together. I have
customers with systems running realtime on a few CPUs, at the
same time that they have a large batch scheduler (which is layered
on top of cpusets) managing jobs on a few hundred other CPUs.
Hence, with the cpuset 'sched_load_balance' flag, I think I've already
done one part of what your patches achieve by extending
the cpu_isolated_map.
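For anyone following along: flipping that flag is just a file write into the cpuset
hierarchy. A minimal sketch, assuming the cpuset filesystem is mounted at /dev/cpuset and
a cpuset named "rt" already exists (both names are illustrative):

#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/dev/cpuset/rt/sched_load_balance", "w");

        if (!f) {
                perror("fopen");
                return 1;
        }
        fprintf(f, "0\n");      /* disable load balancing for this cpuset */
        fclose(f);
        return 0;
}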

This is a common situation with "resource management" mechanisms such
as cpusets (and, more recently, cgroups and the subsystem modules they
support). They cut across existing core kernel code that manages such
key resources as CPUs and memory. As best we can, they have to work
with each other.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj(a)sgi.com> 1.940.382.4214