CPU isolation extensions [Kernel]

From: Max Krasnyanskiy on 28 Jan 2008 19:20

Daniel Walker wrote:
> On Mon, 2008-01-28 at 10:32 -0800, Max Krasnyanskiy wrote:
>> Just these patches. RT patches cannot achieve what I needed. Even RTAI/Xenomai can't do that.
>> For example I have separate tasks with hard deadlines that must be enforced in 50usec kind
>> of range and basically no idle time whatsoever. Just to give more background it's a wireless
>> basestation with SW MAC/Scheduler. Another requirement is for the SW to know precise timing
>> because SW. For example there is no way we can do predictable 1-2 usec sleeps.
>> So I wrote a user-space engine that does all this, it requires full control of the CPU ie minimal
>> overhead from the kernel, just IPIs for memory management and that's basically it. When my legal
>> department lets me I'll do a presentation on this stuff at Linux RT conference or something.
>
> What kind of hardware are you doing this on?
All kinds of HW. I mentioned it in the intro email.
Here are the highlights:
HP XW9300 (Dual Opteron NUMA box) and XW9400 (Dual Core Opteron)
HP DL145 G2 (Dual Opteron) and G3 (Dual Core Opteron)
Dell Precision workstations (Core2 Duo and Quad)
Various Core2 Duo based systems uTCA boards
Mercury AXA110 (1.5GHz)
Concurrent Tech AM110 (2.1GHz)

This scheme should work on anything that lets you disable SMI on the isolated core(s).

> Also I should note there is HRT (High resolution timers) which provided microsecond level
> granularity ..
Not accurate enough and way too much overhead for what I need. I know at this point it probably
sounds like I'm talking BS :). I wish I'd released the engine and examples by now. Anyway let
me just say that SW MAC has crazy tight deadlines with lots of small tasks. Using nanosleep() &
gettimeofday() is simply not practical. So it's all TSC based with clever time sync logic between
HW and SW.

Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Daniel Walker on 28 Jan 2008 20:40

On Mon, 2008-01-28 at 16:12 -0800, Max Krasnyanskiy wrote:

> Not accurate enough and way too much overhead for what I need. I know at this point it probably
> sounds like I'm talking BS :). I wish I'd released the engine and examples by now. Anyway let
> me just say that SW MAC has crazy tight deadlines with lots of small tasks. Using nanosleep() &
> gettimeofday() is simply not practical. So it's all TSC based with clever time sync logic between
> HW and SW.

I don't know if it's BS or not, you clearly fixed your own problem which
is good .. Although when you say "RT patches cannot achieve what I
needed. Even RTAI/Xenomai can't do that." , and HRT is "Not accurate
enough and way too much overhead" .. Given the hardware you're using,
that's all difficult to believe.. You also said this code has been
running on production systems for two years, which means it's at least
two years old .. There's been some good sized leaps in real time linux
in the past two years ..

Daniel

From: Mark Hounschell on 1 Feb 2008 08:20

maxk(a)qualcomm.com wrote:
> Following patch series extends CPU isolation support. Yes, most people want to virtualize
> CPUs these days and I want to isolate them :).
> The primary idea here is to be able to use some CPU cores as dedicated engines for running
> user-space code with minimal kernel overhead/intervention, think of it as an SPE in the
> Cell processor.
>
> We've had scheduler support for CPU isolation ever since O(1) scheduler went in.
> I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.
> In fact that's the primary distinction that I'm making between say "CPU sets" and
> "CPU isolation". "CPU sets" let you manage user-space load while "CPU isolation" provides
> a way to isolate a CPU as much as possible (including kernel activities).
>
> I'm personally using this for hard realtime purposes. With CPU isolation it's very easy to
> achieve single digit usec worst case and around 200 nsec average response times on off-the-shelf
> multi-processor/core systems under extreme system load. I'm working with legal folks on releasing
> hard RT user-space framework for that.
> I can also see other applications like simulators and stuff that can benefit from this.
>
> I've been maintaining this stuff since around 2.6.18 and it's been running in production
> environment for a couple of years now. It's been tested on all kinds of machines, from NUMA
> boxes like HP xw9300/9400 to tiny uTCA boards like Mercury AXA110.
> The messiest part used to be SLAB garbage collector changes. With the new SLUB all that mess
> goes away (ie no changes necessary). Also CFS seems to handle CPU hotplug much better than O(1)
> did (ie domains are recomputed dynamically) so that isolation can be done at any time (via sysfs).
> So this seems like a good time to merge.
>
> Anyway. The patchset consists of 5 patches. First three are very simple and non-controversial.
> They simply make "CPU isolation" a configurable feature, export cpu_isolated_map and provide
> some helper functions to access it (just like cpu_online() and friends).
> Last two patches add support for isolating CPUs from running workqueues and stop machine.
> More details in the individual patch descriptions.
>
> Ideally I'd like all of this to go in during this merge window. If people think it's acceptable
> Linus or Andrew (or whoever is more appropriate Ingo maybe) can pull this patch set from
> git://git.kernel.org/pub/scm/linux/kernel/git/maxk/cpuisol-2.6.git
>

It's good to hear from someone else who thinks a multi-processor
box _should_ be able to run a CPU intensive (100%) RT app on one of the
processors without adversely affecting or being affected by the others.
I have had issues that were _traced_ back to the fact that I am doing
just that. All I got was, you can't do that or we don't support that
kind of thing in the Linux kernel.

One example: Andrew Morton's feedback to the LKML thread "floppy.c soft
lockup".

Good luck with this. I hope this gets someone's attention.

BTW, I have tried your patches against a vanilla 2.6.24 kernel but am
not successful.

# echo '1' > /sys/devices/system/cpu/cpu1/isolated
bash: echo: write error: Device or resource busy

The cpuisol=1 cmdline option yields:

harley:# cat /sys/devices/system/cpu/cpu1/isolated
0

harley:# cat /proc/cmdline
root=/dev/sda3 vga=normal apm=off selinux=0 noresume splash=silent
kmalloc=192M cpuisol=1

Regards
Mark

From: Max Krasnyanskiy on 1 Feb 2008 14:20

Hi Mark,
> It's good to hear from someone else who thinks a multi-processor
> box _should_ be able to run a CPU intensive (100%) RT app on one of the
> processors without adversely affecting or being affected by the others.
> I have had issues that were _traced_ back to the fact that I am doing
> just that. All I got was, you can't do that or we don't support that
> kind of thing in the Linux kernel.
>
> One example: Andrew Morton's feedback to the LKML thread "floppy.c soft lockup".
>
> Good luck with this. I hope this gets someone's attention.
Thanks for the support. I do the best I can because just like you I believe that it's
a perfectly valid workload and there are a lot of interesting applications that will benefit
from mainline support.

> BTW, I have tried your patches against a vanilla 2.6.24 kernel but am
> not successful.
>
> # echo '1' > /sys/devices/system/cpu/cpu1/isolated
> bash: echo: write error: Device or resource busy
You have to bring it offline first.
In other words:
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/isolated
echo 1 > /sys/devices/system/cpu/cpu1/online
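
The three steps above could be wrapped in a small shell function. This is only a sketch: the `isolated` sysfs attribute comes from this patch set and does not exist on a vanilla kernel, and both the function name `isolate_cpu` and the `SYSFS_CPU_ROOT` override are made-up names for illustration (the override is there only so the sequence can be tried against a scratch directory instead of the real sysfs).

```shell
#!/bin/sh
# Sketch only: isolate one CPU using the offline -> isolate -> online sequence.
# Assumes a kernel with the CPU isolation patches applied; the "isolated"
# attribute is not present otherwise.
# isolate_cpu and SYSFS_CPU_ROOT are illustrative names, not part of the patches.
isolate_cpu() {
    cpu="$1"
    dir="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}/cpu${cpu}"

    # Bail out early if this kernel does not expose the attribute.
    if [ ! -e "$dir/isolated" ]; then
        echo "cpu${cpu}: no 'isolated' attribute (patches not applied?)" >&2
        return 1
    fi

    echo 0 > "$dir/online"   || return 1   # take the CPU offline first
    echo 1 > "$dir/isolated" || return 1   # mark it isolated while it is offline
    echo 1 > "$dir/online"   || return 1   # bring it back online
}
```

On a patched kernel this would be run as root, e.g. `isolate_cpu 1`. Skipping the offline step is what produces the "Device or resource busy" error quoted above.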

> The cpuisol=1 cmdline option yields:
>
> harley:# cat /sys/devices/system/cpu/cpu1/isolated
> 0
>
> harley:# cat /proc/cmdline
> root=/dev/sda3 vga=normal apm=off selinux=0 noresume splash=silent
> kmalloc=192M cpuisol=1
Sorry, my bad. I had a typo in the patch description; the option is "isolcpus=N".
We've had that option for a while now. I mean it's not even part of my patch.
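
To double-check that the right option actually reached the kernel, the `isolcpus=` value can be pulled out of the command line. A minimal sketch follows; `isolcpus_from_cmdline` is a hypothetical helper name, and it only handles the plain `isolcpus=N` form discussed here.

```shell
#!/bin/sh
# Sketch only: print the value of isolcpus= from a kernel command line.
# isolcpus_from_cmdline is an illustrative helper name.
# With no argument it reads /proc/cmdline; a string can be passed instead
# to experiment without rebooting.
isolcpus_from_cmdline() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    # Rely on word splitting to walk the space-separated boot options.
    for arg in $cmdline; do
        case "$arg" in
            isolcpus=*) echo "${arg#isolcpus=}"; return 0 ;;
        esac
    done
    return 1   # option not present (e.g. it was misspelled as cpuisol=)
}
```

With the option in place, a task could then be pinned to the isolated CPU using `taskset` from util-linux, for instance `taskset -c "$(isolcpus_from_cmdline)" ./rt_app`, where `./rt_app` is just a placeholder for the real program.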

Thanx
Max