From: Frederic Weisbecker on
On Mon, Sep 07, 2009 at 06:38:36AM +0300, Nikos Chantziaras wrote:
> Unfortunately, I can't come up with any way to somehow benchmark all of
> this. There's no benchmark for "fluidity" and "responsiveness". Running
> the Doom 3 benchmark, or any other benchmark, doesn't say anything about
> responsiveness, it only measures how many frames were calculated in a
> specific period of time. How "stable" (with no stalls) those frames were
> making it to the screen is not measurable.



That actually looks benchmarkable. This is about latency.
For example, you could run high-load tasks in the background
and then launch a task that wakes up at medium/long intervals
to do something. You could then measure the time between when
it is woken up and when it actually gets to run and do its work.
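
As a rough sketch of such a periodic task (the interval is arbitrary),
even a trivial shell loop would do:

while :; do sleep 0.5; done

Note that each sleep is a separate process, so when parsing the trace
later it is easier to match on the command name ("sleep") than on a
single pid.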

We have event tracing infrastructure in the kernel that can
record the wakeup and sched switch events.

Having CONFIG_EVENT_TRACING=y should be sufficient for that.

You just need to have debugfs mounted somewhere, say at /debug.
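
If it is not mounted already:

mkdir -p /debug
mount -t debugfs nodev /debug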

Then you can activate these sched events by doing:

echo 0 > /debug/tracing/tracing_on
echo 1 > /debug/tracing/events/sched/sched_switch/enable
echo 1 > /debug/tracing/events/sched/sched_wakeup/enable

#Launch your tasks

echo 1 > /debug/tracing/tracing_on

#Wait for some time

echo 0 > /debug/tracing/tracing_on

That will require some parsing of the output in /debug/tracing/trace
to get the delays between the wakeup events and the switch-in events
for the task that periodically wakes up, and then producing some
statistics such as the average or the maximum latency.
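
This is only an untested sketch (the exact fields these events print
differ between kernel versions, so look at the raw trace first and
adjust the matching; I assume the comm=/next_comm= style of fields and
that the periodic task shows up as "sleep", as with the loop above),
but something like this gives the average and maximum wakeup latency:

awk -v comm=sleep '
    # a wakeup of our task: remember the timestamp
    /sched_wakeup/ && $0 ~ ("comm=" comm) {
        match($0, /[0-9]+\.[0-9]+: /);
        wake = substr($0, RSTART, RLENGTH - 2) + 0;
    }
    # the next time our task is switched in, record the wakeup latency
    /sched_switch/ && $0 ~ ("next_comm=" comm) && wake {
        match($0, /[0-9]+\.[0-9]+: /);
        lat = substr($0, RSTART, RLENGTH - 2) + 0 - wake;
        sum += lat;
        n++;
        if (lat > max)
            max = lat;
        wake = 0;
    }
    END {
        if (n)
            printf "samples=%d avg=%.6fs max=%.6fs\n", n, sum / n, max;
    }
' /debug/tracing/trace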

That's a bit of a rough approach to measuring such latencies, but it
should work.


> If BFS would imply small drops in pure performance counted in
> instructions per seconds, that would be a totally acceptable regression
> for desktop/multimedia/gaming PCs. Not for server machines, of course.
> However, on my machine, BFS is faster in classic workloads. When I run
> "make -j2" with BFS and the standard scheduler, BFS always finishes a bit
> faster. Not by much, but still. One thing I'm noticing here is that BFS
> produces 100% CPU load on each core with "make -j2" while the normal
> scheduler stays at about 90-95% with -j2 or higher in at least one of the
> cores. There seems to be under-utilization of CPU time.



That could also be benchmarked using the above sched events, by
looking at the average time each CPU spends running the idle task.
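
Again just an untested sketch, with the same caveat about the exact
event format; pid 0 is the idle task:

awk '
    /sched_switch/ {
        # timestamp and cpu of this switch
        match($0, /[0-9]+\.[0-9]+: /);
        now = substr($0, RSTART, RLENGTH - 2) + 0;
        match($0, /\[[0-9]+\]/);
        cpu = substr($0, RSTART + 1, RLENGTH - 2);
        # the time since the previous switch on this cpu was spent in
        # the task being switched out; credit it to idle when that
        # task is pid 0 (the idle/swapper task)
        if ((cpu in last) && $0 ~ /prev_pid=0 /)
            idle[cpu] += now - last[cpu];
        if (!(cpu in first))
            first[cpu] = now;
        last[cpu] = now;
    }
    END {
        for (c in idle)
            printf "cpu%s: %.6fs idle out of %.6fs traced\n",
                   c, idle[c], last[c] - first[c];
    }
' /debug/tracing/trace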

From: Jens Axboe on
On Mon, Sep 07 2009, Jens Axboe wrote:
> Scheduler    Runtime    Max lat    Avg lat    Std dev
> ----------------------------------------------------------------
> CFS          100        951        462        267
> CFS-x2       100        983        484        308
> BFS
> BFS-x2

Those numbers are buggy, btw; it's not nearly as bad. Responsiveness
under compile load IS bad though; the test app just didn't quantify it
correctly. I'll see if I can get it working properly.

--
Jens Axboe

From: Markus Tornqvist on
Please Cc me as I'm not a subscriber.

(LKML bounced this message once already for 8-bit headers, I'm retrying
now - sorry if someone gets it twice)

On Mon, Sep 07, 2009 at 02:16:13PM +0200, Ingo Molnar wrote:
>
>Con posted single-socket quad comparisons/graphs so to make it 100%
>apples to apples i re-tested with a single-socket (non-NUMA) quad as
>well, and have uploaded the new graphs/results to:
>
> kernel build performance on quad:
> http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild-quad.jpg
[...]
>
>It shows similar curves and behavior to the 8-core results i posted
>- BFS is slower than mainline in virtually every measurement. The
>ratios are different for different parts of the graphs - but the
>trend is similar.

Dude, not cool.

1. A quad with HT is not the same as a 4-core desktop; you're effectively testing with 8 logical cores
2. If you look at the graph, you just proved BFS is better in the job_count == core_count case, as BFS claims
3. You're comparing an old version of BFS against an unreleased development kernel

Also, you said on http://article.gmane.org/gmane.linux.kernel/886319
"I also tried to configure the kernel in a BFS friendly way, i used
HZ=1000 as recommended, turned off all debug options, etc. The
kernel config i used can be found here:
http://redhat.com/~mingo/misc/config
"

Quickly looking at the config, you have:
CONFIG_HZ_250=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_FRAME_POINTER=y

And other DEBUG options enabled.

--
mjt

From: Ingo Molnar on

* Jens Axboe <jens.axboe@oracle.com> wrote:

> On Mon, Sep 07 2009, Jens Axboe wrote:
> > Scheduler    Runtime    Max lat    Avg lat    Std dev
> > ----------------------------------------------------------------
> > CFS          100        951        462        267
> > CFS-x2       100        983        484        308
> > BFS
> > BFS-x2
>
> Those numbers are buggy, btw; it's not nearly as bad.
> Responsiveness under compile load IS bad though; the test app just
> didn't quantify it correctly. I'll see if I can get it working
> properly.

What's the default latency target on your box:

cat /proc/sys/kernel/sched_latency_ns

?

And yes, it would be wonderful to get a test-app from you that would
express the kind of pain you are seeing during compile jobs.

Ingo
From: Arjan van de Ven on
On Mon, 07 Sep 2009 06:38:36 +0300
Nikos Chantziaras <realnc@arcor.de> wrote:

> On 09/06/2009 11:59 PM, Ingo Molnar wrote:
> >[...]
> > Also, i'd like to outline that i agree with the general goals
> > described by you in the BFS announcement - small desktop systems
> > matter more than large systems. We find it critically important
> > that the mainline Linux scheduler performs well on those systems
> > too - and if you (or anyone else) can reproduce suboptimal behavior
> > please let the scheduler folks know so that we can fix/improve it.
>
> BFS improved behavior of many applications on my Intel Core 2 box in
> a way that can't be benchmarked. Examples:

Have you tried to see if latencytop catches such latencies?