From: Nikos Chantziaras on
On 09/09/2009 02:42 AM, Frans Pop wrote:
> But this evening, while I was preparing and running the tests, I've had 4
> freezes of the desktop.

Unfortunately BFS doesn't provide a reliable way (yet?) to run such
tests on it. This might be the cause for the hangs (from bfs-faq.txt):

Currently known problems?
[...]
3. Stuck tasks after extensive use of trace functions
(ptrace etc.).
[...]
5. More likely to show up bugs in *other* code due to
being much more aggressive at using multiple CPUs so
race conditions will show up more frequently.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Frans Pop on
On Wednesday 09 September 2009, Frans Pop wrote:
> BFS general impression
> ----------------------
> I've used BFS for over a day yesterday and today, and in general I'm
> very impressed. During normal use (coding and testing a shell script
> that's CPU/memory heavy + normal mail/news/browser + amarok) I've not
> seen any strange issues. My notebook even suspended and resumed (StR)
> without any problems.
>
> With CFS I regularly have short freezes of the mouse cursor or when
> typing. I think that it's related to KDE's news reader knode updating
> from my local news server. With CFS I also saw such freezes a few
> times, but they _seemed_ less frequent and less severe. No hard data
> though.

The 2nd CFS should have been BFS here. Sorry.

> But this evening, while I was preparing and running the tests, I've had
> 4 freezes of the desktop. The first two times it was only a partial
> freeze: taskbar was frozen, but I could still switch apps and use the
> graphical console; the last two times it was a full freeze of the
> display and keyboard (incl. e.g. numlock), but in the background
> everything continued to run normally and I could log in over SSH
> without any problem. On reboot some file systems did fail to unmount
> though.
>
> Normally my desktop and X.Org are 100% reliable.

Cheers,
FJP

P.S. I've received a very positive and friendly private reply from Con.
From: Frans Pop on
On Friday 11 September 2009, Ingo Molnar wrote:
> Note, the one you used was a still buggy version of latt.c producing
> bogus latency numbers - you will need the fix to it attached below.

Yes, I'm aware of that and have already copied Jens' latest version.

> Furthermore, the following tune might be needed on mainline to make
> it produce consistently good max numbers (not just good averages):
>
> echo 0 > /proc/sys/kernel/sched_wakeup_granularity_ns

Ack. I've seen the patches to change some defaults floating by.
Hmmm. I think the proposed new default for my system is 2ms with 2 CPUs?
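
[Editor's sketch: the "2ms with 2 CPUs" figure above can be reproduced with simple arithmetic, if one assumes a 1 ms per-CPU base value that the kernel multiplies by 1 + floor(log2(ncpus)) at boot. Both the base value and the scaling rule are assumptions that are not stated anywhere in this thread, and the function and constant names below are hypothetical.]

```python
def scaled_granularity_ns(base_ns: int, ncpus: int) -> int:
    """Scale a base granularity by 1 + floor(log2(ncpus)).

    The scaling rule is an assumption about the scheduler's boot-time
    tuning; it is not confirmed by the thread.
    """
    if ncpus < 1:
        raise ValueError("ncpus must be >= 1")
    # For n >= 1, n.bit_length() - 1 equals floor(log2(n)).
    factor = 1 + (ncpus.bit_length() - 1)
    return base_ns * factor


# Assumption: a proposed base default of 1 ms (hypothetical constant name).
ASSUMED_BASE_NS = 1_000_000

if __name__ == "__main__":
    for cpus in (1, 2, 4, 8):
        ns = scaled_granularity_ns(ASSUMED_BASE_NS, cpus)
        print(f"{cpus} CPU(s): {ns} ns")
```

Under those assumptions a 2-CPU machine gets a factor of 2, i.e. 2,000,000 ns. The value actually in effect is whatever the sysctl file quoted above reports on the running kernel.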

I will not test against TIP at this time, but I plan to do the following:
- repeat my tests now using vanilla 2.6.31 for both BFS and CFS.
  This will provide a baseline to verify improvements.
- do two additional runs with CFS with some modified tunables
- do one more run, probably when .32-rc2 is out.
  I'd expect that to have the scheduler fixes, while the worst post-merge
  issues should be resolved.

I also have a couple of ideas for getting additional data. I'll post my
results as follow-ups.

I'm very impressed with the responses to the issues that have been raised,
but I think we do owe Con a huge thank you for setting off that process.

I also think there is a lot to be said for having a very straightforward
alternative scheduler available for baseline comparisons. It's much
easier to come out and say "something's broken" if you know some latency
issue is not due to buggy hardware or applications or orange bunnies with
a cosmic ray gun. I'll not go into the question whether such a scheduler
should be in mainline or not.

Cheers,
FJP
From: Jens Axboe on
On Fri, Sep 11 2009, Frans Pop wrote:
> On Friday 11 September 2009, Ingo Molnar wrote:
> > Note, the one you used was a still buggy version of latt.c producing
> > bogus latency numbers - you will need the fix to it attached below.
>
> Yes, I'm aware of that and have already copied Jens' latest version.

BTW, I put it in a git repo, since it quickly gets really confusing with
so many versions going around. It can be accessed here:

git://git.kernel.dk/latt.git

and as with my other repos, snapshots are automatically generated every
hour when new commits have been made. To get the very latest latt and
not have to use git, download:

http://brick.kernel.dk/snaps/latt-git-latest.tar.gz

--
Jens Axboe

From: Jens Axboe on
On Fri, Sep 11 2009, Ingo Molnar wrote:
>
> * Jens Axboe <jens.axboe(a)oracle.com> wrote:
>
> > On Fri, Sep 11 2009, Frans Pop wrote:
> > > On Friday 11 September 2009, Ingo Molnar wrote:
> > > > Note, the one you used was a still buggy version of latt.c producing
> > > > bogus latency numbers - you will need the fix to it attached below.
> > >
> > > Yes, I'm aware of that and have already copied Jens' latest version.
> >
> > BTW, I put it in a git repo, since it quickly gets really confusing
> > with so many versions going around. It can be accessed here:
> >
> > git://git.kernel.dk/latt.git
> >
> > and as with my other repos, snapshots are automatically generated every
> > hour when new commits have been made. To get the very latest latt and
> > not have to use git, download:
> >
> > http://brick.kernel.dk/snaps/latt-git-latest.tar.gz
>
> Btw., your earlier latt reports should be discarded as invalid due
> to that bug.

Yes

> With the fixed latt.c version the mainline latencies (both
> worst-case and average) were reported to be better after the poll()
> bug got fixed, so in that area, for this kind of measurement,
> mainline seems to be working well.
>
> [ What happened is that the poll() bug was creating false latencies
> in the mainline scheduler tests. (BFS incidentally avoided measuring
> that bug, because its aggressive balancer moved the wakee tasks away
> from the buggy, busy-looping poll() parent task. Two instances of
> latt.c would possibly have shown similar latencies.) ]
>
> I see you added new 'work generator' changes to latt.c now, will
> check/validate that version of latt.c too.

I did. It's a simple 'generate random data and compress it' work piece
for each client. You can control the amount of work with -x, which sets
the amount of data (in KB) it'll work on. Stats are generated for both
wakeup latency and work processing latency.
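
[Editor's sketch: the two measurements described above can be illustrated roughly as follows. This is not latt's code. It is a minimal Python illustration that assumes a single client thread sleeping on a pipe; the names `work_piece`, `client` and `run` are hypothetical, and the `kb` argument only mirrors the role of -x. The parent writes a timestamp into the pipe, the client measures how long it took to wake up, then times a 'generate random data and compress it' step.]

```python
import os
import struct
import threading
import time
import zlib


def work_piece(kb: int) -> int:
    """Generate kb KiB of random data, compress it, return elapsed ns."""
    start = time.monotonic_ns()
    data = os.urandom(kb * 1024)
    zlib.compress(data)
    return time.monotonic_ns() - start


def client(read_fd: int, kb: int, rounds: int, results: list) -> None:
    """Sleep on the pipe; per round record (wakeup_ns, work_ns)."""
    for _ in range(rounds):
        payload = os.read(read_fd, 8)   # blocks until the parent writes
        woke = time.monotonic_ns()
        (sent,) = struct.unpack("<q", payload)
        results.append((woke - sent, work_piece(kb)))


def run(kb: int = 64, rounds: int = 5) -> list:
    """Wake one client `rounds` times and collect its latency pairs."""
    read_fd, write_fd = os.pipe()
    results: list = []
    worker = threading.Thread(target=client,
                              args=(read_fd, kb, rounds, results))
    worker.start()
    for _ in range(rounds):
        time.sleep(0.01)                # give the client time to go idle
        # Send the wakeup together with the time it was issued.
        os.write(write_fd, struct.pack("<q", time.monotonic_ns()))
    worker.join()
    os.close(read_fd)
    os.close(write_fd)
    return results


if __name__ == "__main__":
    for wakeup_ns, work_ns in run():
        print(f"wakeup {wakeup_ns / 1000:.1f} us, work {work_ns / 1000:.1f} us")
```

The numbers from a sketch like this include interpreter overhead, so they are only useful for seeing the shape of the measurement, not for comparing schedulers.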

--
Jens Axboe
