From: Pauli Nieminen
Hi,

While running a rendering benchmark we noticed context switch counts that
were a lot larger than expected. Numbers were showing 20-30 context switches
per frame, while the theoretical minimum is less than 10.

The problem was traced using FineToothComb (FTC) [1], which traces down to a
per-instruction view of what happens in the system.

Context switches are relatively easy to spot in the trace visualization, so
it was easy to find out what was causing the extra context switches.

* What does the application do when calling the xserver?

The application opens a Unix socket connection to the xserver. Communication
is based on a simple message protocol. The application makes the following
call sequence when communicating with the xserver.

writev(...) // Send message(s) to xserver

poll(...) // wait for response(s) from xserver

read(...) // read the response(s)
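For concreteness, here is a minimal sketch of that sequence in C. The socket
path and message contents are placeholders; a real client would speak the
X11 wire protocol through xcb or Xlib.

#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <sys/un.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    strcpy(addr.sun_path, "/tmp/.X11-unix/X0"); /* socket for display :0 */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    char req[] = "request";                 /* placeholder message */
    struct iovec iov = { .iov_base = req, .iov_len = sizeof(req) };
    writev(fd, &iov, 1);                    /* send message(s) to xserver */

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    poll(&pfd, 1, -1);                      /* wait for response(s) */

    char resp[256];
    ssize_t n = read(fd, resp, sizeof(resp)); /* read the response(s) */
    printf("got %zd bytes\n", n);
    close(fd);
    return 0;
}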

* Where do the extra context switches happen?

An application developer would expect that only the poll call triggers a
context switch. But in the worst case every system call triggers one.

- writev()

writev does an implicit context switch in the kernel after writing to the
socket. This isn't ideal behavior for asynchronous IPC because the
application might want to queue up more messages before the context switch.

This doesn't affect X communication because xcb buffers asynchronous
operations. But it might be worth fixing for other implementations that
don't have buffering.
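As a sketch of what such buffering looks like, queued messages can be
flushed with a single writev, so the implicit reschedule happens once per
batch instead of once per message. STDOUT stands in for the server socket
here, and the message strings are made up.

#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
    char m1[] = "request-1\n", m2[] = "request-2\n", m3[] = "request-3\n";
    struct iovec queue[] = {
        { .iov_base = m1, .iov_len = sizeof(m1) - 1 },
        { .iov_base = m2, .iov_len = sizeof(m2) - 1 },
        { .iov_base = m3, .iov_len = sizeof(m3) - 1 },
    };
    /* One system call for the whole queue, so the receiver is woken
     * (and the implicit reschedule happens) once, not once per message. */
    writev(STDOUT_FILENO, queue, 3);
    return 0;
}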

- poll()

The scheduling in writev backfires when poll is called. There are some X
calls (DRI2GetBuffers etc) that take very long to handle. While the xserver
is handling the request, the kernel decides to schedule the client before
there is a response.

The client then executes very few instructions before hitting poll, which
blocks because there is no response from the xserver yet.

- read()

The application hits read soon after returning from poll. By then the
xserver has gone to sleep in a select call, waiting on multiple file
descriptors.

When the read call is returning from the kernel, unlocking the socket
triggers scheduling of the xserver. This scheduling never returns from
kernel space: it just iterates over all the file descriptors that the
xserver is waiting for and then goes back to sleep.
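For illustration, this is roughly the shape of the select loop the xserver
sleeps in (the descriptor bookkeeping here is made up). The wakeup that the
client's read() triggers walks this same descriptor set in kernel space
before the server goes back to sleep, without select() ever returning.

#include <sys/select.h>

void server_loop(int *fds, int nfds, int maxfd)
{
    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        for (int i = 0; i < nfds; i++)
            FD_SET(fds[i], &readable);

        /* Sleeps until some descriptor is ready. A spurious wakeup still
         * re-scans every watched descriptor before sleeping again. */
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) <= 0)
            continue;

        for (int i = 0; i < nfds; i++)
            if (FD_ISSET(fds[i], &readable))
                ; /* handle the client request -- omitted */
    }
}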

* Test case

There is a relatively simple test case attached that can check whether the
kernel is scheduling too many times. The test case is the xwininfo tool
modified not to output anything. Running it with the -root and -tree
parameters generates lots of requests to the xserver.

Compilation instructions are in xwininfo.c.

test.sh can be used to automatically test whether there was too much extra
scheduling. In the perfect case the number of schedulings should be only
twice the number of requests.
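As an illustration of what such a check can measure (this is an assumption
about the mechanism, not necessarily what test.sh does), Linux exposes
per-process context switch counters in /proc/<pid>/status:

#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (!f) { perror("fopen"); return 1; }

    char line[128];
    while (fgets(line, sizeof(line), f)) {
        /* Print the kernel's running totals of voluntary and involuntary
         * context switches for this process. */
        if (strncmp(line, "voluntary_ctxt_switches", 23) == 0 ||
            strncmp(line, "nonvoluntary_ctxt_switches", 26) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}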

* Where has the bug been seen?

I first saw this while doing development on ARM. I have also tested a few
x86 kernels to see if the problem can be reproduced there too. The problem
reproduced in most of the configurations that I tested.

broken:
2.6.32 (ARM)
2.6.28 (x86 Debian Lenny)
2.6.35-rc4 (x86 vanilla, Ubuntu Karmic user space)

But to my surprise I found a single working configuration: my work desktop
running an Ubuntu kernel.

correctly working:
2.6.31 (x86 Ubuntu Karmic)

* Bisecting failed

I tried to bisect to find the kernel commit that fixed the problem for
Ubuntu Karmic, but I hit a problem: my self-compiled kernels didn't boot. I
don't have enough time right now to do the bisecting.

If bisecting would help, I can try to find some time.


There is still an open question whether the bug is caused by userspace doing
stupid stuff with the Unix socket or by an actual kernel problem. But my
debugging points to a kernel problem, because the same userspace hits the
bug on one kernel but not on another.

Pauli

[1] Qt blog post explaining what can be done with FTC
http://labs.trolltech.com/blogs/2009/09/29/exploring-qt-performance-on-arm-using-finetoothcomb/