From: zhigang gong on

I have run into a very strange performance problem. I wrote a user-space
application with three threads: Thread A, Thread B, and Thread C.
Initially, C is waiting on a semaphore S0 and B is waiting on another
semaphore S1.

A prepares a buffer, does some processing on it, wakes up C, and then
waits on semaphore S2. C copies the buffer, does some processing on it,
wakes up B, and then waits on S0 again.

B copies the buffer and then does the same thing A did, passing the
buffer back to A through C.

So the sequence is A-->C-->B-->C-->A, repeated many times. I measure
the ping-pong (round-trip) latency at A.

My test environment is an Intel machine with 2 cores, each with 4
hardware threads. The Linux kernel version is 2.6.31.

When I bind all threads to one hardware thread, the latency is about
17us:
taskset 10 ./latency_test
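
For reference, the same binding can be done from inside the program. A
minimal sketch, assuming Linux with glibc; pin_to_cpu is a hypothetical
helper and not part of latency_test. taskset reads a bare mask as
hexadecimal, so 10 means 0x10, which selects CPU 4 and corresponds to
pin_to_cpu(4) here.

```c
#define _GNU_SOURCE                 /* for cpu_set_t and sched_setaffinity */
#include <assert.h>
#include <sched.h>

/* Restrict the calling thread to a single CPU. Threads created
 * afterwards with pthread_create inherit the same affinity mask.
 * Returns 0 on success, -1 on error (errno is set). */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);  /* 0 = calling thread */
}
```

Calling this at the start of main, before the other threads are
created, should put all three threads on the same hardware thread.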

But when I just run ./latency_test without binding, the latency is
about 40us.

I used vmstat to monitor both runs and found the following:

In the first case the interrupt count is much lower than in the second,
while the context switch counts are very close to each other. The
interrupt count is about 30K in the first case and about 79K in the
second. The ratio 79K/30K (about 2.6) is roughly equal to 40us/17us
(about 2.4).

My question is: why do these two cases have such different interrupt
counts? And is there any tool other than vmstat that can show the
source of the interrupts, whether it is an IPC interrupt, a system call,
or something else?

Can anybody give me a clue on this? I would very much appreciate your
help.