From: Shawn Bohrer on
Hello,

Currently we have a workload that depends on around 50 processes that
wake up 1000 times a second, do a small amount of work, and go back to
sleep. This works great on RHEL 5 (2.6.18-164.6.1.el5), but on recent
kernels we are unable to achieve 1000 iterations per second. Using
the simple test application below on RHEL 5 2.6.18-164.6.1.el5, I can run
500 of these processes and still achieve 999.99 iterations per
second. Running just 10 of these processes on the same machine with
2.6.32.6 produces results like:

....
Iterations Per Sec: 905.659667
Iterations Per Sec: 805.099068
Iterations Per Sec: 925.195578
Iterations Per Sec: 759.310773
Iterations Per Sec: 702.849261
Iterations Per Sec: 782.157292
Iterations Per Sec: 917.138031
Iterations Per Sec: 834.770391
Iterations Per Sec: 850.543755
....

I've tried playing with some of the CFS tunables in /proc/sys/kernel/
without success. Are there any suggestions on how to achieve the
results we are looking for using a recent kernel?

Thanks,
Shawn


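#include <sys/epoll.h>
#include <sys/time.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	struct epoll_event ev;		/* never filled: no fds are registered */
	int epfd = epoll_create(1);
	unsigned int i;
	volatile unsigned int j;	/* volatile so the compiler keeps the busy loop */
	struct timeval tv;
	unsigned long long start, end;
	const unsigned int count = 60000;

	if (epfd == -1) {
		perror("epoll_create failed");
		return 1;
	}

	while (1) {
		gettimeofday(&tv, NULL);
		/* 64-bit math so tv_sec * 1000000 cannot overflow on 32-bit */
		start = (unsigned long long)tv.tv_sec * 1000000 + tv.tv_usec;

		for (i = 0; i < count; ++i) {
			/* No fds registered: this just sleeps for the 1 ms timeout */
			if (epoll_wait(epfd, &ev, 1, 1) == -1)
				perror("epoll failed");

			for (j = 0; j < 10000; ++j)
				/* simulate work */;
		}
		gettimeofday(&tv, NULL);
		end = (unsigned long long)tv.tv_sec * 1000000 + tv.tv_usec;

		printf("Iterations Per Sec: %f\n", count/((double)(end - start)/1000000));
		fflush(stdout);
	}

	close(epfd);	/* not reached */
	return 0;
}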

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arjan van de Ven on
On Sat, 30 Jan 2010 17:45:51 -0600
Shawn Bohrer <shawn.bohrer(a)gmail.com> wrote:

> Hello,
>
> Currently we have a workload that depends on around 50 processes that
> wake up 1000 times a second, do a small amount of work, and go back to
> sleep. This works great on RHEL 5 (2.6.18-164.6.1.el5), but on recent
> kernels we are unable to achieve 1000 iterations per second. Using
> the simple test application below on RHEL 5 2.6.18-164.6.1.el5, I can
> run 500 of these processes and still achieve 999.99 iterations per
> second. Running just 10 of these processes on the same machine with
> 2.6.32.6 produces results like:
>
> ...
> Iterations Per Sec: 905.659667
> Iterations Per Sec: 805.099068
> Iterations Per Sec: 925.195578
> Iterations Per Sec: 759.310773
> Iterations Per Sec: 702.849261
> Iterations Per Sec: 782.157292
> Iterations Per Sec: 917.138031
> Iterations Per Sec: 834.770391
> Iterations Per Sec: 850.543755
> ...
>
> I've tried playing with some of the CFS tunables in /proc/sys/kernel/
> without success. Are there any suggestions on how to achieve the
> results we are looking for using a recent kernel?

I'll play a bit, but I wonder idly what kind of machine this is on?
(number and type of CPUs)
From: Arjan van de Ven on
On Sat, 30 Jan 2010 17:45:51 -0600
Shawn Bohrer <shawn.bohrer(a)gmail.com> wrote:
>
> int main ()
> {
> 	int epfd = epoll_create(1);
> 	int i, j;
> 	struct timeval tv;
> 	unsigned long start, end;
> 	const unsigned int count = 60000;
>
> 	while (1) {
> 		gettimeofday(&tv, NULL);
> 		start = tv.tv_sec * 1000000 + tv.tv_usec;
>
> 		for (i = 0; i < count; ++i) {
> 			if (epoll_wait(epfd, 0, 1, 1) == -1)
> 				perror("epoll failed");
>
> 			for (j = 0; j < 10000; ++j)
> 				/* simulate work */;
> 		}
> 		gettimeofday(&tv, NULL);
> 		end = tv.tv_sec * 1000000 + tv.tv_usec;
>
> 		printf("Iterations Per Sec: %f\n", count/((double)(end - start)/1000000));
> 	}
>
> 	close(epfd);
> }

BTW, do you by chance have an equivalent program that uses poll instead
of epoll?

From: Arjan van de Ven on
On Sat, 30 Jan 2010 17:45:51 -0600
Shawn Bohrer <shawn.bohrer(a)gmail.com> wrote:

> Hello,
>
> Currently we have a workload that depends on around 50 processes that
> wake up 1000 times a second, do a small amount of work, and go back to
> sleep. This works great on RHEL 5 (2.6.18-164.6.1.el5), but on recent
> kernels we are unable to achieve 1000 iterations per second. Using
> the simple test application below on RHEL 5 2.6.18-164.6.1.el5, I can
> run 500 of these processes and still achieve 999.99 iterations per
> second. Running just 10 of these processes on the same machine with
> 2.6.32.6 produces results like:
> [...]

There's an issue with your expectation, btw.
What your application does, in practice, is

<wait 1 millisecond>
<do a bunch of work>
<wait 1 millisecond>
<do a bunch of work>
etc

You would only be able to get close to 1000 per second if "bunch of
work" were nothing... but it isn't.
So let's assume "bunch of work" is 100 microseconds. The basic period
of your program (ignoring any costs/overhead in the implementation)
is 1.1 milliseconds, which is approximately 909 per second, not 1000!

I suspect that the 1000 you get on RHEL5 is due to a bug in the RHEL5
kernel, where it gives you a shorter delay than what you asked for,
since 1000 is clearly not a correct number to get.

(And yes, older kernels had such rounding bugs; current kernels go
to great lengths to give applications *exactly* the delay they are
asking for...)



--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
From: Shawn Bohrer on
On Sat, Jan 30, 2010 at 04:11:14PM -0800, Arjan van de Ven wrote:
> On Sat, 30 Jan 2010 17:45:51 -0600
> Shawn Bohrer <shawn.bohrer(a)gmail.com> wrote:
>
> > Hello,
> >
> > Currently we have a workload that depends on around 50 processes that
> > wake up 1000 times a second, do a small amount of work, and go back to
> > sleep. This works great on RHEL 5 (2.6.18-164.6.1.el5), but on recent
> > kernels we are unable to achieve 1000 iterations per second. Using
> > the simple test application below on RHEL 5 2.6.18-164.6.1.el5, I can
> > run 500 of these processes and still achieve 999.99 iterations per
> > second. Running just 10 of these processes on the same machine with
> > 2.6.32.6 produces results like:
> > [...]
>
> There's an issue with your expectation, btw.
> What your application does, in practice, is
>
> <wait 1 millisecond>
> <do a bunch of work>
> <wait 1 millisecond>
> <do a bunch of work>
> etc
>
> You would only be able to get close to 1000 per second if "bunch of
> work" were nothing... but it isn't.
> So let's assume "bunch of work" is 100 microseconds. The basic period
> of your program (ignoring any costs/overhead in the implementation)
> is 1.1 milliseconds, which is approximately 909 per second, not 1000!
>
> I suspect that the 1000 you get on RHEL5 is due to a bug in the RHEL5
> kernel, where it gives you a shorter delay than what you asked for,
> since 1000 is clearly not a correct number to get.
>
> (And yes, older kernels had such rounding bugs; current kernels go
> to great lengths to give applications *exactly* the delay they are
> asking for...)

I agree that we are currently depending on a bug in epoll. The epoll
implementation currently rounds up to the next jiffy, so specifying a
timeout of 1 ms really just wakes the process up at the next timer tick.
I have a patch to fix epoll by converting it to use
schedule_hrtimeout_range() that I'll gladly send, but I still need a way
to achieve the same 1000 iterations per second.

--
Shawn