From: Victor Javier
Hello,


I am doing some research where I need to collect performance information
for SPEC CPU2006 benchmarks on a POWER6 JS22 system. Previously I was
using perfmon2, but after the release of "performance counters for
linux" (and the 'perf' tool), I decided to try it. One of the reasons
was the native support for multiplexing.

However, I have been noticing a much higher variability when using perf,
compared to perfmon2. As an example, I will provide data for 'bwaves'
benchmark when run with the reference input set (it takes around 20
minutes to finish).

The information for the kernels I am using is:
* perfmon2: Linux version 2.6.28-pfmon2 (gcc version 4.1.2 20070115 (SUSE Linux)) #6 SMP
* perf: Linux version 2.6.33.3-perf (gcc version 4.1.2 20070115 (SUSE Linux)) #1 SMP

I am using libpfm version 3.8.

I can provide more information (modules, detailed processor
information, etc.) if necessary.

The commands I used to collect the counters are:

perfmon2: pfmon -e PM_CYC,PM_INST_CMPL,PM_LD_MISS_L1 ./bwaves_base.Linux64
perf: perf stat -e r1e:u,r2:u,r80080:u ./bwaves_base.Linux64

I also tried pinning the execution to a given CPU, but the results were
the same. I repeated each execution 10 times, so I am also providing
the mean and the standard deviation.
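For reference, this is how the means and (sample) standard deviations
below were computed; the sketch uses the perfmon2 instruction counts as
input, and shifts by a base value first so awk's double-precision
arithmetic does not lose the small deviations against the ~2.77e12
totals:

```shell
# Compute mean and sample stdev of the 10 perfmon2 instruction counts.
# Values are shifted by a base before squaring to avoid catastrophic
# cancellation in the variance (the counts are ~2.77e12, the spread ~1e4).
printf '%s\n' \
  2772827993242 2772827992642 2772827992716 2772827992065 \
  2772827992067 2772827992066 2772827992062 2772828006865 \
  2772827992066 2772827992066 |
awk 'BEGIN { base = 2772827992000 }
     { d = $1 - base; s += d; ss += d*d; n++ }
     END { m = s/n
           var = (ss - n*m*m)/(n-1)
           printf "mean %.0f stdev %.0f\n", base + m, sqrt(var) }'
# -> mean 2772827993786 stdev 4614
```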

============
= perfmon2 =
============

cycles | instructions completed | L1 load misses
4,567,041,667,206 2,772,827,993,242 6,918,871,375
4,569,071,274,248 2,772,827,992,642 6,931,066,292
4,568,234,790,260 2,772,827,992,716 6,922,975,235
4,566,485,780,016 2,772,827,992,065 6,917,600,192
4,566,437,677,239 2,772,827,992,067 6,915,222,376
4,566,640,807,800 2,772,827,992,066 6,915,703,838
4,566,466,402,423 2,772,827,992,062 6,914,107,325
4,569,322,329,138 2,772,828,006,865 6,933,546,730
4,567,018,722,323 2,772,827,992,066 6,914,210,622
4,566,778,622,700 2,772,827,992,066 6,914,251,098

mean 4,567,349,807,335 2,772,827,993,786 6,919,755,508
stdev 1,107,043,810 4,614 7,178,958

========
= perf =
========

cycles | instructions completed | L1 load misses
4,562,017,366,591 2,772,768,370,128 7,134,353,697
4,541,500,651,248 2,772,868,724,285 6,341,491,710
4,550,876,532,582 2,772,787,520,375 6,661,719,666
4,540,558,691,334 2,772,868,724,156 6,266,617,715
4,573,942,460,136 2,772,861,831,519 7,419,020,488
4,587,876,861,751 2,772,868,724,189 8,174,507,077
4,550,771,568,044 2,772,841,147,861 6,547,437,055
4,600,947,093,875 2,772,787,520,375 9,152,895,835
4,572,501,705,517 2,772,861,831,526 7,765,464,256
4,561,690,369,227 2,772,787,520,368 6,902,452,934

mean 4,564,268,330,031 2,772,830,191,478 7,236,596,043
stdev 19,770,352,264 41,980,009 914,965,698

As can be seen, the standard deviation for perf is significantly higher.
For instructions completed, perf shows a roughly 9,000x higher standard
deviation (41,980,009 vs. 4,614). Although this variation may not look
large relative to the absolute number of instructions completed, it is a
real problem for the L1 load misses. With perfmon2 I can expect misses
to fall in the range [6,905,397,592 .. 6,934,113,424] (mean +- 2*stdev,
roughly a 95% interval under normality), which is tight. With perf, the
same interval widens to [5,406,664,647 .. 9,066,527,439]. This variation
is clearly not acceptable, as I cannot really draw any conclusions from
those results.
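The intervals I am quoting are mean +- 2*stdev. A sketch of that
computation, using the rounded means and stdevs from the tables above
(so the last digit may differ by one from figures computed with
unrounded stdevs):

```shell
# Print the mean +- 2*stdev interval for the L1-load-miss figures,
# for perfmon2 and perf, using the rounded summary values quoted above.
awk 'function print_iv(name, m, sd) {
       printf "%s [%.0f .. %.0f]\n", name, m - 2*sd, m + 2*sd
     }
     BEGIN {
       print_iv("perfmon2", 6919755508, 7178958)
       print_iv("perf",     7236596043, 914965698)
     }'
# -> perfmon2 [6905397592 .. 6934113424]
# -> perf [5406664647 .. 9066527439]
```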

I would like to know whether you are aware of this issue and what the
causes could be. I would also appreciate any help in fixing it.

In case the data above is hard to read, I provide it as a separate PDF
file as well. I also attach a couple of graphs showing the variation
for instructions and misses.

Thank you for any help on this,
Victor