From: Tom St Denis on
On May 20, 8:30 pm, Maaartin <grajc...(a)seznam.cz> wrote:
> There are three cores in AMD Phenom II X3 710 CPU, not four.
>
> On May 21, 12:48 am, Mok-Kong Shen <mok-kong.s...(a)t-online.de> wrote:
>
> > Datesfat Chicks wrote:
> > > Whether concurrent applications are running is a separate issue.
>
> > > My assumption is that the benchmark is for the thing as a whole, i.e.
> > > not "per core".
>
> > If there are multicores the performance would anyway also depend on the
> > performance of the compiler in well exploiting the available resources,
> > I suppose. If there were concurrent jobs running, then the benchmark
> > result obtained would be dependent on these. Hence benchmarking should
> > be done with a single job running, unless one is investigating the
> > performance of a computer under a general multitasking scenario, which
> > of course also has its own sense, in which case the job mix needs
> > however to be properly specified. Thus in OP's case the figure in
> > question is surely for the whole processor as you said above.
>
> The most efficient way for computing this benchmark is probably to
> start just one thread per core. This way there's no need for
> parallelization of a single job and no unnecessary pressure on the
> cache (although here it's no problem anyway, since even the L1 cache
> is large enough). I assume the reason for AMD winning over Intel in
> this benchmark is that it has one more core (not counting virtual
> cores due to hyperthreading).

You can easily compute a signature in two threads on two cores in
parallel to get a very low latency signature implementation.
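One way to read the two-thread remark is the usual CRT split of an RSA private-key operation: the exponentiations mod p and mod q are independent, so each can run on its own core before the results are recombined. The sketch below is only an illustration under that assumption. The key is a textbook-sized toy (insecure), there is no padding, and in CPython the two threads will not really run in parallel because of the interpreter lock, so it shows the structure rather than the latency gain.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy RSA key (textbook-sized and insecure; for illustration only).
P, Q, E = 61, 53, 17
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent
DP, DQ = D % (P - 1), D % (Q - 1)   # CRT exponents
QINV = pow(Q, -1, P)                # q^-1 mod p


def sign_crt_parallel(m: int) -> int:
    """Raw RSA signature m^d mod n, with the two CRT halves submitted to two threads."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        half_p = pool.submit(pow, m, DP, P)   # m^dp mod p
        half_q = pool.submit(pow, m, DQ, Q)   # m^dq mod q
        sp, sq = half_p.result(), half_q.result()
    # Garner recombination of the two halves.
    h = (QINV * (sp - sq)) % P
    return sq + h * Q
```

A real implementation would do the same split in native threads on full-size primes; the recombination step is cheap compared with the two exponentiations.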

Tom

From: Thomas Pornin on
According to Clark Smith <noaddress(a)nowhere.net>:
> According to this link
>
> http://www.phoronix.com/scan.php?page=article&item=intel_corei3_530&num=6
>
> one can compute some 148 signatures per second using 4096-bit RSA moduli
> on an AMD Phenom II X3 710 CPU.

With OpenSSL, my older Core2 Quad Q6600, clocked at 2.4 GHz, can produce
35 4096-bit RSA signatures per second on each core. This means 140
signatures per second using all four cores. The Phenom II is more recent
and clocked at higher frequencies, so my guess is that the "148 sig/s"
figure is for a single core. That's what OpenSSL benches anyway.

(Also, a Phenom II X3 has three cores, not four.)
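The arithmetic behind the two readings of the "148 sig/s" figure can be sketched as follows. The function names are made up for illustration, and the model assumes perfectly linear scaling across cores, which real hardware only approximates.

```python
def aggregate_rate(per_core: float, cores: int) -> float:
    """Total signatures per second if every core delivers per_core (ideal scaling)."""
    return per_core * cores


def per_core_rate(total: float, cores: int) -> float:
    """Per-core signatures per second if total is a whole-chip figure."""
    return total / cores


# Q6600: 35 sig/s per core on four cores.
q6600_total = aggregate_rate(35, 4)

# Phenom II X3 710: what each core would deliver if 148 sig/s were the
# figure for all three cores rather than for one.
phenom_per_core_if_total = per_core_rate(148, 3)

print(q6600_total, round(phenom_per_core_if_total, 1))
```

So the whole-chip reading puts one Phenom II core at roughly 49 sig/s, while the single-core reading puts it at 148 sig/s, about four times the Q6600's per-core rate.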


--Thomas Pornin
From: Clark Smith on
On Fri, 21 May 2010 13:18:29 +0000, Thomas Pornin wrote:

> According to Clark Smith <noaddress(a)nowhere.net>:
>> According to this link
>>
>> http://www.phoronix.com/scan.php?page=article&item=intel_corei3_530&num=6
>>
>> one can compute some 148 signatures per second using 4096-bit RSA
>> moduli on an AMD Phenom II X3 710 CPU.
>
> With OpenSSL, my older Core2 Quad Q6600, clocked at 2.4 GHz, can produce
> 35 4096-bit RSA signatures per second on each core. This means 140
> signatures per second using all four cores. The Phenom II is more recent
> and clocked at higher frequencies, so my guess is that the "148 sig/s"
> figure is for a single core. That's what OpenSSL benches anyway.

By default. It's very easy to get it to use all cores in parallel:

openssl speed rsa -multi 2

will use two cores in parallel, if you have them. I have an old dual core
box and this is what I get with -multi 2:

                  sign    verify    sign/s verify/s
rsa  512 bits 0.000425s 0.000038s   2353.3  25991.6
rsa 1024 bits 0.002138s 0.000102s    467.8   9830.9
rsa 2048 bits 0.012373s 0.000339s     80.8   2948.5
rsa 4096 bits 0.081385s 0.001201s     12.3    832.9

Without it:

                  sign    verify    sign/s verify/s
rsa  512 bits 0.000786s 0.000069s   1272.3  14407.9
rsa 1024 bits 0.004018s 0.000199s    248.9   5015.6
rsa 2048 bits 0.024223s 0.000665s     41.3   1503.5
rsa 4096 bits 0.159683s 0.002351s      6.3    425.4

The link above does not specify what approach is used and, sadly,
nobody in this forum seems to know for sure either.
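To see how close -multi 2 comes to doubling throughput on that dual-core box, the sign/s columns of the two tables can be compared directly. This is a small sketch; the dictionaries just hold the figures quoted above, and the helper names are illustrative.

```python
# sign/s figures copied from the two tables above (dual-core box).
SINGLE = {512: 1272.3, 1024: 248.9, 2048: 41.3, 4096: 6.3}
MULTI2 = {512: 2353.3, 1024: 467.8, 2048: 80.8, 4096: 12.3}


def speedups(single: dict, multi: dict) -> dict:
    """Ratio of multi-process to single-process sign/s for each key size."""
    return {bits: multi[bits] / single[bits] for bits in single}


def efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / cores


for bits, s in speedups(SINGLE, MULTI2).items():
    print(f"rsa {bits:4d}: speedup {s:.2f}, efficiency {efficiency(s, 2):.0%}")
```

Every key size lands between 1.8x and 2x, so the aggregate figure is close to per-core rate times core count, which is what makes a single-core and a whole-chip result hard to tell apart without knowing the command line.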
From: Maaartin on
On May 21, 3:46 pm, Clark Smith <noaddr...(a)nowhere.net> wrote:
> On Fri, 21 May 2010 13:18:29 +0000, Thomas Pornin wrote:
> > According to Clark Smith  <noaddr...(a)nowhere.net>:
> >> According to this link
> >>
> >> http://www.phoronix.com/scan.php?page=article&item=intel_corei3_530&num=6
> >>
> >> one can compute some 148 signatures per second using 4096-bit RSA
> >> moduli on an AMD Phenom II X3 710 CPU.
>
> > With OpenSSL, my older Core2 Quad Q6600, clocked at 2.4 GHz, can produce
> > 35 4096-bit RSA signatures per second on each core. This means 140
> > signatures per second using all four cores. The Phenom II is more recent
> > and clocked at higher frequencies, so my guess is that the "148 sig/s"
> > figure is for a single core. That's what OpenSSL benches anyway.
>
>         By default. It's very easy to get it to use all cores in parallel:
>
>         openssl speed rsa -multi 2
>
> will use two cores in parallel, if you have them. I have an old dual core
> box and this is what I get with -multi 2:
>
>                   sign    verify    sign/s verify/s
> rsa  512 bits 0.000425s 0.000038s   2353.3  25991.6
> rsa 1024 bits 0.002138s 0.000102s    467.8   9830.9
> rsa 2048 bits 0.012373s 0.000339s     80.8   2948.5
> rsa 4096 bits 0.081385s 0.001201s     12.3    832.9
>
> Without it:
>
>                   sign    verify    sign/s verify/s
> rsa  512 bits 0.000786s 0.000069s   1272.3  14407.9
> rsa 1024 bits 0.004018s 0.000199s    248.9   5015.6
> rsa 2048 bits 0.024223s 0.000665s     41.3   1503.5
> rsa 4096 bits 0.159683s 0.002351s      6.3    425.4
>
> The link above does not specify what approach is used and, sadly,
> nobody in this forum seems to know for sure either.

Mine is AMD Phenom(tm) II X4 920 Processor, 2800 MHz, and I get

                  sign    verify    sign/s verify/s
rsa  512 bits 0.000390s 0.000032s   2564.1  31250.0
rsa 1024 bits 0.001742s 0.000081s    574.1  12345.7
rsa 2048 bits 0.009622s 0.000266s    103.9   3759.4
rsa 4096 bits 0.060842s 0.000894s     16.4   1118.6

for single core and

                  sign    verify    sign/s verify/s
rsa  512 bits 0.000102s 0.000008s   9819.3 121491.3
rsa 1024 bits 0.000445s 0.000021s   2248.1  47191.4
rsa 2048 bits 0.002439s 0.000063s    410.0  15773.7
rsa 4096 bits 0.015309s 0.000224s     65.3   4465.6

for all four cores. That's quite strange since I come nowhere near the
148 signatures per second, and my CPU is more recent. Maybe my system
(XP64 with cygwin) is not running optimally, but here not even memory
gets used, since everything fits into the cache.
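When comparing pasted `openssl speed` tables like these, it can help to parse the rows and check that the rate columns agree with the time columns (sign/s should be about 1 / sign time). The sketch below is one way to do that. The regular expression is an assumption about the row layout shown in this thread, not a general parser for every OpenSSL version.

```python
import re

# Matches rows such as "rsa 4096 bits 0.060842s 0.000894s 16.4 1118.6".
ROW = re.compile(
    r"rsa\s+(\d+)\s+bits\s+([\d.]+)s\s+([\d.]+)s\s+([\d.]+)\s+([\d.]+)"
)

# Single-core figures quoted above (Phenom II X4 920).
SINGLE_CORE = """
rsa  512 bits 0.000390s 0.000032s   2564.1  31250.0
rsa 1024 bits 0.001742s 0.000081s    574.1  12345.7
rsa 2048 bits 0.009622s 0.000266s    103.9   3759.4
rsa 4096 bits 0.060842s 0.000894s     16.4   1118.6
"""


def parse_speed_table(text: str) -> dict:
    """Map key size in bits to the four numeric columns of each row."""
    rows = {}
    for m in ROW.finditer(text):
        sign_s, verify_s, sign_rate, verify_rate = map(float, m.groups()[1:])
        rows[int(m.group(1))] = {
            "sign_s": sign_s,
            "verify_s": verify_s,
            "sign_rate": sign_rate,
            "verify_rate": verify_rate,
        }
    return rows


def rate_mismatch(row: dict) -> float:
    """Relative gap between the reported sign/s and 1 / sign time."""
    return abs(1.0 / row["sign_s"] - row["sign_rate"]) / row["sign_rate"]
```

Because the pattern matches on whitespace rather than on line boundaries, it also copes with tables that a newsreader has re-wrapped, as long as no quote markers were inserted in the middle of a row.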
From: Clark Smith on
On Fri, 21 May 2010 08:13:14 -0700, Maaartin wrote:

> On May 21, 3:46 pm, Clark Smith <noaddr...(a)nowhere.net> wrote:
>> On Fri, 21 May 2010 13:18:29 +0000, Thomas Pornin wrote:
>> > According to Clark Smith  <noaddr...(a)nowhere.net>:
>> >> According to this link
>> >>
>> >> http://www.phoronix.com/scan.php?page=article&item=intel_corei3_530&num=6
>> >>
>> >> one can compute some 148 signatures per second using 4096-bit RSA
>> >> moduli on an AMD Phenom II X3 710 CPU.
>>
>> > With OpenSSL, my older Core2 Quad Q6600, clocked at 2.4 GHz, can
>> > produce 35 4096-bit RSA signatures per second on each core. This
>> > means 140 signatures per second using all four cores. The Phenom II
>> > is more recent and clocked at higher frequencies, so my guess is that
>> > the "148 sig/s" figure is for a single core. That's what OpenSSL
>> > benches anyway.
>>
>>         By default. It's very easy to get it to use all cores in
>>         parallel:
>>
>>         openssl speed rsa -multi 2
>>
>> will use two cores in parallel, if you have them. I have an old dual
>> core box and this is what I get with -multi 2:
>>
>>                   sign    verify    sign/s verify/s
>> rsa  512 bits 0.000425s 0.000038s   2353.3  25991.6
>> rsa 1024 bits 0.002138s 0.000102s    467.8   9830.9
>> rsa 2048 bits 0.012373s 0.000339s     80.8   2948.5
>> rsa 4096 bits 0.081385s 0.001201s     12.3    832.9
>>
>> Without it:
>>
>>                   sign    verify    sign/s verify/s
>> rsa  512 bits 0.000786s 0.000069s   1272.3  14407.9
>> rsa 1024 bits 0.004018s 0.000199s    248.9   5015.6
>> rsa 2048 bits 0.024223s 0.000665s     41.3   1503.5
>> rsa 4096 bits 0.159683s 0.002351s      6.3    425.4
>>
>> The link above does not specify what approach is used and, sadly,
>> nobody in this forum seems to know for sure either.
>
> Mine is AMD Phenom(tm) II X4 920 Processor, 2800 MHz, and I get
>
>                   sign    verify    sign/s verify/s
> rsa  512 bits 0.000390s 0.000032s   2564.1  31250.0
> rsa 1024 bits 0.001742s 0.000081s    574.1  12345.7
> rsa 2048 bits 0.009622s 0.000266s    103.9   3759.4
> rsa 4096 bits 0.060842s 0.000894s     16.4   1118.6
>
> for single core and
>
>                   sign    verify    sign/s verify/s
> rsa  512 bits 0.000102s 0.000008s   9819.3 121491.3
> rsa 1024 bits 0.000445s 0.000021s   2248.1  47191.4
> rsa 2048 bits 0.002439s 0.000063s    410.0  15773.7
> rsa 4096 bits 0.015309s 0.000224s     65.3   4465.6
>
> for all four cores. That's quite strange since I come nowhere near the
> 148 signatures per second, and my CPU is more recent. Maybe my system
> (XP64 with cygwin) is not running optimally, but here not even memory
> gets used, since everything fits into the cache.

Thanks for your feedback. This makes me think that the number
reported in the article I mentioned was obtained with -multi 4 (or -multi
3, or however many cores are available in that CPU). Still, I can't help
feeling that something is wrong with your platform: with those specs
I would have expected way more than 574 1024-bit signatures per second
per core.
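For anyone wanting to reproduce the "one process per core" run without hard-coding the count, a minimal sketch follows. It assumes an `openssl` binary on the PATH whose build supports `-multi` (not every build does), and it uses `os.cpu_count()`, which counts logical CPUs and so includes hyperthreads. The function names are made up for illustration.

```python
import os
import subprocess


def build_speed_command(cores=None, algorithm="rsa4096"):
    """Argument list for `openssl speed` with one worker process per core."""
    if cores is None:
        # Logical CPU count; falls back to 1 if it cannot be determined.
        cores = os.cpu_count() or 1
    return ["openssl", "speed", "-multi", str(cores), algorithm]


def run_benchmark(cores=None):
    """Run the benchmark and return its text output (needs openssl installed)."""
    result = subprocess.run(
        build_speed_command(cores), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `run_benchmark(1)` and then `run_benchmark()` on the same machine gives the two tables needed to tell a per-core figure from a whole-chip one.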