From: Charles Oliver Nutter on
On Fri, Oct 30, 2009 at 10:14 AM, James M. Lawrence
<quixoticsycophant(a)gmail.com> wrote:
> == ruby 1.9.2dev (2009-10-18 trunk 25393) [i386-darwin9.8.0]
> Rehearsal -------------------------------------------------------------
> 1 thread, 1 interpreter     4.370000   0.020000   4.390000 (  4.389990)
> 2 threads, 1 interpreter    4.360000   0.030000   4.390000 (  4.385111)
> 2 threads, 2 interpreters   0.010000   0.010000   4.700000 (  2.460661)
> --------------------------------------------------- total: 13.480000sec
>
>                                user     system      total        real
> 1 thread, 1 interpreter     4.360000   0.020000   4.380000 (  4.376050)
> 2 threads, 1 interpreter    4.360000   0.030000   4.390000 (  4.380982)
> 2 threads, 2 interpreters   0.010000   0.010000   4.710000 (  2.465925)
>
>
> == jruby 1.4.0RC3 (ruby 1.8.7 patchlevel 174) (2009-10-30 1d7de2d) (Java
> HotSpot(TM) Client VM 1.5.0_20) [i386-java]
> Rehearsal ------------------------------------------------------------
> 1 thread, 1 interpreter    6.060000   0.000000   6.060000 (  6.060000)
> 2 threads, 1 interpreter   7.629000   0.000000   7.629000 (  7.629000)
> -------------------------------------------------- total: 13.689000sec
>
>                               user     system      total        real
> 1 thread, 1 interpreter    6.080000   0.000000   6.080000 (  6.080000)
> 2 threads, 1 interpreter   7.288000   0.000000   7.288000 (  7.288000)

JRuby benchmarking:

* Use Java 6+

Java 6 is much faster than Java 5. Java 7 is faster still in many cases.

* Pass --server if -v output says "client" VM

The HotSpot JVM has two modes: "server" and "client". The "server" VM
does runtime-profiled optimizations and can be 2x or more faster than
the "client" VM.
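
To confirm from inside Ruby which VM a run actually used, something like
the sketch below may help. The helper name hotspot_mode is made up for
illustration. It assumes the standard java.vm.name system property,
which HotSpot sets to a string containing "Client" or "Server" (the same
text that appears in the -v banner), and it returns :not_jruby on any
other interpreter.

```ruby
# Hypothetical helper: report whether this process runs on the HotSpot
# "client" or "server" VM. Only meaningful under JRuby.
def hotspot_mode
  return :not_jruby unless defined?(JRUBY_VERSION)

  require 'java'
  # java.vm.name is a standard Java system property, for example
  # "Java HotSpot(TM) 64-Bit Server VM".
  vm_name = java.lang.System.getProperty('java.vm.name').to_s
  case vm_name
  when /server/i then :server
  when /client/i then :client
  else :unknown
  end
end

puts hotspot_mode
```

If this prints "client", rerunning the benchmark with --server is worth
trying before comparing numbers.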

Results on my system (Core 2 Duo, 2.66GHz):

ruby 1.9.2dev (2009-07-23 trunk 24248) [i386-darwin9.7.1]
Rehearsal -------------------------------------------------------------
1 thread, 1 interpreter     3.370000   0.020000   3.390000 (  3.516261)
2 threads, 1 interpreter    3.330000   0.020000   3.350000 (  3.412460)
2 threads, 2 interpreters   0.010000   0.000000   3.590000 (  2.133313)
--------------------------------------------------- total: 10.330000sec

                                user     system      total        real
1 thread, 1 interpreter     3.350000   0.010000   3.360000 (  3.415410)
2 threads, 1 interpreter    3.350000   0.020000   3.370000 (  3.423560)
2 threads, 2 interpreters   0.000000   0.010000   3.630000 (  2.302965)

jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2009-10-30 eaa9e7f) (Java
HotSpot(TM) 64-Bit Server VM 1.6.0_15) [x86_64-java]
Rehearsal ------------------------------------------------------------
1 thread, 1 interpreter    2.373000   0.000000   2.373000 (  2.373000)
2 threads, 1 interpreter   1.733000   0.000000   1.733000 (  1.733000)
--------------------------------------------------- total: 4.106000sec

                               user     system      total        real
1 thread, 1 interpreter    2.145000   0.000000   2.145000 (  2.145000)
2 threads, 1 interpreter   1.840000   0.000000   1.840000 (  1.840000)

It would probably improve more with a longer run, but this is pretty good.

- Charlie

From: James M. Lawrence on
Charles Nutter wrote:
>
> JRuby benchmarking:
>
> * Use Java 6+
>
> Java 6 is much faster than Java 5. Java 7 is faster still in many cases.
>
> * Pass --server if -v output says "client" VM

I didn't consider it because the behavior I showed looks wrong for
either Java 5 or Java 6 in either client or server mode. Indeed I
obtained the same results with Java 6 Server VM.

A computation split into two parallel threads takes more time than the
same computation with one thread. 'top' reports 185% CPU and 100% CPU
respectively.

I was not concerned with comparing MRI and jruby. MRI was a baseline
to demonstrate that Pure's parallelism was working in the first place.

I was unable to find your eaa9e7f commit so I grabbed the latest
master branch.

jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2009-11-02 55366a1) (Java
HotSpot(TM) 64-Bit Server VM 1.6.0_15) [x86_64-java]

Core 2 Duo 1.83GHz; all apps closed except Terminal; benchmarks made
without 'top' running.

Rehearsal ------------------------------------------------------------
1 thread, 1 interpreter    3.422000   0.000000   3.422000 (  3.422000)
2 threads, 1 interpreter   4.008000   0.000000   4.008000 (  4.008000)
--------------------------------------------------- total: 7.430000sec

                               user     system      total        real
1 thread, 1 interpreter    2.942000   0.000000   2.942000 (  2.942000)
2 threads, 1 interpreter   3.595000   0.000000   3.595000 (  3.595000)

Results are the same with Pure removed:

require 'benchmark'

def left
  (1..10_000_000).inject(0) { |acc, n| acc + n }
end

def right
  (1..10_000_000).inject(0) { |acc, n| acc + n }
end

Benchmark.bmbm { |bm|
  bm.report("1 thread") {
    Thread.new {
      [left, right]
    }.value
  }
  bm.report("2 threads") {
    [
      Thread.new { left },
      Thread.new { right },
    ].map { |t| t.value }
  }
}

Rehearsal ---------------------------------------------
1 thread    6.726000   0.000000   6.726000 (  6.726000)
2 threads   7.478000   0.000000   7.478000 (  7.478000)
----------------------------------- total: 14.204000sec

                user     system      total        real
1 thread    6.636000   0.000000   6.636000 (  6.636000)
2 threads   8.196000   0.000000   8.196000 (  8.196000)
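
For anyone who wants a single number rather than two rows to compare,
the same measurement can be reduced to a speedup ratio. This is only a
sketch: the helper names (sum_to, serial_time, threaded_time) and the
small loop size are invented here so that it finishes quickly, and it
uses Benchmark.realtime instead of bmbm, so there is no rehearsal pass.

```ruby
require 'benchmark'

# Sum 1..n; stands in for the left/right workloads in the script above.
def sum_to(n)
  (1..n).inject(0) { |acc, i| acc + i }
end

# Wall-clock seconds to run `jobs` copies back to back in one thread.
def serial_time(jobs, n)
  Benchmark.realtime do
    Thread.new { jobs.times { sum_to(n) } }.join
  end
end

# Wall-clock seconds to run `jobs` copies, one thread per copy.
def threaded_time(jobs, n)
  Benchmark.realtime do
    Array.new(jobs) { Thread.new { sum_to(n) } }.each(&:join)
  end
end

n = 200_000 # hypothetical size, far smaller than the 10_000_000 used above
serial   = serial_time(2, n)
parallel = threaded_time(2, n)
printf("serial %.3fs, parallel %.3fs, speedup %.2fx\n",
       serial, parallel, serial / parallel)
```

A ratio above 1.0 means the two-thread run finished sooner; below 1.0
means it was slower, which is the behavior being reported here. With
such a short loop the JIT has little time to warm up, so n should be
raised before reading anything into the result.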

--
Posted via http://www.ruby-forum.com/.

From: Charles Oliver Nutter on
On Mon, Nov 2, 2009 at 11:47 AM, James M. Lawrence
<quixoticsycophant(a)gmail.com> wrote:
> Rehearsal ------------------------------------------------------------
> 1 thread, 1 interpreter    3.422000   0.000000   3.422000 (  3.422000)
> 2 threads, 1 interpreter   4.008000   0.000000   4.008000 (  4.008000)
> --------------------------------------------------- total: 7.430000sec
>
>                               user     system      total        real
> 1 thread, 1 interpreter    2.942000   0.000000   2.942000 (  2.942000)
> 2 threads, 1 interpreter   3.595000   0.000000   3.595000 (  3.595000)

This does not match my results. Are you sure both cores are being used?

> Rehearsal ---------------------------------------------
> 1 thread    6.726000   0.000000   6.726000 (  6.726000)
> 2 threads   7.478000   0.000000   7.478000 (  7.478000)
> ----------------------------------- total: 14.204000sec
>
>                user     system      total        real
> 1 thread    6.636000   0.000000   6.636000 (  6.636000)
> 2 threads   8.196000   0.000000   8.196000 (  8.196000)

Also does not match my results:

Rehearsal ---------------------------------------------
1 thread    4.795000   0.000000   4.795000 (  4.739000)
2 threads   3.072000   0.000000   3.072000 (  3.072000)
------------------------------------ total: 7.867000sec

                user     system      total        real
1 thread    4.081000   0.000000   4.081000 (  4.081000)
2 threads   2.966000   0.000000   2.966000 (  2.966000)

I'd love to hear from others trying this benchmark, since the results
you've given don't match my results on any of the systems I'm testing.
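
To put the disagreement in one figure, the "real" column of the second
(post-rehearsal) pass of the Pure-free benchmark can be turned into a
speedup ratio. The snippet below is a small sketch of that arithmetic;
the speedup helper and the RESULTS table are illustrative names, and the
numbers are copied from the two sets of results in this thread.

```ruby
# Speedup = one-thread real time / two-thread real time.
def speedup(one_thread, two_threads)
  one_thread / two_threads
end

# "real" times in seconds from the second pass, as posted in this thread.
RESULTS = {
  'Lawrence' => [6.636, 8.196],
  'Nutter'   => [4.081, 2.966],
}

RESULTS.each do |label, (one, two)|
  printf("%-10s %.2fx\n", label, speedup(one, two))
end
```

A value below 1.0 means the two-thread run was slower than the
one-thread run; a value above 1.0 means it was faster.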

- Charlie

From: James M. Lawrence on
Charles Oliver Nutter:
> This does not match my results. Are you sure both cores are being used?

I am certain. I tried to head off this question when I said: all
applications are closed save Terminal; top reports 0% CPU usage
beforehand; top reports java at 100% CPU during the 1-thread test;
185% CPU during the 2-thread test; top was not running during the
posted benchmarks.

I should also mention this is my mp3 player co-opted into a Mac dev
machine--a Mac Mini. Maybe Java balks at the specs. System Profiler:

Model Name: Mac mini
Model Identifier: Macmini2,1
Processor Name: Intel Core 2 Duo
Processor Speed: 1.83 GHz
Number Of Processors: 1
Total Number Of Cores: 2
L2 Cache: 2 MB
Memory: 1 GB
Bus Speed: 667 MHz

Darwin jl.local 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01
PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386

It would be nice to match jruby versions. Can you try master 55366a1
or push eaa9e7f to a remote branch?

[quoting the rest in full due to ruby-forum gateway breakage]

> Also does not match my results:
>
> Rehearsal ---------------------------------------------
> 1 thread    4.795000   0.000000   4.795000 (  4.739000)
> 2 threads   3.072000   0.000000   3.072000 (  3.072000)
> ------------------------------------ total: 7.867000sec
>
>                 user     system      total        real
> 1 thread    4.081000   0.000000   4.081000 (  4.081000)
> 2 threads   2.966000   0.000000   2.966000 (  2.966000)
>
> I'd love to hear from others trying this benchmark, since the results
> you've given don't match my results on any of the systems I'm testing.
>