From: Robert on
On Tue, 25 Sep 2007 22:45:12 +0000 (UTC), docdwarf(a)panix.com () wrote:

>In article <regif3d0b34nreavsckap09omqjhptnik8(a)4ax.com>,
>Robert <no(a)e.mail> wrote:
>>On Tue, 25 Sep 2007 09:25:04 +0000 (UTC), docdwarf(a)panix.com () wrote:
>>

>>>Now, Mr Wagner... is one to expect another dreary series of repetitions
>>>about how mainframers who said that indices were faster than subscripts
>>>were, in fact, right about something?
>>
>>I expected I-told-you-so from the mainframe camp.
>
>It may be interesting to see if you get one; my point - and pardon the
>obscure manner of its making - was that you made a series of repetitions
>which a demonstration has disproved and it may be interesting to see if an
>equally lengthy series of repetitions follows... or if it just Goes Away
>until you next get an idea about something... and begin another, similar
>series of repetitions.

We saw that subscript and index run at the same speed on three CPU families -- HP PA
(SuperDome), DEC Alpha (Cray) and Richard's undisclosed machine, possibly Intel. I am
confident we'd see the same on Intel, PowerPC (pseries, iseries, Mac) and SPARC, based on
tests I ran a few years ago. Thus the generalization. I was surprised to see zSeries did
not follow the pattern of the others.

My previous idea, that memory alignment no longer matters, turned out to be wrong. It does
matter on modern RISC machines.

There's a good chance I'll get another idea.
From: William M. Klein on
"Robert" <no(a)e.mail> wrote in message
news:uq8jf3pd3rq48eqio0hdtqo172nv2c16is(a)4ax.com...
> On Tue, 25 Sep 2007 22:45:12 +0000 (UTC), docdwarf(a)panix.com () wrote:
<snip>
> We saw that subscript and index run at the same speed on three CPU families -- HP PA
> (SuperDome), DEC Alpha (Cray) and Richard's undisclosed machine, possibly Intel. I am
> confident we'd see the same on Intel, PowerPC (pseries, iseries, Mac) and SPARC, based on
> tests I ran a few years ago. Thus the generalization. I was surprised to see zSeries did
> not follow the pattern of the others.
>

Robert,
What you CONTINUE to seem to miss is that for MOST "performance"
recommendations from compiler vendors, it is their INTERNAL knowledge of the
code generated by their compiler and NOT the machine speed for specific
instructions.

You have repeatedly referred to "load" vs "multiply" machine instructions in the
discussion of subscripts vs indexes - and yet have NOT posted the "generated"
code from a VARIETY of compilers to show that this is actually the "only" (or
even major) differences between the generated code.

I cannot tell you why there are as many differences as there are in generated
code, but I do know from work that I have done over the years that there are
(were - and always will be) surprising differences for the "most simple" COBOL
source code. Often this has to do with options/features that are NOT used in
the specific code sequences - but are possible. Different compilers (and
especially different optimizers) simply do not always create the "expected - if
*I* were the compiler" object code.

So far, it does appear (to me) that Micro Focus (various platforms) *and* the HP
OpenVMS compiler create object code with "similar" performance (regardless of
platform) while IBM (at least on one mainframe) creates very different code
sequences. I don't know if Daniel has run a test for Unisys (given the
Standard-conforming problems with the Speed2 program) or if anyone else has run
with different compilers. Again, I would strongly GUESS that differences have
to do with generated object code and NOT speed of specific machine instructions.
But this is a guess without evidence to "prove" it.


--
Bill Klein
wmklein <at> ix.netcom.com


From: Pete Dashwood on


"Robert" <no(a)e.mail> wrote in message
news:uq8jf3pd3rq48eqio0hdtqo172nv2c16is(a)4ax.com...
> On Tue, 25 Sep 2007 22:45:12 +0000 (UTC), docdwarf(a)panix.com () wrote:
<snip>
> We saw that subscript and index run at the same speed on three CPU families -- HP PA
> (SuperDome), DEC Alpha (Cray) and Richard's undisclosed machine, possibly Intel. I am
> confident we'd see the same on Intel, PowerPC (pseries, iseries, Mac) and SPARC, based on
> tests I ran a few years ago. Thus the generalization. I was surprised to see zSeries did
> not follow the pattern of the others.

Well, Robert, I don't want to shake your confidence, and I deliberately
refrained from posting these results (I felt you were getting enough
flak...), but reconsidered when I saw your statement above :-)

Here are the results of "Speed2" from a genuine Intel Celeron Core 2 Duo
Vaio AR250G notebook with 2 GB of main memory, running under Windows XP with
SP2 applied, using your code (with the following amendments: all asterisks
and comments removed, exit perform cycle removed), compiled with no options
other than the defaults (which includes "Optimize"), with the Fujitsu
NetCOBOL version 6 compiler, compiled to .EXE:

Null test            1
Index                3
Subscript           25
Subscript comp-5     3
Index 1              3
Subscript 1         22
Subscript 1 comp-5   3

As you can see, in this environment indexing is roughly 7 to 8 times faster than
subscripting, unless you use optimized (COMP-5) subscripts.
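For reference, here is the arithmetic behind that estimate, using the timings above exactly as printed (in whatever units Speed2 reports, with no correction for the null-test overhead):

```latex
\frac{T_{\text{Subscript}}}{T_{\text{Index}}} = \frac{25}{3} \approx 8.3,
\qquad
\frac{T_{\text{Subscript 1}}}{T_{\text{Index 1}}} = \frac{22}{3} \approx 7.3,
\qquad
\frac{T_{\text{Subscript comp-5}}}{T_{\text{Index}}} = \frac{3}{3} = 1
```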

(I was surprised that the figures are 3 times faster than the z/OS mainframe
figures posted by Charlie... :-) I've had this machine for around a year now;
I bought it at Fry's in L.A. a few days after Core 2 became available in the
marketplace, and have become blasé about its speed. A few days ago I
was running a test on a P4 notebook that had to create a couple of million
rows in an ACCESS database. It ran for 20 minutes, then shut down due to a
thermal cutoff. (If the CPU runs at or near 100% for an extended period, the
machine shuts down :-) It was made in Germany and I bought it in England.
It is an annoying, although valuable, feature of that machine that it
protects itself. Anyway, I transferred the job to the Vaio and tried again:
it never even broke into a sweat; no fans came on and the job was done in 7
minutes. (It does NOT have a high-speed disk; it runs at 5400 RPM but is well
buffered, and they claim it was the first notebook disk drive to hold
200 GB.)

It is things like this that make me wonder why we even bother about
performance and have heated discussions about things like indexes and
subscripts, when the technology is advancing rapidly enough to simply take
care of it.

More importantly, I hope Robert you will accept that generalizations about
performance simply don't stack up. Sometimes the most unexpected results are
obtained. The only reliable way to check performance is empirically (I give
you credit for doing that, and publishing results even when they didn't tell
you what you wanted to hear) and, outside of test results, everything else
should be accorded the same degree of credibility that we accord glossy
marketing brochures ("MIGHT be true... but the person presenting it has a
definite axe to grind" :-))

>
> My previous idea, that memory alignment no longer matters, turned out to be wrong. It does
> matter on modern RISC machines.
>
> There's a good chance I'll get another idea.

Let's hope it won't be lonely... :-)

Pete
--
"I used to write COBOL...now I can do anything."


From: Robert on
On Wed, 26 Sep 2007 02:29:59 GMT, "William M. Klein" <wmklein(a)nospam.netcom.com> wrote:

>"Robert" <no(a)e.mail> wrote in message
>news:uq8jf3pd3rq48eqio0hdtqo172nv2c16is(a)4ax.com...
>> On Tue, 25 Sep 2007 22:45:12 +0000 (UTC), docdwarf(a)panix.com () wrote:
><snip>
>> We saw that subscript and index run at the same speed on three CPU families -- HP PA
>> (SuperDome), DEC Alpha (Cray) and Richard's undisclosed machine, possibly Intel. I am
>> confident we'd see the same on Intel, PowerPC (pseries, iseries, Mac) and SPARC, based on
>> tests I ran a few years ago. Thus the generalization. I was surprised to see zSeries did
>> not follow the pattern of the others.
>>
>
>Robert,
> What you CONTINUE to seem to miss is that for MOST "performance"
>recommendations from compiler vendors, it is their INTERNAL knowledge of the
>code generated by their compiler and NOT the machine speed for specific
>instructions.

If you prefer, treat the compiler and CPU as a black box. The results show the relative
speed of changing one variable, such as index versus subscript, and keeping everything
else unchanged.

>You have repeatedly referred to "load" vs "multiply" machine instructions in the
>discussion of subscripts vs indexes - and yet have NOT posted the "generated"
>code from a VARIETY of compilers to show that this is actually the "only" (or
>even major) differences between the generated code.

Server Express will not show the generated code for HP PA with either the ASM or the
ASMLIST option. It will for other platforms, such as SPARC.

>So far, it does appear (to me) that Micro Focus (various platforms) *and* the HP
>OpenVMS compiler create object code with "similar" performance (regardless of
>platform) while IBM (at least on one mainframe) creates very different code
>sequences. I don't know if Daniel has run a test for Unisys (given the
>Standard-conforming problems with the Speed2 program) or if anyone else has run
>with different compilers. Again, I would strongly GUESS that differences have
>to do with generated object code and NOT speed of specific machine instructions.
>But this is a guess without evidence to "prove" it.

As you requested, I compiled speed2 with ENTCOBOL and NOMF added at the end of the
options already there. Getting it to compile required three changes:
    Change                To
    comp-5                binary
    goback                stop run
    exit perform cycle    continue

It did not object to single quotes or free format, neither of which affects performance.
Removing exit perform cycle made each of the tests run almost twice as fast, but their
relative speeds did not change.

>I cannot tell you why there are as many differences as there are in generated
>code, but I do know from work that I have done over the years that there are
>(were - and always will be) surprising differences for the "most simple" COBOL
>source code. Often this has to do with options/features that are NOT used in
>the specific code sequences - but are possible. Different compilers (and
>especially different optimizers) simply do not always create the "expected - if
>*I* were the compiler" object code.

I ran MANY speed tests in the '80s and early '90s, for Computer Language magazine, which
is long gone. I was comparing the relative speed of LANGUAGES as well as compilers and
machines. My technique was to take a single algorithm, The Sieve, and write it once in
well-written assembly language that used all the 'tricks' the compiler could have used,
then write it a second time in the subject language using all of its features (not a
line-for-line translation). Finally, I expressed the efficiency of the compiler and language
as the ratio of its run time to the assembly-language run time, which stood in for the
hypothetical top speed of the machine. On one test, I
was stunned when GCC and another hot C compiler BEAT my hand-crafted assembly language.
Their generated code was wildly non-intuitive.
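In formula form, the "Ratio to asm" column in the table below is each program's run time divided by the run time of the hand-written assembly version on the same machine, so 1.0 means "as fast as the assembly" and anything below 1.0 means the compiler won. Two rows from that table, worked through:

```latex
R = \frac{T_{\text{language}}}{T_{\text{assembly}}},
\qquad
R_{\text{CIS Cobol, IBM PC}} = \frac{1700}{4} = 425,
\qquad
R_{\text{unrolled asm, i486}} = \frac{0.058}{0.095} \approx 0.61
```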

I used to have results from many more languages than shown below -- ALGOL, LISP, FORTH,
etc. Here are some notes I did keep. As you will see, GOOD compiler-generated code used to
run 3-4 times slower than assembly language while p-code ran 50-250 times slower. By the
early '90s, the ratio had dropped below 2.

Note how the original 4.77 MHz PC running Cobol beat an entry-level mainframe running
Cobol. It also beat the mainframe on an I/O-bound program that pitted VSAM on a 3310 disk
against Realia's indexed file system on a slow 30 MB hard drive. When I saw that, I
took non-mainframe computers seriously. I started writing high-volume production
applications for the 'personal computer'.

Sieve Benchmark Times, mostly circa 1984

Machine (CPU)
    Language / compiler           Time  Ratio to asm

IBM 4331-2 (S/370)
    Cobol                           14            14
    Cobol w/Capex optimizer         11            11
    PL/I                             4             4
    Assembly                         1

TRS80-2 (Z80-4)
    Cobol RM                      3390           242
    Basic Int Integer             1210            86
    Assembly                        14

TI99/4A (TMS9900)
    Basic Int Floating            3960           566
    Assembly                         7

TRS-2000 (Intel 8086-8)
    Basic Int                      890           445
    Assembly                         2

IBM PC (Intel i8088-4.7)
    Cobol CIS                     1700           425
    Basic Int MS                  1330           333
    Cobol mbp                      220            55
    Pascal Turbo 2                  28             7
    Pascal Turbo 4                  24             6  '90
    C C-Systems                     23             6
    Fortran MS 3                    20             5  '90
    Basic QBasic 4                  18             5  '92
    Pascal Turbo 1                  16             4
    Basic BASCOM 1                  16             4
    C DeSmet                        16             4  '90
    C BC++ 3                        12             3  '92
    Cobol Realia                    12             3
    Fortran MS 4                    11             3  '92
    Assembly                         4

PC (Intel i486-33, run in 1993)
    xBase Force                   .881           9.3
    Cobol Realia                  .167           1.8
    Fortran MS 4                  .112           1.2
    Assembly                      .095           1.0
    C GCC                         .089           .93
    C EMX, OS/2 32bit             .082           .86
    Assembly w/loop unrolled      .058           .61

Here's the Sieve in Realia Cobol, which had inline perform in 1984.

      * Sieve program using 1974 COBOL
      * Benchmark. Finds primes between 3 and 16384
      *
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PRIMES.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       SOURCE-COMPUTER. IBM-PC.
       OBJECT-COMPUTER. IBM-PC.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       77  TOTAL-PRIME-COUNT       PIC S9(4) COMP.
       77  PRIME                   PIC S9(4) COMP.
       77  PRIME-MULTIPLE          PIC S9(4) COMP.
       01  PRIME-FLAGS-GROUP.
           05  PRIME-FLAG          PIC X OCCURS 8191 TIMES
                                   INDEXED BY PRIME-INDEX.
       01  FILLER.
           05  TIME-AREA.
               10  HH              PIC 99.
               10  MM              PIC 99.
               10  SS              PIC 99.
               10  HUN             PIC 99.
           05  MILLI-SECONDS       PIC S9(8) COMP.
           05  BGN-MILLI-SECONDS   PIC S9(8) COMP.
           05  DISPLAY-MILLI-SECONDS PIC Z(8).

       PROCEDURE DIVISION.
       PRIME-COUNT-ROUTINE.
      *
      * Compute the primes 10 times to increase timing accuracy.
      * Indicate all odd numbers are potential primes.
      * Indicate no primes have been found.
      *
           PERFORM CALC-MILLI-SECONDS
           MOVE MILLI-SECONDS TO BGN-MILLI-SECONDS.
           PERFORM 10 TIMES
               MOVE ALL '1' TO PRIME-FLAGS-GROUP
               MOVE ZERO TO TOTAL-PRIME-COUNT
               SET PRIME-INDEX TO 1
               PERFORM COUNT-PRIMES
           END-PERFORM
           DISPLAY 'COUNT:' TOTAL-PRIME-COUNT
           PERFORM CALC-MILLI-SECONDS.
           SUBTRACT BGN-MILLI-SECONDS FROM MILLI-SECONDS
               GIVING DISPLAY-MILLI-SECONDS.
           DISPLAY 'Elapsed time was' DISPLAY-MILLI-SECONDS
               ' milliseconds'
           STOP RUN.
      *
      * For each number which has not been flagged as a multiple
      * of an earlier prime, indicate all of its multiples in
      * the range being evaluated are not primes. Note that
      * PRIME-FLAG(1) represents the integer 3, PRIME-FLAG(n)
      * represents the integer 2n+1.
      * Note that PRIME-MULTIPLE technically can only contain
      * values through 9999, but will have values up to 24574.
      * This works since ADD does not truncate, and
      * S9(4) COMP fields can be -32768 through 32767.
      *
       COUNT-PRIMES.
           SEARCH PRIME-FLAG VARYING PRIME-INDEX
               WHEN PRIME-FLAG (PRIME-INDEX) IS NOT EQUAL TO ZERO
                   ADD 1 TO TOTAL-PRIME-COUNT
                   SET PRIME-MULTIPLE TO PRIME-INDEX
                   ADD PRIME-MULTIPLE PRIME-MULTIPLE 1 GIVING PRIME
                   ADD PRIME TO PRIME-MULTIPLE
                   SET PRIME-INDEX UP BY 1
                   PERFORM UNTIL PRIME-MULTIPLE IS GREATER THAN 8191
                       MOVE ZERO TO PRIME-FLAG (PRIME-MULTIPLE)
                       ADD PRIME TO PRIME-MULTIPLE
                   END-PERFORM
                   GO TO COUNT-PRIMES.
       CALC-MILLI-SECONDS.
           ACCEPT TIME-AREA FROM TIME.
           MULTIPLY HH BY 60 GIVING MILLI-SECONDS.
           ADD MM TO MILLI-SECONDS.
           MULTIPLY 60 BY MILLI-SECONDS.
           ADD SS TO MILLI-SECONDS.
           MULTIPLY 100 BY MILLI-SECONDS.
           ADD HUN TO MILLI-SECONDS.
           MULTIPLY 10 BY MILLI-SECONDS.
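The index arithmetic in COUNT-PRIMES can be checked against its comment block. Taking flag n to stand for the odd integer 2n+1, the prime p found at index i is p = 2i+1. Its first odd multiple, 3p, sits at index i + p, which is exactly what the SET and the two ADDs compute, and each further step of p in index space is a step of 2p in value space, so even multiples are never touched. PRIME-MULTIPLE is bounded by its starting value 3i+1 with i at most 8191 (whenever 3i+1 exceeds 8191 the inner PERFORM never runs), which gives the 24574 ceiling quoted in the comments:

```latex
\begin{aligned}
p &= 2i + 1, \qquad \operatorname{idx}(v) = \frac{v - 1}{2} \\
\operatorname{idx}(3p) &= \frac{3(2i + 1) - 1}{2} = 3i + 1 = i + p \\
\operatorname{idx}\bigl((2k+1)p\bigr) + p &= \frac{(2k+1)p - 1 + 2p}{2} = \operatorname{idx}\bigl((2k+3)p\bigr) \\
i \le 8191 \;&\Rightarrow\; i + p = 3i + 1 \le 3 \cdot 8191 + 1 = 24574
\end{aligned}
```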

From: Arnold Trembley on
Pete Dashwood wrote:
> (snip)
>
> Here are the results of "Speed2" from a genuine Intel Celeron Core 2 Duo
> Vaio AR250G notebook with 2 GB of main memory, running under Windows XP with
> SP2 applied, using your code (with the following amendments: all asterisks
> and comments removed, exit perform cycle removed), compiled with no options
> other than the defaults (which includes "Optimize"), with the Fujitsu
> NetCOBOL version 6 compiler, compiled to .EXE:
>
> Null test            1
> Index                3
> Subscript           25
> Subscript comp-5     3
> Index 1              3
> Subscript 1         22
> Subscript 1 comp-5   3
>
> As you can see, indexing is between 7 and 8 times more efficient than
> subscripting, unless you use optimized subscripts, in this environment.

Here are the results of "Speed2" using a 2.60 GHz Pentium 4 with 512
MB of main memory, running under Windows XP with SP2 applied, using
Robert's code with EXIT PERFORM CYCLE commented out, compiled with a
1990 education version of Realia COBOL (equivalent to Realia 3):

Null test            5
Index                2
Subscript            8
Subscript comp-5     8
Index 1              2
Subscript 1          7
Subscript 1 comp-5   7

Directory of C:\dosboxc\rccob

09/25/2007 11:09 PM 14,438 SPEED2.ASM
09/25/2007 11:05 PM 5,949 speed2.cob
09/25/2007 11:09 PM 25,134 SPEED2.EXE
09/26/2007 12:21 AM 259 speed2.tst
4 File(s) 45,780 bytes
0 Dir(s) 59,246,034,944 bytes free

The generated assembler is available, if anyone is interested.

Kind regards,

--
http://arnold.trembley.home.att.net/