"We Never Use Assembly Language" [ASM]


From: o//annabee on 16 Mar 2006 11:21

On Thu, 16 Mar 2006 16:57:09 +0100, randyhyde(a)earthlink.net
<randyhyde(a)earthlink.net> wrote:

> I see you've got your head buried in the same whole in the sand that
> Rene does. Ignoring reality just because you don't like it is a sign of
> insanity, you know?

Well, I don't know, but I think it's spelled "hole" anyway. "Whole" means
something more like "complete".

Which reminds me: where can we download the 6 non-trivial masterful
applications you have written in assembly? I looked at Webster, but
couldn't find anything but Christian resources, which I thought were odd,
and some book reviews. I keep asking in case Your Holiness missed that post.

>> That is only you. So no problem.
>
> So why are you complaining if you actually believe this?

I'm not complaining. But I think that when you have such a vivid imagination,
you should make this daydream more realistic. Unless, of course, this
"breaking others' code" component is important to you.

> Cheers,
> Randy Hyde
>

From: randyhyde@earthlink.net on 16 Mar 2006 11:57

o//annabee wrote:
> On Wed, 15 Mar 2006 17:49:51 +0100, randyhyde(a)earthlink.net
> <randyhyde(a)earthlink.net> wrote:
>
> > Closer, but still no cigar.
> > Care to try again?
>
> I am glad you noticed the error. I have corrected the code.

Still errors, see below.

>
> When doing the alignment properly, we get a speedup of both loops.
> As you can see, yours is still slower.
> Also, as a by-product, I actually get IDENTICAL timings for several runs,
> which I haven't really seen before.

Well, I don't have your particular CPU, but when I run this code on a
PIV, here are the results that I get:

My code    Your code
8418       c76c
8600       c76c
8590       c7d0
8568       c9b4
8648       c970
Your version seems to be about 50% slower than mine on a PIV. Again, I
don't have access to your CPU, so I can't verify your numbers, but
if you look at the actual code generated by RosAsm for the two
routines:

My Version, disassembled from your RosAsm code:

start:
.text:00404000         cpuid
.text:00404002         rdtsc
.text:00404004         push    eax
.text:00404005         mov     ecx, dword_403000
.text:0040400B         xor     eax, eax
.text:0040400D         jecxz   short loc_404010
.text:0040400F         nop
.text:00404010
.text:00404010 loc_404010:             ; CODE XREF: .text:0040400Dj
.text:00404010                         ;            .text:00404013j
.text:00404010         add     eax, ecx
.text:00404012         dec     ecx
.text:00404013         jnz     short loc_404010
.text:00404015         rdtsc
.text:00404017         pop     ebx
.text:00404018         sub     eax, ebx
.text:0040401A         int     3       ; Trap to Debugger

Your version, disassembled from your RosAsm code:

                       cpuid
.text:0040401D         rdtsc
.text:0040401F         push    eax
.text:00404020         mov     ecx, 2710h
.text:00404025         xor     eax, eax
.text:00404027         jmp     short loc_404030
.text:00404027 ; ---------------------------------------------------------
.text:00404029         align 8
.text:00404030
.text:00404030 loc_404030:             ; CODE XREF: .text:00404027j
.text:00404030                         ;            .text:00404038j
.text:00404030         cmp     ecx, 0
.text:00404033         jbe     short loc_40403A
.text:00404035         add     eax, ecx
.text:00404037         dec     ecx
.text:00404038         jmp     short loc_404030
.text:0040403A ; ---------------------------------------------------------
.text:0040403A
.text:0040403A loc_40403A:             ; CODE XREF: .text:00404033j
.text:0040403A         rdtsc
.text:0040403C         pop     ebx
.text:0040403D         sub     eax, ebx
.text:0040403F         int     3       ; Trap to Debugger

Well, the difference becomes pretty obvious. What you're trying to tell
me is that a loop with 50% more instructions, that is,

.text:00404030         cmp     ecx, 0
.text:00404033         jbe     short loc_40403A
.text:00404035         add     eax, ecx
.text:00404037         dec     ecx
.text:00404038         jmp     short loc_404030

versus

.text:00404010 loc_404010:             ; CODE XREF: .text:0040400Dj
.text:00404010                         ;            .text:00404013j
.text:00404010         add     eax, ecx
.text:00404012         dec     ecx
.text:00404013         jnz     short loc_404010

is actually *faster*? Hmmm... It sure seems like *my* measurements are
a lot more intuitive. That is, the code with 50% more instructions
(yours) runs 50% slower. That AMD CPU is quite amazing indeed, if
this is really the case.
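Counting what each loop executes makes the comparison concrete. The sketch below is a Python model of the two disassembled loops, not a measurement. The function names are made up for illustration, and the model only counts instructions; it ignores alignment, instruction pairing, and branch prediction, which are the effects the timing argument is actually about. It also assumes a non-zero count, as in the test with 2710h (10000) iterations.

```python
def bottom_tested(n):
    """Model of the add / dec / jnz loop. Returns (sum, instructions executed)."""
    eax, ecx = 0, n
    executed = 1                # jecxz guard in front of the loop
    while ecx != 0:
        eax += ecx              # add eax, ecx
        ecx -= 1                # dec ecx
        executed += 3           # add, dec, jnz
    return eax, executed


def top_tested(n):
    """Model of the cmp / jbe / add / dec / jmp loop. Returns (sum, instructions executed)."""
    eax, ecx = 0, n
    executed = 1                # jmp over the alignment padding
    while True:
        executed += 2           # cmp ecx, 0 and jbe to the exit
        if ecx == 0:            # unsigned "below or equal" zero means zero
            break
        eax += ecx              # add eax, ecx
        ecx -= 1                # dec ecx
        executed += 3           # add, dec, jmp back to the test
    return eax, executed


for shape in (bottom_tested, top_tested):
    total, executed = shape(10000)
    print(f"{shape.__name__}: sum={total} instructions={executed}")
```

Both shapes compute the same sum. By this count the while form runs five instructions per iteration against three for the form with the test at the bottom.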

>
> The difference _is_ in favor of my code. Whereas in the original post, you
> claimed mine (or rather RosAsm's) while macro to be slow, because it has
> the test at the top. Even if the timings were in your favor, it would not
> change the fact that the RosAsm macro does very well.

My measurements, and an inspection of the actual code that RosAsm
generates behind your back, seem to bear out my original claims.

>
> So even if it is a small point, it proves you boasted out some definite
> error, in your attempt to scare people from using the RosAsm macros. This
> test definitely proves that there is no reason not to use the RosAsm While
> macro.

If you look at the two pieces of disassembled code, I think that this
alone should scare people away from using macros if they want the
fastest possible code. And, btw, I want to emphasize *macros*, not
*RosAsm macros*. You get the same problem whether the macro was written
for RosAsm, MASM, HLA, FASM, or whatever.

What I *have* claimed is that MASM's implementation of "if" statements
is *better* than the macros that come with RosAsm. This is because MASM
is a bit smarter about this stuff. You will also discover that HLA's
"while" loop generates the "test for loop at the end" rather than the
same code that RosAsm generates. Now perhaps that fails to be better
code on your particular AMD CPU, I cannot verify that as I do not have
access to that CPU. But an inspection of the code and measurements that
I've made suggest that putting the branch at the bottom of the loop and
removing an extra jump is *much* better coding indeed.

>
> Actually, since the timings are both steady, and the difference very small,
> this might even be due to an initial payment in your routine, because of a
> few mispredicted jumps by the CPU.

Yes, not to mention your failure to serialize before the second rdtsc
in each example. But that still doesn't explain the 50% difference that
*I* see on a PIV. And the difference I see is right in line with the
number of instructions. Imagine that.

>
> Hmm. I do not know why, but it seems a few branch mispredictions may be
> the cause of this. And they happen in _your_ code. Not in the RosAsm While
> macro.

On *your* CPU, things like pairable instructions and branch prediction
*could* be why the two loops execute in a similar amount of time. It's
not like the PIV is a paragon of great microcoding. But it *really*
smells like you've made an error somewhere. I'd suggest that you try
putting several additional instructions in the loop and see what
happens then. That would counter any bizarre instruction pairing
phenomenon that is going on.

>
> This was tested on an AMD 64, 3700+ running win2000,

And mine was tested on a PIV running XP.

> TestProc:
>
> cpuid
> rdtsc | push eax
> mov ecx D$n
> xor eax eax

;You've just discovered the problem with
; relative local labels here. Do you see
; the problem in this code? This is
; *exactly* why I refused to put this
; lame form of local labels into HLA.
; Earlier assemblers I'd written
; had relative local labels and I
; saw this problem *far* too often.

> jecxz L0>
> Align 16
> L0:
>
> add eax ecx
> dec ecx
> jnz L0<
> L0:
>
> rdtsc | pop ebx ;Is this rdtsc serialized?
> sub eax ebx
> int 3
> ;/4EBA
>
>
> cpuid
> rdtsc | push eax
> mov ecx 10000 ;This is different from above.
> xor eax eax
> Align 16
> while ecx > 0
> add eax ecx
> dec ecx
> End_While
> rdtsc | pop ebx ; is this rdtsc serialized?
> sub eax ebx
> int 3
> ;4E45
>
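The relative-label problem flagged in the comments of the quoted listing can be modelled in a few lines. This is a sketch under an assumption: that a reference such as `L0>` binds to the nearest `L0:` below it and `L0<` to the nearest one above. That reading agrees with the disassembly earlier in the post, where `jecxz` and `jnz` both target loc_404010. The `resolve` function is hypothetical and is not part of any assembler.

```python
def resolve(lines, index, reference):
    """Resolve a relative label reference such as 'L0>' or 'L0<'.

    '>' picks the nearest definition after `index`; '<' the nearest before it.
    """
    name, direction = reference[:-1], reference[-1]
    if direction == ">":
        search = range(index + 1, len(lines))
    else:
        search = range(index - 1, -1, -1)
    for i in search:
        if lines[i] == name + ":":
            return i
    raise LookupError(reference)


listing = [
    "jecxz L0>",    # 0: meant to skip the loop when ecx is zero
    "Align 16",     # 1
    "L0:",          # 2: loop top
    "add eax ecx",  # 3
    "dec ecx",      # 4
    "jnz L0<",      # 5
    "L0:",          # 6: loop exit
    "rdtsc",        # 7
]

skip_target = resolve(listing, 0, "L0>")
back_target = resolve(listing, 5, "L0<")
print(skip_target, back_target)
```

Under that rule the guard jump binds to the loop top at index 2 rather than the exit at index 6, so a zero count would appear to enter the loop instead of skipping it.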

Another issue: caching effects are not allowed for in this code. The
way you executed it, by running and then stopping, guarantees that the
code will *not* be in the cache when you run it. What you should
*really* do is run each code fragment in a loop a couple of times and
then use the last measurement. That way, everything is in cache and
you'll get more realistic readings. Indeed, the reason your timings may
be so close is because the memory subsystem on your PC is sub-par and
what you're really measuring is the amount of time it takes to read
data from main memory.
Cheers,
Randy Hyde
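The "run it a few times and keep the last measurement" advice from the post above can be sketched as follows. This is only an illustration of the method: Python's `time.perf_counter_ns` stands in for RDTSC, and `measure_warm` and `sum_down` are made-up names, so the numbers it prints say nothing about either assembly loop.

```python
import time


def measure_warm(fragment, runs=5):
    """Run `fragment` several times and return the duration of the last run in ns.

    The earlier runs only serve to warm caches; their timings are discarded.
    """
    last = 0
    for _ in range(runs):
        start = time.perf_counter_ns()
        fragment()
        last = time.perf_counter_ns() - start
    return last


def sum_down(n=10000):
    """Stand-in workload: add n, n-1, ..., 1, like the loops under test."""
    total = 0
    while n > 0:
        total += n
        n -= 1
    return total


elapsed = measure_warm(sum_down)
print(f"last of 5 runs: {elapsed} ns")
```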

From: randyhyde@earthlink.net on 16 Mar 2006 12:05

sevagK wrote:
> >
> > I am not sure 9 is the true limit, but if it isnt it should be.

Yes. As usual, 'bee, you try to turn a limitation of RosAsm into some
bizarre kind of advantage.

You're the one who keeps telling us that assembly language removes all
the limitations. Why would you accept an assembler that places
limitations on you?

>
> A couple of questions:
>
> Is this limit just for if..endif macros nesting or does the limit
> include nesting a combination of macros?
> eg:
> if...
> if...
> while...
> forever....
> if..
> ...
> endif
> endfor
> endwhile
> endif
> endif

Sevag-
No. Not the way the current RosAsm macros are written. Rene has used a
*different* local symbol for each set of macros, e.g., something like
"I" for IF, "W" for WHILE, etc.

However, the story that you're missing is that because RosAsm doesn't
support true local symbols in macros, you can get into a *lot* of
trouble if you try something like this:

if eax > 0

cmp ebx 1
jne >I0
...
I0:
...
endif

Unfortunately, the IF macros have already defined I0 (which might be at
the endif clause). Alas, because of the nature of RosAsm macros (no
local macro symbols), the IF statement above transfers control to the
I0 label rather than to the endif. Rene will tell you that it's up to
the programmer to realize this and not use "I" labels; I say that's a
crock. None of the other assemblers have this problem.
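The collision described above can be modelled with a toy macro expander. Everything here is hypothetical: `expand_if`, `forward_target`, `fresh_label`, and the `jump_if_false` pseudo-instruction are made up for illustration and do not reproduce the actual RosAsm macros. The second half shows what a macro-local symbol would buy, using a counter to mint a fresh label per expansion.

```python
import itertools


def expand_if(body, end_label):
    """Expand a hypothetical IF/ENDIF pair around `body`.

    The IF part jumps forward to `end_label` when the condition is false;
    the ENDIF part defines that label.
    """
    return [f"jump_if_false >{end_label}"] + body + [f"{end_label}:"]


def forward_target(lines, index):
    """Return the index of the nearest matching label after the jump at `index`."""
    name = lines[index].split()[-1].strip("<>")
    for i in range(index + 1, len(lines)):
        if lines[i] == name + ":":
            return i
    raise LookupError(name)


# The programmer's own code inside the IF body, as in the post.
user_body = ["cmp ebx 1", "jne >I0", "...", "I0:", "..."]

# 1. The macro uses the fixed name I0: its jump lands on the user's label.
fixed = expand_if(user_body, "I0")
fixed_target = forward_target(fixed, 0)

# 2. The macro gets a fresh name per expansion: its jump reaches the ENDIF.
_counter = itertools.count()


def fresh_label():
    """Mint a label name that no user code is expected to contain."""
    return f"if_end_{next(_counter)}"


unique = expand_if(user_body, fresh_label())
unique_target = forward_target(unique, 0)

print(fixed_target, len(fixed) - 1)     # jump target vs. position of ENDIF
print(unique_target, len(unique) - 1)
```

In the first expansion the IF jump stops at the user's `I0:` in the middle of the body; in the second it reaches the last line, which is where the ENDIF label sits.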

>
>
> If one macro uses a variable &&0, and another macro also uses that same
> symbol, does Rosasm generate unique local symbols for each macro that
> uses the &&0 variable or does the whole thing get arsed when some macro
> nested somewhere in another macro overwrites an &&x variable?

&&x are global entities. There are no local macro symbols. And, in
particular, there is no way to correctly carry over information (such
as symbol names) from one macro to another. This is why Rene suggests
that people use the if, .if, ..if, ...if, etc. scheme. Because his
macro system cannot figure out if you're missing an endif or have an
extra endif, etc., when using the same if/else/endif scheme everyone
else does.
Cheers,
Randy Hyde
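Detecting a missing or extra endif, as discussed in the post above, needs state that survives from one macro invocation to the next. One common way to keep it is a stack of open blocks. The sketch below is a generic illustration of that idea, not a description of how MASM, HLA, or any other assembler implements it; `check_nesting` and the token names are assumptions taken from sevagK's example.

```python
def check_nesting(tokens):
    """Return a list of error strings for unbalanced block macros."""
    closers = {"endif": "if", "endwhile": "while", "endfor": "forever"}
    openers = set(closers.values())
    stack, errors = [], []
    for position, token in enumerate(tokens):
        if token in openers:
            stack.append((token, position))
        elif token in closers:
            if stack and stack[-1][0] == closers[token]:
                stack.pop()
            else:
                errors.append(f"unexpected {token} at {position}")
    # Anything still on the stack was never closed.
    errors.extend(f"unclosed {opener} at {position}" for opener, position in stack)
    return errors


nested = ["if", "if", "while", "forever", "if",
          "endif", "endfor", "endwhile", "endif", "endif"]
print(check_nesting(nested))
print(check_nesting(["if", "while", "endwhile"]))
print(check_nesting(["if", "endif", "endif"]))
```

Because the open blocks live on a stack, the nesting depth is limited only by memory, and every block type can share the same if/else/endif style names.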

From: randyhyde@earthlink.net on 16 Mar 2006 12:10

o//annabee wrote:
>
> > Rene, you should learn the difference between "arbitrary" and "up to 10
> > levels".
> > As I said in my original post, 10 levels for an IF statement is
> > probably sufficient for most people, but it is *not* arbitrary.
>
> arbitrary, in this context is insane.

Perhaps to you.
But arbitrary in the sense of being able to write a recursive macro
that handles an arbitrary depth (subject to reasonable memory
allocation, of course) is not insane.

The fact that *you* haven't reached the level where you can imagine
what a recursive macro might be used for doesn't mean that the whole
world is stuck at your level. Gee, you seem to have trouble figuring
out what conditional assembly is used for (or so I assume by reading
your posts elsewhere). I suspect you wouldn't even know what a
recursive macro is, much less what you would use it for.

>
> > And
> > although 10 levels may be sufficient for an IF, it most certainly is
> > not sufficient for other applications.
>
> Give an example.

Recursive macros. Such as the pattern matching macros in the HLA
Standard Library.

>
> > The fact that assemblers like
> > MASM, TASM, and HLA can handle an arbitrary depth (well, subject to
> > internal memory constraints) and RosAsm cannot suggests that RosAsm is
> > less powerful than these other assemblers in this respect.
>
> Give an example of arbitrary levels please.

Pattern matching macros in the HLA standard library.
Cheers,
Randy Hyde

From: Betov on 16 Mar 2006 12:15

o//annabee <fack(a)szmyggenpv.com> wrote in news:op.s6ik2cnqce7g4q(a)bonus:

> Which reminds me: where can we download the 6 non-trivial masterful
> applications you have written in assembly? I looked at Webster, but
> couldn't find anything but Christian resources, which I thought were
> odd, and some book reviews. I keep asking in case Your Holiness missed
> that post.

Courage: Kill him! Kill him! In the end, he will point
you to the pathetic "HLA Adventures" game, which we have suffered
for, now... how many _years_ exactly?...

:]]]]]

Betov.

< http://rosasm.org >
