From: Betov on 9 Feb 2006 05:15
"Dragontamer" <prtiglao(a)gmail.com> ?crivait
> BU_ASM->Introduction to the Assembly Rebirth by Rene
>=========The sub-line of such claims is always that Assembly is just
> producing inline routines in order to improve the HLLs productions
> performance quality and to overcome their limitations, but that
> writing Applications in Assembly would be a strange idea coming from
> guys unable to use ''the proper tool for the proper thing''.
> These ideas are basically unfounded and stupid from every point of view:
> *If Assembly interest is the speed of produced Code, with the actual
> Processors performances, the argument is very weak, and the Processors
> evolutions will go on and on, pushing to stupid and dirty programming
> *If Assembly interest is at the size of produced Code, the argument is
> completely ridiculous, given the actual Hard Disks capacities and
> given the modern OSes Memory Manager's performances.
> But all that aside, what is it? In your last post, you imply
> that speed is a major advantage for writing in assembly.
> While in BU_ASM, you claim it is a weak argument.
> And I know you say time and time again that you don't
> write in assembly for speed, but for readability.
> So which is it?
Speed is given for "almost free" to the Assembly Programmers
because they have a direct access to Strategy Optimization,
that is the only Optimization that matters.
As opposed to the ones playing the fool with HLLs and Code
Level Optimized Routines, what i claim is based on the
materials, i am giving away, that is, RosAsm, which is the
fastest of the actual Assemblers, whereas i never did any
kind of Code Level Optimization effort.
So, in order the significative points are:
* Readability, because if you can't read a Source you lose
control on the levels, faster than you wrote it.
* Strategy Optimization. With that one, the speed gain is not
a matter of saving, say, 5% of time. It is a matter of
suppressing the bottlenecks, by human logic. This is not
theoretically impossible to achieve in HLLs, but it is
de-facto almost never found in HLLs, [They use Languages
that _HIDE_. Period], for the very simple reason that 99%
of the HLLers have no idea about what they are doing.
* From that point, Asm speed is given for free, and is no more
a concern for the Assembly programmers, and when you see guys
posting messages about code speed-up, you can immediately
conclude that these are not Assembly Programmers.
< http://rosasm.org >
From: Betov on 9 Feb 2006 05:17
"randyhyde(a)earthlink.net" <randyhyde(a)earthlink.net> ?crivait
> if you want to waste another chunk of your life
< http://rosasm.org >
From: Betov on 9 Feb 2006 05:21
"?a\\/b" <al(a)f.g> ?crivait news:8nslu1lhlso3abr6vgrdmcs8nnc53cbq8g(a)4ax.com:
> i think like Betov: 25 instruction are enough
One of these days, i will implement an Instruction counter
in the RosAsm Disassembler...
< http://rosasm.org >
From: firstname.lastname@example.org on 9 Feb 2006 12:51
> >If you have a "minimal x86 cpu", why not just use a C compiler?
> because i like more assembly than C, because i think there are many
> routines and functions that have to be written in assembly (e.g.
> numerical routines, OS routines, games etc)
But in those very areas you mention, far *more* routines and functions
are written in HLLs, like C. There are, of course, several special
cases that require assembly or benefit from coding in assembly, but in
the real world most of the code you mention is written in C or some
other HLL. So the question remains, why not just use a C compiler and
drop into assembly for those *few* routines that need it? You have, of
course, answered the question. The answer is "because I like assembly
more than C." Period. Stop there. The rest of your statement doesn't
support the answer you give.
> >In fact, I write in assembly to use the specifics of x86, the MMX
> >the SSE instructions.
> i think like Betov: 25 instruction are enough;
Theoretically, we can show that *one* instruction is enough (indeed,
one-instruction CPUs have actually been built as research projects
IIRC). Having fewer instructions makes the instruction set easier to
learn. It does not make it easier to program and it certainly doesn't
help you gain an advantage over HLLs or other CPUs with more
instructions. It just makes it easier to memorize the instruction set.
Now granted, a *typical* application may only use 25-50 instructions.
But the exact subset of the instruction set applications employ
*varies* across different applications. Multimedia apps are likely to
use instructions that database apps don't use, and vice-versa. If you
only write applications in one area, you *might* be able to get away
with using only a subset of the instruction set. But if you write a
large number of different applications, it's unlikely you'll be writing
the most efficient code if you do this. After all, you can stick to the
original 8086 instruction set (which probably has far more instructions
than you want). Surely there must be some reason for all the new 286,
386, 486, and Pentium * instructions that have been added over the years.
> than can change cpu
> registers size but the instructions are the same. to add more
> instructons is not an advantage for an assembly programmer
Then why have they added all these new instructions over the years?
Yes, it is more work to *learn* all these instructions. That could be
construed as a disadvantage, but the whole point of coding in assembly
language rather than a HLL like C is because you can take advantage of
all the instructions (which most HLLs cannot do). You lose this
advantage if you limit yourself to a subset of the instruction set.
Worse, by limiting yourself to learning only 25 or so instructions, you
are at a decided disadvantage compared with HLL compilers, as they have
no such synthetic limitation. Though compilers can rarely take
advantage of all instructions in an instruction set, I can assure you
that they use more than 25 instructions.
> >As for a "minimal portable cpu", one based on x86 or other CPUs,
> >powerful enough to represent anything and be efficient,
> >but still translate efficiently to other CPUs could simply be
> >a bytecode or semi-compiled language of some sort. Java for example,
> >or other p-code thingys. (too many languages compile to p-code to
> >list here... Java is a simple enough one for me to know)
> i think
> if "java-cpu" deal with *easy instructions* that can easily
> implemented in any cpu, if cpu makers change their cpu for optimise
> the "java-cpu" instructions then "java-cpu" will win
Been done. Didn't happen.
P.S. Why don't you list the 25 instructions you feel are important?
Perhaps some of us can point out why it would be nice to have some
additional instructions beyond the ones you list.
From: email@example.com on 9 Feb 2006 13:14
> > --Dragontamer
> Speed is given for "almost free" to the Assembly Programmers
> because they have a direct access to Strategy Optimization,
> that is the only Optimization that matters.
Please explain why this "strategy optimization" is available only to
assembly programmers and not HLL programmers. You've never really
defined "strategy optimization", but as best I can tell, it means
"selecting the best algorithm for the job". An optimization strategy
that HLL can certainly employ.
> As opposed to the ones playing the fool with HLLs and Code
> Level Optimized Routines, what i claim is based on the
> materials, i am giving away, that is, RosAsm, which is the
> fastest of the actual Assemblers, whereas i never did any
> kind of Code Level Optimization effort.
As you've never really compared your assembler's speed against a wide
variety of other assemblers, you cannot make this claim. Also,
comparing the speed of RosAsm self-compiling itself against FASM
compiling Fresh is *not* at all a valid comparison. It's apples and
oranges. Only you seem to think that the fact that RosAsm compiles
itself in 3-4 seconds "proves" that it is the fastest of all actual
assemblers.
And note to James: would you consider this a "shameless plug" for RosAsm?
Actually, I would not. It's actually a *shameful* plug and Rene is
lying through his teeth on this claim. It has been proven over and over
again that RosAsm is *not* the fastest assembler, yet he still
continues to make this claim.
> So, in order the significative points are:
By the way, the word you are looking for is "significant". If you're
going to talk about readability in the next sentence, it's a good idea
to improve the readability of your posts so people take you a *little*
more seriously.
> * Readability, because if you can't read a Source you lose
> control on the levels, faster than you wrote it.
From your own code (the disassembler engine, which is recent code -- so
you can't claim that I'm posting older code):
mov bl B$esi | inc esi | DigitMask bl To al
.If al = 0 ; 0F 01 /0 SGDT m
mov D$edi 'sgdt', B$edi+4 ' ' | add edi 5 | jmp
.Else_If al = 1 ; 0F 01 /1 SIDT m
mov D$edi 'sidt', B$edi+4 ' ' | add edi 5 | jmp
.Else_If al = 2 ; LGDT m16&32
mov D$edi 'lgdt', B$edi+4 ' ' | add edi 5 | jmp
.Else_If al = 3 ; LIDT m16&32
mov D$edi 'lidt', B$edi+4 ' ' | add edi 5 | jmp
.Else_If al = 4 ; 0F 01 /4 SMSW r/m16 ; 0F 01 /4 SMSW
mov D$edi 'smsw', B$edi+4 ' ' | add edi 5 | jmp
.Else_If al = 6 ; LMSW r/m16
mov D$edi 'lmsw', B$edi+4 ' ' | add edi 5 | jmp
.Else_If al = 7 ; INVLPG m
mov D$edi 'invl', D$edi+4 'pg ' | add edi 7 | jmp
dec esi | ret
I will leave it up to others to determine whether they find this code
readable or not. Certainly, I feel it could be improved quite a bit in
the readability department.
> * Strategy Optimization. With that one, the speed gain is not
> a matter of saving, say, 5% of time. It is a matter of
> suppressing the bottlenecks, by human logic. This is not
> theoretically impossible to achieve in HLLs, but it is
> de-facto almost never found in HLLs, [They use Languages
> that _HIDE_. Period], for the very simple reason that 99%
> of the HLLers have no idea about what they are doing.
One of the first things you learn in the study of optimization is that
you begin with a decent algorithm. This is true regardless of language.
Algorithms are generally *independent* of the language. Nothing is
hidden to the designer of an algorithm. Whoever *implements* the
algorithm may choose to hide parts of it; and the compiler may
certainly hide parts of the machine code implementation; but that's all
part of generating *better* code, not worse.
Bottom line is that inefficient code occurs because people *fail* to
design decent algorithms to begin with. Not because the language makes
it impossible for them to implement good algorithms.
> * From that point, Asm speed is given for free,
No, it is not. You can employ bad algorithms in assembly just as easily
as you can in a HLL. Let's look back at your disassembler engine that
contains the following comment:
; This is the only one Table used in the Disassembler.
; Table of Pointers to each primary Opcode computation Routine:
In fact, there are a couple of prefix instructions (e.g., $0f) that
could also benefit from such a table. Whether the speed is actually
necessary or not is a good question, but if you were doing your
"strategy" optimization, I'd expect a lookup table for each of the
sub-instruction sets present (e.g., $0f, floating point, and so on).
Bottom line, you've given up some speed here. Maybe it's not necessary
to have that speed. The disassembler may be fast enough. Then again,
I'd argue that you probably don't need the lookup table for the main
opcode, either. A binary search might produce code that runs only
imperceptibly slower (i.e., a maximum of eight in-line comparisons
rather than a lookup through a 1K table, whose elements may or may not
be in cache).
> and is no more
> a concern for the Assembly programmers, and when you see guys
> posting messages about code speed-up, you can immediately
> conclude that these are not Assembly Programmers.
And what are we to believe about someone who claims that speed isn't an
issue just because you're using assembly language? Personally, I'd
believe that this person has never used modern optimizing compilers,
and doesn't really know how well they perform against assembly code
that was written without any thought to optimization.
In reality, the *size* of a program turns out to be *far* more
important than the micro-optimizations people do at the instruction
level. You lose *far* more cycles to cache misses than you do when you
fail to keep the pipes busy. The key to performance on modern machines
is keeping your code and data in the cache at all times (or most of the
time). This is why you get away with writing sloppy code and not
bothering to optimize it -- your applications are so small that they
fit completely in cache, so you get an order of magnitude advantage
over apps that thrash the cache. But if you start writing substantial
applications that make use of a *large* amount of data that
completely blows away the cache, then you'll see what happens to the
performance of your applications.