From: James Daughtry on
Sorry folks, but I just can't get over the whole assembly formatting
problem. ;-) On the assumption that the biggest problem with
readability is the traditional one column style, I wrote up a few
guidelines for fun.

@Randy:

Your style guidelines advocate the traditional style on the grounds
that assembly is unstructured and a block structured formatting style
is inappropriate. You didn't convince me since I can easily see that
simulating block structuring with a disciplined approach works just
fine and the result is more readable according to Code Complete.

Are you of the same opinion now? If so, why? I simply don't get why the
traditional style is "readable". Once code starts to get long, even if
written well according to your PDF, it's still an unruly mess. Too few
people will be encouraged to learn that, and those that do will believe
that only tiny bits of code can be effectively written in assembly.

@Everyone else:

Convince me that I'm wrong. :-) I want nothing more than for people to
poke holes in the following guidelines so that I can improve my own
programming.

=============================================================

Note: RosAsm syntax is used for code examples. Local labels are
designated by an @ prefix. Unadorned names signify an address and names
adorned with D§ signify the contents of the address in the form of a
dword.

=============================================================

The biggest problem with assembly is not the language, it's the
1960's throwback style everyone seems so hell bent on using. Let's
look at an example of a common assembly style:

insertion_sort:
mov esi,D§esp+4
mov ecx,1
@pass: ; Insert while there are more
cmp ecx,D§esp+8
jae @end_pass
push ecx ; Reuse ecx for the inner loop
mov eax,D§esi+ecx*4
@shift: ; Make a hole for the new value
jecxz @end_shift
cmp D§esi+ecx*4-4,eax
jle @end_shift
push D§esi+ecx*4-4
pop D§esi+ecx*4
dec ecx
jmp @shift
@end_shift:
mov D§esi+ecx*4,eax
pop ecx ; Restore ecx after the inner loop
inc ecx
jmp @pass
@end_pass:
ret 8

One can argue-and they have-that this is readable if you're
familiar with assembly language and anyone who finds it difficult to
read obviously needs to learn more about assembly. To some extent this
is true. If you know assembly, you can figure out what the code does
without too much trouble. However, that's true with any consistent
style, and the point is that readability specifies the ease with which
you can read code. The above code breaks the single most important
rule of readability:

1) The use of whitespace must clearly show the logical structure of
the code.

Code is formatted using vertical (blank lines) and horizontal
whitespace (indention) to improve readability. Blank lines are used to
break up logical groups of code where all of the statements are treated
as equals, and if this is done half heartedly, the effect is reduced
readability. Indention is used to break up logical groups of code
where all of the statements that are indented are logically subordinate
to one or more statements at a lesser indention level.

A consistent and effective combination of the two kinds of whitespace
will improve readability by several orders of magnitude. On the other
hand, if, as in the example above, either is consistently used but
ineffective at highlighting logical structure, it's almost as bad as
using no formatting at all.

Ignoring the changes to comments and whitespace between tokens, simply
changing the indention and adding blank lines for logical grouping
greatly improves readability. The result is not unlike the style used
for higher level languages like C, C++, or Java:

insertion_sort:
mov esi,D§esp+4
mov ecx,1

; Insert while there are more
@pass:
cmp ecx,D§esp+8
jae @end_pass

; Reuse ecx for the inner loop counter
push ecx
mov eax,D§esi+ecx*4

; Make a hole for the new value
@shift:
jecxz @end_shift
cmp D§esi+ecx*4-4,eax
jle @end_shift

push D§esi+ecx*4-4
pop D§esi+ecx*4

dec ecx
jmp @shift
@end_shift:

mov D§esi+ecx*4,eax
; Restore ecx after the inner loop
pop ecx

inc ecx
jmp @pass
@end_pass:

ret 8
; end insertion_sort

A key point is that readable code is not always short code. The first
example was certainly shorter, but the second example has been proven
by extensive research over the years to be far superior in terms of
comprehension. For a small procedure the difference may not be as
obvious, but imagine a procedure three times as long, and imagine
hundreds of them. The problems begin to show up as the code gets
longer, and with the traditional assembly style, the point at which
code becomes difficult to work with is very small. That's a bad
thing if you intend to write anything more than one or two tiny
procedures in assembly.

The improved version takes more work, as is the case with any readable
style. Labels can no longer be simply stuck on column 0 with all of
the instructions indented by one level. It should be immediately
obvious that this is effectively the same thing as no indention at all,
the only difference being that labels are easier to pick out. Here are
some label guidelines to get you started:

1. A non-nested procedure label should be placed at column 0 with a
comment acting as the end marker. The body of the procedure should be
indented 2 to 4 spaces. This places non-nested procedures at the top
lexical level (nested procedures would be suitably indented):

my_function:
; Procedure body goes here
ret
; end my_function

If macros are used, the declaration phrase and end marker would replace
the procedure label and terminating comment, respectively, with the
body still indented 2 to 4 spaces:

proc my_function:
; Procedure body goes here
ret
endp

2. Control structure labels (loops and conditionals) should be
indented to the logical level of subordination:

print_range:
[@fmt: '%d ',0]

mov ecx,D§esp+8

@print_num:
cmp ecx,D§esp+4
jl @end_print_num

push ecx
push @fmt
call 'msvcrt.printf'
add esp,4
pop ecx

dec ecx
jmp @print_num
@end_print_num:

ret (2*4)
; end print_range

Notice how @print_num and @end_print_num are indented because they are
subordinate to print_range. Also notice how every thing between
@print_num and @end_print_num are logically subordinate to @print_num.
Therefore, the instructions are all indented at least one level deeper
than @print_num. The indention between push ecx and pop ecx will be
explained shortly.

3. Location labels, such as an error handling block, should be used
judiciously (much like goto in C), but when used they should begin at
column 0 to better show a break from the logic. Code after a location
label is not subordinate to the location label, and should ignore the
location label and be indented in relation to the rest of the code:

my_function:
test eax,eax
jz @error
...
ret
@error:
call panic
; end my_function

my_function:
xor ecx,ecx

@get_input:
call read
test eax,eax
jnz @good_input

; Recover from failed input
call recover
@good_input:
call process_input

test eax,eax
jnz @get_input
@end_get_input:

ret
; end my_function

Location labels are a break from the structure of the code, and they
should be used carefully as they are difficult to format in structured
code. Traditional assembly style encourages location labels in such a
way as to create the infamous "spaghetti code", and avoiding them
will aid readability by forcing programmers to use a more structured
style. For example, the @good_input location label could be avoided
with a structured IF:

my_function:
xor ecx,ecx

@get_input:
call read

@bad_input:
test eax,eax
jnz @end_bad_input

; Recover from failed input
call recover
@end_bad_input:

call process_input

test eax,eax
jnz @get_input
@end_get_input:

ret
; end my_function

The instructions are identical, only the formatting has changed with an
extra label added. But it makes a big difference to write the
structured code that readers expect, will recognize, and that has been
proven time and again to aid comprehension.

4. Data labels used as variable names should be indented according
to the label that they are subordinate to. Global variables should
begin at column 0, but labels local to a procedure should be indented
to show they they are subordinate to the procedure. As an example,
@fmt in the print_range procedure. Note that if a local label isn't
used to restrict visibility to the procedure label, the variable is not
subordinate to the procedure and should be treated as a global
variable:

[fmt: '%d ',0]

print_range:
mov ecx,D§esp+8

@print_num:
cmp ecx,D§esp+4
jl @end_print_num

push ecx
push fmt
call 'msvcrt.printf'
add esp,4
pop ecx

dec ecx
jmp @print_num
@end_print_num:

ret (2*4)
; end print_range

Indentation can be used for more than just procedures and control
structures. In the print_range procedure shown previously, all
statements between push ecx and pop ecx were treated as subordinate to
the stack changes and indented accordingly. This was not an arbitrary
choice because a corrupt stack is a common problem in assembly.
Indentation was used to more clearly show the stack changes and the
code that relied on them as well as facilitate debugging by making the
symmetry obvious.

If a begin marker is needed but an end marker is not available for a
control structure, a comment can be provided in place of an end marker:

jmp @below_ten
....
@below_ten:
cmp eax,10
jae @done
; end @below_ten

If a beginning marker is not needed, treating the code as a control
structure is encouraged, but optional. The @below_ten example given
previously could be rewritten as such if there were no jump to the test
from elsewhere in the code:

cmp eax,10
jae @done

If the option is available to treat the test and jump as a structured
IF statement, two extra lines are needed. First, a label to mark the
beginning of the conditional, and either a label or a comment as the
end marker. Both methods are below:

@below_ten
cmp eax,10
jae @done
@end_below_ten

@below_ten
cmp eax,10
jae @done
; end @below_ten

Labels are cheap, so they should be used wherever it's felt they will
improve clarity. If a comment serves the purpose just as well, the
choice depends on programmer preference. However, it should be noted
that a label can be used during maintenance and debugging while a
comment has no such capability.

Individual instructions should be formatted carefully to show grouping
with other related instructions. For example, if one instruction is 3
characters but the next instruction in the group is five, the number of
spaces between the short instruction and its arguments should be
increased to align the operands:

; Bad example, operands not aligned
dec ecx
jecxz @done

; Good example, operands aligned to show grouping
dec ecx
jecxz @done

This alignment should be used for a single group only. If the longest
instruction in the next group is 3, then the alignment will be on that
instruction rather than on jecxz, and a blank line should separate
logical groups:

; First group, aligned with jecxz
dec ecx
jecxz @done

; Second group, aligned with cmp
cmp eax,2
ja @again

Some assemblers allow multiple instructions to be placed on a single
line. Avoid the temptation to use that feature, because a good use of
multiple instructions per line is not a great deal better than single
instructions per line, and it's very easy to get wrong.

From: hutch-- on
James,

The old stuff tended to use a label followed by an instruction on the
same line. It used to be two tabs for the mnemonics and you tried to
keep the label name short enough to stay under the length of two tabs.
The modern stuff tends to look like this. I added the register
preservation for ESI that you had left out.

insertion_sort:

push esi ; needed
mov esi, [esp+4]
mov ecx, 1

pass: ; Insert while there are more
cmp ecx, [esp+8]
jae end_pass
push ecx ; Reuse ecx for the inner loop
mov eax, [esi+ecx*4]

shift: ; Make a hole for the new value
jecxz end_shift
cmp [esi+ecx*4-4], eax
jle end_shift
push DWORD PTR [esi+ecx*4-4]
pop DWORD PTR [esi+ecx*4]
dec ecx
jmp shift

end_shift:
mov [esi+ecx*4], eax
pop ecx ; Restore ecx after the inner loop
inc ecx
jmp pass

end_pass:
pop esi ; needed
ret 8

Regards,

hutch at movsd dot com

From: Betov on
"James Daughtry" <mordock32(a)hotmail.com> ?crivait
news:1145237357.608271.42010(a)i39g2000cwa.googlegroups.com:

> insertion_sort:
> mov esi,D?esp+4
> mov ecx,1
>
> ; Insert while there are more
> @pass:
> cmp ecx,D?esp+8
> jae @end_pass
>
> ; Reuse ecx for the inner loop counter
> push ecx
> mov eax,D?esi+ecx*4
>
> ; Make a hole for the new value
> @shift:
> jecxz @end_shift
> cmp D?esi+ecx*4-4,eax
> jle @end_shift
>
> push D?esi+ecx*4-4
> pop D?esi+ecx*4
>
> dec ecx
> jmp ?shift
> @end_shift:
>
> mov D?esi+ecx*4,eax
> ; Restore ecx after the inner loop
> pop ecx
>
> inc ecx
> jmp @pass
> @end_pass:
>
> ret 8
> ; end insertion_sort


1) Local Symbols of RosAsm should not be used instead of
True Local Labels.

2) In all cases, Labels should be given at first column,
for readability.

3) Comment can be better aligned.

4) "Ret", or "EndP" should be aligned with the Main Label.

5) There is no reason to align Instruction Members with Tabs.

6) When comments are for the next Instructions, they should
end with ':'

7) Instructions that are one and same action should be on the
same line, like "push D?esi+ecx*4-4 | pop D?esi+ecx*4",
which is usually given in Macro form:

Move D?esi+ecx*4-4, D?esi+ecx*4

8) The commas "," should _MEAN_ something. Example, with the
above Macro, it is the separation in between two implied
Instructions. Other than this, this is a Continuation
Line marker.

9) Main Symbols should have Upper Cases, and not '_', but in
the CASES_OF_EQUATES.

10) ; Reuse ecx for the inner loop counter:
push ecx
; ...
; Restore ecx after the inner loop:
pop ecx

Are the typical absurdly useless comments.

11) The Local Labels should be on the targeted Line, and should
not polluate the blank lines, as long as those ones _have_ a
meaning of "separator".

-------------------------------------------------------

InsertionSort:
mov esi D?esp+4, ecx 1

; Insert while there are more:
L0: cmp ecx D?esp+8 | jae L9>>

push ecx
mov eax D?esi+ecx*4

; Make a hole for the new value:
L1: jecxz L2>

cmp D?esi+ecx*4-4 eax | jle L2>

push D?esi+ecx*4-4 | pop D?esi+ecx*4
dec ecx | jmp L1<

L2: mov D?esi+ecx*4 eax
pop ecx

inc ecx | jmp L0<

L9: ret 8

-------------------------------------------------------

As the arguments of mines, about my above points, have
already been developped previously, it saves me from
developing them again.

;)

Betov.

< http://rosasm.org >





From: hutch-- on
smile,

The Gospel according to Betov strikes again.

> cmp D§esi+ecx*4-4 eax | jle L2>
> push D§esi+ecx*4-4 | pop D§esi+ecx*4
> dec ecx | jmp L1<
> L2: mov D§esi+ecx*4 eax
> pop ecx

You need do no more than look at this stuff to see what is wrong with
it. Crayon scribbling style notation that looks like a control string
for a regular expression search gone wrong.

Its exactly being such a jumble of junk that makes it unreadable,
instead of clearly defined seperate concepts that each mnemonic is,
they are sequentially jumbled instead of being presented with the
original clarity of low level code with each instruction being on its
own line. Then there is the use of 1980 style labels L1: L2: etc ....

cmp [esi+ecx*4-4], eax
jle L2
push DWORD PTR [esi+ecx*4-4]
pop DWORD PTR [esi+ecx*4]
dec ecx
jmp L1
L2:
mov [esi+ecx], eax
pop ecx

The mistake with the code style that Betov has presented is it is
confusing a high level style where you construct a high level CONCEPT
on the same line or with trailing split lines with assembler mnemonic
code where EACH INSTRUCTION is a documented concept of its own.

Jumbling assembler instructions on the same line is like writing API
function calls on the same line one after another with them being
seperated by a line breaking notation and in fact various compilers
can do exactly that. Its just that there is little point because code
readability suffers very greatly by doing so.

I have seen both old style C and old style Basic written in this manner
complete with the occasional left aligned label and its generally
unintelligible.

Regards,

hutch at movsd dot com

From: Donkey on
hutch-- wrote:
> Its exactly being such a jumble of junk that makes it unreadable,
> instead of clearly defined seperate concepts that each mnemonic is,
> they are sequentially jumbled instead of being presented with the
> original clarity of low level code with each instruction being on its
> own line. Then there is the use of 1980 style labels L1: L2: etc ....

I believe that in RosAsm like in GoAsm and several other assemblers X#:
type labels are used when you do not wish the label added to the symbol
table, like @@: in MASM however allowing you more latitude in jumping,
that is you do not need to limit yourself to one label in each
direction. This makes for clearer code when examining it in a symbolic
debugger. I could be wrong of course and if I am I'm sure Betov will
correct me.

Donkey