|
From: James Daughtry on 16 Apr 2006 21:29 Sorry folks, but I just can't get over the whole assembly formatting problem. ;-) On the assumption that the biggest problem with readability is the traditional one column style, I wrote up a few guidelines for fun. @Randy: Your style guidelines advocate the traditional style on the grounds that assembly is unstructured and a block structured formatting style is inappropriate. You didn't convince me since I can easily see that simulating block structuring with a disciplined approach works just fine and the result is more readable according to Code Complete. Are you of the same opinion now? If so, why? I simply don't get why the traditional style is "readable". Once code starts to get long, even if written well according to your PDF, it's still an unruly mess. Too few people will be encouraged to learn that, and those that do will believe that only tiny bits of code can be effectively written in assembly. @Everyone else: Convince me that I'm wrong. :-) I want nothing more than for people to poke holes in the following guidelines so that I can improve my own programming. ============================================================= Note: RosAsm syntax is used for code examples. Local labels are designated by an @ prefix. Unadorned names signify an address and names adorned with D§ signify the contents of the address in the form of a dword. ============================================================= The biggest problem with assembly is not the language, it's the 1960's throwback style everyone seems so hell bent on using. Let's look at an example of a common assembly style: insertion_sort: mov esi,D§esp+4 mov ecx,1 @pass: ; Insert while there are more cmp ecx,D§esp+8 jae @end_pass push ecx ; Reuse ecx for the inner loop mov eax,D§esi+ecx*4 @shift: ; Make a hole for the new value jecxz @end_shift cmp D§esi+ecx*4-4,eax jle @end_shift push D§esi+ecx*4-4 pop D§esi+ecx*4 dec ecx jmp @shift @end_shift: mov D§esi+ecx*4,eax pop ecx ; Restore ecx after the inner loop inc ecx jmp @pass @end_pass: ret 8 One can argue-and they have-that this is readable if you're familiar with assembly language and anyone who finds it difficult to read obviously needs to learn more about assembly. To some extent this is true. If you know assembly, you can figure out what the code does without too much trouble. However, that's true with any consistent style, and the point is that readability specifies the ease with which you can read code. The above code breaks the single most important rule of readability: 1) The use of whitespace must clearly show the logical structure of the code. Code is formatted using vertical (blank lines) and horizontal whitespace (indention) to improve readability. Blank lines are used to break up logical groups of code where all of the statements are treated as equals, and if this is done half heartedly, the effect is reduced readability. Indention is used to break up logical groups of code where all of the statements that are indented are logically subordinate to one or more statements at a lesser indention level. A consistent and effective combination of the two kinds of whitespace will improve readability by several orders of magnitude. On the other hand, if, as in the example above, either is consistently used but ineffective at highlighting logical structure, it's almost as bad as using no formatting at all. Ignoring the changes to comments and whitespace between tokens, simply changing the indention and adding blank lines for logical grouping greatly improves readability. The result is not unlike the style used for higher level languages like C, C++, or Java: insertion_sort: mov esi,D§esp+4 mov ecx,1 ; Insert while there are more @pass: cmp ecx,D§esp+8 jae @end_pass ; Reuse ecx for the inner loop counter push ecx mov eax,D§esi+ecx*4 ; Make a hole for the new value @shift: jecxz @end_shift cmp D§esi+ecx*4-4,eax jle @end_shift push D§esi+ecx*4-4 pop D§esi+ecx*4 dec ecx jmp @shift @end_shift: mov D§esi+ecx*4,eax ; Restore ecx after the inner loop pop ecx inc ecx jmp @pass @end_pass: ret 8 ; end insertion_sort A key point is that readable code is not always short code. The first example was certainly shorter, but the second example has been proven by extensive research over the years to be far superior in terms of comprehension. For a small procedure the difference may not be as obvious, but imagine a procedure three times as long, and imagine hundreds of them. The problems begin to show up as the code gets longer, and with the traditional assembly style, the point at which code becomes difficult to work with is very small. That's a bad thing if you intend to write anything more than one or two tiny procedures in assembly. The improved version takes more work, as is the case with any readable style. Labels can no longer be simply stuck on column 0 with all of the instructions indented by one level. It should be immediately obvious that this is effectively the same thing as no indention at all, the only difference being that labels are easier to pick out. Here are some label guidelines to get you started: 1. A non-nested procedure label should be placed at column 0 with a comment acting as the end marker. The body of the procedure should be indented 2 to 4 spaces. This places non-nested procedures at the top lexical level (nested procedures would be suitably indented): my_function: ; Procedure body goes here ret ; end my_function If macros are used, the declaration phrase and end marker would replace the procedure label and terminating comment, respectively, with the body still indented 2 to 4 spaces: proc my_function: ; Procedure body goes here ret endp 2. Control structure labels (loops and conditionals) should be indented to the logical level of subordination: print_range: [@fmt: '%d ',0] mov ecx,D§esp+8 @print_num: cmp ecx,D§esp+4 jl @end_print_num push ecx push @fmt call 'msvcrt.printf' add esp,4 pop ecx dec ecx jmp @print_num @end_print_num: ret (2*4) ; end print_range Notice how @print_num and @end_print_num are indented because they are subordinate to print_range. Also notice how every thing between @print_num and @end_print_num are logically subordinate to @print_num. Therefore, the instructions are all indented at least one level deeper than @print_num. The indention between push ecx and pop ecx will be explained shortly. 3. Location labels, such as an error handling block, should be used judiciously (much like goto in C), but when used they should begin at column 0 to better show a break from the logic. Code after a location label is not subordinate to the location label, and should ignore the location label and be indented in relation to the rest of the code: my_function: test eax,eax jz @error ... ret @error: call panic ; end my_function my_function: xor ecx,ecx @get_input: call read test eax,eax jnz @good_input ; Recover from failed input call recover @good_input: call process_input test eax,eax jnz @get_input @end_get_input: ret ; end my_function Location labels are a break from the structure of the code, and they should be used carefully as they are difficult to format in structured code. Traditional assembly style encourages location labels in such a way as to create the infamous "spaghetti code", and avoiding them will aid readability by forcing programmers to use a more structured style. For example, the @good_input location label could be avoided with a structured IF: my_function: xor ecx,ecx @get_input: call read @bad_input: test eax,eax jnz @end_bad_input ; Recover from failed input call recover @end_bad_input: call process_input test eax,eax jnz @get_input @end_get_input: ret ; end my_function The instructions are identical, only the formatting has changed with an extra label added. But it makes a big difference to write the structured code that readers expect, will recognize, and that has been proven time and again to aid comprehension. 4. Data labels used as variable names should be indented according to the label that they are subordinate to. Global variables should begin at column 0, but labels local to a procedure should be indented to show they they are subordinate to the procedure. As an example, @fmt in the print_range procedure. Note that if a local label isn't used to restrict visibility to the procedure label, the variable is not subordinate to the procedure and should be treated as a global variable: [fmt: '%d ',0] print_range: mov ecx,D§esp+8 @print_num: cmp ecx,D§esp+4 jl @end_print_num push ecx push fmt call 'msvcrt.printf' add esp,4 pop ecx dec ecx jmp @print_num @end_print_num: ret (2*4) ; end print_range Indentation can be used for more than just procedures and control structures. In the print_range procedure shown previously, all statements between push ecx and pop ecx were treated as subordinate to the stack changes and indented accordingly. This was not an arbitrary choice because a corrupt stack is a common problem in assembly. Indentation was used to more clearly show the stack changes and the code that relied on them as well as facilitate debugging by making the symmetry obvious. If a begin marker is needed but an end marker is not available for a control structure, a comment can be provided in place of an end marker: jmp @below_ten .... @below_ten: cmp eax,10 jae @done ; end @below_ten If a beginning marker is not needed, treating the code as a control structure is encouraged, but optional. The @below_ten example given previously could be rewritten as such if there were no jump to the test from elsewhere in the code: cmp eax,10 jae @done If the option is available to treat the test and jump as a structured IF statement, two extra lines are needed. First, a label to mark the beginning of the conditional, and either a label or a comment as the end marker. Both methods are below: @below_ten cmp eax,10 jae @done @end_below_ten @below_ten cmp eax,10 jae @done ; end @below_ten Labels are cheap, so they should be used wherever it's felt they will improve clarity. If a comment serves the purpose just as well, the choice depends on programmer preference. However, it should be noted that a label can be used during maintenance and debugging while a comment has no such capability. Individual instructions should be formatted carefully to show grouping with other related instructions. For example, if one instruction is 3 characters but the next instruction in the group is five, the number of spaces between the short instruction and its arguments should be increased to align the operands: ; Bad example, operands not aligned dec ecx jecxz @done ; Good example, operands aligned to show grouping dec ecx jecxz @done This alignment should be used for a single group only. If the longest instruction in the next group is 3, then the alignment will be on that instruction rather than on jecxz, and a blank line should separate logical groups: ; First group, aligned with jecxz dec ecx jecxz @done ; Second group, aligned with cmp cmp eax,2 ja @again Some assemblers allow multiple instructions to be placed on a single line. Avoid the temptation to use that feature, because a good use of multiple instructions per line is not a great deal better than single instructions per line, and it's very easy to get wrong.
From: hutch-- on 17 Apr 2006 01:29 James, The old stuff tended to use a label followed by an instruction on the same line. It used to be two tabs for the mnemonics and you tried to keep the label name short enough to stay under the length of two tabs. The modern stuff tends to look like this. I added the register preservation for ESI that you had left out. insertion_sort: push esi ; needed mov esi, [esp+4] mov ecx, 1 pass: ; Insert while there are more cmp ecx, [esp+8] jae end_pass push ecx ; Reuse ecx for the inner loop mov eax, [esi+ecx*4] shift: ; Make a hole for the new value jecxz end_shift cmp [esi+ecx*4-4], eax jle end_shift push DWORD PTR [esi+ecx*4-4] pop DWORD PTR [esi+ecx*4] dec ecx jmp shift end_shift: mov [esi+ecx*4], eax pop ecx ; Restore ecx after the inner loop inc ecx jmp pass end_pass: pop esi ; needed ret 8 Regards, hutch at movsd dot com
From: Betov on 17 Apr 2006 05:43 "James Daughtry" <mordock32(a)hotmail.com> ?crivait news:1145237357.608271.42010(a)i39g2000cwa.googlegroups.com: > insertion_sort: > mov esi,D?esp+4 > mov ecx,1 > > ; Insert while there are more > @pass: > cmp ecx,D?esp+8 > jae @end_pass > > ; Reuse ecx for the inner loop counter > push ecx > mov eax,D?esi+ecx*4 > > ; Make a hole for the new value > @shift: > jecxz @end_shift > cmp D?esi+ecx*4-4,eax > jle @end_shift > > push D?esi+ecx*4-4 > pop D?esi+ecx*4 > > dec ecx > jmp ?shift > @end_shift: > > mov D?esi+ecx*4,eax > ; Restore ecx after the inner loop > pop ecx > > inc ecx > jmp @pass > @end_pass: > > ret 8 > ; end insertion_sort 1) Local Symbols of RosAsm should not be used instead of True Local Labels. 2) In all cases, Labels should be given at first column, for readability. 3) Comment can be better aligned. 4) "Ret", or "EndP" should be aligned with the Main Label. 5) There is no reason to align Instruction Members with Tabs. 6) When comments are for the next Instructions, they should end with ':' 7) Instructions that are one and same action should be on the same line, like "push D?esi+ecx*4-4 | pop D?esi+ecx*4", which is usually given in Macro form: Move D?esi+ecx*4-4, D?esi+ecx*4 8) The commas "," should _MEAN_ something. Example, with the above Macro, it is the separation in between two implied Instructions. Other than this, this is a Continuation Line marker. 9) Main Symbols should have Upper Cases, and not '_', but in the CASES_OF_EQUATES. 10) ; Reuse ecx for the inner loop counter: push ecx ; ... ; Restore ecx after the inner loop: pop ecx Are the typical absurdly useless comments. 11) The Local Labels should be on the targeted Line, and should not polluate the blank lines, as long as those ones _have_ a meaning of "separator". ------------------------------------------------------- InsertionSort: mov esi D?esp+4, ecx 1 ; Insert while there are more: L0: cmp ecx D?esp+8 | jae L9>> push ecx mov eax D?esi+ecx*4 ; Make a hole for the new value: L1: jecxz L2> cmp D?esi+ecx*4-4 eax | jle L2> push D?esi+ecx*4-4 | pop D?esi+ecx*4 dec ecx | jmp L1< L2: mov D?esi+ecx*4 eax pop ecx inc ecx | jmp L0< L9: ret 8 ------------------------------------------------------- As the arguments of mines, about my above points, have already been developped previously, it saves me from developing them again. ;) Betov. < http://rosasm.org >
From: hutch-- on 17 Apr 2006 07:44 smile, The Gospel according to Betov strikes again. > cmp D§esi+ecx*4-4 eax | jle L2> > push D§esi+ecx*4-4 | pop D§esi+ecx*4 > dec ecx | jmp L1< > L2: mov D§esi+ecx*4 eax > pop ecx You need do no more than look at this stuff to see what is wrong with it. Crayon scribbling style notation that looks like a control string for a regular expression search gone wrong. Its exactly being such a jumble of junk that makes it unreadable, instead of clearly defined seperate concepts that each mnemonic is, they are sequentially jumbled instead of being presented with the original clarity of low level code with each instruction being on its own line. Then there is the use of 1980 style labels L1: L2: etc .... cmp [esi+ecx*4-4], eax jle L2 push DWORD PTR [esi+ecx*4-4] pop DWORD PTR [esi+ecx*4] dec ecx jmp L1 L2: mov [esi+ecx], eax pop ecx The mistake with the code style that Betov has presented is it is confusing a high level style where you construct a high level CONCEPT on the same line or with trailing split lines with assembler mnemonic code where EACH INSTRUCTION is a documented concept of its own. Jumbling assembler instructions on the same line is like writing API function calls on the same line one after another with them being seperated by a line breaking notation and in fact various compilers can do exactly that. Its just that there is little point because code readability suffers very greatly by doing so. I have seen both old style C and old style Basic written in this manner complete with the occasional left aligned label and its generally unintelligible. Regards, hutch at movsd dot com
From: Donkey on 17 Apr 2006 08:09
hutch-- wrote: > Its exactly being such a jumble of junk that makes it unreadable, > instead of clearly defined seperate concepts that each mnemonic is, > they are sequentially jumbled instead of being presented with the > original clarity of low level code with each instruction being on its > own line. Then there is the use of 1980 style labels L1: L2: etc .... I believe that in RosAsm like in GoAsm and several other assemblers X#: type labels are used when you do not wish the label added to the symbol table, like @@: in MASM however allowing you more latitude in jumping, that is you do not need to limit yourself to one label in each direction. This makes for clearer code when examining it in a symbolic debugger. I could be wrong of course and if I am I'm sure Betov will correct me. Donkey |