|
From: rhyde on 1 Aug 2006 16:05

Hi all,

For the past couple of years, Rene "Betov" Tournois has been trashing IDA Pro, claiming that you cannot easily create source files that can be reassembled with it and that it has serious defects in its code generation. Recently, he suggested that someone show us all how "easy" IDA Pro is to use, if it's as wonderful as everyone claims. Normally, I write such a challenge off as a waste of time because you know quite well that the moment you demonstrate how easy IDA Pro is to use, he would just come back with the argument that the file was rigged in IDA's favor.

Then it occurred to me: why not use the "99 Bottles" application that the RosAsm disassembler fails miserably on? Let's see how hard it is to reconstruct a source file that can be reassembled with MASM using IDA Pro and the 99.exe object code file.

To begin with, here is the original HLA source file that produced the 99.exe executable:

    program bottles;
    #include( "stdlib.hhf" )

    // HLA version of the "99 Bottles of Beer" song
    // Cross-platform: Linux console or Windows console
    // Get HLA here: http://webster.cs.ucr.edu/AsmTools/HLA/index.html
    //
    //
    // Macro compile-time function that does a limited "unsigned to English"
    // conversion:

    const
        Tens :string[10] :=
        [
            "",         // zero
            "",         // one, handled as a special case
            "Twenty",
            "Thirty",
            "Forty",
            "Fifty",
            "Sixty",
            "Seventy",
            "Eighty",
            "Ninety"
        ];

        Ones :string[20] :=
        [
            "",         // zero is not used
            "One",
            "Two",
            "Three",
            "Four",
            "Five",
            "Six",
            "Seven",
            "Eight",
            "Nine",
            "Ten",
            "Eleven",
            "Twelve",
            "Thirteen",
            "Fourteen",
            "Fifteen",
            "Sixteen",
            "Seventeen",
            "Eighteen",
            "Nineteen"
        ];

    #macro uToEnglish( unsVal, NumberCase );
        #if( unsVal >= 20 )
            #if( unsVal mod 10 <> 0 )
                #if( NumberCase )
                    Tens[unsVal div 10] + "-" + @lowercase( Ones[unsVal mod 10], 0)
                #else
                    @lowercase( Tens[unsVal div 10] + "-" + Ones[unsVal mod 10], 0 )
                #endif
            #else
                #if( NumberCase )
                    Tens[ unsVal div 10]
                #else
                    @lowercase( Tens[ unsVal div 10], 0 )
                #endif
            #endif
        #else
            #if( NumberCase )
                Ones[ unsVal ]
            #else
                @lowercase( Ones[ unsVal ], 0)
            #endif
        #endif
    #endmacro

    static
        sOrNot :string[100] := [ "s", "", 98 dup ["s"] ];

        Numbers :string[100] :=
        [
            "No more",
            #for( i := 1 to 98)
                uToEnglish( i, true ),
            #endfor
            uToEnglish( 99, true )
        ];

        lcNumbers :string[100] :=
        [
            "no more",
            #for( i := 1 to 98)
                uToEnglish( i, false ),
            #endfor
            uToEnglish( 99, false )
        ];

    begin bottles;

        mov( 99, ecx );
        repeat

            stdout.put
            (
                Numbers[ecx*4], " bottle", sOrNot[ecx*4],
                " of beer on the wall, ",
                lcNumbers[ecx*4], " bottle", sOrNot[ecx*4],
                " of beer." nl
                "Take one down and pass it around, ",
                lcNumbers[ecx*4-4], " bottle", sOrNot[ecx*4-4],
                " of beer on the wall" nl
                nl
            );
            dec( ecx );

        until( @z );
        stdout.puts( "No more bottles of beer on the wall. No more bottles of beer..." nl );
        stdout.puts( "Go to the store and buy some more... 99 bottles of beer for the wall." nl);

    end bottles;

There is nothing special here, just a lot of data and some very straightforward code. Essentially, it's just a sequence of PRINT statements in a loop. It shouldn't be too difficult to disassemble, despite the fact that RosAsm's disassembler seemed to have all kinds of problems with it.

For this experiment, I chose to use IDA Pro v4.3 because it's free and people around here can download it to test any claims I make about the product. It is important to note that IDA Pro (the commercial version) is now up to version 5.0 and some of the issues I will point out have probably been fixed.

I loaded "99.exe" into IDA Pro. It crunched away for a few seconds and produced a disassembled listing, along with a nice flowchart (not very interesting in this trivial application), and it built a database that made it very easy to browse the disassembled listing.

I told IDA to produce an ASM file and I loaded the file up into MASM. Now, the assembly file by itself will *not* compile under MASM because MASM's syntax is slightly different from what IDA produces. But this is mostly fixed by adding the following lines to the beginning of the file:

    if @Version lt 612
    .586p
    else
    .686p
    .mmx
    .xmm
    endif
    .model flat, syscall
    option noscoped
    large textequ <>
    small textequ <>
    offset32 equ <offset flat:>

In particular, IDA Pro emits instructions like

    mov ebx, large fs:0

or

    pushd small 0

to tell you the displacement/operand size information that Intel syntax doesn't specify. The text equates above make these operand items disappear so that MASM can handle them properly.

I also used IDA Pro's renaming facilities to change four Win32 API functions to their true external names, e.g., I changed GetStdHandle to __imp__GetStdHandle@4. IDA Pro's interactive editor made this a breeze.

Beyond these cha
From: rhyde on 1 Aug 2006 18:05

I ran a quick test with the RosAsm debugger and attempted to compare the object file it produced against those produced by HLA with the original source file and MASM operating on the IDA Pro-disassembled file. While the HLA and MASM outputs were very close to one another (the major difference being the choice of "code NOPs" used to align procedures on 16-byte boundaries), RosAsm was completely different. In particular, the RosAsm disassembler seems to have decided to lift a bunch of data that was in the code segment and move it elsewhere. This is not a desirable behavior from a disassembler (or the assembler, whichever is to blame). It's easy to see why the RosAsm disassembler will break on some straightforward sections of code if this is indeed what it's doing with the disassembled file.

Cheers,
Randy Hyde
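A comparison of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure used in the post: `diff_runs` is a hypothetical helper, and the two byte strings are invented to show two rebuilds that differ only in the filler placed after a `ret`, not bytes taken from 99.exe.

```python
def diff_runs(a, b):
    """Return (offset, length) runs where two byte strings differ.

    Only the common prefix is compared; a size mismatch has to be
    checked separately by the caller.
    """
    runs = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            if start is None:
                start = i            # a new differing run begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:            # run extends to the end of the overlap
        runs.append((start, min(len(a), len(b)) - start))
    return runs


# Invented example: same code, different alignment filler.
# 0xC3 = ret, 0x90 = nop, 8D 49 00 = lea ecx,[ecx+0] (a 3-byte no-op form),
# 55 8B EC = push ebp / mov ebp,esp.
BUILD_A = bytes([0xC3, 0x90, 0x90, 0x90, 0x55, 0x8B, 0xEC])
BUILD_B = bytes([0xC3, 0x8D, 0x49, 0x00, 0x55, 0x8B, 0xEC])

for offset, length in diff_runs(BUILD_A, BUILD_B):
    print(f"differs at offset {offset:#x} for {length} byte(s)")
```

If every reported run falls inside alignment padding, the two images are functionally the same code; runs that cover real instructions or relocated data are the ones worth investigating.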
From: Titus on 1 Aug 2006 19:00

Randy,

Hmmmm... looks good. I'll try it with IDA Pro 5.0 and see what I get.

By the way, have you thought of starting a disassembly newsgroup? A discussion of techniques, tools, etc. would be fantastic!

If anyone is interested, here are some recommended forums and sites:

http://home.online.no/~reopsahl/files/assem.htm by Greythorne the Technomancer
http://remb.cjb.net/ by Mammon
http://www.openrce.org/
http://www.reverse-engineering.net/
http://www.uninformed.org/
and of course, http://www.woodmann.com/fravia/blackbo.htm by fravia

-Titus

On 1 Aug 2006 15:05:43 -0700, rhyde(a)cs.ucr.edu wrote:

>I ran a quick test with the RosAsm debugger and attempted to compare
>the object file it produced against those produced by HLA with the
>original source file and MASM operating on the IDA Pro-disassembled
>file. While the HLA and MASM outputs were very close to one another (the
>major difference being the choice of "code NOPs" used to align
>procedures on 16-byte boundaries), RosAsm was completely different. In
>particular, the RosAsm disassembler seems to have decided to lift a
>bunch of data that was in the code segment and move it elsewhere. This
>is not a desirable behavior from a disassembler (or the assembler,
>whichever is to blame). It's easy to see why the RosAsm disassembler
>will break on some straightforward sections of code if this is indeed
>what it's doing with the disassembled file.
>Cheers,
>Randy Hyde
From: rhyde on 1 Aug 2006 19:10

rhyde(a)cs.ucr.edu wrote:
> I ran a quick test with the RosAsm debugger and attempted to compare
> the object file it produced against those produced by HLA with the
> original source file and MASM operating on the IDA Pro-disassembled
> file.

The RosAsm disassembler also seems to inject some data of its own into the object file. Here, for example, is the beginning of the data segment emitted by HLA and MASM:

    RAW DATA #1
    04001000: 01 00 00 00 01 00 00 00 73 00 00 00 00 00 00 00  ........s.......
    04001010: 00 00 00 00 00 00 00 00 07 00 00 00 07 00 00 00  ................
    04001020: 4E 6F 20 6D 6F 72 65 00 03 00 00 00 03 00 00 00  No more.........
    04001030: 4F 6E 65 00 03 00 00 00 03 00 00 00 54 77 6F 00  One.........Two.
    04001040: 05 00 00 00 05 00 00 00 54 68 72 65 65 00 00 00  ........Three...
    04001050: 04 00 00 00 04 00 00 00 46 6F 75 72 00 00 00 00  ........Four....

(Produced with the dumpbin utility)

Here's the same section put out by the RosAsm disassembler/assembler:

    RAW DATA #2
    00402000: 41 70 70 6C 69 63 61 74 69 6F 6E 20 42 61 73 65  Application Base
    00402010: 00 00 00 00 20 20 20 20 20 20 20 20 68 00 48 69  ....        h.Hi
    00402020: 68 6F 48 69 68 6F 00 00 01 00 00 00 01 00 00 00  hoHiho..........
    00402030: 73 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  s...............
    00402040: 07 00 00 00 07 00 00 00 4E 6F 20 6D 6F 72 65 00  ........No more.
    00402050: 03 00 00 00 03 00 00 00 4F 6E 65 00 03 00 00 00  ........One.....
    00402060: 03 00 00 00 54 77 6F 00 05 00 00 00 05 00 00 00  ....Two.........
    00402070: 54 68 72 65 65 00 00 00 04 00 00 00 04 00 00 00  Three...........
    00402080: 46 6F 75 72 00 00 00 00 04 00 00 00 04 00 00 00  Four............
    00402090: 46 69 76 65 00 00 00 00 03 00 00 00 03 00 00 00  Five............

Perhaps I'm expecting too much from a disassembler, but I don't think it's too cool for the disassembler (or assembler processing that output) to sneak in extra data like this.
In particular, you will note that the alignment of the strings has been changed by the insertion of this data. That can have a *serious* impact on the correct operation of the program if it assumes that the data is aligned on a particular boundary. In addition to all this, the RosAsm disassembler/assembler rearranges the sections in the EXE file (and, therefore their addresses in memory). Maybe Rene hasn't figured out that some code actually *depends* on the order of sections in memory. Even if the rest of the disassembly were perfect, these are serious problems that must be addressed if someone were to even think about using RosAsm as a disassembler for real-world projects. Of course, we've not even considered the problem of multi-section programs (segments in MASM terminology). AFAIK, RosAsm provides no support for those at all, so it stands to reason that the disassembler isn't going to handle those properly, either. Cheers, Randy Hyde
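The alignment shift described in this post can be checked directly from the two dumps. The Python sketch below is only an illustration: it assumes each string record is a 32-bit maximum length, a 32-bit current length, then the characters plus a terminating zero padded to a 4-byte boundary (a layout inferred from the dump itself), and the names `walk_strings`, `HLA_BASE` and `ROSASM_BASE` are hypothetical. The bytes are the first 0x60 bytes of RAW DATA #1; the second base address is where the same record begins in RAW DATA #2.

```python
import struct

HLA_BASE = 0x04001000     # first string record in RAW DATA #1
ROSASM_BASE = 0x00402028  # the same record in RAW DATA #2 (after 0x28 injected bytes)

# First 0x60 bytes of RAW DATA #1, copied from the dumpbin output.
DATA = bytes.fromhex(
    "01 00 00 00 01 00 00 00 73 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 07 00 00 00 07 00 00 00 "
    "4E 6F 20 6D 6F 72 65 00 03 00 00 00 03 00 00 00 "
    "4F 6E 65 00 03 00 00 00 03 00 00 00 54 77 6F 00 "
    "05 00 00 00 05 00 00 00 54 68 72 65 65 00 00 00 "
    "04 00 00 00 04 00 00 00 46 6F 75 72 00 00 00 00"
)


def walk_strings(data):
    """Yield (offset_of_characters, text) for each string record."""
    off = 0
    while off + 8 <= len(data):
        max_len, length = struct.unpack_from("<II", data, off)
        chars = off + 8
        yield chars, data[chars:chars + length].decode("ascii")
        # Assumed padding rule: characters + NUL, rounded up to a dword.
        off = chars + ((max_len + 1 + 3) & ~3)


records = list(walk_strings(DATA))
for chars, text in records:
    a, b = HLA_BASE + chars, ROSASM_BASE + chars
    print(f"{text!r:10} {a:#010x} (mod 16 = {a % 16:2})   "
          f"{b:#010x} (mod 16 = {b % 16:2})")
```

Under these assumptions the characters of "No more" sit on a 16-byte boundary in the HLA/MASM image (0x04001020) but only on an 8-byte boundary in the RosAsm image (0x00402048), which matches the two dumps. The 0x28-byte insertion keeps dword alignment intact but breaks anything that relied on 16-byte alignment.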
From: Betov on 2 Aug 2006 04:09
rhyde(a)cs.ucr.edu wrote in news:1154469943.795013.93460@75g2000cwc.googlegroups.com:

> I ran a quick test with the RosAsm debugger and attempted to compare
> the object file it produced against those produced by HLA with the
> original source file and MASM operating on the IDA Pro-disassembled
> file. While the HLA and MASM outputs were very close to one another (the
> major difference being the choice of "code NOPs" used to align
> procedures on 16-byte boundaries), RosAsm was completely different. In
> particular, the RosAsm disassembler seems to have decided to lift a
> bunch of data that was in the code segment and move it elsewhere. This
> is not a desirable behavior from a disassembler (or the assembler,
> whichever is to blame). It's easy to see why the RosAsm disassembler
> will break on some straightforward sections of code if this is indeed
> what it's doing with the disassembled file.

If you had any understanding of how an actual disassembler has to do its job, of how data vs. code recognition is to be considered, and of how widely the organization of PE files can vary from a sections point of view, you would not even have taken a look at this: given that the RosAsm disassembler provides sources and materials that can be re-executed by the RosAsm assembler, it is quite evident that the binary organization has no chance of being identical once recompiled.

That said, the thing that is REALLY different between a disassembly source produced by RosAsm and one produced by another disassembler is that the first one can be considered correct as long as it re-runs and does the same job as the original version, while the second ones show nothing but darkness... which should not be a problem for blind people, by the way... :)

Betov.

< http://rosasm.org > |