From: rhyde on
Hi all,

For the past couple of years, Rene "Betov" Tournois has been trashing
IDA Pro, claiming that you cannot easily create source files that can be
reassembled with it, and claiming that it has serious defects in its
code generation. Recently, he suggested that someone show us all how
"easy" IDA Pro is to use, if it's as wonderful as everyone claims.
Normally, I write such a challenge off as a waste of time, because I
know quite well that the moment you demonstrate how easy IDA Pro is to
use, he will just come back with the argument that the file was rigged
in IDA's favor.

Then it occurred to me -- why not use the "99 Bottles" application that
the RosAsm disassembler fails miserably on? Let's see how hard it is to
reconstruct a source file that can be reassembled with MASM using IDA
Pro and the 99.exe object code file.

To begin with, here is the original HLA source file that produced the
99.exe executable:

program bottles;

#include( "stdlib.hhf" )

// HLA version of the "99 Bottles of Beer" song
// Cross-platform: Linux console or Windows console
// Get HLA here: http://webster.cs.ucr.edu/AsmTools/HLA/index.html
//
//
// Macro compile-time function that does a limited "unsigned to English"
// conversion:

const
    Tens :string[10] :=
        [
            "",     // zero
            "",     // one, handled as a special case
            "Twenty",
            "Thirty",
            "Forty",
            "Fifty",
            "Sixty",
            "Seventy",
            "Eighty",
            "Ninety"
        ];

    Ones :string[20] :=
        [
            "",     // zero is not used
            "One",
            "Two",
            "Three",
            "Four",
            "Five",
            "Six",
            "Seven",
            "Eight",
            "Nine",
            "Ten",
            "Eleven",
            "Twelve",
            "Thirteen",
            "Fourteen",
            "Fifteen",
            "Sixteen",
            "Seventeen",
            "Eighteen",
            "Nineteen"
        ];

#macro uToEnglish( unsVal, NumberCase );

    #if( unsVal >= 20 )
        #if( unsVal mod 10 <> 0 )
            #if( NumberCase )
                Tens[unsVal div 10] + "-" + @lowercase( Ones[unsVal mod 10], 0)
            #else
                @lowercase( Tens[unsVal div 10] + "-" + Ones[unsVal mod 10], 0 )
            #endif
        #else
            #if( NumberCase )
                Tens[ unsVal div 10]
            #else
                @lowercase( Tens[ unsVal div 10], 0 )
            #endif
        #endif
    #else
        #if( NumberCase )
            Ones[ unsVal ]
        #else
            @lowercase( Ones[ unsVal ], 0)
        #endif
    #endif

#endmacro

static
    sOrNot :string[100] :=
        [
            "s",
            "",
            98 dup ["s"]
        ];

    Numbers :string[100] :=
        [
            "No more",
            #for( i := 1 to 98)
                uToEnglish( i, true ),
            #endfor
            uToEnglish( 99, true )
        ];

    lcNumbers :string[100] :=
        [
            "no more",
            #for( i := 1 to 98)
                uToEnglish( i, false ),
            #endfor
            uToEnglish( 99, false )
        ];

begin bottles;

    mov( 99, ecx );
    repeat

        stdout.put
        (
            Numbers[ecx*4],
            " bottle", sOrNot[ecx*4],
            " of beer on the wall, ",
            lcNumbers[ecx*4],
            " bottle", sOrNot[ecx*4],
            " of beer." nl
            "Take one down and pass it around, ",
            lcNumbers[ecx*4-4],
            " bottle", sOrNot[ecx*4-4],
            " of beer on the wall" nl nl
        );
        dec( ecx );

    until( @z );
    stdout.puts( "No more bottles of beer on the wall. No more bottles of beer..." nl );
    stdout.puts( "Go to the store and buy some more... 99 bottles of beer for the wall." nl);

end bottles;



There is nothing special here, just a lot of data and some very
straightforward code. Essentially, it's just a sequence of PRINT
statements in a loop. It shouldn't be too difficult to disassemble,
despite the fact that RosAsm's disassembler seemed to have all kinds of
problems with it.
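To make the "data plus a loop" structure concrete, here is a rough Python model of what the HLA program computes. It is an illustrative sketch, not anything HLA generates: `u_to_english`, `NUMBERS`, `LC_NUMBERS`, `S_OR_NOT`, `verse` and `song` are hypothetical stand-ins for the `uToEnglish` macro, the `Numbers`, `lcNumbers` and `sOrNot` arrays and the `repeat..until` loop, and a plain list index `n` replaces the `ecx*4` byte offsets (each HLA string variable is a 4-byte pointer).

```python
# Illustrative Python model of the HLA program's tables and loop (not HLA output).

TENS = ["", "", "Twenty", "Thirty", "Forty",
        "Fifty", "Sixty", "Seventy", "Eighty", "Ninety"]

ONES = ["", "One", "Two", "Three", "Four", "Five", "Six", "Seven",
        "Eight", "Nine", "Ten", "Eleven", "Twelve", "Thirteen",
        "Fourteen", "Fifteen", "Sixteen", "Seventeen", "Eighteen",
        "Nineteen"]


def u_to_english(uns_val: int, number_case: bool) -> str:
    """Same branches as the uToEnglish macro; valid for 1 <= uns_val <= 99."""
    if uns_val >= 20:
        tens = TENS[uns_val // 10]
        if uns_val % 10 != 0:
            ones = ONES[uns_val % 10]
            if number_case:
                return tens + "-" + ones.lower()      # e.g. "Twenty-one"
            return (tens + "-" + ones).lower()         # e.g. "twenty-one"
        return tens if number_case else tens.lower()  # e.g. "Forty" / "forty"
    return ONES[uns_val] if number_case else ONES[uns_val].lower()


# The static arrays, indexed directly by the bottle count 0..99.
NUMBERS = ["No more"] + [u_to_english(i, True) for i in range(1, 100)]
LC_NUMBERS = ["no more"] + [u_to_english(i, False) for i in range(1, 100)]
S_OR_NOT = ["s", ""] + ["s"] * 98


def verse(n: int) -> str:
    """Text printed by one pass of the repeat..until body with ECX = n."""
    return (f"{NUMBERS[n]} bottle{S_OR_NOT[n]} of beer on the wall, "
            f"{LC_NUMBERS[n]} bottle{S_OR_NOT[n]} of beer.\n"
            "Take one down and pass it around, "
            f"{LC_NUMBERS[n - 1]} bottle{S_OR_NOT[n - 1]} of beer on the wall\n\n")


def song() -> str:
    # mov( 99, ecx ); repeat ... dec( ecx ); until( @z );
    text = "".join(verse(n) for n in range(99, 0, -1))
    text += "No more bottles of beer on the wall. No more bottles of beer...\n"
    text += "Go to the store and buy some more... 99 bottles of beer for the wall.\n"
    return text


if __name__ == "__main__":
    print(song(), end="")
```

Calling `verse(1)` yields the singular "One bottle" line followed by "no more bottles", which is what the `"s", ""` entries at the front of `sOrNot` are there for.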

For this experiment, I chose to use IDA Pro v4.3 because it's free and
people around here can download it to test out any claims I make about
the product. It is important to note that IDA Pro (the commercial
version) is now up to version 5.0 and some of the issues I will point
out have probably been fixed.

I loaded "99.exe" into IDA Pro. It crunched away for a few seconds, and
produced a disassembled listing, along with a nice flowchart (not very
interesting in this trivial application), and it built a database that
made it very easy to browse the disassembled listing. I told IDA to
produce an ASM file and I loaded the file up into MASM.

Now the assembly file by itself will *not* compile under MASM because
MASM's syntax is slightly different from what IDA produces. But this is
mostly fixed by adding the following lines to the beginning of the
file:

if @Version lt 612
.586p
else
.686p
.mmx
.xmm
endif
.model flat, syscall
option noscoped

large textequ <>
small textequ <>

offset32 equ <offset flat:>


In particular, IDA Pro emits instructions like

mov ebx, large fs:0
or
pushd small 0

These prefixes tell you the displacement/operand size information that
Intel syntax doesn't otherwise specify. The text equates above make
these prefixes disappear so that MASM can handle the operands properly.
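As a rough illustration of what those two equates accomplish, the Python filter below deletes the words `large` and `small` when they precede an operand, which is the effect `large textequ <>` and `small textequ <>` have on the two example lines. `strip_size_hints` is a hypothetical helper, not part of IDA or MASM. It only touches whole words followed by whitespace, whereas MASM's text-equate substitution applies to every occurrence of the token.

```python
import re

# Hypothetical stand-in for "large textequ <>" / "small textequ <>":
# drop IDA's size-hint words (and the whitespace after them) from a line.
_SIZE_HINT = re.compile(r"\b(?:large|small)\s+")


def strip_size_hints(line: str) -> str:
    """Return the line with any 'large ' / 'small ' hint removed."""
    return _SIZE_HINT.sub("", line)


if __name__ == "__main__":
    for ida_line in ("mov ebx, large fs:0", "pushd small 0"):
        print(strip_size_hints(ida_line))  # prints "mov ebx, fs:0", then "pushd 0"
```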

I also used IDA Pro's renaming facilities to change the names of four
Win32 API functions to their true external names, e.g., I changed
GetStdHandle to __imp__GetStdHandle@4. IDA Pro's interactive editor made
this a breeze.
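The external name follows the 32-bit stdcall decoration used by Microsoft import libraries: `__imp_` prepended to the stdcall name `_GetStdHandle@4`, where the number after `@` is the count of argument bytes. A minimal sketch, assuming every parameter is a 4-byte DWORD or pointer (true for the APIs shown, not for every Win32 function). `imp_name` is a hypothetical helper, and `WriteFile` appears only as a second example, since the post does not list the other three functions.

```python
def imp_name(func: str, dword_params: int) -> str:
    """Decorated import-pointer symbol for a 32-bit stdcall function.

    "__imp_" is prepended to the stdcall name "_<func>@<bytes>", where
    <bytes> assumes every parameter occupies 4 bytes on the stack.
    """
    return f"__imp__{func}@{4 * dword_params}"


if __name__ == "__main__":
    print(imp_name("GetStdHandle", 1))  # __imp__GetStdHandle@4
    print(imp_name("WriteFile", 5))     # __imp__WriteFile@20
```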

Beyond these cha
From: rhyde on
I ran a quick test with the RosAsm debugger and attempted to compare
the object file it produced against those produced by HLA with the
original source file and MASM operating on the IDA Pro-disassembled
file. While the HLA and MASM outputs were very close to one another (the
major difference being the choice of "code NOPs" used to align
procedures on 16-byte boundaries), RosAsm's output was completely
different. In particular, the RosAsm disassembler seems to have decided
to lift a bunch of data that was in the code segment and move it
elsewhere. This is not a desirable behavior from a disassembler (or the
assembler, whichever is to blame). It's easy to see why the RosAsm
disassembler will break on some straightforward sections of code if
this is indeed what it's doing with the disassembled file.
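Comparing two builds like this comes down to finding the byte ranges where two files disagree. The sketch below is a generic Python helper; the name `diff_ranges`, the script name and the file names in the usage comment are assumptions, not a tool from the post. When run on whole PE files, expect some differences in header fields such as the link timestamp even between otherwise identical builds.

```python
import sys
from typing import Iterator, Tuple


def diff_ranges(a: bytes, b: bytes) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) offsets where a and b differ.

    Bytes beyond the end of the shorter input count as differences.
    """
    n = max(len(a), len(b))
    start = None
    for i in range(n):
        same = i < len(a) and i < len(b) and a[i] == b[i]
        if not same and start is None:
            start = i            # a differing run begins here
        elif same and start is not None:
            yield (start, i)     # the run ended at the previous byte
            start = None
    if start is not None:
        yield (start, n)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder names): python diffbin.py 99_hla.exe 99_masm.exe
    with open(sys.argv[1], "rb") as f1, open(sys.argv[2], "rb") as f2:
        for lo, hi in diff_ranges(f1.read(), f2.read()):
            print(f"{lo:08X}-{hi:08X}: {hi - lo} byte(s) differ")
```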
Cheers,
Randy Hyde

From: Titus on
Randy,

Hmmmm....looks good. I'll try it with IDA Pro 5.0 and see what I get.

By the way, have you thought of starting a disassembly newsgroup? A
discussion of techniques, tools, etc. would be fantastic!

If anyone is interested, here are some recommended forums and sites:

http://home.online.no/~reopsahl/files/assem.htm by Greythorne the
Technomancer

http://remb.cjb.net/ by Mammon

http://www.openrce.org/

http://www.reverse-engineering.net/

http://www.uninformed.org/

and of course,

http://www.woodmann.com/fravia/blackbo.htm by fravia



-Titus

On 1 Aug 2006 15:05:43 -0700, rhyde(a)cs.ucr.edu wrote:

>I ran a quick test with the RosAsm debugger and attempted to compare
>the object file it produced against those produced by HLA with the
>original source file and MASM operating on the IDA Pro-disassembled
>file. While the HLA and MASM outputs were very close to one other (the
>major difference being the choice of "code NOPs" used to align
>procedures on 16-byte boundaries), RosAsm was completely different. In
>particular, the RosAsm disassembler seems to have decided to lift a
>bunch of data that was in the code segment and moved it elsewhere. This
>is not a desirable behavior from a disassembler (or the assembler,
>whichever is to blame). It's easy to see why the RosAsm disassembler
>will break on some straight-forward sections of code if this is indeed
>what it's doing with the disassembled file.
>Cheers,
>Randy Hyde
From: rhyde on

rhyde(a)cs.ucr.edu wrote:
> I ran a quick test with the RosAsm debugger and attempted to compare
> the object file it produced against those produced by HLA with the
> original source file and MASM operating on the IDA Pro-disassembled
> file.

The RosAsm disassembler also seems to inject some data of its own into
the object file.

Here, for example, is the beginning of the data segment emitted by HLA
and MASM:

RAW DATA #1
04001000: 01 00 00 00 01 00 00 00 73 00 00 00 00 00 00 00  ........s.......
04001010: 00 00 00 00 00 00 00 00 07 00 00 00 07 00 00 00  ................
04001020: 4E 6F 20 6D 6F 72 65 00 03 00 00 00 03 00 00 00  No more.........
04001030: 4F 6E 65 00 03 00 00 00 03 00 00 00 54 77 6F 00  One.........Two.
04001040: 05 00 00 00 05 00 00 00 54 68 72 65 65 00 00 00  ........Three...
04001050: 04 00 00 00 04 00 00 00 46 6F 75 72 00 00 00 00  ........Four....

(Produced with the dumpbin utility)

Here's the same section put out by the RosAsm disassembler/assembler:

RAW DATA #2
00402000: 41 70 70 6C 69 63 61 74 69 6F 6E 20 42 61 73 65  Application Base
00402010: 00 00 00 00 20 20 20 20 20 20 20 20 68 00 48 69  ....        h.Hi
00402020: 68 6F 48 69 68 6F 00 00 01 00 00 00 01 00 00 00  hoHiho..........
00402030: 73 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  s...............
00402040: 07 00 00 00 07 00 00 00 4E 6F 20 6D 6F 72 65 00  ........No more.
00402050: 03 00 00 00 03 00 00 00 4F 6E 65 00 03 00 00 00  ........One.....
00402060: 03 00 00 00 54 77 6F 00 05 00 00 00 05 00 00 00  ....Two.........
00402070: 54 68 72 65 65 00 00 00 04 00 00 00 04 00 00 00  Three...........
00402080: 46 6F 75 72 00 00 00 00 04 00 00 00 04 00 00 00  Four............
00402090: 46 69 76 65 00 00 00 00 03 00 00 00 03 00 00 00  Five............

Perhaps I'm expecting too much from a disassembler, but I don't think
it's too cool for the disassembler (or assembler processing that
output) to sneak in extra data like this. In particular, you will note
that the alignment of the strings has been changed by the insertion of
this data. That can have a *serious* impact on the correct operation of
the program if it assumes that the data is aligned on a particular
boundary.
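The shift can be quantified from the two dumps. The Python sketch below parses a few of the dumpbin rows shown above (ASCII column dropped, rows assumed contiguous), finds the "No more" string, and reports the largest power of two dividing its address. `parse_rows`, `alignment` and `locate` are hypothetical helper names. On these rows the string sits at section offset 0x20 (address 04001020, 32-byte aligned) in the HLA/MASM image and at offset 0x48 (address 00402048, only 8-byte aligned) in the RosAsm image. Both addresses remain 4-byte aligned; what is lost is the coarser 16-byte alignment the HLA/MASM layout provides here.

```python
# Rows copied from the two dumpbin listings (ASCII column omitted).
HLA_MASM_ROWS = """
04001000: 01 00 00 00 01 00 00 00 73 00 00 00 00 00 00 00
04001010: 00 00 00 00 00 00 00 00 07 00 00 00 07 00 00 00
04001020: 4E 6F 20 6D 6F 72 65 00 03 00 00 00 03 00 00 00
"""

ROSASM_ROWS = """
00402000: 41 70 70 6C 69 63 61 74 69 6F 6E 20 42 61 73 65
00402010: 00 00 00 00 20 20 20 20 20 20 20 20 68 00 48 69
00402020: 68 6F 48 69 68 6F 00 00 01 00 00 00 01 00 00 00
00402030: 73 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00402040: 07 00 00 00 07 00 00 00 4E 6F 20 6D 6F 72 65 00
"""


def parse_rows(text: str):
    """Return (base_address, bytes) for consecutive 'ADDR: xx xx ...' rows."""
    base, data = None, bytearray()
    for line in text.strip().splitlines():
        addr, _, hex_part = line.partition(":")
        if base is None:
            base = int(addr, 16)
        data += bytes.fromhex(hex_part)  # fromhex skips the separating spaces
    return base, bytes(data)


def alignment(addr: int) -> int:
    """Largest power of two dividing addr (its lowest set bit)."""
    return addr & -addr


def locate(rows: str, needle: bytes):
    """Return (address, section offset, alignment) of needle in the rows."""
    base, data = parse_rows(rows)
    offset = data.find(needle)
    return base + offset, offset, alignment(base + offset)


if __name__ == "__main__":
    for name, rows in (("HLA/MASM", HLA_MASM_ROWS), ("RosAsm", ROSASM_ROWS)):
        addr, off, align = locate(rows, b"No more\x00")
        print(f"{name}: 'No more' at {addr:08X} (offset {off:#x}), aligned to {align}")
```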

In addition to all this, the RosAsm disassembler/assembler rearranges
the sections in the EXE file (and, therefore, their addresses in
memory). Maybe Rene hasn't figured out that some code actually
*depends* on the order of sections in memory.

Even if the rest of the disassembly were perfect, these are serious
problems that must be addressed if someone were to even think about
using RosAsm as a disassembler for real-world projects.

Of course, we've not even considered the problem of multi-section
programs (segments in MASM terminology). AFAIK, RosAsm provides no
support for those at all, so it stands to reason that the disassembler
isn't going to handle those properly, either.
Cheers,
Randy Hyde

From: Betov on
rhyde(a)cs.ucr.edu crivait news:1154469943.795013.93460@
75g2000cwc.googlegroups.com:

> I ran a quick test with the RosAsm debugger and attempted to compare
> the object file it produced against those produced by HLA with the
> original source file and MASM operating on the IDA Pro-disassembled
> file. While the HLA and MASM outputs were very close to one other (the
> major difference being the choice of "code NOPs" used to align
> procedures on 16-byte boundaries), RosAsm was completely different. In
> particular, the RosAsm disassembler seems to have decided to lift a
> bunch of data that was in the code segment and moved it elsewhere. This
> is not a desirable behavior from a disassembler (or the assembler,
> whichever is to blame). It's easy to see why the RosAsm disassembler
> will break on some straight-forward sections of code if this is indeed
> what it's doing with the disassembled file.

If you had any understanding of how an actual Disassembler
has to do its job, of how Data vs Code recognition has
to be considered, and of how widely the various PEs may
be organized from a Sections point of view, you would
not even have taken a look at this:

Given the fact that the RosAsm Disassembler provides
Sources and materials that can be re-assembled by the
RosAsm Assembler, it is quite evident that the binary
organization has no chance of being identical once
re-compiled.

That said, the thing that is REALLY different between a
Disassembly source produced by RosAsm and one produced
by another Disassembler is that the first can be
considered correct as long as it re-runs and does the
same job as the original version, while the second ones
show nothing but darkness... which should not be a
problem for blind people, by the way...

:)

Betov.

< http://rosasm.org >
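The criterion in this reply, that a regenerated program is correct if it re-runs and does the same job, can at least be spot-checked by running both executables on the same input and comparing what they observably do. A minimal Python sketch: `same_behavior` is a hypothetical helper, and the executable names in the comment are placeholders. A single matching run only shows the two programs agree on that input, not that they are equivalent in general; alignment- or layout-dependent behavior, for instance, might not surface.

```python
import subprocess
import sys
from typing import Sequence


def same_behavior(cmd_a: Sequence[str], cmd_b: Sequence[str],
                  stdin: bytes = b"") -> bool:
    """Run both commands on the same input and compare exit status,
    standard output and standard error."""
    ra = subprocess.run(list(cmd_a), input=stdin, capture_output=True)
    rb = subprocess.run(list(cmd_b), input=stdin, capture_output=True)
    return (ra.returncode, ra.stdout, ra.stderr) == (rb.returncode, rb.stdout, rb.stderr)


if __name__ == "__main__":
    # Hypothetical usage: same_behavior(["99.exe"], ["99_rosasm.exe"])
    py = sys.executable
    print(same_behavior([py, "-c", "print(99)"], [py, "-c", "print(99)"]))  # True
```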







