From: H. Peter Anvin on
Alexei A. Frounze wrote:
>>
>> Plain 90h, which would normally be XCHG EAX,EAX (and therefore
>> zero-extend EAX into RAX) is actually NOP.
>
> I might have misspoken, but the appropriate REX followed by 0x90 isn't
> a NOP, it's XCHG rAX, r8.
>

90 = nop = nop [would have been xchg eax,eax]
40 90 = rex nop = nop [would have been xchg eax,eax]
41 90 = rex.b nop = xchg eax,r8d
48 90 = rex.w nop = xchg rax,rax
49 90 = rex.wb nop = xchg rax,r8
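
The table above can be sketched as a tiny decoder. This is only an illustration of the rule as listed in this post, not a description of any particular disassembler; the function name decode_90 is made up for the example, and it handles nothing except an optional REX prefix followed by 90h in 64-bit mode.

```python
def decode_90(code: bytes) -> str:
    """Decode `[REX] 90` in 64-bit mode, following the table in the post."""
    rex = 0
    pos = 0
    if 0x40 <= code[pos] <= 0x4F:      # REX prefix has the form 0100WRXB
        rex = code[pos]
        pos += 1
    if code[pos] != 0x90:
        raise ValueError("not opcode 90h")

    w = bool(rex & 0x08)               # REX.W selects 64-bit operand size
    b = bool(rex & 0x01)               # REX.B extends the register in the opcode

    acc = "rax" if w else "eax"
    if not b:
        # Register 0 paired with itself. Without REX.W this is the special
        # NOP case (no zero-extension of EAX into RAX); with REX.W the post
        # lists it as xchg rax,rax, which changes nothing either.
        return f"xchg {acc},{acc}" if w else "nop"
    other = "r8" if w else "r8d"       # REX.B turns register 0 into register 8
    return f"xchg {acc},{other}"


for raw in ([0x90], [0x40, 0x90], [0x41, 0x90], [0x48, 0x90], [0x49, 0x90]):
    print(" ".join(f"{x:02x}" for x in raw), "=", decode_90(bytes(raw)))
```

The point of the sketch is that only REX.B changes which register is involved; REX.W alone only changes the operand size.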

-hpa
From: Willow on
I have xchg eax,eax and xchg rax,rax as nop. Is this wrong?

Below is the output from my new intelligent disassembler. It's called
crudcom (Crude Decompiler -- that's what it aspires to be, not what it
is!) and it's based on crudasm, only the output actually assembles
this time! You must specify the entrypoint. It will flag with a star
(*) comment anything it was unable to follow -- far control transfer
instructions, indirect control transfer instructions, etc. You have to
manually help it by providing a script file (currently unimplemented)
when it flags something.
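
Following code from an entrypoint, as described above, is usually done with a worklist. The sketch below is not crudcom's code; it is a minimal Python illustration that understands only the handful of 16-bit opcodes in the sample listing at the end of this post. The byte encoding in IMAGE is reconstructed by hand from that listing and is an assumption, as are the names step and trace. Where a real tool would flag an instruction it cannot follow, this sketch simply raises an error.

```python
from collections import deque

ORG = 0x100
# Assumed encoding of the sample .COM listing (hand-assembled for this sketch).
IMAGE = bytes([
    0xE9, 0x0A, 0x00,                    # 100: jmp word loc_10d
    0x48, 0x69, 0x21, 0x0D, 0x0A, 0x24,  # 103: data, "Hi!\r\n$"
    0xCD, 0x21,                          # 109: int 0x21
    0xEB, 0xFE,                          # 10b: jmp short loc_10b
    0xBA, 0x03, 0x01,                    # 10d: mov dx,0x0103
    0xE8, 0x06, 0x00,                    # 110: call word fn_119
    0xB4, 0x4C,                          # 113: mov ah,0x4c
    0xEB, 0xF2,                          # 115: jmp short loc_109
    0xEB, 0xFE,                          # 117: never reached, stays data
    0xB4, 0x09,                          # 119: mov ah,0x09
    0xCD, 0x21,                          # 11b: int 0x21
    0xC3,                                # 11d: ret
])


def step(addr):
    """Return (length, branch_target or None, falls_through, is_call)."""
    i = addr - ORG
    op = IMAGE[i]
    if op in (0xE8, 0xE9):               # call rel16 / jmp rel16
        rel = IMAGE[i + 1] | (IMAGE[i + 2] << 8)
        target = (addr + 3 + rel) & 0xFFFF
        return 3, target, op == 0xE8, op == 0xE8
    if op == 0xEB:                       # jmp rel8 (sign-extended)
        rel = IMAGE[i + 1]
        rel = rel - 256 if rel >= 128 else rel
        return 2, (addr + 2 + rel) & 0xFFFF, False, False
    if op == 0xC3:                       # ret ends the path
        return 1, None, False, False
    if op == 0xCD:                       # int imm8, treated as fall-through
        return 2, None, True, False
    if 0xB0 <= op <= 0xB7:               # mov r8,imm8
        return 2, None, True, False
    if 0xB8 <= op <= 0xBF:               # mov r16,imm16
        return 3, None, True, False
    raise ValueError(f"unhandled opcode {op:#04x} at {addr:#x}")


def trace(entry):
    """Collect instruction addresses reachable from entry, plus call targets."""
    code, funcs = set(), {entry}
    work = deque([entry])
    while work:
        addr = work.popleft()
        while addr not in code:
            length, target, falls_through, is_call = step(addr)
            code.add(addr)
            if target is not None:
                work.append(target)      # visit the branch target later
                if is_call:
                    funcs.add(target)
            if not falls_through:
                break
            addr += length
    return code, funcs


code, funcs = trace(0x100)
print("functions:", [f"fn_{a:x}" for a in sorted(funcs)])
print("code at:  ", [f"{a:x}" for a in sorted(code)])
```

Anything the trace never reaches (here 0x103 and 0x117) is left as data, which matches the db lines in the listing.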

crudcom uses fn_<address> and loc_<address> labels and I disassembled
a real program and reassembled it--nasm shrank it a little bit but
there were no assembler errors! (Unfortunately the resulting binary is
worthless because offsets aren't exactly the same owing to the
shrinkage).

You can find crudcom1.exe and associated source code bundled (as well
as the latest version of crudasm) in the latest version of VmDec here:
http://code.google.com/p/vm64dec/downloads/list

I also added semantics to the script file so data flow analysis can be
added to later versions of crudcom (the intelligent disassembler /
decompiler, successor to crudasm).
For example, this is the semantics for test:
asgn(x86_of, 0);
asgn(x86_af, undefined);
asgn(x86_cf, 0);
asgn(tmp(result), bitand(arg(0), arg(1)));
asgn(x86_sf, sign(tmp(result)));
asgn(x86_zf, is_zero(tmp(result)));
asgn(x86_pf, _x86_parity(trunc$byte(tmp(result))));
See x86s/in_script.txt and x86s/x86s_semantics.h for details.
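
To make the semantics above concrete, here is a rough Python rendering of the same assignments. It is only a sketch: the names flags_after_test and parity_even are invented for the example, and it assumes that _x86_parity follows the usual x86 rule (PF is set when the low byte of the result has an even number of 1 bits).

```python
def parity_even(byte: int) -> int:
    """1 if the byte has an even number of set bits, else 0."""
    return 1 if bin(byte & 0xFF).count("1") % 2 == 0 else 0


def flags_after_test(a: int, b: int, bits: int = 32) -> dict:
    """Flags produced by `test a, b` for operands of the given width."""
    mask = (1 << bits) - 1
    result = a & b & mask                 # tmp(result) = bitand(arg(0), arg(1))
    return {
        "of": 0,                          # asgn(x86_of, 0)
        "af": None,                       # asgn(x86_af, undefined)
        "cf": 0,                          # asgn(x86_cf, 0)
        "sf": (result >> (bits - 1)) & 1, # sign bit of the result
        "zf": int(result == 0),           # is_zero(tmp(result))
        "pf": parity_even(result),        # parity of the low byte only
    }


print(flags_after_test(0xF0, 0x0F))           # disjoint bits, so zf is set
print(flags_after_test(0x80000000, 0xFFFFFFFF))
```

None stands in for "undefined" here; a data flow pass would more likely treat it as "may hold any value".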

I have developed a program (not included currently) that converts 32-
bit DLLs/EXEs into binary image files and provides a list of
entrypoints, so crudasm/crudcom can be used on non-raw binary files. I
want to incorporate a good loader into a future version that takes
into account relocations and external invocations, e.g. so we get some
'off_<address>' labels, and so that e.g. a call to
RtlSomethingOrAnother will be understood.

I plan to add data flow analysis to crudcom2, following in the
footsteps of dcc. It will still disassemble, not decompile, but the
comments will now report what registers and flags are input/output by
any given function. So-called decompilable functions will be listed in
depth-first-search order in the output assembly listing, so you can
tell what's going on. That is what's planned for the future.
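
As a rough idea of what a depth-first listing order could look like, here is a small Python sketch. It assumes the call graph is already known (as in the "Calls:" comments in the listing) and that "depth-first-search order" means pre-order, i.e. a caller is listed before its callees; both are assumptions, and the graph below is hypothetical apart from fn_100 calling fn_119.

```python
def dfs_order(calls: dict, entry: str) -> list:
    """List functions reachable from entry in depth-first pre-order."""
    order, seen = [], set()

    def visit(fn):
        if fn in seen:                    # each function is listed once
            return
        seen.add(fn)
        order.append(fn)
        for callee in calls.get(fn, ()):
            visit(callee)

    visit(entry)
    return order


# fn_100 -> fn_119 is from the sample listing; the rest is made up.
calls = {
    "fn_100": ["fn_119", "fn_200"],
    "fn_119": ["fn_300"],
    "fn_200": ["fn_300"],
    "fn_300": [],
}
print(dfs_order(calls, "fn_100"))
```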

crudcom3 will (supposedly) be the beginnings of a decompiler. It will
actually generate C-style code, and will support loading of non-raw
binary executable/shared libraries, and will support a helper script
file for global data (you can't decompile a static linked list very
easily) and allow the user to specify other entrypoints (such as an
array of function pointers). It will recognize indirect jumps (switch
statements) automatically however. The output of crudcom3 will be
generated from the semantics part of the script file. It will still
make use of registers and flags, but crudcom4 will remove these and
replace them with identifiers and expressions respectively.

At least that's how I have it planned :-)
I understand there are already some decompilers out there, but I
wanted to make my own.

Willow

--- begin code ---

org 0x100
; Calls: fn_119
fn_100:
jmp word loc_10d

loc_103: db 0x48, 0x69, 0x21, 0xd, 0xa, 0x24

loc_109:
int 0x21
loc_10b:
jmp short loc_10b
loc_10d:
mov dx,0x0103
call word fn_119
mov ah,0x4c
jmp short loc_109

loc_117: db 0xeb, 0xfe

; Calls:
fn_119:
mov ah,0x09
int 0x21
ret

loc_11e: db 0xb8, 0xc3, 0x90, 0x90, 0x90, 0x90, 0xd5, 0x8, 0xd4, 0xa
         db 0xd9, 0xf4, 0xe8, 0xfd, 0xff, 0xff
loc_12e: db 0x10, 0xeb, 0xfe, 0xe9, 0xfd, 0xff, 0xff, 0x20
From: Alexei A. Frounze on
On Aug 20, 5:43 pm, "H. Peter Anvin" <h...(a)zytor.com> wrote:
> Alexei A. Frounze wrote:
>
> >> Plain 90h, which would normally be XCHG EAX,EAX (and therefore
> >> zero-extend EAX into RAX) is actually NOP.
>
> > I might have misspoken, but the appropriate REX followed by 0x90 isn't
> > a NOP, it's XCHG rAX, r8.
>
> 90 = nop = nop [would have been xchg eax,eax]
> 40 90 = rex nop = nop [would have been xchg eax,eax]

Yep.

> 41 90 = rex.b nop = xchg eax,r8d

Yep.

> 48 90 = rex.w nop = xchg rax,rax

Should be a NOP effectively.

> 49 90 = rex.wb nop = xchg rax,r8

Same here.

So, did you do the last two under a debugger? If you did, on what CPU
brand? Intel, AMD or both?

Alex
From: H. Peter Anvin on
Alexei A. Frounze wrote:
>
>> 48 90 = rex.w nop = xchg rax,rax
>
> Should be a NOP effectively.
>
>> 49 90 = rex.wb nop = xchg rax,r8
>
> Same here.

Not a NOP, certainly...

>
> So, did you do the last two under a debugger? If you did, on what CPU
> brand? Intel, AMD or both?
>

Not under a debugger, but yes, I executed them (inside a small C
program). AMD Athlon X2 4200+ (socket 939).

-hpa
From: Willow on
On Aug 20, 9:38 pm, "Alexei A. Frounze" <alexfrun...(a)gmail.com> wrote:
> On Aug 20, 5:43 pm, "H. Peter Anvin" <h...(a)zytor.com> wrote:
>
> > Alexei A. Frounze wrote:
>
> > >> Plain 90h, which would normally be XCHG EAX,EAX (and therefore
> > >> zero-extend EAX into RAX) is actually NOP.
>
> > > I might have misspoken, but the appropriate REX followed by 0x90 isn't
> > > a NOP, it's XCHG rAX, r8.
>
> > 90 = nop = nop [would have been xchg eax,eax]
> > 40 90 = rex nop = nop [would have been xchg eax,eax]
>
> Yep.
>
> > 41 90 = rex.b nop = xchg eax,r8d
>
> Yep.
>
> > 48 90 = rex.w nop = xchg rax,rax
>
> Should be a NOP effectively.
>
> > 49 90 = rex.wb nop = xchg rax,r8
>
> Same here.
>
This is where you're wrong. r8 is not rax, so xchg rax,r8 is not a
no-op instruction!
> So, did you do the last two under a debugger? If you did, on what CPU
> brand? Intel, AMD or both?
>
> Alex

Willow