From: Wolfgang Kern on

Willow wrote:
....
> Here's how it is supposed to work
> machine code ---> icode --> assembly language

Yes, I worked it out in a similar way, which ended up as x86 code
converted into a strictly defined 128-bit (16-byte) RISC pattern.
Most of the bit fields in there are used as indexes into LUTs later on.

> The decoder core (x86s) reads machine code and produces intermediate
> low-level code (icode). The icode contains the instruction number and
> any arguments. Note that this process works in reverse too:

> assembly language --> icode --> machine code
> That is how it works on paper. It's a bit more messy in practice !

Sure, it's possible, though it's much more work than disassembling.
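
To make the two directions concrete, here is a minimal sketch of the round trip, not Willow's actual icode format. The `Icode` class, the helper names, and the four-entry opcode table are assumptions made up for illustration; only the opcode byte values themselves are real x86 encodings.

```python
from dataclasses import dataclass

# Tiny illustrative table of one-byte, operand-free x86 opcodes.
OPCODES = {0x90: "nop", 0xC3: "ret", 0xCC: "int3", 0xF4: "hlt"}
MNEMONICS = {name: op for op, name in OPCODES.items()}


@dataclass(frozen=True)
class Icode:
    insn: int         # instruction number (here simply the opcode byte)
    args: tuple = ()  # decoded arguments; always empty in this sketch


def decode(code):
    """machine code -> icode"""
    out = []
    for b in code:
        if b not in OPCODES:
            raise ValueError("unsupported opcode 0x%02x" % b)
        out.append(Icode(b))
    return out


def to_text(icodes):
    """icode -> assembly language"""
    return [OPCODES[i.insn] for i in icodes]


def from_text(lines):
    """assembly language -> icode"""
    return [Icode(MNEMONICS[line.strip().lower()]) for line in lines]


def encode(icodes):
    """icode -> machine code"""
    return bytes(i.insn for i in icodes)
```

With operands, prefixes and multiple encodings per mnemonic, the reverse direction has to choose among encodings, which is where the extra work comes from.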

> I plan to add semantics to the script file -- right now it covers
> disassembling (generating text from icode) and decoding icode from
> machine code.

> The idea is to build a tree (actually a directed acyclic graph) as you
> walk through code. An icode structure is only 28 bytes, so it's easy
> to convert a whole binary image into icode if you can multiply the
> file size by 28 and it still fits in memory. I do not plan to do this,
> it's just possible.

28 bytes per code line may not be enough for code analysis. Just think
about value tracking during loops that contain conditional branches
and all the copies of stack images for these 'damned procedures'.
Not to mention the memory required to cover task switches and VM86...

> One of my objectives is to get Windows 3.11 to run on FreeDOS 1.0.
> Somehow I think an intelligent disassembler will come in handy
> here :-)

Sure, I wish you good luck and 'plenty' of time for this :)

__
wolfgang



From: Wolfgang Kern on

Nathan said:

> Your "pasta straightener" is interesting. RosAsm definitely needs
> such a tool. :)

Noodles were shipped in parallel, but we usually eat them one by one :)

__
wolfgang



From: Rod Pemberton on
"Willow" <wrschlanger(a)gmail.com> wrote in message
news:995fd23b-dc80-4760-b80a-fef07ac020d2(a)t54g2000hsg.googlegroups.com...
> If you have time, can you repeat the experiment on the latest version
> and let me know how it goes?
> Thanks a bunch!!!
>

There are some trivial things I didn't list, e.g., "aad 0x0a" vs. "aad"...

First, there are some differences between 0.10 and 0.11 you should note:
--
0.10 has many more size keywords: "byte", "word", "dword"
0.10 doesn't have some "far" and "near" keywords on call, jmp, etc. and
has other issues with those instructions

0.11 has "<unsupported size>" on lgdt,lidt,sidt
0.11 has the string instructions correct

Second, differences between 0.10 and Ndisasm (2.03.01):
--
0.10 has similar problems as above versus ndisasm
0.10 has "xchg ax,ax" (actually correct...) for "nop"

Third, differences between 0.11 and Ndisasm (2.03.01):
--
0.11 has differences in size keywords, and "short" and "near"

0.11 has problem with size keyword for "bound":
bound ax,word:word [0xffff]
bound eax,dword:dword [0xffff]

0.11 has qword:
cmpxchg8b qword [0xffff]

0.11 has <unsupported size> on lgdt,lidt,sidt:
lgdt word:<unsupported size> [0xffff]
lidt word:<unsupported size> [0xffff]
sidt word:<unsupported size> [0xffff]

0.11 has "xchg ax,ax" (correct...) for "nop"

0.11 has sgdt as:
sgdt word:qword [0xffff]

Except for some of the size and "short"/"near" keyword differences versus
Ndisasm, I'd say it's looking pretty good for 16-bit! At this point, you
can throw some random binaries at Ndisasm 0.98.39 and 2.03.01 and at your
versions of crudasm, and look for differences or problems such as bad or
invalid decodes.
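
One way to keep the known cosmetic differences from drowning out real ones is to normalize both listings before comparing. This is only a sketch: it assumes each listing has already been reduced to one instruction text per line (no offset or hex columns) and that both tools produced the same number of lines. The function names are made up for illustration.

```python
import re

# Keywords that only one of the tools may print; dropped before comparing.
_DROP = re.compile(r"\b(?:short|near)\s+")


def normalize(line):
    """Reduce one instruction line to a canonical form for comparison."""
    text = " ".join(line.lower().split())   # collapse whitespace, fold case
    text = _DROP.sub("", text)              # ignore "short"/"near"
    text = re.sub(r"\s*,\s*", ",", text)    # "ax, ax" -> "ax,ax"
    if text == "xchg ax,ax":                # same encoding as nop
        text = "nop"
    return text


def differences(a_lines, b_lines):
    """Return (index, a, b) for every pair of lines that still differs."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(a_lines, b_lines))
        if normalize(a) != normalize(b)
    ]
```

For example, `differences(["jmp short 0x10", "xchg ax,ax", "aad 0x0a"], ["jmp 0x10", "nop", "aad"])` reports only the `aad` line. A bad decode usually changes instruction lengths, so line-by-line pairing will drift after the first real mismatch; the first reported index is the interesting one.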


Rod Pemberton

From: Willow on
The latest version is here: http://code.google.com/p/vm64dec/downloads/list

I think those size problems are fixed now. Here is a list of known
errata:

1. It never prints "near". This is the default so it shouldn't matter.
2. aad, aam have an argument. This is actually valid.
3. xchg needs to be turned into nop when applicable. Need to do this!
4. Need to add support for extended opcodes such as 'd9 f4' (fxtract)
and '66 0f 38 01' (phaddw). How to do this? Do any extended opcodes
have a modr/m? Can we pretend the opcode byte is an immediate byte?
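
A common way to handle item 4 is to treat the escape bytes as selecting a separate opcode table, so the byte after the escape is looked up as an opcode rather than as an immediate. The sketch below is an assumption-laden illustration, not vm64dec code: the function name and the map labels are made up, only 66/F2/F3 are skipped as prefixes, and the input is assumed to be long enough. Both examples from the list do carry a ModR/M-format byte: 'd9 f4' is the x87 escape D9 followed by a byte in ModR/M layout, and '66 0f 38 01' is followed by a ModR/M byte.

```python
# Prefix bytes skipped by this sketch. A real decoder also has to handle
# segment, LOCK and address-size prefixes, and must remember whether 66
# acted as an operand-size or a mandatory prefix.
PREFIXES = {0x66, 0xF2, 0xF3}


def opcode_map(code):
    """Return (map_name, opcode_byte, index_of_next_byte) for one instruction."""
    i = 0
    while code[i] in PREFIXES:
        i += 1
    b = code[i]
    if b == 0x0F:
        if code[i + 1] in (0x38, 0x3A):
            # Three-byte opcode: 0F 38 xx or 0F 3A xx
            return ("0f%02x" % code[i + 1], code[i + 2], i + 3)
        # Two-byte opcode: 0F xx
        return ("0f", code[i + 1], i + 2)
    if 0xD8 <= b <= 0xDF:
        # x87 escape: the next byte is always in ModR/M layout
        return ("x87", b, i + 1)
    return ("one-byte", b, i + 1)
```

For `66 0f 38 01 c1` this yields map `"0f38"`, opcode `0x01`, and index 4, which points at the ModR/M byte `c1`.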

Aside from these issues, the project should be "done" -- I certainly
hope so!

And thanks for testing it!

Willow

From: Alexei A. Frounze on
On Aug 16, 8:59 pm, Willow <wrschlan...(a)gmail.com> wrote:
> The latest version is here: http://code.google.com/p/vm64dec/downloads/list
>
> I think those size problems are fixed now. Here is a list of known
> errata:
>
> 1. It never prints "near". This is the default so it shouldn't matter.
> 2. aad, aam have an argument. This is actually valid.
> 3. xchg needs to be turned into nop when applicable. Need to do this!

There's also a special case in 64-bit mode. Depending on the REX
prefix (AFAIR, bit B), opcode 0x90 can be either NOP or XCHG.
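
A minimal sketch of that check, assuming 64-bit mode and at most one REX prefix directly in front of the 0x90 byte. The function name is made up, and other prefixes (66, F3) are deliberately not handled here.

```python
def decode_90_long_mode(code):
    """Decode opcode 0x90 in 64-bit mode, honoring REX.B and REX.W only."""
    i = 0
    rex = 0
    if 0x40 <= code[i] <= 0x4F:      # REX prefix occupies 0x40..0x4F
        rex = code[i]
        i += 1
    if code[i] != 0x90:
        raise ValueError("not opcode 0x90")
    rex_b = rex & 0x01               # extends the register field to r8
    rex_w = rex & 0x08               # selects 64-bit operand size
    if not rex_b:
        return "nop"                 # rax/eax exchanged with itself
    return "xchg rax,r8" if rex_w else "xchg eax,r8d"
```

So `90` and `48 90` stay NOP, while `41 90` and `49 90` become real exchanges with r8d and r8.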

> 4. Need to add support for extended opcodes such as 'd9 f4' (fxtract)
> and '66 0f 38 01' (phaddw). How to do this? Do any extended opcodes
> have a modr/m? Can we pretend the opcode byte is an immediate byte?

Normally, almost every instruction that has mod<3 has a memory operand.
Some exceptions: 3DNow! instructions and some multibyte NOP instructions
have a dummy memory operand, and MOV CR/DR ignore mod according to the
documentation. Instructions that have an opcode extension in the reg
field (denoted as /number) can have the following operands according to
the ModR/M byte value (there may be implicit, non-ModR/M ones, too):
- register AND register/memory, e.g. ADD
- memory AND register, e.g. BOUND
- memory OR register, e.g. LTR
- memory only, e.g. XRSTOR
- none, e.g. LFENCE, VMCALL
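
For reference, the three ModR/M fields discussed above can be split out as follows. This is a small sketch with made-up helper names; it covers only the byte layout, not SIB or displacement handling.

```python
def split_modrm(b):
    """Split a ModR/M byte into (mod, reg, rm)."""
    mod = (b >> 6) & 0x3   # bits 7..6: addressing mode, 3 = register form
    reg = (b >> 3) & 0x7   # bits 5..3: register or opcode extension (/number)
    rm = b & 0x7           # bits 2..0: register or memory base selector
    return mod, reg, rm


def has_memory_operand(b):
    """True when the ModR/M byte selects a memory form (mod<3)."""
    mod, _, _ = split_modrm(b)
    return mod < 3
```

For instance, `split_modrm(0xF4)` gives `(3, 6, 4)`: register form, /6, r/m 4. The exceptions listed above (e.g. MOV CR/DR ignoring mod) still need per-instruction handling on top of this.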

FPU instructions are generally encoded the same way as non-FPU
instructions if there's a memory operand (i.e. mod<3). The same is
often true of FPU instructions that don't have a memory operand,
but not always. E.g. FSTSW AX seems to exist only for AX, and the
r/m field often denotes not a register but a particular instruction
(i.e. it further extends the opcode), e.g. F2XM1 through FCOS.
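
The F2XM1 through FCOS group can be handled with a plain 16-entry table indexed by the second byte. The sketch below covers only D9 F0..FF and DF E0; the function and table names are made up for illustration, and everything else returns None so a caller can fall back to ordinary ModR/M decoding.

```python
# D9 F0 .. D9 FF: the second byte selects the instruction outright.
D9_F0_FF = [
    "f2xm1", "fyl2x", "fptan", "fpatan",
    "fxtract", "fprem1", "fdecstp", "fincstp",
    "fprem", "fyl2xp1", "fsqrt", "fsincos",
    "frndint", "fscale", "fsin", "fcos",
]


def decode_x87_special(opcode, second):
    """Return a mnemonic for the special register-form cases, else None."""
    if opcode == 0xD9 and second >= 0xF0:
        return D9_F0_FF[second - 0xF0]
    if opcode == 0xDF and second == 0xE0:
        return "fstsw ax"            # only the AX form exists
    return None
```

With this table, `d9 f4` from the errata list decodes to `fxtract`.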

Alex