From: NathanCBaker on
On Aug 12, 12:56 am, Willow <wrschlan...(a)gmail.com> wrote:
> On Aug 11, 7:41 pm, NathanCBa...(a)gmail.com wrote:
> > On Aug 11, 11:08 pm, Willow <wrschlan...(a)gmail.com> wrote:
>
> > > Now that you can actually run it... what do you think?
>
> > I like it.  But I'd like the columns to be closer together.  Here is
> > what I tested it with:
>
> Which columns? From the script file?
>

In the output. I suggest shortening the gaps between the "00000100",
the "mov" and the "ax,0x0013" so that it is easier on the eyes in a
monospaced font.
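
A minimal sketch of the tighter layout, in Python rather than in
crudasm's own source. The function name format_line and the mnemonic
width of 6 are assumptions for illustration, not anything taken from
crudasm:

```python
def format_line(address, mnemonic, operands="", mnemonic_width=6):
    """Format one listing line with single-space gaps between columns.

    mnemonic_width is an assumed padding so that operands still line up.
    """
    text = f"{address:08x} {mnemonic:<{mnemonic_width}} {operands}"
    # Drop trailing padding for instructions without operands.
    return text.rstrip()


print(format_line(0x100, "mov", "ax,0x0013"))  # 00000100 mov    ax,0x0013
print(format_line(0x11D, "jne", "short 0x0124"))
```

Padding the mnemonic to a fixed width keeps the operand column aligned
while avoiding the wide tab stops of the original output.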

>
> I modified the script file so it now produces this output for the same
> input (notice most of the redundant sizes are no longer there):
>
> 00000100        mov     ax,0x0013
> 00000103        int     0x10
> 00000105        mov     di,0xa000
> 00000108        mov     es,di
> 0000010a        mov     di,0x7c5f
> 0000010d        mov     byte [es:di],0x0f
> 00000111        mov     ah,0x00
> 00000113        int     0x16
> 00000115        cmp     ah,0x01
> 00000118        je      short 0x0144
> 0000011a        cmp     ah,0x1f
> 0000011d        jne     short 0x0124
> 0000011f        mov     ax,0x0013
> 00000122        int     0x10
> 00000124        cmp     ah,0x1e
> 00000127        jne     short 0x012d
> 00000129        sub     di,0x0140
> 0000012d        cmp     ah,0x2c
> 00000130        jne     short 0x0136
> 00000132        add     di,0x0140
> 00000136        cmp     ah,0x33
> 00000139        jne     short 0x013c
> 0000013b        dec     di
> 0000013c        cmp     ah,0x34
> 0000013f        jne     short 0x0142
> 00000141        inc     di
> 00000142        jmp     short 0x010d
> 00000144        mov     ax,0x0003
> 00000147        int     0x10
> 00000149        ret     word 0x0000

That is better. Looking good! I think 'crudasm' has potential to
give 'ndisasm' some serious competition. But, shh, don't let anyone
on the Nasm team know that I said that. :)

Nathan.
From: Alexei A. Frounze on
On Aug 11, 7:05 pm, Willow <wrschlan...(a)gmail.com> wrote:
> I just finished my very own disassembler, written from scratch. It
> takes a 750-line input script file that specifies the x86 and x86-64
> instruction set, and produces a disassembler. Unlike other
> disassemblers, mine is enjoyable to work on because it is coherent,

Nope, because ... it's yours :)

> you have a script file that makes sense (to me at least :-) rather
> than a bunch of incoherent and often buggy opcode tables copied from
> an Intel manual.

You have AMD manuals too. You can always crosscheck them.

> You should check it out and let me know what you think!

Not too bad. Although I wouldn't throw exceptions or print errors in a
disassembler. IMO, an error should translate to the opcode bytes plus
question marks (in place of the instruction) and an error return code,
which the caller can then use as they wish (continue disassembly,
print an error, or whatever). I would also avoid unreadable things
like x86c_decoder_table[]. And... I would make it output to a buffer
first so the output can be reused immediately without the need to
capture stdout.
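
A rough sketch of that error-handling style, in Python rather than in
crudasm's own source. Everything named here (Decoded, decode_one,
disassemble, the status constants and the three-entry TABLE) is
hypothetical and only meant to show the shape of the interface:

```python
from dataclasses import dataclass

OK = 0
ERR_UNKNOWN_OPCODE = 1
ERR_TRUNCATED = 2

# Tiny illustrative table: opcode byte -> (mnemonic, total length in bytes).
TABLE = {0x90: ("nop", 1), 0xC3: ("ret", 1), 0xCD: ("int", 2)}


@dataclass
class Decoded:
    status: int  # one of the status codes above
    length: int  # number of bytes consumed
    text: str    # instruction text, or raw bytes followed by "???"


def decode_one(code, offset):
    """Decode one instruction; report problems via status, never raise."""
    opcode = code[offset]
    entry = TABLE.get(opcode)
    if entry is None:
        return Decoded(ERR_UNKNOWN_OPCODE, 1, f"{opcode:02x} ???")
    mnemonic, length = entry
    if offset + length > len(code):
        raw = " ".join(f"{b:02x}" for b in code[offset:])
        return Decoded(ERR_TRUNCATED, len(code) - offset, f"{raw} ???")
    if length == 2:
        return Decoded(OK, 2, f"{mnemonic} 0x{code[offset + 1]:02x}")
    return Decoded(OK, 1, mnemonic)


def disassemble(code):
    """Return the decoded lines as a list; printing is left to the caller."""
    lines = []
    offset = 0
    while offset < len(code):
        decoded = decode_one(code, offset)
        lines.append(decoded)
        offset += decoded.length
    return lines


result = disassemble(bytes([0xCD, 0x10, 0x0F, 0xC3]))
print("\n".join(d.text for d in result))
```

The caller gets the text in a list it can reuse directly, and decides
for itself whether a non-OK status should stop the run.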

If you want this code to be reusable, it needs to have very clear APIs
(especially for the input and output), and the logic that controls the
disassembly engine should be flexible and controllable (or better yet
specifiable) by the caller. Furthermore, I'd move the script into the
main program so it doesn't get corrupted or exploited and the
disassembler can be used in any environment, including a debugger with
possibly limited disk access (say, the debugger runs at a priority
level higher than the disk interrupts, so using disk I/O would just
hang the OS).

> It's called crudasm, the crude disassembler. Right now it only works
> in 16 and 32 bit mode, and only supports raw binary files (e.g. no PE
> etc. files).
>
> You can find it here: http://code.google.com/p/vm64dec/downloads/list
>
> I plan to update crudasm to make it more intelligent in the next
> release.
> In the future I will add floating point, MMX, SSE, etc. instructions
> but they're not supported yet. I will also update the script file to
> contain semantics not just syntax so the disassembler can be like
> sourcer, e.g. it knows mov ax,5 loads ax to 5, etc.

More intelligent? :) How much more? I think the first goal would be to
make it disassemble everything and make the output reassembleable
(does this word exist? :). Then you'd probably want to disambiguate
the disassembly (maybe through a special command option) so that the
reassembly gives you the exact binary that you disassembled.
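
A sketch of how such a round-trip check could look, in Python. The two
byte strings are the 8-bit and 16-bit displacement encodings of
"not word [es:bx+2]" (F7 /2). The lookup tables, the toy assembler and
the "0x02" spelling for the short form are assumptions made up for the
example, not crudasm or nasm behaviour:

```python
SHORT = bytes.fromhex("26f75702")    # ModRM 0x57: mod=01, 8-bit displacement
LONG = bytes.fromhex("26f7970200")   # ModRM 0x97: mod=10, 16-bit displacement

# Toy "disassemblers": encoding -> text.
AMBIGUOUS = {SHORT: "not word [es:bx+0x0002]", LONG: "not word [es:bx+0x0002]"}
EXACT = {SHORT: "not word [es:bx+0x02]", LONG: "not word [es:bx+0x0002]"}


def make_assembler(listing):
    """Toy assembler: invert the table, preferring the shortest encoding
    whenever two encodings share the same text."""
    table = {}
    for encoding, text in listing.items():
        if text not in table or len(encoding) < len(table[text]):
            table[text] = encoding
    return table.__getitem__


def failed_round_trips(listing):
    """Return the encodings that do not survive disassemble + reassemble."""
    assemble = make_assembler(listing)
    return [enc for enc, text in listing.items() if assemble(text) != enc]


print(failed_round_trips(AMBIGUOUS))  # the long form is lost
print(failed_round_trips(EXACT))      # nothing is lost
```

With the ambiguous text the assembler is free to pick the short form,
so the long encoding cannot be reproduced; distinct text for each
encoding removes that freedom.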

> Although I am proud of this,

I'm sure you are. No doubt.

> and I hope I don't get flamed for being a
> newbie or something...It took a lot of work to get to this point.
> Hopefully it's all downhill from here. You're probably wondering, why
> another disassembler? There is no good reason, I wrote this for the
> experience of developing my own tool not because the world needs
> another disassembler.

Good that you understand it. :)

> Mine is not as good as the one that comes with
> nasm (less opcodes) or anything but it's my very own program!

Yep. Compare them against yours.

> If you do download it, check out x86c/script.txt and let me know what
> you think... if you have any questions about what the fields mean (the
> script is a space-separated list) then ask me.

Alex
From: Wolfgang Kern on

Willow posted:

> I just finished my very own disassembler, written from scratch. It
> takes a 750-line input script file that specifies the x86 and x86-64
> instruction set, and produces a disassembler.

Are you sure you cover all instructions, incl. the 64-bit ones, with
750 lines? I never counted all prototype variations, but I think there
are many more.

> Unlike other disassemblers, mine is enjoyable to work on because it
> is coherent, you have a script file that makes sense (to me at least :-)

Fine, my own 'DisAss' (see HEXTUTOR) uses replaceable tables instead,
so it can be used for other architectures too.

> rather than a bunch of incoherent and often buggy opcode tables copied
> from an Intel manual.

AMD manuals are the better source for this.

> You should check it out and let me know what you think!
> It's called crudasm, the crude disassembler. Right now it only works
> in 16 and 32 bit mode, and only supports raw binary files (e.g. no PE
> etc. files).

> You can find it here: http://code.google.com/p/vm64dec/downloads/list

Oh, 2.3 MB unzipped shows immediately that it's written HLL-style :)

> I plan to update crudasm to make it more intelligent in the next
> release.
> In the future I will add floating point, MMX, SSE, etc. instructions
> but they're not supported yet. I will also update the script file to
> contain semantics not just syntax so the disassembler can be like
> sourcer, e.g. it knows mov ax,5 loads ax to 5, etc.

When I look at the ins/...c, is your final target a C-resourcer?

> Although I am proud of this, and I hope I don't get flamed for being a
> newbie or something...

We all are proud of our own work :)

> It took a lot of work to get to this point.

I can confirm this from my own experience.

> Hopefully it's all downhill from here. You're probably wondering, why
> another disassembler? There is no good reason, I wrote this for the
> experience of developing my own tool not because the world needs
> another disassembler. Mine is not as good as the one that comes with
> nasm (less opcodes) or anything but it's my very own program!

Yeah, and a very good method for learning CPU internals anyway.

> If you do download it, check out x86c/script.txt and let me know what
> you think... if you have any questions about what the fields mean (the
> script is a space-separated list) then ask me.

Seems you took an approach similar to the one I started my DisAss
with, keeping enough information to later feed a fully automated but
static code analyser.

Not to pick on your code style, but my whole disassembler core is
shorter (20 KB of machine code incl. tables, FPU/XMM/MMX/SSE2, text
buffers, register values/stack trace and a detailed info struct)
than your 27 KB script :)

__
wolfgang



From: Karel Lejska on
On Aug 12, 4:05 am, Willow <wrschlan...(a)gmail.com> wrote:
> I just finished my very own disassembler, written from scratch. It
> takes a 750-line input script file that specifies the x86 and x86-64
> instruction set, and produces a disassembler. Unlike other
> disassemblers, mine is enjoyable to work on because it is coherent,
> you have a script file that makes sense (to me at least :-) rather
> than a bunch of incoherent and often buggy opcode tables copied from
> an Intel manual.
>
> You should check it out and let me know what you think!
> It's called crudasm, the crude disassembler. Right now it only works
> in 16 and 32 bit mode, and only supports raw binary files (e.g. no PE
> etc. files).
>
> You can find it here: http://code.google.com/p/vm64dec/downloads/list
>
> I plan to update crudasm to make it more intelligent in the next
> release.
> In the future I will add floating point, MMX, SSE, etc. instructions
> but they're not supported yet. I will also update the script file to
> contain semantics not just syntax so the disassembler can be like
> sourcer, e.g. it knows mov ax,5 loads ax to 5, etc.
>
> Although I am proud of this, and I hope I don't get flamed for being a
> newbie or something...It took a lot of work to get to this point.
> Hopefully it's all downhill from here. You're probably wondering, why
> another disassembler? There is no good reason, I wrote this for the
> experience of developing my own tool not because the world needs
> another disassembler. Mine is not as good as the one that comes with
> nasm (less opcodes) or anything but it's my very own program!
>
> If you do download it, check out x86c/script.txt and let me know what
> you think... if you have any questions about what the fields mean (the
> script is a space-separated list) then ask me.

Hi Willow,

so you developed another format to store the information? If you know
XML/XSL, you could consider generating the disassembler out of this
file:

http://ref.x86asm.net/x86reference.xml

I spent quite a lot of time on this.

Or you could be interested in the HTML editions at least:

http://ref.x86asm.net/coder.html
http://ref.x86asm.net/geek.html

They should answer all your questions regarding validity/support for
any opcode.
From: Herbert Kleebauer on
Willow wrote:

> I just finished my very own disassembler, written from scratch. It
>
> You should check it out and let me know what you think!


I just disassembled the different addressing modes, but you generate
the same source for different binaries:


not word [es:bx+0x0002]             26 f7 57 02
not word [es:bx+0x0002]             26 f7 97 0002
not word [es:dword eax+0x00000002]  67 26 f7 50 02
not word [es:dword eax+0x00000002]  67 26 f7 90 00000002



I think it would be better to use:

not word [es:bx+0x02]               26 f7 57 02
not word [es:bx+0x0002]             26 f7 97 0002
not word [es:dword eax+0x02]        67 26 f7 50 02
not word [es:dword eax+0x00000002]  67 26 f7 90 00000002
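
The width to print follows from the mod field (top two bits) of the
ModRM byte: mod=01 means an 8-bit displacement, mod=10 a displacement
of the full address size. A minimal Python sketch of that rule; the
function name displacement_text is made up for illustration, and the
mod=00 direct-address special cases are deliberately left out:

```python
def displacement_text(modrm, disp_bytes, address_size=16):
    """Format a displacement with as many hex digits as were encoded.

    modrm        -- the ModRM byte
    disp_bytes   -- the bytes following ModRM (and SIB, if present)
    address_size -- 16 or 32, the effective address size
    """
    mod = modrm >> 6
    if mod == 1:
        width = 1                   # 8-bit displacement, sign-extended
    elif mod == 2:
        width = address_size // 8   # 16-bit or 32-bit displacement
    else:
        return ""                   # mod=00 special cases are not handled here
    value = int.from_bytes(disp_bytes[:width], "little", signed=True)
    sign = "-" if value < 0 else "+"
    return f"{sign}0x{abs(value):0{width * 2}x}"


print("not word [es:bx" + displacement_text(0x57, b"\x02") + "]")
print("not word [es:bx" + displacement_text(0x97, b"\x02\x00") + "]")
```

Printing two hex digits for the short form and four or eight for the
long form gives each encoding its own source text.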





not al                                 f6 d0                 not.b r0
not ah                                 f6 d4                 not.b m0
not ax                                 f7 d0                 not.w r0
not eax                                66 f7 d0              not.l r0
not byte [es:0x0064]                   26 f6 16 0064         not.b 100{s1}
not word [es:bx]                       26 f7 17              not.w (r3.w){s1}
not dword [es:si]                      66 26 f7 14           not.l (r5.w){s1}
not byte [es:di]                       26 f6 15              not.b (r6.w){s1}
not word [es:bx+si]                    26 f7 10              not.w (r3.w,r5.w){s1}
not dword [es:bx+di]                   66 26 f7 11           not.l (r3.w,r6.w){s1}
not byte [es:bp+si]                    26 f6 12              not.b (r4.w,r5.w){s1}
not word [es:bp+di]                    26 f7 13              not.w (r4.w,r6.w){s1}
not word [es:bx+0x0002]                26 f7 57 02           not.w 2.b(r3.w){s1}
not word [es:si+0x0002]                26 f7 54 02           not.w 2.b(r5.w){s1}
not word [es:di+0x0002]                26 f7 55 02           not.w 2.b(r6.w){s1}
not word [es:bx+si+0x0002]             26 f7 50 02           not.w 2.b(r3.w,r5.w){s1}
not word [es:bx+di+0x0002]             26 f7 51 02           not.w 2.b(r3.w,r6.w){s1}
not word [es:bp+si+0x0002]             26 f7 52 02           not.w 2.b(r4.w,r5.w){s1}
not word [es:bp+di+0x0002]             26 f7 53 02           not.w 2.b(r4.w,r6.w){s1}
not word [es:bx+0x0002]                26 f7 97 0002         not.w 2(r3.w){s1}
not word [es:si+0x0002]                26 f7 94 0002         not.w 2(r5.w){s1}
not word [es:di+0x0002]                26 f7 95 0002         not.w 2(r6.w){s1}
not word [es:bx+si+0x0002]             26 f7 90 0002         not.w 2(r3.w,r5.w){s1}
not word [es:bx+di+0x0002]             26 f7 91 0002         not.w 2(r3.w,r6.w){s1}
not word [es:bp+si+0x0002]             26 f7 92 0002         not.w 2(r4.w,r5.w){s1}
not word [es:bp+di+0x0002]             26 f7 93 0002         not.w 2(r4.w,r6.w){s1}
not word [es:dword 0x00000064]         67 26 f7 15 00000064  not.w 100.l{s1}
not word [es:dword eax]                67 26 f7 10           not.w (r0){s1}
not word [es:dword eax+0x00000002]     67 26 f7 50 02        not.w 2.b(r0){s1}
not word [es:dword eax+0x00000002]     67 26 f7 90 00000002  not.w 2(r0){s1}
not word [dword eax+edx]               67 f7 14 10           not.w (r0,r1)
not word [dword eax+edx*2]             67 f7 14 50           not.w (r0,r1*2)
not word [dword eax+edx*4]             67 f7 14 90           not.w (r0,r1*4)
not word [dword eax+edx*8]             67 f7 14 d0           not.w (r0,r1*8)
not word [dword eax+0x00000002]        67 f7 90 00000002     not.w 2(r0)
not word [dword eax*2+0x00000002]      67 f7 14 45 00000002  not.w 2(r0*2)
not word [dword eax*4+0x00000002]      67 f7 14 85 00000002  not.w 2(r0*4)
not word [dword eax*8+0x00000002]      67 f7 14 c5 00000002  not.w 2(r0*8)
not word [dword eax+edx+0x00000002]    67 f7 54 10 02        not.w 2.b(r0,r1)
not word [dword eax+edx*2+0x00000002]  67 f7 54 50 02        not.w 2.b(r0,r1*2)
not word [dword eax+edx*4+0x00000002]  67 f7 54 90 02        not.w 2.b(r0,r1*4)
not word [dword eax+edx*8+0x00000002]  67 f7 54 d0 02        not.w 2.b(r0,r1*8)
not word [dword eax+edx+0x00000002]    67 f7 94 10 00000002  not.w 2(r0,r1)
not word [dword eax+edx*2+0x00000002]  67 f7 94 50 00000002  not.w 2(r0,r1*2)
not word [dword eax+edx*4+0x00000002]  67 f7 94 90 00000002  not.w 2(r0,r1*4)
not word [dword eax+edx*8+0x00000002]  67 f7 94 d0 00000002  not.w 2(r0,r1*8)
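
For the 32-bit forms above with a scaled index, the byte after ModRM
is the SIB byte: scale in the top two bits, index register in the
middle three, base register in the low three. A small Python sketch of
decoding it; decode_sib and REG32 are names made up for illustration,
and 64-bit register extensions are not covered:

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_sib(sib, mod):
    """Return the register part of a 32-bit SIB address, e.g. 'eax+edx*2'.

    mod is the mod field of the preceding ModRM byte; it is needed
    because base=101 with mod=00 means "no base register, disp32 follows".
    """
    scale = 1 << (sib >> 6)
    index = (sib >> 3) & 7
    base = sib & 7
    parts = []
    if not (base == 5 and mod == 0):
        parts.append(REG32[base])
    if index != 4:  # index=100 means "no index register"
        name = REG32[index]
        parts.append(name if scale == 1 else f"{name}*{scale}")
    return "+".join(parts)


for sib in (0x10, 0x50, 0x90, 0xD0, 0x45):
    print(f"{sib:02x} -> {decode_sib(sib, mod=0)}")
```

Run against the SIB bytes from the listing, this gives the same
register expressions, including the base-less "eax*2" form for 0x45.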