From: Frank Kotler on 18 Aug 2008 21:03

Rod Pemberton wrote:
> "Frank Kotler" <fbkotler(a)verizon.net> wrote in message
> news:g8an22$v6v$1(a)aioe.org...
>> Yeah, I realize I'm saying the manual's wrong... but that's what it
>> seems to do...
>
> Does "what it seems to do" affect how it should be disassembled?

Well, yes... I guess. Ideally, I suppose a disassembly ought to look as
much as possible like the original author's source code - which we may
or may not have. This assumes that the author of the original code was
using the same assembler. I disassemble executables made from Herbert's
Lindela code all the time to see what it "really looks like"... :)

Ultimately, machine code ought to disassemble the way the author of the
disassembler *wants* it to disassemble!

> If an old cpu uses reg32 in the instruction operation and a new cpu
> uses reg16, how should you represent the disassembly of the
> instruction?
>
> Or, should instructions be disassembled to match the manuals? If to
> match the manuals, which manual when there is a discrepancy or
> operational change?

Ah! Here we get into "preferred cpu". Ndisasm has the "-p" switch. It
has just come to my attention that this is totally undocumented! I
ought to be writing that up, instead of arguing with you. Maybe later.
Parameters seem to be "intel", "amd", "cyrix", or "idt" == "centaur" ==
"winchip", from what I can see (case sensitive!). I'll have to delve
further into the code to see exactly what it does... (the current Nasm
development team is good about "the job isn't done until the
documentation is written" - hasn't always been so).

That's one way a disassembler could handle such discrepancies. Being
script-driven, Willow's "Crudasm" might have even greater possibilities
for flexibility, I dunno... This is a different issue from whether
2.03.01 is "wrong" about lar/lsl... which I will get back to shortly.

Best,
Frank
From: Frank Kotler on 18 Aug 2008 21:59

Rod Pemberton wrote:
> "Frank Kotler" <fbkotler(a)verizon.net> wrote in message
> news:g8an22$v6v$1(a)aioe.org...
>> Interesting. What would be the "meaning" of a (16-bit!) selector in a
>> 32-bit reg?
>
> Is this a trick question?

Not intended to be.

>> If the upper bits are "garbage", it won't work?
>
> Are you implying that 16-bit selectors in 32-bit mode as implemented
> currently don't actually work?

I certainly didn't intend to imply that! Let me try this again...
Assume 32-bit code...

You write:
lsl eax, ebx
I write:
lsl eax, bx

What's the difference? Same machine code, so obviously same behavior.
(FWIW, ald disassembles this as "ebx" - like Ndisasm 0.98.39 - objdump
and gdb disassemble it as "%bx"...) You can say it's "reading" all of
ebx, and discarding the high 16 bits if you want. No difference...

>> This looks like a change from m32 to m16 somewhere between 2003 and
>> 2006.
>
> Little endian with 16-bit selectors. Basically, irrelevant if the cpu
> ignores retrieving the upper 16-bits of a 32-bit value from memory...

Ah, but here it *does* make a difference if the instruction reads 32
bits and discards 16! Only if the 16 bits in question are the last 16
bits of readable memory, to be sure... This is the experiment that Phil
proposed. On the only processor I've tested it on - P4 - it definitely
reads only 16 bits.

>> Did the behavior of the processor change,
>
> According to the manual for memory source operands, yes.

Right.

>> or did they fix an error in the manual?
>
> No, I don't believe so.

If I see results of an *experiment* that shows lar/lsl reading 32 bits
with some CPU, I'll agree with you. Until then, I suspect they fixed an
error in the manual.

>> I would "expect" the source operand to be 16-bits,
>
> ABSOLUTELY NOT! If the cpu is in 32-bit mode and the source operand is
> a register, the register size is either 32-bits or 8-bits.
>> regardless of the processor mode or size of the destination
>> register, a selector being 16 bits.
>
> In 16-bit mode, you have two register sizes: 8-bit and 16-bit. In
> 32-bit mode, you have two register sizes: 8-bit and 32-bit.

ABSOLUTELY NOT! Are you implying that:
lsl ax, bx
won't work in 32-bit code? (As Wolfgang points out, it's probably not
useful).

>> As Phil suggests, we could conduct the experiment. I have done so,
>> both loading upper bits of ebx with garbage, and putting a 16-bit
>> variable in the last two bytes of valid memory. I conclude that the
>> source operand is 16 bits...
>
> I think that's a wrong conclusion. You can conclude that only 16-bits
> of the source operand are used as a selector. But, you can't determine
> what size 32-bits or 16-bits was read in order to obtain the 16-bit
> selector.

I certainly can! If it were reading 32 bits, it'd segfault!

> My whole point is how do you get "bx", a 16-bit register, instead of
> "ebx" for an instruction decode which can only return an 8-bit or
> 32-bit register?

You got me! What CPU would that be???

Best,
Frank

(ugly, but I believe it proves what I claim it proves...)

global _start
section .text
_start:
    nop
    mov ebx, -1
    mov bx, ds
    lsl eax, bx
    lsl eax, ebx
    lsl eax, [last_word]
    mov eax, [last_word]    ; segfault! (but not with ax)
    mov eax, 1
    int 80h
align 16
code_size equ $ - $$
section .data
    times 4096 - code_size + 20h - 2 db 0
last_word dw 2Bh
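[Editor's note, not part of the original post: Frank's "same machine
code" claim can be checked by decoding the ModRM byte of 0F 03 C3 by
hand. A minimal Python sketch; the register tables and field layout
follow the standard IA-32 ModRM format, and the function names are my
own.]

```python
# Decode the ModRM byte of "0F 03 C3" (lsl eax, ebx / lsl eax, bx).
# ModRM layout: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def decode_modrm(byte):
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm

mod, reg, rm = decode_modrm(0xC3)
assert mod == 0b11          # register-direct form
print("dest:", REG32[reg])  # eax - sized by the current mode
# The rm field is just an index (3); whether a disassembler *prints* it
# as a 16-bit or 32-bit name is purely the author's choice:
print("src as r32:", REG32[rm], "/ src as r16:", REG16[rm])
```

Since both source spellings encode to the same rm index, nothing in the
bytes distinguishes "bx" from "ebx" - which is exactly why the choice
is cosmetic.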
From: Frank Kotler on 18 Aug 2008 23:16

Rod Pemberton wrote:
> "Rod Pemberton" <do_not_have(a)nohavenot.cmm> wrote in message
> news:g8b8t9$bg0$1(a)aioe.org...
>> My whole point is how do you get "bx", a 16-bit register, instead of
>> "ebx" for an instruction decode which can only return an 8-bit or
>> 32-bit register?
>
> For arpl, lldt, lmsw, ltr, str, verr, and verw instructions, the
> manuals consistently present one instruction decode line (16-bit
> only). One exception is that new AMD manuals present 16-bit, 32-bit,
> and 64-bit decodes for str.
>
> For lar and lsl, the manuals consistently present at least two
> instruction decode lines (16-bit, 32-bit).

Of course! lar and lsl have *two* operands. The destination is 16 or 32
bits depending on processor mode (or disassembler mode). The source is
16 bits...

> Given the presentation of the instruction decode lines and the 8/16
> and 8/32 etc. register separation of the cpu,

I really question this!

> I don't believe the decode for lsl and lar should be treated the same,
> i.e., using a fixed 16-bit register, as those for arpl, lldt, lmsw,
> ltr, str, verr, and verw.
>
> AFAICT, although there are some instructions fixed to 16-bit registers
> only, there currently is no 32-bit instruction that uses _both_ a
> 32-bit register and a 16-bit register as Ndisasm is "doing to" lar and
> lsl...

mov eax, ds ?

Consider, in 32-bit code:
mov ds, ax
Does the assembler emit an operand size override byte? Some do, and
make you write "mov ds, eax" to avoid it. But it's a "nop" - the
instruction behaves exactly the same with or without it. The Intel
manual used to say "most assemblers" behave this way (emit the 66), but
a quick survey found that Masm did it - Nasm used to, but doesn't -
Tasm, Gas, Fasm(?) did not. If Masm hasn't "fixed" it (I'll bet they
have! Someone with a recent Masm check!) I think they're "all alone".
If the Intel manual still says "most assemblers", that's *another*
error!
:)

mov ds, ax
mov ds, eax    ; same thing

Incidentally, "mov ds, [last_word]" doesn't segfault - haven't tried
that one myself (lately), but that was the result of research the Nasm
development team did...

mov ax, ds
mov eax, ds    ; two different instructions!

I really think lar/lsl are the "same idea". From 8E D8 I would
disassemble (in 32-bit mode) ax, not eax. And from 0F 03 C3 I would
disassemble bx, not ebx. Since it doesn't make the slightest
difference, it's pretty much "disassembler author's choice" - like je
vs jz...

How would you disassemble EC? "in al, edx"??? (another case where we
don't need a 66 to use a 16-bit register in 32-bit mode...) "in ax, dx"
takes a 66, "in eax, dx" does not (in 32-bit mode - opposite in 16-bit
mode) - source is still 16 bits!

Best,
Frank
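[Editor's note, not part of the original post: the 66h-prefix point can
be made concrete with the raw encodings. A sketch; the helper name is
my own, and the bytes follow the standard 8E /r encoding of a move to a
segment register.]

```python
# Two spellings of the segment-register move Frank discusses, as
# assembled for 32-bit mode. 8E /r loads a segment register; the 66h
# operand-size prefix some assemblers emit for "mov ds, ax" has no
# effect on what the instruction does.
mov_ds_ax  = bytes([0x66, 0x8E, 0xD8])  # assemblers that emit the 66
mov_ds_eax = bytes([0x8E, 0xD8])        # same operation, no prefix

def strip_operand_size_prefix(code):
    """Drop a leading 66h prefix; for mov Sreg, r/m the CPU ignores it."""
    return code[1:] if code and code[0] == 0x66 else code

# After normalizing, both spellings are the same instruction:
assert strip_operand_size_prefix(mov_ds_ax) == mov_ds_eax
print("identical after prefix strip:", mov_ds_eax.hex())
```

This is the sense in which the prefix is a "nop" here: it changes the
byte stream, but not the operation performed.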
From: Nathan Baker on 18 Aug 2008 23:32

Frank Kotler wrote:
>> Given the presentation of the instruction decode lines and the 8/16
>> and 8/32 etc. register separation of the cpu,
>
> I really question this!

Me too. He must have had Betov for a teacher! :)

Nathan.
From: Rod Pemberton on 19 Aug 2008 06:24
"Nathan Baker" <Nathan(a)aioe.org> wrote in message
news:g8des6$3ct$1(a)aioe.org...
> Frank Kotler wrote:
>>> Given the presentation of the instruction decode lines and the 8/16
>>> and 8/32 etc. register separation of the cpu,
>>
>> I really question this!
>
> Me too.

"Executable code segment. The flag is called the D flag and it
indicates the default length for effective addresses and operands
referenced by instructions in the segment. If the flag is set, 32-bit
addresses and 32-bit or 8-bit operands are assumed; if it is clear,
16-bit addresses and 16-bit or 8-bit operands are assumed."
IA-32 Intel(R) Architecture Software Developer's Manual Volume 3A:
System Programming Guide, Part 1, pg 3-14, 2006

Mixed mode programming (i.e., overrides) is one way to change the
operational operand and address sizes, but this doesn't change the way
the instructions are *encoded*, as 8/16 or 8/32 etc.

Rod Pemberton
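[Editor's note, not part of the original post: the D-flag rule Rod
quotes can be illustrated by pulling the bit out of a raw segment
descriptor. A sketch; the descriptor values are made-up examples, with
the D/B flag at bit 22 of the descriptor's high dword per the Intel
descriptor layout.]

```python
# Extract the D/B flag from an 8-byte code-segment descriptor.
# In the high dword, bit 22 is D/B: set -> 32-bit default operand and
# address size, clear -> 16-bit default (the passage Rod quotes).
def default_operand_size(descriptor):
    high_dword = (descriptor >> 32) & 0xFFFFFFFF
    d_flag = (high_dword >> 22) & 1
    return 32 if d_flag else 16

# A typical flat 32-bit code descriptor (base 0, 4 GiB limit, D=1):
FLAT_CODE32 = 0x00CF9A000000FFFF
# A similar descriptor with D clear (a 16-bit code segment):
CODE16 = 0x008F9A000000FFFF

print(default_operand_size(FLAT_CODE32))  # 32
print(default_operand_size(CODE16))       # 16
```

This is the default the 66h/67h overrides flip per-instruction - which
is Rod's point that overrides change the operation, not how the
instruction forms are encoded in the manual's decode tables.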