NASM 0.98.39 vs. NASM 2.03.01 disassembly [ASM]

Prev: announce: my very first disassembler now available (GPL)
Next: Win32 non blocking console input?

From: H. Peter Anvin on 2 Sep 2008 01:44

Frank Kotler wrote:
>
> Okay... Now I see the int 80h... So if I've got the "second simplest" C
> program - instead of returning from "main", it does "exit(42);"... this
> goes through "__syscall_common" (arch/i386/syscall.S - in the library)?
> This copies our parameter into ebx, and the sys_call number into eax,
> and does int 80h, I guess.
>
> Maybe I'm confused about this, too, but I think the int 80h vector takes
> us into entry.S (arch/i386/entry.S - in the kernel code)? This looks
> like it eventually does "call [sys_call_table + eax * 4]" (after putting
> ebx back on the stack...). In this case, that's "sys_exit" - in
> kernel/exit.c (?). That shifts our parameter up by 8, and calls
> "do_exit". I'm guessing that it's the call to "schedule()" after marking
> our task "dead" that actually makes us go away...(?).
>
> Whew! I was under the impression that the C library called "sys_exit"
> (say) directly, without going through the int 80h rigamarole. I was
> mistaken!
>

Indeed. sys_exit() is a function in kernel space... the *ONLY* way you
can call a kernel space function from user space is by doing a system
call. On i386, you can do that via int $0x80, or by calling into the
vdso (a piece of memory managed by the kernel, but available read-only to
userspace), where the kernel will have placed either "int $0x80",
"sysenter" or "syscall", depending on what the fastest way to do a
system call is on that particular CPU.
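
For readers following along, the path Frank describes (number in eax, first argument in ebx, a table lookup, then sys_exit shifting the code up by 8) can be modelled in a few lines. This is only a toy sketch in Python, not kernel code: the one-entry table, the function names and the wexitstatus helper are assumptions made for illustration. Only the syscall number 1 for exit on i386 and the "(code & 0xff) << 8" shift reflect the real interface.

```python
# Toy model of the i386 int 0x80 path. eax selects the handler and ebx
# carries the first argument. The table is illustrative and heavily
# truncated; it is NOT the real kernel sys_call_table.

ENOSYS = 38  # Linux errno for "function not implemented"


def sys_exit(error_code):
    # The kernel hands (error_code & 0xff) << 8 on to do_exit().
    return (error_code & 0xFF) << 8


SYS_CALL_TABLE = {1: sys_exit}  # hypothetical: only exit (number 1) is modelled


def int_0x80(eax, ebx):
    """Dispatch the way 'call [sys_call_table + eax * 4]' would."""
    handler = SYS_CALL_TABLE.get(eax)
    if handler is None:
        return -ENOSYS
    return handler(ebx)


def wexitstatus(status):
    # What the parent later extracts from the wait status.
    return (status >> 8) & 0xFF


status = int_0x80(1, 42)
print(hex(status), wexitstatus(status))  # 0x2a00 42
```

The shift by 8 is why a parent process sees the exit code in the second byte of the wait status.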

On x86-64, the vdso will typically contain the syscall instruction, but
some system calls (like gettimeofday) are actually implemented fully in
user space using kernel-managed tables under certain circumstances,
which are controlled by the kernel.

> So there *is* an "advantage" to using the int 80h interface ourselves?

Sort of - kind of. First of all, you lose the advantage of the vdso
(sysenter is usually a lot faster than int $0x80 on the CPUs that have
it, but because it loses the return address, it *must* be called from
the vdso and not directly.) Second, you lose the functionality that is
provided by the C library, some of which is more of a system call layer.
There are some preliminary discussions about splitting glibc into libc,
which contains the C runtime, and libkernel, which would be maintained
by the kernel team and contain the code immediately surrounding the
system calls.

> This brings us back to Herbert's question (sorry, Herbert... and Rod...
> you were right), where do library developers find documentation on the
> int 80h interface? Where did you "first" learn this stuff, Peter?

Well, I have done Linux kernel development for 16 years and so I'm quite
familiar with the code; I have been involved in architecting some of it
and so forth. That being said, the three main sources of documentation
are the kernel source itself, man pages section 2, and the ELF psABI for
the specific architecture.

When I wrote klibc I specifically wanted to minimize the effort to port
it between Linux architectures, and so I ended up abstracting out as
much as possible. This makes the klibc code hopefully useful as a
secondary source.

-hpa

From: Rod Pemberton on 3 Sep 2008 09:47

"Herbert Kleebauer" <klee(a)unibwm.de> wrote in message
news:48B4540F.25C76D0C(a)unibwm.de...
> Rod claimed, that "lsl eax, ebx" should be used instead of
> "lsl eax, bx" because in 32 bit mode 32 bit registers are used
> by default.

As have all of you! Do you remember saying this?

Herbert Kleebauer:
"I don't see any difference in ldsl and for example the
add instruction."

Frank Kotler:
"... in the reg, reg forms [of LSL], the size of the registers
must match."

Chuck Crayne:
"...disassembler might be show case 1 as lsl eax,ebx
[instead of lsl eax,bx]."

Randall Hyde:
"...you specify the 16-bit version in a 16-bit
segment and the 32-bit version in a 32-bit segment."

Those were from 2006 on clax:
http://groups.google.com/group/comp.lang.asm.x86/browse_thread/thread/15837b30374c5746/a13111f0554abe81?#a13111f0554abe81

> Frank proofed[sic], that when "lsl" is used with a memory
> operand only a 16 bit memory access occurs

Frank/Chuck/et al. proved on newer processors that the instruction uses
16-bits for the memory read. This is what the current Intel/AMD manuals
state. They didn't prove that older processors don't read/write 32-bits from
memory as their older manuals describe. Nor, did they prove anything about
the size of the source register which was the entire issue. But, even if
one could prove that all cpu's only accessed a 16-bit register, it still has
nothing to do with the fact that the assembly syntax should be 32-bit for
32-bit mode for these two instructions as all manuals indicate.

> and it doesn't make
> much sense to claim that in the register case the full 32 bit
> register is accessed and the higher halve then is discarded.

What claim? This is _exactly_ what is stated in the Intel manuals from at
least 2003 to 2008 (and likely longer). Under LAR/LSL:

"1. For all loads (regardless of source or destination sizing) only bits
16-0 are used. Other bits are ignored."

So, you're saying that the Intel manuals have been continuously wrong for at
least five years?... They've rewritten the manuals over and over and
produced new manuals for 64-bit, but haven't noticed this specific error in
half a decade? To me, "it doesn't make much sense to claim that in
the register case the full 32 bit register" isn't accessed since the 386
manual explicitly indicates this is so...

Rod Pemberton

PS. Who is Nasm64developer?

From: Frank Kotler on 3 Sep 2008 13:47

Rod Pemberton wrote:

....
> Frank Kotler:
> "... in the reg, reg forms [of LSL], the size of the registers
> must match."

Immediately preceded by "Nasm does it differently"... If you continue
down this thread, you'll see that the Nasm development team concluded
that this was a "bug":

--------------------
Nasm64developer has submitted a patch to "correct" Nasm's behavior. With
this patch applied, "lsl eax, bx" assembles correctly, but "lsl eax,
ebx" is an error... breaking existing code... if any...
---------------------

(current versions accept either, disassemble to "bx")

....
> Frank/Chuck/et al. proved on newer processors that the instruction uses
> 16-bits for the memory read. This is what the current Intel/AMD manuals
> state. They didn't prove that older processors don't read/write 32-bits from
> memory as their older manuals describe. Nor, did they prove anything about
> the size of the source register which was the entire issue.

This is true. (Merely out of curiosity, I'd like to see it done on an
older processor. Doesn't "make sense" to me that it would read 32 bits,
but maybe... I don't think "must make sense to Frank" is in Intel's
design criteria... I'd bet on "error in manual"... but I wouldn't bet
more than I could afford to lose...)

....
> PS. Who is Nasm64developer?

I wish I could tell ya, but he wishes to remain anonymous. You asked
earlier in this thread if I believed this "just because Nasm64developer
said so". Yes, pretty much...

For those who don't follow the Nasm bugtracker, here's Nasm64developer's
latest entry in response to your bug report:

-------------------------
While the dst register follows opsize (i.e. is affected by
mode, 66h, and REX.W), only the lowest 16 bits of the src
operand are accessed. (In case of memory, one can prove it
by means of #DB, #GP, or #PF. In case of a register, it's
impossible to see the difference, unless the processor has
a partial register access stall.)

That is, the src operand is not affected by opsize.

The assembler permits 16/32/64-bit src registers -- purely
for courtesy reasons.

What the disassembler should emit, is a matter of debate.
Currently it picks the first of the 3 insns.dat entries --
that is, the one with the 16-bit src register. This is an
intentional choice on my part; it reflects the Ew notation
which is used as the most concise form of representation.

Various manuals do suggest that src reg follow the size of
dst reg. If you prefer that choice, despite the fact that
it doesn't reflect the instruction's actual operation, you
can easily shuffle the ND flag of the reg,reg entries. If
they work as advertised, they will suppress disassembly of
an entry.
--------------------------------------

(doesn't quite say what will happen if the ND flag *doesn't* work as
advertised... probably won't make smoke... :)
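
A rough sketch of the selection rule described in that entry, assuming the disassembler simply takes the first matching table entry that is not flagged ND. The entries below are simplified, hypothetical stand-ins written in Python; they do not use the real insns.dat syntax, and the helper names are made up for illustration.

```python
import copy

# Hypothetical stand-ins for the three LSL reg,reg entries (dst, src).
# "nd" models the ND flag, which suppresses an entry during disassembly.
ENTRIES = [
    {"operands": ("reg32", "reg16"), "nd": False},
    {"operands": ("reg32", "reg32"), "nd": False},
    {"operands": ("reg32", "reg64"), "nd": False},
]


def pick_for_disassembly(entries):
    """Return the operands of the first entry that is not marked ND."""
    for entry in entries:
        if not entry["nd"]:
            return entry["operands"]
    return None


def with_preferred(entries, preferred_src):
    """Return a copy in which every entry except the preferred source
    width is flagged ND (the 'shuffle' mentioned in the entry above)."""
    shuffled = copy.deepcopy(entries)
    for entry in shuffled:
        entry["nd"] = entry["operands"][1] != preferred_src
    return shuffled


print(pick_for_disassembly(ENTRIES))                           # 16-bit src
print(pick_for_disassembly(with_preferred(ENTRIES, "reg32")))  # 32-bit src
```

Under this assumption the assembler side is unaffected: all three spellings still assemble, and only the disassembler's choice changes.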

I've been arguing the other side of this, but I should say that I think
your viewpoint has a good deal of merit as well. That *is* how we would
normally disassemble "those bits", and there's something to be said for
doing it the same way all the time. I just think it's "even better" to
treat lsl/lar as an exception to the general rule. Since only 16 bits
are *used* (for src), we should only "say" 16 bits...
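
The sizing rule from the bugtracker entry can be written down compactly. This is a small Python sketch under the stated rule only (dst follows mode, 66h and REX.W; src is always 16 bits); the function name and parameters are made up for illustration.

```python
def lsl_operand_bits(mode_bits, prefix_66=False, rex_w=False):
    """Return (dst_bits, src_bits) for LSL/LAR.

    mode_bits is the default mode of the code segment (16, 32 or 64).
    """
    if mode_bits not in (16, 32, 64):
        raise ValueError("mode_bits must be 16, 32 or 64")
    if mode_bits == 64 and rex_w:
        dst = 64                       # REX.W takes precedence over 66h
    elif mode_bits == 16:
        dst = 32 if prefix_66 else 16  # 66h toggles 16 -> 32
    else:
        dst = 16 if prefix_66 else 32  # 66h toggles 32 -> 16
    return dst, 16                     # src is not affected by opsize


print(lsl_operand_bits(32))                  # (32, 16)
print(lsl_operand_bits(32, prefix_66=True))  # (16, 16)
print(lsl_operand_bits(64, rex_w=True))      # (64, 16)
```

Whatever the mode and prefixes, the second element never changes, which is the argument for printing "bx".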

Wouldn't break much (any) of *my* code if Nasm left out lar/lsl
entirely, how 'bout you? If it *is* a bug, seems like a pretty harmless one.

Best,
Frank

From: Herbert Kleebauer on 3 Sep 2008 15:24

Rod Pemberton wrote:
> "Herbert Kleebauer" <klee(a)unibwm.de> wrote in message

> > Rod claimed, that "lsl eax, ebx" should be used instead of
> > "lsl eax, bx" because in 32 bit mode 32 bit registers are used
> > by default.

> As have all of you!

What have we all claimed? That in 32 bit mode 32 bit registers are
used by default (which surely is true) or that therefore "lsl eax, ebx"
should be used instead of "lsl eax, bx" (which surely is not true).

> Do you remember saying this?

> Herbert Kleebauer:
> "I don't see any difference in ldsl and for example the
> add instruction."

I don't understand what this quote has to do with it. Here is the full
quote (with the corrected opcodes):

====================================================================

I don't understand what you want to say. I don't see any difference
in ldsl and for example the add instruction. If you want a 32 bit
operation in a 16 bit segment or a 16 bit operation in a 32 bit
segment, then you have to prefix the instruction by 66.

seg16

00000000: 0f 03 d0          ldsl.w  r0,r1
00000003: 66 0f 03 d0       ldsl.l  r0,r1

00000007: 01 c2             add.w   r0,r1
00000009: 66 01 c2          add.l   r0,r1

seg32

0000000c: 66 0f 03 d0       ldsl.w  r0,r1
00000010: 0f 03 d0          ldsl.l  r0,r1

00000013: 66 01 c2          add.w   r0,r1
00000016: 01 c2             add.l   r0,r1

And for an address size switch you have to use 67:

seg16

00000000: 0f 03 07          ldsl.w  (r3.w),r0
00000003: 67 0f 03 03       ldsl.w  (r3.l),r0
00000007: 66 0f 03 07       ldsl.l  (r3.w),r0
0000000b: 67 66 0f 03 03    ldsl.l  (r3.l),r0

seg32

00000010: 67 66 0f 03 07    ldsl.w  (r3.w),r0
00000015: 66 0f 03 03       ldsl.w  (r3.l),r0
00000019: 67 0f 03 07       ldsl.l  (r3.w),r0
0000001d: 0f 03 03          ldsl.l  (r3.l),r0

====================================================================

This says that with ldsl you can use the data and address size
prefix the same way as with the add instruction. In a 32 bit
segment the data size prefix makes the operation a 16 bit
operation which only modifies the lower half of the result
register. But in any case, the source operand is a 16 bit
operand, independent of any size prefix.
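
To make the byte listings above easier to check by hand, here is a minimal Python sketch that splits an LSL encoding (0F 03 /r) into its prefixes and ModRM fields. It handles only the 66h and 67h prefixes in 16 and 32 bit segments, does no SIB or displacement decoding, and the function name and result keys are made up for illustration.

```python
def decode_lsl(hex_bytes, segment_bits):
    """Decode prefixes and ModRM of an LSL encoding given as a hex string.

    segment_bits is the default size of the code segment (16 or 32).
    """
    data = bytes.fromhex(hex_bytes)
    opsize = addrsize = segment_bits
    i = 0
    while data[i] in (0x66, 0x67):
        if data[i] == 0x66:
            opsize = 48 - segment_bits    # toggles 16 <-> 32
        else:
            addrsize = 48 - segment_bits  # toggles 16 <-> 32
        i += 1
    if data[i:i + 2] != b"\x0f\x03":
        raise ValueError("not an LSL encoding")
    modrm = data[i + 2]
    return {
        "dst_bits": opsize,    # 66h changes the destination size
        "addr_bits": addrsize, # 67h changes the address size
        "src_bits": 16,        # the selector is 16 bits in every case
        "mod": modrm >> 6,
        "reg": (modrm >> 3) & 7,
        "rm": modrm & 7,
    }


print(decode_lsl("66 0f 03 d0", 32))
print(decode_lsl("67 66 0f 03 03", 16))
```

For "66 0f 03 d0" in a 32 bit segment this reports a 16 bit destination with mod=3, reg=2, rm=0, matching the "ldsl.w r0,r1" line in the seg32 listing.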

> > Frank proofed[sic], that when "lsl" is used with a memory
> > operand only a 16 bit memory access occurs
>
> Frank/Chuck/et al. proved on newer processors that the instruction uses
> 16-bits for the memory read. This is what the current Intel/AMD manuals
> state. They didn't prove that older processors don't read/write 32-bits from
> memory as their older manuals describe.

I checked it on a 845 MHz Celeron which surely isn't a newer processor.

> Nor, did they prove anything about
> the size of the source register which was the entire issue. But, even if
> one could prove that all cpu's only accessed a 16-bit register, it still has

You can't prove this because there is absolutely no difference between
only accessing the lower half of the register and accessing the full
register and then discarding the upper half.

> nothing to do with the fact that the assembly syntax should be 32-bit for
> 32-bit mode for these two instructions as all manuals indicate.

The manual describes what happens: a 16 bit selector stored in the lower
half of a general purpose register is used to access a segment descriptor,
extract the 32 bit size of the segment from this descriptor and store this
32 bit value in the target register. Now the writer of the assembler has
to find a symbolic representation for this instruction. And if I'm asked which
one of these two lines

lsl eax, bx
lsl eax, ebx

better describes the instruction, then I surely would select the first one.
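
The description above can be turned into a small model that shows why the upper half of the source register cannot matter. This is a Python sketch only: the descriptor table is hypothetical, the TI and RPL bits of the selector are ignored, and the privilege and type checks of the real instruction are left out.

```python
# Hypothetical descriptor table: index -> (20-bit raw limit, granularity bit)
FAKE_GDT = {
    1: (0xFFFFF, 1),  # flat 4 GiB segment
    2: (0x0FFFF, 0),  # 64 KiB segment
}


def lsl(src_reg_value, table=FAKE_GDT):
    """Return the byte-granular segment limit, or None for a bad selector.

    (The real instruction signals failure by clearing ZF instead.)
    """
    selector = src_reg_value & 0xFFFF  # only the low word is a selector
    index = selector >> 3              # bits 15..3 index the table
    entry = table.get(index)
    if entry is None:
        return None
    raw_limit, granular = entry
    if granular:
        return (raw_limit << 12) | 0xFFF  # limit counted in 4 KiB pages
    return raw_limit


print(hex(lsl(0x00000008)))  # 0xffffffff
print(hex(lsl(0xDEAD0008)))  # 0xffffffff, high word of the source ignored
```

Both calls give the same limit, so "lsl eax, bx" and "lsl eax, ebx" would name the same operation; the question is only which spelling describes it better.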

> > and it doesn't make
> > much sense to claim that in the register case the full 32 bit
> > register is accessed and the higher halve then is discarded.
>
> What claim? This is _exactly_ what is stated in the Intel manuals from at
> least 2003 to 2008 (and likely longer). Under LAR/LSL:
>
> "1. For all loads (regardless of source or destination sizing) only bits
> 16-0 are used. Other bits are ignored."
>
> So, you're saying that the Intel manuals have been continuously wrong for at
> least five years?... They've rewritten the manuals over and over and
> produced new manuals for 64-bit, but haven't noticed this specific error in
> half a decade? To me, "it doesn't make much sense to claim that in
> the register case the full 32 bit register" isn't accessed since the 386
> manual explicitly indicates this is so...

The manual describes the effect of the instruction (and both versions
are a correct description) but we are talking about a proper symbolic
representation of the instruction.

There is also an instruction which zero extends a 16 bit register (or
if you like, the lower half of a 32 bit register) to a 32 bit value
and stores it in a register. I use:

0f b7 c3 movu.wl r3,r0

Now, which one do you prefer in your syntax:

movzx eax,bx
movzx eax,ebx

I suppose the second one because this is the same as with the lsl instruction.

But we can also zero extend a byte value:

0f b6 c3 movu.bl r3,r0

movzx eax,bl
movzx eax,ebx

With the same argument you also have to choose the second one here, but you
can't use the same symbolic representation for different instructions.
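
For reference, opcodes 0F B6 and 0F B7 are the zero-extending moves (MOVZX from a byte and from a word), while the sign-extending MOVSX forms are 0F BE and 0F BF. The Python sketch below shows how the byte and word forms give different results from the same 32 bit register, which is why one operand spelling cannot serve both. The helper names and the sample register value are made up for illustration.

```python
def zero_extend(value, src_bits):
    """Keep the low src_bits of value; upper result bits become zero."""
    return value & ((1 << src_bits) - 1)


def sign_extend(value, src_bits, dst_bits=32):
    """Copy the top bit of the low src_bits into all higher result bits."""
    value &= (1 << src_bits) - 1
    if value & (1 << (src_bits - 1)):
        value -= 1 << src_bits
    return value & ((1 << dst_bits) - 1)


ebx = 0x123480F1  # arbitrary sample value

print(hex(zero_extend(ebx, 16)))  # 0x80f1      word source (like 0F B7)
print(hex(zero_extend(ebx, 8)))   # 0xf1        byte source (like 0F B6)
print(hex(sign_extend(ebx, 16)))  # 0xffff80f1  word source (like 0F BF)
print(hex(sign_extend(ebx, 8)))   # 0xfffffff1  byte source (like 0F BE)
```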

From: Wolfgang Kern on 3 Sep 2008 15:42

Frank Kotler answered Rod Pemberton:

....[still about Caesar's Beard Rod ?]

> (current versions accept either, disassemble to "bx")

Yeah, and my solution on this may confirm that NASM is right on:

LSL eax,bx

because LSL/LAR eax,ebx won't make any sense anyway.
And I'd see anything different from the

LSL eax,bx ;load eax with the segment limit of the segment
;given in BX (and nothing else)

interpretation as a bug.
What do you expect with:

LSL eax,ebx

to be different from

LSL eax,bx

the first above may lead to a (totally wrong) belief that the
high word in ebx may change any functionality of LSL/LAR or friends.

OK, if it's just a matter of the targeted assembler dialect,
this may be a job for the disassembler option flags, and nothing more...

OK Rod, you're known for, and I appreciate, your ability to detect
misinterpretations and inconsistent or contradictory wording much
faster than we oldies may see it, but here, on
LSL/LAR and any other seg-mov/modify, you are just plain wrong.
SEG-reg opcodes behave really differently from GP/MEM instructions by their
(vectored instead of pipelined) nature, and please stop telling us about
REG-Address-Size defaults, because they aren't valid on "specials".

__
wolfgang
