From: Steven Munroe on
On Mon, 2010-03-15 at 15:48 +1100, Benjamin Herrenschmidt wrote:
> Hoy there !
>
> This may have been discussed earlier (I have some vague memories...) but
> I just hit a problem with that again (Mark: hint, it's in hdparm's
> fallocate) so I'd like a bit of a refresh here on what is the "right
> thing" to do...
>
> So some syscalls want a 64-bit argument. Let's take fallocate() as our
> example. So we already know that we have to be extra careful since some
> 32-bit arch will pass this into 2 registers (or stack slots) which need
> to be aligned, and so we tend to already take care of making sure that
> the said 64-bit argument is either defined as 2x32-bit arguments, or
> defined as 1x64 bit argument aligned to 2x32-bit in the argument list.
>
> So far so good...
>
> The problem is when user space tries to use the same trick for calling
> those functions using glibc-provided syscall() function. In this
> example, hdparm does:
>
> err = syscall(SYS_fallocate, fd, mode, offset, len);
>
> With "offset" being a 64-bit argument.
>

The powerpc implementation of syscall is:


ENTRY (syscall)
        mr r0,r3
        mr r3,r4
        mr r4,r5
        mr r5,r6
        mr r6,r7
        mr r7,r8
        mr r8,r9
        sc
        PSEUDO_RET
PSEUDO_END (syscall)

The ABI says:

"Long long arguments are considered to have 8-byte size and alignment.
The same 8-byte arguments that must go in aligned pairs of registers are
8-byte aligned on the stack."

This implies that the SYS_fallocate call will skip a register to get the
required alignment in the parameter save area.

For ppc32, on entry to syscall():

r3 == SYS_fallocate
r4 == fd
r5 == mode
r6 == not used
r7, r8 == offset
r9 == len

This gets shifted to:

r0 == SYS_fallocate
r3 == fd
r4 == mode
r5 == not used
r6, r7 == offset
r8 == len
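
The two register listings above can be reproduced with a small model. This is
only a sketch under a simplifying assumption: 32-bit arguments take the next
register starting at r3, and a 64-bit argument takes the next pair that starts
on an odd register (r3, r5, r7, r9). The function names are hypothetical and
are not part of glibc or the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed, simplified ppc32 rule: integer arguments start in r3. */
enum { FIRST_ARG_REG = 3 };

/* Store in first_reg[i] the first register used by argument i.
 * is64[i] is nonzero for a 64-bit argument. Returns the next free register. */
int assign_regs(const int *is64, size_t n, int *first_reg)
{
    assert(is64 != NULL && first_reg != NULL);
    int reg = FIRST_ARG_REG;
    for (size_t i = 0; i < n; i++) {
        if (is64[i] && (reg % 2) == 0)
            reg++;                  /* skip one register to align the pair */
        first_reg[i] = reg;
        reg += is64[i] ? 2 : 1;
    }
    return reg;
}

/* Register an argument lands in after the stub moves every argument
 * down by `shift` registers (shift is 1 for the syscall() code above). */
int after_shift(int reg, int shift)
{
    return reg - shift;
}

/* In this model a 64-bit pair is only valid if it starts on an odd register. */
int pair_is_aligned(int first_reg)
{
    return (first_reg % 2) == 1;
}
```

With the five arguments of the example (sysno, fd, mode, offset, len, only
offset marked 64-bit as in the listing above), the model puts offset in r7/r8
on entry and in r6/r7 after a shift of one, which no longer starts on an odd
register. With a dummy leading argument and a shift of two, as in the
__syscall(int dummy, int sysno, ...) idea quoted below, it lands in r5/r6.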

For syscall(), the vararg parameters will be mirrored to the parameter save
area but will not be used. The ABI does not address little-endian (LE) for
this case.

Ryan, does the new ABI doc cover this?

> This will break because the first argument to syscall now shifts
> everything by one register, which breaks the register pair alignment
> (and I suppose archs with stack based calling convention can have
> similar alignment issues even if x86 doesn't).
>
> Ulrich, Steven, shouldn't we have glibc's syscall() take a long long as
> its first argument to correct that ? Either that or making it some kind
> of macro wrapper around a __syscall(int dummy, int sysno, ...) ?
>
> As it is, any 32-bit app using syscall() on any of the syscalls that
> takes 64-bit arguments will be broken, unless the app itself breaks up
> the argument, but the order of the hi and lo part is different
> between BE and LE architectures ;-)
>
> So is there a more "correct" solution than another here ? Should powerpc
> glibc be fixed at least so that syscall() keeps the alignment ?
>
> Cheers,
> Ben.
>
>
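
A minimal sketch of what "the app itself breaks up the argument" could look
like. The struct and function names are hypothetical. Byte order is passed in
explicitly so that both the BE and LE orderings mentioned above can be
checked; host_is_big_endian() is one way a caller might supply it.

```c
#include <assert.h>
#include <stdint.h>

/* The two 32-bit halves of a 64-bit argument, in the order in which they
 * would be passed as consecutive 32-bit syscall() arguments. */
struct arg_pair {
    uint32_t first;
    uint32_t second;
};

/* Split a 64-bit value: high word first on big-endian, low word first
 * on little-endian. */
struct arg_pair split_arg64(uint64_t value, int big_endian)
{
    uint32_t hi = (uint32_t)(value >> 32);
    uint32_t lo = (uint32_t)(value & 0xffffffffu);
    struct arg_pair p;
    if (big_endian) {
        p.first = hi;
        p.second = lo;
    } else {
        p.first = lo;
        p.second = hi;
    }
    return p;
}

/* Inverse of split_arg64(), used below to check the split is lossless. */
uint64_t join_arg64(struct arg_pair p, int big_endian)
{
    uint32_t hi = big_endian ? p.first : p.second;
    uint32_t lo = big_endian ? p.second : p.first;
    return ((uint64_t)hi << 32) | lo;
}

/* Detect the byte order of the machine this code runs on. */
int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 0;
}

/* Self-check: splitting and joining must round-trip in both orders. */
void check_round_trip(uint64_t value)
{
    assert(join_arg64(split_arg64(value, 1), 1) == value);
    assert(join_arg64(split_arg64(value, 0), 0) == value);
}
```

On big-endian ppc32 a caller could then pass the halves as separate 32-bit
arguments, for example syscall(SYS_fallocate, fd, mode, off.first, off.second,
len.first, len.second), so that nothing depends on pair alignment inside
syscall(). Whether the kernel entry point expects exactly this layout is an
assumption here and differs between architectures; some may also need an
explicit padding argument.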

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jamie Lokier on
Benjamin Herrenschmidt wrote:
> err = syscall(SYS_fallocate, fd, mode, offset, len);
>
> With "offset" being a 64-bit argument.
>
> This will break because the first argument to syscall now shifts
> everything by one register, which breaks the register pair alignment
> (and I suppose archs with stack based calling convention can have
> similar alignment issues even if x86 doesn't).
>
> Ulrich, Steven, shouldn't we have glibc's syscall() take a long long as
> its first argument to correct that ? Either that or making it some kind
> of macro wrapper around a __syscall(int dummy, int sysno, ...) ?
>
> As it is, any 32-bit app using syscall() on any of the syscalls that
> takes 64-bit arguments will be broken, unless the app itself breaks up
> the argument, but the order of the hi and lo part is different
> between BE and LE architectures ;-)
>
> So is there a more "correct" solution than another here ? Should powerpc
> glibc be fixed at least so that syscall() keeps the alignment ?

There are several problems with syscall(), not just this - because a
number of system calls in section 2 of the manual don't map directly
to kernel syscalls with the same function prototype.

Even fork() has become something complicated in Glibc that doesn't use
the fork syscall :-(

So anything using syscall() has to be careful on Linux already.
Changing the 64-bit alignment won't fix the other differences.

-- Jamie
From: H. Peter Anvin on
On 03/15/2010 06:44 AM, Ralf Baechle wrote:
>
> Syscall is most often used for new syscalls that have no syscall stub in
> glibc yet, so the user of syscall() encodes this ABI knowledge. If at a
> later stage syscall() is changed to have this sort of knowledge we break
> the API. This is something only the kernel can get right.
>

One option would be to do a libkernel.so, with auto-generated stubs out
of the kernel build tree. As already discussed in #kernel this morning,
there are a number of sticky points with types and namespaces for
this, but those aren't any worse than the equivalent problems for
syscall(3).

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

From: Ulrich Drepper on
On 03/15/2010 08:13 AM, H. Peter Anvin wrote:
> One option would be to do a libkernel.so,

No need. Put it in the vdso. And name it something other than syscall.
The syscall() API is fixed, you cannot change it.

All this only if it makes sense for ALL archs. If it cannot work for
just one arch then it's not worth it at all.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
From: David Miller on
From: Ulrich Drepper <drepper(a)redhat.com>
Date: Mon, 15 Mar 2010 09:00:55 -0700

> On 03/15/2010 08:13 AM, H. Peter Anvin wrote:
>> One option would be to do a libkernel.so,
>
> No need. Put it in the vdso. And name it something other than syscall.
> The syscall() API is fixed, you cannot change it.
>
> All this only if it makes sense for ALL archs. If it cannot work for
> just one arch then it's not worth it at all.

There are many archs that still lack VDSO.