From: Benjamin Herrenschmidt on
Hoy there !

This may have been discussed earlier (I have some vague memories...) but
I just hit a problem with that again (Mark: hint, it's in hdparm's
fallocate) so I'd like a bit of a refresh here on what is the "right
thing" to do...

So some syscalls want a 64-bit argument. Let's take fallocate() as our
example. We already know that we have to be extra careful, since some
32-bit archs pass such an argument in 2 registers (or stack slots) which
need to be aligned. So we already take care of making sure that the
said 64-bit argument is either defined as 2x32-bit arguments, or
defined as 1x64-bit argument aligned to 2x32-bit in the argument list.

So far so good...

The problem is when user space tries to use the same trick for calling
those functions using the glibc-provided syscall() function. In this
example, hdparm does:

err = syscall(SYS_fallocate, fd, mode, offset, len);

With "offset" being a 64-bit argument.

This will break because the first argument to syscall() (the syscall
number) now shifts everything by one register, which breaks the register
pair alignment (and I suppose archs with stack based calling convention
can have similar alignment issues even if x86 doesn't).
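
To make the shift concrete, here is a minimal model of register
assignment. It is only a sketch under stated assumptions: arguments go
into consecutive registers starting at r3, and a 64-bit argument must
start on an odd-numbered register (r3, r5, r7, ...), which is how I
understand the 32-bit PowerPC convention. The names first_reg_of and
FIRST_ARG_REG are made up for illustration and belong to no real API.

```c
#include <assert.h>
#include <stddef.h>

enum { FIRST_ARG_REG = 3 };  /* assumption: arguments start at r3 */

/*
 * widths[i] is 1 for a 32-bit argument and 2 for a 64-bit one.
 * Returns the register in which argument `index` starts, skipping one
 * register whenever a 64-bit argument would otherwise start on an
 * unaligned (even-numbered) register.
 */
int first_reg_of(const int *widths, size_t nargs, size_t index)
{
    int reg = FIRST_ARG_REG;

    assert(widths != NULL && index < nargs);
    for (size_t i = 0; i < index; i++) {
        if (widths[i] == 2 && (reg - FIRST_ARG_REG) % 2 != 0)
            reg++;              /* padding register before a pair */
        reg += widths[i];
    }
    if (widths[index] == 2 && (reg - FIRST_ARG_REG) % 2 != 0)
        reg++;
    return reg;
}
```

In this model, fallocate(fd, mode, offset, len) has widths {1, 1, 2, 2}
and "offset" starts at r5. Through syscall(nr, fd, mode, offset, len)
the widths are {1, 1, 1, 2, 2} and "offset" starts at r7. If the
wrapper then moves every register down by one to drop the syscall
number, "offset" arrives in r6 while the kernel looks for it in r5.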

Ulrich, Steven, shouldn't we have glibc's syscall() take a long long as
its first argument to correct that ? Either that or making it some kind
of macro wrapper around a __syscall(int dummy, int sysno, ...) ?

As it is, any 32-bit app using syscall() on any of the syscalls that
take 64-bit arguments will be broken, unless the app itself breaks up
the argument, but the order of the hi and lo part is different
between BE and LE architectures ;-)
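
For illustration, an app that breaks up the argument itself would need
something like the helper below. This is a sketch, not a recommended
interface: split64 is a hypothetical name, and it assumes the two halves
are simply passed in memory order (high word first on big endian, low
word first on little endian). Whether a given architecture really
expects that order, and where padding is needed, is ABI-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: split `value` into two 32-bit words in the
 * order they would be passed as consecutive arguments, assuming the
 * order follows the byte order of the target.
 */
void split64(uint64_t value, bool big_endian, uint32_t words[2])
{
    uint32_t hi = (uint32_t)(value >> 32);
    uint32_t lo = (uint32_t)(value & 0xffffffffu);

    assert(words != NULL);
    words[0] = big_endian ? hi : lo;
    words[1] = big_endian ? lo : hi;
}
```

The caller would then pass words[0] and words[1] as two separate 32-bit
arguments in place of the single 64-bit one, which is exactly the
per-architecture knowledge the app ends up carrying.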

So is there a more "correct" solution than another here ? Should powerpc
glibc be fixed at least so that syscall() keeps the alignment ?

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Miller on
From: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
Date: Mon, 15 Mar 2010 15:48:13 +1100

> As it is, any 32-bit app using syscall() on any of the syscalls that
> take 64-bit arguments will be broken, unless the app itself breaks up
> the argument, but the order of the hi and lo part is different
> between BE and LE architectures ;-)

I think it is even different on the same endian architectures,
f.e. mips I think.

There is no way to do this without some arch specific code
to handle things properly, really.
From: Benjamin Herrenschmidt on
On Sun, 2010-03-14 at 22:06 -0700, David Miller wrote:
> From: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
> Date: Mon, 15 Mar 2010 15:48:13 +1100
>
> > As it is, any 32-bit app using syscall() on any of the syscalls that
> > take 64-bit arguments will be broken, unless the app itself breaks up
> > the argument, but the order of the hi and lo part is different
> > between BE and LE architectures ;-)
>
> I think it is even different on the same endian architectures,
> f.e. mips I think.
>
> There is no way to do this without some arch specific code
> to handle things properly, really.

Right, but to what extent ? I.e. do we always need the callers using
syscall() directly to know it all, or can we to some extent handle some
of it inside glibc ?

For example, if powerpc glibc is fixed so that syscall() takes a 64-bit
first argument (or calls via some macro to add a dummy 32-bit argument),
the register alignment will be preserved, and things will work just
fine.
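
The reasoning can be checked with a small model. This is a sketch under
assumptions, not a description of any real glibc code: argument slots
are numbered from 0, a 64-bit argument must start on an even slot, and
the wrapper drops its own leading slots by shifting everything down by
that many positions. The names assign_slots, layout_survives_prefix and
MAX_SLOTS are hypothetical, and the model ignores the limit on how many
argument registers exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_SLOTS = 16 };  /* arbitrary bound for this model */

/* Assign a starting slot to each argument; widths are 1 or 2 slots. */
static void assign_slots(const int *widths, size_t n, int *slots)
{
    int next = 0;

    for (size_t i = 0; i < n; i++) {
        if (widths[i] == 2 && next % 2 != 0)
            next++;             /* pad so the pair starts aligned */
        slots[i] = next;
        next += widths[i];
    }
}

/*
 * Returns true if calling through a wrapper that consumes
 * `prefix_slots` leading 32-bit slots, then shifts everything down by
 * that amount, leaves every argument where a direct call puts it.
 */
bool layout_survives_prefix(const int *widths, size_t n, size_t prefix_slots)
{
    int direct[MAX_SLOTS], wrapped[MAX_SLOTS], padded[MAX_SLOTS];
    size_t total = prefix_slots + n;

    assert(widths != NULL && total <= MAX_SLOTS);
    for (size_t i = 0; i < prefix_slots; i++)
        padded[i] = 1;
    for (size_t i = 0; i < n; i++)
        padded[prefix_slots + i] = widths[i];

    assign_slots(widths, n, direct);
    assign_slots(padded, total, wrapped);

    for (size_t i = 0; i < n; i++)
        if (wrapped[prefix_slots + i] - (int)prefix_slots != direct[i])
            return false;
    return true;
}
```

With the fallocate() widths {1, 1, 2, 2}, a prefix of one slot (the
syscall number alone) does not preserve the layout in this model, while
a prefix of two slots (a 64-bit first argument, or a dummy plus the
syscall number) does, because shifting by an even number of slots keeps
the pair alignment.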

I.e. it may not fix all problems with all archs, but in this case, it
will fix the common cases for powerpc at least :-) And any other arch
that has the exact same alignment problem.

Or is there any good reason -not- to do that in glibc ?

Cheers,
Ben.


From: David Miller on
From: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
Date: Mon, 15 Mar 2010 16:18:33 +1100

> Or is there any good reason -not- to do that in glibc ?

The whole point of syscall() is to handle cases where the C library
doesn't know about the system call yet.

I think it's therefore very much "buyer beware".

On sparc it'll never work to use the workaround you're proposing since
we pass everything in via registers.

So arch knowledge will always need to be present in these situations.
From: Ralf Baechle on
On Mon, Mar 15, 2010 at 04:18:33PM +1100, Benjamin Herrenschmidt wrote:

> On Sun, 2010-03-14 at 22:06 -0700, David Miller wrote:
> > From: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
> > Date: Mon, 15 Mar 2010 15:48:13 +1100
> >
> > > As it is, any 32-bit app using syscall() on any of the syscalls that
> > > take 64-bit arguments will be broken, unless the app itself breaks up
> > > the argument, but the order of the hi and lo part is different
> > > between BE and LE architectures ;-)
> >
> > I think it is even different on the same endian architectures,
> > f.e. mips I think.

MIPS passes the two halves in endian order, that is low/high for little
endian and high/low for big endian.

> > There is no way to do this without some arch specific code
> > to handle things properly, really.
>
> Right, but to what extent ? IE. do we always need the callers using
> syscall() directly to know it all, or can we to some extent handle some
> of it inside glibc ?
>
> For example, if powerpc glibc is fixed so that syscall() takes a 64-bit
> first argument (or calls via some macro to add a dummy 32-bit argument),
> the register alignment will be preserved, and things will work just
> fine.
>
> IE. It may not fix all problems with all archs, but in this case, it
> will fix the common cases for powerpc at least :-) And any other arch
> that has the exact same alignment problem.
>
> Or is there any good reason -not- to do that in glibc ?

syscall() is most often used for new syscalls that have no syscall stub
in glibc yet, so the user of syscall() encodes this ABI knowledge. If at
a later stage syscall() is changed to have this sort of knowledge, we
break the API. This is something only the kernel can get right.

Ralf