From: H. Peter Anvin on
On 03/15/2010 12:00 PM, David Miller wrote:
> From: Ulrich Drepper <drepper(a)redhat.com>
> Date: Mon, 15 Mar 2010 09:00:55 -0700
>
>> On 03/15/2010 08:13 AM, H. Peter Anvin wrote:
>>> One option would be to do a libkernel.so,
>>
>> No need. Put it in the vdso. And name it something other than syscall.
>> The syscall() API is fixed, you cannot change it.
>>
>> All this only if it makes sense for ALL archs. If it cannot work for
>> just one arch then it's not worth it at all.
>
> There are many archs that still lack VDSO.

Putting it into the vdso is also rather annoyingly heavyweight for what
is nothing other than an ordinary shared library. Just making it an
ordinary shared library seems a lot saner.

I don't see why syscall() can't change the type for its first argument
-- it seems to be exactly what symbol versioning is for.

Doesn't change the fact that it is fundamentally broken, of course.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Benjamin Herrenschmidt on
On Mon, 2010-03-15 at 14:44 +0100, Ralf Baechle wrote:
> Syscall is most often used for new syscalls that have no syscall stub in
> glibc yet, so the user of syscall() encodes this ABI knowledge. If at a
> later stage syscall() is changed to have this sort of knowledge we break
> the API. This is something only the kernel can get right.

Well, no. The change I propose would not break the ABI on powerpc and
would auto-magically fix those cases :-) But again, you don't have to
do the same thing on MIPS or sparc, it's definitely arch specific.

I.e., what you are saying is that a syscall defined in the kernel as:

sys_foo(u64 arg);

To be called from userspace would require something like:

u64 arg = 0x0123456789abcdefULL;

#if defined(__powerpc__) && __WORDSIZE == 32
syscall(SYS_foo, (u32)(arg >> 32), (u32)arg);
#else
syscall(SYS_foo, arg);
#endif

While with the trick of making syscall a macro wrapping an underlying
__syscall that has an added dummy argument, the register alignment is
"corrected" and thus -both- forms above suddenly work for me. That might
actually work for you too.
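
As a minimal sketch of what the caller has to do by hand today, the
helper below splits a 64-bit value into the two 32-bit halves in the
order they would be passed. The name split_u64 and the big_endian flag
are made up for illustration. The hi-first order matches the ppc32
example above; the lo-first order for little-endian is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of glibc or the kernel.
 * Fills pair[0] and pair[1] with the halves of v in passing order. */
static void split_u64(uint64_t v, int big_endian, uint32_t pair[2])
{
    uint32_t hi = (uint32_t)(v >> 32);
    uint32_t lo = (uint32_t)v;

    assert(pair);                     /* caller must supply storage */
    pair[0] = big_endian ? hi : lo;   /* first half passed */
    pair[1] = big_endian ? lo : hi;   /* second half passed */
}
```

A caller would then pass pair[0] and pair[1] as two separate arguments,
which is exactly the per-arch knowledge the macro trick tries to avoid.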

Cheers,
Ben.


From: Benjamin Herrenschmidt on
On Sun, 2010-03-14 at 22:54 -0700, David Miller wrote:
> From: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
> Date: Mon, 15 Mar 2010 16:18:33 +1100
>
> > Or is there any good reason -not- to do that in glibc ?
>
> The whole point of syscall() is to handle cases where the C library
> doesn't know about the system call yet.
>
> I think it's therefore very much "buyer beware".
>
> On sparc it'll never work to use the workaround you're proposing since
> we pass everything in via registers.
>
> So arch knowledge will always need to be present in these situations.

I'm not sure I follow. We also pass via register on powerpc, but the
offset introduced by the sysno argument breaks register pair alignment
which cannot be fixed up inside syscall().

However, if I change glibc's syscall to be something like

#define syscall(sysno, args...) __syscall(0 /* dummy */, sysno, args)

And make __syscall then do something like:

mr r0, r4
mr r3, r5
mr r4, r6
mr r5, r7
mr r6, r8
.../...
sc
blr

Then at least all that class of syscalls will be fixed. Of course this
has to be in glibc arch code. I was merely asking if that was something
our glibc folks would consider and whether somebody could think of a
better solution :-)
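
To show only the macro side of this, here is a rough sketch with a mock
in place of the real assembly stub. mock_syscall2, my_syscall and the
two last_* variables are hypothetical names. The mock performs no
system call and ignores its variadic arguments, so it says nothing
about register placement; it only shows that callers keep writing the
same call while the dummy is slipped in front.

```c
#include <assert.h>

/* Values seen by the most recent call, kept for inspection. */
static int  last_dummy = -1;
static long last_sysno = -1;

/* Mock stand-in for the proposed __syscall(). It only records what it
 * was given and returns the syscall number. */
static long mock_syscall2(int dummy, long sysno, ...)
{
    assert(dummy == 0);   /* the wrapper macro always passes 0 */
    last_dummy = dummy;
    last_sysno = sysno;
    return sysno;
}

/* C99 form of the wrapper. The syscall number is folded into
 * __VA_ARGS__ so that a call with no further arguments still expands
 * without a trailing comma. */
#define my_syscall(...) mock_syscall2(0 /* dummy */, __VA_ARGS__)
```

Folding the number into the variadic part is a choice made for this
sketch. The GNU "args..." form above would need ", ## args" to cope
with a syscall that takes no arguments.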

Cheers,
Ben.


From: Benjamin Herrenschmidt on
On Mon, 2010-03-15 at 12:41 -0700, H. Peter Anvin wrote:
> I don't see why syscall() can't change the type for its first argument
> -- it seems to be exactly what symbol versioning is for.
>
> Doesn't change the fact that it is fundamentally broken, of course.

No need to change the type of the first arg or resort to symbol
versioning. If you do something like I proposed earlier, there will be
no conflict between syscall() and __syscall(), and both variants can
exist.

Cheers,
Ben.


From: Benjamin Herrenschmidt on

> The powerpc implementation of syscall is:
>
>
> ENTRY (syscall)
> mr r0,r3
> mr r3,r4
> mr r4,r5
> mr r5,r6
> mr r6,r7
> mr r7,r8
> mr r8,r9
> sc
> PSEUDO_RET
> PSEUDO_END (syscall)

And my proposal is to make it instead:

#define syscall(__sysno, __args...) __syscall(0,__sysno,__args)

ENTRY (__syscall)
mr r0,r4
mr r3,r5
mr r4,r6
mr r5,r7
mr r6,r8
mr r7,r9
mr r8,r10
sc
PSEUDO_RET
PSEUDO_END (__syscall)

> The ABI says:
>
> "Long long arguments are considered to have 8-byte size and alignment.
> The same 8-byte arguments that must go in aligned pairs of registers are
> 8-byte aligned on the stack."

Right, that's what I'm explaining too.

> This implies that the SYS_fallocate call will skip a register to get the
> required alignment in the parameter save area.
>
> for ppc32 on entry
>
> r3 == SYS_fallocate
> r4 == fd
> r5 == mode
> r6 == not used
> r7, r8 == offset
> r9 == len

len is 64-bit too afaik but let's ignore that for now

> This gets shifted to:
>
> r0 == SYS_fallocate
> r3 == fd
> r4 == mode
> r5 == not used
> r6, r7 == offset
> r8 == len

Which is not correct, as the kernel expects:

r0 == SYS_fallocate
r3 == fd
r4 == mode
r5, r6 == offset
r7, r8 == len

> For syscall the vararg parms will be mirrored to the parameter save area
> but will not be used. The ABI does not talk to LE for this case.

Right, but the fact that we shift all args by -1- register means that we
break the 64-bit register pair alignment compared to the real syscall
which uses r0 instead for the syscall number. Hence my proposal to add
a dummy argument to restore that alignment.

As it is there is userspace code that does:

syscall(SYS_fallocate, fd, mode, offset, len);

Which works on x86 but is broken on ppc32 unless we do that change.
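
The register tables above can be reproduced with a small model. This is
a sketch under simplifying assumptions: arguments go in r3..r10, a
64-bit argument must start on an odd register (r3, r5, r7, r9), and
once an argument spills to the stack all later ones do too. The names
assign_regs and kernel_view are hypothetical, and this models only the
rule quoted from the ABI, not the full calling convention.

```c
#include <assert.h>
#include <stddef.h>

enum { FIRST_ARG_REG = 3, LAST_ARG_REG = 10, MAX_ARGS = 16 };

/* words[i] is 1 for a 32-bit argument, 2 for a 64-bit one.
 * out[i] receives the first register of argument i, or -1 if it
 * would be passed on the stack. */
static void assign_regs(const int *words, size_t n, int *out)
{
    int next = FIRST_ARG_REG;

    for (size_t i = 0; i < n; i++) {
        if (words[i] == 2 && next % 2 == 0)
            next++;                        /* skip to an odd register */
        if (next + words[i] - 1 > LAST_ARG_REG) {
            out[i] = -1;                   /* no room left in registers */
            next = LAST_ARG_REG + 1;
            continue;
        }
        out[i] = next;
        next += words[i];
    }
}

/* Where each real syscall argument sits once the wrapper has moved
 * every register down by `prefix` slots. prefix is the number of
 * leading 32-bit arguments the kernel does not see in r3 onwards:
 * 0 for a direct call, 1 for syscall(sysno, ...),
 * 2 for __syscall(dummy, sysno, ...). */
static void kernel_view(const int *arg_words, size_t n, int prefix, int *out)
{
    int words[MAX_ARGS];
    int regs[MAX_ARGS];
    size_t total = n + (size_t)prefix;

    assert(prefix >= 0 && total <= MAX_ARGS);
    for (int i = 0; i < prefix; i++)
        words[i] = 1;
    for (size_t i = 0; i < n; i++)
        words[(size_t)prefix + i] = arg_words[i];

    assign_regs(words, total, regs);

    for (size_t i = 0; i < n; i++) {
        int r = regs[(size_t)prefix + i];
        out[i] = r < 0 ? -1 : r - prefix;
    }
}
```

With widths {1, 1, 2, 2} for (fd, mode, offset, len), prefix 1 puts
offset at r6 and len at r8, while prefix 0 and prefix 2 both give r5
and r7, which is what the kernel expects.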

Cheers,
Ben.

> Ryan does the new ABI doc cover this?
>
> > This will break because the first argument to syscall now shifts
> > everything by one register, which breaks the register pair alignment
> > (and I suppose archs with stack based calling convention can have
> > similar alignment issues even if x86 doesn't).
> >
> > Ulrich, Steven, shouldn't we have glibc's syscall() take a long long as
> > its first argument to correct that ? Either that or making it some kind
> > of macro wrapper around a __syscall(int dummy, int sysno, ...) ?
> >
> > As it is, any 32-bit app using syscall() on any of the syscalls that
> > takes 64-bit arguments will be broken, unless the app itself breaks up
> > the argument, but the order of the hi and lo part is different
> > between BE and LE architectures ;-)
> >
> > So is there a more "correct" solution than another here ? Should powerpc
> > glibc be fixed at least so that syscall() keeps the alignment ?
> >
> > Cheers,
> > Ben.
> >
> >
>