From: Luca Barbieri on
> Depends on where on the stack you're going to save things, I thought
> you'd take space in the thread_info struct, but I guess if you're simply
> going to push the reg onto the stack it should be fine.

Yes, this seems the best solution.
With frame pointers enabled, it's just a single andl $-8, %esp to
align the stack (otherwise, frame pointers are forced by gcc).
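
For readers less familiar with the idiom, here is a minimal user-space
sketch of what that andl computes. The function name align_down_8 is
made up for illustration and is not from the patch. Masking with -8
clears the low three bits, so the value is rounded down to a multiple
of 8; because the x86 stack grows downward, rounding down only ever
reserves extra space.

```c
#include <stdint.h>

/*
 * Illustrative only: mirrors the arithmetic of `andl $-8, %esp`.
 * In two's complement, -8 has every bit set except the low three,
 * so the AND rounds sp down to the nearest 8-byte boundary.
 */
static uintptr_t align_down_8(uintptr_t sp)
{
    return sp & (uintptr_t)-8;
}
```

For example, align_down_8(0x1003) gives 0x1000, and an already aligned
value such as 0x1008 is left unchanged.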
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: H. Peter Anvin on
On 02/18/2010 02:27 AM, Luca Barbieri wrote:
>> CR changes are slow and synchronize the CPU. The latter is always slow.
>>
>> It sounds like you didn't time it?
> I didn't, because I think it strongly depends on the microarchitecture
> and I don't have a comprehensive set of machines to test on, so it
> would just be a single data point.
>
> The lock prefix on cmpxchg8b is also serializing so it might be as bad.

No. LOCK isn't serializing in the same way CRx writes are.


> Anyway, if we use this, we should keep TS cleared in kernel mode and
> lazily restore it on return to userspace.
> This would make clts/stts performance mostly moot.

This is what kernel_fpu_begin/kernel_fpu_end is all about. We
definitely cannot leave TS cleared without the user space CPU state
moved to its home location, or we have yet another complicated state to
worry about.

I really feel that without a *strong* use case for this, there is
absolutely no point.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

From: H. Peter Anvin on
On 02/18/2010 02:27 AM, Luca Barbieri wrote:
>> So why don't you simply use normal asm inputs/outputs?
> I do, on the caller side.
>
> In the callee, I don't see any other robust way to implement parameter
> passing in ebx/esi other than global register variables (without
> resorting to pure assembly, which would prevent reusing the generic
> atomic64 implementation).

This really sounds like the wrong optimization. These functions aren't
exactly all that complex in assembly (which would also allow them to be
simple cli/do stuff/sti), and instead relying on gcc features which may
or may not be well supported on x86 is inviting breakage down the line.

That is particularly damaging, since the remaining 486-class users tend
to be deeply embedded and thus we only find problems later.
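
To illustrate the "cli/do stuff/sti" shape mentioned above, here is a
rough user-space model of a 64-bit compare-and-exchange for a
uniprocessor CPU without cmpxchg8b. It is a sketch under assumptions,
not the kernel's code: model_irq_disable(), model_irq_enable() and
model_cmpxchg64() are hypothetical names, and the two irq helpers are
empty stand-ins for cli and sti so that the logic can be compiled and
tried outside the kernel.

```c
#include <stdint.h>

/* Hypothetical stand-ins for cli/sti; no-ops in this user-space model. */
static void model_irq_disable(void) { }
static void model_irq_enable(void) { }

/*
 * Model of a compare-and-exchange on a 64-bit value.
 * Returns the value found in *p; stores new_val only if *p equalled old.
 * On a uniprocessor, keeping interrupts off between the load and the
 * store is what would make the sequence atomic.
 */
static uint64_t model_cmpxchg64(uint64_t *p, uint64_t old, uint64_t new_val)
{
    uint64_t cur;

    model_irq_disable();
    cur = *p;
    if (cur == old)
        *p = new_val;
    model_irq_enable();

    return cur;
}
```

The caller compares the return value with old to learn whether the
store happened, which matches how cmpxchg-style primitives are
normally used.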

-hpa
From: Luca Barbieri on
> This is what kernel_fpu_begin/kernel_fpu_end is all about. We
> definitely cannot leave TS cleared without the user space CPU state
> moved to its home location, or we have yet another complicated state to
> worry about.

It should be relatively simple to handle, since the current code
doesn't really rely on the TS flag but uses TS_USEDFPU.
It would mostly be a matter of making sure TS is restored on return to
userspace if necessary.

> I really feel that without a *strong* use case for this, there is
> absolutely no point.
For the specific 32-bit atomic64_t case, it is an improvement, but not
necessarily significant in the big picture.
Being able to efficiently use SSE in the kernel might however be more
broadly useful.
memcpy/memset/etc. (assuming SSE is the best option for these at least
on some processors) and checksums come to mind.
Also non-temporal SSE moves might be useful for things like memory
compaction without clobbering caches.
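
As a concrete picture of what a non-temporal SSE copy looks like, here
is a small user-space sketch using the SSE2 intrinsics from
<emmintrin.h>. The function name copy_nontemporal is made up for
illustration. It assumes an x86 compiler with SSE2 enabled, a
destination aligned to 16 bytes and a length that is a multiple of 16.
In the kernel the same loop would additionally need the FPU/SSE state
handled around it, which is the point under discussion in this thread.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>

/*
 * Illustrative only. Copies len bytes using non-temporal stores, which
 * ask the CPU to write the data without keeping it in the caches.
 * Assumes: dst is 16-byte aligned and len is a multiple of 16.
 */
static void copy_nontemporal(void *dst, const void *src, size_t len)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    size_t i;

    for (i = 0; i < len / 16; i++)
        _mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));

    /* Make the streaming stores globally visible before returning. */
    _mm_sfence();
}
```

Whether this is actually faster than a plain copy depends on the
processor and the buffer size, so it would have to be measured.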
From: H. Peter Anvin on
On 02/18/2010 10:14 AM, Luca Barbieri wrote:
>
>> I really feel that without a *strong* use case for this, there is
>> absolutely no point.
> For the specific 32-bit atomic64_t case, it is an improvement, but not
> necessarily significant in the big picture.
> Being able to efficiently use SSE in the kernel might however be more
> broadly useful.
> memcpy/memset/etc. (assuming SSE is the best option for these at least
> on some processors) and checksums come to mind.
> Also non-temporal SSE moves might be useful for things like memory
> compaction without clobbering caches.

We already do that kind of stuff, using
kernel_fpu_begin()..kernel_fpu_end(). We went through some pain a bit
ago to clean up "private hacks" that complicated things substantially.

-hpa