From: drepper on
On Tue, Apr 6, 2010 at 16:16, Thomas Gleixner <tglx(a)linutronix.de> wrote:
> I know that you can do any weird stuff with the futex value, but I
> don't see the "dramatic" limitation. Care to elaborate ?

If we have to fill in the PID we can represent only three states in a futex: 0, PID, -PID. Today we can represent 2^32 states. Quite a difference.
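
To make the constraint concrete, here is a small C sketch that decodes a TID-based futex word. The bit values mirror FUTEX_WAITERS and FUTEX_TID_MASK from <linux/futex.h>, but they are redefined under hypothetical SKETCH_ names so the example builds without kernel headers. Under that policy every value of the word must mean unlocked, owned by a TID, or owned with waiters. A plain futex leaves all 2^32 values to the application.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the values mirror FUTEX_WAITERS and
   FUTEX_TID_MASK from <linux/futex.h>. */
#define SKETCH_FUTEX_WAITERS  0x80000000u
#define SKETCH_FUTEX_TID_MASK 0x3fffffffu

struct tid_word {
    bool     locked;     /* a nonzero TID means some thread owns it */
    bool     waiters;    /* top bit: someone may be blocked in the kernel */
    uint32_t owner_tid;  /* owning thread, 0 if unlocked */
};

/* Under the TID policy every futex value decodes to one of:
   0 (unlocked), TID (owned), or TID plus the waiters bit (contended). */
static struct tid_word decode_tid_word(uint32_t word)
{
    struct tid_word w;
    w.owner_tid = word & SKETCH_FUTEX_TID_MASK;
    w.locked    = w.owner_tid != 0;
    w.waiters   = (word & SKETCH_FUTEX_WAITERS) != 0;
    return w;
}
```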


> The per thread pinned page would be unconditional, right ?

Only if the process uses these adaptive mutexes. It could be conditional.
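
One way to make the page conditional is lazy registration. Nothing is set up until a thread first reaches the adaptive-mutex slow path. The sketch below is hypothetical throughout. struct sched_hint_page, hint_page_for_current_thread() and the registrations counter are invented names, and calloc() stands in for whatever pinned-page registration the kernel would actually provide.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical per-thread page the kernel would keep up to date. */
struct sched_hint_page {
    volatile int on_cpu;   /* hypothetically written by the kernel: 1 while running */
};

/* Counts registrations so the lazy behaviour can be observed. */
static atomic_int registrations;

/* Each thread has its own pointer, NULL until first adaptive-mutex use. */
static _Thread_local struct sched_hint_page *tls_hint_page;

/* Called only from the adaptive-mutex slow path, so threads (and
   processes) that never use adaptive mutexes never pay for the page. */
static struct sched_hint_page *hint_page_for_current_thread(void)
{
    if (tls_hint_page == NULL) {
        /* Stand-in for a registration syscall that would pin a page. */
        tls_hint_page = calloc(1, sizeof *tls_hint_page);
        if (tls_hint_page != NULL)
            atomic_fetch_add(&registrations, 1);
    }
    return tls_hint_page;
}
```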


> I agree that benchmarking would be interesting, but OTOH I fear that
> we open up a huge can of worms with exposing scheduler details and the
> related necessary syscalls like sys_yield_to: User space thread
> management/scheduling comes to my mind and I hope we agree that we do
> not want to revisit that.

I'm not sure. We never got to the bottom of this. Why are these details that should not be disclosed? It's clear that descheduling happens, and the sys_yield_to syscall would not require anything to happen. It would merely tell the kernel about execution dependencies that the kernel cannot necessarily discover on its own, at least not efficiently.


> Useful for what ?

We already have places where we could spin a bit using sys_yield_to because we know what we are waiting on.
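
A minimal sketch of the spin-a-bit pattern using C11 atomics. spin_then_yield_wait() and its limits are hypothetical. sched_yield() is the existing POSIX call. It only stands in for the proposed sys_yield_to, since it cannot tell the kernel which thread the caller is waiting on.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Wait for *flag to become nonzero: spin briefly in user space, then
   start giving up the CPU. Returns false if the flag was not observed
   within the given budget. */
static bool spin_then_yield_wait(atomic_int *flag, int spin_limit, int max_yields)
{
    for (int i = 0; i < spin_limit; i++)
        if (atomic_load_explicit(flag, memory_order_acquire))
            return true;

    for (int i = 0; i < max_yields; i++) {
        /* With sys_yield_to(owner) the kernel could run the thread we
           depend on; sched_yield() only offers the CPU to anyone. */
        sched_yield();
        if (atomic_load_explicit(flag, memory_order_acquire))
            return true;
    }
    return false;
}
```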


> What are the exact semantics of such a syscall ?

It gives the kernel the hint that the current thread is willing to hand over the remaining time on the timeslice to the target thread. This target thread, if sleeping, can immediately make progress. Yes, this might mean moving the target thread to the core executing the yielding thread. Perhaps this doesn't make sense in some situations. In that case the syscall could be a no-op, perhaps indicating this in the return value.
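
The semantics described here, a pure hint that may be a no-op and reports whether it did anything, can be captured in a wrapper. Everything named below is hypothetical. There is no yield_to system call, so yield_to_hint() always takes the fallback path, calls sched_yield(), and returns YIELD_TO_NOT_DONE. A kernel that performed the transfer would return YIELD_TO_DONE instead.

```c
#include <sched.h>
#include <sys/types.h>

/* Hypothetical result codes for a hint-only yield_to. */
enum yield_to_result {
    YIELD_TO_NOT_DONE = 0,  /* hint ignored, e.g. scheduling constraints */
    YIELD_TO_DONE     = 1,  /* remaining timeslice handed to the target */
};

/* Hypothetical wrapper: offer the rest of our timeslice to `target`.
   Callers must treat the result as advisory; correctness can never
   depend on the transfer having happened. */
static enum yield_to_result yield_to_hint(pid_t target)
{
    (void)target;   /* no kernel interface to pass it to in this sketch */
    sched_yield();  /* closest existing approximation */
    return YIELD_TO_NOT_DONE;
}
```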


> How does that fit into the various scheduling constraints ?

I don't know enough about all the constraints. As I said, it could be a hint. If the constraints forbid the timeslice transfer it need not happen.
From: drepper on
On Wed, Apr 7, 2010 at 20:41, Darren Hart <dvhltc(a)us.ibm.com> wrote:
> For general futexes sure, but not for robust or PI mutexes. Having adaptive
> futexes be based on the TID|WAITERS_FLAG policy certainly isn't breaking new
> ground, and is consistent with the other kernel-side futex locking
> implementations.

PI mutexes are really unimportant in the big world. I know your employer cares but overall it's a minute fraction. The focus should be primarily on the normal futexes.

BTW, you want to stuff a flag in the futex word? This doesn't work in general. For a mutex we need three distinct values. For PI futexes it's 0, TID and -TID. If we have 31-bit TID values there isn't enough room for another bit.
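
For reference, here is a rough sketch of the usual three-state futex mutex (0 = unlocked, 1 = locked, 2 = locked with possible waiters), along the lines of the design in "Futexes Are Tricky". It is Linux-only and uses the raw futex(2) system call with FUTEX_WAIT_PRIVATE and FUTEX_WAKE_PRIVATE. The function names are made up for the example. The cast from atomic_int * to int * for the syscall works on Linux, but the C standard does not guarantee it.

```c
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

static void futex_wait(atomic_int *f, int expected)
{
    /* Sleeps only if *f still equals `expected`; spurious returns are fine. */
    syscall(SYS_futex, (int *)f, FUTEX_WAIT_PRIVATE, expected, NULL, NULL, 0);
}

static void futex_wake_one(atomic_int *f)
{
    syscall(SYS_futex, (int *)f, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}

/* States: 0 unlocked, 1 locked, 2 locked and maybe contended. */
static void mutex3_lock(atomic_int *f)
{
    int c = 0;
    if (atomic_compare_exchange_strong(f, &c, 1))
        return;                      /* fast path: 0 -> 1 */
    if (c != 2)
        c = atomic_exchange(f, 2);   /* announce that we may sleep */
    while (c != 0) {
        futex_wait(f, 2);
        c = atomic_exchange(f, 2);   /* acquire, staying marked contended */
    }
}

static void mutex3_unlock(atomic_int *f)
{
    if (atomic_fetch_sub(f, 1) != 1) {  /* was 2: there may be waiters */
        atomic_store(f, 0);
        futex_wake_one(f);
    }
}
```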


> What about the concern of this TLS approach only working for process private
> locks? I would very much like to make this work for both shared and private
> locks.

Again, hardly a general concern. Only a minute fraction of mutexes are shared.

It should be clear that the kernel approach and the userlevel approach are complementary and could even be used together. If the userlevel variant proves significantly faster (and I assume it will), the kernel variant could be used for shared mutexes etc.
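
The split suggested here uses a user-level adaptive path for process-private mutexes and the kernel variant for shared ones. The choice could be made when the mutex is initialized. pthread_mutexattr_getpshared() and the PTHREAD_PROCESS_* constants are standard POSIX. enum lock_strategy and pick_strategy() are hypothetical names for illustration.

```c
#include <pthread.h>

/* Hypothetical implementation choices for adaptive spinning. */
enum lock_strategy {
    STRATEGY_USERLEVEL_ADAPTIVE,  /* TLS-based spinning, private only */
    STRATEGY_KERNEL_ADAPTIVE,     /* kernel-side spinning, also shared */
};

/* A TLS-based scheme only sees threads of the current process, so
   mutexes that may be shared between processes use the kernel path. */
static enum lock_strategy pick_strategy(const pthread_mutexattr_t *attr)
{
    int pshared = PTHREAD_PROCESS_PRIVATE;
    if (pthread_mutexattr_getpshared(attr, &pshared) != 0)
        return STRATEGY_KERNEL_ADAPTIVE;  /* be conservative on error */
    return pshared == PTHREAD_PROCESS_SHARED ? STRATEGY_KERNEL_ADAPTIVE
                                             : STRATEGY_USERLEVEL_ADAPTIVE;
}
```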