From: Ulrich Drepper
On Tue, Apr 6, 2010 at 01:48, Peter Zijlstra <peterz(a)infradead.org> wrote:
>  try
>  spin
>  try
>  syscall

This has been available for a long time in the mutex implementation
(the PTHREAD_MUTEX_ADAPTIVE_NP mutex type). It hasn't shown much
improvement, if any. There were some people demanding this support,
but as far as I know they are not using it now. This is adaptive
spinning, learning from previous calls how long to wait. But it's
still unguided: there is no way to get information like "the owner
has been descheduled".
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ulrich Drepper
On Tue, Apr 6, 2010 at 09:44, Alan Cox <alan(a)lxorguk.ukuu.org.uk> wrote:
> That gives you something along the lines of
>
>        runaddr = find_run_flag(lock);
>        do {
>                while(*runaddr == RUNNING) {
>                        if (trylock(lock))
>                                return WHOOPEE;
>                        cpu relax
>                }
>                yield (_on(thread));
>        } while(*runaddr != DEAD);

There still has to be an upper limit on the number of rounds of the
wait loop (some locks are held for a long time), since otherwise CPUs
are tied up unnecessarily long. And the DEAD case is only for robust
mutex handling. But in theory I agree.

We already have the set_tid_address syscall. This could be
generalized with a new syscall which can provide the kernel with more
than one pointer to store "stuff" in: TIDs, scheduling info, etc.

The non-swappable part will be tricky. One doesn't know how many
threads will be created in a process. This mechanism shouldn't put an
arbitrary limit in place. So where to allocate the memory? Perhaps
it's better to implicitly mark the memory page pointed to by the new
syscall as non-swappable? This could mean one page per thread...
From: Ulrich Drepper
On Tue, Apr 6, 2010 at 12:31, Thomas Gleixner <tglx(a)linutronix.de> wrote:
> We need to figure out a more efficient way to
> do the spinning in the kernel where we have all the necessary
> information already.

Really? The owner information isn't in general available in the
kernel. Futex operations don't require the value used to be the PID
(or the negative of the PID). Requiring it would be a dramatic
limitation of the usefulness of futexes.

At userlevel there is access to other fields of the data structure
which can contain the owner information.

I would like to see the method using a per-thread pinned page with an
update of a memory location on scheduling, at least for benchmarking.
I agree that a sys_yield_to() syscall would at the very least be
useful as well; it would be useful for other things anyway.
From: Ulrich Drepper
On Sat, Apr 10, 2010 at 16:35, Alan Cox <alan(a)lxorguk.ukuu.org.uk> wrote:
> You only need one page per 4096 threads

Very expensive. Each cache line would be fought over by 64 threads.
The constant RFOs would make context switches significantly slower.

At most 4096/64 = 64 threads should share one page. One page would
still cover almost all processes.