From: David Howells on
Changli Gao <xiaosuo(a)gmail.com> wrote:

> I don't know how to do that, as maybe there are non-exclusive and
> exclusive wait queues in the same wait queue head. If we want to
> enqueue exclusive wait queues at the head of exclusive queues, we have
> to know where the head is, otherwise, we have to loop to find the head
> when enqueuing.

I suspect you really want to have the semantics defined per-queue. _Either_ a
queue is FIFO (such as processes waiting for a resource so they can do
something with it) _or_ it is LIFO (such as a pool of processes waiting to be
given work).

How often do the two actually mix? And if they do, is that really an error?

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Changli Gao on
On Wed, Apr 28, 2010 at 10:06 PM, David Howells <dhowells(a)redhat.com> wrote:
>
> I suspect you really want to have the semantics defined per-queue.  _Either_ a
> queue is FIFO (such as processes waiting for a resource so they can do
> something with it) _or_ it is LIFO (such as a pool of processes waiting to be
> given work).
>
> How often do the two actually mix?  And if they do, is that really an error?
>

The sock.sk_sleep wait queue head is used by both exclusive and
non-exclusive waiters. Exclusive vs. non-exclusive is a property of the
individual wait queue entries, not of the wait queue head. Maybe there
is a historical reason, but it looks much like a hack.

--
Regards,
Changli Gao(xiaosuo(a)gmail.com)
From: David Howells on
Changli Gao <xiaosuo(a)gmail.com> wrote:

> > How often do the two actually mix?  And if they do, is that really an
> > error?
>
> The sock.sk_sleep wait queue head is used by both exclusive and
> non-exclusive waiters. Exclusive vs. non-exclusive is a property of the
> individual wait queue entries, not of the wait queue head. Maybe there
> is a historical reason, but it looks much like a hack.

But is it actually used in both modes at the same time?

David
From: Jamie Lokier on
Changli Gao wrote:
> On Wed, Apr 28, 2010 at 9:21 PM, Jamie Lokier <jamie(a)shareable.org> wrote:
> > Changli Gao wrote:
> >>
> >> fs/eventpoll.c: 1443.
> >>                 wait.flags |= WQ_FLAG_EXCLUSIVE;
> >>                 __add_wait_queue(&ep->wq, &wait);
> >
> > The same thing about assumptions applies here.  The userspace process
> > may be waiting for an epoll condition to get access to a resource,
> > rather than being a worker thread interchangeable with others.
>
> Oh, the lines above are the current ones. So the assumption applies
> and works here.

No, because WQ_FLAG_EXCLUSIVE doesn't have your LIFO semantic at the moment.

Your patch changes the behaviour of epoll, though I don't know if it
matters. Perhaps all programs which have multiple tasks waiting on
the same epoll fd are "interchangeable worker thread" types anyway :-)

> > For example, userspace might be using a pipe as a signal-safe lock, or
> > signal-safe multi-token semaphore, and epoll to wait for that pipe.
> >
> > WQ_FLAG_EXCLUSIVE means there is no point waking all tasks, to avoid a
> > pointless thundering herd.  It doesn't mean unfairness is ok.
>
> The users should not make any assumption about the waking up sequence,
> neither LIFO nor FIFO.

Correct, but they should be able to assume non-starvation (eventual
progress) for all waiters.

It's one of those subtle things, possibly a unixy thing: Non-RT tasks
should always make progress when the competition is just other non-RT
tasks, even if the progress is slow.

Starvation can spread out beyond the starved process, to cause
priority inversions in other tasks that are waiting on a resource
locked by the starved process. Among other things, that can cause
higher priority tasks, and RT priority tasks, to block permanently.
Very unpleasant.

> > The LIFO idea _might_ make sense for interchangeable worker-thread
> > situations - including userspace.  It would make sense for pipe
> > waiters, socket waiters (especially accept), etc.
>
> Yea, and my following patches are for socket waiters.

Unix socketpairs are occasionally used in the above ways too.

I'm not against your patch, but I worry that starvation is a new
semantic, and it may have a significant effect on something - either
in the kernel, or in userspace which is harder to check.

> > Do you have any measurements which showing the LIFO mode performing
> > better than FIFO, and by how much?
>
> I didn't do any test yet. But some work done by LSE project years ago
> showed that it is better.
>
> http://lse.sourceforge.net/io/aionotes.txt
>
> " Also in view of
> better cache utilization the wake queue mechanism is LIFO by default.
> (A new exclusive LIFO wakeup option has been introduced for this purpose)"

I suspect it's possible to combine LIFO-ish and FIFO-ish queuing to
prevent starvation while getting some of the locality benefit.
Something like add-LIFO and increment a small counter in the next wait
entry, but never add in front of an entry whose counter has reached
MAX_LIFO_WAITERS? :-)

-- Jamie
From: Changli Gao on
On Wed, Apr 28, 2010 at 11:00 PM, David Howells <dhowells(a)redhat.com> wrote:
>
> But is it actually used in both modes at the same time?
>

It should seldom happen in good applications. So I only need to add a
function like this:

add_wait_queue_head_exclusive(wq, wait)
{
        /* fast, as there should be no or few non-exclusive
         * wait queue entries on such a head */
        first_exclusive = find_the_first_exclusive(wq);
        list_add(wait, first_exclusive);
}


--
Regards,
Changli Gao(xiaosuo(a)gmail.com)