From: Manfred Spraul on
On 04/12/2010 08:49 PM, Chris Mason wrote:
> I have a microbenchmark to test how quickly we can post and wait in
> bulk. With this change, semtimedop is able do to more than twice
> as much work in the same run. On a large numa machine, it brings
> the IPC lock system time (reported by perf) down from 85% to 15%.
>
>
Looking at the current code:
- update_queue() can be O(N^2) if only some of the waiting tasks are
woken up.
Actually: all non-woken up tasks are rescanned after a task that can be
woken up is found.

- Your test app tests the best case for the current code:
You wake up the tasks in the same order as the called semop().
If you invert the order (i.e.: worklist_add() adds to head instead of
tail), I would expect an even worse performance of the current code.

The O(N^2) is simple to fix, I've attached a patch.
For your micro-benchmark, the patch does not change much: you wake-up
in-order, thus the current code does not misbehave.

Do you know how Oracle wakes up the tasks?
FIFO, LIFO, un-ordered?

> while(unlikely(error == IN_WAKEUP)) {
> cpu_relax();
> error = queue.status;
> }
>
> - if (error != -EINTR) {
> + /*
> + * we are lock free right here, and we could have timed out or
> + * gotten a signal, so we need to be really careful with how we
> + * play with queue.status. It has three possible states:
> + *
> + * -EINTR, which means nobody has changed it since we slept. This
> + * means we woke up on our own.
> + *
> + * IN_WAKEUP, someone is currently waking us up. We need to loop
> + * here until they change it to the operation error value. If
> + * we don't loop, our process could exit before they are done waking us
> + *
> + * operation error value: we've been properly woken up and can exit
> + * at any time.
> + *
> + * If queue.status is currently -EINTR, we are still being processed
> + * by the semtimedop core. Someone either has us on a list head
> + * or is currently poking our queue struct. We need to find that
> + * reference and remove it, which is what remove_queue_from_lists
> + * does.
> + *
> + * We always check for both -EINTR and IN_WAKEUP because we have no
> + * locks held. Someone could change us from -EINTR to IN_WAKEUP at
> + * any time.
> + */
> + if (error != -EINTR&& error != IN_WAKEUP) {
> /* fast path: update_queue already obtained all requested
> * resources */
No: The code accesses a local variable. The loop above the comment
guarantees that the error can't be IN_WAKEUP.

> +
> +out_putref:
> + sem_putref(sma);
> + goto out_free;
>
Is it possible to move the sem_putref into wakeup_sem_queue()?
Right now, the exit path of semtimedop doesn't touch the spinlock.
You remove that optimization.

--
Manfred