From: Andi Kleen on
On Thu, Jun 24, 2010 at 01:42:11PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-06-24 at 13:20 +0200, Andi Kleen wrote:
> > Ok so going back to the original self-irq patchkit. Unfortunately the other
> > reviewer hated that. How to get out of that deadlock?
>
> Well, I didn't like your original patch either.
>
> What's wrong with going with the patch I posted today? (aside from me
> getting the barriers slightly wrong and not doing the arch
> implementation).

Well it would need to work.

Also I personally didn't see the point of the irq items list because
there's no good way to dynamically allocate it in a NMI, so the window
would be always "fixed size" anyways and you could as well just use
per cpu data.

That's why for simple self irq I preferred Ying's original patch.

-Andi
--
ak(a)linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andi Kleen on
On Thu, Jun 24, 2010 at 01:57:29PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-06-24 at 13:55 +0200, Andi Kleen wrote:
> > > but we don't have anything else that does that.
> >
> > Actually we do, audit in syscalls and scheduling in interrupts and signals
> > all work this way. Probably more at some point adding more code to this
> > path was very popular.
>
> That's the return to user path, nothing to do with softirqs. Add a TIF
> flag and call your function there.

It does that, but there are some cases where it's not enough.

-Andi
--
ak(a)linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Thu, 2010-06-24 at 13:58 +0200, Andi Kleen wrote:
> On Thu, Jun 24, 2010 at 01:42:11PM +0200, Peter Zijlstra wrote:
> > On Thu, 2010-06-24 at 13:20 +0200, Andi Kleen wrote:
> > > Ok so going back to the original self-irq patchkit. Unfortunately the other
> > > reviewer hated that. How to get out of that deadlock?
> >
> > Well, I didn't like your original patch either.
> >
> > What's wrong with going with the patch I posted today? (aside from me
> > getting the barriers slightly wrong and not doing the arch
> > implementation).
>
> Well it would need to work.

Look at kernel/perf_event.c:perf_pending_queue()/__perf_pending_run()

> Also I personally didn't see the point of the irq items list because
> there's no good way to dynamically allocate it in a NMI, so the window
> would be always "fixed size" anyways and you could as well just use
> per cpu data.
>
> That's why for simple self irq I preferred Ying's original patch.

I already told you that I have an irq_work in every perf_event structure
(its called perf_pending_entry), I cannot register an id for each
perf_event because:
1) there's potentially more than 32 of them
2) I'd need an id->perf_event map which is a waste of time




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Thu, 2010-06-24 at 14:02 +0200, Andi Kleen wrote:
> On Thu, Jun 24, 2010 at 01:57:29PM +0200, Peter Zijlstra wrote:
> > On Thu, 2010-06-24 at 13:55 +0200, Andi Kleen wrote:
> > > > but we don't have anything else that does that.
> > >
> > > Actually we do, audit in syscalls and scheduling in interrupts and signals
> > > all work this way. Probably more at some point adding more code to this
> > > path was very popular.
> >
> > That's the return to user path, nothing to do with softirqs. Add a TIF
> > flag and call your function there.
>
> It does that, but there are some cases where it's not enough.

care to expand on that?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on

* Peter Zijlstra <peterz(a)infradead.org> wrote:

> On Thu, 2010-06-24 at 13:23 +0200, Ingo Molnar wrote:
>
> > What might make sense is to offer two types of callbacks: one that is
> > immediate whenever an event triggers - and another that is sleepable and is
> > executed from process context.
>
> Trouble is waking that thread, you cannot wake tasks from NMI context,
> so whatever you do, you'll end up with a trampoline.
>
> You could of course offer that trampoline nicely packaged, but I'm not
> sure that's really worth the effort.

Right, so there's basically three clean solutions to the 'sleepable callback'
problem, in order of amount of state that needs to be passed to it:

- State-less (or idempotent) events/callbacks: use a hardirq callback to wake
up a well-known process context.

- If we want the task that generates an event to execute a sleeping callback:
use a TIF flag and state in the task itself to pass along the info.

- In the most generic case, if there's arbitrary target task and arbitrary
state that needs to be queued, then to achieve sleepable callbacks the
following solution can be used: the task allocates a perf ring-buffer and
uses a TIF flag to trigger consumption of it.

All memory allocation, wakeup, etc. is handled already by the regular perf
events and ring-buffer codepaths.

No special, open-coded trampolining needed - the ring-buffer is the trampoline
and the ring-buffer consumer can key off the events it receives. (and there
can be multiple consumers of the same event source so we can have in-task
kernel based action combined with a user-space daemon that get an event stream
as well.)

All of these solutions use the fact that perf events are a generic event
framework. If there's any missing details somewhere then fixes/enhancements
can be added - right now our in-kernel event consumers are simple. But the
design is sound.

And none of these solutions involves the incestous low level raping of
softirqs.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/