From: Andi Kleen on
On Fri, Jun 25, 2010 at 10:12:43AM +0800, Huang Ying wrote:
> On Thu, 2010-06-24 at 14:35 +0800, Peter Zijlstra wrote:
> > Something like this, but filled out with some arch code that does the
> > self-ipi and calls irq_work_run() should do.
> >
> > No need to molest the softirq code, no need for limited vectors of any
> > kind.
>
> Now, as far as my understanding goes, hard IRQ based solution is
> acceptable for everyone.
>
> Ingo and Andi,
>
> Do you agree?

Yes that's fine for me. The error handling issues can be solved in other
ways too.

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Huang Ying on
On Fri, 2010-06-25 at 15:48 +0800, Peter Zijlstra wrote:
> On Fri, 2010-06-25 at 10:12 +0800, Huang Ying wrote:
> >
> > It is better to add "void *data" field in this struct to allow same
> > function can be used for multiple struct irq_work.
>
> No, simply do:
>
> struct my_foo {
> struct irq_work work;
> /* my extra data */
> }
>
> void my_func(struct irq_work *work)
> {
> struct my_foo *foo = container_of(work, struct my_foo, work);
>
> /* tada! */
> }

Yes. This works too. But Adding "void *data" field is helpful if you do
not embed struct irq_work into another struct.

> > And I think IRQ is the implementation detail here, so irq_work is
> > probably not a good name. nmi_return_notifier or nmi_callback is better?
>
> Well, its ran in hard-irq context, so its an irq work. There's nothing
> that says it can only be used from NMI context.

It may be run in other contexts on some system (without APIC). And I
don't think it is useful to others except NMI handler. I think this is a
choice between naming after implementation and purpose.

> > > +void irq_work_run(void)
> > > +{
> > > + struct irq_work *list;
> > > +
> > > + list = xchg(&__get_cpu_var(irq_work_list), CALLBACK_TAIL);
> > > + while (list != CALLBACK_TAIL) {
> > > + struct irq_work *entry = list;
> > > +
> > > + list = list->next;
> > > + entry->func(entry);
> > > +
> > > + entry->next = NULL;
> >
> > entry->next = NULL should be put before entry->func(entry), so that we
> > will not lose a notification from NMI. And maybe check irq_work_list for
> > several times to make sure nothing is lost.
>
> But then _sync() will return before its done executing.

We can use another flag to signify whether it is executing. For example
the bit 0 of entry->next.

> I think clearing after the function is done executing is the only sane
> semantics (and yes, I should fix the current perf code).
>
> You can always miss an NMI since it can always happen before the
> callback gets done, and allowing another enqueue before the callback is
> complete is asking for trouble.

If we move entry->next = NULL before entry->func(entry), we will not
miss the NMI. Can you show how to miss it in this way?

> > > + /*
> > > + * matches the mb in cmpxchg() in irq_work_queue()
> > > + */
> > > + smp_wmb();
> > > + }
> > > +}
> >
> > I don't know why we need smp_wmb() here and smp_rmb() in
> > irq_work_pending(). The smp_<x>mb() in original perf_pending_xxx code is
> > not necessary too. Because smp_<x>mb is invoked in wake_up_process() and
> > __wait_event() already.
>
> The smp_wmb() wants to be before ->next = NULL; so that all writes are
> completed before we release the entry. To that same effect _sync() and
> _queue need the (r)mb.

It is reasonable in this way.

Best Regards,
Huang Ying


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Frederic Weisbecker on
2010/6/25 Huang Ying <ying.huang(a)intel.com>:
> On Fri, 2010-06-25 at 15:48 +0800, Peter Zijlstra wrote:
>> On Fri, 2010-06-25 at 10:12 +0800, Huang Ying wrote:
>> >
>> > It is better to add "void *data" field in this struct to allow same
>> > function can be used for multiple struct irq_work.
>>
>> No, simply do:
>>
>> struct my_foo {
>> � struct irq_work work;
>> � /* my extra data */
>> }
>>
>> void my_func(struct irq_work *work)
>> {
>> � struct my_foo *foo = container_of(work, struct my_foo, work);
>>
>> � /* tada! */
>> }
>
> Yes. This works too. But Adding "void *data" field is helpful if you do
> not embed struct irq_work into another struct.



That's what makes most sense. If you use work->data to put foo, then
you can also do the opposite. Now the best is to pick the choice that
gives you a real type and a typechecking, and not an error-prone and
obfuscated void *

This is the way things are made in the kernel. struct work_struct, struct list,
struct rcu_head, etc... are all embedded into a container, so that we can
use container_of.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Huang Ying on
On Fri, 2010-06-25 at 17:23 +0800, Frederic Weisbecker wrote:
> 2010/6/25 Huang Ying <ying.huang(a)intel.com>:
> > On Fri, 2010-06-25 at 15:48 +0800, Peter Zijlstra wrote:
> >> On Fri, 2010-06-25 at 10:12 +0800, Huang Ying wrote:
> >> >
> >> > It is better to add "void *data" field in this struct to allow same
> >> > function can be used for multiple struct irq_work.
> >>
> >> No, simply do:
> >>
> >> struct my_foo {
> >> struct irq_work work;
> >> /* my extra data */
> >> }
> >>
> >> void my_func(struct irq_work *work)
> >> {
> >> struct my_foo *foo = container_of(work, struct my_foo, work);
> >>
> >> /* tada! */
> >> }
> >
> > Yes. This works too. But Adding "void *data" field is helpful if you do
> > not embed struct irq_work into another struct.
>
>
> That's what makes most sense. If you use work->data to put foo, then
> you can also do the opposite. Now the best is to pick the choice that
> gives you a real type and a typechecking, and not an error-prone and
> obfuscated void *
>
> This is the way things are made in the kernel. struct work_struct, struct list,
> struct rcu_head, etc... are all embedded into a container, so that we can
> use container_of.

container_of has no full type checking too.

Best Regards,
Huang Ying


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Fri, 2010-06-25 at 17:17 +0800, Huang Ying wrote:
> On Fri, 2010-06-25 at 15:48 +0800, Peter Zijlstra wrote:
> > On Fri, 2010-06-25 at 10:12 +0800, Huang Ying wrote:
> > >
> > > It is better to add "void *data" field in this struct to allow same
> > > function can be used for multiple struct irq_work.
> >
> > No, simply do:
> >
> > struct my_foo {
> > struct irq_work work;
> > /* my extra data */
> > }
> >
> > void my_func(struct irq_work *work)
> > {
> > struct my_foo *foo = container_of(work, struct my_foo, work);
> >
> > /* tada! */
> > }
>
> Yes. This works too. But Adding "void *data" field is helpful if you do
> not embed struct irq_work into another struct.

No, embedding is the normal way we do this in the kernel.

> > > And I think IRQ is the implementation detail here, so irq_work is
> > > probably not a good name. nmi_return_notifier or nmi_callback is better?
> >
> > Well, its ran in hard-irq context, so its an irq work. There's nothing
> > that says it can only be used from NMI context.
>
> It may be run in other contexts on some system (without APIC).

I would consider that a BUG. Use a random IRQ source to process the
callbacks on these broken platforms.

> And I
> don't think it is useful to others except NMI handler. I think this is a
> choice between naming after implementation and purpose.

There is, although I'm sure people will yell at me for even proposing
this. You can raise the IPI from an IRQ disabled section to get
something done right after it.

> We can use another flag to signify whether it is executing. For example
> the bit 0 of entry->next.

There's no point.

> > I think clearing after the function is done executing is the only sane
> > semantics (and yes, I should fix the current perf code).
> >
> > You can always miss an NMI since it can always happen before the
> > callback gets done, and allowing another enqueue before the callback is
> > complete is asking for trouble.
>
> If we move entry->next = NULL before entry->func(entry), we will not
> miss the NMI. Can you show how to miss it in this way?

<NMI>
...
irq_work_queue(&my_work, func);
...
<EOI>
<IPI>
irq_work_run()

<NMI>
irq_work_queue(&my_work, func); <FAIL>
<EOI>

my_func.next = NULL;
<EOI>

Really not that hard. Now imagine wrapping irq_work in some state and
you reusing the state while the function is still running..
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/