From: Jan-Bernd Themann on

Hi Thomas

> Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
>
> On Thu, 20 May 2010, Jan-Bernd Themann wrote:
> > > > Thought more about that. The case at hand (ehea) is nasty:
> > > >
> > > > The driver does _NOT_ disable the rx interrupt in the card in the rx
> > > > interrupt handler - for whatever reason.
> > >
> > > Yeah I saw that, but I don't know why it's written that way. Perhaps
> > > Jan-Bernd or Doug will chime in and enlighten us? :)
> >
> > From our perspective there is no need to disable interrupts for the
> > RX side as the chip does not fire further interrupts until we tell
> > the chip to do so for a particular queue. We have multiple receive
>
> The traces tell a different story though:
>
> ehea_recv_irq_handler()
> napi_reschedule()
> eoi()
> ehea_poll()
> ...
> ehea_recv_irq_handler() <---------------- ???
> napi_reschedule()
> ...
> napi_complete()
>
> Can't tell whether you can see the same behaviour in mainline, but I
> don't see a reason why not.

Is this the same interrupt we are seeing here, or do we see a second,
different interrupt popping up on the same CPU? As I said, with multiple
receive queues (if enabled) you can have multiple interrupts in parallel.

Please check whether multiple queues are enabled. The following module
parameter is used for that:

MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 0 ");

You should also see the number of HEA interrupts in use in /proc/interrupts.
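
For reference, the parameter is wired up roughly like the sketch below. Only
the MODULE_PARM_DESC text above is taken verbatim from the driver; the
permission flags and surrounding declaration are assumptions about the usual
pattern, not the literal source:

/* sketch: how use_mcs is typically declared as a module parameter */
static int use_mcs;                  /* 0 = single queue / plain NAPI */
module_param(use_mcs, int, 0444);    /* readable at runtime via sysfs */
MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 0 ");

With non-zero permissions the current value can also be read back at runtime
from /sys/module/ehea/parameters/use_mcs.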


>
> > queues, each with its own interrupt, so that the interrupts can arrive
> > on multiple CPUs in parallel. Interrupts are enabled again when we
> > leave the NAPI Poll function for the corresponding receive queue.
>
> I can't see a piece of code which does that, but that's probably just
> lack of detailed hardware knowledge on my side.

If you mean the "re-enable" piece of code, you are right that it is not very
obvious. Interrupts are only generated if a particular register for our
completion queues is written. We do this in the following lines:

ehea_reset_cq_ep(pr->recv_cq);
ehea_reset_cq_ep(pr->send_cq);
ehea_reset_cq_n1(pr->recv_cq);
ehea_reset_cq_n1(pr->send_cq);

So this is, in effect, an indirect way to ask for interrupts when new
completions are written to memory. We don't really disable/enable interrupts
on the HEA chip itself.
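
To make the ordering explicit, the tail of ehea_poll() looks roughly like the
sketch below. Only the four ehea_reset_cq_*() calls and the pr->recv_cq /
pr->send_cq members are taken from the lines quoted above; the helper names
marked "assumed" and the exact control flow are simplifications, not the
literal driver source:

static int ehea_poll(struct napi_struct *napi, int budget)
{
        struct ehea_port_res *pr = container_of(napi, struct ehea_port_res, napi);
        int rx;

        /* process up to 'budget' receive completions (helper name assumed) */
        rx = ehea_process_rx(pr, budget);

        if (rx < budget) {
                napi_complete(napi);

                /* re-arm the completion queues: ask the HEA to raise an
                 * interrupt once new completions are written to memory */
                ehea_reset_cq_ep(pr->recv_cq);
                ehea_reset_cq_ep(pr->send_cq);
                ehea_reset_cq_n1(pr->recv_cq);
                ehea_reset_cq_n1(pr->send_cq);

                /* completions that raced in between napi_complete() and the
                 * re-arm are picked up by rescheduling NAPI (check helper
                 * name assumed) */
                if (ehea_completions_pending(pr))
                        napi_reschedule(napi);
        }

        return rx;
}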

I think there are some mechanisms built into the HEA chip that should
prevent interrupts from getting lost. But that is something that is / was
completely hidden from us, so my knowledge there is very limited.

If more details are needed here we should involve the PHYP guys + eHEA HW
guys, if that hasn't been done already. Has anyone already talked to them?

Regards,
Jan-Bernd

From: Nivedita Singhvi on
Thomas Gleixner wrote:

>> Please check whether multiple queues are enabled. The following module
>> parameter is used for that:
>>
>> MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 0 ");
>>
>> You should also see the number of HEA interrupts in use in /proc/interrupts.
>
> I leave that for Will and Darren, they have the hardware :)

16: 477477 ... XICS Level IPI
17: 129 ... XICS Level hvc_console
18: 0 ... XICS Level RAS_EPOW
33: 139232 ... XICS Level mlx4_core
256: 3 ... XICS Level ehea_neq
259: 0 ... XICS Level eth0-aff
260: 2082153 ... XICS Level eth0-queue0
289: 119166 ... XICS Level ipr
305: 0 ... XICS Level ohci_hcd:usb2
306: 0 ... XICS Level ohci_hcd:usb3
307: 2389839 ... XICS Level ehci_hcd:usb1


Nope, multiple rx queues not enabled.

thanks,
Nivedita
From: Darren Hart on
On 05/20/2010 01:14 AM, Thomas Gleixner wrote:
> On Thu, 20 May 2010, Jan-Bernd Themann wrote:
>>>> Thought more about that. The case at hand (ehea) is nasty:
>>>>
>>>> The driver does _NOT_ disable the rx interrupt in the card in the rx
>>>> interrupt handler - for whatever reason.
>>>
>>> Yeah I saw that, but I don't know why it's written that way. Perhaps
>>> Jan-Bernd or Doug will chime in and enlighten us? :)
>>
>> From our perspective there is no need to disable interrupts for the
>> RX side as the chip does not fire further interrupts until we tell
>> the chip to do so for a particular queue. We have multiple receive
>
> The traces tell a different story though:
>
> ehea_recv_irq_handler()
> napi_reschedule()
> eoi()
> ehea_poll()
> ...
> ehea_recv_irq_handler() <---------------- ???
> napi_reschedule()
> ...
> napi_complete()
>
> Can't tell whether you can see the same behaviour in mainline, but I
> don't see a reason why not.

I was going to suggest that because these are threaded handlers, perhaps
they are rescheduled on a different CPU and then receive the interrupt
for the other CPU/queue that Jan was mentioning.

But the handlers are affined, if I remember correctly, and we aren't
running with multiple receive queues. So we're back to the same
question: why are we seeing another irq? It comes in before
napi_complete() and therefore before the ehea_reset*() block of calls
which do the equivalent of re-enabling interrupts.
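
For context, the receive interrupt handler itself appears to do nothing
beyond scheduling NAPI - roughly the following (a simplified sketch based on
the discussion and traces above, not the literal driver source):

static irqreturn_t ehea_recv_irq_handler(int irq, void *param)
{
        struct ehea_port_res *pr = param;

        /* no device-level masking here; the only "re-enable" is the
         * CQ re-arm done later in ehea_poll() */
        napi_schedule(&pr->napi);

        return IRQ_HANDLED;
}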

--
Darren

>
>> queues, each with its own interrupt, so that the interrupts can arrive
>> on multiple CPUs in parallel. Interrupts are enabled again when we
>> leave the NAPI Poll function for the corresponding receive queue.
>
> I can't see a piece of code which does that, but that's probably just
> lack of detailed hardware knowledge on my side.
>
> Thanks,
>
> tglx


--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
From: Milton Miller on
On Thu, 20 May 2010 at 10:21:36 +0200 (CEST) Thomas Gleixner wrote:
> On Thu, 20 May 2010, Michael Ellerman wrote:
> > On Wed, 2010-05-19 at 16:38 +0200, Thomas Gleixner wrote:
> > > On Wed, 19 May 2010, Darren Hart wrote:
> > >
> > > > On 05/18/2010 06:25 PM, Michael Ellerman wrote:
> > > > > On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
> >
> > > > > The result of the discussion about two years ago on this was that we
> > > > > needed a custom flow handler for XICS on RT.
> > > >
> > > > I'm still not clear on why the ultimate solution wasn't to have XICS report
> > > > edge triggered as edge triggered. Probably some complexity of the entire power
> > > > stack that I am ignorant of.
> > > >
> > > > > Apart from the issue of losing interrupts there is also the fact that
> > > > > masking on the XICS requires an RTAS call which takes a global lock.
> > >
> > > Right, I'd love to avoid that but with real level interrupts we'd run
> > > into an interrupt storm. Though another solution would be to issue the
> > > EOI after the threaded handler finished, that'd work as well, but
> > > needs testing.
> >
> > Yeah I think that was the idea for the custom flow handler. We'd reset
> > the processor priority so we can take other interrupts (which the EOI
> > usually does for you), then do the actual EOI after the handler
> > finished.
>
> That only works when the card does not issue new interrupts until the
> EOI happens. If the EOI is only relevant for the interrupt controller,
> then you are going to lose any edge which comes in before the EOI as
> well.

Well, the real MSIs have an extra bit that allows the EOI to lag behind the
MMIO on another path, and that should cover this race when the irq is left
enabled.

Jan-Bernd, HEA has that change, right?

milton