From: Thomas Gleixner
On Thu, 20 May 2010, Jan-Bernd Themann wrote:
> > > Thought more about that. The case at hand (ehea) is nasty:
> > >
> > > The driver does _NOT_ disable the rx interrupt in the card in the rx
> > > interrupt handler - for whatever reason.
> >
> > Yeah I saw that, but I don't know why it's written that way. Perhaps
> > Jan-Bernd or Doug will chime in and enlighten us? :)
>
> From our perspective there is no need to disable interrupts for the
> RX side as the chip does not fire further interrupts until we tell
> the chip to do so for a particular queue. We have multiple receive

The traces tell a different story though:

ehea_recv_irq_handler()
napi_reschedule()
eoi()
ehea_poll()
...
ehea_recv_irq_handler() <---------------- ???
napi_reschedule()
...
napi_complete()

Can't tell whether you can see the same behaviour in mainline, but I
don't see a reason why not.

> queues, each with its own interrupt, so that the interrupts can arrive
> on multiple CPUs in parallel. Interrupts are enabled again when we
> leave the NAPI poll function for the corresponding receive queue.

I can't see a piece of code which does that, but that's probably just
lack of detailed hardware knowledge on my side.
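
For comparison, the conventional NAPI pattern I would expect looks roughly
like this. This is a generic, hypothetical sketch - the mynic_* names are
made up and this is explicitly not the ehea code:

#include <linux/interrupt.h>
#include <linux/netdevice.h>

struct mynic_priv {
	struct napi_struct napi;
	/* ... device state, register mappings ... */
};

/* hypothetical helpers poking the device's rx interrupt mask register */
static void mynic_disable_rx_irq(struct mynic_priv *priv);
static void mynic_enable_rx_irq(struct mynic_priv *priv);
static int mynic_process_rx(struct mynic_priv *priv, int budget);

static irqreturn_t mynic_rx_irq(int irq, void *dev_id)
{
	struct mynic_priv *priv = dev_id;

	/* mask further rx interrupts in the card ... */
	mynic_disable_rx_irq(priv);
	/* ... and defer the work to NAPI */
	napi_schedule(&priv->napi);
	return IRQ_HANDLED;
}

static int mynic_poll(struct napi_struct *napi, int budget)
{
	struct mynic_priv *priv = container_of(napi, struct mynic_priv, napi);
	int work = mynic_process_rx(priv, budget);

	if (work < budget) {
		napi_complete(napi);
		/* only now re-enable rx interrupts on the card */
		mynic_enable_rx_irq(priv);
	}
	return work;
}

ehea instead leaves the interrupt enabled in the handler and relies on the
hardware not firing again, which is what makes the trace above confusing.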

Thanks,

tglx
From: Thomas Gleixner
On Thu, 20 May 2010, Michael Ellerman wrote:
> On Wed, 2010-05-19 at 16:38 +0200, Thomas Gleixner wrote:
> > On Wed, 19 May 2010, Darren Hart wrote:
> >
> > > On 05/18/2010 06:25 PM, Michael Ellerman wrote:
> > > > On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
>
> > > > The result of the discussion about two years ago on this was that we
> > > > needed a custom flow handler for XICS on RT.
> > >
> > > I'm still not clear on why the ultimate solution wasn't to have XICS report
> > > edge-triggered interrupts as edge triggered. Probably some complexity of the
> > > entire Power stack that I'm ignorant of.
> > >
> > > > Apart from the issue of losing interrupts there is also the fact that
> > > > masking on the XICS requires an RTAS call, which takes a global lock.
> >
> > Right, I'd love to avoid that, but with real level interrupts we'd run
> > into an interrupt storm. Though another solution would be to issue the
> > EOI after the threaded handler has finished; that would work as well, but
> > needs testing.
>
> Yeah I think that was the idea for the custom flow handler. We'd reset
> the processor priority so we can take other interrupts (which the EOI
> usually does for you), then do the actual EOI after the handler
> finished.

That only works when the card does not issue new interrupts until the
EOI happens. If the EOI is only relevant for the interrupt controller,
then you are going to lose any edge which comes in before the EOI as
well.
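
Roughly what I'd imagine for such a delayed-EOI flow handler, as an
uncompiled sketch against the current genirq interfaces. xics_lower_cppr()
is a hypothetical stand-in for "drop the processor priority without issuing
the XICS EOI"; it does not exist under that name:

#include <linux/irq.h>
#include <linux/interrupt.h>

static void xics_lower_cppr(void);	/* hypothetical */

static void handle_xics_delayed_eoi_irq(unsigned int irq, struct irq_desc *desc)
{
	raw_spin_lock(&desc->lock);
	desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
	desc->status |= IRQ_INPROGRESS;

	/*
	 * Let this CPU take other interrupts again, but do NOT EOI the
	 * source yet.
	 */
	xics_lower_cppr();			/* hypothetical */
	raw_spin_unlock(&desc->lock);

	/* wakes the irq thread and returns more or less immediately */
	handle_IRQ_event(irq, desc->action);

	raw_spin_lock(&desc->lock);
	desc->status &= ~IRQ_INPROGRESS;
	raw_spin_unlock(&desc->lock);
}

/*
 * The real XICS EOI, i.e. desc->chip->eoi(irq), would then be issued only
 * after the threaded handler has returned, not from the flow handler.
 */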

Thanks,

tglx

From: Thomas Gleixner
Jan-Bernd,

On Thu, 20 May 2010, Jan-Bernd Themann wrote:

>
> Hi Thomas
>
> > Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
> >
> > On Thu, 20 May 2010, Jan-Bernd Themann wrote:
> > > > > Thought more about that. The case at hand (ehea) is nasty:
> > > > >
> > > > > The driver does _NOT_ disable the rx interrupt in the card in the rx
> > > > > interrupt handler - for whatever reason.
> > > >
> > > > Yeah I saw that, but I don't know why it's written that way. Perhaps
> > > > Jan-Bernd or Doug will chime in and enlighten us? :)
> > >
> > > From our perspective there is no need to disable interrupts for the
> > > RX side as the chip does not fire further interrupts until we tell
> > > the chip to do so for a particular queue. We have multiple receive
> >
> > The traces tell a different story though:
> >
> > ehea_recv_irq_handler()
> > napi_reschedule()
> > eoi()
> > ehea_poll()
> > ...
> > ehea_recv_irq_handler() <---------------- ???
> > napi_reschedule()
> > ...
> > napi_complete()
> >
> > Can't tell whether you can see the same behaviour in mainline, but I
> > don't see a reason why not.
>
> Is this the same interrupt we are seeing here, or do we see a second,
> different interrupt popping up on the same CPU? As I said, with multiple
> receive queues (if enabled) you can have multiple interrupts in parallel.

According to the traces it's the very same interrupt number.

> Please check if multiple queues are enabled. The following module parameter
> is used for that:
>
> MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 0 ");
>
> you should also see the number of used HEA interrupts in /proc/interrupts

I leave that to Will and Darren; they have the hardware :)

> >
> > > queues, each with its own interrupt, so that the interrupts can arrive
> > > on multiple CPUs in parallel. Interrupts are enabled again when we
> > > leave the NAPI poll function for the corresponding receive queue.
> >
> > I can't see a piece of code which does that, but that's probably just
> > lack of detailed hardware knowledge on my side.
>
> If you mean the "re-enable" piece of code, it is not very obvious,
> you are right. Interrupts are only generated if a particular
> register for our completion queues is written. We do this in the
> following lines:
>
> ehea_reset_cq_ep(pr->recv_cq);
> ehea_reset_cq_ep(pr->send_cq);
> ehea_reset_cq_n1(pr->recv_cq);
> ehea_reset_cq_n1(pr->send_cq);
>
> So this is, in a way, an indirect way to ask for interrupts when new
> completions have been written to memory. We don't really disable/enable
> interrupts on the HEA chip itself.

Ah, ok. That's after the napi_complete which looks correct.
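
For the record, the ordering in question, abridged and paraphrased from
drivers/net/ehea/ehea_main.c (so not necessarily matching the source
exactly), is:

/* abridged sketch of the tail of ehea_poll(); types and helpers are the
 * driver's own, defined in ehea.h / ehea_main.c */
static int ehea_poll(struct napi_struct *napi, int budget)
{
	struct ehea_port_res *pr = container_of(napi, struct ehea_port_res,
						napi);
	int rx = ehea_proc_rwqes(pr->port->netdev, pr, budget);

	if (rx < budget) {
		napi_complete(napi);

		/* the indirect "send me interrupts again" request */
		ehea_reset_cq_ep(pr->recv_cq);
		ehea_reset_cq_ep(pr->send_cq);
		ehea_reset_cq_n1(pr->recv_cq);
		ehea_reset_cq_n1(pr->send_cq);

		/* recheck the queues and napi_reschedule() if a completion
		 * raced with napi_complete() ... */
	}
	return rx;
}

Which also means an interrupt that fires before napi_complete() arrives
before this re-arm.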

> I think there are some mechanisms built into the HEA chip that should
> prevent interrupts from getting lost. But that is something that
> is / was completely hidden from us, so my skill is very limited
> there.
>
> If more details are needed here, we should involve the PHYP guys +
> eHEA HW guys, if not already done. Has anyone talked to them already?

Will or Darren might have, but let's gather more information first
before we rack their nerves :)

Thanks,

tglx
From: Thomas Gleixner
On Thu, 20 May 2010, Darren Hart wrote:

> On 05/20/2010 01:14 AM, Thomas Gleixner wrote:
> > On Thu, 20 May 2010, Jan-Bernd Themann wrote:
> > > > > Thought more about that. The case at hand (ehea) is nasty:
> > > > >
> > > > > The driver does _NOT_ disable the rx interrupt in the card in the rx
> > > > > interrupt handler - for whatever reason.
> > > >
> > > > Yeah I saw that, but I don't know why it's written that way. Perhaps
> > > > Jan-Bernd or Doug will chime in and enlighten us? :)
> > >
> > > From our perspective there is no need to disable interrupts for the
> > > RX side as the chip does not fire further interrupts until we tell
> > > the chip to do so for a particular queue. We have multiple receive
> >
> > The traces tell a different story though:
> >
> > ehea_recv_irq_handler()
> > napi_reschedule()
> > eoi()
> > ehea_poll()
> > ...
> > ehea_recv_irq_handler() <---------------- ???
> > napi_reschedule()
> > ...
> > napi_complete()
> >
> > Can't tell whether you can see the same behaviour in mainline, but I
> > don't see a reason why not.
>
> I was going to suggest that, because these are threaded handlers, perhaps they
> are rescheduled on a different CPU and then receive the interrupt for the
> other CPU/queue that Jan was mentioning.
>
> But the handlers are affined, if I remember correctly, and we aren't running
> with multiple receive queues. So we're back to the same question: why are we
> seeing another irq? It comes in before napi_complete() and therefore before the
> ehea_reset*() block of calls which do the equivalent of re-enabling
> interrupts.

Can you slap a few trace points into that driver with a stock mainline
kernel and verify that?
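
Something as simple as a trace_printk() in the hard irq handler would do.
Sketch (from memory of the mainline handler, so the details may be off):

/* drivers/net/ehea/ehea_main.c */
static irqreturn_t ehea_recv_irq_handler(int irq, void *param)
{
	struct ehea_port_res *pr = param;

	trace_printk("ehea rx irq %d on cpu %d, napi state %lx\n",
		     irq, smp_processor_id(), pr->napi.state);
	napi_schedule(&pr->napi);
	return IRQ_HANDLED;
}

Plus a matching trace_printk() right before and after the napi_complete()
in ehea_poll(); the output shows up in /sys/kernel/debug/tracing/trace.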

Thanks,

tglx
From: Will Schmidt
On Thu, 2010-05-20 at 11:05 +0200, Jan-Bernd Themann wrote:
> Hi Thomas
>
> > Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
> >
> > On Thu, 20 May 2010, Jan-Bernd Themann wrote:
> > > > > Thought more about that. The case at hand (ehea) is nasty:
> > > > >
> > > > > The driver does _NOT_ disable the rx interrupt in the card in the rx
> > > > > interrupt handler - for whatever reason.
> > > >
> > > > Yeah I saw that, but I don't know why it's written that way. Perhaps
> > > > Jan-Bernd or Doug will chime in and enlighten us? :)
> > >
> > > From our perspective there is no need to disable interrupts for the
> > > RX side as the chip does not fire further interrupts until we tell
> > > the chip to do so for a particular queue. We have multiple receive
> >
> > The traces tell a different story though:
> >
> > ehea_recv_irq_handler()
> > napi_reschedule()
> > eoi()
> > ehea_poll()
> > ...
> > ehea_recv_irq_handler() <---------------- ???
> > napi_reschedule()
> > ...
> > napi_complete()
> >
> > Can't tell whether you can see the same behaviour in mainline, but I
> > don't see a reason why not.
>
> Is this the same interrupt we are seeing here, or do we see a second,
> different interrupt popping up on the same CPU? As I said, with multiple
> receive queues (if enabled) you can have multiple interrupts in parallel.

Same interrupt number (260). Per the trace data, the first
ehea_recv_irq_handler (at 117.904525) was on cpu 0, the second (at
117.904689) was on cpu 1.



<...>-2180 [000] 117.904525: .ehea_recv_irq_handler: ENTER 0 c0000000e8bd08b0
<...>-2180 [000] 117.904527: .ehea_recv_irq_handler: napi_reschedule COMpleted c0000000e8bd08b0
<...>-2180 [000] 117.904528: .ehea_recv_irq_handler: EXIT reschedule(1) 1 c0000000e8bd08b0
<...>-2180 [000] 117.904529: .xics_unmask_irq: xics: unmask virq 260 772
<...>-2180 [000] 117.904547: .xics_unmask_irq: xics: unmask virq pre-xive 260 772 0 status:0 ff
<...>-2180 [000] 117.904586: .xics_unmask_irq: xics: unmask virq post-xive 260 772 0 D:11416 status:0 5
<...>-2180 [000] 117.904602: .handle_fasteoi_irq: 260 8004000
<...>-2180 [000] 117.904603: .xics_mask_irq: xics: mask virq 260 772
<...>-2180 [000] 117.904634: .xics_mask_real_irq: xics: before: mask_real 772 status:0 5
<...>-2180 [000] 117.904668: .xics_mask_real_irq: xics: after: mask_real 772 status:0 ff
<...>-2180 [000] 117.904669: .handle_fasteoi_irq: pre-action: 260 8004100
<...>-2180 [000] 117.904671: .handle_fasteoi_irq: post-action: 260 8004100
<...>-2180 [000] 117.904672: .handle_fasteoi_irq: exit. 260 8004000
<...>-7 [000] 117.904681: .ehea_poll: ENTER 1 c0000000e8bd08b0 poll_counter:0 force:0
<...>-7 [000] 117.904683: .ehea_proc_rwqes: ehea_check_cqe 0
<...>-2180 [001] 117.904689: .ehea_recv_irq_handler: ENTER 1 c0000000e8bd08b0
<...>-7 [000] 117.904690: .ehea_proc_rwqes: ehea_check_cqe 0
<...>-2180 [001] 117.904691: .ehea_recv_irq_handler: napi_reschedule inCOMplete c0000000e8bd08b0
<...>-2180 [001] 117.904692: .ehea_recv_irq_handler: EXIT reschedule(0) 1 c0000000e8bd08b0
<...>-2180 [001] 117.904694: .xics_unmask_irq: xics: unmask virq 260 772
<...>-7 [000] 117.904702: .ehea_refill_rq2: ehea_refill_rq2
<...>-7 [000] 117.904703: .ehea_refill_rq_def: ehea_refill_rq_def
<...>-7 [000] 117.904704: .ehea_refill_rq3: ehea_refill_rq3
<...>-7 [000] 117.904705: .ehea_refill_rq_def: ehea_refill_rq_def
<...>-7 [000] 117.904706: .napi_complete: napi_complete: ENTER state: 1 c0000000e8bd08b0
<...>-7 [000] 117.904707: .napi_complete: napi_complete: EXIT state: 0 c0000000e8bd08b0
<...>-7 [000] 117.904710: .ehea_poll: EXIT !cqe rx(2). 0 c0000000e8bd08b0
<...>-2180 [001] 117.904719: .xics_unmask_irq: xics: unmask virq pre-xive 260 772 0 status:0 ff
<...>-2180 [001] 117.904761: .xics_unmask_irq: xics: unmask virq post-xive 260 772 0 D:12705 status:0 5




> Please check if multiple queues are enabled. The following module parameter
> is used for that:
>
> MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 0 ");

No module parameters were used, so it should be plain old defaults.

>
> you should also see the number of used HEA interrupts in /proc/interrupts
>

256: 1 0 0 0 0 0 0 0 XICS Level ehea_neq
259: 0 0 0 0 0 0 0 0 XICS Level eth0-aff
260: 361965 0 0 0 0 0 0 0 XICS Level eth0-queue0




>
> >
> > > queues, each with its own interrupt, so that the interrupts can arrive
> > > on multiple CPUs in parallel. Interrupts are enabled again when we
> > > leave the NAPI poll function for the corresponding receive queue.
> >
> > I can't see a piece of code which does that, but that's probably just
> > lack of detailed hardware knowledge on my side.
>
> If you mean the "re-enable" piece of code, it is not very obvious, you are
> right. Interrupts are only generated if a particular register for our
> completion queues is written. We do this in the following lines:
>
> ehea_reset_cq_ep(pr->recv_cq);
> ehea_reset_cq_ep(pr->send_cq);
> ehea_reset_cq_n1(pr->recv_cq);
> ehea_reset_cq_n1(pr->send_cq);
>
> So this is, in a way, an indirect way to ask for interrupts when new
> completions have been written to memory. We don't really disable/enable
> interrupts on the HEA chip itself.
>
> I think there are some mechanisms built into the HEA chip that should
> prevent interrupts from getting lost. But that is something that is / was
> completely hidden from us, so my skill is very limited there.
>
> If more details are needed here, we should involve the PHYP guys + eHEA HW
> guys, if not already done. Has anyone talked to them already?
>
> Regards,
> Jan-Bernd
>

