From: Michael Ellerman on
On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
> On 05/18/2010 02:52 PM, Brian King wrote:
> > Is IRQF_NODELAY something specific to the RT kernel? I don't see it in mainline...
>
> Yes, it basically says "don't make this handler threaded".

That is a good fix for EHEA, but the threaded handling is still broken
for anything else that is edge triggered, isn't it?

The result of the discussion about two years ago on this was that we
needed a custom flow handler for XICS on RT.

Apart from the issue of losing interrupts there is also the fact that
masking on the XICS requires an RTAS call which takes a global lock.

cheers


From: Thomas Gleixner on
On Wed, 19 May 2010, Darren Hart wrote:

> On 05/18/2010 06:25 PM, Michael Ellerman wrote:
> > On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
> > > On 05/18/2010 02:52 PM, Brian King wrote:
> > > > Is IRQF_NODELAY something specific to the RT kernel? I don't see it in
> > > > mainline...
> > >
> > > Yes, it basically says "don't make this handler threaded".
> >
> > That is a good fix for EHEA, but the threaded handling is still broken
> > for anything else that is edge triggered, isn't it?
>
> No, I don't believe so. Edge triggered interrupts that are reported as edge
> triggered interrupts will use the edge handler (which was the approach
> Sebastien took to make this work back in 2008). Since XICS presents all
> interrupts as Level Triggered, they use the fasteoi path.

I wonder whether the XICS interrupts which are edge type can be
identified from the irq_desc->flags. Then we could avoid the masking
for those in the fasteoi_handler in general.

> >
> > The result of the discussion about two years ago on this was that we
> > needed a custom flow handler for XICS on RT.
>
> I'm still not clear on why the ultimate solution wasn't to have XICS report
> edge triggered as edge triggered. Probably some complexity of the entire power
> stack that I am ignorant of.
>
> > Apart from the issue of losing interrupts there is also the fact that
> > masking on the XICS requires an RTAS call which takes a global lock.

Right, I'd love to avoid that but with real level interrupts we'd run
into an interrupt storm. Though another solution would be to issue the
EOI after the threaded handler finished, that'd work as well, but
needs testing.
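The deferred-EOI idea can be sketched as a toy userspace model comparing three ways a threaded handler might treat a level line: EOI immediately, mask then EOI (unmasking after the thread runs), or defer the EOI until the thread is done. The controller model is an assumption made for illustration: it presents a source only while the line is asserted, unmasked and not awaiting EOI. All names (struct line, hard_irq(), thread_fn(), run()) are hypothetical, and a real controller may hold back more than this one source while an EOI is pending, which the model ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a level-triggered line behind a controller that will not
 * present a source again until it has been EOI'd. Names are hypothetical. */
enum strategy { EOI_ONLY, MASK_THEN_EOI, DEFER_EOI };

struct line {
    bool asserted;   /* device holds the level line active */
    bool masked;     /* source masked at the controller */
    bool in_service; /* delivered but not yet EOI'd */
    int deliveries;  /* times the hard irq handler ran */
    int mask_ops;    /* mask + unmask calls (RTAS calls on XICS) */
};

static bool controller_would_deliver(const struct line *l)
{
    return l->asserted && !l->masked && !l->in_service;
}

/* Hard irq: the device is NOT serviced here; that is left to the
 * threaded handler. Only the mask/EOI handling differs by strategy. */
static void hard_irq(struct line *l, enum strategy s)
{
    l->deliveries++;
    l->in_service = true;
    if (s == MASK_THEN_EOI) {
        l->masked = true;
        l->mask_ops++;
    }
    if (s != DEFER_EOI)
        l->in_service = false;   /* EOI issued right away */
}

/* Threaded handler: service the device (drops the line), then undo
 * whatever the hard irq left pending. */
static void thread_fn(struct line *l, enum strategy s)
{
    l->asserted = false;
    if (s == MASK_THEN_EOI) {
        l->masked = false;
        l->mask_ops++;
    }
    if (s == DEFER_EOI)
        l->in_service = false;   /* EOI only now */
}

/* One device event. `window` is how many chances the controller gets to
 * (re)deliver before the irq thread is scheduled. */
static struct line run(enum strategy s, int window)
{
    struct line l = { .asserted = true };
    for (int i = 0; i < window; i++)
        if (controller_would_deliver(&l))
            hard_irq(&l, s);
    thread_fn(&l, s);
    return l;
}
```

In this model run(EOI_ONLY, 100) delivers 100 times, i.e. a storm. Masking stops it at one delivery but costs two mask operations, which on XICS are the RTAS calls under a global lock; deferring the EOI also gives one delivery with no mask operations at all.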

> Right, one of many reasons why we felt this was the right fix. The other is
> that there is no real additional overhead in running this as non-threaded
> since the receive handler is so short (just napi_schedule()).

Yes, in the case at hand it's the right thing to do, as we avoid
another wakeup/context switch.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Thomas Gleixner on
On Wed, 19 May 2010, Thomas Gleixner wrote:
> > I'm still not clear on why the ultimate solution wasn't to have XICS report
> > edge triggered as edge triggered. Probably some complexity of the entire power
> > stack that I am ignorant of.
> >
> > > Apart from the issue of losing interrupts there is also the fact that
> > > masking on the XICS requires an RTAS call which takes a global lock.
>
> Right, I'd love to avoid that but with real level interrupts we'd run
> into an interrupt storm. Though another solution would be to issue the
> EOI after the threaded handler finished, that'd work as well, but
> needs testing.

Thought more about that. The case at hand (ehea) is nasty:

The driver does _NOT_ disable the rx interrupt in the card in the rx
interrupt handler - for whatever reason.

So even in mainline you get repeated rx interrupts when packets
arrive while napi is processing the poll, which is suboptimal at
least. In fact it is counterproductive as the whole purpose of NAPI
is to _NOT_ get interrupts for consecutive incoming packets while the
poll is active.

Most of the other network drivers do:

rx_irq()
	disable rx interrupts on card
	napi_schedule()

Once the napi poll is done (no more packets available), the driver
re-enables the rx interrupt on the card.
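The difference can be shown with a small userspace model. Everything in it (struct fake_nic, rx_irq(), packet_arrives(), napi_poll_model(), run_burst()) is a hypothetical stand-in, not the kernel's NAPI API: the hard irq handler optionally masks rx on the card before scheduling the poll, and the poll re-enables rx interrupts only once the ring is empty.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of a NIC rx path. All names are hypothetical; this is
 * not the kernel NAPI API, only the shape of the pattern. */
struct fake_nic {
    bool disable_in_irq;   /* does the rx handler mask rx irqs on the card? */
    bool rx_irq_enabled;   /* card-level rx interrupt enable bit */
    bool poll_scheduled;   /* stands in for napi_schedule() having run */
    int  pending;          /* packets waiting in the rx ring */
    int  irq_count;        /* interrupts delivered to the CPU */
};

/* Hard irq handler: optionally mask rx on the card, then schedule poll. */
static void rx_irq(struct fake_nic *nic)
{
    nic->irq_count++;
    if (nic->disable_in_irq)
        nic->rx_irq_enabled = false;
    nic->poll_scheduled = true;
}

/* A packet lands in the ring; the card raises an irq only if enabled. */
static void packet_arrives(struct fake_nic *nic)
{
    nic->pending++;
    if (nic->rx_irq_enabled)
        rx_irq(nic);
}

/* Poll: consume up to `budget` packets. When the ring is empty, stop
 * polling and re-enable rx interrupts on the card. Returns work done. */
static int napi_poll_model(struct fake_nic *nic, int budget)
{
    int done = 0;
    while (done < budget && nic->pending > 0) {
        nic->pending--;
        done++;
    }
    if (nic->pending == 0) {
        nic->poll_scheduled = false;
        nic->rx_irq_enabled = true;
    }
    return done;
}

/* A burst of packets arrives before the poll gets to run; the poll then
 * drains the ring. Returns how many interrupts the burst cost. */
static int run_burst(bool disable_in_irq, int packets)
{
    struct fake_nic nic = { .disable_in_irq = disable_in_irq,
                            .rx_irq_enabled = true };
    for (int i = 0; i < packets; i++)
        packet_arrives(&nic);
    while (nic.poll_scheduled)
        napi_poll_model(&nic, 64);
    return nic.irq_count;
}
```

With the mask in the handler, run_burst(true, 10) takes a single interrupt for a ten-packet burst; without it, as in ehea, run_burst(false, 10) takes one interrupt per packet even though the poll would have picked them all up anyway.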

Thanks,

tglx
From: Michael Ellerman on
On Wed, 2010-05-19 at 07:16 -0700, Darren Hart wrote:
> On 05/18/2010 06:25 PM, Michael Ellerman wrote:
> > On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
> >> On 05/18/2010 02:52 PM, Brian King wrote:
> >>> Is IRQF_NODELAY something specific to the RT kernel? I don't see it in mainline...
> >>
> >> Yes, it basically says "don't make this handler threaded".
> >
> > That is a good fix for EHEA, but the threaded handling is still broken
> > for anything else that is edge triggered, isn't it?
>
> No, I don't believe so. Edge triggered interrupts that are reported as
> edge triggered interrupts will use the edge handler (which was the
> approach Sebastien took to make this work back in 2008). Since XICS
> presents all interrupts as Level Triggered, they use the fasteoi path.

But that's the point, no interrupts on XICS are reported as edge, even
if they are actually edge somewhere deep in the hardware. I don't think
we have any reliable way to determine what is what.

> > The result of the discussion about two years ago on this was that we
> > needed a custom flow handler for XICS on RT.
>
> I'm still not clear on why the ultimate solution wasn't to have XICS
> report edge triggered as edge triggered. Probably some complexity of the
> entire power stack that I am ignorant of.

I'm not really sure either, but I think it's a case of a leaky
abstraction on the part of the hypervisor. Edge interrupts behave like
level interrupts as long as you handle the irq before the EOI, but if
you mask them they don't. But Milton's the expert on that.
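A toy model of the lost-interrupt problem. It assumes, purely for illustration, that an edge arriving while its source is in service is latched and presented again after the EOI, whereas an edge arriving while the source is masked is simply dropped; the real hypervisor behaviour is exactly what is uncertain here. The names (struct edge_src, edge_arrives(), present(), run_two_edges()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of an edge source that the platform reports as level.
 * ASSUMPTION: an edge is latched unless the source is masked, in which
 * case it is dropped; a latched edge is presented once the source is
 * neither masked nor awaiting EOI. */
struct edge_src {
    bool masked;
    bool in_service;  /* presented but not yet EOI'd */
    bool latched;     /* edge remembered for presentation */
    int handled;      /* events the driver actually serviced */
};

static void edge_arrives(struct edge_src *s)
{
    if (!s->masked)
        s->latched = true;   /* masked: the edge is simply dropped */
}

/* Controller presents a latched edge if nothing blocks it. */
static bool present(struct edge_src *s)
{
    if (!s->latched || s->masked || s->in_service)
        return false;
    s->latched = false;
    s->in_service = true;
    return true;
}

/* Two edges; the second fires while the first is still being handled. */
static int run_two_edges(bool mask_for_thread)
{
    struct edge_src s = { 0 };
    edge_arrives(&s);
    while (present(&s)) {
        if (mask_for_thread) {
            s.masked = true;      /* threaded fasteoi: mask the source */
            s.in_service = false; /* and EOI before the thread runs */
        }
        if (s.handled == 0)
            edge_arrives(&s);     /* second edge fires mid-handling */
        s.handled++;              /* device serviced (hard irq or thread) */
        s.masked = false;         /* unmask; no-op if never masked */
        s.in_service = false;     /* EOI; no-op if already issued */
    }
    return s.handled;
}
```

Handling before the EOI sees both edges (run_two_edges(false) returns 2), while masking for a threaded handler loses the second one (run_two_edges(true) returns 1).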

> > Apart from the issue of losing interrupts there is also the fact that
> > masking on the XICS requires an RTAS call which takes a global lock.
>
> Right, one of many reasons why we felt this was the right fix. The other
> is that there is no real additional overhead in running this as
> non-threaded since the receive handler is so short (just napi_schedule()).

True. It's not a fix in general though. I'm worried that we're going to
see the exact same bug for MSI(-X) interrupts.

cheers


From: Michael Ellerman on
On Wed, 2010-05-19 at 23:08 +0200, Thomas Gleixner wrote:
> On Wed, 19 May 2010, Thomas Gleixner wrote:
> > > I'm still not clear on why the ultimate solution wasn't to have XICS report
> > > edge triggered as edge triggered. Probably some complexity of the entire power
> > > stack that I am ignorant of.
> > >
> > > > Apart from the issue of losing interrupts there is also the fact that
> > > > masking on the XICS requires an RTAS call which takes a global lock.
> >
> > Right, I'd love to avoid that but with real level interrupts we'd run
> > into an interrupt storm. Though another solution would be to issue the
> > EOI after the threaded handler finished, that'd work as well, but
> > needs testing.
>
> Thought more about that. The case at hand (ehea) is nasty:
>
> The driver does _NOT_ disable the rx interrupt in the card in the rx
> interrupt handler - for whatever reason.

Yeah I saw that, but I don't know why it's written that way. Perhaps
Jan-Bernd or Doug will chime in and enlighten us? :)

cheers