From: Rob Warnock on
Rick Jones <rick.jones2(a)hp.com> wrote:
+---------------
| Noob <root(a)127.0.0.1> wrote:
| > There are situations where polling makes sense.
| > Consider a high-throughput NIC constantly interrupting the CPU.
| > Keywords: interrupt mitigation, interrupt coalescing
|
| I would not equate interrupt coalescing with polling, merely with
| being "smarter" about when to generate an interrupt. Alas, some NICs
| and their drivers don't do coalescing in what I would consider a
| "smart" way and so affect unloaded latency at the same time.
+---------------

Hah! Having worked on a design[1] in *1971* [and *many* similar designs
since] which IMHO did interrupt coalescing & holdoff "the right way", I
have found it fascinating in the decades since (and sometimes extremely
frustrating) that hardware designers tend to be (for the most part) *very*
reluctant to do it in that way. They always seem to want to put any holdoffs
*before* the interrupts, which seriously hurts latency!

IMHO, "the right way" is this:

0. The hardware (for a device or class of devices) shall contain a
pre-settable countdown timer that, when running, masks off (a.k.a.
"holds off") interrupts being produced from an "attention" states
in the device(s) -- *WITHOUT* preventing the underlying attention
state from being read!! That is, the interrupt holdoff shall *not*
prevent software from seeing that the device (still) wants attention.[2]

1. The default state of the system is that the interrupt holdoff
countdown timer is expired, and device interrupts are enabled.
Thus, the CPU will be interrupted immediately upon any attention
condition in the device.

2. [Assumption:] The system will internally disable recursive interrupts
from a device until the device driver specifically re-enables them
and/or performs some explicit interrupt "dismiss" operation. If this
is not the case [as with some CPUs/systems] then a small amount of
additional hardware/software needs to be wrapped around the device.
Take it as read that this can be a bit tricky, but is usually cheap.

3. "Coalescing": The device driver interrupt service routine shall
*continue* to poll for attention status and continue to service the
device until the attention status goes false. [In fact, in some systems
it's a good idea if it polls the *first* time into the ISR as well, to
deal with potential spurious interrupts. But that's another story...]

4. "Dally": For some combinations of device/CPU/system/software/application,
it's a good idea for the device driver to *continue* polling for
an additional "dally" time *after* the attention status goes false,
just in case it comes true again "quickly" (for a system/app-dependent
definition of "quickly"), and if so, go back to step #3. [The dally
counter can trivially be merged into the coalescing poll loop, so
that #3 & #4 are one piece of code. Nevertheless, the dally time
should be a separate tunable, preferably dynamically.]

5. When the dally time has expired, the driver should write a (tunable)
holdoff time into the device's countdown timer. [If further interrupts
had to be explicitly disabled before entering the coalescing loop,
it will be convenient if this PIO Write *also* re-enables device
interrupts. Note, however, that no actual CPU interrupt will occur
until the countdown timer expires.]

6. The device driver now dismisses the interrupt.

7. "Holdoff": The countdown timer ticks away, suppressing any possible
new device interrupts, until it expires. If a device attention state
had occurred while the holdoff timer was running, then a new interrupt
is generated immediately upon timer expiration, and we transition to
state #2 above. Otherwise, the timer expiration is a silent, unobserved
event, with no effect other than putting us back in state #1 above (idle).
[And thus any subsequent device attention state generates a new interrupt
immediately.]
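
To make the shape of #3-#6 concrete, here is a minimal C sketch of
the driver side. Every name in it is a made-up placeholder standing
in for whatever your device and OS actually provide -- it's the
structure that matters, not the API:

    struct dev;                                 /* opaque device handle   */
    extern int  dev_attention(struct dev *);    /* poll attention status  */
    extern void dev_service(struct dev *);      /* service one work item  */
    extern void dev_write_holdoff(struct dev *, unsigned);
                                                /* PIO write that arms the
                                                   countdown timer (and, if
                                                   needed, re-enables device
                                                   interrupts)            */
    extern void dev_dismiss(struct dev *);      /* dismiss the interrupt  */
    extern unsigned dally_ticks, holdoff_ticks; /* separate tunables      */

    void dev_isr(struct dev *d)
    {
        unsigned t;

        do {
            while (dev_attention(d))    /* #3: coalesce -- polling on    */
                dev_service(d);         /*     entry also catches        */
                                        /*     spurious interrupts       */
            t = dally_ticks;            /* #4: dally a while in case     */
            while (t-- && !dev_attention(d))
                ;                       /*     attention pops right back */
        } while (dev_attention(d));     /*     ...and if it did, go      */
                                        /*     back to #3                */

        dev_write_holdoff(d, holdoff_ticks);  /* #5: arm the holdoff     */
        dev_dismiss(d);                       /* #6: dismiss             */
    }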

The latency on an idle system is as short as is possible, yet under heavy
load the efficiency is as high as possible [and latency is dominated by
queueing effects, since the CPU will be saturated]. The "holdoff" parameter
can be tuned for a smooth tradeoff between latency & efficiency across the
load range, and the "dally" parameter can be tuned to improve both latency
and efficiency in the presence of known bursty traffic patterns[3].

Thus endeth the reading of the lesson. ;-}


-Rob

[1] The DCA "SmartMux" family of terminal networking frontends & nodes.

[2] Yes, one 3rd-party device (which shall not be named) really did that:
made the attention status invisible if the holdoff was running!!
*ARGGGH!*

[3] If you know that there is a certain minimal device response
time to driver actions -- or a certain minimal response on the far end
of an interconnect you're talking through (hint: MPI) -- then it can
be helpful to make the "dally" time just larger than this minimal
response time, to avoid unnecessary "holdoff" of responses in a
rapid flurry of exchanges. E.g., in one case a certain common control
command from the driver would, after a short delay, result in a flurry
of status change events from the device. Adding a dally time just longer
than that short delay lowered the CPU utilization by more than *half*!

In most typical applications, the optimal dally time will be a small
fraction of the holdoff time. And in any case, the peaks for "good"
values of both parameters are rather broad.

-----
Rob Warnock <rpw3(a)rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607

From: Terje Mathisen "terje.mathisen at tmsw.no" on
Rob Warnock wrote:
> Hah! Having worked on a design[1] in *1971* [and *many* similar designs
> since] which IMHO did interrupt coalescing & holdoff "the right way", I
> have found it fascinating in the decades since (and sometimes extremely
> frustrating) that hardware designers tend to be (for the most part) *very*
> reluctant to do it in that way. They always seem to want to put any holdoffs
> *before* the interrupts, which seriously hurts latency!
>
> IMHO, "the right way" is this:

What's fun is that I had to independently rediscover most of these in
order to get an early PC to handle fast serial comms:

> 0. The hardware (for a device or class of devices) shall contain a
> pre-settable countdown timer that, when running, masks off (a.k.a.
> "holds off") interrupts being produced from an "attention" states
> in the device(s) -- *WITHOUT* preventing the underlying attention
> state from being read!! That is, the interrupt holdoff shall *not*
> prevent software from seeing that the device (still) wants attention.[2]

The 16450 rs232 chip could be programmed to delay the receive
interrupt until a given fraction of the 16-entry FIFO buffer had
filled, but a receive irq handler could still see that the buffer
was non-empty by polling the status.
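
For the record, on the FIFO-equipped parts in that family the receive
trigger level lives in the FIFO Control Register. A minimal sketch,
assuming x86 Linux port I/O (inb/outb via ioperm) and the conventional
COM1 base address:

    #include <sys/io.h>            /* inb()/outb(); needs ioperm()  */

    #define COM1_BASE 0x3f8        /* conventional COM1 base        */
    #define FCR (COM1_BASE + 2)    /* FIFO Control Register         */

    /* Enable and clear both FIFOs, and delay the RX interrupt until
     * 8 of the 16 bytes are present (trigger bits 7:6 = 10; the
     * available choices are 1, 4, 8, or 14 bytes).                 */
    static void uart_fifo_setup(void)
    {
        outb(0x01 /* enable */ | 0x02 /* clear RX */ |
             0x04 /* clear TX */ | 0x80 /* trigger = 8 */, FCR);
    }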

> 1. The default state of the system is that the interrupt holdoff
> countdown timer is expired, and device interrupts are enabled.
> Thus, the CPU will be interrupted immediately upon any attention
> condition in the device.

Yes, that was the default.
>
> 2. [Assumption:] The system will internally disable recursive interrupts
> from a device until the device driver specifically re-enables them
> and/or performs some explicit interrupt "dismiss" operation. If this
> is not the case [as with some CPUs/systems] then a small amount of
> additional hardware/software needs to be wrapped around the device.
> Take it as read that this can be a bit tricky, but is usually cheap.

The IRQ driver could selectively re-enable all other interrupt sources
as soon as possible, then the final IRET instruction would allow all
sources back in. This made recursive IRQs impossible while still
allowing the minimum possible latency for other devices.
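
(For the curious, the PC-side sequence was roughly this -- a sketch
using the standard 8259 port numbers; the helper name is invented,
and it assumes an IRQ on the primary PIC, lines 0-7, where the
serial ports live:)

    #include <sys/io.h>

    #define PIC1_CMD  0x20      /* primary 8259: command port      */
    #define PIC1_DATA 0x21      /* primary 8259: mask register     */
    #define PIC_EOI   0x20      /* non-specific end-of-interrupt   */

    /* Mask our own IRQ line, EOI the PIC, then re-enable CPU
     * interrupts: every *other* source can now preempt us, while
     * our own line cannot recurse until we unmask it on exit.     */
    static void reopen_other_irqs(unsigned my_irq)
    {
        outb(inb(PIC1_DATA) | (1u << my_irq), PIC1_DATA);
        outb(PIC_EOI, PIC1_CMD);
        __asm__ volatile ("sti");
    }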

>
> 3. "Coalescing": The device driver interrupt service routine shall
> *continue* to poll for attention status and continue to service the
> device until the attention status goes false. [In fact, in some systems
> it's a good idea if it polls the *first* time into the ISR as well, to
> deal with potential spurious interrupts. But that's another story...]

This was needed in case the previous polling loop had emptied out a
receive buffer which had been momentarily empty and then re-filled,
thereby causing another interrupt to be queued up.
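
In other words, step #3 on a FIFO-equipped UART is just the following
(a sketch, same port-I/O assumptions as above; deliver() stands in
for whatever consumes the byte):

    #include <sys/io.h>

    #define COM1_BASE 0x3f8
    #define RBR (COM1_BASE + 0)   /* receive buffer register       */
    #define LSR (COM1_BASE + 5)   /* line status register          */
    #define LSR_DR 0x01           /* bit 0: receive data ready     */

    /* Drain the RX FIFO for as long as the line status says a byte
     * is ready; polling on entry also swallows an interrupt that a
     * previous pass through this loop already serviced.           */
    static void rx_coalesce(void (*deliver)(unsigned char))
    {
        while (inb(LSR) & LSR_DR)
            deliver(inb(RBR));
    }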

>
> 4. "Dally": For some combinations of device/CPU/system/software/application,
> it's a good idea for the device driver to *continue* polling for
> an additional "dally" time *after* the attention status goes false,
> just in case it comes true again "quickly" (for a system/app-dependent
> definition of "quickly"), and if so, go back to step #3. [The dally
> counter can trivially be merged into the coalescing poll loop, so
> that #3 & #4 are one piece of code. Nevertheless, the dally time
> should be a separate tunable, preferably dynamically.]

For the very fastest speeds (i.e. running 115 kbit/s) this was also
required.
>
> 5. When the dally time has expired, the driver should write a (tunable)
> holdoff time into the device's countdown timer. [If further interrupts
> had to be explicitly disabled before entering the coalescing loop,
> it will be convenient if this PIO Write *also* re-enables device
> interrupts. Note, however, that no actual CPU interrupt will occur
> until the countdown timer expires.]

This I couldn't do.
>
> 6. The device driver now dismisses the interrupt.

Right.
>
> 7. "Holdoff": The countdown timer ticks away, suppressing any possible
> new device interrupts, until it expires. If a device attention state
> had occurred while the holdoff timer was running, then a new interrupt
> is generated immediately upon timer expiration, and we transition to
> state #2 above. Otherwise, the timer expiration is a silent, unobserved
> event, with no effect other than putting us back in state #1 above (idle).
> [And thus any subsequent device attention state generates a new interrupt
> immediately.]

This had to be emulated by the delayed interrupt facility, allowing up
to N bytes to be received back-to-back, but generating an interrupt as
soon as the idle gap after a byte passed some fraction of the minimum
byte time.

I did get my file transfer/sync program to run consistently at full
(115k) speed this way.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: Rick Jones on
Rob Warnock <rpw3(a)rpw3.org> wrote:
> 3. "Coalescing": The device driver interrupt service routine shall
> *continue* to poll for attention status and continue to service
> the device until the attention status goes false. [In fact, in
> some systems it's a good idea if it polls the *first* time into
> the ISR as well, to deal with potential spurious interrupts. But
> that's another story...]

I trust that is something other than a PIO Read?

> Thus endeth the reading of the lesson. ;-}

Thank you sensei :)

rick jones

> [3] If you know that there is a certain minimal device
> response time to driver actions -- or a certain minimal response
> on the far end of an interconnect you're talking through (hint:
> MPI) -- then it can be helpful to make the "dally" time just
> larger than this minimal response time, to avoid unnecessary
> "holdoff" of responses in a rapid flurry of exchanges. E.g., in
> one case a certain common control command from the driver would,
> after a short delay, result in a flurry of status change events
> from the device. Adding a dally time just longer than that short
> delay lowered the CPU utilization by more than *half*!

> In most typical applications, the optimal dally time will be a
> small fraction of the holdoff time. And in any case, the peaks
> for "good" values of both parameters are rather broad.

Sounds a little like deciding how long to spin on a mutex before going
into the "other guy has it" path :)

rick jones
--
oxymoron n, Hummer H2 with California Save Our Coasts and Oceans plates
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
From: robertwessel2 on
On Jun 4, 3:33 am, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
> The 16450 rs232 chip could be programmed to delay interrupts until a
> given percentage of the 16-entry FIFO buffer had been consumed, but a
> receive irq handler could still see that the buffer was non-empty by
> polling the status.


To be pedantic, that was the 16550A. The 16450 was basically an 8250
clone, with official support for higher speeds.

Of course programming the 16550 and using the buffer was complicated
by a number of bugs, not least its propensity for the write FIFO to
get stuck if you put a single byte into it at just the wrong time.
From: FredK on

"Rick Jones" <rick.jones2(a)hp.com> wrote in message
news:hubfhf$o27$4(a)usenet01.boi.hp.com...
> Rob Warnock <rpw3(a)rpw3.org> wrote:
>> 3. "Coalescing": The device driver interrupt service routine shall
>> *continue* to poll for attention status and continue to service
>> the device until the attention status goes false. [In fact, in
>> some systems it's a good idea if it polls the *first* time into
>> the ISR as well, to deal with potential spurious interrupts. But
>> that's another story...]
>
> I trust that is something other than a PIO Read?
>

On a typical PCI device this is a PIO register read. On some devices,
the ISR read itself may ack the interrupt, but more often it's a write
to the ISR to reset it. Worse, that write is often followed by another
read, which forces the write to complete (stalling the bus and CPU)
before you can see the ISR and tell whether there is another interrupt
that can be serviced.
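
A sketch of that ack pattern for a hypothetical memory-mapped device
(every name here is invented; the point is the forced ordering):

    #include <stdint.h>

    /* Hypothetical device with a write-1-to-clear status register. */
    struct nic_regs {
        volatile uint32_t isr;    /* interrupt status register      */
    };

    static uint32_t ack_and_recheck(struct nic_regs *r)
    {
        uint32_t pending = r->isr;  /* read: why did we interrupt?  */
        r->isr = pending;           /* posted write: clear the bits */
        return r->isr;              /* read-back forces the write to
                                       complete (stalling bus & CPU)
                                       and shows any new pending
                                       attention bits               */
    }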

The attempt to fix this is the change from pin-based interrupts to
message-based interrupts (MSI) on PCIe. It allows for a variety of
interrupts from a device that don't (necessarily) need to be acked
simply to discover what the interrupt was -- acks that bring the bus
to a screeching halt, which isn't good if you have multiple slots on
a single bus.

But while all this is nice, none of it has anything to do with
*eliminating* the interrupt mechanism or interrupts. Just ways to
minimize the number and cost of IO interrupts. None of which is
particularly new. I wouldn't even call some of the things being
described here "polling". It isn't unusual for a driver concerned
about latency to spin "a little" when it gets an input interrupt, on
the assumption that things come in bursts. But unless you do it as a
hard spin it's useless, and a hard spin wastes CPU time (unless, of
course, you have CPUs to burn).