From: Terje Mathisen "terje.mathisen at tmsw.no" on
Andrew Reilly wrote:
> On Wed, 26 May 2010 14:30:06 -0700, MitchAlsup wrote:
>
>> In a modern context--one could perform all the I/O of a typical PC in
>> the southbridge chip with an instruction set compatible CPU, and only
>> interrupt the main CPU(s) after performing the I/O and updating all the
>> queues. Here, the interrupt would stop one CPU, direct it to the run
>> queue, where it would pick up a new higher priority unit of work, and
>> context switch thereto.
>>
>> Such a CPU would still be minuscule compared to the size of these modern
>> Southbridge chips.
>
> At Uni I actually worked on a computer that nearly fitted that
> description. A Sony NEWS3860 workstation. I suspect that it was the
> evolution of an earlier 680x0 machine, and rather than replacing the
> 680x0, the designers kept it, and relegated it to running device drivers,

Norsk Data did the same with the ND100 and ND500 machines:

They had no proper OS for the original ND10, so lots of device drivers
and other low-level stuff had been written by users, in asm of course.

When they developed the new 100-series systems, with a brand-new CPU
architecture, they bolted an ND10 on as a front-end processor, letting
it handle all the I/O.

One of the problems was that if you wrote a program which was
essentially an external-device-to-external-device channel, it would run
pretty much exclusively in the front end:

With nearly zero main CPU use, the "operating system" (Sintran?) would
soon determine that the channel program was idle and stop scheduling
even that tiny amount of CPU time which was really needed in the core.

The only workaround was the classic one: insert a dummy loop in the main
program that simply spun around eating core CPU time. :-(
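
Something along these lines (a throwaway sketch, not actual Sintran-era
code; channel_busy() is just a made-up stand-in for "the front end is
still moving data for us"):

#include <stdbool.h>

static int fake_progress;                 /* stands in for real channel state */

static bool channel_busy(void)            /* hypothetical completion check */
{
    return ++fake_progress < 1000;        /* stubbed so the sketch terminates */
}

int main(void)
{
    volatile unsigned long burn = 0;      /* volatile: keep the dummy loop from being optimized away */

    while (channel_busy()) {
        for (int i = 0; i < 100000; i++)
            burn += i;                    /* dummy loop: eat core CPU time */
    }
    return 0;
}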

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: Robert Myers on
On May 26, 5:30 pm, MitchAlsup <MitchAl...(a)aol.com> wrote:
> On May 26, 4:08 pm, n...(a)cam.ac.uk wrote:
>
> >     2) The context of this wasn't interrupts versus something else,
> > but funnelling ALL such actions though a single mechanism that is
> > unsatisfactory for almost all of them.  For example, there is
> > absolutely NO reason why a floating-point fixup need execute a
> > FLIH in God mode, only to be restored to the mode of the process
> > that was interrupted.  The Ferranti Atlas/Titan and ICL 1900
> > didn't do it.
>
> Forgot the CDC6600 that did no interrupt processing whatsoever.
>
> The PPs (peripheral processors) performed the I/O (polling) and then
> scheduled the subsequent work for the CPU(s), and if the CPU was
> to be rescheduled, it was directed away from the task at hand,
> immediately to the subsequent task in a single context switch
> from user mode to user mode in a single instruction!
>
> In a modern context--one could perform all the I/O of a typical PC
> in the southbridge chip with an instruction set compatible CPU,
> and only interrupt the main CPU(s) after performing the I/O and
> updating all the queues. Here, the interrupt would stop one CPU,
> direct it to the run queue, where it would pick up a new higher
> priority unit of work, and context switch thereto.
>
> Such a CPU would still be minuscule compared to the size of
> these modern Southbridge chips.

It was the CDC6600 and similar machines that tempted me to blame what
looked to me like a step backward on the attack of the killer micros.

It seems to me that what you are proposing would move not only
interrupts but also at least some bandwidth burden away from the CPU.

I speculate that no one does this because, in the end, it just isn't
worth it, but I don't really know. It's easy to see why there would
be a niche in the server space for offloading I/O processing, but I
keep trying to imagine ways in which the lowly PC (er, workstation)
could be more responsive without simply jacking up the frequency of
everything.

Robert.

From: Tim McCaffrey on
In article <htk6u3$tk0$1(a)usenet01.boi.hp.com>, rick.jones2(a)hp.com says...
>
>Tim McCaffrey <timcaffrey(a)aol.com> wrote:
>
>> Because we wrote all the software on the card we didn't have to
>> force fit what the hardware did with the model the OS wanted. The
>> neat thing was the busier the card got, the more efficient it was
>> (basically from queueing effects). It can handle 80K frames a
>> second, worst case (when the I/O requests didn't take advantage of
>> multiple frame sends). So, I guess that works out to ~140 bytes of
>> data per frame? (in one direction)
>
>I think that is incorrect - not that I'm immune to math mistakes :)
>but I think 80K, 140 byte frames per second would be 11200000 bytes
>per second or 89.6 Mbit/s. So, for ~GbE speed it would need to be
>either 800K frames per second, or 1400 bytes per frame.
>

Right, it was 1400 byte frames.

I would probably have gotten it to go faster, but we didn't get around to
doing a full optimization effort, as the performance was well above
requirements at the time.
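
For reference, a quick back-of-the-envelope check of those numbers (just
the arithmetic, nothing from the card's actual firmware):

#include <stdio.h>

int main(void)
{
    double frames_per_sec  = 80e3;     /* worst-case frame rate quoted above */
    double bytes_per_frame = 1400.0;   /* corrected frame size */
    double bits_per_sec    = frames_per_sec * bytes_per_frame * 8.0;

    printf("%.1f Mbit/s\n", bits_per_sec / 1e6);  /* prints 896.0, close to GbE line rate */
    return 0;
}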

- Tim

From: Morten Reistad on
In article <htjt4h$pi0$1(a)usenet01.boi.hp.com>,
FredK <fred.nospam(a)dec.com> wrote:
>
><nmm1(a)cam.ac.uk> wrote in message
>news:htjphb$edj$1(a)smaug.linux.pwf.cam.ac.uk...
>> In article
>> <7b6474ee-81dd-4ab7-be1d-756a544ed515(a)u7g2000vbq.googlegroups.com>,
>> Robert Myers <rbmyersusa(a)gmail.com> wrote:

>You are on a PDP11 and you want to have IO. Propose the alternative to
>interrupts that provides low latency servicing of the device. Today you can
>create elaborate IO cards with offload engines, but ultimately you need to
>talk to the actual computer system which is otherwise engaged in general
>computing.

If that PDP11 has a good number of processors, dedicating one of them
to handle low-level I/O, building a FIFO with hardware-provided
atomic reads and writes (not _that_ hard to do), and simply blocking
on reads from it should solve that.
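
As a rough illustration of what I mean (C11 atomics standing in for the
hardware-provided atomic reads and writes; all names are made up, and
obviously nothing you would run on an actual PDP11):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SLOTS 256                      /* power of two so indices wrap cheaply */

struct io_fifo {
    _Atomic uint32_t head;                  /* written only by the consumer (main CPU) */
    _Atomic uint32_t tail;                  /* written only by the producer (I/O CPU) */
    uint32_t slot[FIFO_SLOTS];              /* completed-request descriptors */
};

/* Producer side, run on the dedicated I/O processor. */
static bool fifo_push(struct io_fifo *f, uint32_t desc)
{
    uint32_t tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&f->head, memory_order_acquire);

    if (tail - head == FIFO_SLOTS)
        return false;                       /* full: caller retries or drops */

    f->slot[tail % FIFO_SLOTS] = desc;
    atomic_store_explicit(&f->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side, run on the main CPU: "simple block on read". */
static uint32_t fifo_pop(struct io_fifo *f)
{
    uint32_t head = atomic_load_explicit(&f->head, memory_order_relaxed);

    while (atomic_load_explicit(&f->tail, memory_order_acquire) == head)
        ;                                   /* block (busy-wait) until the I/O CPU posts work */

    uint32_t desc = f->slot[head % FIFO_SLOTS];
    atomic_store_explicit(&f->head, head + 1, memory_order_release);
    return desc;
}

int main(void)
{
    struct io_fifo f = {0};
    fifo_push(&f, 42);                      /* in real use this runs on the I/O processor */
    return fifo_pop(&f) == 42 ? 0 : 1;      /* and this on the main CPU */
}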

>On many (most? all?) of todays SMP systems you can generally (depending on
>the OS I suppose) direct IO interrupts to specific CPUs and have CPUs that
>may get no interrupts (or only limited interrupts like a clock interrupt).

You would still need to signal other CPUs, but that signal does not
have to be a very precise interrupt. That CPU can easily handle a
few more instructions before responding. It could, e.g., easily run its
pipeline dry first.
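
In the cheapest form that signal is little more than an atomic flag the
target CPU checks at a convenient boundary of its own loop; a throwaway
sketch with made-up names:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static _Atomic bool io_doorbell;                 /* set by the I/O processor */

static void ring_doorbell(void)                  /* producer side: the imprecise "interrupt" */
{
    atomic_store_explicit(&io_doorbell, true, memory_order_release);
}

static void service_io_queues(void)              /* stub: drain completion queues here */
{
    puts("draining I/O queues");
}

int main(void)                                   /* consumer side: checks only between work items */
{
    for (int step = 0; step < 5; step++) {
        /* ... a few instructions of ordinary work ... */
        if (step == 2)
            ring_doorbell();                     /* pretend the I/O CPU signalled us mid-loop */

        if (atomic_exchange_explicit(&io_doorbell, false, memory_order_acquire))
            service_io_queues();                 /* respond when convenient, not instantly */
    }
    return 0;
}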

With lots and lots of almost-as-fast processors we need to rethink a
lot of what we do.

-- mrr
From: FredK on

"Morten Reistad" <first(a)last.name> wrote in message
news:rpo2d7-ho3.ln1(a)laptop.reistad.name...
> In article <htjt4h$pi0$1(a)usenet01.boi.hp.com>,
> FredK <fred.nospam(a)dec.com> wrote:
>>
>><nmm1(a)cam.ac.uk> wrote in message
>>news:htjphb$edj$1(a)smaug.linux.pwf.cam.ac.uk...
>>> In article
>>> <7b6474ee-81dd-4ab7-be1d-756a544ed515(a)u7g2000vbq.googlegroups.com>,
>>> Robert Myers <rbmyersusa(a)gmail.com> wrote:
>
>>You are on a PDP11 and you want to have IO. Propose the alternative to
>>interrupts that provides low latency servicing of the device. Today you
>>can
>>create elaborate IO cards with offload engines, but ultimately you need to
>>talk to the actual computer system which is otherwise engaged in general
>>computing.
>
> If that PDP11 has a good number of processors, dedicating one of them
> to handle low-level I/O, building a FIFO with hardware-provided
> atomic reads and writes (not _that_ hard to do), and simply blocking
> on reads from it should solve that.
>

The DEC PDP-11 was a single-processor minicomputer from the '70s (there was
a dual-CPU 11/74, IIRC). On these systems it was not feasible from a
cost/complexity viewpoint to implement "I/O channel" processors. Just as it
wasn't reasonable for those systems/CPUs that ultimately resulted in the
x86.

>>On many (most? all?) of todays SMP systems you can generally (depending
>>on
>>the OS I suppose) direct IO interrupts to specific CPUs and have CPUs that
>>may get no interrupts (or only limited interrupts like a clock interrupt).
>
> You would still need to signal other CPUs, but that signal does not
> have to be a very precise interrupt. That CPU can easily handle a
> few more instructions before responding. It could, e.g., easily run its
> pipeline dry first.
>

For an I/O interrupt, the other CPUs don't need to be informed about the
interrupt itself. As to when the CPU is ready to handle an interrupt, that
is, I imagine, CPU-architecture specific (running pipelines "dry", etc.). I
believe the general gripe about interrupts (and by extension "faults" in
general, which tend in many cases to be handled using the same mechanism) is
the overhead of saving and subsequently restoring the current context.

> With lots and lots of almost-as-fast processors we need to rethink a
> lot of what we do.
>