VLIW pre-history [Computer Architecture]

Prev: Multiple Clock Domains on UP3
Next: Fast string functions

From: Alan Charlesworth on 16 May 2007 09:37

In article <1179300522.622550.274910(a)l77g2000hsb.googlegroups.com>,
Quadibloc <jsavard(a)ecn.ab.ca> wrote:

> Eric Smith wrote:
> > Quadibloc wrote:
> > > But I would think that there is a very simple definition of VLIW.
> >
> > Once upon a time I thought that of RISC, too. But that war has been
> > lost.
> >
> > > Instructions explicitly code for superscalar operation.
> >
> > Sounds reasonable to me. So an i860 qualifies?
> .
> I don't know enough about the i860 to answer that at the moment.
> .
> Looking at the manual for the AP-120B, it had 64-bit instructions that
> coded for an "S-pad" operation, a floating adder operation, and a
> floating multiplier operation. I would say it *definitely* qualifies
> as a VLIW architecture. Even if that term only arose later with
> machines like the Cyberplus, this is the same sort of thing, but in a
> smaller version. (The assembly language for it obscures the fact that
> adder and multiplier instructions are really part of the same
> instruction word.)
>
> John Savard

The AP-120B came in 1974, before the term VLIW was coined. We called it
"horizontal microcode". Each functional unit had its own op-code field
in the 64-bit instruction word: integer op, FADD, FMUL, memory fetch,
conditional branch, etc. Load-use latencies were visible: 2 cycles for
FADD, and 3 for FMUL and memory fetch.

One could write a single-instruction loop that would do a dot-product.
One had to write appropriate prologue code to get into the loop, and an
epilogue to get out of the loop. We didn't originally do epilogues, since
we didn't care if we fetched data beyond the end of an array; there
wasn't any memory protection then.
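
The single-instruction dot-product loop can be modelled roughly as follows. This is a sketch in Python, not AP-120B code. It assumes the latencies quoted in this post (3 cycles for memory fetch and FMUL, 2 for FADD) and treats the fetch of an x/y pair as one operation, which is a simplification. The function name pipelined_dot and the use of deques as pipelines are illustrative choices only. Each pass through the loop body stands for one instruction word that issues a fetch, a multiply and an add together.

```python
from collections import deque

# Latencies in cycles, as quoted in the post.
LOAD_LATENCY = 3
FMUL_LATENCY = 3
FADD_LATENCY = 2


def pipelined_dot(x, y):
    """Model a software-pipelined dot product; return (result, cycles)."""
    n = len(x)
    # Each deque models one functional unit's pipeline. The leftmost
    # entry is the result that emerges in the current cycle. Filling
    # the pipelines with zeros stands in for the prologue.
    load_pipe = deque([(0.0, 0.0)] * LOAD_LATENCY)
    mul_pipe = deque([0.0] * FMUL_LATENCY)
    add_pipe = deque([0.0] * FADD_LATENCY)

    # The last element is fetched in cycle n - 1 and reaches the adder
    # LOAD_LATENCY + FMUL_LATENCY cycles later.
    total_cycles = n + LOAD_LATENCY + FMUL_LATENCY
    for cycle in range(total_cycles):
        # Results emerging from each unit this cycle.
        a, b = load_pipe.popleft()
        product = mul_pipe.popleft()
        partial = add_pipe.popleft()

        # One "instruction word": fetch, multiply and add issue together.
        # Past the end of the arrays the model fetches zeros (the
        # epilogue); the post notes the early code simply read beyond
        # the end of the array instead.
        load_pipe.append((x[cycle], y[cycle]) if cycle < n else (0.0, 0.0))
        mul_pipe.append(a * b)
        add_pipe.append(partial + product)

    # With a 2-cycle adder the pipeline holds two interleaved partial
    # sums, which have to be combined after the loop.
    return sum(add_pipe), total_cycles


result, cycles = pipelined_dot([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0])
print(result, cycles)
```

In this model the steady-state loop body never changes; only the data flowing through the three pipelines does. The visible adder latency is the reason a single running sum is not enough, so the final combining step belongs to the epilogue.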

We called this style of coding "software pipelining." Much later, this
became relevant to superscalar microprocessors. Cydrome later added
opcode bits to their instructions to control getting into and out of a
loop. We always had explicit code, which used up the instruction memory
(later cache). By the late 70s, we had a Fortran compiler attempting
vectorization and SW-pipelined code generation.

From: Quadibloc on 16 May 2007 10:00

Alan Charlesworth wrote:
> The AP-120B came in 1974, before the term VLIW was coined. We called it
> "horizontal microcode".

True enough. But unlike other things out there with what we *still*
call horizontal microcode, like a System/360 Model 85, the AP-120B had
these distinctive characteristics:

- Two of the functional units controlled in an instruction word, the
FP adder and the FP multiplier, both worked *directly* on the user
data whose calculation was the object of an applications program.

- This excludes operations related to address calculation - otherwise,
every computer with an indexing bit in its instructions would be VLIW,
and the term would lose all meaning.

- The instruction set in which the multiple operations were controlled
was documented and directly available to the applications programmer.

In general, a conventional computer which had high performance from
using horizontal microcode did not have these characteristics. So,
despite the fact that the instruction format of a VLIW machine - even
a canonical VLIW machine like the Cyberplus - looks a *lot* like
microcode, there are key differences.

If you tried to write microcode in the Cyberplus machine language, or
the AP-120B machine language, you would find that, except for some
exotic special-purpose instructions, usually you could only use one
functional unit at a time, because conventional instructions don't
call for multiple operations.

Instead of being an efficient way of implementing a conventional
instruction set, like horizontal microcode, VLIW is something whose
benefits would be lost if used for microprogramming. (Decoupled
microarchitecture in superscalar machines, though, lets the power of
VLIW shine through to conventional instructions - whether they're CISC
or RISC.)

John Savard

From: Eric P. on 16 May 2007 10:57

Quadibloc wrote:
>
> Thus, while some horizontal microcode machines might be said to in
> some way be precursors of VLIW, it's a stretch. With the AP-120B, on
> the other hand, the resemblance is clear and unmistakable.
>
> Superficially, VLIW instructions are microcode-like. You direct the
> movement of data between functional units and internal registers. But
> VLIW presents multiple functional units, applicable to user problems,
> in a manner not useful for the microcoding of conventional
> instructions, so they _can_ be distinguished.

An important aspect of horizontal microcode is that it usually
includes a next address field in the microword so the sequencer
control can be done in parallel, as this AP-120B apparently does.

As more 'pipeline stages' were added to the micro-sequencer
(using an older definition of the word pipeline where it refers
to register buffers that allow concurrency in the sequencer)
to overlap next address generation with instruction lookup,
then the next address selection had to occur more and more
instructions ahead of when the jump must actually take place.

Eric

From: Quadibloc on 16 May 2007 14:20

Eric P. wrote:
> An important aspect of horizontal microcode is that it usually
> includes a next address field in the microword so the sequencer
> control can be done in parallel, as this AP-120B apparently does.
..
It is true that normal AP-120B instructions do always include a
"branch group" section with a displacement. But there is also a
condition attached to that displacement, so normal operation has it
going to the next instruction word; a conditional jump instruction
simply is combined with other operations.

This design still requires a program counter that is incremented with
each instruction: one with a next address field would require a
_second_ address field for the case where branching is to take place.
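
The difference between the two sequencing schemes can be sketched as below. This is a Python illustration only. The function names are hypothetical, and treating the displacement as relative to the current instruction word is an assumption, since the thread does not say what the AP-120B displacement is measured from.

```python
def next_pc_branch_group(pc, condition_met, displacement):
    """Program-counter scheme: fall through to pc + 1 unless the
    condition in the branch group holds, in which case the
    displacement is applied (assumed relative to the current word)."""
    return pc + displacement if condition_met else pc + 1


def next_addr_microword(next_addr, branch_addr, condition_met):
    """Next-address-field scheme: every word names its successor, so a
    conditional branch needs a second address field for the taken case."""
    return branch_addr if condition_met else next_addr


print(next_pc_branch_group(10, False, -3))  # falls through
print(next_pc_branch_group(10, True, -3))   # branch taken
print(next_addr_microword(42, 7, False))    # uses the next-address field
print(next_addr_microword(42, 7, True))     # uses the second address field
```

The first function needs only one address-like field per word because the incrementing program counter supplies the fall-through case. The second has no program counter to lean on, so it must carry both targets.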

John Savard

From: Quadibloc on 16 May 2007 14:56

Eric Smith wrote:
> Sounds reasonable to me. So an i860 qualifies?

Hunting around the web, although I could not find any detailed
description of the i860 instruction set, one "white paper" on VLIW
architectures from Philips noted that the i860 could switch to a mode
where it fetched instructions in pairs, such that one instruction had
to be for the integer unit, and the other for the floating-point unit.

That feature would bring it within my definition of VLIW much more
securely than, say, the parallelism bits on the TMS320, which is also
advertised as VLIW.

John Savard
