From: John L on
>|> VLIW isn't just wide microcode with multiple functional units. It's
>|> also the compiler techniques with trace scheduling and speculative
>|> execution that let it keep large numbers of units busy.
>
>That sounds a bit religious! Remember that "RISC isn't just a
>reduced instruction set"?

Well, I admit to having been at Yale when Fisher, Ellis, Ruttenberg,
et al. were doing their early VLIW work, although I didn't do anything
for the project other than hacking together a Fortran parser for
Ellis.

There wasn't anything particularly innovative about lashing up a lot
of functional units. The FPS machines had already done that, and
gotten to the point where they were nearly impossible to program
efficiently by hand. The interesting part of VLIW was the new
algorithms, both to find enough parallelism in Fortran and C programs
to keep all those units busy and to prove that doing the work in
parallel would get the right result and not mess up data dependencies.
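To make the dependence-checking half of that concrete, here is a toy sketch in Python. It assumes a single basic block of straight-line code, every operation taking one cycle, and made-up operation and register names. It packs operations into wide instructions of a given width and never puts an operation in the same or an earlier wide instruction than any earlier operation it conflicts with (read-after-write, write-after-read, or write-after-write on the same register). This is plain list scheduling, not trace scheduling: Fisher's trace scheduling goes further and moves operations across branches along the likely path, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str    # illustrative operation name, e.g. "add"
    dest: str    # register written
    srcs: tuple  # registers (or memory names) read


def pack_bundles(ops, width):
    """Greedily pack straight-line ops into wide instructions of `width` slots.

    Each op goes into the earliest bundle strictly after every earlier op it
    conflicts with (RAW, WAR or WAW) that still has a free slot.
    Assumes unit latency for every op.
    """
    bundles = []  # list of bundles, each a list of Op
    placed = []   # (op, bundle index) for ops already scheduled
    for op in ops:
        earliest = 0
        for prev, idx in placed:
            raw = prev.dest in op.srcs  # op reads what prev writes
            war = op.dest in prev.srcs  # op overwrites what prev reads
            waw = op.dest == prev.dest  # both write the same register
            if raw or war or waw:
                earliest = max(earliest, idx + 1)
        slot = earliest
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1  # bundle full: try the next one
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(op)
        placed.append((op, slot))
    return bundles


ops = [
    Op("load", "r1", ("a",)),
    Op("load", "r2", ("b",)),
    Op("mul", "r3", ("r1", "r2")),
    Op("add", "r4", ("r1", "r2")),
    Op("add", "r5", ("r3", "r4")),
]
bundles = pack_bundles(ops, width=2)
for i, b in enumerate(bundles):
    print(i, [f"{o.name} {o.dest}" for o in b])
# 0 ['load r1', 'load r2']
# 1 ['mul r3', 'add r4']
# 2 ['add r5']
```

With a width of 1 the same five operations need five instructions, so all of the gain comes from finding operations that are provably independent.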

>|> The top end Multiflow machine had 28 functional units directly
>|> controlled by VLIW code. My impression is that mainframe microcode
>|> has far fewer.

>Yes. But at least some were definitely VLIW.

Really? How did they write the microcode? If it was by hand, I'll
grant that they had a lot of functional units, but it wasn't VLIW.

R's,
John


From: Nick Maclaren on

In article <f25o20$23vn$1(a)gal.iecc.com>, johnl(a)iecc.com (John L) writes:
|>
|> >|> The top end Multiflow machine had 28 functional units directly
|> >|> controlled by VLIW code. My impression is that mainframe microcode
|> >|> has far fewer.
|>
|> >Yes. But at least some were definitely VLIW.
|>
|> Really? How did they write the microcode? If it was by hand, I'll
|> grant that they had a lot of functional units, but it wasn't VLIW.

So, if you provide me with a CPU that you swear is VLIW, and I program
it by hand, it stops being VLIW? That is definitely a religious
viewpoint!


Regards,
Nick Maclaren.
From: Quadibloc on
Nick Maclaren wrote:
> So, if you provide me with a CPU that you swear is VLIW, and I program
> it by hand, it stops being VLIW? That is definitely a religious
> viewpoint!
There is _definitely_ a distinction between horizontal microcode and
VLIW.

The meaning of VLIW includes *superscalar*: unless you are
independently executing separate microcode streams for fields in your
source instruction, or generating microcode on the fly - in which
case, it isn't microcode anymore, you have a decoupled
microarchitecture, which _can_ be superscalar - it's pretty hard for a
conventionally microcoded machine to be superscalar.

Unless, of course, the instructions are so complicated that the
microcode for them performs enough elementary operations to permit
superscalar operation at that level. My point is: in general, if
you've got instructions like "add" and "multiply", and microcode is
handling them, it's hard to make effective use of any ability to add
and multiply at the same time.

John Savard

From: Anne & Lynn Wheeler on

Quadibloc <jsavard(a)ecn.ab.ca> writes:
> Unless, of course, the instructions are so complicated that the
> microcode for them performs enough elementary operations to permit
> superscalar operation at that level. My point is: in general, if
> you've got instructions like "add" and "multiply", and microcode is
> handling them, it's hard to make effective use of any ability to add
> and multiply at the same time.

no, the instructions are so simple ... there are things like "start
transfer from register to functional unit". a single horizontal
microcode instruction is controlling possibly a half-dozen or more
functional units that are operating in parallel w/o interlocks.

one of the reasons i mentioned that horizontal microcode machine
performance was quoted in avg. number of 370 instructions per machine
cycle was that there could be different operations for multiple 370
instructions going on in parallel/overlapped ... so they counted the
avg. number of 370 instructions that completed in some unit of machine
cycles. ref
http://www.garlic.com/~lynn/2007j.html#84 VLIW pre-history

part of the complexity for the horizontal microcoder was that they had
to constantly keep track of which operations were in flight and
approximately how many instructions later they could start the next
operation (i.e. had the transfer finished so the add operation could be
kicked off, etc).
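a minimal sketch of that bookkeeping, in Python, assuming a made-up machine with three kinds of functional units and hypothetical latencies (2 cycles for a register-to-unit transfer, 1 for add, 3 for multiply). each microword lists the operations started in that cycle; since there are no interlocks nothing stalls, so the checker just reports every operand read before the operation producing it has finished. it does not check for two operations competing for the same unit or for overlapping writes to one register.

```python
# Hypothetical latencies in machine cycles; not taken from any real 370 model.
LATENCY = {"xfer": 2, "add": 1, "mul": 3}


def check_schedule(schedule):
    """Check a hand-written wide-microword schedule on a machine w/o interlocks.

    schedule: one microword per cycle; each microword is a list of
    (unit, dest, srcs) operations started in that cycle. A result is
    readable LATENCY[unit] cycles after its operation starts.
    Returns (cycle, register) pairs where an operand is read too early.
    """
    ready_at = {}  # register -> first cycle its latest value can be read
    hazards = []
    for cycle, word in enumerate(schedule):
        # All fields of a microword read their operands at the start of the cycle.
        for _, _, srcs in word:
            for reg in srcs:
                if ready_at.get(reg, 0) > cycle:
                    hazards.append((cycle, reg))
        # Then record when each result started this cycle becomes available.
        for unit, dest, _ in word:
            ready_at[dest] = cycle + LATENCY[unit]
    return hazards


bad = [
    [("xfer", "A", ("r1",)), ("xfer", "B", ("r2",))],  # start two transfers
    [("add", "S", ("A", "B"))],                        # too early: transfers take 2
]
good = [
    [("xfer", "A", ("r1",)), ("xfer", "B", ("r2",))],
    [],                                                  # wait for the transfers
    [("add", "S", ("A", "B")), ("mul", "P", ("A", "B"))],  # add and mul in parallel
    [],
    [],                                                  # wait for the multiply
    [("add", "T", ("S", "P"))],
]
print(check_schedule(bad))   # [(1, 'A'), (1, 'B')]
print(check_schedule(good))  # []
```

the empty microwords in `good` are the waits the microcoder had to count out by hand; drop any one of them and the checker reports a hazard.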

this, besides FS
http://www.garlic.com/~lynn/subtopic.html#futuresys

contributed to the creation of early 801/RISC
http://www.garlic.com/~lynn/subtopic.html#801

.... i.e. the exact opposite of future system in terms of hardware
complexity ... there were (801-related) comments in the 70s about all
operations being fixed single cycle (in part eliminating the type of
complexity and variability that the horizontal microcoders had to deal
with) and about no cache consistency (eliminating the significant
cache consistency overhead/slowdown in the high-end 370s).
From: Nick Maclaren on

In article <1179070084.395808.299680(a)u30g2000hsc.googlegroups.com>,
Quadibloc <jsavard(a)ecn.ab.ca> writes:
|>
|> > So, if you provide me with a CPU that you swear is VLIW, and I program
|> > it by hand, it stops being VLIW? That is definitely a religious
|> > viewpoint!
|> .
|> There is _definitely_ a distinction between horizontal microcode and
|> VLIW.
|>
|> The meaning of VLIW includes *superscalar*: unless you are
|> independently executing separate microcode streams for fields in your
|> source instruction, or generating microcode on the fly - in which
|> case, it isn't microcode anymore, you have a decoupled
|> microarchitecture, which _can_ be superscalar - it's pretty hard for a
|> conventionally microcoded machine to be superscalar.

I am sure that IBM would love to know that the microcode of several
of the System/370 range was not really microcode, but that seems to
be an equally religious viewpoint.

Yes, it executed the different parts in parallel - that was precisely
the reason for using a VLIW format. Remember that mainframe CPUs had
a lot of semi-independent functional units.


Regards,
Nick Maclaren.