From: Andrew Swallow on
Morten Reistad wrote:
> In article <45edf250$0$24789$4c368faf(a)roadrunner.com>,
> Peter Flass <Peter_Flass(a)Yahoo.com> wrote:
>> Morten Reistad wrote:
>> ...
>>> For a while you were very close to reinventing the PDP10.
>>>
>>> The design only has two hurdles that limit deployment in modern
>>> silicon :
>>>
>>> * Effective address calculation through indirection that will lead to
>>> huge pipelining problems.
>>> * Too small memory space.
>>>
>> Sounds like a great design to me. I'm not an architecture person, but
>> there has to be a way to handle indirection that avoids most of the
>> pipeline problems. How about (and I'm *really* not a hardware person)
>> having the instruction decoder plug indirect addresses into an
>> associative memory when it finds them, and then compute the effective
>> address. If the indirect address is modified before it's used, then the
>> pipeline somewhere can look for a hit in the associative array, and only
>> then stall to recompute the effective address. The usual case won't
>> suffer any degradation.
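A toy model of that associative-array idea, purely as a sketch — the class, its method names, and the stall rule are all illustrative, not anything a real implementation did:

```python
# Toy model of the scheme quoted above: remember which addresses were
# dereferenced early, and stall only when one of them is written before
# the dependent instruction uses it.

class IndirectionTracker:
    def __init__(self):
        self.pending = set()     # addresses whose indirect word is "in flight"

    def decode_indirect(self, addr):
        """Decoder resolved an indirect word at addr ahead of time."""
        self.pending.add(addr)

    def store(self, addr):
        """A store that hits the table means the early result may be stale."""
        if addr in self.pending:
            self.pending.discard(addr)
            return True          # rare case: stall and recompute
        return False             # usual case: no degradation

tracker = IndirectionTracker()
tracker.decode_indirect(0o1000)   # decoder saw "@1000", resolved it early
print(tracker.store(0o2000))      # False: unrelated store, no stall
print(tracker.store(0o1000))      # True: indirect word modified before use
```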
>
> I was commenting on the PDP-10 ISA. It is a regular, word-oriented
> architecture with 16 registers.
>
> So far so good. There are skip instructions there. They are pretty
> regular, and a dual speculative instruction should be able to handle
> them. A hurdle, but nothing big compared to what has been done e.g.
> with the i386 ISA.
>
> Next is the effective address computation.
>
> It is indexed, indirect and can be recursive.
>
> move 1, foo ; moves the contents of foo into register 1
> movei 1,foo ; moves the address of foo into register 1
> move 1,foo(2) ; moves the word foo+contents of register 2 into 1
>
> So far so good.
>
> move 1, @foo ; dereferences foo as a pointer. Simple? Not
> quite. The indirect bit set by "@", bit 13,
> is again dereferenced in foo. So if foo has
> bit 13 set it will do a double dereference,
> and will check index registers as well. Ad
> infinitum. It can loop easily.
>
> Try to pipeline that! And this is a simplified version.
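To make the hazard concrete, here is a minimal Python sketch of that effective-address loop. The field positions follow the standard PDP-10 instruction word (indirect bit 13, index field in bits 14-17, 18-bit address in bits 18-35, with bit 0 the most significant); the step limit is my own addition, since the real machine can chase indirect bits forever:

```python
# Sketch of the PDP-10 effective-address calculation described above.
# 36-bit word, numbered from the most significant bit (bit 0):
#   bit 13 = indirect (I), bits 14-17 = index register (X),
#   bits 18-35 = address (Y).

MAX_STEPS = 64  # guard: the real hardware has no such limit and can loop

def effective_address(memory, regs, word, max_steps=MAX_STEPS):
    """Resolve Y/X/I repeatedly until a word without the indirect bit is found."""
    for _ in range(max_steps):
        y = word & 0o777777          # bits 18-35: 18-bit address
        x = (word >> 18) & 0o17      # bits 14-17: index register number
        i = (word >> 22) & 1         # bit 13: indirect flag
        ea = (y + (regs[x] if x else 0)) & 0o777777
        if not i:
            return ea                # done: no further indirection
        word = memory[ea]            # dereference and repeat the whole dance
    raise RuntimeError("indirection chain too long (possible loop)")

# A double indirection: cell 100 points (indirectly) at cell 200, whose
# word finally addresses cell 300 with index register 2 added.
regs = [0] * 16
regs[2] = 5
memory = {0o100: (1 << 22) | 0o200,   # I=1, Y=200
          0o200: (2 << 18) | 0o300}   # X=2, Y=300
instr_word = (1 << 22) | 0o100        # the "@foo" case, foo = 100
print(oct(effective_address(memory, regs, instr_word)))  # -> 0o305
```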

If your RAM is on board, does pipelining produce a large speed-up?
Would a simpler design allow, say, a faster cycle time?

>
> Now, such indirections are pretty rare in most code, so a pipeline
> stall is feasible, at least when multiple indirection/indexing
> is happening.
>
> Next we have the memory size. But that can be solved with more
> bits without hurting much.
>
> One advantage about the tight "segment zero" is that it fits in
> L2 cache of a modern machine.
>
> As an experiment, make a fast PDP10 with L2 cache as ALL the memory,
> treat RAM as PDP10's treat disk, and treat disk as tape.
>
> A really large PDP-10 had 4 MW/18 MB of RAM. This is a little beyond
> current L2 caches, but 2 MW/9 MB should be doable. A large installation
> had 2-4 GB of disk. Also doable as current RAM.
>
> Just a thought.
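A quick check of the word/byte arithmetic behind those sizes, taking 36-bit words at 4.5 bytes each:

```python
# 36 bits per word = 4.5 bytes per word, so 4M words is 18 MB and
# 2M words is 9 MB, matching the figures quoted above.
def words_to_mb(words, bits_per_word=36):
    return words * bits_per_word / 8 / 2**20

print(words_to_mb(4 * 2**20))  # -> 18.0
print(words_to_mb(2 * 2**20))  # -> 9.0
```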

By using RAM cells rather than cache cells for your L2, can you get it
on chip? In RAM, unlike in a cache, you do not have to store the
address.
From: Morten Reistad on
In article <5cqdnSMmi_tGi3PYnZ2dnUVZ8tWnnZ2d(a)bt.com>,
Andrew Swallow <am.swallow(a)btopenworld.com> wrote:
>Morten Reistad wrote:
>> [...]
>>
>> I was commenting on the PDP-10 ISA. It is a regular, word-oriented
>> architecture with 16 registers.
>>
>> So far so good. There are skip instructions there. They are pretty
>> regular, and a dual speculative instruction should be able to handle
>> them. A hurdle, but nothing big compared to what has been done e.g.
>> with the i386 ISA.
>>
>> Next is the effective address computation.
>>
>> It is indexed, indirect and can be recursive.
>>
>> move 1, foo ; moves the contents of foo into register 1
>> movei 1,foo ; moves the address of foo into register 1
>> move 1,foo(2) ; moves the word foo+contents of register 2 into 1
>>
>> So far so good.
>>
>> move 1, @foo ; dereferences foo as a pointer. Simple? Not
>> quite. The indirect bit set by "@", bit 13,
>> is again dereferenced in foo. So if foo has
>> bit 13 set it will do a double dereference,
>> and will check index registers as well. Ad
>> infinitum. It can loop easily.
>>
>> Try to pipeline that! And this is a simplified version.
>
>If your RAM is on board, does pipelining produce a large speed-up?
>Would a simpler design allow, say, a faster cycle time?

Good point. With on-chip RAM the pipeline depth is no longer
so critical.

I am not a hardware person, so I will defer judgement on this.

>> Now, such indirections are pretty rare in most code, so a pipeline
>> stall is feasible, at least when multiple indirection/indexing
>> is happening.
>>
>> Next we have the memory size. But that can be solved with more
>> bits without hurting much.
>>
>> One advantage about the tight "segment zero" is that it fits in
>> L2 cache of a modern machine.
>>
>> As an experiment, make a fast PDP10 with L2 cache as ALL the memory,
>> treat RAM as PDP10's treat disk, and treat disk as tape.
>>
>> A really large PDP-10 had 4 MW/18 MB of RAM. This is a little beyond
>> current L2 caches, but 2 MW/9 MB should be doable. A large installation
>> had 2-4 GB of disk. Also doable as current RAM.
>>
>> Just a thought.
>
>By using ram gates rather cache gates for your L2 can you get it
>on chip? In ram, unlike cache gates, you do not have to store the
>address.

I already accounted for that. 6 MB seems to be the current standard limit
for L2 cache, so 9 MB of "L2 RAM" is stretching it a bit.

-- mrr
From: Quadibloc on
Walter Bushell wrote:
> How about 9-bit characters? Or even 12? One could get a great extended
> ASCII that would cover most of the world's languages with 12 bits.

Indeed. And wouldn't 18-bit characters be wide enough to cover most of
the existing Unicode extensions beyond the Basic Multilingual Plane?
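For scale, a quick check of the code-point counts behind those widths (Unicode defines 17 planes of 65,536 code points, U+0000 through U+10FFFF):

```python
# Code points available at each proposed character width, versus Unicode.
BMP = 65536                  # Basic Multilingual Plane (plane 0)
UNICODE_TOTAL = 17 * BMP     # planes 0-16: 1,114,112 code points

for bits in (9, 12, 18):
    print(f"{bits:2d} bits -> {2**bits:,} code points")

# 18 bits reaches exactly four planes: the BMP plus supplementary
# planes 1-3, i.e. about a quarter of the full Unicode range.
print(2**18 // BMP)  # -> 4
```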

John Savard

From: Quadibloc on
Eugene Miya wrote:
> A step backward John.
> The high-end LISP hackers attempted a 72-bit design over 2 decades ago
> with the S-1 which was supposed to be DEC-10 compatible. Never finished.

I was waiting for someone to point out that, yes, the perfect computer
*does* have a 36-bit word, and it is the PDP-10.

Of course, *my* idea is to use a 360-like instruction set, but broken
up into 18-bit pieces instead of 16-bit pieces.

John Savard

From: Quadibloc on
David Kanter wrote:
> Frankly, using non powers of 2 seems like a rather odd design choice,
> and I have trouble thinking of why you'd do it.

Actually, the only time that powers of 2 matter is if one is doing bit
addressing, and since hardly anyone does that, whether the width of a
word is a power of two or not doesn't matter.

The exponent widths I am using - 9 bits and 12 bits for sign, exponent
sign, and exponent - match those used in the IEEE 754 standard. So one
could indeed use IEEE 754 encoding with suppressed first bit, gradual
underflow, NaNs, and the rest.
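For reference, the 9- and 12-bit figures do line up with the IEEE 754 binary interchange formats: sign plus exponent is 1 + 8 = 9 bits in single precision and 1 + 11 = 12 bits in double. A small Python check (the helper function is mine, not anything from the standard):

```python
# IEEE 754 field widths: sign + exponent occupy 1 + 8 = 9 bits in binary32
# (single) and 1 + 11 = 12 bits in binary64 (double). Extract the fields
# from real encodings to confirm.
import struct

def sign_exp_bits(value, fmt, total_bits, exp_bits):
    """Return (sign, biased exponent) of an IEEE 754 encoding."""
    raw = int.from_bytes(struct.pack(fmt, value), "big")
    mant_bits = total_bits - 1 - exp_bits
    sign = raw >> (total_bits - 1)
    exp = (raw >> mant_bits) & ((1 << exp_bits) - 1)
    return sign, exp

print(sign_exp_bits(-1.0, ">f", 32, 8))   # (1, 127): 9 bits of sign+exponent
print(sign_exp_bits(-1.0, ">d", 64, 11))  # (1, 1023): 12 bits of sign+exponent
```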

John Savard