From: Peter Flass on
Morten Reistad wrote:
....
>
> For a while you were very close to reinventing the PDP10.
>
> The design only has two hurdles that limit deployment in modern
> silicon :
>
> * Effective address calculation through indirection that will lead to
> huge pipelining problems.
> * Too small memory space.
>

Sounds like a great design to me. I'm not an architecture person, but
there has to be a way to handle indirection that avoids most of the
pipeline problems. How about (and I'm *really* not a hardware person)
having the instruction decoder plug indirect addresses into an
associative memory when it finds them, and then compute the effective
address? If the indirect address is modified before it's used, the
pipeline can look for a hit in the associative array somewhere, and
only then stall to recompute the effective address. The usual case
won't suffer any degradation.

I'd love to see it running.
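
Roughly what I have in mind, as a toy C sketch (purely illustrative --
the table size, the names, and the round-robin replacement are all my
inventions, not anything real):

#include <stdbool.h>
#include <stdint.h>

#define CAM_ENTRIES 16

typedef struct {
    bool     valid;
    uint32_t addr;   /* address of the indirect word in memory */
    uint32_t ea;     /* effective address computed from it */
} cam_entry;

static cam_entry cam[CAM_ENTRIES];

/* Decoder side: remember which memory word this EA came from. */
void cam_record(uint32_t indirect_addr, uint32_t ea)
{
    static unsigned next;            /* trivial round-robin replacement */
    cam[next] = (cam_entry){ true, indirect_addr, ea };
    next = (next + 1) % CAM_ENTRIES;
}

/* Store side: a write to a tracked indirect word invalidates the
   cached EA, and only then does the pipeline stall and recompute.
   Returns true when a stall is required; the common case is a miss. */
bool cam_snoop_store(uint32_t store_addr)
{
    for (unsigned i = 0; i < CAM_ENTRIES; i++) {
        if (cam[i].valid && cam[i].addr == store_addr) {
            cam[i].valid = false;
            return true;             /* hit: stall, recompute the EA */
        }
    }
    return false;                    /* miss: no degradation */
}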

From: Peter Flass on
Torben Ægidius Mogensen wrote:

> Walter Bushell <proto(a)oanix.com> writes:
>
>
>>In article <7zejo2fyar.fsf(a)app-0.diku.dk>,
>> torbenm(a)app-0.diku.dk (Torben AEgidius Mogensen) wrote:
>>
>>
>>>A more logical intermediate step between 32 and 64 bits is 48 bits --
>>>you have a whole number of 8 or 16 bit characters in a word, so you
>>>can still have byte addressability. But power-of-two words do have a
>>>clear advantage in alignment and fast scaling of indexes to pointers.
>>>
>>>If you want 36-bit word, you should consider 6-bit characters, so you
>>>have a whole number of characters per word -- that was done on some
>>>older computers (like the UNIVAC 1100), which used fieldata
>>>characters.
>>>
>>
>>How about 9 bit characters? Or even 12. One could get a great extended
>>ASCII that would cover most of the world's languages with 12 bits.
>
>
> UNIVAC 1100 also used a 9-bit ASCII. I don't recall what extra
> characters (if any) were added.
>

Honeywell systems also used 9-bit bytes. Multics mostly used eight of
the nine, except for a few programs like "compose" that used the extra
bit for internal control codes.
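
The packing is as clean as it gets, by the way: four 9-bit characters
exactly fill a 36-bit word. A throwaway C illustration (names are
mine; the 36-bit word just rides in the low bits of a 64-bit integer):

#include <stdint.h>
#include <stdio.h>

#define CHAR_MASK 0x1FFu                 /* low 9 bits */

/* Pack four 9-bit characters into one 36-bit word, c0 leftmost. */
uint64_t pack36(unsigned c0, unsigned c1, unsigned c2, unsigned c3)
{
    return ((uint64_t)(c0 & CHAR_MASK) << 27) |
           ((uint64_t)(c1 & CHAR_MASK) << 18) |
           ((uint64_t)(c2 & CHAR_MASK) <<  9) |
            (uint64_t)(c3 & CHAR_MASK);
}

/* Extract character n (0..3) from a 36-bit word. */
unsigned unpack36(uint64_t word, unsigned n)
{
    return (unsigned)(word >> (27 - 9 * n)) & CHAR_MASK;
}

int main(void)
{
    uint64_t w = pack36('M', 'u', 'l', 't');
    printf("%c%c%c%c\n", unpack36(w, 0), unpack36(w, 1),
                         unpack36(w, 2), unpack36(w, 3));
    return 0;
}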

From: Andrew Swallow on
Walter Bushell wrote:
> In article <7zejo2fyar.fsf(a)app-0.diku.dk>,
> torbenm(a)app-0.diku.dk (Torben AEgidius Mogensen) wrote:
>
>> "Quadibloc" <jsavard(a)ecn.ab.ca> writes:
>>
>>> I have started a page exploring an imaginary 'perfect' computer
>>> architecture.
>>>
>>> Struggling with many opcode formats with which I was not completely
>>> satisfied in my imaginary architecture that built opcodes up from 16-
>>> bit elements, I note that an 18-bit basic element for an instruction
>>> solves the problems previously seen, by opening up large vistas of
>>> additional opcode space.
>>>
>>> Even more to the point, if one fetches four 36-bit words from memory
>>> in a single operation, not only do aligned 36-bit and 72-bit floats
>>> fit nicely in this, but so do 48-bit floating-point numbers.
>>> [...]
>>> I think such an architecture is too great a departure from current
>>> norms to be considered, but this seems to be a disappointment, as it
>>> seems that it has many merits - involving being neither too big nor
>>> too small, but "just right".
>> Need for more opcode space is not a very good reason to increase the
>> word-size (as used for numbers etc.) -- Many processors have opcodes
>> that are of a different size than the wordsize. Also, the trend these
>> days seems to be decreasing opcode size -- several 32-bit RISC CPUs
>> have added 16-bit opcodes to reduce code size. If you can't fit what
>> you want into a single 32-bit word, you might consider splitting some
>> instructions in two -- you pay when you use these, but not when using
>> instructions that fit into 32 bits, unlike if you go to a uniform
>> 36-bit opcode, where all instructions pay for the size of the largest.
>>
>> And fixed-size opcodes seem to be on the way out also -- Thumb2
>> freely mixes 16 and 32 bit instructions, and in x86 that has a very
>> variable opcode size, handling this takes up only a small fraction of
>> the die-space, and with caching of decoded instructions, the time
>> overhead is also very limited.
>>
>> As for using 36 bits to increase number precision over 32 bits, the
>> step is too small, and the effort of handling strings without waste is
>> a considerable complication (in particular in C-like languages, where
>> you expect to have pointers to individual characters in a string).
>>
>> A more logical intermediate step between 32 and 64 bits is 48 bits --
>> you have a whole number of 8 or 16 bit characters in a word, so you
>> can still have byte addressability. But power-of-two words do have a
>> clear advantage in alignment and fast scaling of indexes to pointers.
>>
>> If you want 36-bit word, you should consider 6-bit characters, so you
>> have a whole number of characters per word -- that was done on some
>> older computers (like the UNIVAC 1100), which used fieldata
>> characters.
>>
>> Torben
>
> How about 9 bit characters? Or even 12. One could get a great extended
> ASCII that would cover most of the world's languages with 12 bits.

Unicode is up to about 100,000 characters. You can put that in 18
bits, which gives 262,144 code points, and 2 * 18 = 36 bits, so two
characters fit per word.

If Chinese ever needs more than the spare 262,143 - 100,000 = 162,143
entries, entire words will be needed to store a single character.
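
The arithmetic as a throwaway C check (hypothetical code, names mine):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FIELD_MAX 0x3FFFFu    /* 262,143: largest 18-bit code point */

/* Pack two code points into one 36-bit word; false when either one
   is too big for its 18-bit field and needs a whole word to itself. */
bool pack_pair(uint32_t hi, uint32_t lo, uint64_t *word)
{
    if (hi > FIELD_MAX || lo > FIELD_MAX)
        return false;
    *word = ((uint64_t)hi << 18) | lo;
    return true;
}

int main(void)
{
    uint64_t w;
    printf("fits: %d\n", pack_pair(100000, 65, &w));  /* 1: both fit */
    printf("fits: %d\n", pack_pair(300000, 65, &w));  /* 0: too big  */
    return 0;
}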

Andrew Swallow
From: Morten Reistad on
In article <45edf250$0$24789$4c368faf(a)roadrunner.com>,
Peter Flass <Peter_Flass(a)Yahoo.com> wrote:
>Morten Reistad wrote:
>...
>>
>> For a while you were very close to reinventing the PDP10.
>>
>> The design only has two hurdles that limit deployment in modern
>> silicon :
>>
>> * Effective address calculation through indirection that will lead to
>> huge pipelining problems.
>> * Too small memory space.
>>
>
>Sounds like a great design to me. I'm not an architecture person, but
>there has to be a way to handle indirection that avoids most of the
>pipeline problems. How about (and I'm *really* not a hardware person)
>having the instruction decoder plug indirect addresses into an
>associative memory when it finds them, and then compute the effective
>address? If the indirect address is modified before it's used, the
>pipeline can look for a hit in the associative array somewhere, and
>only then stall to recompute the effective address. The usual case
>won't suffer any degradation.

I was commenting on the PDP-10 ISA. It is a regular, word-oriented
architecture with 16 registers.

So far so good. There are skip instructions. They are pretty
regular, and dual speculative instruction issue should be able to
handle them. A hurdle, but nothing big compared to what has been
done, e.g., with the i386 ISA.

Next is the effective address computation.

It is indexed, indirect and can be recursive.

move 1,foo      ; moves the contents of foo into register 1
movei 1,foo     ; moves the address of foo into register 1
move 1,foo(2)   ; moves the word at foo + contents of register 2 into 1

So far so good.

move 1,@foo     ; dereferences foo as a pointer. Simple? Not
                ; quite. The indirect bit set by "@", bit 13,
                ; is checked again in the word at foo. So if foo
                ; has bit 13 set it will do a double dereference,
                ; and will check index registers as well. Ad
                ; infinitum. It can loop easily.

Try to pipeline that! And this is a simplified version.
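
For the curious, the loop is roughly this in C (my condensed model,
not DEC documentation; memory[] and reg[] are stand-ins, and a real
PDP-10 also maps the registers into the low addresses):

#include <stdint.h>

#define ADDR_MASK 0777777u               /* 18-bit address field Y  */
#define IX(w)     (((w) >> 18) & 017)    /* index register field X  */
#define IND(w)    (((w) >> 22) & 1)      /* indirect bit I (bit 13) */

static uint64_t memory[01000000];        /* 256 kW of 36-bit words  */
static uint64_t reg[16];                 /* the 16 registers        */

uint32_t effective_address(uint64_t word)
{
    for (;;) {
        uint32_t e = (uint32_t)(word & ADDR_MASK);
        if (IX(word))                    /* indexed: add C(X)       */
            e = (e + (uint32_t)reg[IX(word)]) & ADDR_MASK;
        if (!IND(word))                  /* no indirect bit: done   */
            return e;
        word = memory[e];                /* "@": fetch and repeat   */
    }                                    /* ...possibly forever     */
}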

Now, such indirections are pretty rare in most code, so a pipeline
stall is feasible, at least when multiple indirection or indexing is
actually happening.

Next we have the memory size. But that can be solved with more
bits without hurting much.

One advantage of the tight "segment zero" is that it fits in the
L2 cache of a modern machine.

As an experiment, make a fast PDP-10 with the L2 cache as ALL the
memory, treat RAM the way PDP-10s treat disk, and treat disk as tape.

A really large PDP-10 had 4MW/18MB of RAM. This is a little beyond
current L2 caches, but 2MW/9MB should be doable. A large installation
had 2-4GB of disk. Also doable as current RAM.

Just a thought.

-- mrr
From: Eugene Miya on
A step backward, John.
The high-end LISP hackers attempted a 72-bit design over two decades
ago with the S-1, which was supposed to be DEC-10 compatible. It was
never finished.

