From: Robert Myers on
On Mar 3, 11:17 am, timcaff...(a)aol.com (Tim McCaffrey) wrote:

>
> Ok, now I'm confused.  Your original post implied to me you did
> not like Cray or what he did.
>
> To be clear myself, I have a great deal of respect for Mr. Cray.  
> Although he made some mistakes, they are really only obvious in
> hindsight.
>

There was an important mistype in my original post. Since I've talked
here frequently about how lame, from a user's point of view, current
"supercomputers" are compared to the Cray, I assumed that the mistype
would be obvious.

I also got back a post recently to the effect that "obviously" current
computers don't have the architecture of the Cray 1, so you
"obviously" can't expect computers to do similar things, and my
suggesting that they should be able to was "almost moronic."

Well, some of the architectural changes being discussed *could* make a
current computer feel a lot more like a Cray-1.

Robert.

From: MitchAlsup on
On Mar 2, 1:48 pm, Robert Myers <rbmyers...(a)gmail.com> wrote:
> On Mar 2, 2:12 pm, Stephen Fuld <SF...(a)alumni.cmu.edu.invalid> wrote:> On 3/2/2010 9:56 AM, Robert Myers wrote:
>
> > Well, some of the details of Mitch's proposal aren't clearly specified.
> >   I can't tell if he intends the off chip DRAM to be part of the
> > processor's address space or not. If it is, then the on-chip DRAM is
> > essentially a level 4 cache.  But it could be that it isn't, in which
> > case, it is more like a fast paging device with some extra features.
>
> Help me out here.  Isn't the page file part of the processor's address
> space?  When something is paged out, you don't have to worry about
> coherence because you can't touch it.

I intend that the ECS be used as a 'paging' area. That is, the memory
is not accessible by a load or a store, but is addressable as if the
ECS were a disk with zero rotational latency, and it performs transfers
in page-sized units with about the delay of a cache line transfer of
the current era. Done this way, the size of the coherent domain is small
enough that coherence checking does not increase memory access time,
but large memory is accessible because the paging is so fast. Much of
the other detail is to make the memory management updates as fast as
the data transfers.
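
A toy model of the scheme described above may make the distinction concrete. This is only a sketch under stated assumptions: the class name ToyECS, the 4096-byte page size, the two-page resident set, and the least-recently-used eviction are all illustrative choices, not part of the proposal. The point it shows is that loads and stores only ever touch main memory, while the ECS is reached solely by whole-page transfers.

```python
class ToyECS:
    """Toy model: small load/store-addressable main memory backed by a paged store.

    All names and sizes here are hypothetical; only the page-transfer-only
    access rule comes from the description in the post.
    """

    def __init__(self, page_size=4096, resident_pages=2):
        self.page_size = page_size
        self.resident_pages = resident_pages  # capacity of the coherent main memory
        self.main = {}      # page number -> bytearray, reachable by load/store
        self.ecs = {}       # page number -> bytes, reachable only by page transfer
        self.order = []     # resident pages, least recently used first
        self.transfers = 0  # whole-page moves between main memory and ECS

    def _touch(self, page):
        """Make `page` resident, paging another one out to the ECS if needed."""
        if page in self.main:
            self.order.remove(page)
        else:
            if len(self.main) >= self.resident_pages:
                victim = self.order.pop(0)
                self.ecs[victim] = bytes(self.main.pop(victim))  # page out
                self.transfers += 1
            data = self.ecs.pop(page, bytes(self.page_size))     # zero-filled if new
            self.main[page] = bytearray(data)                    # page in
            self.transfers += 1
        self.order.append(page)

    def store(self, addr, value):
        page, off = divmod(addr, self.page_size)
        self._touch(page)
        self.main[page][off] = value

    def load(self, addr):
        page, off = divmod(addr, self.page_size)
        self._touch(page)
        return self.main[page][off]
```

In this sketch nothing ever reads or writes a single byte in self.ecs, so the coherent domain is just the handful of resident pages.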

The thing that differs from ECS-of-olde is that the I/O devices are
attached to the ECS and not directly to main memory. This is not a
paging pack--this is more like the index cache of a 300TB database (or
maybe the resident cache of the 300TB database).

I originally considered this system to have an FBDIMM-like multiplexer
with 4*4 channels, and each CPU chip on the motherboard also had 4
FBDIMM-like channels. With 4 such chips on the motherboard, and a desire
to build systems with as many as 16 motherboards in a system, one needs a
way to provide relatively uniform access to very large memories.
Consider that the motherboards are positioned horizontally, and that
the memory boards are positioned vertically. At each intersection
between the motherboards and the memory boards there is an FBDIMM-like
channel connection. With such an arrangement and 2 layers of the
FBDIMM multiplexer on the motherboards and two layers on the memory
boards, one has concurrent access to as many as 4096 FBDIMMs (several
TBytes).

The reason to use an FBDIMM-like multiplexer is that the FBDIMM
channel has low sequential route latency. One does not wait for the
whole request/response to show up before routing it forward. The
proposed format has all routing information in the first DDR beat of
the message, so that one can get from input pin to output pin in less
than 3 ns. Pin speeds will be 6 GT/s+, a la FBDIMM. Of the 200 ns access
time, 50 ns is spent routing, 100 ns is spent in the DRAM access, and
50 ns is spent waiting for one of several conflicting messages to
complete its route through the needed channel.
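
The latency budget quoted above can be written out as a small calculation. The constants are the figures from the paragraph; the function names are made up for illustration, and the hop count is only a rough bound that assumes the whole routing share is spent in multiplexers at the 3 ns limit, ignoring wire flight time.

```python
# Figures quoted in the post, in nanoseconds.
ROUTING_NS = 50      # time spent routing through the multiplexers
DRAM_NS = 100        # DRAM access itself
CONTENTION_NS = 50   # waiting for a conflicting message to clear the channel
HOP_NS = 3           # quoted upper bound, input pin to output pin


def access_time_ns():
    """Total access time as the sum of the three quoted components."""
    return ROUTING_NS + DRAM_NS + CONTENTION_NS


def max_hops_within_routing_budget():
    """Rough bound (an assumption): hops that fit if routing were all mux delay."""
    return ROUTING_NS // HOP_NS


print(access_time_ns(), "ns total")
print(max_hops_within_routing_budget(), "hops at most, under the stated assumption")
```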

With the arrangement of the paragraph above, up to 64 processing chips
have access to up to 64 FBDIMM channels with as many as 8 FBDIMMs on
each channel all running concurrently. Each FBDIMM-like channel has 6
GB/s peak throughput and all 64 channels can operate simultaneously. By
the time such a system could be constructed, each processing chip will
have on the order of 16 threads, while up to 1024 threads would be
available if the CPU architecture were more Niagara-like.
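
For scale, the aggregate figures implied by the numbers above work out as follows. This is plain arithmetic on the quoted values; the even per-chip split is an assumption made only to give a feel for the ratio, since real traffic would not be uniform.

```python
# Values quoted in the posts above.
MOTHERBOARDS = 16
CHIPS_PER_BOARD = 4
CHANNELS = 64
GB_PER_S_PER_CHANNEL = 6

chips = MOTHERBOARDS * CHIPS_PER_BOARD
peak_bandwidth_gb_s = CHANNELS * GB_PER_S_PER_CHANNEL

# Assumption for illustration: bandwidth shared evenly among all chips.
per_chip_share_gb_s = peak_bandwidth_gb_s / chips

print(f"{chips} chips, {peak_bandwidth_gb_s} GB/s peak into the ECS")
print(f"{per_chip_share_gb_s} GB/s per chip if shared evenly")
```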

One big reason to punt the I/O to ECS is that you really don't want
I/O requests from the 1024-4096 SATA disks in a system of the
aforementioned scale to swamp the memory interconnect on the coherent
side of things. It is expected that only a moderate portion of the
total available BW into the ECS is used by the computation amalgam,
leaving a significant amount of BW to the I/O devices.

There are a "few" software issues to resolve also.

Mitch

From: Robert Myers on
On Mar 3, 10:17 am, n...(a)cam.ac.uk wrote:
> In article <1d917fa8-0be1-47ba-8863-4a10d0817...(a)t20g2000yqe.googlegroups..com>,
> Robert Myers  <rbmyers...(a)gmail.com> wrote:
>
> >On Mar 3, 1:40 am, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>
> >> So Robert, do I satisfy your prejudices?
> >> :-)
>
> >Prejudices are just prejudices, and I labeled mine as such.
>
> No, only a few of them.
>
I am a veritable seething cauldron of prejudices.

> >The point about Fortran is that you really have to know what the
> >computer is doing in considerably more detail than the language
> >interface describes.
>
> Eh?  If you were to say that about programming in general, it would
> be debatable.  You might, JUST, be able to say that about Fortran
> versus Python or Java.  But it's a bizarre statement to make about
> Fortran without qualification.
>
> What do you mean by it?
>
No one has ever come to you and said, "I only added a print statement
and now my program doesn't work?" Fortran works a lot like C in
handling arrays, and, if you don't understand that a lot of what
Fortran does is some offset from an address (which could be the base
address of a common block), you can get into serious trouble, or at
least have a hard time understanding what's happening.
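
A rough sketch of the kind of trouble meant here, written in Python rather than Fortran and purely as an illustration: a flat list stands in for a common block laid out as a four-element array A followed by a scalar N. The layout, the names, and the off-by-one loop are all hypothetical. An out-of-bounds subscript is not standard-conforming Fortran, but without bounds checking it typically just lands on whatever is stored next.

```python
# Flat storage standing in for a common block laid out as: A(1:4), then N.
storage = [0.0, 0.0, 0.0, 0.0, 10]
A_BASE, A_LEN = 0, 4
N_OFFSET = A_BASE + A_LEN  # N sits immediately after the last element of A


def set_a(i, value):
    """Store into A(i), 1-based, with no bounds check (as unchecked code would)."""
    storage[A_BASE + (i - 1)] = value


def get_n():
    """Read N from its fixed offset in the block."""
    return storage[N_OFFSET]


# Off-by-one loop: runs to 5, but A has only 4 elements.
for i in range(1, 6):
    set_a(i, 0.0)

print("N is now", get_n())  # N was 10 before the loop
```

Anything that shifts the layout, such as an added variable or buffer, changes which neighbour gets overwritten, which is one way a harmless-looking edit can change a program's behaviour.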

As Terje has pointed out, not understanding cache can be a serious
handicap. The language interface offers no clue.
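
One small illustration of the cache point, again as a Python sketch with made-up function names: Fortran stores arrays in column-major order, so the element offset of A(i, j) depends on which index varies fastest, and nothing in the loop syntax shows it.

```python
def column_major_offset(i, j, n_rows):
    """Offset, in elements, of A(i, j) from A(1, 1) for a Fortran array A(n_rows, n_cols)."""
    return (i - 1) + (j - 1) * n_rows


def access_offsets(n_rows, n_cols, inner):
    """Offsets touched by a doubly nested loop with `inner` ('i' or 'j') innermost."""
    if inner == "i":
        return [column_major_offset(i, j, n_rows)
                for j in range(1, n_cols + 1) for i in range(1, n_rows + 1)]
    return [column_major_offset(i, j, n_rows)
            for i in range(1, n_rows + 1) for j in range(1, n_cols + 1)]


print(access_offsets(3, 2, "i"))  # consecutive offsets, stride 1
print(access_offsets(3, 2, "j"))  # jumps of n_rows elements between accesses
```

With the first index innermost the accesses walk memory with stride 1; with the second index innermost each access jumps by a whole column, which is the pattern that tends to defeat a cache on large arrays.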

If you are a Cray programmer, the language manual does tell you about
special considerations, but the reason for them isn't so easy to
understand without looking at the architecture.

How many examples do you want?

Robert.





From: nmm1 on
In article <093bcd39-12ae-48a4-9add-5ff041c5c9e2(a)a18g2000yqc.googlegroups.com>,
Robert Myers <rbmyersusa(a)gmail.com> wrote:
>
>> >The point about Fortran is that you really have to know what the
>> >computer is doing in considerably more detail than the language
>> >interface describes.
>>
>> Eh?  If you were to say that about programming in general, it would
>> be debatable.  You might, JUST, be able to say that about Fortran
>> versus Python or Java.  But it's a bizarre statement to make about
>> Fortran without qualification.
>>
>> What do you mean by it?
>>
>No one has ever come to you and said, "I only added a print statement
>and now my program doesn't work?"

Sometimes. It's more often the other way round.

>Fortran works a lot like C in
>handling arrays, and, if you don't understand that a lot of what
>Fortran does is some offset from an address (which could be the base
>address of a common block), you can get into serious trouble, or at
>least have a hard time understanding what's happening.

Eh? Fortran operates nothing like C in this area. The only aspect
where it could be said to is sequence association, and that is
clearly specified in the standard. Fortran has no equivalent of
C's pointer morass, unless you explicitly use its C interoperability
features and shoot yourself in your foot.

I really can't see why you are singling out Fortran. You don't
need to know what the computer is doing in any more detail than
for almost all other languages, and considerably less than you
do for C, C++ or Perl.

>As Terje has pointed out, not understanding cache can be a serious
>handicap. The language interface offers no clue.

And it makes no difference to whether a Fortran program will work,
only to how fast it runs. Again, why Fortran? The same is true
of ALL other languages!


Regards,
Nick Maclaren.

From: Robert Myers on
On Mar 3, 1:22 pm, n...(a)cam.ac.uk wrote:
> In article <093bcd39-12ae-48a4-9add-5ff041c5c...(a)a18g2000yqc.googlegroups..com>,
> Robert Myers  <rbmyers...(a)gmail.com> wrote:
>
>
>
> >> >The point about Fortran is that you really have to know what the
> >> >computer is doing in considerably more detail than the language
> >> >interface describes.
>
> >> Eh?  If you were to say that about programming in general, it would
> >> be debatable.  You might, JUST, be able to say that about Fortran
> >> versus Python or Java.  But it's a bizarre statement to make about
> >> Fortran without qualification.
>
> >> What do you mean by it?
>
> >No one has ever come to you and said, "I only added a print statement
> >and now my program doesn't work?"
>
> Sometimes.  It's more often the other way round.
>
> >Fortran works a lot like C in
> >handling arrays, and, if you don't understand that a lot of what
> >Fortran does is some offset from an address (which could be the base
> >address of a common block), you can get into serious trouble, or at
> >least have a hard time understanding what's happening.
>
> Eh?  Fortran operates nothing like C in this area.  The only aspect
> where it could be said to is sequence association, and that is
> clearly specified in the standard.  Fortran has no equivalent of
> C's pointer morass, unless you explicitly use its C interoperability
> features and shoot yourself in your foot.
>
> I really can't see why you are singling out Fortran.  You don't
> need to know what the computer is doing in any more detail than
> for almost all other languages, and considerably less than you
> do for C, C++ or Perl.
>
> >As Terje has pointed out, not understanding cache can be a serious
> >handicap.  The language interface offers no clue.
>
> And it makes no difference to whether a Fortran program will work,
> only to how fast it runs.  Again, why Fortran?  The same is true
> of ALL other languages!
>
Now I see why you've bristled. I singled out Fortran because, when I
was in school, it was the language that all engineers used or expected
to use, and, for all practical purposes, it was the only language I
used professionally. In practice, I did fairly reckless things with
Fortran and made use of pointer extensions, but, of course, you didn't
have to do that sort of thing and plenty around me got in trouble with
vanilla arrays and common blocks. I think Fortran is still a pretty
good language and I'm sorry that it has fallen into disfavor in so
many places.

Robert.