From: Anne & Lynn Wheeler on
re:
http://www.garlic.com/~lynn/2009s.html#18 Larrabee delayed: anyone know what's happening?
http://www.garlic.com/~lynn/2009s.html#20 Larrabee delayed: anyone know what's happening?
http://www.garlic.com/~lynn/2009s.html#22 Larrabee delayed: anyone know what's happening?


one of the first SANs was at NCAR: a pool of IBM CKD dasd, an IBM
43xx (midrange) mainframe, some number of "supercomputers", and
HYPERchannel.

all the processors could message each other over HYPERchannel and also
access the disks. The IBM mainframe acted as SAN controller ... getting
requests (over HYPERchannel) for data ... potentially having to first
stage it from tape to disk ... using real channel connectivity to ibm
disks.

ibm disk controllers had multiple channel connectivity ... at least one
to the "real" ibm channel and one to the HYPERchannel remote device
adapter (an emulated channel). The A515 was an upgraded remote device
adapter that could download the full channel program into local memory
... including the dasd seek/search arguments (it could distinguish
between address references for the seek/search arguments in local
memory vis-a-vis the read/write transfers that involved "host" memory
addresses).

the ibm mainframe would load the channel program (to satisfy the data
request, from some supercomputer) into the memory of the A515 ... and
then respond to the requesting supercomputer with the "handle" of the
channel program in one of the A515s. The supercomputer would then make a
request to that A515 for the execution of that channel program
... transferring the data directly to the supercomputer ... w/o having
to go thru the ibm mainframe memory ... basically "control" went thru
ibm mainframe ... but actual data transfer was direct.
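
roughly, the control flow can be sketched like this (a minimal sketch
in C; the function and structure names are hypothetical stand-ins for
what were actually HYPERchannel messages and A515 adapter state):

#include <stdio.h>
#include <string.h>

/* handle returned by the mainframe: identifies which A515 adapter
   holds the downloaded channel program, and the slot it occupies */
struct cp_handle {
    int adapter_id;     /* which A515 remote device adapter */
    int slot;           /* slot of the channel program in local memory */
};

/* simulated A515 local memory: slots for downloaded channel programs
   (described here by a string standing in for the real CCW chain) */
static char a515_slots[4][64];

/* step 1: mainframe stages the data (possibly tape->disk first, over
   its real channels), loads the channel program into A515 local
   memory, and returns a handle -- "control" goes thru the mainframe */
struct cp_handle mainframe_stage_request(const char *dataset)
{
    struct cp_handle h = { 0, 0 };
    snprintf(a515_slots[h.slot], sizeof a515_slots[h.slot],
             "SEEK/SEARCH/READ %s", dataset);
    return h;
}

/* step 2: supercomputer asks the A515 to execute the stored channel
   program; the data then moves disk->supercomputer directly, never
   passing thru mainframe memory */
void supercomputer_execute(struct cp_handle h, void *buf, size_t len)
{
    printf("A515 %d executing \"%s\" into requester's buffer (%zu bytes)\n",
           h.adapter_id, a515_slots[h.slot], len);
    memset(buf, 0, len);    /* stand-in for the direct transfer */
}

int main(void)
{
    char buf[4096];
    struct cp_handle h = mainframe_stage_request("SOME.DATASET");
    supercomputer_execute(h, buf, sizeof buf);
    return 0;
}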

later, there was standardization work on HIPPI (and FCS) switches to
allow definition of something that would simulate the NCAR HYPERchannel
environment and the ability to do "3rd party transfers" ... directly
between processors and disks ... w/o having to involve the control
machine (which set it all up) in the actual data flow.

--
40+yrs virtualization experience (since Jan68), online at home since Mar1970
From: Terje Mathisen on
Andy "Krazy" Glew wrote:
> Terje Mathisen wrote:
>>> Since the whole point of this exercise is to try to reduce the overhead
>>> of cache coherency, but people have demonstrated they don't like the
>>> consequences semantically, I am trying a different combination: allow (A)
>>> multiple values; allow (B) weak ordering; but disallow (C) losing writes.
>>>
>>> I suspect that this may be more acceptable and cause fewer bugs.
>>>
>>> I.e. I am suspecting that full cache coherency is overkill, but that
>>> completely eliminating cache coherency is underkill.
>>
>> I agree, and I think most programmers will be happy with word-size
>> tracking, i.e. we assume all char/byte operations happen on private
>> memory ranges.
>
> Doesn't that fly in the face of the Alpha experience, where originally
> they did not have byte memory operations, but were eventually forced to?
>
> Why? What changed? What is different?

What's different is easy:

I am not proposing we get rid of byte-sized memory operations, "only"
that we don't promise they will be globally consistent, i.e. you only
use 8/16-bit operations on private memory blocks.
>
> Some of the Alpha people have said that the biggest reason was I/O
> devices from PC-land. Sounds like a special case.
>
> I suspect that there is at least some user level parallel code that
> assumes byte writes are - what is the proper term? Atomic? Non-lossy?
> Not implemented via a non-atomic RMW?

There might be some such code somewhere, but only by accident; I don't
believe anyone is using it intentionally: you want semaphores to be
separated by at least a cache line if you care about performance, but I
guess it is conceivable some old coder decided to pack all his lock
variables into a single byte range.
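
To make the hazard concrete, here is a minimal sketch (hypothetical,
not from any real codebase) of a byte store emulated as a non-atomic
word-sized read-modify-write - the early-Alpha situation - where two
CPUs updating adjacent bytes of the same word can silently lose a
write:

#include <stdint.h>
#include <stdio.h>

static uint32_t word;   /* four byte-sized "lock variables", packed */

/* emulated byte store: read the whole word, splice in one byte, write
   back; between the load and the store, another CPU's update to a
   *different* byte of the same word can be lost */
static void store_byte(int idx, uint8_t val)
{
    uint32_t w = word;                   /* load the containing word */
    w &= ~(0xFFu << (8 * idx));          /* clear the old byte */
    w |= (uint32_t)val << (8 * idx);     /* merge the new byte */
    word = w;                            /* store: may clobber a neighbor */
}

int main(void)
{
    /* simulate the bad interleaving on a single thread: CPU A loads
       the word, CPU B stores to byte 1, then CPU A writes back byte 0
       using its stale copy */
    uint32_t stale = word;               /* CPU A: load (byte 1 == 0) */
    store_byte(1, 0xAA);                 /* CPU B: set its lock byte */
    word = (stale & ~0xFFu) | 0x55;      /* CPU A: stale write-back */
    printf("byte 1 = 0x%02X (CPU B's write was lost)\n",
           (unsigned)((word >> 8) & 0xFF));
    return 0;
}
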
>
> Can we get away with having byte and other sub-word writes? Saying that
> they may be atomic/non-lossy in cache memory, but not in uncached remote
> memory. But that word writes are non-lossy in all memory types? Or do we
> need to have explicit control?

I think so.

>> Seems to work better than your wan-based tablet posts!
>
> Did you mean "van"? Are you using handwriting or speech recognition? :-)

I'm using "non-native language" human spell checking.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: "Andy "Krazy" Glew" on
Robert Myers wrote:
> I can't see anything about channels that you can't do with modern PC I/O.

AFAIK there isn't much that IBM mainframe channels could do that modern
PC I/O controllers cannot do. Even a decade ago I saw SCSI controllers
that were more sophisticated than IBM channels.

If anything, the problem is that there are too many different PC I/O
controllers with similar, but slightly different, capabilities.

Perhaps the biggest thing IBM channels had (actually, still have) going
for them is that they are reasonably standard. They are sold by IBM,
not a plethora of I/O device vendors. They can interface to many I/O
devices. You could write channel programs without too much fear of
getting locked in to a particular device (although, of course, you were
largely locked in to IBM).
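
For reference, a channel program is just a chain of 8-byte channel
command words (CCWs); a rough C rendering of the classic format-0
layout (illustrative only - the real 24-bit data address is packed
big-endian into bytes 1-3, which this struct doesn't model):

#include <stdint.h>

/* S/360-style format-0 CCW, 8 bytes */
struct ccw {
    uint8_t  cmd;       /* command code, e.g. SEEK, SEARCH, READ, TIC */
    uint8_t  addr[3];   /* 24-bit data address */
    uint8_t  flags;     /* e.g. 0x40 = command-chain to the next CCW */
    uint8_t  unused;
    uint16_t count;     /* byte count for the transfer */
};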

Plus, of course, IBM channels were fairly well implemented.

From time to time Intel tried to create its own generic channel
controllers, even back in the 80s. But, unfortunately, using these
devices was sometimes a net performance loss, particularly for
latency-sensitive applications.
From: Bernd Paysan on
Andy "Krazy" Glew wrote:
> However, I think that these problems, although solvable, are the
> reason why specialized processors, such as channel processors or
> active messaging, have not dominated:
>
> a) security issues
>
> b) portability issues.
>
> I suspect the latter are worst. If you write code for a specialised
> SCSI or IPI channel processor (see, these things are not unknown
> outside the world of mainframes), you are locked in.

The latter is obvious. If you want a successful active messaging
system, you must solve the portability problem: either by having an
industry-standard instruction set (like x86 is for PCs), or (IMHO far
better) by using source code. You may want to send actual source code
around (as my friend does), or you may want to compile from source
before you start. There are intermediate forms like tokenized source or
virtual machines; they have their place, but are of more limited
interest (if you care about bandwidth, compress your source; a
dictionary-based system effectively tokenizes).

In general, I would tend to send actual source code around when the
throughput and latency are relatively high compared to the speed of the
nodes, and when the nodes are very heterogeneous. Well-known and widely
used example: JavaScript. Sending source around doesn't mean
interpreters: It is better to use incremental compilers (example: Forth,
recent JavaScript engines, OpenCL). Using source also doesn't
necessarily mean "send text strings around". In a sufficiently
homogeneous environment, you can pre-compile all sources, and then send
the actual binaries. Example: OpenCL. Your program is distributed as
source, and compiled at run-time - but only once.
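
In host-API terms, that looks roughly like the following (a minimal
sketch; the kernel name "my_kernel" is made up, and platform setup and
error handling are omitted):

#include <CL/cl.h>

/* ship the program as source; compile once, at run time, for whatever
   device is actually present; then reuse the built kernel */
cl_kernel build_once(cl_context ctx, cl_device_id dev, const char *src)
{
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);  /* the one compile */
    return clCreateKernel(prog, "my_kernel", NULL);   /* reusable binary */
}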

A compromise for speed could be to send only "stored procedures" around
as source, and the actual invocations (with a limited set of
instructions) as interpreted virtual machine code. Using an event-
driven paradigm (similar to HDLs like Verilog or VHDL) can reduce actual
invocation code considerably. E.g. you only store a procedure in your
node once, and then send data to the node - the node will trigger on the
data arrival, and execute the code bound to that event.
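
A minimal sketch of that event-driven scheme (all names hypothetical):
the node stores a handler once, and from then on an arriving datum just
names the event that triggers it:

#include <stdio.h>

typedef void (*handler_fn)(const void *data);

static handler_fn handlers[16];   /* procedures stored on the node */

/* "store a procedure in your node once" */
void bind_handler(int event, handler_fn fn)
{
    handlers[event] = fn;
}

/* "the node will trigger on the data arrival" */
void on_arrival(int event, const void *data)
{
    if (handlers[event])
        handlers[event](data);
}

static void print_sample(const void *data)
{
    printf("sample: %f\n", *(const double *)data);
}

int main(void)
{
    double d = 3.14;
    bind_handler(7, print_sample);   /* code sent and bound once */
    on_arrival(7, &d);               /* later: only data moves */
    return 0;
}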

Maybe this could be what's different now: We now know that portability
matters, and we understand much better how to achieve it.

And finally, Merry Christmas, especially to the other people here - like
Terje - who celebrate according to the Jewish calendar (where the day
ends at dusk, and therefore Christmas Eve *is* already Christmas).

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
From: Bill Todd on
Andy "Krazy" Glew wrote:
> Robert Myers wrote:
>> I can't see anything about channels that you can't do with modern PC I/O.
>
> AFAIK there isn't much that IBM mainframe channels could do that modern
> PC I/O controllers cannot do. Even a decade ago I saw SCSI controllers
> that were more sophisticated than IBM channels.

I may be missing something glaringly obvious here, but my impression is
that the main thing that channels can do that PC I/O controllers can't
is accept programs that allow them to operate extensively on the data
they access. For example, one logical extension of this could be to
implement an entire database management system in the channel controller
- something which I'm reasonably sure most PC I/O controllers would have
difficulty doing (not that I'm necessarily holding this up as a good
idea...).

PC I/O controllers have gotten very good at the basic drudge work of
data access (even RAID), and ancillary DMA engines have added
capabilities like scatter/gather - all tasks which used to be done in
the host unless you had something like a channel controller to off-load
them. But AFAIK channels they ain't.
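
For concreteness, "scatter/gather" here means the DMA engine walks a
descriptor list along these lines (layout illustrative, not any
particular device's), so a single transfer can land in discontiguous
buffers with no host-side copying:

#include <stdint.h>

struct sg_entry {
    uint64_t addr;    /* physical address of one buffer fragment */
    uint32_t len;     /* bytes in this fragment */
    uint32_t last;    /* nonzero on the final entry of the chain */
};

/* a 12KB read landing in three separate pages (addresses made up): */
static const struct sg_entry sg_list[] = {
    { 0x10000, 4096, 0 },
    { 0x2A000, 4096, 0 },
    { 0x51000, 4096, 1 },   /* end of chain */
};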

- bill