Larrabee delayed: anyone know what's happening? [Computer Architecture]

From: Del Cecchi on 23 Dec 2009 21:02

"Robert Myers" <rbmyersusa(a)gmail.com> wrote in message
news:ab08929b-50c1-4f4b-8708-f878caa1c641(a)s31g2000yqs.googlegroups.com...
On Dec 23, 12:21 pm, Terje Mathisen <"terje.mathisen at tmsw.no">
wrote:
> Bernd Paysan wrote:
> > Sending chunks of code around which are automatically executed by
> > the
> > receiver is called "active messages". I not only like the idea, a
> > friend of mine has done that successfully for decades (the
> > messages in
> > question were Forth source - it was a quite high level of active
> > messages). Doing that in the memory controller looks like a good
> > idea
> > for me, too, at least for that kind of code a memory controller
> > can
> > handle. The good thing about this is that you can collect all your
> > "orders", and send them in one go - this removes a lot of latency,
> > especially if your commands can include something like
> > compare&swap or
> > even a complete "insert into list/hash table" (that, unlike
> > compare&swap, won't fail).
>
> Why do I feel that this feels a lot like IBM mainframe channel
> programs?
> :-)

Could I persuade you to take time away from your first love
(programming your own computers, of course) to elaborate/pontificate a
bit? After forty years, I'm still waiting for someone to tell me
something interesting about mainframes. Well, other than that IBM bet
big and won big on them.

And CHANNELS. Well. That's clearly like the number 42.

Robert.

---------------------------------------------------

Tell you something interesting about mainframes? If you can't find
something, you aren't trying.

A channel is a specialized processor. The way I/O works on a 360 or
follow-on is that the main processor puts together a program that
tells this specialized processor what to do and then issues a Start
I/O instruction that points the channel processor at the program.
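
The description above can be illustrated with a toy model. This is only a
sketch: the ChannelCommand and Channel classes, the "read"/"write" opcodes,
and the start_io method are hypothetical names made up for illustration and
do not reproduce the real System/360 command formats.

```python
from dataclasses import dataclass


@dataclass
class ChannelCommand:
    op: str       # "read" or "write" (hypothetical opcodes, not real CCW codes)
    address: int  # start address in main memory
    count: int    # number of bytes to transfer


class Channel:
    """Toy stand-in for the specialized I/O processor."""

    def __init__(self, memory, device):
        self.memory = memory  # bytearray shared with the main processor
        self.device = device  # bytearray acting as a simple FIFO device

    def start_io(self, program):
        """Run the whole command list without further CPU involvement."""
        for cmd in program:
            if cmd.op == "write":
                # memory -> device
                self.device.extend(self.memory[cmd.address:cmd.address + cmd.count])
            elif cmd.op == "read":
                # device -> memory
                data = bytes(self.device[:cmd.count])
                del self.device[:cmd.count]
                self.memory[cmd.address:cmd.address + len(data)] = data
            else:
                raise ValueError(f"unknown op {cmd.op!r}")
        return "done"


# The main processor builds the program, then points the channel at it.
memory = bytearray(b"HELLO" + bytes(11))
device = bytearray()
channel = Channel(memory, device)
status = channel.start_io([
    ChannelCommand("write", 0, 5),  # send bytes 0..4 to the device
    ChannelCommand("read", 8, 5),   # read them back into bytes 8..12
])
```

The point of the sketch is the division of labor: the CPU only assembles the
list and issues one start, and the channel walks the list on its own.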

Perhaps you need to take a look at one of the freely available
principles of operations manuals.

I used to think Beemers were insular but I have decided that
non-beemers are just as bad only inverse.

The above is an architecture description by a circuit designer, so take
it with a grain of salt.

del

From: Anne & Lynn Wheeler on 23 Dec 2009 21:07

Robert Myers <rbmyersusa(a)gmail.com> writes:
> What bothers me is the "it's already been thought of"
>
> You worked with a different (and harsh) set of constraints.
>
> The constraints are different now. Lots of resources free that once
> were expensive. Don't want just a walk down memory lane. The world
> is going to change, believe me. Anyone here interested in seeing
> how?
>
> What can we know from the hard lessons you learned? That's a good
> question. What's different now? That's a good question, too.
> Everything is the same except the time scale. That answer requires a
> detailed defense, and I think it's wrong. Sorry, Terje.

re:
http://www.garlic.com/~lynn/2009s.html#18 Larrabee delayed: anyone know what's happening?

concurrent with fiber channel work was SCI ... sci was going after
asynchronous packetized SCSI commands ... akin to fiber channel and
serial-copper ... but also went after asynchronous packetized memory bus.

the SCI asynchronous packetized memory bus was used by convex for
exemplar, sequent for numa-q ... DG near its end did something akin to
numa-q ... SGI also did a flavor.

part of the current issue is that oldtime real storage & paging latency
to disk (in terms of count of processor cycles) ... is comparable to
current cache sizes and cache miss latency to main memory.

i had started in mid-70s saying that major system bottleneck was
shifting from disk/file i/o to memory. in the early 90s ... the
executives in the disk division took exception with some of my
statements that relative system disk thruput had declined by an order of
magnitude over a period of 15 years (cpu & storage resources increased
by factor of 50, disk thruput increased by factor of 3-5) ... they
assigned the division performance group to refute my statements
... after a couple weeks they came back and effectively said that I had
understated the situation.
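
As a rough check of the arithmetic in the paragraph above, using only the
growth factors quoted in the post (the function name relative_decline is
made up for illustration):

```python
def relative_decline(system_growth, disk_growth):
    """Factor by which disk throughput fell behind the rest of the system."""
    return system_growth / disk_growth


system_growth = 50          # cpu & storage growth factor quoted in the post
for disk_growth in (3, 5):  # disk throughput growth range quoted in the post
    print(disk_growth, round(relative_decline(system_growth, disk_growth), 1))
```

With those inputs the relative decline comes out between 10x and roughly
17x, which is consistent with "an order of magnitude" being, if anything,
an understatement.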

part of this was from some work i had done as undergraduate in the 60s
on dynamic adaptive resource management ... and "scheduling to the
bottleneck" (it was frequently referred to as "fair share" scheduling
... since the default policy was "fair share") ... dynamically
attempting to adjust resource management to system thruput bottleneck
... required being able to dynamically recognize where the
bottlenecks were.

misc. past posts mentioning dynamic adaptive resource management (and
"fair share" scheduling)
http://www.garlic.com/~lynn/subtopic.html#fairshare

when i was doing hsdt ... some of the links were satellite ... and I had
to redo how the satellite communication operated. a couple years later
there was a presentation at an IETF meeting that mentioned
cross-country fiber gigabit bandwidth*latency product ... it turned out
the product was about the same as the product I had dealt with for
high-speed (geo-sync) satellite (latency was much larger while the
bandwidth was somewhat smaller ... but the resulting product was
similar).
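
The bandwidth*latency comparison above can be made concrete with a small
calculation. The numbers below are illustrative assumptions, not figures
from the post: a 1 Gbit/s link with 30 ms latency standing in for
cross-country fiber, and a 100 Mbit/s link with 300 ms latency standing in
for a geo-sync satellite hop. They were picked only to show how a much
larger latency and a smaller bandwidth can yield a similar product.

```python
def bits_in_flight(bandwidth_bps, latency_ms):
    """Bandwidth*latency product: bits that must be outstanding to fill the link."""
    # Integer arithmetic avoids floating-point rounding in the comparison.
    return bandwidth_bps * latency_ms // 1000


# Assumed, illustrative link parameters (not taken from the post).
fiber = bits_in_flight(1_000_000_000, 30)     # 1 Gbit/s, 30 ms
satellite = bits_in_flight(100_000_000, 300)  # 100 Mbit/s, 300 ms

print(fiber, satellite)
```

Under these assumptions both links need the same amount of data in flight
to stay busy, so a protocol tuned for one faces the same windowing problem
on the other.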

there are still not a whole lot of applications that actually do
coast-to-coast full(-duplex) gigabit operation (full concurrent gigabit
in both directions).

--
40+yrs virtualization experience (since Jan68), online at home since Mar1970

From: Del Cecchi on 23 Dec 2009 21:12

"Robert Myers" <rbmyersusa(a)gmail.com> wrote in message
news:22d9b7d3-1570-4b63-a32b-2addf533ef8a(a)v13g2000yqk.googlegroups.com...
On Dec 23, 7:57 pm, Anne & Lynn Wheeler <l...(a)garlic.com> wrote:
> Terje Mathisen <"terje.mathisen at tmsw.no"> writes:
>
> > Why do I feel that this feels a lot like IBM mainframe channel
> > programs?
> > :-)
>
> downside was that mainframe channel programs were half-duplex
> end-to-end
> serialization. there were all sorts of heat & churn in fiber-channel
> standardization with the efforts to overlay mainframe channel
> program
> (half-duplex, end-to-end serialization) paradigm on underlying
> full-duplex asynchronous operation.
>
> from the days of scarce, very expensive electronic storage
> ... especially disk channel programs ... used "self-modifying"
> operation
> ... i.e. read operation would fetch the argument used by the
> following
> channel command (both specifying the same real address). couple
> round
> trips of this end-to-end serialization potentially happening over
> 400'
> channel cable within small part of disk rotation.
>
> trying to get a HYPERChannel "remote device adapter" (simulated
> mainframe channel) working at extended distances with disk
> controller &
> drives ... took a lot of sleight of hand. a copy of the
> completed mainframe channel program was created and downloaded into
> the
> memory of the remote device adapter .... to minimize the
> command-to-command latency. the problem was that some of the disk
> command arguments had very tight latencies ... and so those
> arguments
> had to be recognized and also downloaded into the remote device
> adapter
> memory (and the related commands redone to fetch/store to the local
> adapter memory rather than the remote mainframe memory). this
> process
> was never extended to be able to handle the "self-modifying"
> sequences.
>
> on the other hand ... there was a serial-copper disk project that
> effectively packetized SCSI commands ... sent them down outgoing
> link ... and allowed asynchronous return on the incoming link
> ... eliminating loads of the scsi latency. we tried to get this
> morphed
> into interoperating with fiber-channel standard ... but it morphed
> into
> SSA instead.
>
> --
> 40+yrs virtualization experience (since Jan68), online at home since
> Mar1970

What bothers me is the "it's already been thought of"

You worked with a different (and harsh) set of constraints.

The constraints are different now. Lots of resources free that once
were expensive. Don't want just a walk down memory lane. The world
is going to change, believe me. Anyone here interested in seeing
how?

What can we know from the hard lessons you learned? That's a good
question. What's different now? That's a good question, too.
Everything is the same except the time scale. That answer requires a
detailed defense, and I think it's wrong. Sorry, Terje.

Robert.
------------------------------------------------------
OK how about work queues in InfiniBand?

Most everything anyone thinks of has been thought of before. Not
everything but most things. People have been making things with I/O
processors for years.

Data Flow, Active Messages etc etc.

What is it you think is so new? I read your previous post but other
than the mystery data going to mystery locations didn't quite
understand what you were driving at.

As for "what is interesting about mainframes", basically many things
that the pc and microprocessor folks are coming up with were done first
or looked at first by the mainframe folks, since they could afford it
first.

del

From: "Andy "Krazy" Glew" on 24 Dec 2009 00:17

Robert Myers wrote:
> If you know the future (or the dataflow graph ahead of time), you can
> assemble packets of whatever. Could be any piece of the problem:
> code, data, meta-data, meta-code,... whatever, and send it off to some
> location where it knows that the other pieces that are needed for that
> piece of the problem will also arrive, pushed from who-cares-where.
> When enough pieces are in hand to act on, the receiving location acts
> on whatever pieces it can. When any piece of anything that can be
> used elsewhere is finished, it is sent on to wherever. The only
> requirement is that there is some agent like a DNS that can tell
> pieces with particular characteristics the arbitrarily chosen
> processors (or collections of processors) to which they should migrate
> for further use, and that receiving agents are not required to do
> anything but wait until they have enough information to act on, and
> the packets themselves will inform the receiving agent what else is
> needed for further action (but not where it can be found). Many
> problems seem to disappear as if by magic: the need for instruction
> and data prefetch (two separate prediction processes), latency issues,
> need for cache, and the need to invent elaborate constraints on what
> kinds of packets can be passed around, as the structure (and, in
> effect, the programming language) can be completely ad hoc.
> Concurrency doesn't even seem to be an issue. It's a bit like an
> asynchronous processor, and it seems implementable in any circumstance
> where a data-push model can be implemented.

I just BSed on something very similar to this. We are both talking
about building a large dataflow system. (Indeed, all of my career I
have been building dataflow systems: OOO CPUs are just dataflow on a
micro scale.)

However, the problems don't all magically disappear. In the limit, such
dataflow may be better. In practice, the relative overhead of
transferring code around versus transferring data, and of determining
when data is ready, matters. In practice, the amount of data fetched at
an individual node, relative to the amount of computation, matters. In
practice, the routing table management and lookup matters. This is why
I have tried to flesh things out a bit more, although still at a very
high level: big fat CPUs, processing elements for simple active
messages out of buffered inputs, scatter/gather operations, and
translation in the network.

And these are just the implementation details. The real problem is that
this is semantically exposed. We have created a dataflow system. For
memory; albeit PGAS or SHMEM like memory. And dataflow, no matter how
you gloss over it, does not really like stateful memory. Either we hide
the fact that there really is memory back there (Haskell monads,
anyone?), or there is another level of synchronization relating to when
it is okay to overwrite a memory location. I vote for the latter.
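
The firing rule under discussion (a receiver waits until all of its inputs
have arrived, acts, and pushes the result onward) can be sketched as
follows. This is a minimal illustration, not a design from the thread: the
DataflowNode class and its method names are hypothetical, and routing is
reduced to a hard-wired list of consumers.

```python
class DataflowNode:
    """Fires its function once every named input has arrived."""

    def __init__(self, name, inputs, func, consumers=()):
        self.name = name
        self.needed = set(inputs)
        self.func = func
        self.consumers = list(consumers)  # (node, input_name) pairs
        self.arrived = {}
        self.result = None

    def receive(self, input_name, value):
        if input_name not in self.needed:
            raise KeyError(f"{self.name} has no input {input_name!r}")
        # A second token on the same input silently overwrites the first.
        self.arrived[input_name] = value
        if set(self.arrived) == self.needed:
            self.fire()

    def fire(self):
        self.result = self.func(**self.arrived)
        self.arrived = {}  # tokens are consumed by firing
        for node, input_name in self.consumers:
            node.receive(input_name, self.result)  # push the result onward


total = DataflowNode("total", ["x", "y"], lambda x, y: x + y)
product = DataflowNode("product", ["a", "b"], lambda a, b: a * b,
                       consumers=[(total, "x")])

total.receive("y", 10)   # arrives early; the node just waits
product.receive("a", 3)
product.receive("b", 4)  # product fires, pushes 12 to total, total fires
```

Even this toy shows the two costs named above: every arrival pays for a
readiness check, and the overwrite in receive is exactly the place where
some extra synchronization would be needed before a slot may be reused.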

From: "Andy "Krazy" Glew" on 24 Dec 2009 00:19

Bernd Paysan wrote:
> Robert Myers wrote:
> It has been tried and it works - you can find a number of papers about
> active message passing from various universities. However, it seems to
> be that most people try to implement some standard protocols like MPI on
> top of it, so the benefits might be smaller than expected. And as Andy
> already observed: Most people seem to be more comfortable with
> sequential programming. Using such an active message system makes the
> parallel programming quite explicit - you model a data flow graph, you
> create packets with code and data, and so on.

Another of the SC09 buzzwords was parallel scripting languages, and
other infrastructure to do exactly this: connect chunks of sequential
code up into dataflow graphs.

Perhaps it is just at a certain level of abstraction that we will do
this explicitly. Let the compiler autoparallelize the small stuff.
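
For readers unfamiliar with the active-message style Bernd Paysan describes
(packets that name work for the receiver to execute, collected and sent in
one go), here is a minimal sketch. It is an assumption-laden toy: the
Receiver class and its handler names are hypothetical, and messages carry
handler names rather than actual code.

```python
class Receiver:
    """Toy memory-side agent that executes the handler named in each message."""

    def __init__(self):
        self.table = {}
        self.handlers = {
            "insert": self.insert,
            "compare_and_swap": self.compare_and_swap,
        }

    def insert(self, key, value):
        # Unlike compare-and-swap, an insert does not fail.
        self.table[key] = value
        return True

    def compare_and_swap(self, key, expected, new):
        if self.table.get(key) == expected:
            self.table[key] = new
            return True
        return False

    def deliver(self, batch):
        """Run a whole batch of (handler_name, args) messages in one go."""
        return [self.handlers[name](*args) for name, args in batch]


receiver = Receiver()
replies = receiver.deliver([
    ("insert", ("head", 1)),
    ("compare_and_swap", ("head", 1, 2)),  # succeeds: head is 1
    ("compare_and_swap", ("head", 1, 3)),  # fails: head is now 2
])
```

Batching the three orders into one delivery costs a single round trip
instead of three, which is the latency argument made earlier in the thread.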
