From: Bernd Paysan on
Andy "Krazy" Glew wrote:
> You can have fixed prioritization rules: E.g. local traffic (b) wins,
> requiring that the downward traffic coming from the above links be
> either buffered or resent. Or the opposite, one of the downward
> traffic wins.

It's fairly easy to decide whether you should buffer or not - it again
depends on latency. I think you can safely assume the data is buffered
at the origin, i.e. the data there is already in memory and not
constructed on the fly - so this resend "buffer" comes for free. If a
"resend" command arrives at the source quickly enough that the resent
data reaches the congested node just as the previous, prioritized
packet is sent out, then we are fine.
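That timing rule can be sketched in a few lines. This is a minimal
model under my own assumptions (uniform per-hop latency, a single
packet occupying the contested output port); the function and
parameter names are mine, not from the post:

```python
# A sketch of the rule above: dropping a packet and asking the source
# to resend is "free" when the resent data can arrive just as the
# congested port finishes sending the prioritized packet.

def bufferless_ok(hops_to_source, hop_latency_ns, packet_bits, link_bw_gbps):
    """True if resend-from-source hides behind the prioritized packet."""
    # "resend" command travels back to the source, then the data
    # travels forward again over the same number of hops
    resend_round_trip_ns = 2 * hops_to_source * hop_latency_ns
    # serialization time of the packet currently occupying the port
    # (Gbit/s is numerically bits-per-nanosecond)
    send_time_ns = packet_bits / link_bw_gbps
    return resend_round_trip_ns <= send_time_ns

# On-chip, cm scale: a few hops at ~1 ns each -> buffer-free wins.
print(bufferless_ok(hops_to_source=3, hop_latency_ns=1.0,
                    packet_bits=512, link_bw_gbps=32))   # True
# Off-chip, km scale: ~5000 ns per hop of fibre -> better to buffer.
print(bufferless_ok(hops_to_source=2, hop_latency_ns=5000.0,
                    packet_bits=512, link_bw_gbps=32))   # False
```

The numbers are only illustrative, but they show why the cm/km
distinction below falls out of the same inequality.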

I.e. the buffer memory is really evaluated by the "benefit of
locality", and the benefit is there when the retransmit is better
handled locally. So when we connect a large multi-core chip with such
a network (order of magnitude is cm), it will very likely work with
lots of buffer-free nodes, while for a larger off-chip network (order
of magnitude is km), we had better buffer.

I'm not sure whether I would use a fat-tree topology if I were
building an on-chip network out of 2x2 router primitives. The fat
tree has excellent bisection bandwidth, and it offers improved local
bandwidth, but it has a non-physical topology (i.e. you get longer
and longer wires to each level further up - which also means the
latency of the nodes is not equal), and it doesn't scale well,
because a larger network needs more *and* longer root connections.
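The wire-length objection is easy to quantify. Here is a sketch under
my own assumptions (an H-tree-style planar embedding in which link
length doubles every two tree levels; the function names are mine):

```python
# Longest wire in a planar fat-tree embedding vs. a 2D mesh, for
# n leaf nodes at unit pitch. In an H-tree layout the link length
# doubles every two levels, so the top-level (root) wires grow as
# roughly sqrt(n); mesh links stay at neighbour distance.
import math

def fattree_longest_wire(n_leaves, node_pitch=1.0):
    levels = int(math.log2(n_leaves))
    return node_pitch * 2 ** (levels / 2)

def mesh_longest_wire(n_leaves, node_pitch=1.0):
    return node_pitch  # mesh links only connect adjacent nodes

for n in (64, 1024, 16384):
    print(n, fattree_longest_wire(n), mesh_longest_wire(n))
```

So at 16384 leaves the root wires are two orders of magnitude longer
than a mesh hop, which is the "more *and* longer root connections"
problem in numbers.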

I'd first try a 2D mesh, i.e. use something like an 8x8 router
connecting four local nodes plus four bidirectional links in each
direction. The bisection bandwidth should still be reasonable, as
long as you choose freely between the different possible routes (for
a 2D system, node-to-node bandwidth over distance decreases with the
square root of the node count; for a 3D system, with the cube root).
If bandwidth is a problem, I'd rather increase it overall, and maybe
change the balance of the 8x8 router node to favor global connections
(i.e. the local connections will use smaller buses, and maybe the
priority rule "local before global" you suggest will even be inverted
- after all, the local retransmit is the cheapest you can get).
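The square-root/cube-root claim follows from counting bisection links.
A small sketch, with my own formulation (a d-dimensional mesh of N
nodes with side k = N^(1/d) has k^(d-1) links across the bisection):

```python
# Per-node share of bisection bandwidth in a d-dimensional mesh.
# With side length k, the bisection cut crosses k**(d-1) unit links
# shared among N = k**d nodes, so each node's share is 1/k, i.e. it
# falls with sqrt(N) in 2D and with the cube root of N in 3D.

def per_node_bisection_bw(n_nodes, dims, link_bw=1.0):
    side = round(n_nodes ** (1.0 / dims))
    bisection_links = side ** (dims - 1)
    return bisection_links * link_bw / n_nodes  # equals link_bw / side

# 4096 nodes: a 64x64 2D mesh vs. a 16x16x16 3D mesh.
print(per_node_bisection_bw(4096, dims=2))  # 0.015625  (1/64)
print(per_node_bisection_bw(4096, dims=3))  # 0.0625    (1/16)
```

The 3D mesh gets four times the per-node cross-section bandwidth at
the same node count, which is the scaling argument in the parenthesis
above.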

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
From: Robert Myers on
On Mar 21, 12:07 pm, n...(a)cam.ac.uk wrote:

>
> Yup.  I looked at it, and my objections aren't because it isn't as
> efficient as it should be, but because it paid too much attention
> to performance and not enough to usability.  And MOST of that is a
> remark about the software, though not all.
>
I have no idea about how the world might look to someone in your
position (or Eugene's, for that matter).

I only know how the world looks to me and to people who say things
that I can understand.

Many others, completely uninfluenced by me, have commented on the
relative unusability of these massively parallel machines. Just as
with bandwidth, I don't know whether that's fundamental or because the
decisions as to what is on offer are made by people with EE/CS degrees
and not by lusers. Maybe we just have to live with computers that
will produce lots of job openings for people with CS degrees.

My beef with the LLNL purchase is the repeated claim that "We just
bought the biggest, baddest machine on earth." I'm tired of those
claims. For the kinds of problems I care about, it may be a big
machine, but it sure is bad.

If a company, university, or research group wants to buy a machine and
put up with the clumsiness, that's their decision to make. If you're
claiming to be the one pushing the frontiers of computing, how much
can actually be done with the machine on the hardest problems around
should count for something.

Stop piling on, and I'll stop repeating myself. You ever actually had
anything to do with these bomb lab dudes, BTW?

Robert.
From: Chris Gray on
Bernd Paysan <bernd.paysan(a)gmx.de> writes:

> Well, source routing isn't necessarily static. You have to ask someone
> for the route, and this "someone" (the distributed routing table) will
> change when a path or node dies. Usually, the "died" message would be
> part of the flow control ("Hey, I can't send your packets any longer, my
> next hop died"), and the reaction would be to look for an alternative
> path or even an alternative destination.

> It is stupid to make a highly critical part (the router must be as fast
> and as simple as possible) more intelligent to handle rare exceptions
> when you can handle that somewhere else, even though it might be handled
> slower. It's an exception, it doesn't happen all the time.

It's been far too long for me to remember details of stuff I wasn't
directly involved in, but here's what I remember about the Myrias SPS-2 and
SPS-3:

Each message had its destination node address on the front. A receiving
node looked that up in a fast table (ours were small enough that it was
direct indexing I think, rather than a CAM). The answer it got was two
port numbers. Having two gave it the ability to handle a dead link
directly, or to deal with a link that is already busy. The code that
computed the overall routing tables had to ensure that no loops resulted.
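That two-port lookup can be sketched directly. The identifiers here
are mine, not Myrias's; this only illustrates the mechanism described
above (direct table indexing, with the second port as a fallback for
a dead or busy primary link):

```python
# Myrias-style port selection: the destination address on the front
# of the message directly indexes a small table, and each entry holds
# two candidate output ports so a dead or busy primary link can be
# sidestepped without failing the message.

def pick_port(route_table, dest, dead_ports, busy_ports):
    primary, alternate = route_table[dest]   # direct indexing, no CAM
    if primary not in dead_ports and primary not in busy_ports:
        return primary
    if alternate not in dead_ports and alternate not in busy_ports:
        return alternate
    return None  # out of options: signal an error back to the sender

# Table for a 4-destination node. The offline code that computes
# these tables must guarantee that no combination of alternate
# choices can form a routing loop.
table = {0: (1, 2), 1: (2, 3), 2: (0, 1), 3: (3, 0)}
print(pick_port(table, dest=1, dead_ports={2}, busy_ports=set()))     # 3
print(pick_port(table, dest=1, dead_ports={2, 3}, busy_ports=set()))  # None
```

The loop-freedom obligation falls entirely on the table generator,
which keeps the per-hop logic as simple as the post argues it should
be.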

As long as you have enough FIFO buffer that you can make the port
decision in time, you don't have to fail the message. If a node could
not forward a message, an error was signalled back to the sending node.

Our messages were long enough that having the sender wait for an ACK
(end-to-end checksum OK) or NAK wasn't an issue. Obviously this wouldn't
work for long-distance networks, or very small messages.

Having the tables at each node adds the flexibility of making that port
choice on the fly, rather than being forced to fail as soon as something
goes wrong.

--
Chris Gray cg(a)GraySage.COM
From: Nicholas King on
On 03/16/2010 11:59 AM, MitchAlsup wrote:
> On Mar 15, 1:28 pm, an...(a)mips.complang.tuwien.ac.at (Anton Ertl)
> wrote:
>> MitchAlsup<MitchAl...(a)aol.com> writes:
>
>> But reading your description again, maybe you mean:
>>
>> | | | | | | | I/O cables (disks)
>> ############# DRAM boards
>> ############# (we see only the one in front)
>> ---------------------- backplane
>> | | | | | | | | | | | | | CPU boards
>> | | | | | | | | | | | | | (edge view, we see all of them in this view)
>> \ \ \ \ \ Other I/O cabling
>>
>> Did I get that right? That would be 1/2m x 1/2m x 1m.
>
> About as good as ASCII art can do. No motherboard and wrap the
> ensemble with a metal skeleton to hold the boards, fans, and route
> power and provide for good looks.
>
> Mitch
I was thinking about this today. How about making the perpendicular
CPU and RAM boards comb-shaped so they have more connections between
them? One could make the spine of each comb fat and use it for some
local routing, etc.

Cheers,
Nicholas King
From: Kai Harrekilde-Petersen on
MitchAlsup <MitchAlsup(a)aol.com> writes:

> On Mar 15, 1:28 pm, an...(a)mips.complang.tuwien.ac.at (Anton Ertl)
> wrote:
>> MitchAlsup <MitchAl...(a)aol.com> writes:
>
>> But reading your description again, maybe you mean:
>>
>> | | | | | | | I/O cables (disks)
>> ############# DRAM boards
>> ############# (we see only the one in front)
>> ---------------------- backplane
>> | | | | | | | | | | | | | CPU boards
>> | | | | | | | | | | | | | (edge view, we see all of them in this view)
>> \ \ \ \ \ Other I/O cabling
>>
>> Did I get that right? That would be 1/2m x 1/2m x 1m.
>
> About as good as ASCII art can do. No motherboard and wrap the
> ensemble with a metal skeleton to hold the boards, fans, and route
> power and provide for good looks.

Sounds like several of the arrangements that I saw proposed for
highly-modular switches around 1999-2002. I think Intel either built
or at least experimented with one or more such designs before they
folded their network switching developments.

One of the drawbacks of this scheme is that it requires you to have
both a horizontal and a vertical airflow - not impossible, just
another challenge to sort out.

Kai
--
Kai Harrekilde-Petersen <khp(at)harrekilde(dot)dk>