Nonlinear systems and nonlocal supercomputing [Computer Architecture]

From: MitchAlsup on 18 Mar 2010 13:10

On Mar 18, 8:17 am, Terje Mathisen <"terje.mathisen at tmsw.no">
wrote:
> They quote 20 cycles/router, which to me indicates that they might have
> the wrong model:

Thanks for the link.

It appears to me that those routers collect an entire message before
shipping it forward. This is a pure performance loss (pure gain in
latency) compared to organizing the message where the first beat
contains the routing information.

Mitch
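
The latency difference described above can be put into a rough zero-load model. This is only a sketch: it assumes one flit crosses a link per cycle, no contention, and a fixed per-hop routing delay. The function names and the parameter values are illustrative and do not come from the paper under discussion.

```python
def store_and_forward_latency(hops, flits, router_delay=1):
    """Cycles until the packet is delivered when every router
    collects the whole packet before forwarding it."""
    # Each hop first receives all flits, then spends router_delay routing.
    return hops * (flits + router_delay)


def cut_through_latency(hops, flits, router_delay=1):
    """Cycles when the first beat carries the route and is forwarded
    as soon as it has been routed."""
    # Only the header pays router_delay per hop; the remaining flits
    # stream behind it in a pipeline, so packet length is paid once.
    return hops * router_delay + flits


if __name__ == "__main__":
    # Hypothetical numbers: 10 hops, 16-flit packet, 1 cycle to route.
    print(store_and_forward_latency(10, 16))
    print(cut_through_latency(10, 16))
```

In this model the store-and-forward cost grows with hops times packet length, while the cut-through cost grows with hops plus packet length.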

From: "Andy "Krazy" Glew" on 20 Mar 2010 02:36

MitchAlsup wrote:
> On Mar 18, 8:17 am, Terje Mathisen <"terje.mathisen at tmsw.no">
> wrote:
>> They quote 20 cycles/router, which to me indicates that they might have
>> the wrong model:
>
> Thanks for the link.
>
> It appears to me that those routers collect an entire message before
> shipping it forward. This is a pure performance loss (pure gain in
> latency) compared to organizing the message where the first beat
> contains the routing information.
>
> Mitch

This is what I keep harping about. If you store the entire packet, or probably several for blockage, then you have enough
memory to be a real computer. The temptation to put a processor there becomes overwhelming. But this leads to the
locality that RM is concerned about.

Wormhole routing, sending flits on ASAP, keeps the routers lightweight. It is probably more suited to RM's mindset.

I regard rings as in LRB as a bit of a cop-out. The whole packet is received in a single cycle. Unlikely to scale.

From: Bernd Paysan on 20 Mar 2010 17:46

MitchAlsup wrote:

> On Mar 18, 8:17 am, Terje Mathisen <"terje.mathisen at tmsw.no">
> wrote:
>> They quote 20 cycles/router, which to me indicates that they might
>> have the wrong model:
>
> Thanks for the link.
>
> It appears to me that those routers collect an entire message before
> shipping it forward. This is a pure performance loss (pure gain in
> latency) compared to organizing the message where the first beat
> contains the routing information.

I think the obvious thing was mentioned in the paper: "make the routers
as simple as possible". This means that the routing information
contains a physical route (a sequence of turns - the most simple router
is a butterfly router with two inputs, two outputs, and one bit of
routing information), and the router just passes on packets as it knows
the next hop from the first beat of the message. On collisions, it has
only a few options:

1. Route the colliding packet into a buffer, and transmit delayed
2. Drop it and ask for a retransmit
3. Drop it, and don't say anything, relying on a TCP-like control flow.

I'd probably go with option one, and make sure that the buffers are big
enough (this probably involves sending some "jam" messages back to stall
the previous hop, so that the traffic jam propagates through the chip,
and the buffer size per node can be small).
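
Option one with back-pressure can be sketched as a toy simulation. This assumes a simple chain of routers, one packet moved per hop per cycle, and a "jam" that is modelled as nothing more than "the next hop's buffer is full, so hold the packet". The function name and all parameters are hypothetical.

```python
from collections import deque


def simulate_chain(num_nodes, buffer_slots, packets, sink_blocked_until):
    """Push packets through a chain of routers with small buffers.

    A router forwards only when the next hop has a free slot, so a
    blocked sink stalls the chain hop by hop instead of dropping data.
    Returns (delivery order, largest buffer occupancy seen, cycles used).
    """
    buffers = [deque() for _ in range(num_nodes)]
    pending = deque(range(packets))  # packets still waiting at the source
    delivered = []
    max_occupancy = 0
    cycle = 0
    while len(delivered) < packets:
        # The last router drains into the sink unless the sink is blocked.
        if buffers[-1] and cycle >= sink_blocked_until:
            delivered.append(buffers[-1].popleft())
        # Visit routers from the sink end towards the source, so each
        # packet advances at most one hop per cycle.
        for i in range(num_nodes - 2, -1, -1):
            # "Jam": the next hop is full, so this router holds its packet.
            if buffers[i] and len(buffers[i + 1]) < buffer_slots:
                buffers[i + 1].append(buffers[i].popleft())
        # The source injects only when the first router has room.
        if pending and len(buffers[0]) < buffer_slots:
            buffers[0].append(pending.popleft())
        max_occupancy = max(max_occupancy, max(len(b) for b in buffers))
        cycle += 1
    return delivered, max_occupancy, cycle


if __name__ == "__main__":
    order, peak, cycles = simulate_chain(4, 2, 10, sink_blocked_until=20)
    print(order, peak, cycles)
```

With the sink blocked, the stall propagates back to the source and no buffer ever holds more than `buffer_slots` packets, which is the point of letting the traffic jam spread rather than sizing each buffer for the worst case.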

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

From: Del Cecchi on 20 Mar 2010 21:08

"Bernd Paysan" <bernd.paysan(a)gmx.de> wrote in message
news:uasf77-m7b.ln1(a)vimes.paysan.nom...
> MitchAlsup wrote:
>
>> On Mar 18, 8:17 am, Terje Mathisen <"terje.mathisen at tmsw.no">
>> wrote:
>>> They quote 20 cycles/router, which to me indicates that they might
>>> have the wrong model:
>>
>> Thanks for the link.
>>
>> It appears to me that those routers collect an entire message before
>> shipping it forward. This is a pure performance loss (pure gain in
>> latency) compared to organizing the message where the first beat
>> contains the routing information.
>
> I think the obvious thing was mentioned in the paper: "make the routers
> as simple as possible". This means that the routing information
> contains a physical route (a sequence of turns - the most simple router
> is a butterfly router with two inputs, two outputs, and one bit of
> routing information), and the router just passes on packets as it knows
> the next hop from the first beat of the message. On collisions, it has
> only a few options:
>
> 1. Route the colliding packet into a buffer, and transmit delayed
> 2. Drop it and ask for a retransmit
> 3. Drop it, and don't say anything, relying on a TCP-like control flow.
>
> I'd probably go with option one, and make sure that the buffers are big
> enough (this probably involves sending some "jam" messages back to stall
> the previous hop, so that the traffic jam propagates through the chip,
> and the buffer size per node can be small).
>
> --
> Bernd Paysan
> "If you want it done right, you have to do it yourself"
> http://www.jwdt.com/~paysan/

Source routing and flow control. This stuff was considered in SCI and
InfiniBand. Perhaps even earlier. What do you do with source routing
if a node dies?
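
To make the question concrete, here is a minimal sketch of source routing through a plain radix-2 butterfly, where the header is one turn bit per stage. It assumes the textbook butterfly in which each source/destination pair has exactly one path, so a dead router on that path leaves the source with no alternative route to pick. The `(stage, row)` node naming and both function names are illustrative only, and nothing here is meant to describe how SCI or InfiniBand handle the problem.

```python
def route_bits(dest, stages):
    """Header computed at the source: one turn bit per stage,
    most significant bit of the destination address first."""
    return [(dest >> (stages - 1 - s)) & 1 for s in range(stages)]


def follow_route(src, bits, dead=frozenset()):
    """Walk the route hop by hop.

    Returns the list of (stage, row) positions visited, or None if
    the packet reaches a position listed in `dead`.
    """
    stages = len(bits)
    row = src
    path = []
    for s, bit in enumerate(bits):
        mask = 1 << (stages - 1 - s)
        # The router at this stage looks only at its own bit and sets
        # the matching address bit of the row accordingly.
        row = (row & ~mask) | (mask if bit else 0)
        if (s, row) in dead:
            return None  # unique path is broken, nothing to fall back on
        path.append((s, row))
    return path


if __name__ == "__main__":
    bits = route_bits(5, 3)  # 8-terminal network, destination 5
    print(bits)
    print(follow_route(2, bits))
    print(follow_route(2, bits, dead={(1, 4)}))
```

In a topology with several paths between the same endpoints, the source could in principle pick a different turn sequence once it learns of the failure; how it learns is exactly the open part of the question.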

God, I am starting to sound like Lynn. :-)

From: Robert Myers on 20 Mar 2010 21:15

On Mar 20, 9:08 pm, "Del Cecchi" <delcec...(a)gmail.com> wrote:
> "Bernd Paysan" <bernd.pay...(a)gmx.de> wrote in message
> news:uasf77-m7b.ln1(a)vimes.paysan.nom...
>
> > MitchAlsup wrote:
>
> >> On Mar 18, 8:17 am, Terje Mathisen <"terje.mathisen at tmsw.no">
> >> wrote:
> >>> They quote 20 cycles/router, which to me indicates that they might
> >>> have the wrong model:
>
> >> Thanks for the link.
>
> >> It appears to me that those routers collect an entire message before
> >> shipping it forward. This is a pure performance loss (pure gain in
> >> latency) compared to organizing the message where the first beat
> >> contains the routing information.
>
> > I think the obvious thing was mentioned in the paper: "make the routers
> > as simple as possible". This means that the routing information
> > contains a physical route (a sequence of turns - the most simple router
> > is a butterfly router with two inputs, two outputs, and one bit of
> > routing information), and the router just passes on packets as it knows
> > the next hop from the first beat of the message. On collisions, it has
> > only a few options:
>
> > 1. Route the colliding packet into a buffer, and transmit delayed
> > 2. Drop it and ask for a retransmit
> > 3. Drop it, and don't say anything, relying on a TCP-like control flow.
>
> > I'd probably go with option one, and make sure that the buffers are big
> > enough (this probably involves sending some "jam" messages back to stall
> > the previous hop, so that the traffic jam propagates through the chip,
> > and the buffer size per node can be small).
>
> > --
> > Bernd Paysan
> > "If you want it done right, you have to do it yourself"
> > http://www.jwdt.com/~paysan/
>
> Source routing and flow control. This stuff was considered in SCI and
> InfiniBand. Perhaps even earlier. What do you do with source routing
> if a node dies?
>
> God, I am starting to sound like Lynn. :-)

No problem, Del. Where, in the old wisdom received by Moses on Mt.
Sinai, was there the notion that you could add flops in a way
completely unrelated to bytes/second?

I guess old wisdom only counts when it fits into IBM's strategic
marketing.

Robert.