From: nmm1 on
In article <gnnu67-i192.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>Robert Myers wrote:
>> I'm slowly doing my catchup homework on what the national wisdom is on
>> bisection bandwidth. Not too surprisingly, there are plenty of people
>> out there who know that it's already a big problem, and that it is
>> only going to get bigger, as there is no Moore's Law for bandwidth.
>
>Huh?
>
>Sure there is, it is driven by the same size shrinks as regular ram and
>cpu chips have enjoyed.

Of course. But what he has probably reforgotten is that bisection
bandwidth is a completely damn-fool measure in the first place.

There are at least half a dozen 'obvious' definitions of it, some
of which are easy to get to scale and others are not so easy.
In general, the easier they are to get to scale, the less they are
correlated with application performance. Before discussing its
technical merits, it must be defined.

When I was procuring supercomputers, I found that I could usually
find the bisection bandwidth in GB/sec, but rarely find out which
definition had been used to measure it! In some cases, I was sure
that it was the marketdroids' choice, which is almost completely
irrelevant to overall performance.


Regards,
Nick Maclaren.
From: Robert Myers on
On Mar 14, 2:44 pm, n...(a)cam.ac.uk wrote:
> In article <gnnu67-i192....(a)ntp.tmsw.no>,
> Terje Mathisen  <"terje.mathisen at tmsw.no"> wrote:
>
> >Robert Myers wrote:
> >> I'm slowly doing my catchup homework on what the national wisdom is on
> >> bisection bandwidth.   Not too surprisingly, there are plenty of people
> >> out there who know that it's already a big problem, and that it is
> >> only going to get bigger, as there is no Moore's Law for bandwidth.
>
> >Huh?
>
> >Sure there is, it is driven by the same size shrinks as regular ram and
> >cpu chips have enjoyed.
>
> Of course.  But what he has probably reforgotten is that bisection
> bandwidth is a completely damn-fool measure in the first place.
>
> There are at least half a dozen 'obvious' definitions of it, some
> of which are easy to get to scale and others are not so easy.
> In general, the easier they are to get to scale, the less they are
> correlated with application performance.  Before discussing its
> technical merits, it must be defined.
>
> When I was procuring supercomputers, I found that I could usually
> find the bisection bandwidth in GB/sec, but rarely find out which
> definition had been used to measure it!  In some cases, I was sure
> that it was the marketdroids' choice, which is almost completely
> irrelevant to overall performance.
>
What? We have linpack supported as meaningful and bisection bandwidth
dismissed as a damn-fool measure?

I'm not worried about factors of two. No matter how you measure it,
the limiting bisection bandwidth per flop of Blue Gene and computers
with a similar topology is ****zero****, unless you increase the
bandwidth of each link as you add flops.
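
To put rough numbers behind that: a minimal sketch in Python, assuming
an n x n x n 3D torus (roughly the Blue Gene style of network) with a
fixed per-link bandwidth and fixed flops per node. The 0.5 GB/s and
5.6 Gflop/s values are placeholders for illustration, not actual BG/L
or BG/P specifications.

# Illustrative scaling for an n x n x n 3D torus; the constants are
# assumptions, not real machine parameters.

LINK_BW_GBS = 0.5        # assumed bandwidth of one torus link, GB/s
FLOPS_PER_NODE = 5.6e9   # assumed peak flop rate of one node

def bisection_links(n):
    # Cutting an n x n x n torus in half across one dimension severs
    # 2 * n * n links (the factor 2 comes from the wrap-around links).
    return 2 * n * n

def bisection_bytes_per_flop(n):
    nodes = n ** 3
    bisection_bw = bisection_links(n) * LINK_BW_GBS * 1e9   # bytes/s
    return bisection_bw / (nodes * FLOPS_PER_NODE)

for n in (8, 16, 32, 64):
    print(f"n={n:3d}  nodes={n**3:7d}  "
          f"bisection bytes/flop = {bisection_bytes_per_flop(n):.4f}")

# The ratio falls off as 1/n: every doubling of the machine's linear
# dimension halves the bisection bytes available per flop, which is
# the sense in which the limit is zero.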

I spend enough of my time swimming against the tide. You are the only
one so far not to agree that bisection bandwidth is a useful proxy for
global FFT performance.

Robert.
From: nmm1 on
In article <2f8c0fa9-3840-495d-aaa6-20c2636afdc7(a)33g2000yqj.googlegroups.com>,
Robert Myers <rbmyersusa(a)gmail.com> wrote:
>On Mar 14, 2:44 pm, n...(a)cam.ac.uk wrote:
>> In article <gnnu67-i192....(a)ntp.tmsw.no>,
>> Terje Mathisen  <"terje.mathisen at tmsw.no"> wrote:
>>
>> >> I'm slowly doing my catchup homework on what the national wisdom is on
>> >> bisection bandwidth.   Not too surprisingly, there are plenty of people
>> >> out there who know that it's already a big problem, and that it is
>> >> only going to get bigger, as there is no Moore's Law for bandwidth.
>>
>> >Huh?
>>
>> >Sure there is, it is driven by the same size shrinks as regular ram and
>> >cpu chips have enjoyed.
>>
>> Of course.  But what he has probably reforgotten is that bisection
>> bandwidth is a completely damn-fool measure in the first place.
>>
>What? We have linpack supported as meaningful and bisection bandwidth
>dismissed as a damn-fool measure?

Please don't be ridiculous. Linpack is meaningful because it's
well-defined, though it's not very useful. Bisection bandwidth is a
damn-fool measure because it isn't well-defined.

>I'm not worried about factors of two. ...

And I'm not talking about factors of two. Try factors of 10-100.
No, I am not joking. I have told you before what the problem is.

The 'best' (marketdroid) bisection bandwidth comes from connecting
the nodes in pairs, chosen optimally for performance and irrespective
of the usefulness of the topology.

You can also measure the minimum over all such divisions, or the
average over all divisions, or the median, or .... And you can also
constrain the set of divisions that is considered in any way that
tickles your fancy.

It isn't rare for the ratio of the best to worst (and sometimes best
to average) to be in the range I mention above.
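
To see how much the choice of definition matters, here is a small,
purely illustrative Python sketch: it enumerates the balanced halves
of a made-up 8-node ring with 1 GB/s links and compares the most
favourable, least favourable, average and median cut bandwidths. The
"most favourable division" is only a loose stand-in for the
marketdroid pairing described above.

# Toy example only: exhaustive enumeration of balanced halves, so it
# is practical only for very small node counts.
from itertools import combinations
from statistics import mean, median

NODES = range(8)
LINKS = [(i, (i + 1) % 8) for i in NODES]   # 8-node ring
LINK_BW_GBS = 1.0

def cut_bandwidth(half):
    half = set(half)
    # Total bandwidth of the links crossing between the two halves.
    return sum(LINK_BW_GBS for a, b in LINKS if (a in half) != (b in half))

cuts = [cut_bandwidth(h) for h in combinations(NODES, len(NODES) // 2)]

print("most favourable division :", max(cuts), "GB/s")
print("least favourable division:", min(cuts), "GB/s")
print("average over divisions   :", round(mean(cuts), 2), "GB/s")
print("median over divisions    :", median(cuts), "GB/s")

Even on this trivial ring the best and worst divisions differ by a
factor of four; on real topologies the spread can be far larger.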


Regards,
Nick Maclaren.
From: Del Cecchi` on
MitchAlsup wrote:
> On Mar 14, 12:15 pm, Robert Myers <rbmyers...(a)gmail.com> wrote:
>
>>On Mar 14, 5:43 am, Terje Mathisen <"terje.mathisen at tmsw.no">
>>
>>>I guess the real problem is that you'd like the total bandwidth to scale
>>>not just with the link frequencies but even faster, so that it also keeps
>>>up with the increasing total number of ports/nodes in the system, without
>>>overloading the central mesh?
>>
>>At the chip (or maybe chip carrier) level, there are interesting
>>things you can do because of decreased feature sizes, as we have
>>recently discussed.
>
>
> One achieves maximal "routable" bandwidth at the "frame" scale. With
> today's board technologies, this "frame" scale occurs around 1
> cubic meter.
>
> Consider a 1/2 meter square motherboard with "several" CPU nodes
> with 16 bidirectional (about) byte-wide ports running at 6-10 GT/s.
> Now consider a backplane that simply couples this 1/2 square meter
> motherboard to another 1/2 square meter DRAM-carrying board, also
> with 16 bidirectional (almost) byte-wide ports running at the same
> frequencies. Except, this time, the DRAM boards are perpendicular to
> the CPU boards. With this arrangement, we have 16 CPU-containing
> motherboards fully connected to 16 DRAM-containing motherboards by
> 256 (almost) byte-wide connections running at 6-10 GT/s. 1 cubic
> meter, about the actual size of a refrigerator. {Incidentally, this
> kind of system would have about 4 TB/s of bandwidth to about 4 TB of
> actual memory.}
>
> Once you get larger than this, all of the wires actually have to
> exist as wires (between "frames"), not just traces of copper on a
> board or through a connector, and one becomes wire-bound connecting
> frames.
>
> Mitch
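
For what it's worth, the "about 4 TB/s" figure quoted above checks out
on the back of an envelope. The sketch below assumes an 8 GT/s rate
(mid-range of 6-10), one byte per transfer, and counts both directions
of each link; those are my assumptions, not Mitch's exact parameters.

CPU_BOARDS = 16
DRAM_BOARDS = 16
LINKS = CPU_BOARDS * DRAM_BOARDS   # every CPU board meets every DRAM board
RATE_T_PER_S = 8e9                 # mid-point of the 6-10 GT/s range
BYTES_PER_TRANSFER = 1             # "(almost) byte-wide" port
DIRECTIONS = 2                     # bidirectional links

aggregate = LINKS * RATE_T_PER_S * BYTES_PER_TRANSFER * DIRECTIONS
print(f"aggregate CPU<->DRAM bandwidth ~ {aggregate / 1e12:.1f} TB/s")
# Prints ~4.1 TB/s, consistent with the quoted "about 4 TB/s".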

I went looking for the network characteristics of Blue Gene/P. In the
process I found this paper which, while not discussing BG/P, does have
some interesting data.

http://www.cs.rochester.edu/u/sandhya/csc258/seminars/Patel_HPC.pdf

And this one on BG/P

http://workshops.alcf.anl.gov/gs10/files/2010/01/Morozov-BlueGeneP-Architecture.pdf

which appears to have data that should allow the calculation of various
types of bandwidths.

del
From: Del Cecchi on

"Anton Ertl" <anton(a)mips.complang.tuwien.ac.at> wrote in message
news:2010Mar14.192558(a)mips.complang.tuwien.ac.at...
> MitchAlsup <MitchAlsup(a)aol.com> writes:
>>Consider a 1/2 meter square motherboard with "several" CPU nodes
>>with 16 bidirectional (about) byte-wide ports running at 6-10 GT/s.
>>Now consider a backplane that simply couples this 1/2 square meter
>>motherboard to another 1/2 square meter DRAM-carrying board, also
>>with 16 bidirectional (almost) byte-wide ports running at the same
>>frequencies. Except, this time, the DRAM boards are perpendicular to
>>the CPU boards. With this arrangement, we have 16 CPU-containing
>>motherboards fully connected to 16 DRAM-containing motherboards by
>>256 (almost) byte-wide connections running at 6-10 GT/s. 1 cubic
>>meter, about the actual size of a refrigerator.
>
> I compute 1/2m x 1/2m x 1/2m = 1/8 m^3.
>
> Where have I misunderstood you?
>
> But that size is the size of a small freezer around here (typical
> width 55-60cm, depth about the same, and the height of the small ones
> is around the same, with the normal-sized ones at about 1m height).
>
> Hmm, couldn't you have DRAM boards on both sides of the mainboard (if
> you find a way to mount the mainboard in the middle and make it strong
> enough)? Then you can have a computer like a normal-size fridge :-).
>
> - anton

Golly, how much memory do you want? If a DIMM holds 2GB, you could
put a whole bunch of memory on the board with the CPUs rather than
putting it way over there on the other side of the interconnect,
behind a switch that has to arbitrate all those processors beating on
that memory.

Or is there something I missed here?
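
For scale, a quick back-of-the-envelope: the ~4 TB total and the 2 GB
DIMM come from the thread, but spreading the memory evenly over 16 CPU
boards is my assumption.

TOTAL_MEMORY_GB = 4096    # Mitch's ~4 TB of memory for the whole frame
CPU_BOARDS = 16
DIMM_GB = 2

per_board_gb = TOTAL_MEMORY_GB / CPU_BOARDS
dimms_per_board = per_board_gb / DIMM_GB
print(f"{per_board_gb:.0f} GB per CPU board = {dimms_per_board:.0f} x {DIMM_GB} GB DIMMs")
# 256 GB per board, i.e. 128 two-GB DIMMs, if the memory lives next to
# the CPUs instead of on separate DRAM boards behind the interconnect.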

del