From: Mayan Moudgill on
Robert Myers wrote:

> On Jan 3, 3:16 pm, Mayan Moudgill <ma...(a)bestweb.net> wrote:
>
>
>>Robert Myers wrote:
>>
>>>I assume that most who buy installations like Blue Gene would have RAS
>>>requirements that would be hard or impossible to meet with a Beowulf
>>>cluster. In the end, it's probably RAS that rules.
>>
>>What kind of recovery are you looking for? Against node failures?
>>
>>I'd suggest that distributed checkpointing with restart on failures
>>could be an adequate model for small-to-mid-size clusters.
>
>
> It's the entire management problem: backup, recovery from failure,
> detecting and isolating failure, minimizing downtime, and minimizing
> time and focus required for maintenance.
>
> There are, indeed, plausible and relatively off-the-shelf solutions
> for small-to-mid-size clusters, but not for a cluster the size of the
> Blue Gene installations I know about.
>

I thought the whole point was that you wanted to do physics that would
replicate BlueGene's results at a price you (or a physics department)
could afford? If you're saying that the only thing that can do the job
is BlueGene-style petaflop computers, then there are no alternatives -
you're going to have to live with the results coming out of BlueGene.
You can definitely complain about it, and I'm sure that it makes you
feel better, but all that complaining is not really productive, is it?

Also, asking for new languages and architecture features is moot, since
it's clear that no new language or tweak to existing architectures is
going to allow you to approach the computation level afforded by
BlueGene-class machines.
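
The distributed checkpoint/restart model mentioned at the top of this
exchange can be sketched in a few lines. This is a hypothetical,
minimal single-process sketch (the function names and the toy "sum"
workload are mine, not anything from a real package); in a cluster,
each rank would write its own checkpoint and all ranks would roll back
to the last checkpoint they all completed:

```python
import os
import pickle
import tempfile

def save_checkpoint(path, step, state):
    # Write atomically: dump to a temp file, then rename over the old
    # checkpoint, so a crash mid-write never corrupts the restart point.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as f:
        pickle.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    # Return (step, state), or (0, None) if no checkpoint exists yet.
    if not os.path.exists(path):
        return 0, None
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["step"], ckpt["state"]

def run(path, total_steps, interval):
    # Resume from the last checkpoint if one exists, else start fresh.
    step, state = load_checkpoint(path)
    state = state if state is not None else {"sum": 0}
    while step < total_steps:
        state["sum"] += step          # stand-in for real computation
        step += 1
        if step % interval == 0:
            save_checkpoint(path, step, state)
    return state
```

If the job is killed between checkpoints, rerunning `run()` with the
same path simply redoes the work since the last completed checkpoint,
which is the whole recovery model in a nutshell.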

From: Robert Myers on
On Jan 3, 4:40 pm, Mayan Moudgill <ma...(a)bestweb.net> wrote:

> I thought the whole point was that you wanted to do physics that would
> replicate BlueGene's results at a price you (or a physics department)
> could afford? If you're saying that the only thing that can do the job
> is BlueGene-style petaflop computers, then there are no alternatives -
> you're going to have to live with the results coming out of BlueGene.
> You can definitely complain about it, and I'm sure that it makes you
> feel better, but all that complaining is not really productive, is it?

What makes me feel better is that, since I started in on these topics
(all the big computers are behind barbed wire, linpack is meaningless
as a measure of real-world performance, and people need to stop
talking as if things that don't really scale do scale), the landscape
has brightened considerably.

NCSA (*not* the freaking DoE) is building computers to compete in
performance with the national labs' pet toys, people are starting to
emphasize that counting peak flops and linpack flops doesn't offer a
very good measure of usefulness, and attention is turning to bandwidth
(to memory, to the network fabric, and to the Internet). Here's a
really encouraging document:

http://www.apan.net/meetings/kaohsiung2009/presentations/opening/kramer.pdf

It's interesting, and discouraging, that the document mentions
bisection bandwidth and its importance to the FFT, but doesn't offer a
number.
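For scale, the missing number matters: the transpose in a distributed
FFT pushes roughly half the data set across the machine's bisection, so
a back-of-the-envelope lower bound on the transpose time is easy to
write down. The figures below are hypothetical, not taken from the
slides:

```python
def transpose_time_bound(n_points, bytes_per_point, bisection_bw):
    # Lower bound (seconds) on an all-to-all transpose: about half the
    # data set must cross the machine's bisection.
    return (n_points * bytes_per_point / 2) / bisection_bw

# Hypothetical example: 2**30 complex doubles (16 B each) on a machine
# with 100 GB/s of bisection bandwidth.
t = transpose_time_bound(2**30, 16, 100e9)
```

With those made-up numbers the bound is under a tenth of a second per
transpose, but the point is that without a published bisection figure
you can't even do this arithmetic for a real machine.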

Do these things just magically happen? If my "complaining" has had no
effect, I'm happy at least to see that things are moving in exactly the
direction I said they should.

And, whadya know, Blue Waters is being built by IBM.

> Also, asking for new languages and architecture features is moot, since
> it's clear that no new language or tweak to existing architectures is
> going to allow you to approach the computation level afforded by
> BlueGene-class machines.

It isn't clear to me that that statement is true, since most HPC
applications (with the linpack benchmark being a glaring exception)
tend to be memory bound.

In any case, my goal in looking for alternative languages and
computing models was never to get blood out of a turnip, but rather to
make the world safe for concurrent programming, something you have
said in the past is essentially impossible.

Robert.

From: James Van Buskirk on
"Mayan Moudgill" <mayan(a)bestweb.net> wrote in message
news:bu6dnVWcH_qenNzWnZ2dnUVZ_hSdnZ2d(a)bestweb.net...

> I think we may be saying the same thing. The initial phase of an FFT
> consists of a transpose (based on the so-called bit-reversal transpose).

Not at all the case. Bit-reversal is not a true transpose. I think
Nick is talking about

http://www.jjj.de/fxt/fxtbook.pdf

Section 19.10, the matrix Fourier algorithm. This stuff really does
help reduce memory traffic.
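The four-step (matrix Fourier) algorithm James points to treats a
length-N = R*C array as an R-by-C matrix: FFT the columns, multiply by
twiddle factors, FFT the rows, and read the result out transposed. A
slow-but-clear sketch of it in Python (using O(n^2) reference DFTs for
brevity where a real implementation would recurse or call a fast
kernel):

```python
import cmath

def dft(x):
    # O(n^2) reference DFT; stands in for a fast sub-FFT.
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n)
                for j in range(n)) for k in range(n)]

def four_step_fft(x, r, c):
    # Treat x (length r*c, row-major) as an r-by-c matrix.
    n = r * c
    # Step 1: DFT each of the c columns (length-r transforms).
    cols = [dft([x[i * c + j] for i in range(r)]) for j in range(c)]
    # Step 2: twiddle: scale element (k1, n2) by exp(-2*pi*i*k1*n2/n).
    for j in range(c):
        for k1 in range(r):
            cols[j][k1] *= cmath.exp(-2j * cmath.pi * k1 * j / n)
    # Step 3: DFT each of the r rows (length-c transforms).
    rows = [dft([cols[j][k1] for j in range(c)]) for k1 in range(r)]
    # Step 4: read out transposed: X[k1 + k2*r] = rows[k1][k2].
    out = [0j] * n
    for k1 in range(r):
        for k2 in range(c):
            out[k1 + k2 * r] = rows[k1][k2]
    return out
```

Steps 1 and 3 are long runs of independent sub-FFTs with good locality,
and step 4 is the transpose; on a cluster that transpose is the one
all-to-all, which is why bisection bandwidth dominates.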

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end


From: Del Cecchi on

<nmm1(a)cam.ac.uk> wrote in message
news:hhqtpe$3gf$1(a)smaug.linux.pwf.cam.ac.uk...
> In article <u9mdnb0HtsTSaN3WnZ2dnUVZ_gKdnZ2d(a)bestweb.net>,
> Mayan Moudgill <mayan(a)bestweb.net> wrote:
>>
>>(I can't speak to 3D FFTs, so I'll restrict myself to Cooley-Tukey
>>radix-2 1D FFT which I do have experience with, and hopefully the
>>analysis will carry over)
>
> In general, it doesn't, but it more-or-less does for the one you are
> doing - which is NOT the way to do multi-dimensional FFTs! It is
> almost always much faster to transpose, so you are doing vector
> operations in the FFT at each stage.
>
>>Communication can be overlapped with computation - ...
>
> I am afraid not, when you are using 'commodity clusters'. Firstly,
> even with the current offloading of the TCP/IP stack, there is still
> a lot of CPU processing needed to manage the transfer, and obviously
> a CPU can't be doing an FFT while doing that. Secondly, you have
> omitted the cost of the scatter/gather, which again has to be done
> by the CPU.
>
>>Assume a 16-way machine (lgM=4), P = 1e-9 s (1 ns per point), and
>>B = 1e9 B/s (assumes dual 10GbE, fairly tuned stacks).
>>For lg2N = 30 (N ~ 1G-points), we would end up with Ts = 32.2 sec and
>>Tp = 6.0 sec, of which 4.3 sec was communication: speedup = 5.33.
>>For a 64-way, assuming the same numbers, we end up with Tp = 2.0 sec,
>>of which 1.6 sec is communication: speedup = 16.
>>For 256-way, we would end up with Tp = 0.6 sec, of which 0.5 sec is
>>communication: speedup = 51.
>
> Hmm. That's not the experience of the people I know who have tried
> it. I haven't checked your calculations, so I am not saying whether
> or not I agree with them.
>
>>Anyway, Nick, you're right that things like FFTs are network-bandwidth
>>limited; however, it is still possible to get fairly good speedups.
>
> For multi-dimensional FFTs, certainly. I remain doubtful that you
> would get results anywhere near that good for single-dimensional
> ones. I certainly know that they are notorious SMP-system killers,
> and the speedups obtained by vendors' libraries are not very good.
>
>
> Regards,
> Nick Maclaren.

You could use the provided hardware scatter-gather if you were astute
enough to use InfiniBand interconnect. :-)

del

you can lead a horse to water but you can't make him give up ethernet.
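
Mayan's quoted numbers, for what it's worth, are consistent with a
simple model that is worth writing down explicitly. This is a
reconstruction on my part, not Mayan's actual formula: serial time
Ts = N*lgN*P, parallel compute time Ts/M, plus lgM communication
stages, each exchanging N/M 16-byte points at bandwidth B:

```python
from math import log2

def fft_speedup(lg_n, m, p=1e-9, bw=1e9, bytes_per_point=16):
    # Toy model: Ts = N*lgN*P; Tp = Ts/M + lgM stages, each moving
    # N/M points of bytes_per_point each over a link of bw bytes/sec.
    n = 2 ** lg_n
    ts = n * lg_n * p
    t_comm = log2(m) * (n / m) * bytes_per_point / bw
    tp = ts / m + t_comm
    return ts, tp, t_comm, ts / tp
```

For lg_n = 30 this gives Ts of about 32.2 sec and, at M = 16/64/256,
communication of about 4.3/1.6/0.5 sec and speedups of roughly 5, 15,
and 49 - close to the quoted figures - and it also makes Nick's point
visible: as M grows, Tp becomes almost entirely communication.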


From: Del Cecchi` on
Robert Myers wrote:
> On Jan 3, 3:16 pm, Mayan Moudgill <ma...(a)bestweb.net> wrote:
>
>
>>Robert Myers wrote:
>>
>>>I assume that most who buy installations like Blue Gene would have RAS
>>>requirements that would be hard or impossible to meet with a Beowulf
>>>cluster. In the end, it's probably RAS that rules.
>>
>>What kind of recovery are you looking for? Against node failures?
>>
>>I'd suggest that distributed checkpointing with restart on failures
>>could be an adequate model for small-to-mid-size clusters.
>
>
> It's the entire management problem: backup, recovery from failure,
> detecting and isolating failure, minimizing downtime, and minimizing
> time and focus required for maintenance.
>
> There are, indeed, plausible and relatively off-the-shelf solutions
> for small-to-mid-size clusters, but not for a cluster the size of the
> Blue Gene installations I know about.
>
> Robert.
If you go to Google and type "blue gene ras" you get all sorts of
interesting stuff.

You might enjoy a little light reading at the IBM cluster information
center
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.csm16010.install.doc/am7il_bluegeneapx.html

and the Cluster Systems Management Library.


Maybe all that attention to making stuff work is part of what justifies
the high price?

(thunderbird comes through)