From: Robert Myers on 7 Mar 2010 22:45

On Mar 7, 7:32 pm, Del Cecchi <delcecchinospamoftheno... (a)gmail.com> wrote:
> Andy "Krazy" Glew wrote:
> > Del Cecchi wrote:
> >> For a few months a year, Minnesota isn't that much different from
> >> Antarctica. :-)
> >
> > Stop teasing me like that, Del.
> >
> > I'd love to live in Minnesota. I'm Canadian, from Montreal, and my wife
> > is from Wisconsin. She goes dogsledding in Minnesota every February.
> >
> > But AFAIK there are no jobs for somebody like me in Minnesota.
> >
> > If you are a computer architect, it's Intel in Oregon, Silicon Valley,
> > Austin. Where else?
>
> Perhaps IBM in Rochester MN or maybe even Mayo Clinic, Rochester. The
> clinic does a lot of special stuff for medical equipment and Dr Barry
> Gilbert had a group that did high speed stuff for Darpa.
>
> See http://mayoresearch.mayo.edu/staff/gilbert_bk.cfm
>
> IBM Rochester does the (somewhat reviled) Blue Gene and is part of the
> team doing the Power Processors.

I am His Highness' dog at Kew;
Pray tell me, sir, whose dog are you?

Robert.

From: Andrew Reilly on 7 Mar 2010 23:15

On Sun, 07 Mar 2010 17:51:55 -0800, Robert Myers wrote:
> I've explained my objections succinctly. Of 64000 or more processors,
> it can use only 512 effectively in doing an FFT. That the bisection
> bandwidth is naturally measured in *milli*bytes per flop is something
> that I have yet to see in an IBM publication.

Is that really a serious limitation? (512 cores doing a parallel FFT) I know I'm not familiar with the problem space, but that already seems way out at the limits of precision-limited usefulness. How large (points) is the FFT that saturates a 512-processor group on a BG? Are there *any* other super computers that allow you to compute larger FFTs at an appreciable fraction of their peak floating point throughput?[*]

Clearly it is a limitation, otherwise you wouldn't be complaining about it. Still seems that it might be useful to be able to do 128 of those mega-FFTs at the same time, if doing lots of them is what you cared about.

I reckon I'd be more worried about the precision of the results than the speed of computing them, though.

[*] Are any of the popular supercomputer benchmarks capacity-shaped in this way, rather than rate-for-fixed-N-shaped? How many problems are capacity limited rather than rate limited?

Cheers,

--
Andrew

From: Ken Hagan on 8 Mar 2010 04:52

On Mon, 08 Mar 2010 04:27:44 -0000, Robert Myers <rbmyersusa (a)gmail.com> wrote:
> Q: Will our thermonuclear weapons work, despite no actual testing for
> decades?

I thought that was rather the point (*). Reasonable people can understand that total nuclear disarmament is not going to happen because no side wishes to be seen to have disarmed first. By the cunning use of a test ban treaty, we will arrive at the end of the present century with all sides completely confident that no-one else has working weapons, but no-one ever had to pass through a period where one side knew that they'd given up theirs but another side still had some.

(* Well, perhaps this isn't the precise mechanism everyone had in mind, but the test ban treaties presumably *are* intended to let all sides just relax a little bit, in the hope that the world can stay intact long enough to become a safer place. And if blowing up imaginary bombs keeps the hawks out of mischief then it is probably money well spent.)

From: Robert Myers on 8 Mar 2010 15:15

On Mar 7, 11:15 pm, Andrew Reilly <areilly... (a)bigpond.net.au> wrote:
> On Sun, 07 Mar 2010 17:51:55 -0800, Robert Myers wrote:
> > I've explained my objections succinctly. Of 64000 or more processors,
> > it can use only 512 effectively in doing an FFT. That the bisection
> > bandwidth is naturally measured in *milli*bytes per flop is something
> > that I have yet to see in an IBM publication.
>
> Is that really a serious limitation? (512 cores doing a parallel FFT) I
> know I'm not familiar with the problem space, but that already seems way
> out at the limits of precision-limited usefulness. How large (points) is
> the FFT that saturates a 512-processor group on a BG? Are there *any*
> other super computers that allow you to compute larger FFTs at an
> appreciable fraction of their peak floating point throughput?[*]
>
> Clearly it is a limitation, otherwise you wouldn't be complaining about
> it. Still seems that it might be useful to be able to be doing 128 of
> those mega-FFTs at the same time, if doing lots of them is what you cared
> about.
>
> I reckon I'd be more worried about the precision of the results than the
> speed of computing them, though.
>

These would typically be volumetric FFTs, and I believe that's the way that IBM got 512 processors working together: doing volumetric FFTs. With 1 gigabyte of memory per node, that's a cube with 4000 points on each edge, using 64-bit words. For making pretty pictures, that's plenty. For understanding the physics of turbulent flows at real-world Reynolds numbers, it isn't very interesting.

I don't believe that precision is an issue. The foundational mathematics of the Navier-Stokes equations are unknown, even things like smoothness properties. Using spectral methods from macroscopic scales all the way down to the dissipation scale is the only way I know to do fluid mechanics without tangling up unknown and unknowable numerical issues with the physics and mathematics you'd like to explore.

It's fine if not many people are interested in that foundational problem, but if computers continue to be driven by the Linpack mentality, no one will be able to make progress on it without some fundamental mathematical breakthrough. That is far from being the only problem governed by strongly nonlinear equations that require a large range of scales (and thus, many grid points), but it is the one with which I am the most familiar.

You can't routinely run the huge calculations you'd like to be able to run, but you can at least test the accuracy of the turbulence models that are used in practice.

Robert.

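The "cube with 4000 points on each edge" figure follows from simple memory arithmetic, and a short sketch makes the assumptions visible. This is only a back-of-envelope check, not anything from IBM or from the post's author: the function name `max_cube_edge` is made up for illustration, a gigabyte is taken as 10^9 bytes, and the whole aggregate memory of the 512 nodes is assumed to be available for a single array with no working space.

```python
def max_cube_edge(nodes: int, bytes_per_node: int, bytes_per_point: int) -> int:
    """Largest N such that an N x N x N grid fits in the nodes' combined memory.

    Hypothetical helper: assumes every byte of every node holds grid data.
    """
    total_points = (nodes * bytes_per_node) // bytes_per_point
    # Start from a floating-point cube root, then correct with exact
    # integer arithmetic so rounding error cannot give an off-by-one.
    edge = round(total_points ** (1.0 / 3.0))
    while edge ** 3 > total_points:
        edge -= 1
    while (edge + 1) ** 3 <= total_points:
        edge += 1
    return edge


# 512 nodes, 1 GB (10**9 bytes) each, one 64-bit (8-byte) word per point.
edge_real = max_cube_edge(512, 10**9, 8)

# Same memory, but 16 bytes per point (a complex value made of two 64-bit words).
edge_complex = max_cube_edge(512, 10**9, 16)

print(edge_real, edge_complex)
```

With one 64-bit word per grid point the edge comes out at 4000, matching the post. If each point is instead a complex pair of 64-bit words, the same memory holds a noticeably smaller cube, so the quoted figure is best read as an upper bound.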
From: Larry on 8 Mar 2010 15:16

On Mar 7, 11:15 pm, Andrew Reilly <areilly... (a)bigpond.net.au> wrote:
> On Sun, 07 Mar 2010 17:51:55 -0800, Robert Myers wrote:
> > I've explained my objections succinctly. Of 64000 or more processors,
> > it can use only 512 effectively in doing an FFT. That the bisection
> > bandwidth is naturally measured in *milli*bytes per flop is something
> > that I have yet to see in an IBM publication.
>
> Is that really a serious limitation? (512 cores doing a parallel FFT) I
> know I'm not familiar with the problem space, but that already seems way
> out at the limits of precision-limited usefulness. How large (points) is
> the FFT that saturates a 512-processor group on a BG? Are there *any*
> other super computers that allow you to compute larger FFTs at an
> appreciable fraction of their peak floating point throughput?[*]
>
> Clearly it is a limitation, otherwise you wouldn't be complaining about
> it. Still seems that it might be useful to be able to be doing 128 of
> those mega-FFTs at the same time, if doing lots of them is what you cared
> about.
>
> I reckon I'd be more worried about the precision of the results than the
> speed of computing them, though.
>
> [*] Are any of the popular supercomputer benchmarks capacity-shaped in
> this way, rather than rate-for-fixed-N-shaped? How many problems are
> capacity limited rather than rate limited?
>
> Cheers,
>
> --
> Andrew

Actually I think R. Myers' facts are wrong here.

I downloaded the HPCC Challenge Results data from the UTK website, and added a column computing the ratio of global FFT performance to global HPL performance. Then I filtered away all systems not achieving at least 100 GFlops FFT performance.

For systems achieving over a teraflop of FFT performance, BG/L with 32K cores is beaten *only* by the SX-9 in this figure of merit. For systems achieving over 100 GF, it is beaten by the SX-9, by the best Nehalem/Infiniband systems, by the Cray XT3, and by the largest SiCortex machine.

BG/L is hard to program, and weird in many ways, but inability to do FFT is a bum rap.

Incidentally, only the SX-9 gets over 10% of HPL this way; the rest of the pack is in the 3-6% area. Global FFT is hard.

There is a very large market for FFT cycles, particularly in doing 2D and 3D FFTs for seismic processing. The data sets for these calculations get so large that there is no way to do substantial numbers of them in parallel, because the machines do not have enough memory to hold, say, a terabyte of data per run. If you try to do it any other way, you've succeeded in turning a cluster communications problem into a (worse) I/O problem.

-Larry
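
The figure of merit described above (global FFT performance as a fraction of global HPL performance, after dropping systems below 100 GFlops of FFT) can be sketched in a few lines. This is a rough illustration and not the spreadsheet used in the post: the `Result` record, its field names, and the function `rank_by_fft_to_hpl` are hypothetical, and the sample rows are made-up placeholders rather than real HPCC results.

```python
from typing import List, NamedTuple, Tuple


class Result(NamedTuple):
    """One benchmark entry (hypothetical layout, not the HPCC file format)."""
    system: str
    hpl_gflops: float  # global HPL performance, GFlops
    fft_gflops: float  # global FFT performance, GFlops


def rank_by_fft_to_hpl(results: List[Result],
                       min_fft_gflops: float = 100.0) -> List[Tuple[str, float]]:
    """Drop systems below the FFT threshold, then sort by FFT/HPL ratio, best first."""
    kept = [r for r in results if r.fft_gflops >= min_fft_gflops]
    ranked = sorted(kept, key=lambda r: r.fft_gflops / r.hpl_gflops, reverse=True)
    return [(r.system, r.fft_gflops / r.hpl_gflops) for r in ranked]


# Placeholder rows only: these numbers are invented to exercise the code.
sample = [
    Result("system_a", hpl_gflops=10_000.0, fft_gflops=500.0),  # ratio 0.05
    Result("system_b", hpl_gflops=2_000.0, fft_gflops=240.0),   # ratio 0.12
    Result("system_c", hpl_gflops=1_000.0, fft_gflops=50.0),    # below threshold
]

ranking = rank_by_fft_to_hpl(sample)
for system, ratio in ranking:
    print(f"{system}: FFT/HPL = {ratio:.1%}")
```

Raising `min_fft_gflops` to 1000.0 would reproduce the stricter "over a teraflop of FFT" cut mentioned in the post; the ratio itself is unchanged, only the set of systems being compared.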