From: Robert Myers on
On Mar 7, 7:32 pm, Del Cecchi <delcecchinospamoftheno...(a)gmail.com>
wrote:
> Andy "Krazy" Glew wrote:
> > Del Cecchi wrote:
> >> For a few months a year, Minnesota isn't that much different from
> >> Antarctica.  :-)
>
> > Stop teasing me like that, Del.
>
> > I'd love to live in Minnesota.  I'm Canadian, from Montreal, and my wife
> > is from Wisconsin.  She goes dogsledding in Minnesota every February.
>
> > But AFAIK there are no jobs for somebody like me in Minnesota.
>
> > If you are a computer architect, it's Intel in Oregon, Silicon Valley,
> > Austin.  Where else?
>
> Perhaps IBM in Rochester MN or maybe even Mayo Clinic, Rochester.  The
> clinic does a lot of special stuff for medical equipment and Dr Barry
> Gilbert had a group that did high speed stuff for Darpa.
>
> See http://mayoresearch.mayo.edu/staff/gilbert_bk.cfm
>
> IBM Rochester does the (somewhat reviled) Blue Gene and is part of the
> team doing the Power Processors.

I am His Highness' dog at Kew;
Pray tell me, sir, whose dog are you?

Robert.
From: Andrew Reilly on
On Sun, 07 Mar 2010 17:51:55 -0800, Robert Myers wrote:

> I've explained my objections succinctly. Of 64000 or more processors,
> it can use only 512 effectively in doing an FFT. That the bisection
> bandwidth is naturally measured in *milli*bytes per flop is something
> that I have yet to see in an IBM publication.

Is that really a serious limitation? (512 cores doing a parallel FFT) I
know I'm not familiar with the problem space, but that already seems way
out at the limits of precision-limited usefulness. How large (points) is
the FFT that saturates a 512-processor group on a BG? Are there *any*
other super computers that allow you to compute larger FFTs at an
appreciable fraction of their peak floating point throughput?[*]

Clearly it is a limitation, otherwise you wouldn't be complaining about
it. Still seems that it might be useful to be able to be doing 128 of
those mega-FFTs at the same time, if doing lots of them is what you cared
about.

I reckon I'd be more worried about the precision of the results than the
speed of computing them, though.

[*] Are any of the popular supercomputer benchmarks capacity-shaped in
this way, rather than rate-for-fixed-N-shaped? How many problems are
capacity limited rather than rate limited?

Cheers,

--
Andrew
From: Ken Hagan on
On Mon, 08 Mar 2010 04:27:44 -0000, Robert Myers <rbmyersusa(a)gmail.com>
wrote:

> Q: Will our thermonuclear weapons work, despite no actual testing for
> decades?

I thought that was rather the point (*). Reasonable people can understand
that total nuclear disarmament is not going to happen, because no side
wishes to be seen to have disarmed first. By the cunning use of a test ban
treaty, we will arrive at the end of the present century with all sides
completely confident that no-one else has working weapons, without anyone
ever having had to pass through a period where one side knew it had given
up its weapons while another side still had some.

(* Well, perhaps this isn't the precise mechanism everyone had in mind,
but the test ban treaties presumably *are* intended to let all sides just
relax a little bit, in the hope that the world can stay intact long enough
to become a safer place. And if blowing up imaginary bombs keeps the hawks
out of mischief then it is probably money well spent.)
From: Robert Myers on
On Mar 7, 11:15 pm, Andrew Reilly <areilly...(a)bigpond.net.au> wrote:
> On Sun, 07 Mar 2010 17:51:55 -0800, Robert Myers wrote:
> > I've explained my objections succinctly.  Of 64000 or more processors,
> > it can use only 512 effectively in doing an FFT.  That the bisection
> > bandwidth is naturally measured in *milli*bytes per flop is something
> > that I have yet to see in an IBM publication.
>
> Is that really a serious limitation?  (512 cores doing a parallel FFT)  I
> know I'm not familiar with the problem space, but that already seems way
> out at the limits of precision-limited usefulness.  How large (points) is
> the FFT that saturates a 512-processor group on a BG?  Are there *any*
> other super computers that allow you to compute larger FFTs at an
> appreciable fraction of their peak floating point throughput?[*]
>
> Clearly it is a limitation, otherwise you wouldn't be complaining about
> it.  Still seems that it might be useful to be able to be doing 128 of
> those mega-FFTs at the same time, if doing lots of them is what you cared
> about.
>
> I reckon I'd be more worried about the precision of the results than the
> speed of computing them, though.
>
These would typically be volumetric FFTs, and I believe that's the
way that IBM got 512 processors working together--doing volumetric
FFTs. With 1 gigabyte of memory per node, that's a cube with roughly
4000 points on each edge, using 64-bit words. For making pretty
pictures, that's plenty. For understanding the physics of turbulent
flows at real-world Reynolds numbers, it isn't very interesting. I
don't believe that precision is an issue.
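
As a sanity check on that arithmetic, a small Python sketch (the 512
nodes, 1 GiB per node, and 8 bytes per grid point are just the
assumptions stated above):

nodes = 512
bytes_per_node = 1 << 30             # 1 GiB per node (assumed)
bytes_per_point = 8                  # one 64-bit word per grid point
points = nodes * bytes_per_node // bytes_per_point
edge = round(points ** (1.0 / 3.0))  # edge of a cube holding that many points
print(edge)                          # ~4096 points per edge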

The foundational mathematics of the Navier-Stokes equations is
unknown, even for things like smoothness properties. Using spectral
methods from macroscopic scales all the way down to the dissipation
scale is the only way I know to do fluid mechanics without tangling up
unknown and unknowable numerical issues with the physics and
mathematics you'd like to explore.
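
For what it's worth, here is a minimal sketch of the spectral-derivative
building block that such methods rest on, using numpy's FFT on a toy
periodic grid (the grid size N is purely illustrative, nothing like a
production run):

import numpy as np

N = 64                                       # toy grid; real runs are far larger
L = 2 * np.pi
x = np.linspace(0, L, N, endpoint=False)
X, Y, Z = np.meshgrid(x, x, x, indexing='ij')
u = np.sin(X) * np.cos(Y) * np.cos(Z)        # sample periodic velocity component

k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)   # angular wavenumbers
KX = k[:, None, None]                        # broadcast along the x axis

u_hat = np.fft.fftn(u)
dudx = np.real(np.fft.ifftn(1j * KX * u_hat))  # du/dx via the 3-D FFT

# compare against the analytic derivative cos(x)cos(y)cos(z)
print(np.max(np.abs(dudx - np.cos(X) * np.cos(Y) * np.cos(Z))))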

It's fine if not many people are interested in that foundational
problem, but if computers continue to be driven by the linpack
mentality, no one will be able to make progress on it without some
fundamental mathematical breakthrough.

That is far from being the only problem governed by strongly nonlinear
equations that require a large range of scales (and thus, many grid
points), but it is the one with which I am the most familiar.

You can't routinely run the huge calculations you'd like to be able to
run, but you can at least test the accuracy of the turbulence models
that are used in practice.

Robert.
From: Larry on
On Mar 7, 11:15 pm, Andrew Reilly <areilly...(a)bigpond.net.au> wrote:
> On Sun, 07 Mar 2010 17:51:55 -0800, Robert Myers wrote:
> > I've explained my objections succinctly.  Of 64000 or more processors,
> > it can use only 512 effectively in doing an FFT.  That the bisection
> > bandwidth is naturally measured in *milli*bytes per flop is something
> > that I have yet to see in an IBM publication.
>
> Is that really a serious limitation?  (512 cores doing a parallel FFT)  I
> know I'm not familiar with the problem space, but that already seems way
> out at the limits of precision-limited usefulness.  How large (points) is
> the FFT that saturates a 512-processor group on a BG?  Are there *any*
> other super computers that allow you to compute larger FFTs at an
> appreciable fraction of their peak floating point throughput?[*]
>
> Clearly it is a limitation, otherwise you wouldn't be complaining about
> it.  Still seems that it might be useful to be able to be doing 128 of
> those mega-FFTs at the same time, if doing lots of them is what you cared
> about.
>
> I reckon I'd be more worried about the precision of the results than the
> speed of computing them, though.
>
> [*] Are any of the popular supercomputer benchmarks capacity-shaped in
> this way, rather than rate-for-fixed-N-shaped?  How many problems are
> capacity limited rather than rate limited?
>
> Cheers,
>
> --
> Andrew

Actually I think R. Myers' facts are wrong here. I downloaded the
HPC Challenge results data from the UTK website and added a column
computing the ratio of global FFT performance to global HPL
performance. Then I filtered out all systems not achieving at least
100 GFlops of FFT performance.
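
(For anyone who wants to reproduce that, something along these lines
should do it -- note that the file name and the 'G-FFT'/'G-HPL' column
names are my guesses at what the export calls them, and the 0.1
threshold assumes those columns are reported in Tflop/s:)

import pandas as pd

# sketch only: rename the file and columns to match the actual UTK export
df = pd.read_csv('hpcc_results.csv')
df['fft_to_hpl'] = df['G-FFT'] / df['G-HPL']
big = df[df['G-FFT'] >= 0.1]       # keep systems with >= 100 GFlop/s global FFT
print(big.sort_values('fft_to_hpl', ascending=False).head(20))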

For systems achieving over a teraflop of FFT performance, BG/L with
32K cores is beaten *only* by the SX-9 in this figure of merit. For
systems achieving over 100 GF, it is beaten by the SX-9, by the best
Nehalem/Infiniband systems, by the Cray XT3, and by the largest
SiCortex machine.

BG/L is hard to program, and weird in many ways, but inability to do
FFTs is a bum rap.

Incidentally, only the SX-9 gets over 10% of HPL this way; the rest
of the pack is in the 3-6% range. Global FFT is hard.

There is a very large market for FFT cycles, particularly for 2D and
3D FFTs in seismic processing. The data sets for these calculations
get so large that there is no way to run substantial numbers of them
in parallel: the machines do not have enough memory to hold, say, a
terabyte of data per run. If you try to do it any other way, you've
succeeded in turning a cluster communications problem into a (worse)
I/O problem.

-Larry