Larrabee delayed: anyone know what's happening? [Computer Architecture]

Prev: PEEEEEEP
Next: Texture units as a general function

From: nmm1 on 3 Jan 2010 15:32

In article <N82dnaYlXssoZt3WnZ2dnUVZ_vJi4p2d(a)bestweb.net>,
Mayan Moudgill <mayan(a)bestweb.net> wrote:
>
>> In general, it doesn't, but it more-or-less does for the one you are
>> doing - which is NOT the way to do multi-dimensional FFTs! It is
>> almost always much faster to transpose, so you are doing vector
>> operations in the FFT at each stage.
>
>The initial bit-reversal is trivial to parallelize (at least for 1D
>FFTs). It will (at most) involve a single copy of the array over the
>entire network.

Eh? If you mean the actual bit-reversal of the size, then it's
irrelevant, and doesn't need access to the data!

The issue isn't a single distribution of the FFT (unless the FFT is
too large for a single node, as is common with multi-dimensional FFTs).
It's the access pattern. The native access pattern of FFTs is close
to pessimal on modern cached (and even paged) machines.

Regards,
Nick Maclaren.

From: Mayan Moudgill on 3 Jan 2010 15:41

nmm1(a)cam.ac.uk wrote:
> In article <N82dnaYlXssoZt3WnZ2dnUVZ_vJi4p2d(a)bestweb.net>,
> Mayan Moudgill <mayan(a)bestweb.net> wrote:
>
>>>In general, it doesn't, but it more-or-less does for the one you are
>>>doing - which is NOT the way to do multi-dimensional FFTs! It is
>>>almost always much faster to transpose, so you are doing vector
>>>operations in the FFT at each stage.
>>
>>The initial bit-reversal is trivial to parallelize (at least for 1D
>>FFTs). It will (at most) involve a single copy of the array over the
>>entire network.
>
>
> Eh? If you mean the actual bit-reversal of the size, then it's
> irrelevant, and doesn't need access to the data!
>
> The issue isn't a single distribution of the FFT (unless the FFT is
> too large for a single node, as is common with multi-dimensional FFTs).
> It's the access pattern. The native access pattern of FFTs is close
> to pessimal on modern cached (and even paged) machines.
>

I think we may be saying the same thing. The initial phase of an FFT
consists of a transpose (based on the so-called bit-reversal transpose).
Then, the rest of the FFT consists of lgN steps of the form:

for( j = 0; j < N/(2*2**S); j++ ) {
for( i = 0; i < 2**S; i++ ) {
x[i], x[i+L] = fft_bfly(x[i], x[i+L])
}
}

where S is the step number, which is perfectly vectorizable.

The initial transpose:
x[ bitreverse(i) ] = x[i]
can also be made vectorizable, basically by tiling.

From: Robert Myers on 3 Jan 2010 16:03

On Jan 3, 2:51 pm, Mayan Moudgill <ma...(a)bestweb.net> wrote:

>
> Anyway, Nick, you're right that things like FFTs are network bandwidth
> limited; however, it is still possible to get fairly good speedups.
>
> Of course, I haven't analyzed the other approaches; e.g. do the last
> lg2M stages on one processor (so that there is only one communication
> phase) or other points in the parallelism space.

Publish! But try it out on Blue Gene first.

Robert.

From: =?ISO-8859-1?Q?Niels_J=F8rgen_Kruse?= on 3 Jan 2010 16:14

Robert Myers <rbmyersusa(a)gmail.com> wrote:

> On Jan 3, 12:34 pm, "Del Cecchi" <delcec...(a)gmail.com> wrote:
>
> > You keep this up and I will have to dust off thunderbird for comp.arch
> > posting since it seems to get the quoting/attribution correct even for
> > robert's posts from google, unlike oe. This will eliminate the
> > concern with people putting robert's words in my mouth as above.
>
> I believe that the problem is that Google groups posts in html, rather
> than in plain text. There is a plain text option for gmail, but not
> for Google groups.

Your posts are not in HTML. I can't see anything that should trip up a
newsreader.

--
Mvh./Regards, Niels J�rgen Kruse, Vanl�se, Denmark

From: Robert Myers on 3 Jan 2010 16:26

On Jan 3, 3:16 pm, Mayan Moudgill <ma...(a)bestweb.net> wrote:

> Robert Myers wrote:
> > I assume that most who buy installations like Blue Gene would have RAS
> > requirements that would be hard or impossible to meet with a Beowulf
> > cluster. In the end, it's probably RAS that rules.
>
> What kind of recovery are you looking for? Against node failures?
>
> I'd suggest that distributed checkpointing with restart on failures
> could be an adequate model for small-to-mid-size clusters.

It's the entire management problem: backup, recovery from failure,
detecting and isolating failure, minimizing downtime, and minimizing
time and focus required for maintenance.

There are, indeed, plausible and relatively off-the-shelf solutions
for small-to-mid-size clusters, but not for a cluster the size of the
Blue Gene installations I know about.

Robert.

First | Prev | Next | Last
Pages: 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
Prev: PEEEEEEP
Next: Texture units as a general function