From: Stephen Fuld on
On the Research channel, which I receive through Dish Network, they show
some of the computer science colloquiums at the University of
Washington. I recently watched a lecture by professor Pat Hanrahan of
Stanford. The lecture is titled "Why are Graphics Systems so Fast?". It
ties together some of the topics that have occurred in different recent
threads in this group, including the highly parallel SIMT stuff and the
need for appropriate domain-specific languages to get the most out of
the environment.

You can watch the presentation at

http://www.researchchannel.org/prog/displayevent.aspx?rID=30684&fID=345

There were several things that I thought were interesting and perhaps
even promising.

First is that the Folding(a)Home client has been rewritten to use a
graphics card with a great speedup. The thing I thought was
significant about this is that protein folding is a more traditional HPC
application than the more graphics oriented things like Photoshop
effects that seem to be dominating the GPGPU scene.

This also leaves open the possibility of a lot of architecture work in
developing these highly parallel systems in a way that they are
effective for graphics (so that they have substantial volumes) but are
better optimized for more traditional HPC applications.

In a discussion about the language issue, he mentions that this is
really the subject of a different presentation. So I looked at his web
site and found the following presentation

http://www.graphics.stanford.edu/~hanrahan/talks/dsl/dsl.pdf

This talks about using a new system that they are working on that
supports various levels of heterogeneous parallelism in a domain
specific way to support what seems to me to be straight-up supercomputer
applications such as turbulence modeling.

I am far from an expert in this area, but it appears that people are
working hard on exactly what the people here have been talking about.

Comments welcome.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)
From: nmm1 on
In article <hop1d4$ffo$1(a)news.eternal-september.org>,
Stephen Fuld <SFuld(a)Alumni.cmu.edu.invalid> wrote:
>
>In a discussion about the language issue, he mentions that this is
>really the subject of a different presentation. So I looked at his web
>site and found the following presentation
>
>http://www.graphics.stanford.edu/~hanrahan/talks/dsl/dsl.pdf
>
>This talks about using a new system that they are working on that
>supports various levels of heterogeneous parallelism in a domain
>specific way to support what seems to me to be straight-up supercomputer
>applications such as turbulence modeling.
>
>I am far from an expert in this area, but it appears that people are
>working hard on exactly what the people here have been talking about.
>
>Comments welcome.

Deja moo.

That's a little unfair, but only a little. One of the major language
revolutions of the 1960s was the move away from platform-specific
languages to application-domain-specific languages, generic across
architectures. Since then, there have been repeated attempts to go
back to the 1950s (i.e. platform-domain-specific languages), most
have sunk without trace, and none have lasted very long. To a great
extent, the ONLY platform-domain-specific languages that have
succeeded are those for vector systems (R.I.P.), message-passing
systems, and (to some extent) OpenMP.

When this situation arises, I always ask the following questions:
1) Exactly what has changed since the previous times?
2) Exactly why did the previous systems succeed or fail?
3) Does (1) mean that (2) no longer holds?

Given that the causes of failure in the past have NOT typically been
the mismatch of a language to the platform, but the mismatch between
the users' requirements and abilities and the language, a different
approach is needed.

Yes, some of the things proposed in that talk work, but they are
already being done and need no language changes.


Regards,
Nick Maclaren.
From: Terje Mathisen on
Stephen Fuld wrote:
> On the Research channel, which I receive through Dish Network, they show
> some of the computer science colloquiums at the University of
> Washington. I recently watched a lecture by professor Pat Hanrahan of
> Stanford. The lecture is titled "Why are Graphics Systems so Fast?". It
> ties together some of the topics that have occurred in different recent
> threads in this group, including the highly parallel SIMT stuff and the
> need for appropriate domain specific languages to get the most out of
> the environment
>
> You can watch the presentation at
>
> http://www.researchchannel.org/prog/displayevent.aspx?rID=30684&fID=345
>
> There were several things that I thought were interesting and perhaps
> even promising.
>
> First is that the Folding(a)Home client has been rewritten to use a
> graphics card with a great speedup. The thing I thought was
> significant about this is that protein folding is a more traditional HPC
> application than the more graphics oriented things like Photoshop
> effects that seem to be dominating the GPGPU scene.

A couple of months ago I posted a link to a paper by some seismic
processing people; they had ported their application to NVidia and
gotten _very_ significant speedups.

>
> This also leaves open the possibility of a lot of architecture work in
> developing these highly parallel systems in a way that they are
> effective for graphics (so that they have substantial volumes) but are
> better optimized for more traditional HPC applications.

That sounds exactly like what Intel have stated about the reason for
developing Larrabee, except they've given up on the first-generation
graphics product while continuing with the HPC target.
>
> In a discussion about the language issue, he mentions that this is
> really the subject of a different presentation. So I looked at his web
> site and found the following presentation
>
> http://www.graphics.stanford.edu/~hanrahan/talks/dsl/dsl.pdf
>
> This talks about using a new system that they are working on that
> supports various levels of heterogeneous parallelism in a domain
> specific way to support what seems to me to be straight-up supercomputer
> applications such as turbulence modeling.
>
> I am far from an expert in this area, but it appears that people are
> working hard on exactly what the people here have been talking about.

The seismic paper shows how they started with a straightforward port
and got pretty much no speedup at all; then they went on to do more and
more platform-specific optimizations, ending up with something that was
40X (afair) faster, but of course totally non-portable.

I.e. the only real key to the speedups was to grok the mapping of the
problem onto the available hardware.
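To give a flavour of what such platform-specific rework can look like,
here is a stylised sketch of my own (not code from the seismic paper):
converting an array-of-structures layout, natural in the original CPU
code, into a structure-of-arrays layout so that consecutive GPU threads
would read consecutive memory addresses (coalesced access). The field
names are hypothetical, and plain Python can only show the
transformation itself; the numerical result is unchanged.

```python
# (trace_sample, gain) pairs - hypothetical field names for illustration.
samples = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]

# Array-of-structures: element i interleaves both fields, so "thread" i
# strides through memory when reading one field across elements.
aos_result = [t * g for (t, g) in samples]

# Structure-of-arrays: each field is contiguous, so consecutive
# "threads" read consecutive addresses.
traces = [t for (t, _) in samples]
gains = [g for (_, g) in samples]
soa_result = [traces[i] * gains[i] for i in range(len(traces))]

print(aos_result == soa_result)  # True: same answer, different mapping
```

The point of the exercise is exactly the one above: nothing about the
arithmetic changes, only the mapping of the data onto the hardware.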


Terje
From: Stephen Fuld on
On 3/29/2010 2:26 AM, nmm1(a)cam.ac.uk wrote:
> In article<hop1d4$ffo$1(a)news.eternal-september.org>,
> Stephen Fuld<SFuld(a)Alumni.cmu.edu.invalid> wrote:
>>
>> In a discussion about the language issue, he mentions that this is
>> really the subject of a different presentation. So I looked at his web
>> site and found the following presentation
>>
>> http://www.graphics.stanford.edu/~hanrahan/talks/dsl/dsl.pdf
>>
>> This talks about using a new system that they are working on that
>> supports various levels of heterogeneous parallelism in a domain
>> specific way to support what seems to me to be straight-up supercomputer
>> applications such as turbulence modeling.
>>
>> I am far from an expert in this area, but it appears that people are
>> working hard on exactly what the people here have been talking about.
>>
>> Comments welcome.
>
> Deja moo.
>
> That's a little unfair, but only a little. One of the major language
> revolutions of the 1960s was the move away from platform-specific
> languages to application-domain-specific languages, generic across
> architectures. Since then, there have been repeated attempts to go
> back to the 1950s (i.e. platform-domain-specific languages), most
> have sunk without trace, and none have lasted very long. To a great
> extent, the ONLY platform-domain-specific languages that have
> succeeded are those for vector systems (R.I.P.), message-passing
> systems, and (to some extent) OpenMP.

Perhaps I am misunderstanding something, but if you look at page 3, they
seem to be targeting a whole range of different platform types:
clusters, multi-core chips, and GPU-type things, as well as combinations
of them. One of the things I liked about their work is that it seems
not to be platform-specific.


> When this situation arises, I always ask the following questions:
> 1) Exactly what has changed since the previous times?

The ready, low-cost availability of chips that are very highly parallel
and high-speed, but limited in various arcane ways - i.e. the GPGPU
movement.

> 2) Exactly why did the previous systems succeed or fail?

As you said, vector systems succeeded well for their time. In a sense,
the GPGPU is sort of like an FPS co-processor, and to the extent that the
instructions to use it are integrated into the CPU, sort of like a
vector machine.


> 3) Does (1) mean that (2) no longer holds?

Well, of course, that is TBD. :-)


> Given that the causes of failure in the past have NOT typically been
> the mismatch of a language to the platform, but the mismatch between
> the users' requirements and abilities and the language, a different
> approach is needed.

Again, that is one thing I thought seemed good about Liszt. It seems
to have primitives matched to what many HPC programs need, e.g. meshes,
vectors, etc., and some automatic tools to select good methods.
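To make the idea concrete - a hypothetical sketch of my own in Python,
not actual Liszt code or its API - the program is expressed over mesh
topology (cells and their neighbours), and the system is then free to
choose data layout and parallelisation strategy for the target platform:

```python
# Hypothetical mesh-primitive sketch (my own, not Liszt): the kernel is
# written against the mesh topology, not against any memory layout.

def jacobi_step(values, neighbours):
    """One relaxation sweep: each cell takes the mean of its
    neighbours' previous values. neighbours[i] lists the cells
    adjacent to cell i, so the topology carries the structure."""
    return [
        sum(values[j] for j in neighbours[i]) / len(neighbours[i])
        for i in range(len(values))
    ]

# A 4-cell 1-D mesh; boundary cells list themselves as a neighbour.
neighbours = [[0, 1], [0, 2], [1, 3], [2, 3]]
values = [0.0, 0.0, 0.0, 100.0]
values = jacobi_step(values, neighbours)
print(values)  # [0.0, 0.0, 50.0, 50.0]
```

Nothing here says whether the cells live on one node, many nodes, or a
GPU - which is presumably what a compiler for such a language exploits.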



--
- Stephen Fuld
(e-mail address disguised to prevent spam)
From: nmm1 on
In article <hoqia6$h3b$1(a)news.eternal-september.org>,
Stephen Fuld <SFuld(a)Alumni.cmu.edu.invalid> wrote:
>
>Perhaps I am misunderstanding something, but if you look at page 3, they
>seem to be targeting a whole range of different platform types:
>clusters, multi-core chips, and GPU-type things, as well as combinations
>of them. One of the things I liked about their work is that it seems
>not to be platform-specific.

Perhaps I was being unfair. However, I looked at their examples more
than their blurb, and my conclusions weren't based on that.

>> 1) Exactly what has changed since the previous times?
>
>The ready, low-cost availability of chips that are very highly parallel
>and high-speed, but limited in various arcane ways - i.e. the GPGPU
>movement.

Yes. But they have been available within the cost of a researcher's
discretionary budget before, and a large number of academic staff
and students failed to get far with them.

>> 2) Exactly why did the previous systems succeed or fail?
>
>As you said, vector systems succeeded well for their time. In a sense,
>the GPGPU is sort of like an FPS co-processor, and to the extent that the
>instructions to use it are integrated into the CPU, sort of like a
>vector machine.

Yes. I use the FPS analogy, as well. That flew, for a bit, until
Intel got their act together on floating-point.

>Again, that is one thing I thought seemed good about Liszt. It seems
>to have primitives matched to what many HPC programs need, e.g. meshes,
>vectors, etc., and some automatic tools to select good methods.

I will try to take another look, but I was singularly unimpressed by
page 12. We KNOW those problems are intractable, and the best
researchers in the world have failed to make any headway on them over
the past 40 years! The point is that you need to embed the
architectural assumptions into the program design for such a compiler
to have a hope in hell - yes, it can optimise for variations on a
theme, but no more than that.

Indeed, the very concept of owning and ghost cells is architecture-
specific!
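For anyone who hasn't met the terms, here is a minimal sketch of my own
(in Python, not from the talk) of owned versus ghost cells: each
partition owns a slab of the array plus ghost copies of its neighbour's
boundary cells, and the exchange step is precisely where the
distributed-memory architecture gets baked into the program design.

```python
# Owned vs. ghost cells on a 1-D array split between two partitions.
# The halo exchange below is the architecture-specific part: it exists
# only because each partition may not touch the other's memory directly.

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Partition 0 owns cells 0-2, partition 1 owns cells 3-5; each adds one
# ghost slot to hold a copy of the neighbour's boundary value.
left = data[0:3] + [0.0]   # owned cells + right-edge ghost
right = [0.0] + data[3:6]  # left-edge ghost + owned cells

# Halo exchange: copy each owner's boundary cell into the other's ghost.
left[3] = right[1]   # 4.0, partition 1's first owned cell
right[0] = left[2]   # 3.0, partition 0's last owned cell

# Each partition can now run a 3-point average over its owned interior
# without any further communication.
smoothed_left = [(left[i - 1] + left[i] + left[i + 1]) / 3 for i in (1, 2)]
print(smoothed_left)  # [2.0, 3.0]
```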


Regards,
Nick Maclaren.