From: nmm1 on
In article <hk6iv7$jlr$1(a)news.eternal-september.org>,
nedbrek <nedbrek(a)yahoo.com> wrote:
>"Terje Mathisen" <"terje.mathisen at tmsw.no"> wrote in message
>news:20bi37-lic2.ln1(a)ntp.tmsw.no...
>>
>> Pure streaming architectures, even in the form of GPGPU, are dead.
>
>Terje is right. Even GPUs are moving to short vectors (for ray tracing).
>Long vectors are just so specialized that there is no market for them.

That's not true. There is a market - and it's still a profitable
one. Whether Terje is right in an absolute sense, or only for
'general purpose' designs, is less clear.

>The super guys are riding the coattails of products oriented towards
>consumers. That's why clusters are so important. They're the only way to
>get more power. Long vectors are "embarrassingly parallel", so you can use
>just about any method to exploit them.

That's not true, either. A lot of vector codes are very hard to
convert to hierarchical storage models (including both caching
designs and distributed memory). Those are the sort of codes that
tended to rely on particular vector implementations (e.g. efficient
scatter/gather).
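A concrete (if invented) instance of what I mean: a gather loop like
the one below runs at full memory bandwidth on a vector machine with
hardware gather, but on a cache hierarchy the indirect indices look
essentially random, so there is nothing for the cache to reuse.
(A sketch only; the array names are mine.)

    /* y[i] = x[idx[i]]: the classic gather.  A vector machine
       pipelines the indirect loads at memory speed; a cache-based
       design mostly just misses. */
    void gather(double *y, const double *x, const int *idx, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] = x[idx[i]];
    }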



Regards,
Nick Maclaren.
From: Robert Myers on
On Feb 1, 2:04 am, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
> Robert Myers wrote:
> > SSE is better than nothing, but my long-term bet is on streaming
> > architectures, which are a generalization of Cray-style vector
> > parallelism.
>
> > NO MORE NEW LANGUAGES.  Asm only, if necessary, but figure out how to
> > expand the space that can be handled with a streaming paradigm.
> > Fortunately, GPGPU will save us from all of the expensive mistakes the
> > US has been making ever since Seymour left the scene.  That's my hope,
> > anyway.
>
> Pure streaming architectures, even in the form of GPGPU, are dead.
>
> There is absolutely no way to get enough bandwidth for a pure streaming
> core, you have to have caches.
>
I don't buy the absolute bandwidth argument because I don't believe
electrons are forever. Moving data around (especially if you have to
convert from electrons to photons and back again) is costly from the
pov of energy. That, not bandwidth limitations, is what will
ultimately keep localization at a premium, in my unschooled opinion.

> What will determine your actual throughput is how well you can partition
> your problem into parts that can be reused at least a few times (i.e.
> cached) and those that cannot and have to be streamed past instead.
>
> The crucial part of streaming is simply that it makes it explicit that
> these inputs should not and must not pollute the caches!
>
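(For what it's worth, on x86 the "must not pollute the caches" part
maps onto the non-temporal store hints. A minimal sketch, assuming
SSE2 and 16-byte-aligned buffers of my own invention:

    #include <emmintrin.h>

    /* Scale a large array that will not be reused soon; write the
       results with non-temporal stores so they bypass the cache.
       Assumes n is a multiple of 2 and both pointers are 16-byte
       aligned. */
    void scale_stream(double *dst, const double *src, double k, long n)
    {
        __m128d vk = _mm_set1_pd(k);
        for (long i = 0; i < n; i += 2)
            _mm_stream_pd(dst + i, _mm_mul_pd(_mm_load_pd(src + i), vk));
        _mm_sfence();   /* order the streamed stores before later use */
    }

Note that the loads still come through the cache here; only the
stores go around it.)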
I don't know how much you can buy with partitioning. The Top 500 list
focuses on a problem where you can buy a lot. :-(

Partitioning, if it has to be done by hand, is always hardware-
specific. Now that everything is x86, there's little reason not to
write in assembler, but the global architecture *still* won't be
universal (that is to say, high level languages as they are constantly
being reinvented buy you nothing of use for portability).

Robert.
From: Terje Mathisen "terje.mathisen at tmsw.no" on
Robert Myers wrote:
> On Feb 1, 2:04 am, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>> Robert Myers wrote:
>>> SSE is better than nothing, but my long-term bet is on streaming
>>> architectures, which are a generalization of Cray-style vector
>>> parallelism.
>>
>>> NO MORE NEW LANGUAGES. Asm only, if necessary, but figure out how to
>>> expand the space that can be handled with a streaming paradigm.
>>> Fortunately, GPGPU will save us from all of the expensive mistakes the
>>> US has been making ever since Seymour left the scene. That's my hope,
>>> anyway.
>>
>> Pure streaming architectures, even in the form of GPGPU, are dead.
>>
>> There is absolutely no way to get enough bandwidth for a pure streaming
>> core, you have to have caches.
>>
> I don't buy the absolute bandwidth argument because I don't believe
> electrons are forever. Moving data around (especially if you have to
> convert from electrons to photons and back again) is costly from the
> pov of energy. That, not bandwidth limitations, is what will
> ultimately keep localization at a premium, in my unschooled opinion.

Moving stuff is hard/slow/expensive; just holding on to it is much
easier/faster/cheaper (both in $ and J), right?

I.e. we're in violent agreement. :-)
>
>> What will determine your actual throughput is how well you can partition
>> your problem into parts that can be reused at least a few times (i.e.
>> cached) and those that cannot and have to be streamed past instead.
>>
>> The crucial part of streaming is simply that it makes it explicit that
>> these inputs should not and must not pollute the caches!
>>
> I don't know how much you can buy with partitioning. The Top 500 list
> focuses on a problem where you can buy a lot. :-(
>
> Partitioning, if it has to be done by hand, is always hardware-
> specific. Now that everything is x86, there's little reason not to
> write in assembler, but the global architecture *still* won't be
> universal (that is to say, high level languages as they are constantly
> being reinvented buy you nothing of use for portability).

When all calculation is effectively free, the language you use to
implement those calculations matters very little.

What does matter is how hard or easy it is to handle all the required
forms of communication, be it load/store, (distributed) shared memory,
transactional memory, MPI, whatever.

C(++) _might_ not be the best choice.
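For example, even the smallest possible exchange in C with MPI takes
this much ceremony (a sketch; the buffer and tag are made up):

    #include <mpi.h>

    /* Rank 0 sends one double to rank 1; everyone else does nothing. */
    int main(int argc, char **argv)
    {
        int rank;
        double payload = 42.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0)
            MPI_Send(&payload, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(&payload, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        MPI_Finalize();
        return 0;
    }

None of that is calculation; all of it is communication plumbing.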

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: Robert Myers on
On Feb 1, 11:39 am, Terje Mathisen <"terje.mathisen at tmsw.no">
wrote:

>
> When all calculation is effectively free, the language you use to
> implement those calculations matter very little.
>
> What does matter is how hard or easy it is to handle all the required
> forms of communication, be it load/store, (distributed) shared memory,
> transactional memory, mpi, whatever.
>
> C(++) _might_ not be the best choice.
>

I'm pretty clueless about C++. I mean, I've written toy programs, and
I understand what the advantages are supposed to be, but my experience
has been that the advantages are mostly theoretical. It is possible
to write *really* obscure code in C++.

Worrying about where data actually resides and moving it around is
labor-intensive as well as energy-intensive. It's a bigger obstacle
to the usefulness of "supercomputers" than anything else. I left out
the slide where G Bell quotes an LLNL mucky-muck talking about how
labor-intensive these wonderful machines all are.

I'm sure you understand all this. Lots of roadblocks that more flops
won't fix.

Robert.
From: Eugene Miya on
In article <ea2c8491-2403-499f-94dd-1fb3d37cd8f5(a)o28g2000yqh.googlegroups.com>,
Robert Myers <rbmyersusa(a)gmail.com> wrote:
>Three R Myers prizes to G Bell for the second presentation. Others
>may be interested in different things from that which got my
>attention.

Oh, does he award money?
>
>http://research.microsoft.com/en-us/um/people/gbell/ISHPC99.ppt
>
>(My comments are in parentheses)
>
>Bell-Hillis Bet slide (this business is really about massive single-image egos)

8^)

I drive by Gordon's condo every so often, and his offices are next door
to where Avatar is playing in 3-D IMAX. I last saw him at an IBM
Almaden event in July where we chatted about where the Museum is headed
(he's not happy either). I skipped his most recent book event due to a
time conflict. I see Danny also with some frequency at Long Now events.
Danny is pretty much out of the parallel computer biz. I think Danny is
doing longer-term, more important things for civilization.

>ARPA-funded product development failed (doesn't mention effect of DoD
>directive about COTS, though)

Those are different programs.

>ASCI: DOE-funded product purchases creates competition (HAHAHAHAHA)

That's how they attempt to sell the thing to Congress.
That's not an endorsement.

>First efforts in startups... all failed.

You and the world might think that. Depends on time horizon.

>(Left off the role of the DoE in assuring continuing domination by IBM.)

Trust me: the DOE doesn't like that any more than you do. Not that this
is a defense of the DOE. They were given their marching orders in the
late 60s and 70s. They are NOT in the computer construction biz.

>Supercomputing is for the large and rich (Is that a statement of fact
>or a definition?

Definition.

> My desktop will run circles around the
>supercomputers I used to use and that I still think of as the genuine
>article.)

That's because of the past tense. That little 'd' at the end of "use."
You could go back....

>Beowulf, shrink-wrap clusters (no negative comments, but who is Gordon
>Bell being paid by... a mfr of shrink-wrap software)

Ballmer is paying a lot more people than just Gordon. One should note
that Beowulf clusters didn't run Windows.

>Virtuous Economic Cycle that drives the PC industry (should have
>monoculture, not standards in the center).
The acronym is Wintel.
>(Even a picture of Gordon Bell with an NT (!) cluster)
>"You should /not/ focus NSF CS Research on parallelism. I can barely
>write a correct sequential program." Don Knuth 1987 (to G Bell... of
>course)

Depends how you read CS:
computer science
computational science

I just got another unsolicited hexadecimal buck check from Knuth.

>Non-U.S. users continue to use vectors (no AF captains to drive things
>back to familiar territory--PC's--in Europe)

Depends on the country. Certainly not India nor PRC.

>Interconnection networks log(p) continue to be the challenge (unless
>you are using a router chip per node, which scales as p/n, where n is
>the number of processors per node)

Design at too low a level.

>Russian Elbrus E2K Micro

Overrated.

>What is the Processor Architecture (slam dunk for vectors, good reason
>to close down CS departments wholesale)
>It's memory bandwidth, cache prediction, inter-communication, and
>overall variation (G Bell gets an R Myers prize for getting it).

I think Gordon knew that before most of us were born.

>Execution efficiencies of Community Climate Model Version 2 (why this
>fluid mechanicist has such contempt for the Top 500 list).

I think CCM is on 5 now.

>Are US users being denied tools? (well, no. they have been sold on
>stupid tools)

The punch line to this argument is "PRINT" statements.

>Do larger scale, faster, longer run times, increase problem insight
>and not just total flop or flops (second R Myers prize to G Bell in
>one presentation)
>Challenge to funders: Is the cost justified? (THREE R Myers prizes to
>G Bell in one presentation).

It's like Alice and the Red Queen: to stay in place, you have to run faster.
If you don't, you might consider getting out of the game. Just try it.
No one gets ahead (money-wise) with longer run times.

Do I think it's justified? Personally, no.
But, it is about attention span (an 80s PIEEE paper by a former boss).

But you also see only part of the playing field, as most people do.

--

Looking for an H-912 (container).