From: nmm1 on
In article <4C087DD5.6050502(a)patten-glew.net>,
Andy 'Krazy' Glew <ag-news(a)patten-glew.net> wrote:
>On 6/3/2010 11:58 AM, Robert Myers wrote:
>> On Jun 2, 12:15 am, Andy 'Krazy' Glew<ag-n...(a)patten-glew.net> wrote:
>>
>Getting back to parallelism:
>
>I'm most hopeful about programmer expressed parallelism.
>
>I think that one of the most important things for compilers will be
>to map large amounts of programmer expressed parallelism from an ideal
>machine - PRAM? CSP? - to whatever machine you have.

Yes and no. Technically, I agree with you, and have been riding
that hobby-horse for nearly 40 years now! It is, after all, why
extreme HPC systems work so much better with Fortran codes than
C or C++ ones - the extra restrictions express parallelism that
the compiler is permitted to use.
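
To make that concrete (my sketch, in C99 rather than Fortran): the
C compiler must assume the arrays below may overlap, which blocks
straightforward vectorisation; Fortran's rules for dummy arguments
forbid that overlap, and C99's 'restrict' lets the programmer make
roughly the same promise by hand.

/* Without more information the compiler has to assume a, b and c
   can alias, so it cannot simply vectorise the loop. */
void axpy(double *a, const double *b, const double *c, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] + 2.0 * c[i];
}

/* With 'restrict' the programmer promises there is no overlap -
   roughly the guarantee a Fortran compiler gets for free - and
   the loop vectorises cleanly. */
void axpy_r(double * restrict a, const double * restrict b,
            const double * restrict c, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] + 2.0 * c[i];
}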

Unfortunately, since the demise of Algol 68, the languages that
are favoured by the masses have been going in the other direction.
Fortran 90 has not, but it's now a niche market language. Worse,
the moves towards languages defining a precise abstract machine
are regarded as obsolete (though Java is an exception), so most
languages don't have one.

And, no, I do NOT mean the bolt-on extensions that are so common.
They sometimes work, just, but never well. You can't turn a
Model T Ford into something capable of maintaining 60 MPH, reliably,
by bolting on any amount of modern kit!


Regards,
Nick Maclaren.
From: Terje Mathisen "terje.mathisen at tmsw.no" on
Andy 'Krazy' Glew wrote:
> E.g. Andrew Wolfe (author of "Optimizing Supercompilers for
> Supercomputers") taught that compilers had never really gotten all that
> good at vector parallelism. Rather, humans started learning to write
> code in the idioms that compilers could vectorize.

This is fine for new code, but a showstopper for existing C(++) code. :-(
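
A typical example of the sort of rewrite involved (my own sketch, not
from Wolfe's book; assume the arrays don't overlap):

/* One loop mixing a true recurrence with independent work: early
   vectorizers simply gave up on it. */
void mixed(float *a, float *b, const float *c, int n)
{
    for (int i = 1; i < n; i++) {
        a[i] = a[i-1] + c[i];   /* recurrence across iterations */
        b[i] = 2.0f * c[i];     /* independent work */
    }
}

/* The idiom people learned to write instead: distribute the loop so
   the independent part stands alone and vectorizes, while the
   recurrence stays serial. */
void split(float *a, float *b, const float *c, int n)
{
    for (int i = 1; i < n; i++)
        a[i] = a[i-1] + c[i];
    for (int i = 1; i < n; i++)
        b[i] = 2.0f * c[i];
}

New code can be written in the second form from day one; rewriting a
few hundred thousand lines of existing C++ into it is the showstopper.
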
> Nevertheless, I still like to work with compiler teams. History:
>
> 1991: at the start of P6 Intel's compilers were not that good - but
> Intel made a great effort to improve them. However, much of that effort
> was devoted to optimizing P5 (in-order machines need a lot of
> optimization). Compiler folk were somewhat frustrated by P6 running
> unoptimized code almost as fast as optimized code [*], although they
> were happy to learn that many supposedly P6 specific optimizations
> improved P6 even more.
------------------------------------------ P5 specific --- I assume?

This was my experience as well: Since the P6 tries to execute the oldest
instructions first, good P5 scheduling tended to also make the P6 run
faster.
>
> Overall, working with Intel's compiler team in the early days was fun
> and productive. But it did show me the benefits of loose coupling
> between teams.
>
> Towards 1995, the compiler team started getting sucked into the black
> hole of Itanium. I don't think we ever really saw real dedication to
> optimizing for P6-style OOO.
>
> I wasn't there, but I have heard that the compiler was instrumental in
> band-aiding the worst of the stupidities of Willamette. That was also

Willamette, i.e. first-gen P4, right?

That chip had two simultaneous glass jaws: Both integer mul and shift
were slow, so only LEA remained as a fast way to calculate addresses.
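
The practical upshot (a sketch of mine, cycle counts long forgotten)
was that small constant multiplies got expressed in base + index*scale
form, so they map onto LEA rather than shl or imul - something modern
compilers do by themselves, but on Willamette you cared whether they
actually did:

/* Typically a single LEA on x86: lea r,[i+i*4] */
static inline long times5(long i)  { return i + i * 4; }

/* Likewise one LEA: lea r,[i+i*8] */
static inline long times9(long i)  { return i + i * 8; }

/* One LEA plus one add - still no shift, no multiply. */
static inline long times10(long i) { return (i + i * 4) * 2; }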

> true for P6: the compiler really helped make up for the shortsighted
> decisions wrt partial registers and memory. Overall a pattern: tight
> interaction between compilers and hardware really helps to make up for
> hardware shortcomings. However, truly aggressive compiler optimization
> can often be done in a more loosely coupled fashion.

That's like the difference between micro-optimization and improving the
basic algorithm. :-)

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: nedbrek on
Hello all,

"Terje Mathisen" <"terje.mathisen at tmsw.no"> wrote in message
news:s6smd7-1l72.ln1(a)ntp.tmsw.no...
>
> Willamette, i.e. first-gen P4, right?

Yes.

> That chip had two simultaneous glass jaws: Both integer mul and shift were
> slow, so only LEA remained as a fast way to calculate addresses.

The P4 optimization guide was funny:
1) Don't do multiplies, do shifts
2) Don't do shifts, do adds
3) Don't do loads
4) Don't do branches
5) Do adds
6) Don't do anything but adds and you will be ok.

Ned


From: nedbrek on
Hello all,

"Mike Hore" <mike_horeREM(a)OVE.invalid.aapt.net.au> wrote in message
news:hua1cl$43v$1(a)news.eternal-september.org...
>
> Thanks for that fascinating stuff, Andy. I'm wondering, where was
> Microsoft while this was going on? Did they use Intel's compiler at all,
> or did they "do it their way?"

AFAIK, MS has always done their own thing on their compiler.

Ned


From: Robert Myers on
On Jun 4, 12:15 am, Andy 'Krazy' Glew <ag-n...(a)patten-glew.net> wrote:

>
> I agree - or, rather, I strongly agreed back in 1991, and I overall agree now - although experience tends to suggest
> that it ain't necessarily so.
>
> Now, as for "should the parallelism come from the software end, or the hardware end?"  You have created a false
> dichotomy: it is really a triangle with three corners:
>      1. explicit programmer expressed parallelism
>      2. compiler expressed parallelism,
>          2.1. parallelization of non-parallel from the programmer
>          2.2. compiler supported parallelism of code that the programmer has expressed as parallel
>      3. hardware supported parallelization
>          3.1. of the explicit parallelism of 1. and 2.2
>          3.2. of code that the programmer and compiler treat as non-parallel
>
> Of the above, I am most heartily in favor of
>
> Explicit parallelism all along the way: 1. + 2.2. + 3.1.
>
> I have a lot of experience in 3.2. hardware parallelization (and in 3.1. hardware support for explicit parallelism).
>
> I'm all in favor of 2.1, compiler parallelization of programmer expressed non-parallel code.  But I am most suspicious.
>
> E.g. Andrew Wolfe (author of "Optimizing Supercompilers for Supercomputers") taught that compilers had never really
> gotten all that good at vector parallelism.  Rather, humans started learning to write code in the idioms that compilers
> could vectorize.
>
> ---

<snip informative historical information>

> ---
>
> Getting back to parallelism:
>
> I'm most hopeful about programmer expressed parallelism.
>
> I think that one of the most important things for compilers will be to map large amounts of programmer expressed parallelism
> from an ideal machine - PRAM? CSP? - to whatever machine you have.

To latch onto the vector code example, it works because three
different things work at the same time:

1. The hardware has a feature that is extremely advantageous if
utilized properly and useless otherwise.

2. The compiler gives the programmer reasonably transparent and
lightweight mechanisms for exploiting that potentially advantageous
feature (in the case of Cray Fortran, the compiler directive $IVDEP,
for "ignore vector dependencies", and vector intrinsics); a rough
modern analogue is sketched after this list.

3. Programmers working from a fairly primitive understanding of the
actual hardware implementation can understand exactly what needs to
be done and why, and can write correct code with a high probability
of success. I assume that many programmers wrote successful code
knowing only heuristic rules, without even a primitive hardware model
in mind.
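
That mechanism, in a rough modern C spelling (GCC's pragma rather
than Cray's directive; the scatter-add below is only an illustration):

void scatter_add(float *a, const float *b, const int *idx, int n)
{
    /* The programmer asserts that idx never hits the same element
       twice, so the apparent dependence through a[] may be ignored -
       the same contract $IVDEP expressed to the Cray compiler. */
    #pragma GCC ivdep
    for (int i = 0; i < n; i++)
        a[idx[i]] += b[i];
}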

So far as I know, the only "clever" thing that vectorizing compilers
learned how to do was to turn loop nesting inside out so that the
innermost loop would vectorize. That cleverness wasn't really
necessary, as programmers could easily learn to write in that idiom to
begin with. What *was* necessary was that the compiler let the
programmer invoke vector operations with minimum pain.
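
For concreteness, a sketch in C rather than Fortran (the size and the
stencil are made up):

#define N 1024
static float a[N][N], b[N][N];

/* Written this way, the innermost loop carries the recurrence in i,
   so it cannot vectorize. */
void sweep_serial(void)
{
    for (int j = 0; j < N; j++)
        for (int i = 1; i < N; i++)
            a[i][j] = a[i-1][j] + b[i][j];
}

/* "Turned inside out": the recurrence moves to the outer loop, the
   innermost loop over j has independent, unit-stride iterations, and
   it vectorizes - exactly the idiom programmers learned to write for
   themselves. */
void sweep_vector(void)
{
    for (int i = 1; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = a[i-1][j] + b[i][j];
}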

This is exactly an example of your 1 + 2.2 + 3.1. It was that kind of
optimization that I had mostly in mind.

So far as I know, this all worked out because it was, at one point,
all under the control of Seymour Cray. No industry-wide committees.

When will we get (for example) truly lightweight threads? When
someone has enough clout to control the first implementation from
beginning to end: hardware support, compiler support, OS support, and
programmer education. Until then, we will be manufacturing papers and
conference presentations and little else.

Once again, my intent is to be provocative.

Robert.
