From: Stefan Monnier on
>>> PARALLEL-FOR(20%) i = 1 TO 50 WITH DO
>>> dosomething with i
>>> DONE

> What's 20%?

The expected efficiency.

> As the Cray and subsequent guys have learned:
> you are assuming, for instance, no interactions of i on the LHS with
> i-1 on the RHS.

Not at all. All the annotation here is saying is "I expect this code to
have at least 20% efficiency", so if inter-iteration dependencies prevent
such efficiency, it's a bug that should be reported.
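
To make the "20%" annotation concrete: parallel efficiency is commonly defined as speedup divided by processor count, E = T1 / (p * Tp). Below is a minimal sketch of how such an annotation could be checked and reported as a bug. The helper names are hypothetical. It assumes the caller supplies the serial and parallel times, which could come either from measurement or from a compiler's cost estimate.

```python
class EfficiencyBug(Exception):
    """Raised when code misses the efficiency its annotation promised."""


def parallel_efficiency(t_serial: float, t_parallel: float, procs: int) -> float:
    # Efficiency = speedup / processors = T1 / (p * Tp).
    return t_serial / (procs * t_parallel)


def check_expected_efficiency(expected: float, t_serial: float,
                              t_parallel: float, procs: int) -> float:
    # Hypothetical check behind an annotation such as PARALLEL-FOR(20%):
    # falling short is reported as a bug instead of silently running slowly.
    eff = parallel_efficiency(t_serial, t_parallel, procs)
    if eff < expected:
        raise EfficiencyBug(
            f"expected at least {expected:.0%} efficiency, "
            f"got {eff:.0%} on {procs} processors")
    return eff


# 8 processors, serial time 10, parallel time 5: speedup 2, efficiency 25%.
print(check_expected_efficiency(0.20, 10.0, 5.0, 8))  # prints 0.25
```

With a parallel time of 8 instead of 5, the efficiency drops to 10 / 64 (about 16%) and the check raises `EfficiencyBug`.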

This is just one random example thrown in. Other things would be to make
inter-iteration dependencies explicit, so the compiler would only have to
check them rather than infer them. After all, the programmer has to be
aware of them to get good performance anyway, so let him write down what he
knows so it can be sanity-checked.
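
Here is a minimal sketch of what "write down what he knows so it can be sanity-checked" could look like. The programmer declares the loop-carried dependence distances, and a checker compares them against the distances that are actually present. The access-function encoding and the names `observed_distances` and `check_declared` are hypothetical. This checker also brute-forces a concrete iteration space instead of doing the symbolic dependence analysis a real compiler would do.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical encoding: for iteration i, each function returns the indices of
# a single array that the loop body writes or reads.
Access = Callable[[int], List[int]]


def observed_distances(iters: Iterable[int], writes: Access, reads: Access) -> Set[int]:
    """Brute-force the loop-carried dependence distances of a loop.

    Iterations must be given in ascending order. A distance d means some
    iteration j touches an element already touched by iteration j - d, with
    at least one of the two accesses a write (flow, anti or output dependence).
    """
    touched: Dict[int, List[Tuple[int, bool]]] = {}  # index -> [(iteration, is_write)]
    dists: Set[int] = set()
    for j in iters:
        accesses = [(x, False) for x in reads(j)] + [(x, True) for x in writes(j)]
        for idx, is_write in accesses:
            for i, was_write in touched.get(idx, []):
                if i != j and (was_write or is_write):
                    dists.add(j - i)
            touched.setdefault(idx, []).append((j, is_write))
    return dists


def check_declared(declared: Set[int], iters: Iterable[int],
                   writes: Access, reads: Access) -> Set[int]:
    # The programmer states the distances; the tool only has to confirm them.
    actual = observed_distances(iters, writes, reads)
    undeclared = actual - declared
    if undeclared:
        raise ValueError(f"undeclared loop-carried dependences at distances {sorted(undeclared)}")
    return actual


# a[i] = a[i-1] + b[i] for i = 1..50: declared distance 1, confirmed.
print(check_declared({1}, range(1, 51), lambda i: [i], lambda i: [i - 1]))  # prints {1}
```

If the same loop is declared with no dependences (`set()`), the check raises `ValueError`. Over-declaring is accepted here, which is conservative. A real tool might instead warn about declared distances that never occur.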

> A couple of decades ago, Dave Kuck presented a survey of all the problems
> that parallel software has to solve, as the opening session of an ICPP.
> Unfortunately that paper is hard to find (it's from around 1975, give or
> take a year or two).

Where could it be found (my library doesn't seem to carry such things)?

> So you are at about the level of 1974 non-UIUC compiler technology.

That wouldn't surprise me, although I feel like we haven't made much (if
any) progress in this area.

> I'm not certain how compilers estimate efficiency. It's barely
> recognized in the community ("cycles for free").

Indeed, and I believe this is the problem.


Stefan
From: Stefan Monnier on
> In current Fortran, one would likely use an array expression, no loops or
> threads in sight. The compiler is completely free (within the defined
> semantics of the expression) to parallelize as it pleases.

My point is that this is exactly the wrong way to go about it. Rather than
hope the compiler will do the right thing, you should be able to write the
code in such a way that the compiler understands that it is expected to
parallelize the loop in a particular way (or better, if that can be defined
"objectively") and that it's a bug in the source code if it can't.


Stefan
From: BDH on
> My point is that this is exactly the wrong way to go about it. Rather than
> hope the compiler will do the right thing, you should be able to write the
> code in such a way that the compiler understands that it is expected to
> parallelize the loop in a particular way (or better, if that can be defined
> "objectively") and that it's a bug in the source code if it can't.

The point was, loops are not good things to parallelize.

From: Stefan Monnier on
>> My point is that this is exactly the wrong way to go about it. Rather than
>> hope the compiler will do the right thing, you should be able to write the
>> code in such a way that the compiler understands that it is expected to
>> parallelize the loop in a particular way (or better, if that can be defined
>> "objectively") and that it's a bug in the source code if it can't.

> The point was, loops are not good things to parallelize.

Then read "code" where I wrote "loop".


Stefan
From: Nick Maclaren on

In article <jwvejr5zuah.fsf-monnier+comp.arch(a)gnu.org>,
Stefan Monnier <monnier(a)iro.umontreal.ca> writes:
|>
|> > In current Fortran, one would likely use an array expression, no loops or
|> > threads in sight. The compiler is completely free (within the defined
|> > semantics of the expression) to parallelize as it pleases.
|>
|> My point is that this is exactly the wrong way to go about it. Rather than
|> hope the compiler will do the right thing, you should be able to write the
|> code in such a way that the compiler understands that it is expected to
|> parallelize the loop in a particular way (or better, if that can be defined
|> "objectively") and that it's a bug in the source code if it can't.

I remember when exactly the same argument was used to claim that all
serious HPC programs should be coded in assembler, because relying on
the compiler's optimisation was the wrong way to proceed :-)

In the case of simple array operations, a run-time system is likely to
generate better code than a programmer. Where it falls down is in
deciding how to distribute the array - and that is an unsolved problem,
whether it be done manually or automatically, despite many attempts
at systematising the issue.
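
To illustrate why the choice of distribution is hard, here is a toy comparison of the two classic distributions for a 50-element array on 4 processors. These are block and cyclic, the BLOCK and CYCLIC of High Performance Fortran. The helper names, the left-neighbour stencil and the triangular workload are assumptions made for this sketch. Real decisions also depend on communication cost and on every other loop that touches the same array.

```python
from typing import Callable, List

Owner = Callable[[int], int]


def block_owner(n: int, procs: int) -> Owner:
    # BLOCK: contiguous chunks of ceil(n / procs) elements per processor.
    chunk = -(-n // procs)  # ceiling division
    return lambda i: i // chunk


def cyclic_owner(procs: int) -> Owner:
    # CYCLIC: elements dealt out round-robin.
    return lambda i: i % procs


def remote_reads(owner: Owner, n: int) -> int:
    # Reads of a[i-1] in a stencil such as b[i] = a[i-1] + a[i] that cross processors.
    return sum(1 for i in range(1, n) if owner(i) != owner(i - 1))


def max_load(owner: Owner, n: int, procs: int, work: Callable[[int], int]) -> int:
    # Heaviest per-processor total when element i costs work(i).
    loads: List[int] = [0] * procs
    for i in range(n):
        loads[owner(i)] += work(i)
    return max(loads)


n, p = 50, 4
for name, own in [("block", block_owner(n, p)), ("cyclic", cyclic_owner(p))]:
    # Triangular workload: element i costs i units.
    print(name, remote_reads(own, n), max_load(own, n, p, lambda i: i))
# prints:
# block 3 484
# cyclic 49 325
```

Block needs only 3 cross-processor neighbour reads, but its heaviest processor does 484 of the 1225 units of work. Cyclic balances the load (325 at most) at the cost of making all 49 neighbour reads remote, so neither distribution wins for both access patterns.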


Regards,
Nick Maclaren.