From: "<<--MM-->>" on
Richard Maine wrote:
> <<--MM-->> <no.spma(a)now.it> wrote:
>
>> Hello,
>> I have a question regarding "internal optimization" or the meaning of
>> some instructions in Fortran 95.
>>
>> I'm speaking of :
>> - DO...ENDDO
>> - FORALL
>> - WHERE...END WHERE
>>

[CUT]

>
> Forall was not designed with optimization in mind. It was designed (in
> HPF) for parallelism, and then added to the Fortran standard as part of
> incorporating the syntactic parts of HPF. I don't have experience with
> parallel machines to comment knowledgeably. But for serial machines, there
> is little reason to expect forall to be more efficient than simple DO
> loops, and there is substantial data to suggest that it is often worse,
> largely because it often involves temporary arrays. I don't know why you
> would think that forall was somehow inherently more optimizable than DO
> loops.
>
> Tim and Gordon discussed that a little, but there is one point which
> they did not mention and which I consider fundamental. Perhaps you know
> this or consider it obvious. But you did ask, and there are some people
> who definitely have been confused by the point, so I feel it important
> to make.
>
> DO is a looping construct. Forall and Where are array assignments. That
> is a really fundamental difference. There are cases where one can
> achieve a desired result using any of the forms, but do not let that
> blind you to the fundamental difference. I have seen people take
> "random" DO loops and change the syntax of the DO statement to that of a
> FORALL, hoping that this might improve their performance or something.
> Except in special cases, this results in something that won't even
> compile.
>

I read about FORALL and WHERE in some papers/tutorials for Fortran 95,
and in any case they didn't clarify the real difference, but the idea
suggested was that the compiler can optimize the internal code.
I mean, in a DO loop on an array a(i,j) I normally sequence via the
fast coordinate

do i=....
do j=...
a(i,j)=...
enddo
enddo

As the software grows I use FORALL and WHERE in order to reduce
the lines of code.

But now I have discovered that in this case I can lose efficiency.

Is this also true for dual-core or quad-core processors?
From: Gordon Sande on
On 2010-01-05 06:39:40 -0400, "<<--MM-->>" <no.spma(a)now.it> said:

> Richard Maine wrote:
>> <<--MM-->> <no.spma(a)now.it> wrote:
>>
>>> Hello,
>>> I have a question regarding "internal optimization" or the meaning of
>>> some instructions in Fortran 95.
>>>
>>> I'm speaking of :
>>> - DO...ENDDO
>>> - FORALL
>>> - WHERE...END WHERE
>>>
>
> [CUT]
>
>>
>> Forall was not designed with optimization in mind. It was designed (in
>> HPF) for parallelism, and then added to the Fortran standard as part of
>> incorporating the syntactic parts of HPF. I don't have experience with
>> parallel machines to comment knowledgeably. But for serial machines, there
>> is little reason to expect forall to be more efficient than simple DO
>> loops, and there is substantial data to suggest that it is often worse,
>> largely because it often involves temporary arrays. I don't know why you
>> would think that forall was somehow inherently more optimizable than DO
>> loops.
>>
>> Tim and Gordon discussed that a little, but there is one point which
>> they did not mention and which I consider fundamental. Perhaps you know
>> this or consider it obvious. But you did ask, and there are some people
>> who definitely have been confused by the point, so I feel it important
>> to make.
>>
>> DO is a looping construct. Forall and Where are array assignments. That
>> is a really fundamental difference. There are cases where one can
>> achieve a desired result using any of the forms, but do not let that
>> blind you to the fundamental difference. I have seen people take
>> "random" DO loops and change the syntax of the DO statement to that of a
>> FORALL, hoping that this might improve their performance or something.
>> Except in special cases, this results in something that won't even
>> compile.
>>
>
> I read about FORALL and WHERE in some papers/tutorials for Fortran 95,
> and in any case they didn't clarify the real difference, but the idea
> suggested was that the compiler can optimize the internal code.
> I mean, in a DO loop on an array a(i,j) I normally sequence via the
> fast coordinate
>
> do i=....
> do j=...
> a(i,j)=...
> enddo
> enddo
>
> As the software grows I use FORALL and WHERE in order to reduce
> the lines of code.

To say it again, FORALL and WHERE are array assignments! To make it
trivial, FORALL is allowed to go from 1 to n, from n down to 1, the even
indices up and the odd indices down, and any other way it chooses to do
so. If it had n processors it could use all n in any random order it
chose.

A DO loop of

do i = 2, n
a(i) = a(i) + a(i-1)
enddo

will give a progressive partial sum, but the same appearance with a
FORALL will only add adjacent values. The DO would give a different
answer for "do i = n, 2, -1" but the FORALL would not. FORALL does this
by having a hidden array temporary that might be optimized out. When the
right hand side is complicated it can be hard for a programmer to figure
out a sequential form, so instead they just put the results into a
temporary and copy the temporary at the end. Same for compilers and
FORALL statements.
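Gordon's partial-sum example can be simulated side by side. This is my
own sketch in Python, not code from the thread; the hypothetical
functions do_loop and forall_stmt just model the "run in order" versus
"evaluate everything first, as if into a temporary" behaviors:

```python
# Contrast DO-loop semantics (sequential, in place) with FORALL
# semantics (evaluate every right-hand side from the OLD values,
# as if through a temporary array, then assign).

def do_loop(a):
    """do i = 2, n : a(i) = a(i) + a(i-1) : enddo  -- runs in order."""
    b = list(a)
    for i in range(1, len(b)):          # i = 2..n in Fortran terms
        b[i] = b[i] + b[i - 1]          # sees the value just updated
    return b

def forall_stmt(a):
    """forall (i = 2:n) a(i) = a(i) + a(i-1)  -- order-independent."""
    tmp = [a[i] + a[i - 1] for i in range(1, len(a))]  # hidden temporary
    return [a[0]] + tmp                 # copied back after evaluation

print(do_loop([1, 1, 1, 1]))      # [1, 2, 3, 4]  progressive partial sums
print(forall_stmt([1, 1, 1, 1])) # [1, 2, 2, 2]  only adjacent values added
```

The same input gives two different answers, which is exactly why a
compiler cannot blindly turn one form into the other.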

> But now I have discovered that in this case I can lose efficiency.
>
> Is this also true for dual-core or quad-core processors?

Is your compiler (exactly that version of that vendor with exactly those
switches!!) going to multiprocess or not? Clearly it depends. If the
compiler comes from a vendor of parallel computers, and you have paid
for the full version and taken the vendor's courses on parallelism, then
the chances go way up. Big ifs!

Mostly, multicore allows the compiler to run at the same time as your
email program. Some I/O will be overlapped, and even some music will be
decoded in parallel. But beyond that it is hard work.

There is an old saying about yachts: if you have to ask the price, you
cannot afford one! Here, if you have to ask about FORALL and WHERE, you
are very likely not able to use the parallel features they are intended
to enable in very special circumstances.


From: Tim Prince on
<<--MM-->> wrote:

>
> I read about FORALL and WHERE in some papers/tutorials for Fortran 95,
> and in any case they didn't clarify the real difference, but the idea
> suggested was that the compiler can optimize the internal code.
> I mean, in a DO loop on an array a(i,j) I normally sequence via the
> fast coordinate
>
> do i=....
> do j=...
> a(i,j)=...
> enddo
> enddo
>
> As the software grows I use FORALL and WHERE in order to reduce
> the lines of code.
>
> But now I have discovered that in this case I can lose efficiency.
>
> Is this also true for dual-core or quad-core processors?
Your pseudo-code contradicts what you said. Unless you have an
optimizing compiler which swaps loops (and you turn on that option),
nesting the loops backwards as you have done is likely to "lose efficiency."
Similar compiler analysis (or more) is needed to optimize a rank-2
FORALL. WHERE presents somewhat different challenges to optimizing
compilers.
The point was mentioned that forall is intended to require a compiler to
diagnose and reject some situations which might prevent parallel
operation on multi-core. This falls disappointingly short of actually
facilitating parallelism.
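Tim's point about loop nesting can be made concrete. Fortran stores
a(i,j) in column-major order, so the FIRST index is the fast one: in an
n-by-m array, a(i,j) sits at storage offset (i-1) + (j-1)*n. A small
Python sketch (my own illustration, not code from the thread) lists the
offsets each nesting visits:

```python
# In Fortran, a(i,j) is stored column-major: offset = (i-1) + (j-1)*n.
# Compare the memory offsets visited by the two loop nestings.

n, m = 3, 2  # a dimensioned as a(n, m)

def offset(i, j):
    return (i - 1) + (j - 1) * n       # 0-based storage offset

# Original poster's nesting: i outer, j inner -> stride n (slow).
i_outer = [offset(i, j) for i in range(1, n + 1) for j in range(1, m + 1)]

# Cache-friendly nesting: j outer, i inner -> stride 1 (fast).
j_outer = [offset(i, j) for j in range(1, m + 1) for i in range(1, n + 1)]

print(i_outer)  # [0, 3, 1, 4, 2, 5] -- jumps by n between accesses
print(j_outer)  # [0, 1, 2, 3, 4, 5] -- contiguous, stride 1
```

The j-outer/i-inner order walks memory contiguously, which is what "use
the fast coordinate innermost" means for Fortran arrays.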
From: Phillip Helbig---undress to reply on
In article <1jbt60e.k9bpesczgc7wN%nospam(a)see.signature>,
nospam(a)see.signature (Richard Maine) writes:

> I don't know why you
> would think that forall was somehow inherently more optimizable than DO
> loops.

> DO is a looping construct. Forall and Where are array assignments. That
> is a really fundamental difference.

Maybe that is the reason he thought it would somehow be inherently more
optimizable. DO implies doing things one after the other. If the
compiler can prove to itself that parallel execution is OK, then it can
do that optimisation. However, with FORALL and WHERE, there is no
serial implication, so the compiler can perhaps optimise a bit more
aggressively.
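Phillip's description fits WHERE as well: it is a masked array
assignment with no serial implication. The mask is evaluated from the
old values first, and the assignment then applies only where the mask is
true. A minimal Python simulation (my own sketch, not code from the
thread):

```python
# where (a /= 0) b = 1.0 / a   -- masked array assignment:
# evaluate the mask from the current values of a, then assign
# 1/a into b only at the positions where the mask is true.

a = [2.0, 0.0, 4.0]
b = [9.0, 9.0, 9.0]

mask = [x != 0.0 for x in a]                   # evaluated first, any order
b = [1.0 / x if m else keep
     for x, m, keep in zip(a, mask, b)]        # masked, order-independent

print(b)  # [0.5, 9.0, 0.25]
```

Because no element's assignment depends on another's, the elements could
in principle be computed in any order or in parallel.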

From: Ron Shepard on
In article <hhvt2u$1c0$1(a)online.de>,
helbig(a)astro.multiCLOTHESvax.de (Phillip Helbig---undress to
reply) wrote:

> In article <1jbt60e.k9bpesczgc7wN%nospam(a)see.signature>,
> nospam(a)see.signature (Richard Maine) writes:
>
> > I don't know why you
> > would think that forall was somehow inherently more optimizable than DO
> > loops.
>
> > DO is a looping construct. Forall and Where are array assignments. That
> > is a really fundamental difference.
>
> Maybe that is the reason he thought it would somehow be inherently more
> optimizable. DO implies doing things one after the other. If the
> compiler can prove to itself that parallel execution is OK, then it can
> do that optimisation. However, with FORALL and WHERE, there is no
> serial implication, so the compiler can perhaps optimise a bit more
> aggressively.

The semantics that we all wanted back in the 80s when the next
fortran revision (f88 :-) was being discussed was exactly what you
say above, a looping type construct in which the order of execution
is unspecified. That matched the vector hardware of the time.
Unfortunately, FORALL adds a little more, and it is that little bit
extra that gets in the way of optimization. In particular, the
problem seems to be the requirement that the statement is evaluated
"as if" everything on the right hand side is stored into a temporary
array of the appropriate size and then assigned to the left hand
side target array. If the compiler can't figure out that the
temporary array is unneeded and assign results directly to the
target array (which seems to be somewhere between "always" and "too
often"), then it actually does allocate a temporary array to hold
the intermediate results. It is that allocation and deallocation
that seems to be the problem with optimization of FORALL.

The looping construct we wanted would have required the programmer
to make sure that the order of execution was not important.
Sometimes that is obvious for a statement or group of statements,
sometimes it isn't, so this was a potential source of coding errors
for programmers. FORALL does the arbitrary-order part, but it
provides the safety net of evaluation-before-assignment so that the
programmer cannot possibly make a mistake. It is that safety net
that seems to be the cause of the optimization and performance
problems.

At this point, I don't know what the best solution is. Should a new
DOALL construct be added that works the right way? Should a
compiler directive be specified somehow in the standard to tell
FORALL to behave correctly? There doesn't really seem to be a good
solution to the problem. In hindsight, the FORALL semantics was a
bad choice, but once it was in the language it became practically
impossible to remove it, so we are stuck with it in the language
forever.

BTW, when FORALL was added, I thought it was what we all wanted. I
did not recognize that such a seemingly minor difference between
what we really wanted and what we got would have such major
consequences. As a result, I tend to avoid FORALL for all but
trivial statements. If a loop is important to performance, I tend
to use old fashioned DO loops, or a mixture of DO loops and simple
array syntax. Even if a FORALL behaves well on one compiler, you
can't rely on it working well on the next one.

$.02 -Ron Shepard