ifort volatile with O3 [Fortran]

From: Ron Shepard on 1 May 2010 15:44

In article <hrf7tm$ir4$1(a)news.eternal-september.org>,
mecej4 <mecej4_no_spam(a)operamail.com> wrote:

> On 4/30/2010 12:06 PM, Ron Shepard wrote:
> > In article<hregk7$aqi$1(a)news.eternal-september.org>,
> > mecej4<mecej4_no_spam(a)operamail.com> wrote:
> >
> >> Some months back, I had a program which did something of this sort:
> >>
> >> real, volatile :: t1,t2
> >> ...
> >> call cpu_time(t1)
> >> ... do some heavy processing in subroutines...
> >> call cpu_time(t2)
> >> ...
> >> write(*,*)t2-t1
> >>
> >> and found one compiler's optimizer being smart enough to observe that t1
> >> was not referenced in the intermediate code, and my program always ran
> >> in ZERO time, never mind the "volatile" attribute !
> >
> > The t1 and t2 variables are local, so I don't think volatile would
> > have any effect on this. Maybe I'm wrong, but usually volatile
> > variables are in special common blocks or in shared memory or
> > something, not local variables.
>
> Why restrict "volatile" to changes to the variable caused in other
> routines or computer processes?

Perhaps I was not clear. I was not saying that there should be
restrictions on your declaration, I was pointing out that t1 and t2
seem to be normal local variables that cannot be changed by anything
other than the cpu_time() subroutine, so the VOLATILE attribute
would not seem to have any effect on them.

> "Volatile" should include all causes of the value of a variable
> changing, in ways that the processor cannot always comprehend, such as
> the ticking of a clock, the calling of an RNG, I/O, unplugging the
> computer, etc.

Ok, but none of those apply to those two local variables t1 and t2.

> In other words, "volatile" is one way for the programmer to tell the
> compiler, "don't try to be smart, just run the code in the sequence
> written".

I think you want subroutine calls to be executed in the sequence
written in most cases. The only exception I can think of is pure
subroutines that return constant values. As I said before (or tried
to say, but I was unclear), this optimization where "heavy
processing in subroutines" is done before the first call to
subroutine cpu_time() rather than afterwards seems to be a potential
bug regardless of the VOLATILE attribute on the variables t1 and t2.
In fact, you might just as well argue that adding the VOLATILE
attribute to the variables gives the compiler *more* freedom to
rearrange the code, not less, since the values are less
deterministic than otherwise.

> Zero is predictable if the optimizer is used and "volatile" not used,

I'm saying that the clause "and volatile not used" is superfluous in
this sentence.

> unless the compiler has the special knowledge that CPU_TIME is a
> non-deterministic subroutine.
>
> In the case that I cited, the compiler writers allowed VOLATILE to be
> run through the compiler without complaint, but as a NOP.

That would be correct. In this situation, it is a NOP: the
variables t1 and t2 cannot change in any way that the compiler
cannot see. The declaration has nothing to do with the bug. You
would want the values of t1 and t2 to be set correctly even without
the VOLATILE attribute.

> They fixed the
> problem in a later version. Until they did so, I kept the timing calls
> in a separate subroutine which I compiled with -O0 .

With the fixed version, does the code now work correctly without the
VOLATILE attribute? That is what I would have been reporting as a
bug in the first place. The VOLATILE attribute is just a red
herring (a distraction) from the real bug.
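
A minimal sketch of the timing pattern and the workaround discussed above, with no VOLATILE attribute anywhere. The names `timing_mod`, `stamp`, and `heavy_work` are hypothetical and do not come from the original posts. In the workaround as described, the wrapper would live in its own source file compiled with -O0; it is shown in one file here only for brevity.

```fortran
module timing_mod
  implicit none
contains
  ! Thin wrapper around the CPU_TIME intrinsic. In the workaround described
  ! above, this module would sit in a separate file compiled with -O0.
  subroutine stamp(t)
    real, intent(out) :: t
    call cpu_time(t)
  end subroutine stamp
end module timing_mod

program time_it
  use timing_mod, only: stamp
  implicit none
  real :: t1, t2, total      ! ordinary local variables, no VOLATILE

  call stamp(t1)
  call heavy_work(total)
  call stamp(t2)

  ! Printing the result keeps the work itself from being optimized away.
  write(*,*) 'result  =', total
  write(*,*) 'elapsed =', t2 - t1, ' s'
contains
  ! Stand-in for the "heavy processing in subroutines".
  subroutine heavy_work(s)
    real, intent(out) :: s
    integer :: i
    s = 0.0
    do i = 1, 20000000
      s = s + sin(real(i))
    end do
  end subroutine heavy_work
end program time_it
```

With a compiler that treats cpu_time() correctly, the wrapper should be unnecessary and calling the intrinsic directly should give the same elapsed time.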

I would say that the bug appears to be that the compiler was
treating the cpu_time() intrinsic as if it were a pure subroutine
that returns a constant value. The compiler should be allowed to
rearrange calls to such pure functions and subroutines, including
those with intent(out) arguments such as this situation. Suppose
instead of cpu time you were asking for the model number of the CPU
(or some other similar value that would not change during the run).
Then the subroutine call could indeed be moved around anywhere and
it would always return the same value and it would not matter
exactly when the call was executed relative to other code in the
program. In this case, the VOLATILE attribute on the actual
argument still would not matter, it would be essentially a NOP as
you say, and the subroutine would always return the same value
regardless. But cpu time is not like that, it does not return the
same value every time it is invoked.
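
The distinction can be sketched as follows, assuming a made-up pure subroutine `get_model` that stands in for "the model number of the CPU" (the name and the constant 42 are hypothetical). Moving the `get_model` calls anywhere in the program would not change what is printed, whereas moving the `cpu_time` calls would.

```fortran
program pure_vs_clock
  implicit none
  integer :: m1, m2, i
  real :: t1, t2, s

  call get_model(m1)
  call cpu_time(t1)

  ! Some work between the two pairs of calls.
  s = 0.0
  do i = 1, 20000000
    s = s + sqrt(real(i))
  end do

  call get_model(m2)
  call cpu_time(t2)

  write(*,*) 'work result        :', s
  ! Always true: get_model returns the same constant whenever it is called.
  write(*,*) 'model values equal :', m1 == m2
  ! Expected to be true when the calls are not reordered and the clock
  ! resolution is fine enough to see the loop.
  write(*,*) 'cpu time advanced  :', t2 > t1
contains
  ! Hypothetical pure subroutine returning a run-time constant.
  pure subroutine get_model(m)
    integer, intent(out) :: m
    m = 42
  end subroutine get_model
end program pure_vs_clock
```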

$.02 -Ron Shepard

From: Ron Shepard on 1 May 2010 16:07

In article <hrh56p$80i$1(a)naig.caltech.edu>,
glen herrmannsfeldt <gah(a)ugcs.caltech.edu> wrote:

[...]
> > VOLATILE effectively prohibits that by
> > requiring that at least some loads go in a prescribed
> > order.
>
> So my thought is that only loads of VOLATILE variables
> need to go in the prescribed order. Others can be moved
> around, especially with -O3.

The problem with all this is that Fortran does not really have a
notion of "prescribed order". Other languages, such as C, do have
this notion (sequence points). Instead, Fortran uses parentheses to
specify the mathematical order in which expressions are evaluated, which
can be (and usually is) different from the order in which the expression
is evaluated within the hardware. Without the parentheses, Fortran
is even allowed to rearrange expressions into mathematically
equivalent forms and evaluate them in that way. Oddly enough, the
languages that have sequence points often do not respect parentheses
in mathematical expressions (this was an annoying "feature" in C
until one of the later revisions of the language, C89 or C99, I
forget which).
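
A small sketch of why the parentheses matter in floating point. The variable names are illustrative only, and the stated results assume IEEE single precision and no value-unsafe options such as fast-math flags.

```fortran
program paren_order
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  real(real32) :: a, b, c, left, right

  a = 1.0e20_real32
  b = -1.0e20_real32
  c = 1.0_real32

  ! The parentheses fix the grouping, so the two results differ:
  ! (a + b) is exactly zero, and adding c gives 1.0.
  left = (a + b) + c
  ! (b + c) rounds back to b because c is far below the precision of b,
  ! so a + (b + c) gives 0.0.
  right = a + (b + c)

  write(*,*) '(a + b) + c =', left
  write(*,*) 'a + (b + c) =', right
end program paren_order
```

Written without parentheses, as `a + b + c`, the compiler would be free to pick either grouping, since the two are mathematically equivalent.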

This is all reflected also at the hardware level in modern CPUs.
Now, not only are instructions reordered on-the-fly in order to make
maximal use of functional units, but different instruction sequences
from entirely different processes are intermixed at the hardware
level <http://en.wikipedia.org/wiki/Hyper-threading>.

$.02 -Ron Shepard

From: Ron Shepard on 1 May 2010 16:16

In article <hrhh3f$57e$1(a)news.eternal-september.org>,
mecej4 <mecej4.nyetspam(a)operamail.com> wrote:

> > Remember, the OP did ask about optimization level 3.
> > As far back as Fortran H Extended even OPT=2 moved assignments
> > out of loops, and OPT=3 asked for the best the compiler could do.
> > (I haven't known any with levels higher than 3.)
>
> Sun Fortran goes up to -O5. Some versions of GCC allowed -O9. You may have
> had only mainframe compilers in mind, but the OP did not indicate any
> specific system.

Yes, and the volume knob on my amplifier goes up to 11.

$.02 -Ron Shepard :-)

From: JB on 1 May 2010 18:26

On 2010-05-01, glen herrmannsfeldt <gah(a)ugcs.caltech.edu> wrote:
> According to the gcc (C compiler) man page that I have, it goes
> up to -O3. It could have changed very recently, though.

Looking at

http://gcc.gnu.org/onlinedocs/gcc-2.95.3/gcc_2.html#SEC10

gcc 2.95 didn't go any higher either. On the same website, you'll find
the manuals for newer versions of gcc as well, which also don't go
higher than -O3.

From

http://gcc.gnu.org/gcc-2.95/

one can see that 2.95 was released in 1999.

>>> So my thought is that only loads of VOLATILE variables
>>> need to go in the prescribed order. Others can be moved
>>> around, especially with -O3.
>
>> Why only loads? With VOLATILE, the compiler cannot say to itself, " ... I
>> know that the value has not changed, and I wrote it into memory earlier, so
>> I don't have to do a STORE again." The VOLATILE attribute says, "just do it
>> when I tell you".
>
> Yes stores also. But if W is not VOLATILE, then I don't see
> the requirements applied to it, even from an expression with V.
>
> Probably one should apply VOLATILE to other nearby variables
> to be sure that the compiler does the right thing.

Well, maybe. The exact semantics of reordering volatile accesses
vs. regular loads or stores seem to be unclear. See e.g.

http://stackoverflow.com/questions/2535148/volatile-qualifier-and-compiler-reorderings

(which, per se, is about C/C++, but it seems Fortran volatile was
modeled after C/C++, the C/C++ world has much longer experience
with volatiles, and, more to the point, compilers are likely to treat
Fortran/C/C++ volatiles as equivalent in the optimizers)

--
JB

From: mecej4 on 1 May 2010 18:58

Ron Shepard wrote:

> [...]

A thoughtful discussion. Thanks!

-- mecej4
