ifort volatile with O3 [Fortran]

Prev: PGFIO-F-217, FORTRAN STOP
Next: Reading lines in free format (*) with an occational string

From: Gary L. Scott on 1 May 2010 18:59

On 5/1/2010 5:26 PM, JB wrote:
> On 2010-05-01, glen herrmannsfeldt<gah(a)ugcs.caltech.edu> wrote:
>> According to the gcc (C compiler) man page that I have, it goes
>> up to -O3. It could have changed very recently, though.
>
> Looking at
>
> http://gcc.gnu.org/onlinedocs/gcc-2.95.3/gcc_2.html#SEC10
>
> gcc 2.95 didn't go any higher either. On the same website, you'll find
> the manuals for newer versions of gcc as well, which also don't go
> higher than -O3.
>
> From
>
> http://gcc.gnu.org/gcc-2.95/
>
> one can see that 2.95 was released in 1999.
>
>>>> So my thought is that only loads of VOLATILE variables
>>>> need to go in the prescribed order. Others can be moved
>>>> around, especially with -O3.
>>
>>> Why only loads? With VOLATILE, the compiler cannot say to itself, " ... I
>>> know that the value has not changed, and I wrote it into memory earlier, so
>>> I don't have to do a STORE again." The VOLATILE attribute says, "just do it
>>> when I tell you".
>>
>> Yes stores also. But if W is not VOLATILE, then I don't see
>> the requirements applied to it, even from an expression with V.
>>
>> Probably one should apply VOLATILE to other nearby variables
>> to be sure that the compiler does the right thing.
>
> Well, maybe. The exact semantics of reordering volatile accesses
> vs. regular loads or stores seems to be unclear. See e.g.
>
> http://stackoverflow.com/questions/2535148/volatile-qualifier-and-compiler-reorderings
>
> (which, per se, is about C/C++, but it seems Fortran volatile was
> modeled after C/C++, and the C/C++ world has much longer experience
> with volatiles, and more so, compilers are likely to view Fortran/C/C++
> volatiles as equivalent in the optimizers)
>
The Fortran world (in extensions) had volatiles long before there was a
C language. It just didn't get standardized.
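
To make the V/W question quoted above concrete, here is a minimal sketch using the standardized (Fortran 2003) VOLATILE attribute. The module and function names are made up for illustration. It only shows where the attribute goes and which accesses it covers. It does not settle how a given compiler orders accesses to the non-volatile W relative to V.

```fortran
module volatile_sketch
  implicit none
contains
  ! Hypothetical helper: v is VOLATILE, w (the result) is an ordinary variable.
  function read_twice(start) result(w)
    real, intent(in) :: start
    real :: w
    real, volatile :: v   ! every reference to v should be a real memory access

    v = start             ! store to v
    w = v + 2.0           ! load of v; w itself carries no VOLATILE guarantee
    v = start             ! same value again; VOLATILE says store it anyway
    w = w + v             ! a second load of v, not a reuse of the first one
  end function read_twice
end module volatile_sketch

program volatile_demo
  use volatile_sketch
  implicit none
  print *, read_twice(1.0)
end program volatile_demo
```

If the ordering of W relative to V matters, declaring W VOLATILE as well, as suggested in the quoted text, is the conservative choice.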

From: glen herrmannsfeldt on 1 May 2010 20:08

Ron Shepard <ron-shepard(a)nospam.comcast.net> wrote:
(snip, someone wrote)

>> >> call cpu_time(t1)
>> >> ... do some heavy processing in subroutines...
>> >> call cpu_time(t2)
(snip)

> Perhaps I was not clear. I was not saying that there should be
> restrictions on your declaration, I was pointing out that t1 and t2
> seem to be normal local variables that cannot be changed by anything
> other than the cpu_time() subroutine, so the VOLATILE attribute
> would not seem to have any effect on them.

(snip)

>> In other words, "volatile" is one way for the programmer to tell the
>> compiler, "don't try to be smart, just run the code in the sequence
>> written".

> I think you want subroutine calls to be executed in the sequence
> written in most cases. The only exception I can think of are pure
> subroutines that return constant values.

I may have posted this before, as it dates back to at least 1972.
It was documented by IBM for the optimizer in the Fortran H (and HX)
compilers:

      DO 11 I=1,10
        DO 12 J=1,10
    9     IF(B(I).LT.0) GO TO 11
   12   C(J)=SQRT(B(I))
   11 CONTINUE

(Indenting added)

Fortran H figures out that B(I), and so SQRT(B(I)), doesn't
change in the inner loop. (It seems to understand pure functions.)
It then moves the computation of SQRT(B(I)) out of the J loop.
There was no VOLATILE attribute, and it doesn't really apply here.
What does is reordering of expression evaluation. I presume
newer compilers will also recognize SQRT as pure, and move its
evaluation out of the loop. The problem is that the IF statement
is not moved out, and so the program can fail if B(I) is negative.
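
As a rough sketch of the point, in free-form Fortran rather than the original fixed form: the first routine below is the loop as written, and the second is a hand-hoisted version in which the test moves out of the inner loop together with the SQRT. The routine names and the test data are made up for illustration. The failure described above corresponds to hoisting the SQRT without the test.

```fortran
module hoist_sketch
  implicit none
contains
  ! The loop as written: SQRT is reached only when b(i) is non-negative.
  subroutine as_written(b, c)
    real, intent(in)    :: b(:)
    real, intent(inout) :: c(:)
    integer :: i, j
    outer: do i = 1, size(b)
      do j = 1, size(c)
        if (b(i) < 0.0) cycle outer   ! the GO TO 11 of the original
        c(j) = sqrt(b(i))
      end do
    end do outer
  end subroutine as_written

  ! Hand-hoisted form: the guard is hoisted along with the SQRT.
  subroutine hoisted_safely(b, c)
    real, intent(in)    :: b(:)
    real, intent(inout) :: c(:)
    integer :: i
    real :: s
    do i = 1, size(b)
      if (b(i) < 0.0) cycle           ! test first, once per i
      s = sqrt(b(i))                  ! computed once instead of once per j
      c = s                           ! whole-array store replaces the J loop
    end do
  end subroutine hoisted_safely
end module hoist_sketch

program hoist_demo
  use hoist_sketch
  implicit none
  real :: b(3), c1(4), c2(4)
  b = [4.0, -1.0, 9.0]                ! assumed data with one negative entry
  c1 = 0.0
  c2 = 0.0
  call as_written(b, c1)
  call hoisted_safely(b, c2)
  print *, all(c1 == c2)
end program hoist_demo
```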

> As I said before (or tried
> to say, but I was unclear), this optimization where "heavy
> processing in subroutines" is done before the first call to
> subroutine cpu_time() rather than afterwards seems to be a potential
> bug regardless of the VOLATILE attribute on the variables t1 and t2.
> In fact, you might just as well argue that adding the VOLATILE
> attribute to the variables gives the compiler *more* freedom to
> rearrange the code, not less, since the values are less
> deterministic than otherwise.

Maybe it is some (buggy) interaction with PURE and VOLATILE.
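
For reference, the timing pattern under discussion looks roughly like the sketch below. The module, the workload and the loop count are placeholders, not the original poster's code. Printing the result of the work keeps it observable, which is the usual way to stop it being discarded as dead code. It is not a guarantee about how any particular compiler orders the work relative to the two CPU_TIME calls.

```fortran
module timing_sketch
  implicit none
contains
  ! Placeholder for the "heavy processing in subroutines".
  subroutine heavy_work(n, total)
    integer, intent(in) :: n
    real, intent(out)   :: total
    integer :: i
    total = 0.0
    do i = 1, n
      total = total + sin(real(i))
    end do
  end subroutine heavy_work
end module timing_sketch

program time_it
  use timing_sketch
  implicit none
  real :: t1, t2, total

  call cpu_time(t1)
  call heavy_work(5000000, total)     ! the work we intend to time
  call cpu_time(t2)

  ! Using total here makes the work's result visible outside the timed region.
  print *, 'result:', total
  print *, 'elapsed CPU seconds:', t2 - t1
end program time_it
```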

>> Zero is predictable if the optimizer is used and "volatile" not used,

> I'm saying that the clause "and volatile not used" is superfluous in
> this sentence.

>> unless the compiler has the special knowledge that CPU_TIME is a
>> non-deterministic subroutine.

>> In the case that I cited, the compiler writers allowed VOLATILE to be
>> run through the compiler without complaint, but as a NOP.

It seems that the PL/I attribute ABNORMAL is also ignored by
most compilers. That is beside the fact that programs written
in PL/I can legally change variables at unusual times, such as
in ON units processing exceptions.

> That would be correct. In this situation, it is a NOP, the
> variables t1 and t2 cannot change in any way that the compiler
> cannot see. The declaration has nothing to do with the bug. You
> would want the values of t1 and t2 to be set correctly even without
> the VOLATILE attribute.
(snip)

-- glen

From: JB on 4 May 2010 11:02

On 2010-05-02, Gordon Sande <Gordon.Sande(a)EastLink.ca> wrote:
> On 2010-05-02 16:49:11 -0300, "Gary L. Scott" <garylscott(a)sbcglobal.net> said:
>> I've never quite understood how cache refreshing works. We once had a
>> situation in which the hardware interrupt that updates cache wasn't
>> working (connector problem) (it did refresh under software control at
>> times, but symptoms were strange (I could get it to update by hitting
>> enter key on any RS-232 attached terminal for example)). So, the
>> peripheral device was writing data to the physical memory address but
>> it wasn't being reflected in cache as accessed by the host. It doesn't
>> seem practical for there to be an absolutely accurate reflection of the
>> physical memory state in cache unless 1) with each write to physical
>> memory it activates a cache refresh or 2) with each read to cache, it
>> activates a cache refresh which seems to destroy the benefit of having
>> high speed cache. How is this coordinated? Are there special rules
>> for multiport shared memory that is also being cached (I would assume
>> so)? What does this question have to do with Fortran?...well er...my
>> recording and analysis tools were written in Fortran...er...ok, it's off
>> topic, but related.
>
> Isn't that a standard topic called "cache coherency"? One sees it in
> the fine print of things like cpu chips that support multiple
> processors which have "cache coherency" logic while those that don't
> don't have it and so are at a lower price point. In the Intel line
> this is gamer's fast chips vs. the Xeon line which is a lot pricier.
> For Intel Macs this means that Apple uses the Xeons for both single
> and multiple processor versions but only has to have one motherboard
> design. But I guess they get a pretty good bulk discount for only
> ordering one part number.

A good overview is "What every programmer should know about memory" at

http://people.redhat.com/drepper/cpumemory.pdf

In this case, the short version is, as Glen hinted at, that memory can
be configured, with the help of so-called Memory Type Range Registers
(MTRR), in different modes, such as Write-Back (WB), Write-Through
(WT), Write-Combining (WC), and Uncached (UC). WB is the "usual" type
of memory that one normally uses, and requires that all devices
accessing it partake in some form of cache-coherency protocol. AFAIK,
all current multiprocessor/multicore capable x86 processors support
cache coherency; after all, it's needed even within a processor so
that the different cores can keep their private caches coherent
(non-cache-coherent shared memory machines have existed in the past,
but they are definitely rare nowadays). What the more expensive
processors have is pins for a high-speed external bus capable of
running the cache-coherency protocol, making cache coherent machines
with multiple sockets possible.

Now, peripheral devices generally don't support cache coherency (there
might be exceptions, but I can't offhand recall any), and thus when
mapping device memory that memory range has to be configured in some
other mode than WB. WT and WC mode are AFAIK used in practice only by
video cards, so for other devices it's UC.

--
JB
