From: James Kanze on
On Mar 18, 10:32 pm, Joshua Maurice <joshuamaur...(a)gmail.com> wrote:
> On Mar 17, 8:16 pm, "Leigh Johnston" <le...(a)i42.co.uk> wrote:

[...]
> I can't recall for the life of me where I read it, but I seem
> to recall Andrei admitting that he misunderstood volatile, and
> learned of the error of his ways, possibly in conjunction with
> "C++ And The Perils Of Double-Checked Locking".

It was in a discussion in this group, although I don't remember
exactly when. The curious thing is that Andrei's techniques
actually work, not because of any particular semantics of
volatile, but because of the way it works in the type system;
its use caused type errors (much like the one the original
poster saw) if you attempted to circumvent the locking.

The misunderstanding of volatile is apparently widespread. To
the point that Microsoft actually proposed giving it the
required semantics to the standards committee. That didn't go
over very well, since it caused problems with the intended use
of volatile. The Microsoft representative (Herb Sutter, as it
happens) immediately withdrew the proposal, but I think they
intend to implement these semantics in some future compiler, or
perhaps have already implemented them in VC10. In defense of
the Microsoft proposal: the proposed semantics do make sense if
you restrict yourself to the world of application programs under
general purpose OS's, like Windows or Unix. And the semantics
actually implemented by volatile in most other compilers, like
g++ or Sun CC, are totally useless, even in the contexts for
which volatile was designed. At present, it's probably best to
class volatile in the same category as export: none of the
widely used compilers implement it to do anything useful.

[...]
> B- repeat my (perhaps unfounded) second hand information that
> volatile in fact on most current implementations does not make
> a global ordering of reads and writes.

Independently of what the standard says (and it does imply
certain guarantees, such as would be necessary, for example, to
use it for memory mapped IO), volatile has no practical
semantics in most current compilers (Sun CC, g++, VC++, at least
up through VC8.0).

--
James Kanze

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Leigh Johnston on


"Chris Vine" <chris(a)cvine--nospam--.freeserve.co.uk> wrote in message
news:ceum77-go6.ln1(a)cvinex--nospam--x.freeserve.co.uk...
> On Tue, 23 Mar 2010 08:05:28 CST
> "Leigh Johnston" <leigh(a)i42.co.uk> wrote:
> [snip]
>> Sometimes you have to use common sense:
>>
>> thread A:
>> finished = false;
>> spawn_thread_B();
>> while (!finished)
>> {
>>     /* do work */
>> }
>>
>> thread B:
>> /* do work */
>> finished = true;
>>
>> If finished is not volatile and compiler optimizations are enabled
>> thread A may loop forever.
>>
>> The behaviour of optimizing compilers in the real world can make
>> volatile necessary to get correct behaviour in multi-threaded
>> designs. You don't always have to use memory barriers or mutexes
>> when performing an atomic read of some state shared by more than one
>> thread.
>
> It is never "necessary" to use the volatile keyword "in the real world"
> to get correct behaviour because of "the behaviour of optimising
> compilers". If it is, then the compiler does not conform to the
> particular standard you are writing to. For example, all compilers
> intended for POSIX platforms which support pthreads have a
> configuration flag (usually "-pthread") which causes the locking
> primitives to act also as compiler barriers, and the compiler would be
> non-conforming if it did not both provide this facility and honour it.
>
> Of course, there are circumstances when you can get away with the
> volatile keyword, such as the rather contrived example you have given,
> but in that case it is pretty well pointless because making the
> variable volatile as opposed to using normal synchronisation objects
> will not improve efficiency. In fact, it will hinder efficiency if
> thread A has run out of work before thread B, because thread A will
> depend on a random future event on multi-processor systems, namely when
> the caches happen to synchronise to achieve memory visibility, in order
> to proceed.
>
> Chris
>

It is not a contrived example; I have similar code in my codebase:
.....
lock();
while (iSockets.empty() && is_running())
{
    unlock();
    Sleep(100);
    if (!is_running())
        return;
    lock();
}
.....

is_running() is an inline member function which returns the value of a
volatile member variable and shouldn't require a lock to query, as the
read is atomic on the platform I target (x86). It makes sense for this
platform and compiler (VC++) that I use volatile. Admittedly I could use
an event/wait primitive instead, but that doesn't make the above code
wrong for the particular use-case in question. I agree that for other
platforms and compilers this might be different. From what I understand
(and I agree), the advent of C++0x should see such volatiles disappear
in favour of std::atomic<>, but not everyone in the real world is using
C++0x, as the standard has not even been published yet.

/Leigh



From: James Kanze on
On Mar 20, 7:12 am, Ulrich Eckhardt <eckha...(a)satorlaser.com> wrote:
> Leigh Johnston wrote:
> > "Joshua Maurice" <joshuamaur...(a)gmail.com> wrote in message
> >news:900580c6-c55c-46ec-b5bc-1a9a2f0d76f5(a)w9g2000prb.googlegroups.com...
> >>> Obviously the volatile keyword may not cause a memory
> >>> barrier instruction to be emitted, but this is a side
> >>> issue. The combination of a memory barrier and volatile
> >>> makes multi-threaded code work.

> >> No. Memory barriers when properly used (without the
> >> volatile keyword) are sufficient.

> > No. Memory barriers are not sufficient if your optimizing
> > compiler is caching the value in a register: the CPU is not
> > aware that the register is referring to data being revealed
> > by the memory barrier.

> Actually, memory barriers in my understanding go both ways.
> One is to tell the CPU that it must not cache/optimise/reorder
> memory accesses. The other is to tell the compiler that it
> must not do so either.

Actually, as far as standard C++ is concerned, memory barriers
don't exist, so it's difficult to talk about them. In practice,
there are three ways to obtain them:

-- Inline assembler. See your compiler manual with regards to
   what it guarantees; the standard makes no guarantees here.
   A conforming implementation can presumably do anything it
   wants with the inline assembler, including move it over an
   access to a volatile variable. From a QoI point of view, the
   compiler either 1) assumes nothing about the assembler,
   considers that it might access any accessible variable, and
   ensures that the actual semantics of the abstract machine
   correspond to those specified in the standard, 2) reads and
   interprets the inline assembler, and so recognizes a fence
   or a memory barrier, and behaves appropriately, or 3)
   provides some means of annotating the inline assembler to
   tell the compiler what it can or cannot do.

-- Call a function written in assembler. This really comes
   down to exactly the same as inline assembler, except that
   it's a lot more difficult for the compiler to implement
   alternatives 2 or 3. (All compilers I know implement 1.)

-- Call some predefined system API. In this case, the
   requirements are defined by the system API. (This is the
   solution used by Posix, Windows and C++0x.)

--
James Kanze


From: Joshua Maurice on
On Mar 23, 7:05 am, "Leigh Johnston" <le...(a)i42.co.uk> wrote:
> Sometimes you have to use common sense:
>
> thread A:
> finished = false;
> spawn_thread_B();
> while (!finished)
> {
>     /* do work */
> }
>
> thread B:
> /* do work */
> finished = true;
>
> If finished is not volatile and compiler optimizations are enabled thread A
> may loop forever.
>
> The behaviour of optimizing compilers in the real world can make volatile
> necessary to get correct behaviour in multi-threaded designs. You don't
> always have to use memory barriers or mutexes when performing an atomic
> read of some state shared by more than one thread.

No. You must use proper synchronization to guarantee a "happens-before"
relationship, and volatile does not do that portably. Without proper
synchronization, the write to a variable in one thread, even a volatile
write, may never become visible to another thread, even by a volatile
read, on some real world systems.

"Common sense" would be to listen to the people who wrote the
compilers, such as Intel and gcc, to listen to the writers of the
standard who influence the compiler writers, such as the C++ standards
committee and their website, to listen to well respected experts who
have studied these things in far greater detail than you and I, to
read old papers and correspondence to understand the intention of
volatile (which does not include threading), etc. It is not "common
sense" to blithely ignore all of this and read into an ambiguous
definition in an unrelated standard to get your desired properties
(the C++03 standard does not mention threads, so it's not the relevant
standard to look at); it's actually quite unreasonable to do so.

Let me put it like this. Either you're writing on a thread-aware
compiler or you are not. On a thread-aware compiler, you can use the
standardized threading library, which will probably look a lot like
POSIX, WIN32, Java, and C++0x. It will include mutexes and condition
variables (or some rough equivalent, stupid WIN32), and possibly
atomic increments, atomic test and swap, etc. It will define a memory
model roughly compatible with the rest and include a strong equivalent
of Java's "happens-before" relationship. In which case, volatile has
no use (for threading) because the compiler is aware of the
abstractions and will honor them, including the optimizer. In the
other case, when you're using threads on a not-threads-aware compiler,
you're FUBAR. There are so many little things to get right to produce
correct assembly for threads that if the compiler is not aware of it,
even the most innocuous optimization, or even register allocation, may
entirely break your code. volatile may produce the desired result, and
it may not. This is entirely system dependent as you are not coding to
any standard, and thus not portable by any reasonable definition of
portable.

Also note that your (incorrect) reading of the C and C++ standards
gives no guarantee about reorderings between non-volatile and volatile
accesses, so if thread B in your example changed shared state, those
writes may be moved after the write to "finished". Thread A could then
see the write to "finished" but not the changes to the shared state, or
perhaps only a random portion of them: an inconsistent shared state,
which is begging for a crash. So, you could fully volatile-qualify all
of the shared state,
leading to a huge performance hit, or you could just use the
standardized abstractions which are guaranteed to work, which will
actually work, which will run much faster, and which are portable.

There seems to persist this "romanticized" ideal of "volatile" as
somehow telling the compiler to "shut up" and "just do it", a
sentiment noted by Andrei and Scott in "C++ And The Perils Of Double-
Checked Locking". Please, go read the paper and its cited sources.
They explain it so much better than I could. I'll link to it again
here:
http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf



From: James Kanze on
On Mar 22, 11:22 pm, "Bo Persson" <b...(a)gmb.dk> wrote:
> Leigh Johnston wrote:
> > "Andy Venikov" <swojchelo...(a)gmail.com> wrote in message
> >news:ho5s8u$52u$1(a)news.eternal-september.org...
> >>> I still must ask, really? That would mean that all shared
> >>> state must be volatile qualified, including internal class
> >>> members for shared data. Wouldn't that big a huge
> >>> performance hit when the compiler can't optimize any of
> >>> that? Could you even use prebuilt classes (which usually
> >>> don't have volatile overloads) in the shared data, like
> >>> say std::string, std::vector, std::map, etc.?

> >> Not at all!
> >> Most multi-threading issues are solved with mutexes,
> >> semaphores, condition variables and such. All of these
> >> are library calls. That means that using volatile in those
> >> cases is not necessary. It's only when you get into more
> >> esoteric parallel computing problems where you'd like to
> >> avoid a heavy-handed approach of mutexes that you enter the
> >> realm of volatile. In normal multi-threading solved with
> >> regular means there is really no reason to use volatile.

> > Esoteric? I would have thought independent, correctly
> > aligned (and therefore atomic) x86 variable reads
> > (fundamental types) without the use of a mutex are not
> > uncommon, making volatile not uncommon also on that
> > platform (on VC++) at least. I have exactly one volatile
> > in my entire codebase and that is such a variable. From
> > the MSDN (VC++) docs:
> > "The volatile keyword is a type qualifier used to declare
> > that an object can be modified in the program by something
> > such as the operating system, the hardware, or a
> > concurrently executing thread."

> > That doesn't seem esoteric to me! :)

> The esoteric thing is that this is a compiler specific
> extension, not something guaranteed by the language. Currently
> there are no threads at all in C++.

> Note that the largest part of the MSDN document is clearly
> marked "Microsoft Specific". It is in that part that the
> release and acquire semantics are defined.

Note too that at least through VC8.0, regardless of the
documentation, VC++ didn't implement volatile in a way that
would allow it to be used effectively for synchronization on a
multithreaded Windows platform. On some of the higher-performance
machines, you need a fence, or at least some use of the lock
prefix, and VC++ didn't generate these.

Microsoft has expressed its intent to implement these extended
semantics for volatile, however.

--
James Kanze
