From: Ulrich Eckhardt on
Leigh Johnston wrote:
> "Joshua Maurice" <joshuamaurice(a)gmail.com> wrote in message
> news:900580c6-c55c-46ec-b5bc-1a9a2f0d76f5(a)w9g2000prb.googlegroups.com...
>>> Obviously the volatile keyword may not cause a memory barrier
>>> instruction to be emitted but this is a side issue. The combination
>>> of a memory barrier and volatile makes multi-threaded code work.
>>
>> No. Memory barriers when properly used (without the volatile keyword)
>> are sufficient.
>>
>
> No. Memory barriers are not sufficient if your optimizing compiler is
> caching the value in a register: the CPU is not aware that the register is
> referring to data being revealed by the memory barrier.

Actually, memory barriers in my understanding go both ways. One is to tell
the CPU that it must not cache/optimise/reorder memory accesses. The other
is to tell the compiler that it must not do so either. You can add the
former via libraries to an existing compiler, but you can't do the latter
without compiler support. That said, volatile often had the same effect as
part 2 of the puzzle in legacy compilers, so smart hackers simply used
that.

> I never said volatile was a panacea but is something that is probably
> required when using an optimizing compiler. If your C++ compiler has
> memory barrier intrinsics it might be able to ensure volatile is not
> required but this is also non-standard.

If your compiler is aware of multithreading, you don't need volatile. If it
isn't, even volatile doesn't guarantee that it will work. At the very best,
using volatile for the compiler and some other instructions for the CPU
works as a workaround to get a non-thread-aware compiler to play nice.
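The two "directions" described above were later given standard names in C++11 (which postdates this thread); a minimal sketch of the distinction, using only standard facilities:

```cpp
#include <atomic>

// Compiler-only barrier: constrains reordering by the compiler,
// typically emitting no machine instruction at all.
inline void compiler_only_barrier() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Full fence: constrains the compiler *and* emits a CPU fence
// on hardware that needs one.
inline void full_memory_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}
```

The first covers "tell the compiler it must not do so either"; the second covers both halves of the puzzle at once.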

Uli

--
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: red floyd on
On Mar 19, 2:06 am, "Leigh Johnston" <le...(a)i42.co.uk> wrote:
> That was my point: volatile, whilst not a solution in itself, is a "part" of
> a solution for multi-threaded programming when using an optimizing C++
> (current standard) compiler:
>
> thread A:
> finished = false;
> spawn_thread_B();
> while (!finished)
> {
>     /* do work */
> }
>
> thread B:
> /* do work */
> finished = true;
>
> If finished is not volatile and compiler optimizations are enabled thread A
> may loop forever.

Agreed. I've seen this in non-threaded code with memory-mapped I/O.
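For comparison, here is a hypothetical rendering of the same pattern using std::atomic<bool> from the then-upcoming C++0x, which gives both the "don't cache it in a register" behaviour and the ordering guarantee that volatile alone lacks (names are mine):

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> finished{false};
int work_result = 0;  // ordinary, non-atomic shared data

void thread_b() {
    work_result = 42;                                 // do work
    finished.store(true, std::memory_order_release);  // then publish it
}

int run() {
    finished.store(false);
    std::thread b(thread_b);
    // The acquire load is re-read from memory on every iteration,
    // so this loop cannot spin forever on a stale registered value.
    while (!finished.load(std::memory_order_acquire)) {
        /* do work */
    }
    b.join();
    return work_result;  // the release/acquire pair makes 42 visible here
}
```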



From: Tony Jorgenson on
> >You are incorrect to claim that volatile as defined by the current C++
> >standard has no use in multi-threaded programming. Whilst volatile does not
> >guarantee atomicity nor memory ordering across multiple threads the fact
> >that it prevents the compiler from caching values in registers is useful and
> >perhaps essential.

You seem to be saying that volatile can be useful for multi-threaded
code?
(See questions below)

> Yes, volatile does that. Unfortunately, that is necessary but not
> sufficient for inter-thread communication to work correctly. Volatile
> is for hardware access; std::atomic<T> is for multithreaded code
> synchronized without mutexes.

I understand that volatile does not guarantee that the order of memory
writes performed by one thread are seen in the same order by another
thread doing memory reads of the same locations. I do understand the
need for memory barriers (mutexes, atomic variables, etc) to guarantee
order, but there are still 2 questions that have never been completely
answered, at least to my satisfaction, in all of the discussion I have
read on this group (and the non moderated group) on these issues.

First of all, I believe that volatile is supposed to guarantee the
following:

Volatile forces the compiler to generate code that performs actual
memory reads and writes rather than caching values in processor
registers. In other words, I believe that there is a one-to-one
correspondence between volatile variable reads and writes in the
source code and actual memory read and write instructions executed by
the generated code. Is this correct?

Question 1:
My first question is with regard to using volatile instead of memory
barriers in some restricted multi-threaded cases. If my above
statements are correct, is it possible to use _only_ volatile with no
memory barriers to signal between threads in a reliable way if only a
single word (perhaps a single byte) is written by one thread and read
by another?

Question 1a:
First of all, please correct me if I am wrong, but I believe volatile
_must_always_ work as described above on any single core CPU. One CPU
means one cache (or one hierarchy of caches) meaning one view of
actual memory through the cache(s) that the CPU sees, regardless of
which thread is running. Is this much correct for any CPU in
existence? If not please mention a situation where this is not true
(for single core).

Question 1b:
Secondly, the only way I could see this not working on a multi-core
CPU, with individual caches for each core, is if a memory write
performed by one CPU is allowed to never be updated in the caches of
other CPU cores. Is this possible? Are there any multi-core CPUs that
allow this? Doesn't the MESI protocol guarantee that eventually memory
cached in one CPU core is seen by all others? I know that there may be
delays in the propagation from one CPU cache to the others, but
doesn't it eventually have to be propagated? Can it be delayed
indefinitely due to activity in the cores involved?

Question 2:
My second question is with regard to whether volatile is necessary for
multi-threaded code in addition to memory barriers. I know that it has
been stated that volatile is not necessary in this case, and I do
believe this, but I don't completely understand why. The issue as I
see it is that using memory barriers, perhaps through use of mutex OS
calls, does not in itself prevent the compiler from generating code
that caches non-volatile variable writes in registers. I have heard it
written in this group that posix, for example, supports additional
guarantees that make mutex lock/unlock (for example) sufficient for
correct inter-thread communication through memory without the use of
volatile. I believe I read here once (from James Kanze I believe) that
"volatile is neither sufficient nor necessary for proper multi-
threaded code" (quote from memory). This seems to imply that posix is
in cahoots with the compiler to make sure that this works. If you add
mutex locks and unlocks (I know RAII, so please don't derail my
question) around some variable reads and writes, how do the mutex
calls force the compiler to generate actual memory reads and writes in
the generated code rather than register reads and writes?

I understand that compilation optimization affects these issues, but
if I optimize the hell out of my code, how do posix calls (or any
other OS threading calls) force the compiler to do the right thing? My
only conjecture is that this is just an accident of the fact that the
compiler can't really know what the mutex calls do and therefore the
compiler must make sure that all globally accessible variables are
pushed to memory (if they are in registers) in case _any_ called
function might access them. Is this what makes it work? If not, then
how do mutex calls guarantee the compiler doesn't cache data in
registers? Without that, mutexes would surely be worthless without
volatile, and I know from experience that they are not.
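The conjecture above is essentially the standard answer for separately-compiled lock functions: the compiler cannot see into an opaque call, so it must assume the callee may read or write any object whose address has escaped (globals, address-taken locals), and therefore flushes registers before the call and re-reads after it. A sketch with C++11's std::mutex (which postdates this thread) standing in for the posix calls:

```cpp
#include <mutex>

std::mutex m;
int shared_value = 0;  // globally accessible, so its address "escapes"

int locked_update() {
    m.lock();
    shared_value = 1;  // must reach memory before m.unlock() ...
    m.unlock();        // ... since the opaque call might observe or modify it
    return shared_value;
}
```

Posix additionally specifies (in its memory-synchronization rules) that the lock functions synchronize memory, so on a conforming implementation this is a guarantee rather than an accident of separate compilation.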



From: Andy Venikov on
Andrei Alexandrescu wrote:

<snip>
> But by and large that's not sufficient to make sure things do work, and
> they will never work portably. Here's a good article on the topic:
>
> http://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/
>
>
> entitled eloquently "Volatile: Almost Useless for Multi-Threaded
> Programming". And here's another entitled even stronger 'Why the
> "volatile" type class should not be used':
>
> http://kernel.org/doc/Documentation/volatile-considered-harmful.txt
>
> The presence of the volatile qualifier in Loki is at best helpful but
> never a guarantee of correctness. I recommend Scott and my article on
> the topic, which was mentioned earlier in this thread:
>
> http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
>
> Bottom line: using volatile with threads is almost always a red herring.
>
>
> Andrei
>

Not in my wildest dreams would I think that I'd ever disagree with you,
but here goes....

While it's true that there's a widespread misconception that volatile
is a panacea for multi-threading issues, and it's true that by itself it
won't do anything to make multi-threaded programs safe, it's not correct
to say that it's totally useless for threading issues, as the
"volatile-considered-harmful.txt" article tries to imply.

In short, volatile is never sufficient, but often necessary to solve
certain multi-threading problems. These problems (such as writing
lock-free algorithms) require preventing re-ordering of statements.
Re-ordering can happen in two places: in the hardware, which is
mitigated with memory fences; and in the compiler, which is mitigated
with volatile. It's true that, depending on the memory fence library
you use, the compiler won't move code residing inside the fences to
the outside, but that's not always the case. If you use raw asm
statements, for example (even if you add "volatile" to the asm
keyword), a non-volatile variable is not guaranteed to stay inside the
fenced region unless you declare it volatile.

The advent of C++0x may well render it useless for multi-threading, but
up until now it has been necessary.


Thanks,
Andy.



From: Andy Venikov on
Joshua Maurice wrote:

>Leigh Johnston wrote:
>> Obviously the volatile keyword may not cause a memory barrier instruction to
>> be emitted but this is a side issue. The combination of a memory barrier
>> and volatile makes multi-threaded code work.
>

> No. Memory barriers when properly used (without the volatile keyword)
> are sufficient.

Sorry Joshua, but I think that's a wrong, or at least an incomplete,
statement.

It all depends on how memory barriers/fences are implemented. In the
same way that the C++ standard doesn't talk about threads, it doesn't
talk about memory fences. If a memfence call is implemented as a
library call, then yes, you will in essence get a compiler-level fence
directive, as none of the compilers I know of are allowed to move code
across a call to a library. But oftentimes memfences are implemented
as macros that expand to inline assembly. If you don't use volatile,
then nothing tells the compiler that it can't optimize the code and
move a read/write across the macroized memfence. This is especially
true on platforms that don't actually need hardware memfences (like
x86), since there the macro memfences will expand to nothing at all,
leaving nothing in your code that indicates a code-migration barrier.
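One nuance worth noting for GCC-family compilers: the `volatile` on an asm statement only keeps the statement itself from being deleted or reordered relative to other volatile operations; what actually forbids the compiler from caching or migrating ordinary memory accesses across the statement is the "memory" clobber. A minimal sketch (GCC/Clang extended-asm syntax):

```cpp
// An empty asm with a "memory" clobber: emits no instruction, but tells
// the compiler that memory may be read or written here, so stores before
// it must be emitted and values must be re-read after it.
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

int data = 0;

int demo() {
    data = 1;
    COMPILER_BARRIER();  // the store to 'data' cannot sink below this point
    return data;
}
```

A macroized memfence that omits the clobber (or expands to nothing on x86) gives no such guarantee, which is the gap volatile was being used to plug.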

So, is volatile sufficient? Absolutely not. Portable? Hardly.
Is it necessary in certain cases? Absolutely.


Thanks,
Andy.

