From: James Kanze on
On Mar 29, 7:46 am, Herb Sutter <herb.sut...(a)gmail.com> wrote:
> On Sun, 28 Mar 2010 15:25:46 CST, James Kanze <james.ka...(a)gmail.com>
> wrote:

> >On Mar 26, 12:33 am, Herb Sutter <herb.sut...(a)gmail.com> wrote:
> >> Please remember this: Standard ISO C/C++ volatile is useless
> >> for multithreaded programming. No argument otherwise holds
> >> water; at best the code may appear to work on some
> >> compilers/platforms, including all attempted counterexamples
> >> I've seen on this thread.

> >I agree with you in principle, but do be careful as to how
> >you formulate this. Standard ISO C/C++ is useless for
> >multithreaded programming, at least today. With or without
> >volatile. And in Standard ISO C/C++, volatile is useless for
> >just about anything;

> All of the above is still true in draft C++0x and C1x, both of
> which have concurrency memory models, threads, and mutexes.

Huh? "Standard ISO C/C++" is useless for multithreaded
programming today because as far as the standard is concerned,
as soon as there's more than one thread, you have undefined
behavior. Unless things changed a lot while I wasn't looking
(I've not been able to follow things too closely lately), C++0x
will define threading, and offer some very useful primitives for
multithreaded code. Considerably more than boost::thread, which
was already very, very useful.

> >it was always intended to be mainly a hook for implementation
> >defined behavior, i.e. to allow things like memory-mapped IO
> >while not imposing excessive loss of optimizing possibilities
> >everywhere.

> Right. And is therefore (still) deliberately underspecified.

As it should be.

> >In theory, an implementation could define volatile in a way that
> >would make it useful in multithreading---I think Microsoft once
> >proposed doing so in the standard.

> Yes, back in 2006 I briefly agreed with that before realizing
> why it was wrong (earlier in this thread you correctly said I
> supported it and then stopped doing so).

> >In my opinion, this sort of violates the original intention
> >behind volatile, which was that volatile is applied to a
> >single object, and doesn't affect other objects in the code.
> >But it's certainly something you could argue both ways.

> No, it's definitely wrong.

Well, I basically agree with you there. But there are degrees
of wrong: it's not wrong in the same sense as claiming that an
mfence instruction doesn't affect cache synchronization on an
Intel is wrong.

--
James Kanze

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Herb Sutter on
On Mon, 29 Mar 2010 15:22:35 CST, James Kanze <james.kanze(a)gmail.com>
wrote:
>while I've yet to find an exact specification for Windows, the
>implementation of volatile in VC++ 8.0 doesn't do enough to make
>it useful in threading, and Microsoft (in the voice of Herb
>Sutter) has said here that it isn't useful (although I don't
>know if Herb is speaking for Microsoft here, or simply
>expressing his personal opinion).

Not exactly; what I actually said was that in VC++ targeting x86/x64,
volatile was strengthened to add most (not all) ordering guarantees of
an atomic<>. It is enough to make most patterns safe including
Double-Checked Locking and reference counting, but not enough to make
examples like Dekker's safe.
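
For concreteness, here is a sketch of Dekker's algorithm written with
C++0x std::atomic<> (the names and demo harness are illustrative, not
from any shipping library). The default sequentially consistent loads
and stores supply exactly the global ordering of independent reads of
independent writes that the strengthened volatile does not:

```cpp
#include <atomic>
#include <thread>

// Dekker's algorithm for two threads. The default
// memory_order_seq_cst operations give a single global order of all
// the flag/turn accesses, which the algorithm's correctness requires.
std::atomic<bool> flag[2];
std::atomic<int> turn(0);
int counter = 0;  // shared data protected by the Dekker "lock"

void dekker_lock(int me) {
    int other = 1 - me;
    flag[me].store(true);                 // announce intent
    while (flag[other].load()) {          // contend?
        if (turn.load() != me) {
            flag[me].store(false);        // back off
            while (turn.load() != me) { /* spin until it's our turn */ }
            flag[me].store(true);         // retry
        }
    }
}

void dekker_unlock(int me) {
    turn.store(1 - me);                   // hand over the turn
    flag[me].store(false);
}

// Two threads each increment the shared counter under the Dekker lock;
// with correct mutual exclusion no increments are lost.
int run_dekker_demo(int iterations) {
    flag[0].store(false);
    flag[1].store(false);
    turn.store(0);
    counter = 0;
    std::thread t0([iterations] {
        for (int i = 0; i < iterations; ++i) {
            dekker_lock(0); ++counter; dekker_unlock(0);
        }
    });
    std::thread t1([iterations] {
        for (int i = 0; i < iterations; ++i) {
            dekker_lock(1); ++counter; dekker_unlock(1);
        }
    });
    t0.join();
    t1.join();
    return counter;
}
```

Replace the seq_cst defaults with acquire/release (or with the VC++
strengthened-volatile semantics) and the algorithm can lose increments,
because each thread may read the other's flag as false.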

But in retrospect strengthening volatile in this way to make it
useful for some/most inter-thread communication was a mistake, and has
several drawbacks: a) it does so at the cost of making volatile writes
slower which means some pessimization of 'normal' uses of volatile for
hardware access; b) it doesn't work for some thread communication
techniques that rely on a global ordering of independent reads of
independent writes to different objects; c) it doesn't make the
variables atomic so to make them useful they have to be aligned
variables of a type and size that happens to be naturally atomic on
the target processor; and of course d) it's not portable.

The right solution is to leave volatile alone and add std::atomic<>.
That's what C++0x does. Longer-term, that's what Visual C++ will go to
and recommend as well, with two caveats: first, now that we've shipped
volatile this way we'll probably have to keep supporting the
strengthened semantics for a long time (possibly forever?) to preserve
code that relies on those semantics (alas, the price of shipping
something is trying hard to not break customers that use it); and
second, atomic<> didn't make it into VS 2010 and so will have to await
a later release.
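
A sketch of what Double-Checked Locking looks like with std::atomic<>
in place of volatile (the Widget type and all names here are invented
for illustration):

```cpp
#include <atomic>
#include <mutex>

// Hypothetical lazily-initialized singleton payload.
struct Widget { int value; Widget() : value(42) {} };

std::atomic<Widget*> g_instance(0);
std::mutex g_init_mutex;

Widget* get_instance() {
    // Fast path: the acquire load synchronizes with the release store
    // below, so a non-null pointer implies a fully constructed Widget.
    Widget* p = g_instance.load(std::memory_order_acquire);
    if (p == 0) {
        std::lock_guard<std::mutex> lock(g_init_mutex);
        // Re-check under the lock; relaxed suffices, the mutex orders it.
        p = g_instance.load(std::memory_order_relaxed);
        if (p == 0) {
            p = new Widget;
            g_instance.store(p, std::memory_order_release);
        }
    }
    return p;
}
```

The release/acquire pair is the portable statement of what the
strengthened volatile provided on x86/x64, without dragging the
hardware-access semantics of volatile along with it.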

>On Mar 29, 7:45 am, "Leigh Johnston" <le...(a)i42.co.uk> wrote:
>> Performance is often cited as another reason to not use
>> volatile however

No "however" needed, that cited reason is correct. Volatile disables
optimizations that would be legal for an atomic<>. A quick example is
combining/eliding writes (e.g., v = 1; v = 2; can't be transformed to
v = 2;, but a = 1; a = 2; can be transformed to a = 2;). Another is
combining/eliding reads.
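
A minimal illustration of the difference; the distinction is visible
only in the generated code, not in the single-threaded result:

```cpp
#include <atomic>

volatile int v = 0;      // hardware-access semantics
std::atomic<int> a(0);   // inter-thread semantics

int elision_demo() {
    v = 1;       // must be emitted: every volatile write is observable
    v = 2;       // behavior, so the compiler may not drop the '1'
    a.store(1);  // may legally be combined with the next store, since
    a.store(2);  // no thread is guaranteed to observe the value 1
    return v + a.load();
}
```

Either way the function returns 4; the point is what the optimizer is
permitted to do with the intermediate stores.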

Herb


---
Herb Sutter (herbsutter.wordpress.com) (www.gotw.ca)

Convener, SC22/WG21 (C++) (www.gotw.ca/iso)
Architect, Visual C++ (www.gotw.ca/microsoft)


From: James Kanze on
On Mar 28, 10:05 pm, George Neuner <gneun...(a)comcast.net> wrote:
> On Thu, 25 Mar 2010 17:31:25 CST, James Kanze <james.ka...(a)gmail.com>
> wrote:

> >On Mar 25, 7:10 pm, George Neuner <gneun...(a)comcast.net> wrote:
> >> On Thu, 25 Mar 2010 00:20:43 CST, Andy Venikov

> > [...]
> >> As you noted, 'volatile' does not guarantee that an OoO CPU will
> >> execute the stores in program order ...

> >Arguably, the original intent was that it should. But it
> >doesn't, and of course, the ordering guarantee only applies to
> >variables actually declared volatile.

> "volatile" is quite old ... I'm pretty sure the "intent" was defined
> before there were OoO CPUs (in de facto use if not in standard
> document). Regardless, "volatile" only constrains the behavior of the
> *compiler*.

More or less. Volatile requires the compiler to issue code
that conforms to what the implementation's documentation says
it does. It requires all accesses to take place after the
preceding sequence point, and the results of those accesses to
be stable before the following sequence point. But it leaves it
up to the implementation to define what is meant by "access",
and most take a very, very liberal view of it.

> >> for that you need to add a write fence between them. However,
> >> neither 'volatile' nor write fence guarantees that any written
> >> value will be flushed all the way to memory - depending on
> >> other factors - cache snooping by another CPU/core, cache
> >> write back policies and/or delays, the span to the next use of
> >> the variable, etc. - the value may only reach to some level of
> >> cache before the variable is referenced again. The value may
> >> never reach memory at all.

> >If that's the case, then the fence instruction is seriously
> >broken. The whole purpose of a fence instruction is to
> >guarantee that another CPU (with another thread) can see the
> >changes.

> The purpose of the fence is to sequence memory accesses.

For a much more rigorous definition of "access" than that used
by the C++ standard.

> All the fence does is create a checkpoint in the instruction
> sequence at which relevant load or store instructions
> dispatched prior to dispatch of the fence instruction will
> have completed execution.

That's not true for the two architectures whose documentation
I've studied, Intel and Sparc. To quote the Intel documentation
of MFENCE:

    Performs a serializing operation on all load and store
    instructions that were issued prior the MFENCE
    instruction. This serializing operation guarantees that
    every load and store instruction that precedes in
    program order the MFENCE instruction is globally visible
    before any load or store instruction that follows the
    MFENCE instruction is globally visible.

Note the "globally visible". Both Intel and Sparc guarantee
strong ordering within a single core (i.e. a single thread);
mfence or membar (Sparc) are only necessary if the memory will
also be "accessed" from a separate unit: a thread running on a
different core, or memory mapped IO.
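
In C++0x terms, the portable way to get this "globally visible"
guarantee between threads is std::atomic_thread_fence; a sketch of the
classic message-passing idiom (the function and variable names are
illustrative):

```cpp
#include <atomic>
#include <thread>

int g_data = 0;                  // plain, non-atomic payload
std::atomic<bool> g_ready(false);

int fence_demo() {
    g_data = 0;
    g_ready.store(false);
    std::thread producer([] {
        g_data = 42;             // write the payload...
        // ...then guarantee it is visible before the flag is:
        std::atomic_thread_fence(std::memory_order_release);
        g_ready.store(true, std::memory_order_relaxed);
    });
    while (!g_ready.load(std::memory_order_relaxed)) { /* spin */ }
    // The acquire fence pairs with the release fence above, so the
    // payload write is visible once the flag has been observed.
    std::atomic_thread_fence(std::memory_order_acquire);
    int seen = g_data;
    producer.join();
    return seen;
}
```

On x86/x64 the release and acquire fences here compile to nothing but
compiler barriers; MFENCE is only needed for the full seq_cst fence.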

> There may be separate load and store fence instructions and/or
> they may be combined in a so-called "full fence" instruction.

> However, in a memory hierarchy with caching, a store
> instruction does not guarantee a write to memory but only that
> one or more write cycles is executed on the core's memory
> connection bus.

On Intel and Sparc architectures, a store instruction doesn't
even guarantee that. All it guarantees is that the necessary
information is somehow passed to the write pipeline. What
happens after that is anybody's guess.

> Where that write goes is up to the cache/memory controller and
> the policies of the particular cache levels involved. For
> example, many CPUs have write-thru primary caches while higher
> levels are write-back with delay (an arrangement that allows
> snooping of either the primary or secondary cache with
> identical results).

> For another thread (or core or CPU) to perceive a change a
> value must be propagated into shared memory. For all
> multi-core processors I am aware of, the first shared level of
> memory is cache - not main memory. Cores on the same die
> snoop each other's primary caches and share higher level
> caches. Cores on separate dies in the same package share
> cache at the secondary or tertiary level.

And on more advanced architectures, there are cores which don't
share any cache. All of which is irrelevant, since simply
issuing a store instruction doesn't even guarantee a write to
the highest level cache, whereas a membar or a fence instruction
guarantees visibility all the way down to the main, shared
memory.

[...]
> >The reason volatile doesn't work with memory-mapped
> >peripherals is because the compilers don't issue the
> >necessary fence or membar instruction, even if a variable is
> >volatile.

> It still wouldn't matter if they did. Lets take a simple case of one
> thread and two memory mapped registers:

> volatile unsigned *regA = 0x...;
> volatile unsigned *regB = 0x...;
> unsigned oldval, retval;

> *regA = SOME_OP;
> *regA = SOME_OP;

> oldval = *regB;
> do {
>     retval = *regB;
> } while ( retval == oldval );

> Let's suppose that writing a value twice to regA initiates
> some operation that returns a value in regB. Will the above
> code work?

Not on a Sparc. Probably not on an Intel, but I'm less sure.
It wouldn't surprise me if Intel did allow certain segments to
be configured with an implicit fence around each access, and if
the memory mapped IO were in such a segment, it would work.

> No. The processor will execute both writes, but the cache
> will combine them so the device will see only a single write.
> The cache needs to be flushed between writes to regA.

Again, the cache is really irrelevant here. The combining will
already occur in the write pipeline.

[...]
> The upshot is this:
> - "volatile" is required for any CPU.

I'm afraid that doesn't follow from anything you've said.
Particularly because volatile is largely a no-op on most
current compilers: it inhibits compiler optimizations, but the
generated code does nothing to prevent the reordering that
occurs at the hardware level.

> - fences are required for an OoO CPU.

By OoO, I presume you mean "out of order". That's not the only
source of the problems.

> - cache control is required for a write-back cache between
> CPU and main memory.

The cache is largely irrelevant on Sparc or Intel. The
processor architectures are designed in a way to make it
irrelevant. All of the problems would be there even in the
absence of caching. They're determined by the implementation of
the write and read pipelines.

--
James Kanze


From: Leigh Johnston on


"Herb Sutter" <herb.sutter(a)gmail.com> wrote in message news:j462r55lr984u1vg0jttl47upeutfcardg(a)4ax.com...
<snip>
>>On Mar 29, 7:45 am, "Leigh Johnston" <le...(a)i42.co.uk> wrote:
>>> Performance is often cited as another reason to not use
>>> volatile however
>
> No "however" needed, that cited reason is correct. Volatile disables
> optimizations that would be legal for an atomic<>. A quick example is
> combining/eliding writes (e.g., v = 1; v = 2; can't be transformed to
> v = 2;, but a = 1; a = 2; can be transformed to a = 2;). Another is
> combining/eliding reads.
>

If atomic reads can be elided, won't that be problematic for using atomics with the double-checked locking pattern (so we are back to using volatile atomics)?

/Leigh


From: Andy Venikov on
James Kanze wrote:
> On Mar 29, 7:45 am, "Leigh Johnston" <le...(a)i42.co.uk> wrote:
>> I agree with what Andy said elsewhere in this thread:
>
>> "Is volatile sufficient - absolutely not.
>> Portable - hardly.
>> Necessary in certain conditions - absolutely."
>
> Yes, but Andy didn't present any facts to back up his statement.
>
> The simplest solution would be to just post a bit of code
> showing where or how it might be useful. A good counter example
> trumps every argument.


I just did in my reply to Herb Sutter.
Sorry; if I had read your post earlier, I would've put my example here - it
makes more contextual sense.

>
> --
> James Kanze
>

Andy.
