From: Andy Venikov on
Andy Venikov wrote:
<snip>
> Here's the final code:
>
> struct Node
> {
>     <unspecified> data;
>     Node volatile * pNext;
> };
>
> Node volatile * volatile head_;
> Node volatile * volatile tail_;
>
> dequeue()
> {
>     while (true)
>     {
>         Node volatile * localHead = head_;
>         Node volatile * localTail = tail_;
>         DataDependencyBarrier();
>         Node volatile * localNext = localHead->pNext;
>
>         if (localHead == head_)
>         {
>             ...
>         }
>         ....
>     }
> }
>

Of course I missed the LoadLoad barrier before the if statement...
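For what it's worth, here is roughly where that barrier would sit, sketched in C++11 with std::atomic standing in for the volatile pointers and std::atomic_thread_fence standing in for the DataDependencyBarrier()/LoadLoad macros. The names and the empty-queue handling are my own simplifications, not the original code:

```cpp
#include <atomic>

struct Node {
    int   data;    // <unspecified> in the original
    Node* pNext;
};

std::atomic<Node*> head_{nullptr};
std::atomic<Node*> tail_{nullptr};

// Sketch only: the elided parts of the original are replaced by a
// simplistic "head == tail means empty" check.
Node* dequeue() {
    for (;;) {
        Node* localHead = head_.load(std::memory_order_relaxed);
        Node* localTail = tail_.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire); // DataDependencyBarrier()
        if (!localHead) return nullptr;
        Node* localNext = localHead->pNext;
        std::atomic_thread_fence(std::memory_order_acquire); // the missing LoadLoad barrier
        if (localHead == head_.load(std::memory_order_relaxed)) {
            if (localHead == localTail) return nullptr;      // simplified empty check
            head_.store(localNext, std::memory_order_release);
            return localHead;
        }
        // head_ moved under us; retry.
    }
}
```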

Andy.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Leigh Johnston on


"Herb Sutter" <herb.sutter(a)gmail.com> wrote in message
news:fli2r59v8lf47gav2qhcfiffcju0rqo9ka(a)4ax.com...
> On Mon, 29 Mar 2010 16:55:44 CST, "Leigh Johnston" <leigh(a)i42.co.uk>
> wrote:
>>"James Kanze" <james.kanze(a)gmail.com> wrote in message
>>news:36f7e40e-4584-430d-980e-5f7478728d16(a)z3g2000yqz.googlegroups.com...
>>>> Performance is often cited as another reason to not use
>>>> volatile however the use of volatile can actually help with
>>>> multi-threading performance as you can perform a safe
>>>> lock-free check before performing a more expensive lock.
>>>
>>> Again, I'd like to see how. This sounds like the double-checked
>>> locking idiom, and that's been proven not to work.
>>
>>IMO for an OoO CPU the double checked locking pattern can be made to work
>>with volatile if fences are also used or the lock also acts as a fence (as
>>is the case with VC++/x86). This is also the counter-example you are
>>looking for, it should work on some implementations. FWIW VC++ is clever
>>enough to make the volatile redundant for this example however adding
>>volatile makes no difference to the generated code (read: no performance
>>penalty)
>
> Are you sure? On x86 a VC++ volatile write is supposed to be emitted
> as xchg, whereas an ordinary write is usually emitted as mov. If the
> DCL control variable write is emitted as mov on x86 then DCL won't
> work correctly (well, it'll appear to work...).
>

Yes, on x86 VC++ (VC9) emits a MOV for a volatile write; however, entering
the critical section in the DCL should act as a fence, so it should work. I
asked this question (about VC++ volatile not emitting fences) in
microsoft.public.vc.language but didn't get a satisfactory reply.
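For concreteness, the pattern under discussion looks roughly like this. This is a sketch only, with an invented Widget type; modern C++ would use std::atomic or std::call_once rather than relying on volatile plus the lock's fencing behaviour:

```cpp
#include <mutex>

struct Widget { int value = 42; };   // hypothetical singleton payload

static std::mutex g_mutex;
static Widget* volatile g_instance = nullptr;  // volatile as discussed; not
                                               // sufficient by itself on all
                                               // compilers/architectures

Widget* instance() {
    Widget* p = g_instance;          // first, lock-free check
    if (!p) {
        std::lock_guard<std::mutex> lock(g_mutex);  // on VC++/x86 the lock
                                                    // doubles as a fence
        p = g_instance;              // second check, under the lock
        if (!p) {
            p = new Widget;
            g_instance = p;          // the write Herb is asking about
                                     // (MOV vs XCHG on VC++)
        }
    }
    return p;
}
```

Whether the plain MOV store to g_instance is ordered after the Widget's construction is exactly the point of contention here.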

/Leigh



From: Leigh Johnston on


"Leigh Johnston" <leigh(a)i42.co.uk> wrote in message
news:MqidnVXvpZ-ngSzWnZ2dnUVZ7qidnZ2d(a)giganews.com...
<snip>
> IMO for an OoO CPU the double checked locking pattern can be made to work
> with volatile if fences are also used or the lock also acts as a fence (as
> is the case with VC++/x86). This is also the counter-example you are
> looking for, it should work on some implementations. FWIW VC++ is clever
> enough to make the volatile redundant for this example however adding
> volatile makes no difference to the generated code (read: no performance
> penalty) and I like making such things explicit, similar to how one uses
> const (it doesn't affect the generated output but documents the
> programmer's intentions). Which is better: to use volatile if there is no
> noticeable performance penalty, or to constantly check your compiler's
> generated assembler to verify the optimizer is not breaking things? The
> only volatile in my entire codebase is for the "status" of my "threadable"
> base class; I don't always acquire a lock before checking this status, and
> I don't fully trust that the optimizer won't cache it in all the cases
> that might crop up as I develop code. BTW I try to avoid singletons too,
> so I haven't found the need to use the double checked locking pattern
> AFAICR.
>

In case I was unclear: obviously using volatile can affect generated output
and performance in general, but in the specific DCL example that I tried on
VC++ (VC9) for x86 there was no difference.

/Leigh



From: George Neuner on
On Tue, 30 Mar 2010 05:05:24 CST, George Neuner <gneuner2(a)comcast.net>
wrote:

>Sparc also does not have separate load and store fences,

Whoops! Brain freeze. I forgot that Sparc does have separate load and
store fences, specified by parameters to MEMBAR.

That'll teach me to post when I'm tired.

George


From: James Kanze on
On Mar 30, 12:04 pm, Andy Venikov <swojchelo...(a)gmail.com> wrote:

[...]
>> Of course, for another thread to be guaranteed to see the results
>> of any store, it has to use a load fence, to ensure that the
>> values it sees are those after the load fence, and not some
>> value that it happened to pick up earlier.

> What I meant was that memory fence doesn't mean that the
> effects of a write will be immediately flushed to the main
> memory or effects of a read immediately read from the main
> memory.

Not meaning to be impolite or anything, but what you meant or
mean isn't really that relevant. The Intel specification says
that mfence guarantees global visibility. And if you're
programming on an Intel, that's the only definition that is
relevant.

> Generally, memory fence is merely a checkpoint to tell the
> processor not to reorder instructions around the fence.

Again, a fence will prevent reordering, but only as a
consequence of its fundamental requirements.

(I keep seeing mention here of instruction reordering. In the
end, instruction reordering is irrelevant. It's only one thing
that may lead to reads and writes being reordered. And what
mfence guarantees is strong memory---not just
instruction---ordering around it.)

> I don't remember what processor docs I've read (I believe it
> was Itanium) but here's for example what the docs said about a
> store fence: a store barrier would make sure that all the
> stores appearing before a fence would be stored in the
> write-queue before any of the stores that follow the fence.

That would be a very strange definition, since it would mean
that a store barrier would be useless, and that there would
never be a reason for using one. The Intel IA-32 documentation
says very clearly that all preceding writes will be globally
visible; the Sparc architecture specifications say basically the
same thing for a membar.

> In no way you're guaranteed that any of the stores are in main
> memory after the fence instruction was executed.

That's not the case for IA-32, nor for Sparc.

> For that you'd have to use a flush instruction.

I suppose a machine could require two instructions to achieve a
true fence, but it seems like a very awkward way of doing
things.
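To make the global-visibility point concrete, here is the classic Dekker-style litmus test, sketched in C++11. std::atomic_thread_fence(seq_cst) is the portable spelling of a full fence; on x86 it is typically emitted as mfence (or a locked instruction). With the fences in place, it is impossible for both threads to read 0:

```cpp
#include <atomic>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1 = -1, r2 = -1;   // each written by one thread, read after join

void threadA() {
    x.store(1, std::memory_order_relaxed);
    // Full (StoreLoad) fence: the store to x is globally visible
    // before the load of y.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    r1 = y.load(std::memory_order_relaxed);
}

void threadB() {
    y.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    r2 = x.load(std::memory_order_relaxed);
}
```

Remove the fences and r1 == 0 && r2 == 0 becomes a permitted outcome, on x86 included, which is precisely why a fence that only queued stores locally would be useless.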

--
James Kanze
