From: Pete Becker on
On 2010-07-12 03:22:44 -0400, Andy said:

> Assuming I've got an atomic & fenced CAS instruction available via a
> library; is it possible to use this to write a lock free algorithm in C
> ++?
>
> Although the memory fence would prevent hardware re-ordering I'm
> worried that compiler could reorder memory access rendering the fence
> useless. For example consider the below, grossly simplified example:
>
> bool continue = true;
> while(continue) {
> while(isLocked) { }
> // atomically set isLocked to true if it has the value false
> if(AtomicTestAndSet(&isLocked, true, false)) {
> continute = false;
> }
> }
> SharedStdVector[3] += 10;
>
> Is there anything that prevents the compiler from reordering the
> access to SharedStdVector so that it is executed before mutex is
> aquired? Is there anything I can do to prevent such reordering?

It depends on the compiler. But C++0x will have atomic variables and
the implementation will take care of the compiler-specific techniques
to prevent reordering.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The Standard C++ Library Extensions: a Tutorial and Reference (www.petebecker.com/tr1book)


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Andy Venikov on
Andy wrote:
> Assuming I've got an atomic & fenced CAS instruction available via a
> library; is it possible to use this to write a lock free algorithm in C
> ++?

The short answer is yes.
Unfortunately, there's more to that than a simple answer.

First, any lock-free algorithm should be thoroughly analyzed prior to
trying to implement it in any language (for example, your algorithm is
not lock-free at all! More on that at the end of the post)

Second, it's very hard to write portable lock-free C++ code.
You are always at the mercy of the library that you're using to
implement atomic primitives. For example, does your "AtomicTestAndSet"
function feature acqure or release semantics? (i.e., does it issue a
relevant memory fence?) Does the library that you're using give you any
guarantees with respect to compiler cooperation (like pthread library does).

With advent of c++0x's atoimc<> and boost::atomic it's becoming easier
however.


>
> Although the memory fence would prevent hardware re-ordering I'm
> worried that compiler could reorder memory access rendering the fence
> useless. For example consider the below, grossly simplified example:
>
> bool continue = true;
> while(continue) {
> while(isLocked) { }
> // atomically set isLocked to true if it has the value false
> if(AtomicTestAndSet(&isLocked, true, false)) {
> continute = false;
> }
> }
> SharedStdVector[3] += 10;

This snippet has several problems.
First, even if we assume that AtomicTestAndSet has both acquire and
release semantics (i.e., it issues a full memory barrier), the access to
isLocked is not synchronized. The line "while (isLocked)" might go into
an infinite loop even if another thread updates isLocked to false. It
can happen due to the compiler optimization where the compiler is
allowed to treat all code as a single-threaded code, and isLocked will
be read from the main memory only once.
One way to fix that is to mark isLocked as volatile (beware though,
there has been a long thread on this topic about 3 months ago which I
had been a part of. Using volatile is not for a faint of heart and
should only be done by a library-writer
http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/fcef57c539a42ef5/be98b8ef87ff64a4?hl=en&tvc=1&q=is+alexandrescu+right#be98b8ef87ff64a4).
Better yet, make all access to isLocked atomic, i.e.
atomic<bool> isLocked; or similar

But the bigger issue in your code is that even if you can get it to
work, it's not lock-free.
Lock-free is not about avoiding mutexes or other synchronization
primitives. Lock-free is a property of an algorithm that guarantees
progress of the overall program at every step. I.e., that at least one
of the threads will make progress.

What you did is you wrote an implementation of a lock without resorting
to system-provided primitives. But it's a lock nevertheless. And you
have all the issues with locking - what if another thread stalls and
never updates isLocked to false? Also, you can dead-lock as easily as
with system-provided locks. Moreover, since usually system-provided
locks don't resort to busy-waiting as you do they will most likely have
a better performance as well. And, at least on Linux, taking an
uncontested lock is never a system call, just a single atomic operation.
I'm pretty sure that taking an uncontested lock on other operating
systems is optimized as well.

HTH,
Andy.


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Joshua Maurice on
On Jul 12, 6:22 am, Andy <amsk1...(a)gmail.com> wrote:
> Assuming I've got an atomic & fenced CAS instruction available via a
> library; is it possible to use this to write a lock free algorithm in C
> ++?
>
> Although the memory fence would prevent hardware re-ordering I'm
> worried that compiler could reorder memory access rendering the fence
> useless. For example consider the below, grossly simplified example:
>
> bool continue = true;
> while(continue) {
> while(isLocked) { }
> // atomically set isLocked to true if it has the value false
> if(AtomicTestAndSet(&isLocked, true, false)) {
> continute = false;
> }}
>
> SharedStdVector[3] += 10;
>
> Is there anything that prevents the compiler from reordering the
> access to SharedStdVector so that it is executed before mutex is
> aquired? Is there anything I can do to prevent such reordering?

The short version is that C++ standard says nothing about threading.
All threading and synchronization issues are covered by other
standards. What tends to happen in practice is that the compiler will
be written with a particular threading library and model in mind, such
as POSIX. When you write POSIX code, you are not just using a library.
A library alone is insufficient to offer the needed guarantees because
the guarantees are not expressible in C++. That is, it's not just a
POSIX library. It's a POSIX compliant library \and compiler\.

Thus, I would suggest you look at the docs for this library which
offers the threading primitives, and look at the docs for your
compiler (if any). Also, you look at the output assembly to make sure
it's doing what it should be doing.


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Yaser Zhian on
On Jul 12, 5:22 pm, Andy <amsk1...(a)gmail.com> wrote:
> Assuming I've got an atomic & fenced CAS instruction available via a
> library; is it possible to use this to write a lock free algorithm in C
> ++?
>
> Although the memory fence would prevent hardware re-ordering I'm
> worried that compiler could reorder memory access rendering the fence
> useless. For example consider the below, grossly simplified example:
>
> bool continue = true;
> while(continue) {
> while(isLocked) { }
> // atomically set isLocked to true if it has the value false
> if(AtomicTestAndSet(&isLocked, true, false)) {
> continute = false;
> }}
>
> SharedStdVector[3] += 10;
>
> Is there anything that prevents the compiler from reordering the
> access to SharedStdVector so that it is executed before mutex is
> aquired? Is there anything I can do to prevent such reordering?
>

I am the first to admit that I am not well-versed in these matters,
but in theory (and in practice as far as I have experienced) you
should be able to use the "volatile" keyword when declaring the
involved variables to signal to the compiler that their values
might change in some other context (e.g. some other thread) and it
should not try to mess around with the order of the statements
involving them.

For example, you could (and you probably should) declare "isLocked"
and SharedStdVector as "volatile", like this:

volatile bool isLocked = false;
volatile std::vector<int> SharedStdVector;

I don't know whether a volatile attribute on a non-POD class has
any effect or what kind. I have never tried that, in fact. But
it should work for simple data types.


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Andy on
> > bool continue = true;
> > while(continue) {
> > while(isLocked) { }
> > // atomically set isLocked to true if it has the value false
> > if(AtomicTestAndSet(&isLocked, true, false)) {
> > continute = false;
> > }
> > }
> > SharedStdVector[3] += 10;
>
> This snippet has several problems.
[snip]
> But the bigger issue in your code is that even if you can get it to
> work, it's not lock-free.

Thanks for your comments; I realize the example was not lock free. It
was really just intended to make the re-ordering issues I was
concerned about explicit, and a simple spin mutex implementation is
the simplest, most obvious was to write a short piece of code to
illustrate this. I probably should have chosen a better example. I
must admit that I missed the "single read" optimization issue
(although if this was real code I hopefully wouldn't have done).

It seems to me that code such as this can best be made to work on non-C
++0X implementations by:

1) using an assembly library call to read isLocked before the CAS
thereby preventing compiler optimization of the read. I definitely
don't want the volatile modifier here as on some platforms I would
also get an unwanted fence (e.g. on MSVC), and the whole point of this
read is that it is non-fenced.

2) Ensuring that the CAS Instruction is defined to include a
"compiler memory barrier" in addition to the processor one. Code like
this be proven to work on non-C++0X compilers where there are no such
compiler barriers.

On C++0X implementations this should work with std::atomic (although I
haven't yet checked the guarantees this new class makes yet).

Is that correct?


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]