From: David Schwartz on
On Dec 24, 7:54 am, "novicki...(a)gmail.com" <novicki...(a)gmail.com>
wrote:

> Ok, I feel convinced, that it is NOT safe to read and write to a
> variable from multiple threads without locks even if you are not
> concerned with missing the latest updates to the values.  At the risk
> of starting a whole new thread, I guess the best performance would be
> to use atomic memory access functions like these:
> http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html or
> the equivalent.

Sadly, atomic operations on x86 CPUs tend to be ridiculously
expensive, on the order of 200 clock cycles or so. This is because
even a simple atomic increment implements a full CPU barrier. If
that's not a problem, then that's probably your best choice.

But in my experience, it's almost always possible to architect these
kinds of things out of all performance-critical paths. If your threads
need to exchange information so frequently that it's causing a
performance problem, you're probably doing something wrong.

DS
From: Chris M. Thomasson on
"David Schwartz" <davids(a)webmaster.com> wrote in message
news:5053e4d8-d400-4570-92da-581e57438392(a)v15g2000prn.googlegroups.com...
On Dec 24, 7:54 am, "novicki...(a)gmail.com" <novicki...(a)gmail.com>
wrote:

> > Ok, I feel convinced, that it is NOT safe to read and write to a
> > variable from multiple threads without locks even if you are not
> > concerned with missing the latest updates to the values. At the risk
> > of starting a whole new thread, I guess the best performance would be
> > to use atomic memory access functions like these:
> > http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html or
> > the equivalent.

> Sadly, atomic operations on x86 CPUs tend to be ridiculously
> expensive, on the order of 200 clock cycles or so. This is because
> even a simple atomic increment implements a full CPU barrier. If
> that's not a problem, then that's probably your best choice.

> But in my experience, it's almost always possible to architect these
> kinds of things out of all performance-critical paths. If your threads
> need to exchange information so frequently that it's causing a
> performance problem, you're probably doing something wrong.

FWIW, there is distributed counting algorithms in which each thread has it's
own counter. A thread simply mutates it's own counter. A reader thread sums
up all the counters from each thread in order to get the actual value. It
would scale far better than using a single global counter. There are places
where these types of algorithms can fit in fairly well.

From: David Schwartz on
On Dec 23, 10:49 pm, David Schwartz <dav...(a)webmaster.com> wrote:

> Sadly, there are similar known cases on x86. If I had the time, I'd
> track down the GCC bug report.

I found the example I was thinking of. If GCC sees code like this:

if(mutex!=NULL) acquire_lock(mutex);
for(int i=0; i<100; i++)
{
some_stuff(i);
if(mutex!=NULL) shared_variable+=i;
}
if(mutex!=NULL) release_lock(mutex);

GCC assumes that 'mutex' is most likely not NULL, so it can optimize
the code to:

saved_copy=shared_variable;
if(mutex!=NULL) acquire_lock(mutex);
for(int i=0; i<100; i++)
{
some_stuff(i);
shared_variable+=i;
}
if(mutex!=NULL) release_lock(mutex); else shared_variable=saved_copy;

See how this gets a conditional out of the loop? That can actually be
an optimization, especially since GCC assumes that a variable is
rarely NULL. However, this means that the shared variable is modified
(and put back) if mutex is NULL. Ack.

GCC does not promise that it won't write to variables your code flow
doesn't write to. :(

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31862

DS
From: Moi on
On Wed, 23 Dec 2009 20:43:10 -0800, David Schwartz wrote:

> On Dec 23, 8:34 pm, "novicki...(a)gmail.com" <novicki...(a)gmail.com> wrote:
>
>> However, I am still curious if due to the design of modern hardware in
>> general, it is possible to read a value for the variable that was never
>> actually set.  The code is trying to set values between 1 and 1
>> million.  But could you read a value of 8 million if 8 million is never
>> set.  The reason would be because you read an incompletely written
>> value at some point.  Or is the computer such that you can only read
>> complete values and never a partially written value.
>
> Most likely, word tearing would be the only way. I don't know of any
> POSIX platform where integers are subject to word tearing. The problem
> is that you would be producing software that might break when a new CPU,
> operating system, compiler, or something else comes out. On the bright
> side, at least on x86, a lot of software makes the assumption that read
> and writes to and from aligned integers are atomic (I think it may even
> be guaranteed by Intel in the x86 specification) so you certainly
> wouldn't be the only thing that would break.
>

In short: memory reads or writes are atomic for word-sized variables.
But "read-modify-write"s are not.

eg, you cannot use

while (i++) {i--;}
.... critical section here ...
i--;

as a spinlock.
I would take a look at the linux kernel source code to see
how locks and spinlocks are implemented on various platforms.

HTH,
AvK
From: David Schwartz on
On Dec 25, 5:05 am, Moi <r...(a)invalid.address.org> wrote:

> In short: memory reads or writes are atomic for word-sized variables.
> But "read-modify-write"s are not.

Inside the CPU, yes. But there's no guarantee the compiler will issue
a read just because your code calls for one and similarly no guarantee
the compiler will issue a write just because your code calls for one.
So the fact that read and write assembly operations are atomic on the
CPU doesn't help a C programmer much unless he has some way to get the
compiler to issue the right assembly instructions. (Which, of course,
there are ways of doing.)

DS