From: gast128 on
Hello all,

When I wrote a little test application, I noticed that the memory
access was completely removed in release builds (both Visual Studio
2003 and 2008):

void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
{
    while (*pContinue)
    { }
}

In the disassembly:
void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
{
00981270 mov eax,dword ptr [esp+4]
00981274 mov eax,dword ptr [eax]
while (*pContinue)
00981276 test eax,eax
00981278 jne TestIntlIppiImplFlood+6 (981276h)
{
}
}

One can see that the value of *pContinue is loaded into eax only once,
before the loop, and that it is the register which gets tested. Even if
*pContinue is modified by another thread, the function never returns.

This is quite an optimization, but I think it is too aggressive. Of
course I can declare the pointed-to value volatile, but in effect that
means every variable shared between threads must get the volatile
keyword. I am aware that one should use boost::mutex or similar to
prevent data races, but this was just a simple test in which the
variable was atomically changed (through InterlockedIncrement) in one
thread and read in another thread.
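
To give an idea of the test setup, it was roughly like this (a
simplified sketch, not my actual code; the names are made up, and here
the second thread clears the flag with InterlockedExchange rather than
InterlockedIncrement so the intent is obvious):

#include <windows.h>

long g_continue = 1;    // shared flag, deliberately left non-volatile

void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
{
    while (*pContinue)  // in a release build the load is hoisted out of the loop
    { }
}

DWORD WINAPI WriterThread(LPVOID)
{
    Sleep(1000);
    InterlockedExchange(&g_continue, 0);  // atomic write from another thread
    return 0;
}

int main()
{
    HANDLE hThread = CreateThread(0, 0, WriterThread, 0, 0, 0);
    TestIntlIppiImplFlood(&g_continue);   // hangs forever in the optimized build
    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);
    return 0;
}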

Can anyone shed some light on this?

thx

From: John Keenan on
<gast128(a)hotmail.com> wrote:

> This is quite an optimization, but I think it is too agressive.

Sometimes you can use a do-nothing function to stop this optimization (you
must test with each compiler). For example:

void doNothing( long* pContinue )
{
    return;
}

Then add a call to doNothing to your original function:

void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
{
    while( *pContinue )
    {
        doNothing( pContinue );
    }
}

While a compiler could optimize this back to your original assembly
code, my experience is that today's compilers do not.
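
It seems to work more reliably when the definition of doNothing lives
in its own translation unit, so the optimizer cannot see that the call
has no effect (link-time code generation may still defeat this). A
rough sketch, with made-up file names:

// doNothing.h
void doNothing( long* pContinue );

// doNothing.cpp
void doNothing( long* pContinue )
{
    (void)pContinue;   // deliberately does nothing
    return;
}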

John



From: Igor Tandetnik on
gast128(a)hotmail.com wrote:
> When I wrote a little test application, I noticed that the memory
> access was completely removed in release builds (both vstudio 2003 as
> 2008):
>
> void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
> {
> while (*pContinue)
> { }
> }
>
> In dissambly:
> void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
> {
> 00981270 mov eax,dword ptr [esp+4]
> 00981274 mov eax,dword ptr [eax]
> while (*pContinue)
> 00981276 test eax,eax
> 00981278 jne TestIntlIppiImplFlood+6 (981276h)
> {
> }
> }
>
> One can notice that only the value of *pContinue is stored in eax, and
> this is tested. Even if pContinue is modified in another thread, the
> function never ends.

You should use the Interlocked* family of functions to access variables shared between threads. Alternatively, use proper synchronization primitives such as critical sections.

> This is quite an optimization, but I think it is too agressive. Ofc I
> can use volatile for the address, but in effect this means that every
> shared variable over threads must get the volatile keyword. I am aware
> that one should use boost::mutex or other stuff to prevent data race
> conditions, but this was just a simple test in which the variable was
> atomicly changed (thru InterlockedIncrement) in one thread and read in
> another thread.

Synchronizing access to shared data only works when all threads do it. It's pointless to do it in some places but not in others. Use InterlockedCompareExchange to atomically read your variable - like this:

while (InterlockedCompareExchange(pContinue, 0, 0)) {...}
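
Fleshed out a bit (a rough sketch; the writer side and the names are
illustrative, not your code):

#include <windows.h>

// reader thread: the flag is re-read atomically on every iteration,
// so the compiler cannot hoist the load out of the loop
void WaitWhileSet(long* pContinue)
{
    while (InterlockedCompareExchange(pContinue, 0, 0))
    {
        Sleep(0);   // yield the rest of the time slice instead of spinning hot
    }
}

// writer thread: clears the flag atomically
void Stop(long* pContinue)
{
    InterlockedExchange(pContinue, 0);
}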

--
With best wishes,
Igor Tandetnik

With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead. -- RFC 1925

From: gast128 on
Thx.

Yes, we can fool the optimizer by using dummy functions.

I am aware of the threading issues and that one should lock or
atomically exchange values. Even if the read isn't atomic, one might
expect the atomic write at least to flush the value to memory. I am not
sure whether that guarantees a correct read (I don't know whether the
processor updates the caches of all other processors after a memory
write; maybe multicore machines behave differently here than
multiprocessor machines do).

Still, the compiler has completely optimized away the read, so I was
wondering whether this is always correct. If I put any dummy object in
the call, the compiler already produces code in which the memory gets
accessed, so why did the compiler decide, in this simple case, to
optimize the memory access away entirely, and is that correct in all
cases?
From: Igor Tandetnik on
gast128(a)hotmail.com wrote:
> I am aware of the threading issues and that one should lock or
> atomically exchange values. Even if the read isn't atomic, one might
> expect that the atomic write at least flushes it to memory.

... but that doesn't mean that a different CPU reads it from memory and not, say, from its own cache.

> I am not
> sure if this guarantees a correct read

On many modern multicore architectures, it doesn't. See also http://en.wikipedia.org/wiki/Memory_barrier

> Still the compiler has completely optimized away the read, so I was
> wondering if this is always correct.

Yes. It's your responsibility to be careful with shared data, and use appropriate access patterns. You don't want the compiler to automatically penalize access to all variables in the program, just in case some of them are shared. That would effectively disable most optimizations.

> If I put any dummy object in the
> call, the compiler already produces code in which the memory gets
> accessed

I'm not sure what you mean by "dummy object". My guess is, you are putting a call into the loop whose source code the compiler doesn't see at this point. Now, even in a single-threaded program, it's possible to do this:

void DoSomething();   // defined in a different source file

void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
{
    while (*pContinue)
    {
        DoSomething();
    }
}

// in a different source file

long global_continue = 1;

void DoSomething()
{
    global_continue = 0;
}

// somewhere in the program
int main()
{
    TestIntlIppiImplFlood(&global_continue);
}

This effect is called "aliasing" ( http://en.wikipedia.org/wiki/Aliasing_(computing) ). The compiler has to assume the presence of aliasing unless proven otherwise (e.g. local variables whose address is never given out can't be aliased), and optimize accordingly.
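
For contrast, here is a case where the compiler can prove that no
aliasing is possible (a made-up example, not from your code):

void NoAliasing()
{
    long local = 1;
    long* p = &local;   // the address never leaves this function, so no
                        // other code (and no other thread) can legally modify *p
    while (*p)          // the compiler may prove *p is always 1 and compile
    { }                 // this as an unconditional loop, just like in your case
}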
--
With best wishes,
Igor Tandetnik

With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead. -- RFC 1925
