From: Alexander Terekhov on

"Eric P." wrote:
>
> Alexander Terekhov wrote:
> >
> > My reading of the specs is that MFENCE is guaranteed to provide
> > store-load barrier.
> >
> > P1: X = 1; R1 = Y;
> > P2: Y = 1; R2 = X;
> >
> > (R1, R2) = (0, 0) is allowed under pure PC, but
> >
> > P1: X = 1; MFENCE; R1 = Y;
> > P2: Y = 1; MFENCE; R2 = X;
> >
> > (R1, R2) = (0, 0) is NOT allowed.
>
> Are you sure you are not being inconsistent in example 2 here?
> (wrt what you answered yesterday about S/LFENCE).

PC implies both LFENCE and SFENCE ordering constraints. I don't
think that you've got the invalidation stuff entirely accurate, but
the basic logic is correct.

>
> If MFENCE is just an SFENCE+LFENCE,

No.

SFENCE is a store-store barrier and LFENCE is a load-load barrier.

store-store + load-load != store-load.

MFENCE ensures that preceding writes are made globally visible
before subsequent reads are performed (store-load barrier)...
plus it imposes all other PC ordering constraints (load-load +
load-store + store-store).
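
For anyone who wants to poke at the second example, here is a minimal
sketch in portable C++11, not in raw x86. It assumes that
std::atomic_thread_fence(std::memory_order_seq_cst) stands in for MFENCE
(compilers for x86 typically emit MFENCE or a locked instruction for it,
but that is an assumption about code generation, not something the
language promises). The name count_both_zero is made up for the
illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the store-load litmus test `iterations` times
// and returns how many runs ended with (R1, R2) == (0, 0).
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> X{0}, Y{0};
        int r1 = -1, r2 = -1;

        // P1: X = 1; full fence; R1 = Y;
        std::thread p1([&] {
            X.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = Y.load(std::memory_order_relaxed);
        });

        // P2: Y = 1; full fence; R2 = X;
        std::thread p2([&] {
            Y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = X.load(std::memory_order_relaxed);
        });

        // join() synchronizes with thread completion, so r1 and r2
        // can be read here without a data race.
        p1.join();
        p2.join();

        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

With the fences in place the C++ memory model forbids the (0, 0) outcome,
so count_both_zero(N) should return 0 for any N. Take the two fence lines
out and (0, 0) becomes a permitted outcome, although a loop that starts
fresh threads every round may hit it only rarely.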

regards,
alexander.
From: Alexander Terekhov on
Hey Mr. andy.glew(a)intel.com,

you better fix the specs, really. It's not funny anymore.

http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/default.aspx

"When multiprocessor systems based on the x86 architecture were being
designed, the designers needed a memory model that would make most
programs just work, while still allowing the hardware to be reasonably
efficient. The resulting specification requires writes from a
single processor to remain in order with respect to other writes, but
does not constrain reads at all.

Unfortunately, a guarantee about write order means nothing if reads
are unconstrained. After all, it does not matter that A is written
before B if every reader reading B followed by A has reads reordered
so that the pre-update value of B and the post-update value of A is
seen. The end result is the same: write order seems reversed. Thus,
as specified, the x86 model does not provide any stronger guarantees
than the ECMA model.

It is my belief, however, that the x86 processor actually implements
a slightly different memory model than is documented. While this model
has never failed to correctly predict behavior in my experiments, and
it is consistent with what is publicly known about how the hardware
works, it is not in the official specification. New processors might
break it."
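
The A/B scenario from the quote can be sketched in C++11 as below. This
is only an illustration under assumptions: release/acquire ordering is
used to express "writes stay in order" and "reads stay in order" at the
language level, and the name count_stale_reads is hypothetical. It says
nothing about what any particular x86 part does with plain loads.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: the writer stores A and then B; the reader loads B
// and then A. Returns how many runs saw the new B together with the old A,
// i.e. the "write order seems reversed" outcome.
int count_stale_reads(int iterations) {
    int stale = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> A{0}, B{0};
        int a = -1, b = -1;

        // Writer: A is written before B; the release store keeps that order.
        std::thread writer([&] {
            A.store(1, std::memory_order_relaxed);
            B.store(1, std::memory_order_release);
        });

        // Reader: B is read before A; the acquire load keeps that order.
        std::thread reader([&] {
            b = B.load(std::memory_order_acquire);
            a = A.load(std::memory_order_relaxed);
        });

        writer.join();
        reader.join();

        if (b == 1 && a == 0) {
            ++stale;  // new B observed with old A
        }
    }
    return stale;
}
```

With release on the write side and acquire on the read side, the stale
outcome is forbidden, so count_stale_reads(N) should return 0. If the
reader's load of B is relaxed instead, the language no longer rules the
outcome out, which is the article's point about a write-order guarantee
with unconstrained reads.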

regards,
alexander.
From: Joe Seigh on
Alexander Terekhov wrote:
> Hey Mr. andy.glew(a)intel.com,
>
> you better fix the specs, really. It's not funny anymore.
>
> http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/default.aspx
>
It's pretty clear from Andy's comments and from the technical documentation
that Intel's technical writers aren't entirely sure who their audience
actually is and mix up the specification, which is of interest to programmers,
and the implementation, which is of interest to engineers. Andy's last
comment, which appeared to me to be about implementation, certainly didn't
help.

It also doesn't help that Intel has a tradition of not architecting multi-processing
support up front and instead doing it on the fly as it adds multi-processing support, in clear
contrast to how other companies have documented multi-processing support in their
architectures. You had companies building Intel-based multi-processors before Intel
even supported multi-processing, which meant the memory model they implemented may
or may not have matched what Intel later documented as the official memory model.
This is apparently now a tradition and there's a comment to this effect in the Intel
documentation.

"Also, software should not depend on processor ordering in situations where
the system hardware does not support this memory-ordering model."


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.