Intel x86 memory model question [Computer Architecture]

From: Joe Seigh on 6 Sep 2005 09:49

David Hopwood wrote:
> Joe Seigh wrote:
>
>> David Hopwood wrote:
>>>
>>> But OSes, thread libraries and language implementations *aren't*
>>> portable code.
>>
>>
>> I do not think that word means what you think it means.
>>
>> Note that I am an ex-kernel developer and have created enough
>> synchronization APIs that run on totally different platforms.
>
>
> You are totally missing the point. OSes, thread libraries and language
> implementations have some code that needs to be adapted to each hardware
> architecture. If the memory model were to change in future processors
> that are otherwise x86-like, this code would have to change. It's not a
> big deal, because this platform-specific code is maintained by people who
> know how to change it, and because there are few enough OSes, thread
> libraries, and language implementations for the total effort involved
> not to be very great. It would, however, be a big deal if existing x86
> *applications* stopped working on an otherwise x86-compatible processor.
>

I am talking about that. You insist on maintaining that I advocate
applications hardcode platform specific assembly code into their
source. I never have advocated that.

But when you design these APIs you have to have a pretty good idea
what kinds of things can be ported and what assumptions you are making
about the memory model. Since I've actually done this kind of stuff
I probably have a much better idea than you have what the actual issues
are.

And yes, there isn't any assumption about the memory model that can't
be broken by a hardware designer. The only thing that keeps hardware
companies from breaking widely used APIs like POSIX pthreads is they
might go out of business if they did. Hence, shorting Intel stock
might be a good idea if you believe they did do that. But saying
that we should only use widespread APIs and not ever create any
new ones is ridiculous.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

From: Eric P. on 6 Sep 2005 10:26

Alexander Terekhov wrote:
>
> My reading of the specs is that MFENCE is guaranteed to provide
> store-load barrier.
>
> P1: X = 1; R1 = Y;
> P2: Y = 1; R2 = X;
>
> (R1, R2) = (0, 0) is allowed under pure PC, but
>
> P1: X = 1; MFENCE; R1 = Y;
> P2: Y = 1; MFENCE; R2 = X;
>
> (R1, R2) = (0, 0) is NOT allowed.

Are you sure you are not being inconsistent in example 2 here?
(wrt what you answered yesterday about S/LFENCE).

If MFENCE is just an SFENCE+LFENCE, and neither of those guarantees
delivery or receipt of invalidates, then P1 can have a stale Y
and P2 a stale X. The MFENCE does nothing but prevent bypassing.

Eric

From: Eric P. on 6 Sep 2005 10:58

"Eric P." wrote:
>
> Alexander Terekhov wrote:
> >
> > My reading of the specs is that MFENCE is guaranteed to provide
> > store-load barrier.
> >
> > P1: X = 1; R1 = Y;
> > P2: Y = 1; R2 = X;
> >
> > (R1, R2) = (0, 0) is allowed under pure PC, but
> >
> > P1: X = 1; MFENCE; R1 = Y;
> > P2: Y = 1; MFENCE; R2 = X;
> >
> > (R1, R2) = (0, 0) is NOT allowed.
>
> Are you sure you are not being inconsistent in example 2 here?
> (wrt what you answered yesterday about S/LFENCE).
>
> If MFENCE is just an SFENCE+LFENCE, and neither of those guarantees
> delivery or receipt of invalidates, then P1 can have a stale Y
> and P2 a stale X. The MFENCE does nothing but prevent bypassing.
>
> Eric

Forget it, I see. With two processors Y can be stale on P1,
or X stale on P2, but not both.

Eric

From: Alexander Terekhov on 6 Sep 2005 11:29

"Eric P." wrote:
>
> Alexander Terekhov wrote:
> >
> > My reading of the specs is that MFENCE is guaranteed to provide
> > store-load barrier.
> >
> > P1: X = 1; R1 = Y;
> > P2: Y = 1; R2 = X;
> >
> > (R1, R2) = (0, 0) is allowed under pure PC, but
> >
> > P1: X = 1; MFENCE; R1 = Y;
> > P2: Y = 1; MFENCE; R2 = X;
> >
> > (R1, R2) = (0, 0) is NOT allowed.
>
> Are you sure you are not being inconsistent in example 2 here?
> (wrt what you answered yesterday about S/LFENCE).

PC implies both LFENCE and SFENCE ordering constraints. I don't
think you've got the invalidation details entirely accurate, but
the basic logic is correct.

>
> If MFENCE is just an SFENCE+LFENCE,

No.

SFENCE is store-store barrier and LFENCE is load-load barrier.

store-store + load-load != store-load.

MFENCE ensures that preceding writes are made globally visible
before subsequent reads are performed (store-load barrier)...
plus it imposes all other PC ordering constraints (load-load +
load-store + store-store).
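
For anyone who wants to try the fenced two-processor test above, here is a
minimal portable sketch. It does not issue MFENCE directly; it swaps in the
C++11 std::atomic_thread_fence(std::memory_order_seq_cst), which x86
compilers typically lower to MFENCE or a locked instruction. The function
name count_both_zero_fenced and the variable names are hypothetical, chosen
for illustration only.

```cpp
#include <atomic>
#include <thread>

// Runs the fenced store-buffering test `iterations` times and returns how
// often both threads read 0, i.e. how often (R1, R2) = (0, 0) was observed.
// The name and signature are illustrative assumptions, not from the thread.
int count_both_zero_fenced(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread p1([&] {
            x.store(1, std::memory_order_relaxed);                // X = 1
            std::atomic_thread_fence(std::memory_order_seq_cst);  // store-load barrier
            r1 = y.load(std::memory_order_relaxed);               // R1 = Y
        });
        std::thread p2([&] {
            y.store(1, std::memory_order_relaxed);                // Y = 1
            std::atomic_thread_fence(std::memory_order_seq_cst);  // store-load barrier
            r2 = x.load(std::memory_order_relaxed);               // R2 = X
        });

        p1.join();
        p2.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

Calling count_both_zero_fenced(1000) should return 0, because the C++ memory
model forbids both loads from missing both stores when each thread has a
sequentially consistent fence between its store and its load. Removing the
two fences makes (0, 0) a permitted outcome, although a loop this simple may
rarely or never show it in practice.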

regards,
alexander.

From: Alexander Terekhov on 14 Sep 2005 04:07

Hey Mr. andy.glew(a)intel.com,

you better fix the specs, really. It's not funny anymore.

http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/default.aspx

"When multiprocessor systems based on the x86 architecture were being
designed, the designers needed a memory model that would make most
programs just work, while still allowing the hardware to be reasonably
efficient. The resulting specification requires writes from a
single processor to remain in order with respect to other writes, but
does not constrain reads at all.

Unfortunately, a guarantee about write order means nothing if reads
are unconstrained. After all, it does not matter that A is written
before B if every reader reading B followed by A has reads reordered
so that the pre-update value of B and the post-update value of A is
seen. The end result is the same: write order seems reversed. Thus,
as specified, the x86 model does not provide any stronger guarantees
than the ECMA model.

It is my belief, however, that the x86 processor actually implements
a slightly different memory model than is documented. While this model
has never failed to correctly predict behavior in my experiments, and
it is consistent with what is publicly known about how the hardware
works, it is not in the official specification. New processors might
break it."
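
The write-order scenario in the quoted passage can be sketched in portable
C++ as follows. One thread writes A and then B; another reads B and then A.
The outcome that would make the write order appear reversed is the reader
seeing the new B together with the old A. This sketch assumes C++11
release/acquire atomics rather than any particular x86 guarantee, and the
function name count_reversed_observations is hypothetical.

```cpp
#include <atomic>
#include <thread>

// Runs the writer/reader pair `iterations` times and returns how often the
// reader saw the new value of B together with the old value of A.
// The name and signature are illustrative assumptions.
int count_reversed_observations(int iterations) {
    int reversed = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> a{0}, b{0};
        int seen_a = -1, seen_b = -1;

        std::thread writer([&] {
            a.store(1, std::memory_order_relaxed);
            // Release: the write to a cannot be ordered after this write to b.
            b.store(1, std::memory_order_release);
        });
        std::thread reader([&] {
            // Acquire: the read of a cannot be ordered before this read of b.
            seen_b = b.load(std::memory_order_acquire);
            seen_a = a.load(std::memory_order_relaxed);
        });

        writer.join();
        reader.join();
        if (seen_b == 1 && seen_a == 0) {
            ++reversed;
        }
    }
    return reversed;
}
```

With the release store and the acquire load in place the function should
always return 0. If both are weakened to std::memory_order_relaxed, the
language no longer rules out the reversed observation, which is the situation
the quoted passage describes for unconstrained reads.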

regards,
alexander.
