Intel x86 memory model question [Computer Architecture]

From: Joe Seigh on 30 Aug 2005 19:25

already5chosen(a)yahoo.com wrote:
> Joe Seigh wrote:
>
>>The question isn't what is the x86 memory model. If you
>>want to discuss that, you are welcome to join the fray on
>>c.p.t. The question is why can't or why doesn't Intel
>>want to document the x86 memory model since apparently
>>what is in the System Programming Guide is *not* the
>>memory model. I.e. not as far as program observable
>>behavior is concerned though it may be if you have
>>tracing scopes attached to the memory bus.
>>
>
>
> I don't understand what's particularly wrong with paragraph 7.2.2
> ftp://download.intel.com/design/Pentium4/manuals/25366816.pdf
> Could you be a bit more specific.

Some people interpret processor consistency as implying that
reads are in order, and read the statement
1. Reads can be carried out speculatively and in any order.
as applying only to speculative reads (the commit criterion
being that reads are in order at the time of commit).
>
>
>>Is this some kind of Intel State Secret? Is writing
>>correct multi-threaded programs not in Intel's interest?
>>
>
>
> Obviously, writing correct multi-threaded SMP programs is in Intel's
> interest. However, according to my understanding, Intel couldn't care
> less about _lockless_ multi-threaded SMP programs. The reasons are
> clear:
> 1. That's such a tiny niche!
> 2. Average programmer can't do it correctly regardless of the quality
> of documentation.
>

You package it as part of a (hopefully) easy-to-use API such as a
synchronized queue (which can use locks or be lock-free in the
implementation).
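
A minimal sketch of that kind of packaging, assuming a lock-based implementation in C++11. The class name SyncQueue and its push/pop methods are made up for illustration and are not from any library mentioned in this thread. Callers only see push and pop, so the internals could later be swapped for a lock-free version without changing them.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical synchronized queue: the locking is an implementation
// detail hidden behind push() and pop().
template <typename T>
class SyncQueue {
public:
    // Enqueue one item and wake a single waiting consumer, if any.
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        not_empty_.notify_one();
    }

    // Block until an item is available, then dequeue and return it.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<T> items_;
};
```

A caller would declare something like SyncQueue<int> q, call q.push(1) from one thread and q.pop() from another, and never touch a memory barrier directly.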

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

From: Eric P. on 31 Aug 2005 11:06

Joe Seigh wrote:
>
> Joe Seigh wrote:
> >
> > processor 1 stores into X
> > processor 2 sees the store by 1 into X and stores into Y
> >
> > So by causal reasoning the store into Y occurred after the store into X.
> >
> > processor 3 loads from Y
> > processor 3 loads from X
> >
> > If loads were in order you could infer that if processor 3
> > sees the new value of Y then it will see the new value of X.
> > But the rules for processor consistency *clearly* state that
> > you will necessarily see stores by different processors in
> > order.
> that should be
>
> But the rules for processor consistency *clearly* state that
> you will not necessarily see stores by different processors in
> order.

I see what you are getting at, but for this to occur the new value
of Y would have to arrive at P3 before the new value of X from P1,
implying the msg from P2 to P3 somehow passed the msg from P1 to P3.
This would mean that no update order at all could be concluded
and the whole system would break.

Since they clearly do function, this is obviously not how they work :-)
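
For reference, the three-processor scenario quoted above can be sketched as a litmus test with C++11 atomics. This is only an illustration under stated assumptions: the function and struct names are hypothetical, and it uses explicit acquire/release operations, for which the C++ memory model guarantees the causal outcome. It does not settle what the IA-32 manuals promise for plain loads and stores, which is the point under dispute. On x86, compilers typically emit ordinary MOV instructions for these acquire loads and release stores.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// What "processor 3" observed: y is loaded first, then x.
struct WrcObservation {
    int y;
    int x;
};

// One run of the scenario: P1 stores X, P2 sees it and stores Y,
// P3 loads Y and then X.
WrcObservation run_wrc_once() {
    std::atomic<int> X{0};
    std::atomic<int> Y{0};
    WrcObservation obs{0, 0};

    std::thread p1([&] { X.store(1, std::memory_order_release); });

    std::thread p2([&] {
        // Wait until the store by P1 is visible, then store into Y.
        while (X.load(std::memory_order_acquire) != 1) {
        }
        Y.store(1, std::memory_order_release);
    });

    std::thread p3([&] {
        obs.y = Y.load(std::memory_order_acquire);
        obs.x = X.load(std::memory_order_acquire);
    });

    p1.join();
    p2.join();
    p3.join();
    return obs;
}
```

The outcome in question is obs.y == 1 with obs.x == 0. With acquire/release as written, C++ forbids it, because the synchronization chain from P1 through P2 to P3 is transitive. A run that never shows the outcome on real hardware proves nothing about the architecture, though.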

Eric

From: Joe Seigh on 31 Aug 2005 12:29

Eric P. wrote:
> Joe Seigh wrote:
>
>>Joe Seigh wrote:
>>
>>> processor 1 stores into X
>>> processor 2 sees the store by 1 into X and stores into Y
>>>
>>>So by causal reasoning the store into Y occurred after the store into X.
>>>
>>> processor 3 loads from Y
>>> processor 3 loads from X
>>>
>>>If loads were in order you could infer that if processor 3
>>>sees the new value of Y then it will see the new value of X.
>>>But the rules for processor consistency *clearly* state that
>>>you will necessarily see stores by different processors in
>>>order.
>>
>>that should be
>>
>>But the rules for processor consistency *clearly* state that
>>you will not necessarily see stores by different processors in
>>order.
>
>
> I see what you are getting at, but for this to occur the new value
> of Y would have to arrive at P3 before the new value of X from P1,
> implying the msg from P2 to P3 somehow passed the msg from P1 to P3.
> This would mean that no update order at all could be concluded
> and the whole system would break.
>
> Since they clearly do function, this is obviously not how they work :-)
>

It turns out the x86 memory model is defined, it's just not defined in the
IA-32 manuals which is where you would expect it to be defined. It's defined
in the Itanium manuals and is equivalent to Sparc TSO memory model.

2.1.2 Loads and Stores
In the Itanium architecture, a load instruction has either unordered or acquire semantics while a
store instruction has either unordered or release semantics. By using acquire loads (ld.acq) and
release stores (st.rel), the memory reference stream of an Itanium-based program can be made to
operate according to the IA-32 ordering model. The Itanium architecture uses this behavior to
provide IA-32 compatibility. That is, an Itanium acquire load is equivalent to an IA-32 load and an
Itanium release store is equivalent to an IA-32 store, from a memory ordering perspective.
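
To make the acquire/release pairing concrete, here is a small sketch in C++11 atomics rather than Itanium or IA-32 assembly. The function name and the payload value are made up for illustration. A release store publishes earlier writes, and an acquire load that sees the published flag is guaranteed to see those writes too. If the quoted paragraph is taken at face value, ordinary IA-32 stores and loads already behave this way.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical message-passing example: the producer writes a plain
// variable, then publishes it with a release store; the consumer
// spins on an acquire load and then reads the plain variable.
int message_passing_once() {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // publication flag
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // Release: the write to payload cannot be reordered after this.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // Acquire: the read of payload cannot be reordered before this.
        while (!ready.load(std::memory_order_acquire)) {
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Under the C++ memory model the consumer always reads 42 here. With relaxed ordering on the flag instead, the read of payload would be a data race.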

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

From: Seongbae Park on 31 Aug 2005 12:46

Joe Seigh <jseigh_01(a)xemaps.com> wrote:
....
> It turns out the x86 memory model is defined, it's just not defined in the
> IA-32 manuals which is where you would expect it to be defined. It's defined
> in the Itanium manuals and is equivalent to Sparc TSO memory model.
>
> 2.1.2 Loads and Stores
> In the Itanium architecture, a load instruction has either unordered or acquire semantics while a
> store instruction has either unordered or release semantics. By using acquire loads (ld.acq) and
> release stores (st.rel), the memory reference stream of an Itanium-based program can be made to
> operate according to the IA-32 ordering model. The Itanium architecture uses this behavior to
> provide IA-32 compatibility. That is, an Itanium acquire load is equivalent to an IA-32 load and an
> Itanium release store is equivalent to an IA-32 store, from a memory ordering perspective.

I suspect the above paragraph is stronger than what it really wanted to say.
It seems that the intention was to say
that Itanium can correctly emulate x86 by running effectively in a TSO mode,
since x86's memory model is not stronger than TSO.

On http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx:
> the memory model for X86 can be described as:
> 1. All stores are actually store.release.
> 2. All loads are normal loads.
> 3. Any use of the LOCK prefix (e.g. "LOCK CMPXCHG" or "LOCK INC") creates a full fence.
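
A sketch of why the full fence in rule 3 matters, written with C++11 atomics. The names are hypothetical and this is not taken from the blog post. If stores are only releases and loads are ordinary, each thread's load may pass its own earlier store to a different location, so both threads can read 0. Sequentially consistent operations rule that out. On x86, compilers commonly implement a seq_cst store with XCHG (which is implicitly locked) or with MOV followed by MFENCE.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Values read by the two threads in one run.
struct SbResult {
    int r1;
    int r2;
};

// Store-buffering (Dekker-style) test: each thread stores to its own
// flag and then loads the other thread's flag.
SbResult store_buffering_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    SbResult r{-1, -1};

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);    // store, then full ordering
        r.r1 = y.load(std::memory_order_seq_cst);
    });

    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r.r2 = x.load(std::memory_order_seq_cst);
    });

    t1.join();
    t2.join();
    return r;
}
```

With seq_cst as written, r1 == 0 and r2 == 0 together is forbidden. Weakening the stores to release and the loads to acquire would permit it, and that weaker behavior is what rules 1 and 2 alone describe.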
--
#pragma ident "Seongbae Park, compiler, http://blogs.sun.com/seongbae/"

From: Joe Seigh on 31 Aug 2005 15:45

Seongbae Park wrote:
> Joe Seigh <jseigh_01(a)xemaps.com> wrote:
> ...
>
>>It turns out the x86 memory model is defined, it's just not defined in the
>>IA-32 manuals which is where you would expect it to be defined. It's defined
>>in the Itanium manuals and is equivalent to Sparc TSO memory model.
>>
>> 2.1.2 Loads and Stores
>> In the Itanium architecture, a load instruction has either unordered or acquire semantics while a
>> store instruction has either unordered or release semantics. By using acquire loads (ld.acq) and
>> release stores (st.rel), the memory reference stream of an Itanium-based program can be made to
>> operate according to the IA-32 ordering model. The Itanium architecture uses this behavior to
>> provide IA-32 compatibility. That is, an Itanium acquire load is equivalent to an IA-32 load and an
>> Itanium release store is equivalent to an IA-32 store, from a memory ordering perspective.
>
>
> I suspect the above paragraph is stronger than what it really wanted to say.
> It seems that the intention was to say
> that Itanium can correctly emulate x86 by running effectively in a TSO mode,
> since x86's memory model is not stronger than TSO.
>

Hmm, that's possible. Even if you take IA-32's loads as being unordered, they're
not entirely unordered, because of the processor consistency model. It's likely that
nobody uses processor consistency as a programming memory model, but since Intel
specified it as part of the memory model they have to adhere to it for compatibility
reasons. Is this the reason Itanium runs so slowly in IA-32 mode? Because it has
to use ld.acq instead of ld for IA-32 loads? All because they used a memory
model that was more convenient for hardware architects than for programmers?

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
