From: David Hopwood on
Joe Seigh wrote:
> David Hopwood wrote:
>> Joe Seigh wrote:
>>> Alexander Terekhov wrote:
>>>
>>>> So where do you put the fence, then?
>>>>
>>>> : processor 1 stores into X
>>>> : processor 2 sees the store by 1 into X and stores into Y
>>>> : processor 3 loads from Y
>>>> : processor 3 loads from X
>>>
>>> Since this was my example I should clarify. It was meant to
>>> show that PC alone wasn't sufficient to guarantee that if processor
>>> 3 saw the store into Y by processor 2 that it would see the
>>> store into X by processor 1.
>>>
>>> My understanding of the ia32 memory model is that you
>>> need a fence instruction between the loads by processor 3
>>> and a fence between the load and store by processor 2 to
>>> make the guarantee work.
>>
>> My understanding is that if the claimed problem exists at all, adding
>> these fences won't fix it (as far as the model is concerned, possibly
>> as opposed to implementation details of specific chips).
>
> The architected memory model as opposed to the implemented one?

Yes, that's what I said.

> "Despite the fact that Pentium 4, Intel Xeon, and P6 family
> processors support processor ordering, Intel does not guarantee that
> future processors will support this model. To make software portable
> to future processors, it is recommended that operating systems provide
> critical region and resource control constructs and API’s (application
> program interfaces) based on I/O, locking, and/or serializing
> instructions be used to synchronize access to shared areas of
> memory in multiple-processor systems."

This is all perfectly sensible. "Future processors" from Intel are not
necessarily ISA-compatible with x86 anyway. For example, you need to
recompile to use long mode in EM64T. Also note that it doesn't say
"future x86 processors". Maybe they were talking about Itanic.

Even if they weren't talking about IA-64 or a different mode, it's
still a good idea to avoid dependencies on the memory model in
*applications*, since it is more difficult to change all apps that
have such dependencies than it is to change threading libraries in OS
and language implementations. In fact OS/lang-impl maintainers half
expect stuff to rot on new hardware, and hopefully remember what they
depended on. Application maintainers generally don't (if they ever
understood it in the first place). This is what I've been saying
consistently.

Anyway, this issue doesn't have anything to do with what we were talking
about, which is whether the current architected x86 model allows a
particular behaviour.
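
For concreteness, the behaviour in question is the three-processor example quoted at the top of this post. Here is a minimal sketch of it in C++11 atomics (not anything from Intel's manual). The names run_litmus_once and count_forbidden are made up for illustration. The release/acquire orderings are an assumption chosen because the C++ model then forbids the outcome "processor 3 sees Y but not X" by transitivity of happens-before. The sketch says nothing about what the architected x86 model promises for plain loads and stores, which is the point under dispute.

```cpp
#include <atomic>
#include <thread>

// What "processor 3" observed in one run of the example.
struct Observation {
    int y;  // value loaded from Y
    int x;  // value loaded from X afterwards
};

// Runs the three-processor example once. X and Y both start at 0.
Observation run_litmus_once() {
    std::atomic<int> X{0};
    std::atomic<int> Y{0};
    Observation seen{0, 0};

    std::thread p1([&] {
        X.store(1, std::memory_order_release);            // processor 1 stores into X
    });
    std::thread p2([&] {
        while (X.load(std::memory_order_acquire) != 1) {  // processor 2 sees the store into X
            std::this_thread::yield();
        }
        Y.store(1, std::memory_order_release);            // ... and stores into Y
    });
    std::thread p3([&] {
        seen.y = Y.load(std::memory_order_acquire);       // processor 3 loads from Y
        seen.x = X.load(std::memory_order_acquire);       // processor 3 loads from X
    });

    p1.join();
    p2.join();
    p3.join();
    return seen;
}

// Counts runs in which processor 3 saw the store into Y but not the store into X.
int count_forbidden(int runs) {
    int forbidden = 0;
    for (int i = 0; i < runs; ++i) {
        Observation o = run_litmus_once();
        if (o.y == 1 && o.x == 0) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

A run that never shows the forbidden outcome proves nothing about the hardware model by itself. The sketch only pins down which outcome is being argued about.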

> That one? And what do people think the memory model that only
> "I/O, locking, and/or serializing instructions" can synchronize is?

You're overanalysing a fairly loosely worded recommendation.

--
David Hopwood <david.nospam.hopwood(a)blueyonder.co.uk>
From: Joe Seigh on
David Hopwood wrote:
> Joe Seigh wrote:
>
>> "Despite the fact that Pentium 4, Intel Xeon, and P6 family
>> processors support processor ordering, Intel does not guarantee that
>> future processors will support this model. To make software portable
>> to future processors, it is recommended that operating systems provide
>> critical region and resource control constructs and API’s (application
>> program interfaces) based on I/O, locking, and/or serializing
>> instructions be used to synchronize access to shared areas of
>> memory in multiple-processor systems."
>
>
> This is all perfectly sensible. "Future processors" from Intel are not
> necessarily ISA-compatible with x86 anyway. For example, you need to
> recompile to use long mode in EM64T. Also note that it doesn't say
> "future x86 processors". Maybe they were talking about Itanic.
>
> Even if they weren't talking about IA-64 or a different mode, it's
> still a good idea to avoid dependencies on the memory model in
> *applications*, since it is more difficult to change all apps that
> have such dependencies than it is to change threading libraries in OS
> and language implementations. In fact OS/lang-impl maintainers half
> expect stuff to rot on new hardware, and hopefully remember what they
> depended on. Application maintainers generally don't (if they ever
> understood it in the first place). This is what I've been saying
> consistently.

Yes, your aversion to anarchist application programmers doing their
own thing is well known. :)

>
> Anyway, this issue doesn't have anything to do with what we were talking
> about, which is whether the current architected x86 model allows a
> particular behaviour.
>
>> That one? And what do people think the memory model that only
>> "I/O, locking, and/or serializing instructions" can synchronize is?
>
>
> You're overanalysing a fairly loosely worded recommendation.
>

I'm not sure what you're saying here. That all future processors
from Intel that don't have processor ordering won't be x86? And
that the synchronization instructions in these future processors
won't be similar to the ones in x86? That Intel is telling people
in an x86 manual to start writing portable code not now but when
they get to the future processor? That's a little strange even for
Intel.


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
From: David Hopwood on
Joe Seigh wrote:
> David Hopwood wrote:
>> Joe Seigh wrote:
>>
>>> "Despite the fact that Pentium 4, Intel Xeon, and P6 family
>>> processors support processor ordering, Intel does not guarantee that
>>> future processors will support this model. To make software portable
>>> to future processors, it is recommended that operating systems provide
>>> critical region and resource control constructs and API’s (application
>>> program interfaces) based on I/O, locking, and/or serializing
>>> instructions be used to synchronize access to shared areas of
>>> memory in multiple-processor systems."
>>
>> This is all perfectly sensible. "Future processors" from Intel are not
>> necessarily ISA-compatible with x86 anyway. For example, you need to
>> recompile to use long mode in EM64T. Also note that it doesn't say
>> "future x86 processors". Maybe they were talking about Itanic.
>>
>> Even if they weren't talking about IA-64 or a different mode, it's
>> still a good idea to avoid dependencies on the memory model in
>> *applications*, since it is more difficult to change all apps that
>> have such dependencies than it is to change threading libraries in OS
>> and language implementations. In fact OS/lang-impl maintainers half
>> expect stuff to rot on new hardware, and hopefully remember what they
>> depended on. Application maintainers generally don't (if they ever
>> understood it in the first place). This is what I've been saying
>> consistently.
>
> Yes, your aversion to anarchist application programmers doing their
> own thing is well known. :)

Right, I am absolutely convinced that the roles of application
programmer and infrastructure programmer should be clearly separated
(even if there are a few people with the ability and expertise needed
to successfully do both).

>> Anyway, this issue doesn't have anything to do with what we were talking
>> about, which is whether the current architected x86 model allows a
>> particular behaviour.
>>
>>> That one? And what do people think the memory model that only
>>> "I/O, locking, and/or serializing instructions" can synchronize is?
>>
>> You're overanalysing a fairly loosely worded recommendation.
>
> I'm not sure what you're saying here. That all future processors
> from Intel that don't have processor ordering won't be x86?

Well, they won't be x86-as-we-know-it. OSes, compilers, etc. will
have to be changed to run on or generate code for this new x86-like
thing, and changes in the memory model will probably be only one issue
they need to deal with.

> And that the synchronization instructions in these future processors
> won't be similar to the ones in x86? That Intel is telling people
> in an x86 manual to start writing portable code not now but when
> they get to the future processor?

Of course not. Read what they actually wrote.

--
David Hopwood <david.nospam.hopwood(a)blueyonder.co.uk>
From: Andy Glew on
Alexander Terekhov <terekhov(a)web.de> writes:

> So just do cmpxchg(&X, 42, 42) which will perform locked read-write
> (with its read part store-load fenced from prior writes, I infer).
> You'll get classic SC if you replace all loads with cmpxchg(&X, 42,
> 42). That's my understanding, and I'm eagerly awaiting confirmation
> from Andy Glew and/or someone from Intel hanging at C++ memory model
> mailing list.

42, eh? Sounds like a joke: Goodbye, and thanks for all the thrash...

I think that the overall intention is that placing MFENCE before and
after every memory reference is supposed to get you SC semantics.
However, MFENCE, LFENCE, and SFENCE were defined after my time, and I
suspect that their definitions are not quite complete enough for what
you want. In particular, *FENCE really only work wrt WC cacheable
memory, and do not drain external buffers such as may occur in bus
bridges. In general, the P6 and Wmt families' mechanism for ensuring
ordering, waiting for global observability, only works for perfectly
vanilla WC cacheable memory, and is frequently violated wrt other
memory types. So I do not want to guarantee that it will work for
things like WC cached memory that is private to a graphics
accelerator.
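
As a rough illustration of "a fence before and after every memory reference", here is a minimal C++11 sketch. The helper names fenced_load and fenced_store are hypothetical. It assumes std::atomic_thread_fence(std::memory_order_seq_cst) as the full fence, which x86 compilers commonly lower to MFENCE or an equivalent locked instruction. It covers ordinary cacheable memory only and does not address the other memory types or external buffers mentioned above.

```cpp
#include <atomic>

// Hypothetical helper: a load bracketed by full fences.
inline int fenced_load(const std::atomic<int>& a) {
    std::atomic_thread_fence(std::memory_order_seq_cst);  // fence before the access
    int v = a.load(std::memory_order_relaxed);            // the memory reference itself
    std::atomic_thread_fence(std::memory_order_seq_cst);  // fence after the access
    return v;
}

// Hypothetical helper: a store bracketed by full fences.
inline void fenced_store(std::atomic<int>& a, int v) {
    std::atomic_thread_fence(std::memory_order_seq_cst);  // fence before the access
    a.store(v, std::memory_order_relaxed);                // the memory reference itself
    std::atomic_thread_fence(std::memory_order_seq_cst);  // fence after the access
}
```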

You may be right that using the cmpxchg as you describe achieves SC on
x86. However, I need to think about it a bit more, since the
reasoning you provide is implementation specific, not architectural.
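
For readers following along, the cmpxchg(&X, 42, 42) idiom can be sketched in C++11 as below. The helper name load_via_cas is hypothetical, and the sketch assumes compare_exchange_strong is compiled to LOCK CMPXCHG, as is typical on x86. It shows only that the idiom returns the current value and leaves it unchanged, not that it yields SC.

```cpp
#include <atomic>

// Hypothetical helper: reads x by trying to replace 42 with 42.
// If x == 42 the exchange succeeds and x is still 42.
// Otherwise it fails and `expected` is overwritten with the current value.
// Either way `expected` ends up holding the value that x had.
inline int load_via_cas(std::atomic<int>& x) {
    int expected = 42;
    x.compare_exchange_strong(expected, 42, std::memory_order_seq_cst);
    return expected;
}
```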

(Note that an atomic RMW like cmpxchg could well be implemented
without any fencing semantics. I.e. atomic RMWs and memory
ordering/fencing are independent concepts. I argued for this in
Itanium; I am trying to remember if x86 required that the two be mixed
up together. I can't see why it should have... I.e. I am sure that
using cmpxchg as you describe need not provide SC on a reasonable
computer architecture. I just need to find out if x86 mixed the two up
for some legacy reasons. In the meantime: use the fences would be my
recommendation.)


> > 4) The only way to guarantee that a processor has the most recent
> > value of a location is to take ownership of the variable,
> > and that requires a write. Since we actually want to read X,
> ^^^^^^^^^^^^^^^^^^^^^^^^^
>
> That's the key.
>
> > we use CAS (x86 LOCK CMPXCHG) to read the most recent value.

Flawed argument.

It is entirely possible to imagine implementations of CAS that do not
write the variable if the value is unchanged.
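
A toy model makes the distinction concrete. The sketch below is single-threaded and purely illustrative; ModelCell and both function names are invented here. The two CAS variants return the same result and leave the same value behind, so software cannot tell them apart by semantics alone, yet only one of them performs a write when the value is unchanged.

```cpp
// Hypothetical model of a memory cell that counts how often it is written.
struct ModelCell {
    int value = 0;
    int writes = 0;
};

// Variant 1: always writes, storing the old value back when the compare fails.
// Returns the value observed before the operation.
inline int cas_always_writes(ModelCell& c, int expected, int desired) {
    int old = c.value;
    c.value = (old == expected) ? desired : old;
    ++c.writes;
    return old;
}

// Variant 2: writes only when the stored value would actually change.
// Returns the value observed before the operation.
inline int cas_skips_redundant_write(ModelCell& c, int expected, int desired) {
    int old = c.value;
    if (old == expected && desired != old) {
        c.value = desired;
        ++c.writes;
    }
    return old;
}
```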

> That will work too, but you don't really need to LD X and loop on
> CAS compare failure given that x86's cmpxchg always makes a write.
> "The destination operand is written back if the comparison fails;
> otherwise, the source operand is written into the destination. (The
> processor never produces a locked read without also producing a
> locked write.)"

You are confusing implementation with semantics.
From: Joe Seigh on
David Hopwood wrote:
> Joe Seigh wrote:
>
>> David Hopwood wrote:
>>>
>>>> That one? And what do people think the memory model that only
>>>> "I/O, locking, and/or serializing instructions" can synchronize is?
>>>
>>>
>>> You're overanalysing a fairly loosely worded recommendation.
>>
>>
>> I'm not sure what you're saying here. That all future processors
>> from Intel that don't have processor ordering won't be x86?
>
>
> Well, they won't be x86-as-we-know-it. OSes, compilers, etc. will
> have to be changed to run on or generate code for this new x86-like
> thing, and changes in the memory model will probably be only one issue
> they need to deal with.
>
>> And that the synchronization instructions in these future processors
>> won't be similar to the ones in x86? That Intel is telling people
>> in an x86 manual to start writing portable code not now but when
>> they get to the future processor?
>
>
> Of course not. Read what they actually wrote.
>

I did. It sounded to me like they said if you want to write
portable code, don't assume processor ordering but use the
locking and serializing instructions instead on the current
processors.
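
In code terms, that reading amounts to something like the following sketch, which goes through a lock for shared data instead of leaning on the ordering of plain loads and stores. GuardedCounter and hammer are hypothetical names, and the sketch assumes std::mutex, which on x86 is generally built on locked instructions.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared counter whose every access goes through a lock.
class GuardedCounter {
public:
    void increment() {
        std::lock_guard<std::mutex> guard(mutex_);
        ++count_;
    }
    long get() {
        std::lock_guard<std::mutex> guard(mutex_);
        return count_;
    }
private:
    std::mutex mutex_;
    long count_ = 0;
};

// Starts `threads` workers that each increment the counter `per_thread` times,
// waits for them, and returns the final count.
long hammer(int threads, int per_thread) {
    GuardedCounter counter;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.increment();
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return counter.get();
}
```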

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.