From: Joe Seigh on
Andy Glew wrote:
>
> Bottom quoting: asbestos donned!
>
> I think that Joe Seigh has incorrectly assumed that processor
> consistency implies (a) a global ordering of all loads, and (b) causal
> ordering.

I think I was trying to prove that you couldn't imply global ordering
of loads.

Part of the problem is that there are two target groups of programmers
for the memory model here. Processor consistency is alright if you're
doing HPC/parallel programming but isn't very useful if you're doing
general multi-threaded programming. There, all you really care about
is what the implicit global ordering is between the various combinations
of loads and stores, and what memory barriers to use for the combinations
where ordering isn't defined.
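
To make "combinations of loads and stores" concrete, here is a minimal
sketch of the store-then-load case (the classic store-buffering litmus
test), written with C++11 atomics rather than any particular machine's
barrier instructions. The names x, y, store_buffering_once and
count_forbidden are made up for illustration and come from nobody's
post. With the full fence between each thread's store and its load,
the C++ memory model forbids both threads reading 0; which instruction
the fence turns into is left to the compiler and the target.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared locations for the litmus test.
std::atomic<int> x{0};
std::atomic<int> y{0};

// Runs one round: each thread stores to its own location, then loads the
// other one. Returns true if both loads saw 0, the outcome the fences forbid.
bool store_buffering_once() {
    x.store(0);
    y.store(0);
    int r1 = -1;
    int r2 = -1;
    std::thread a([&] {
        x.store(1, std::memory_order_relaxed);
        // Full fence: orders the store above before the load below.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });
    a.join();
    b.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the round and counts how often the forbidden outcome appeared.
int count_forbidden(int iterations) {
    assert(iterations > 0);
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        if (store_buffering_once()) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

Calling count_forbidden(2000) should return 0 as long as the fences
stay in. Taking them out makes the both-zero outcome legal, though a
short run may or may not actually hit it.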

In the ia32 docs, it's a little muddied because of the mention of
speculative loads. None the less I had assumed that loads weren't
ordered and that LFENCE or some other memory barrier or serializing
instruction was needed for global ordering of loads. However there
were some that claimed LFENCE wasn't needed. And the documentation
wasn't explicit enough to definitively counter their claims. And
it had to be really explicit given the rather incomprehensible
arguments they were presenting.

I've basically decided to ignore these people for now and stick with
my original interpretation of the ia32 memory model.


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
From: Alexander Terekhov on

Joe Seigh wrote:
[...]
> In the ia32 docs, it's a little muddied because of the mention of
> speculative loads. None the less I had assumed that loads weren't
> ordered and that LFENCE or some other memory barrier or serializing
> instruction was needed for global ordering of loads.

Neither will give you "global ordering of loads". Loads on ia32 are
in-order with respect to other loads and subsequent stores (by the
same processor). The only thing that differentiates PC from TSO is
the lack of remote write atomicity (in IA64 formal memory model
speak). Implementations (e.g. SPO) of course can do all sorts of
tricks to improve performance, but that doesn't change the memory
model. You're in denial.

regards,
alexander.
From: Joe Seigh on
Alexander Terekhov wrote:
> Joe Seigh wrote:
> [...]
>
>>In the ia32 docs, it's a little muddied because of the mention of
>>speculative loads. None the less I had assumed that loads weren't
>>ordered and that LFENCE or some other memory barrier or serializing
>>instruction was needed for global ordering of loads.
>
>
> Neither will give you "global ordering of loads". Loads on ia32 are
> in-order with respect to other loads and subsequent stores (by the
> same processor). The only thing that differentiates PC from TSO is
> the lack of remote write atomicity (in IA64 formal memory model
> speak). Implementations (e.g. SPO) of course can do all sorts of
> tricks to improve performance, but that doesn't change the memory
> model. You're in denial.
>

Whatever. I'm going to use LFENCE for situations where I'd use
#LoadLoad on sparc (generic, not assuming TSO). And it's not
because I'm in denial. It's because nothing you say is
comprehensible. It's possible you are making some kind of
valid technical point but I have no way of telling.
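
For anyone following along, the load-load situation in question looks
roughly like the sketch below: a reader polls a flag and then reads
data published before the flag was set. This is portable C++11, not
ia32 or sparc assembly, and the names payload, ready, writer, reader
and message_passing_once are invented for the example. The acquire
fence sits where a #LoadLoad would go on generic sparc. Whether the
compiler has to emit any instruction for it on a given processor
depends on that processor's architected memory model, which is exactly
the point under dispute here.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared state: plain data plus a flag announcing it.
int payload = 0;
std::atomic<bool> ready{false};

void writer() {
    payload = 42;
    // Store-store side: the payload write is ordered before the flag write.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int reader() {
    // First load: spin until the flag is observed.
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Load-load side: the flag load is ordered before the payload load.
    std::atomic_thread_fence(std::memory_order_acquire);
    // Second load: must not be satisfied ahead of the flag load.
    return payload;
}

// Runs one writer and one reader and returns what the reader saw.
int message_passing_once() {
    payload = 0;
    ready.store(false);
    int seen = 0;
    std::thread r([&] { seen = reader(); });
    std::thread w(writer);
    w.join();
    r.join();
    assert(seen == 42);  // guaranteed by the release/acquire fence pairing
    return seen;
}
```

With both fences in place the reader always returns 42. Writing it
this way keeps the code independent of which hardware model you
believe in.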

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
From: Alexander Terekhov on

Joe Seigh wrote:
[...]
> Whatever. I'm going to use LFENCE for situations where I'd use
> #LoadLoad on sparc (generic, not assuming TSO).

You mean RMO? Reportedly, RMO is vaporware, so yeah, you'll get the
same "useful" effect on Sparc as on ia32 (weakly ordered WC memory
aside for a moment): none whatsoever.

regards,
alexander.
From: Joe Seigh on
Alexander Terekhov wrote:
> Joe Seigh wrote:
> [...]
>
>>Whatever. I'm going to use LFENCE for situations where I'd use
>>#LoadLoad on sparc (generic, not assuming TSO).
>
>
> You mean RMO? Reportedly, RMO is vaporware, so yeah, you'll get the
> same "useful" effect on Sparc as on ia32 (weakly ordered WC memory
> aside for a moment): none whatsoever.
>
In the same sense that Sparc documentation assumes the weakest possible
architected memory model when documenting usage of its memory barriers.

I know that some sparc processors only implement TSO and Solaris assumes
and requires TSO (so far).

It's possible Intel processors are all effectively implemented as TSO, but we're
talking about the architected memory model and have to assume that unless
writing model-dependent code.

I like how you sidestepped whether LFENCE or some serializing instruction
is required in some situations between successive loads on Intel ia32 processors.
We're assuming weakly ordered memory I think, whatever the typical multiprocessor
Intel box meant to run Linux or Windows uses. Whatever "write-back cacheable"
is.

:

This whole thing is bizarre. Any other architecture, e.g. IBM Z architecture,
powerpc, sparc, alpha, ... and there's no problem in discussing whether
memory barriers are needed in certain situations. Only in Intel ia32 and only
when Alexander participates. However, if you filter out any comments by
Alexander then the problem goes away. I should have put in an Alexander filter
earlier. Then I wouldn't have raised this issue in the first place, which
has probably put *me* in a few filters. :)



--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.