How Many Processor Cores Are Enough? [Computer Architecture]


From: Alexander Terekhov on 25 Oct 2006 08:12

Chris Thomasson wrote:
>
> > I gather that the "membar #LoadStore | #LoadLoad" notation means
> > something in c.p.t but it is not in regular usage here.
>
> It's from the SPARC ...

The way I used to explain it: SPARC TSO == x86-under-Itanic/WB ==
x86-native/WB + "remote write atomicity" (see Itanic's formal memory
model). Apart from a few minor details. Pretty easy formula. Without
membar #blah-blah. ;-)
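
For readers unfamiliar with the membar notation, here is a minimal sketch of where such barriers would sit, written with portable C++ fences rather than SPARC assembly. The names `payload`, `ready`, `producer`, `consumer` and `run_once` are made up for illustration, and the mapping in the comments is approximate: an acquire fence after a load roughly corresponds to "membar #LoadLoad | #LoadStore", a release fence before a store to "membar #LoadStore | #StoreStore". Under TSO those orderings should already hold without an explicit barrier, which appears to be the point of the formula above.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names for illustration; none of them come from the thread.
int payload = 0;                      // plain data guarded by `ready`
std::atomic<bool> ready{false};

void producer() {
    payload = 42;
    // Roughly "membar #LoadStore | #StoreStore" ahead of the flag store.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Roughly "membar #LoadLoad | #LoadStore" after the flag load.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

// Runs one producer/consumer pair and returns what the consumer read.
int run_once() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Calling `run_once()` should return 42 on any conforming implementation, because the release fence and the acquire fence pair up through the `ready` flag.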

regards,
alexander.

From: Eric P. on 25 Oct 2006 10:07

Chris Thomasson wrote:
>
> "Chris Thomasson" <cristom(a)comcast.net> wrote in message
> news:jP2dnQSf9uyya6PYnZ2dnUVZ_vOdnZ2d(a)comcast.com...
> >
> >> I gather that the "membar #LoadStore | #LoadLoad" notation means
> >> something in c.p.t but it is not in regular usage here.
> >
> > It's from the SPARC instruction set...
>
> You can create many different types of barriers with the SPARC membar
> instruction.
>
> RCU would not seem to work on x86 if the loads can get freely reordered.
> This would make it similar to the way Alpha does things.

I can't speak to RCU's requirements, though I'm sure that Joe Seigh
and others are quite aware of the x86 section 7.2.2 ordering rules
and specifically rule #1: that loads can complete in any order.

Note that on x86, rule #1 does not imply that dependent loads are
unordered because of restrictions on the order stores become visible.

If you are concerned about read bypassing side effects then
add an LFENCE or MFENCE.
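
The dependent-load case can be sketched in portable C++ as follows. This is an illustration only, not RCU itself: `Node`, `shared_node`, `publish`, `read_value_or` and `publish_then_read` are hypothetical names, and acquire/release ordering is used as a conservative stand-in for whatever a real RCU implementation does. On mainstream x86 compilers an acquire load is typically emitted as an ordinary MOV with no LFENCE, though that is a property of the compiler mapping and not something this code proves.

```cpp
#include <atomic>
#include <thread>

struct Node { int value; };           // hypothetical payload type

std::atomic<Node*> shared_node{nullptr};

// Writer side: initialise the node first, then make it reachable.
void publish(Node* n, int v) {
    n->value = v;
    shared_node.store(n, std::memory_order_release);
}

// Reader side: the load of p->value depends on the load of p.
int read_value_or(int fallback) {
    Node* p = shared_node.load(std::memory_order_acquire);
    return p ? p->value : fallback;
}

// Starts one writer and one reader; returns the value the reader saw.
int publish_then_read() {
    static Node node{0};
    shared_node.store(nullptr);
    int seen = 0;
    std::thread writer(publish, &node, 7);
    std::thread reader([&] {
        int v;
        while ((v = read_value_or(-1)) == -1) {
            std::this_thread::yield();   // nothing published yet
        }
        seen = v;
    });
    writer.join();
    reader.join();
    return seen;
}
```

A reader that observes the non-null pointer must also observe the initialised `value`, so `publish_then_read()` is expected to return 7.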

Eric

From: Eric P. on 25 Oct 2006 10:25

Alexander Terekhov wrote:
>
> "Eric P." wrote:
> [...]
> > Anyway, I go by what the Intel manual says.
>
> That will lead you to the clinic. x86 native for WB is x86-under-Itanic/aka
> TSO minus "remote write atomicity" (in Itanic's formal memory model
> speak). Apart from a few minor details, that is. It's pretty obvious
> that the manual (as far as "memory ordering" is concerned) was written
> for testers sitting on "system bus", not software programmers.

<grin> Yeah, I debated whether that deserved a smiley or not.
In the end I decided that, for the particular issue under discussion
(load-load ordering), the manual's rule #1, "Reads can be carried
out in any order", had such a low coefficient of ambiguity that a
smiley wasn't warranted.

I don't know what you mean by x86 = TSO minus "remote write atomicity".
If by "remote write atomicity" you mean atomic global visibility
(all processors agree that each memory location has a single same
value), we discussed that here and it was determined (based on
'knowledgeable sources') that x86 does have atomic global visibility.

In other words
Intel Processor Consistency != Gharachorloo Processor Consistency
so x86 P.C. is TSO by another name.

Or did I misunderstand?
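
One way to make the question concrete is the litmus test usually called IRIW (independent reads of independent writes): two writers each store to a different location, and two readers load both locations in opposite orders. If the readers can disagree about which store happened first, stores are not becoming visible to all processors at once. The sketch below uses C++ sequentially consistent atomics, under which the language forbids that outcome, so it shows the shape of the test and says nothing about what raw x86 loads and stores do. The names `loc_x`, `loc_y`, `iriw_disagreement_once` and `count_disagreements` are made up for illustration.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> loc_x{0}, loc_y{0};  // hypothetical shared locations

// Returns true if the two readers saw the two stores in opposite orders.
bool iriw_disagreement_once() {
    loc_x.store(0);
    loc_y.store(0);
    int r1 = 0, r2 = 0, r3 = 0, r4 = 0;
    // Default memory order for load() and store() is seq_cst.
    std::thread w1([] { loc_x.store(1); });
    std::thread w2([] { loc_y.store(1); });
    std::thread a([&] { r1 = loc_x.load(); r2 = loc_y.load(); });
    std::thread b([&] { r3 = loc_y.load(); r4 = loc_x.load(); });
    w1.join();
    w2.join();
    a.join();
    b.join();
    // Reader a: x was written before y. Reader b: y was written before x.
    return r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0;
}

// Repeats the test and counts how often the readers disagreed.
int count_disagreements(int iterations) {
    int count = 0;
    for (int i = 0; i < iterations; ++i) {
        if (iriw_disagreement_once()) ++count;
    }
    return count;
}
```

With seq_cst the count must stay at zero. With weaker orderings such as acquire/release the C++ model permits the disagreement, and whether it is ever observed then depends on the hardware, which is exactly the point in dispute here.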

Eric

From: Alexander Terekhov on 25 Oct 2006 11:58

"Eric P." wrote:
[...]
> If by "remote write atomicity" you mean atomic global visibility
> (all processors agree that each memory location has a single same
> value), we discussed that here and it was determined (based on
> 'knowledgeable sources') that x86 does have atomic global visibility.

Really? IIRC, Glew went on record*** claiming that it is not true.

See also

http://www.decadentplace.org.uk/pipermail/cpp-threads/2006-September/001141.html

***) "WB memory is processor consistent, type II."

With "type II" he meant "Extension to Dubois' Abstraction", I gather.

regards,
alexander.

From: Chris Thomasson on 25 Oct 2006 20:07

ARGH!

> Humm. Okay... I am still holding my assertion that lfence is not required
> on x86 at all.
^^^^^^^^^^^^

That was supposed to read:

I am still holding my assertion that lfence is not required for the
reader side of the RCU algorithm on the 'current' x86.

Sorry for any confusion!
