From: nmm1 on
In article <hhd3vr$cq3$1(a)smaug.linux.pwf.cam.ac.uk>, <nmm1(a)cam.ac.uk> wrote:
>In article <3b6dneYekdE0jafWnZ2dnUVZ_t2dnZ2d(a)bestweb.net>,
>Mayan Moudgill <mayan(a)bestweb.net> wrote:
>
>>I find it somewhat hard to imagine that occurring; what it would mean is
>>that there was no synchronization for the reads; i.e. the program did
>>not care if it read x[A] or x[A]'. If, however, there were some form of
>>synchronization, then it would be guaranteed that the correct version of
>>each variable was read and you wouldn't get this inconsistency.
>
>Not at all. What happens is that the inconsistency shows up in more
>complicated code, and breaks a (reasonable) constraint assumed by the
>code. It is possible to describe the problem very simply, but not to
>explain why it causes so much trouble. However, it does, and it is
>one of the classic problems of shared-memory parallelism.

Try the following, which is about the simplest 'real' example I can
think of:

Thread A creates a data object X, and then stores a pointer to
it in P.
Thread B creates a data object Y, which contains a copy of P,
and then stores a pointer to it in Q.
Thread C reads Q, and indirects through P to X.

All well-defined, with sequential consistency and some other such
models. But, without that, the update of X need not have been
completed, as seen by thread C, just because the update of Q has
been. And, yes, it really happens.


Regards,
Nick Maclaren.
From: Mayan Moudgill on
nmm1(a)cam.ac.uk wrote:

> In article <hhd3vr$cq3$1(a)smaug.linux.pwf.cam.ac.uk>, <nmm1(a)cam.ac.uk> wrote:
>
>>In article <3b6dneYekdE0jafWnZ2dnUVZ_t2dnZ2d(a)bestweb.net>,
>>Mayan Moudgill <mayan(a)bestweb.net> wrote:
>>
>>
>>>I find it somewhat hard to imagine that occurring; what it would mean is
>>>that there was no synchronization for the reads; i.e. the program did
>>>not care if it read x[A] or x[A]'. If, however, there were some form of
>>>synchronization, then it would be guaranteed that the correct version of
>>>each variable was read and you wouldn't get this inconsistency.
>>
>>Not at all. What happens is that the inconsistency shows up in more
>>complicated code, and breaks a (reasonable) constraint assumed by the
>>code. It is possible to describe the problem very simply, but not to
>>explain why it causes so much trouble. However, it does, and it is
>>one of the classic problems of shared-memory parallelism.
>
>
> Try the following, which is about the simplest 'real' example I can
> think of:
>
> Thread A creates a data object X, and then stores a pointer to
> it in P.
> Thread B creates a data object Y, which contains a copy of P,
> and then stores a pointer to it in Q.
> Thread C reads Q, and indirects through P to X.
>
> All well-defined, with sequential consistency and some other such
> models. But, without that, the update of X need not have been
> completed, as seen by thread C, just because the update of Q has
> been. And, yes, it really happens.
>
>

Thanks for the example. It does highlight a potential problem [though it
appears completely unrelated to cache line sharing]. Writing out the
time-line:

A: writes X, synch#1
B: synch#1, writes Q pointing to X, synch#2
C: synch#2, derefs Q

where synch is a pairwise synchronization + barrier.
- synch#1 reconciles A's and B's writes, so B sees the writes to X, and
B can safely deref Q.
- synch#2 reconciles B & C, so C sees the writes to Q, *BUT* there is no
guarantee that A's writes are visible to C, so C dereferencing Q can access
the old value of X.

I guess this kind of situation will arise whenever barriers are not
global, but apply to only a subset of all processors.

But isn't that just a variation of an already existing category of bug -
failure to respect the memory model?
From: nmm1 on
In article <sZWdnUZRf9Gd36fWnZ2dnUVZ_u-dnZ2d(a)bestweb.net>,
Mayan Moudgill <mayan(a)bestweb.net> wrote:
>
>Thanks for the example. It does highlight a potential problem [though it
>appears completely unrelated to cache line sharing]. ...

It may appear to be, but it isn't. There are several ways of ensuring
that writes to part of cache lines never get lost, which are scalable
to lots of cores (Andy Glew mentioned one a long while back in this
thread, if I recall). Unfortunately, there aren't any that preserve
sequential consistency, which is why no (?) current hardware supports
it. Sequential consistency is inherently global, and hence inherently
non-scalable.

>I guess this kind of situation will arise whenever barriers are not
>global, but apply to only a subset of all processors.

Yes.

>But isn't that just a variation of an already existing category of bug -
>failure to respect the memory model?

THE memory model? That's my point - there isn't one, there are at
least two - the one provided by the hardware and the one assumed by
the programming language.

What I am trying to explain is that every method of reducing cache
synchronisation cost and increasing its scalability has the downside
that it reduces the consistency of the memory model. All computer
architecture is a compromise between efficiency and usability, and
this is no exception. Currently, almost all shared-memory parallel
models being introduced by programming languages are MORE consistent
than what the hardware currently provides, rather than less (which
would allow freedom to change caching technology).

Therefore, either the language memory model has to be reduced to
match the hardware one or the language has to be disciplined enough
that the compiler can detect the cases that need special attention.
All right, or it can be left as the programmer's problem, but that
is a shortcut to vastly less reliable programs than we have today.

The main exceptions are the PGAS models, which are sort of semi-
shared memory with explicit synchronisation. They are popular with
some of the HPC people, but that's a subset of a subset. The main
ones assume sequential consistency, with or without fencing.


Regards,
Nick Maclaren.
From: Mayan Moudgill on
nmm1(a)cam.ac.uk wrote:

> In article <sZWdnUZRf9Gd36fWnZ2dnUVZ_u-dnZ2d(a)bestweb.net>,
> Mayan Moudgill <mayan(a)bestweb.net> wrote:
>
>
>>But isn't that just a variation of an already existing category of bug -
>>failure to respect the memory model?
>
>
> THE memory model? That's my point - there isn't one, there are at
> least two - the one provided by the hardware and the one assumed by
> the programming language.
>
> What I am trying to explain is that every method of reducing cache
> synchronisation cost and increasing its scalability has the downside
> that it reduces the consistency of the memory model.
>
> Therefore, either the language memory model has to be reduced to
> match the hardware one or the language has to be disciplined enough
> that the compiler can detect the cases that need special attention.
> All right, or it can be left as the programmer's problem, but that
> is a shortcut to vastly less reliable programs than we have today.
>

Thanks for clearing things up.

From: Robert Myers on
On Dec 29, 2:46 pm, Mayan Moudgill <ma...(a)bestweb.net> wrote:

> Thanks for clearing things up.

This exchange, while impressive for its depth of experience and
understanding, is a bit like a discussion of how to save the Ptolemaic
universe.

Keeping track of things by memory locations that can be written to and
accessed by multiple processes and that can have multiple meanings
based on context is a losing enterprise.

No matter what you propose, you'll wind up with specifications that
read like the health care bill now before Congress. And, as I think
Nick is pointing out with current solutions, the bugs won't always be
the result of programmer incompetence.

You've both got so much invested in thinking about these things that
you'll beat it to death forever, rather than admitting that the entire
model of stored variables simply doesn't work.

Other models have been proposed. Time to get on with it.

Robert.