From: nmm1 on 31 Dec 2009 07:05
In article <4B3BFDF9.2070607(a)patten-glew.net>,
Andy "Krazy" Glew <ag-news(a)patten-glew.net> wrote:
>Copy in/out helps, but generalize:
>Some funky linked data structure where you can't tell a priori what
>subsections there are.
I have been thinking about that, on and off, for a decade, and I think
that something could be done. Unfortunately, no existing language
would be suitable as a basis :-(
>Or, heck, an N-dimensional array, where you sometimes want row access,
>sometimes columns, sometimes any of the diagonals, sometimes
>sub-matrices - where, in general, you can't tell what subset of the data
>structure is being manipulated. And where there is not some hierarchy
>that is natural for locking or other mutex.
Oh, THAT's easy! Fortran already does it (except for the diagonals,
which is a fairly easy extension, and was there in one draft of
>Whereas with a subset of linked datastructure that does not have a
>single dominator, you can't necessarily copy back in a changed
>datastructure, with nodes that have been added or deleted.
No. You have to add other constraints on what can be done. Not
impossible, but not easy, either.
>Herlihy's non-blocking lock free algorithms using compare-and-swap
>depend on there being a single pointer that dominates the data structure
>you are changing.
One of the many reasons that so many people feel that a double form
of compare-and-swap is essential.
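
A minimal sketch of the single-pointer case, assuming a Treiber-style stack rather than Herlihy's general construction. The names Node, Stack, push and pop are hypothetical. Every update is published by one compare-and-swap on head, which is the one pointer that dominates the structure:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node and stack types, for illustration only.
struct Node {
    int value;
    Node* next;
};

struct Stack {
    std::atomic<Node*> head{nullptr};  // the single pointer that dominates the structure
};

// Push: link the new node privately, then publish it with one CAS on head.
void push(Stack& s, Node* n) {
    Node* old_head = s.head.load(std::memory_order_relaxed);
    do {
        n->next = old_head;  // old_head is refreshed by a failed CAS
    } while (!s.head.compare_exchange_weak(old_head, n,
                                           std::memory_order_release,
                                           std::memory_order_relaxed));
}

// Pop: swing head to the next node with one CAS.
// Assumption: nodes are never freed or reused while other threads may pop,
// so this sketch does no ABA handling or memory reclamation.
Node* pop(Stack& s) {
    Node* old_head = s.head.load(std::memory_order_acquire);
    while (old_head &&
           !s.head.compare_exchange_weak(old_head, old_head->next,
                                         std::memory_order_acquire,
                                         std::memory_order_acquire)) {
        // retry with the refreshed old_head
    }
    return old_head;
}
```

This works only because one word, head, decides the state of the whole structure. Something like a doubly linked list needs two pointers changed together, which is where a single-word compare-and-swap stops being enough.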
>Anyway, yes, copy in/out helps. But it is not the final answer. The
>real problem in so much of this is that we have memory. Data structures
>that are operated on in place. And, no matter what advocates of data
>flow, logic programming, applicative or functional languages say, I
>don't think that we are going to do away with them.
Agree. However, my original remark WASN'T that copy-in/copy-out is
the solution, but that the language restrictions that allow it are
key to any such data division (as well as structural rearrangements).
Despite common belief, Fortran has NEVER mandated either reference
or copy-in/copy-out, and several other techniques have been used.
The key is that you can't do ANYTHING when handicapped by a memory
model like those of C, C++ or Java - once the address of a subobject
is required to be fixed, you are dead.
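
To make the reference versus copy-in/copy-out point concrete, here is a small sketch in C++ (not Fortran) with hypothetical function names. The two passing mechanisms are emulated by hand, and they only give different answers when the arguments alias:

```cpp
#include <cassert>

// Hypothetical routine: its result depends on the passing mechanism
// only when a and b refer to the same object.
void add_twice(int& a, const int& b) {
    a += b;
    a += b;
}

// Pass by reference: the callee works directly on the caller's object.
int by_reference(int x) {
    add_twice(x, x);   // a and b alias the same object
    return x;
}

// Copy-in/copy-out, emulated by hand: the callee works on private copies
// and the modified argument is written back on return.
int by_copy_in_out(int x) {
    int a = x;         // copy in
    int b = x;         // copy in
    add_twice(a, b);
    x = a;             // copy out
    return x;
}
```

Starting from 1, the by-reference call yields 4 and the copy-in/copy-out call yields 3. Fortran's rule that an aliased argument must not be modified makes such calls invalid, which is what leaves the implementation free to choose either mechanism.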
From: nmm1 on 31 Dec 2009 07:31
In article <4B3C097F.5010304(a)patten-glew.net>,
Andy \"Krazy\" Glew <ag-news(a)patten-glew.net> wrote:
>> ... Currently, almost all shared-memory parallel
>> models being introduced by programming languages are MORE consistent
>> than what the hardware currently provides, rather than less (which
>> would allow freedom to change caching technology).
>In my experience somewhat the reverse. The languages have a weaker
>memory ordering model than the hardware. (Admittedly, at Intel and AMD
>I worked on the processors that were put into large systems - the system
>vendor did not need to implement the memory model we were implementing.)
This is something that I am actively involved in, and I agree that it
APPEARS to be as you say. Unfortunately, few standards are consistent,
let alone mathematically precise, and the current trend is to specify
constraints and guarantees by implication.
>E.g. the Java standard's sequential atomics. The language
>allows arbitrary reordering, except at well identified places. Whereas
>the hardware enforces ordering on all memory operations.
It isn't the explicit atomic operations I am referring to (which are
easy to make consistent, if that's what you want, anyway). It's the
effect of them on the 'ordinary' memory accesses. C++ is probably
the most important such language, as well as having one of the more
precise specifications of this area (and I mean the draft standard,
not C++ 2003).
Almost all such languages specify some form of causal consistency on
such ordinary operations, with the inter-thread ordering that matches
the use of the atomics or fences (which are equivalent in this sense).
But what the languages DON'T do is to specify exactly which memory
locations are read and 'updated' by other primitives in the language,
and leave that to the generic rule that the compiler must get the
If you work through that, in the same way that a mathematician proves
a theorem, you discover that the only reliable rule for a system is
to provide sequential consistency of all operations that cannot be
proved not to be involved in any inter-thread effects. Well, that's
a well-known intractable problem, so make it all of them ....
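
For concreteness, the kind of ordering under discussion, where an atomic orders the 'ordinary' accesses around it, might look like this in the draft C++ model. This is a sketch only. The names payload, ready, producer, consumer and run_once are hypothetical:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // 'ordinary' (non-atomic) data
std::atomic<bool> ready{false};   // atomic flag that orders access to payload

void producer() {
    payload = 42;                                  // ordinary store
    ready.store(true, std::memory_order_release);  // publishes the store above
}

int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        // spin until the release store becomes visible
    }
    return payload;  // ordered after the store of 42, so not a data race
}

// Runs one producer and one consumer thread; returns what the consumer read.
int run_once() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Here the locations involved are spelled out by hand. The difficulty described above is that the language does not say which locations other primitives touch, so the same reasoning cannot be done for them.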
>The supercomputer people keep saying "We don't want cache coherency."
>and "We don't want memory ordering." "We want to manage both in
>software, if you give us the primitives."
>Which is fine... except that caches really do work pretty well in many
>cases. Some people think that we will have domains that are cache
>coherent, in combination with domains that share memory, but are not
Yes. I like barriers. I can understand their implications ....
More seriously, I agree, though my remark about the supercomputer
people is that their claims that they can manage shared-memory
parallelism are currently unsupported by evidence, and many of us
don't believe them. Oh, yes, there are programs for which that is
possible, but for general HPC work?
>But I can't change cache line size... not because of cache coherency,
>but because software that was allocating stuff at 64B cache lines will
>break if I have a different cache line size, larger, or strided, or ...
>So that's why I am thinking about the bitmasks. Whether byte or word.
>I'm willing to suspend disbelief and believe software when they say they
>can manage memory ordering and cache consistency in software. That
>reduces a lot of overhead; and if we make the cache line granularity
>invisible to software correctness wise, it opens up a whole slew of
>possibilities for hardware.
No dissent there.
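
As an illustration of software that has the 64B line size baked in, a sketch with hypothetical names. The padding below is chosen at compile time and is only right for hardware whose line size matches the assumption:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Assumption baked into the source: a cache line is 64 bytes.
constexpr std::size_t kAssumedLineSize = 64;

// Each counter is padded out to its own (assumed) line to avoid false sharing.
struct alignas(kAssumedLineSize) PaddedCounter {
    std::atomic<long> value{0};
};

// Hypothetical per-thread counters; neighbouring elements are 64 bytes apart.
PaddedCounter counters[4];

// Byte distance between neighbouring counters, fixed at compile time.
std::size_t counter_stride() {
    return sizeof(PaddedCounter);
}
```

With a larger or strided line, two of these counters could land on the same line and the layout would quietly stop doing its job. In this sketch that costs performance rather than correctness.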
From: Mayan Moudgill on 31 Dec 2009 08:30
> But what the languages DON'T do is to specify exactly which memory
> locations are read and 'updated' by other primitives in the language
> [T]he only reliable rule for a system is
> to provide sequential consistency of all operations ...
With the exception of languages where processes and communication are
made explicit (e.g. CSP + derivatives, Linda).
And another exception for languages with communication restricted to
immutable types (CLU, functional languages).
> [T]heir claims that they can manage shared-memory
> parallelism are currently unsupported by evidence, and many of us
> don't believe them. Oh, yes, there are programs for which that is
> possible, but for general HPC work?
Any problem that is not I/O bound and that can be solved using a
non-imperative language can be made to work faster in an imperative
language with the appropriate library (i.e. if you can write it in ML,
it can be re-written in C-as-high-level-assembly, and it will perform
The questions that arise are:
- how much speedup does the C version get?
- how much more effort is it to write the C version?
- how many machines will this version need to support, and how much
rewrite is required to support all these machines?
- what is the economic benefit of the effort vs. speedup?
From: nmm1 on 31 Dec 2009 08:57
In article <36adnfDfyf77OqHWnZ2dnUVZ_sCdnZ2d(a)bestweb.net>,
Mayan Moudgill <mayan(a)bestweb.net> wrote:
>> But what the languages DON'T do is to specify exactly which memory
>> locations are read and 'updated' by other primitives in the language
>> ...
>> [T]he only reliable rule for a system is
>> to provide sequential consistency of all operations ...
>With the exception of languages where processes and communication are
>made explicit (e.g. CSP + derivatives, Linda).
Or Fortran/C/C++ together with MPI (not one-sided).
>And another exception for languages with communication restricted to
>immutable types (CLU, functional languages).
Neither of those were in the class under consideration. Functional
languages effectively don't have a memory model, anyway.
From: Andy "Krazy" Glew on 31 Dec 2009 13:45
> The key is that you can't do ANYTHING when handicapped by a memory
> model like those of C, C++ or Java - once the address of a subobject
> is required to be fixed, you are dead.
Ah, now we are getting somewhere.
Now, I *like* the fact that C in some ways is essentially a structured
assembly - at a particular point in time an object or subobject has an
address, and that can be taken and used.
However, I agree that it is the possibility of such an address persisting
forever that gets in the way of so many things.
Although... garbage collection systems can at least locate all such.
However, we would not want to have to do such a scan from the GC roots
frequently, as part of some synchronization or parallelization operations.
So, I agree that it might be nice to have a language "mode" that forbids
or deprecates the address-of operator. E.g. one that had a special
representation for parameters, that allowed reference, copy-in/out,
etc. semantics. E.g. one that had a special representation for the
pointers that so many people use to walk down an array or linked list.
E.g. a mode in which such a special-reference was globally registered,
or otherwise made easier to reason about parallelism and aliasing-wise.
E.g. where certain casts and other operations that, in C, make
aliasing analysis almost impractical, were disallowed.
One of the reasons I am hopeful for C# is that it has at least the safe
and unsafe modes. I am hopeful that modes in a language - essentially
sublanguages - may be useful in making memory better behaved. I am
hopeful that more language moding or subsetting mechanisms will be
created. Sure, you could create new languages as necessary - but it is
a pity when certain library mechanisms that could be used in all of the
new languages are prevented, because they are new, separate, languages.
E.g. I want separate languages or sublanguages, but I want the STL to
be portable across them all.
Note that if you can overload the address-of operator &(T) you can get
much of the above. But then you will want to have a mode that prevents
standard user code from using &int=>int*; you will want to ensure that,
in one of your safe sublanguages, &int=>well_behaved_ptr<int>. Or, if
you don't like overloading, that &int is simply illegal.
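
A rough sketch of the overloading idea in today's C++. The name well_behaved_ptr follows the text above, but its design and the wrapper type safe<T> are assumptions. C++ does not allow operator& to be overloaded for a built-in int, so a wrapper class stands in for it:

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical checked pointer: it can be dereferenced, but offers no
// conversion to T*, no arithmetic and no casts, so the raw address
// does not escape through it.
template <typename T>
class well_behaved_ptr {
public:
    explicit well_behaved_ptr(T* p) : p_(p) {}
    T& operator*() const { return *p_; }
private:
    T* p_;
};

// Hypothetical wrapper standing in for "T in the safe sublanguage".
template <typename T>
class safe {
public:
    safe(T v = T{}) : v_(v) {}
    // Overloaded address-of: &x yields a well_behaved_ptr<T>, not a T*.
    well_behaved_ptr<T> operator&() { return well_behaved_ptr<T>(std::addressof(v_)); }
    T get() const { return v_; }
private:
    T v_;
};

// The checked pointer must not decay to a raw pointer.
static_assert(!std::is_convertible<well_behaved_ptr<int>, int*>::value,
              "raw address must not escape");
```

With safe<int> x, the expression &x gives a well_behaved_ptr<int> that can be dereferenced and assigned through, but not turned into an int*. Note that std::addressof(x) still bypasses the overload, which is exactly why a mode that forbids such escapes would also be wanted.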