From: Scott A Crosby on
On 21 Nov 2006 09:26:10 GMT, nmm1(a)cus.cam.ac.uk (Nick Maclaren) writes:

> While using functional code to express functional concepts is good for
> many reasons, contorting stateful designs into a functional style makes
> them much HARDER to parallelise, by obscuring their design.

Too true, but state in mostly-functional code isn't per se a problem
for parallelism, as long as the statefulness is confined to a single
thread. So a mostly functional program could still have all of the
positive parallelization properties of a purely functional program.
In fact, I believe that in Standard ML's CML it is *undefined* for
more than one thread to access the same piece of mutable state.
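
To make that concrete, here's a rough sketch (in Haskell rather than
CML, and the helper names are invented for the example) of state that
stays confined to one thread: each worker mutates only its own
accumulator, and only immutable results ever cross a thread boundary:

    import Control.Concurrent (forkIO)
    import Control.Concurrent.Chan (newChan, readChan, writeChan)
    import Data.IORef (modifyIORef', newIORef, readIORef)

    -- Each worker keeps its own mutable accumulator; only the final,
    -- immutable Int ever crosses a thread boundary, via the channel.
    sumWorker :: [Int] -> IO Int
    sumWorker xs = do
      acc <- newIORef 0                        -- thread-local mutable state
      mapM_ (\x -> modifyIORef' acc (+ x)) xs
      readIORef acc

    main :: IO ()
    main = do
      results <- newChan
      mapM_ (\chunk -> forkIO (sumWorker chunk >>= writeChan results))
            [[1 .. 1000], [1001 .. 2000]]
      a <- readChan results
      b <- readChan results
      print (a + b)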

Allowing mutable state to be shared --- weakening this assumption ---
introduces other complexities, because the compiler now has to ensure
that mutations to the mutable state get propagated to all other
threads which may have cached the old value in registers, on the stack,
etc. Even with locking, the compiler has to make conservative
assumptions about which shared mutable state is protected by which lock.

Now, assume every piece of shared mutable state were paired with a
lock, where reading or writing the state required grabbing that lock.
Also assume that the runtime system forbids all cross-thread access to
non-shared mutable state. (Depending on the type system, this might be
statically verifiable with whole-program analysis.) Under this scheme,
all concurrent access to shared state is explicit and must be
explicitly thought about. And maybe a compiler with good data-flow
analysis could track which shared state is protected by which lock,
and avoid flushing cached data and rereading the shared state
unnecessarily.
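
Roughly what I have in mind, as a sketch (again Haskell, not ML, and
the Shared/withShared names are made up): an MVar is a value paired
with a lock, and hiding it behind an abstract type means the only way
to read or write the state is through an explicit take/put bracket:

    import Control.Concurrent.MVar (MVar, newMVar, putMVar, takeMVar)

    -- Shared mutable state that can only be touched while holding its
    -- lock.  An MVar is a value paired with a lock, so the pairing is
    -- enforced by construction if the constructor is not exported.
    newtype Shared a = Shared (MVar a)

    newShared :: a -> IO (Shared a)
    newShared x = fmap Shared (newMVar x)

    -- The only way to read or write the state: take the lock, run the
    -- update, put the new value back, return the result.
    withShared :: Shared a -> (a -> IO (a, b)) -> IO b
    withShared (Shared m) f = do
      old <- takeMVar m                -- acquire
      (new, result) <- f old
      putMVar m new                    -- release
      return result

(Real code would use modifyMVar so the lock is released on an
exception; the point is only that the state and its lock are
inseparable and every access is explicit.)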

If there were a language with access to shared state (and the lock
required for that access) and communication both explicit in the
language, how much useful parallelism would be exposed, and is it
possible for a compiler to find it? Also, since all shared-state
mutation and communication are explicit, the system has the ability to
eagerly parallelize and only commit state changes when it knows
they're good.

Ideas?

Scott
From: Nick Maclaren on

In article <oyd3b8b2j9x.fsf(a)cs.rice.edu>,
Scott A Crosby <scrosby(a)cs.rice.edu> writes:
|>
|> > While using functional code to express functional concepts is good for
|> > many reasons, contorting stateful designs into a functional style makes
|> > them much HARDER to parallelise, by obscuring their design.
|>
|> Too true, but state in mostly-functional code isn't per se a problem
|> for parallelism, as long as the statefulness is confined to a single
|> thread. So a mostly functional program could still have all of the
|> positive parallelization properties of a purely functional program.
|> In fact, I believe that in Standard ML's CML it is *undefined* for
|> more than one thread to access the same piece of mutable state.

You may be surprised, but Fortran was there first - by decades!

That is PRECISELY the rule for accessing global objects and arguments
in Fortran functions and has been since Fortran 66 (with it being
recommended practice in Fortran II) ....

The problem is contorting algorithms and designs to fit into those
rules; the ones that fit naturally are no problem, and never have been,
but not all are so well-suited. In my experience, explicit, shared
global update (see below) is usually clearer and more reliable for
such codes.

|> Now, assume every piece of shared mutable state were paired with a
|> lock, where reading or writing the state required grabbing that lock.
|> Also assume that the runtime system forbids all cross-thread access to
|> non-shared mutable state. (Depending on the type system, this might be
|> statically verifiable with whole-program analysis.) Under this scheme,
|> all concurrent access to shared state is explicit and must be
|> explicitly thought about. And maybe a compiler with good data-flow
|> analysis could track which shared state is protected by which lock,
|> and avoid flushing cached data and rereading the shared state
|> unnecessarily.

That turns a problem at one level (undefined behaviour) into a problem
at another (it is indeterminable whether deadlock or livelock will
occur). Yes,
it helps by making the situation explicit, but that is the ONLY way in
which it does. But I agree that is worthwhile, on the grounds that
making problems explicit is almost always worthwhile.

The claimed downside (that it forces the taking of a lock even when
unnecessary) is answered by your data-flow analysis. If the compiler
can't tell that the use is 'safe', then an unchecked access is a bug
waiting to bite you; if it can, it can optimise it. It neither helps
nor hinders that.


Regards,
Nick Maclaren.
From: jsavard on
Terje Mathisen wrote:
> ranjit_mathews(a)yahoo.com wrote:
> > How does the Itanium compare in terms of SpecInts PER GHz?
> >
> Spec/GHz is very nearly totally meaningless.
>
> Why do you care?

In *itself*, yes.

But it lets one translate (somewhat meaningful) GHz into (much more
meaningful) SpecInts.

Also, it might be considered a figure of merit for architectural
elegance or something.

John Savard

From: Brian Hurt on
Sorry for the delay in getting back to this.


"Chris Thomasson" <cristom(a)comcast.net> writes:

>> The biggest problem with parallelized code is the race condition- which
>> arises from mutable data. Every piece of mutable data is a race condition
>> waiting to happen. Mutable data needs to be kept to an absolute minimum,
>> and then handled in such a way that it's correct in the presence of
>> threads.
>>
>http://groups.google.com/group/comp.programming.threads/browse_frm/thread/b192c5ffe9b47926

>Your conclusion is wrong. No need to be afraid of mutable data... Just stick
>it all in the writer side of a lock-free reader pattern.

It's a question of correctness. Race conditions, deadlocks, livelocks,
etc. are like wild C pointers- it only takes one of them to totally
screw up a program. Can you write correct programs with C-style pointers?
Yes. Demonstrably so. But it only takes one stupid programmer on the
project to mess things up for the whole team- which is why pretty much
the entire evolution of programming languages since C has been trying to
get as far away from C pointers as possible.

Just like any pointer is potentially a wild pointer, any mutable variable
is a race condition waiting to happen. So the language needs to make
sure that every mutable variable is not a race condition *somehow*- like
Java makes sure every pointer dereference is safe.

Note that every programming language has mutable data- even Haskell. The
question isn't *if* mutable data, it's *how* mutable data- and as a subset
of that, how little?
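
To make the Haskell case concrete (just an illustration, not anyone's
real code): the mutable data is there, but the types segregate it, so
the *how* is always visible:

    import Data.IORef (IORef, newIORef, readIORef, writeIORef)

    -- Mutability shows up in the types: a plain Int can never change;
    -- an IORef Int can, but only inside IO, where it is easy to spot.
    pureCount :: Int
    pureCount = 42

    bumpCounter :: IORef Int -> IO ()
    bumpCounter ref = do
      n <- readIORef ref
      writeIORef ref (n + 1)           -- the mutation is explicit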

>> There are two languages I know of in which it may
>> be possible to write non-trivial parallel programs correctly and
>> maintainably- concurrent haskell with STM and erlang- and both are, at
>> their core, purely functional languages.

>STM is a no go. Way too much overhead... Here are some of the reasons why:

First of all, I'll take one published paper going "we actually implemented
it and here are the benchmarks":
http://citeseer.ist.psu.edu/shavit95software.html

Second, I don't care how fast your program is if it keeps segfaulting- or
deadlocking, or acting weird because of race conditions. And neither do
most programmers, I'd note- the definite trend of the last 25 or so years
has been towards languages that are lower performance but offer greater ease
of development and correctness. C and Fortran were replaced by C++, which
was slower (albeit not by much), which was replaced by Java and C#, which
were slower than C++, and they are in the process of being replaced by Python
and Ruby, which are slower yet. It's not CPU cycles we're optimizing for,
it's programmer cycles.

The trick with parallel programming is mainly not in expressing the
parallelism- well, there are some gains to be made there, but the real
room for improvement is correctness. Anyone who has actually tracked down
a race condition can tell you that they are the toughest category of bugs
to track down- because the buggy code works almost all of the time. Unit
tests are worthless here.

Brian

From: Brian Hurt on
nmm1(a)cus.cam.ac.uk (Nick Maclaren) writes:


>In article <e0t8h.7975$lq.2268(a)newsread1.mlpsca01.us.to.verio.net>,
>Brian Hurt <bhurt(a)spnz.org> writes:
>|> nmm1(a)cus.cam.ac.uk (Nick Maclaren) writes:
>|>
>|> >That experience debunked the claims of the
>|> >functional programming brigade that such methodology gave automatic
>|> >parallelisation.
>|>
>|> Automatic parallelization, no. You're looking for a silver bullet that
>|> probably doesn't exist. On the other hand, functional programming makes
>|> writing parallel code much easier to do.

>Oh, no, I am not! I pointed out to those idiots at the time that silver
>bullets didn't work, but there is no arguing with religious fanatics :-(

OK.

>While using functional code to express functional concepts is good for
>many reasons, contorting stateful designs into a functional style makes
>them much HARDER to parallelise, by obscuring their design.

My response to that is to avoid stateful designs. At which point functional
programming generally makes the design *less* obscure.

To date, there are two categories of algorithms I can't do purely
functionally:

1) Algorithms that rely on the O(1) nature of hash tables *and* where the
number of elements being hashed is large enough that the constant-factor
disadvantage of hashing is overtaken by the O(log N) cost of a tree
structure *and* the worst-case O(N) performance of the hash table is
acceptable. Remember that a string hash can easily be an order of magnitude
more expensive than a single string compare, meaning that for small
structures, the cost of the one hash function call is larger than that of
the several string compares.

2) Graph algorithms, where I need to find the edges from a given node
fast. Arguably, this falls under 1; a sketch of what I do instead follows
below.
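
For what it's worth, here is the purely functional shape I end up
writing instead (a Haskell sketch with invented names), which is fine
right up until the O(log N) lookups start to hurt:

    import qualified Data.Map as Map

    -- A purely functional graph: adjacency lists kept in a balanced
    -- tree.  Edge lookup is O(log N) instead of a hash table's O(1),
    -- which is exactly the trade-off described in 1 above.
    type Graph node = Map.Map node [node]

    addEdge :: Ord node => node -> node -> Graph node -> Graph node
    addEdge from to = Map.insertWith (++) from [to]

    edgesFrom :: Ord node => Graph node -> node -> [node]
    edgesFrom g n = Map.findWithDefault [] n g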


>Sorry, no. That may be the biggest LOW-LEVEL problem, but there are much
>worse ones :-(

There are four problems with multithreaded programming I've encountered:
1) race conditions
2) deadlocks
3) livelocks
4) priority inversions

STM solves 2 and 4; with Haskell's type system ensuring that all shared
mutable data is in STM, it also solves 1. I'm not sure it solves 3
(livelocks), which is why I still waver.
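
To spell out what I mean for 1, 2 and 4, here is a toy sketch of STM
Haskell (not taken from any real code):

    import Control.Concurrent.STM (STM, TVar, atomically, readTVar, writeTVar)

    -- Shared mutable data lives in TVars; the types keep it inside STM,
    -- and composed transactions cannot deadlock on lock ordering the
    -- way two nested locks can.
    transfer :: TVar Int -> TVar Int -> Int -> STM ()
    transfer from to amount = do
      a <- readTVar from
      b <- readTVar to
      writeTVar from (a - amount)
      writeTVar to   (b + amount)

    -- Either both writes commit or neither does, no matter what other
    -- threads are doing to the same TVars.
    runTransfer :: TVar Int -> TVar Int -> Int -> IO ()
    runTransfer from to amount = atomically (transfer from to amount)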

>I have come to the conclusion that it isn't even desirable - but that
>needs qualification.

>I fully agree that functional programming should be the first port of
>call. If your code can be expressed naturally in pure functional terms,
>then that should be done - and NOT just for parallelism! We are in
>absolute agreement there.

>It's when it can't that the problem arises. Replacing iteration by
>recursion with assumed tail removal is a neat idea that doesn't work;
>it ends up being equivalent in all respects, problems and all.

I'm not sure loops are the right place to be looking for parallelism.
Outside of dense numeric work (the bailiwick of Fortran), most loops
simply don't loop that often- a few dozen times at most.

For the foreseeable future, I don't see the compiler discovering significant
amounts of parallelism. It's just too complicated. Functional programming
might make it easier- but I don't think it'll make it easy *enough*.

This means the parallelism that is there has to be expressed by the
programmer explicitly. This means two things:

1) it has to be easy for the programmer to express the parallelism, to
say that code A and code B may or shall run in parallel (the may case is
the one I think is important- I'd love to see an STM Haskell with
Cilk-style microthreads; see the sketch at the end of this post)

2) both code A and code B have to be able to work together in a
multithreaded program. Preferably by default- running both A and B in
parallel might not have been the plan when those pieces of code were
originally developed.
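
The closest existing thing I know of to the "may" case in 1 is GHC's
par (sketched below; whether the spark actually runs in parallel is
entirely up to the runtime, which is exactly the semantics I want):

    import Control.Parallel (par, pseq)

    -- "May run in parallel": par sparks its first argument as a hint
    -- the runtime is free to ignore; pseq only fixes evaluation order.
    -- The meaning of the program is identical either way.
    mayRunInParallel :: Int -> Int -> Int
    mayRunInParallel x y = a `par` (b `pseq` (a + b))
      where
        a = expensive x
        b = expensive y

    expensive :: Int -> Int
    expensive n = sum [1 .. n]         -- stand-in for real work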

Brian