From: BDH on
> And trying to design processors with
> sufficiently varied instruction sets to handle everyone's needs with
> purpose-designed instructions doesn't seem to be very practical.

There are only three important expensive operations: sorting, matrix
multiplication, and Fourier transforms. Current architectures fail
miserably at all three.

From: BDH on
> Current architectures fail
> miserably at all three.

Well, I suppose GPUs do a half-decent job with matrices.

From: Greg Lindahl on
In article <1161810625.532746.188850(a)f16g2000cwb.googlegroups.com>,
BDH <bhauth(a)gmail.com> wrote:

>There are only three important expensive operations: sorting, matrix
>multiplication, and fourier transforms. Current architectures fail
>miserably at all three.

Wow, now that's a sweeping over-generalization.

Would you care to give some examples? For example, floating point
matrix-matrix multiply gets close to peak for most array sizes on most
current architectures. Perhaps you're talking about bit-matrix
multiply? If so, saying that explicitly would generate much better
discussion.

-- greg



From: BDH on
> >There are only three important expensive operations: sorting, matrix
> >multiplication, and fourier transforms. Current architectures fail
> >miserably at all three.

> Wow, now that's a sweeping over-generalization.

You're right. Let me correct it. Prefix array calculation, vector
permutation, matrix arithmetic, and Fourier and number-theoretic
transforms are the only important expensive computations, and current
architectures generally do very badly with all of these.
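For concreteness, here is a gloss (mine, not BDH's code) of two of the primitives named above, written as straight-line Python; the function names are my own labels:

```python
def prefix_sums(xs):
    """Prefix array calculation: out[i] = xs[0] + ... + xs[i]."""
    out, total = [], 0
    for x in xs:
        total += x
        out.append(total)
    return out

def permute(xs, p):
    """Vector permutation: route element i to position p[i],
    where p is a permutation vector."""
    out = [None] * len(xs)
    for i, x in enumerate(xs):
        out[p[i]] = x
    return out

print(prefix_sums([1, 2, 3, 4]))            # [1, 3, 6, 10]
print(permute(['a', 'b', 'c'], [2, 0, 1]))  # ['b', 'c', 'a']
```

Both loops are trivially data-parallel in principle, which is presumably why they appear on the list; a serial CPU nevertheless executes them one element at a time.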

> Would you care to give some examples? For example, floating point
> matrix-matrix multiply gets close to peak for most array sizes on most
> current architectures.

Leaving aside algorithms of order less than n^3 (matrix
multiplication is probably n^2 or n^2 log n) - does multiplying two
n x n matrices take more than O(n) time with O(n^2) computational
elements on your computer?
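For illustration, here is a sketch (mine, under the idealized assumption of n^2 lockstep processing elements, one per output entry of C = A*B) of the cost model this question seems to assume:

```python
# Toy simulation of n^2 processing elements, each accumulating its own
# length-n dot product. Since all n^2 multiply-accumulates in a round
# happen together, the whole multiply finishes in n parallel steps,
# i.e. O(n) time for n x n matrices under this (idealized) model.

def parallel_matmul_steps(A, B):
    """Multiply n x n matrices, counting lockstep parallel steps."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    steps = 0
    for k in range(n):          # one lockstep round per value of k
        steps += 1              # all n^2 elements do one MAC together
        for i in range(n):
            for j in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C, steps

C, steps = parallel_matmul_steps([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C)      # [[19, 22], [43, 50]]
print(steps)  # 2 parallel steps for n = 2
```

The model deliberately ignores how the n^2 elements are fed with data, which is exactly the point Nick Maclaren's post below takes up.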

From: Nick Maclaren on

In article <memo.20061025191139.3812C(a)jgd.compulink.co.uk>,
jgd(a)cix.co.uk (John Dallman) writes:
|> In article <1161776197.073940.210000(a)f16g2000cwb.googlegroups.com>,
|> bhauth(a)gmail.com (BDH) wrote:
|>
|> > Arbitrary vector permutation by a permutation vector, vector reversal,
|> > insertion of a vector into a vector, and so forth in log(n) time and
|> > O(n) switches if you allow pipelining is pretty obvious. Building on
|> > that, log(n) time sorting with O(n) elements with pipelining is pretty
|> > obvious. I just spent hours waiting on a suffix array because my
|> > computer is badly designed. Why?
|>
|> Err, are you suggesting that processors should have instructions for
|> doing these operations? Because I'm afraid that viewpoint is now sadly
|> out of date. That kind of approach was tried, but it was found that the
|> instructions were often not usable, for a wide assortment of reasons.
|> One always feels that one's own problem is the simple and obvious case,
|> but this is not normally true. And trying to design processors with
|> sufficiently varied instruction sets to handle everyone's needs with
|> purpose-designed instructions doesn't seem to be very practical.

That is, indeed, true, and the misapprehension is widespread because
of the sorry state of "computer science" "complexity theory".
Real mathematical complexity theory is fine, but does not pretend to
match computer hardware; it is the part that claims to be applied that
I am damning.

Back in the late 1960s and all the 1970s, a few of us were saying that
we should NOT be measuring KFlops but KMaccs (memory accesses); even
now, that mindset is not universal. And 90% of the double-quoted
nonsense uses completely unrealistic and often inconsistent models.
If you do an analysis using a simple, but more realistic model[*], you
get VERY different complexities for most such operations. But such
approaches are VERY hard to get published, as the establishment is not
keen to have people point out that it has been bullshitting for many
decades ....

So most of the books on algorithms are misleading to the point of being
actually incorrect, which is almost certainly where the original poster
got his misapprehensions from.

|> Looking up the history of "RISC" should explain it tolerably well. The
|> Wikipedia article at http://en.wikipedia.org/wiki/RISC seems to cover
|> the basics.

Er, no, but I agree that it is entirely relevant. However, if you don't
understand the issues, it will merely get you more confused, because you
may get taken in by the RISC dogmas rather than the RISC engineering
philosophy.


[*] E.g. by counting the number of non-sequential, non-repeated, scalar
memory accesses - i.e. those that aren't amenable to optimisation by
caching or prefetching. That gives very good results for operations
like sorting.
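A minimal sketch (mine, not Nick's) of the metric in the footnote: wrap an array so that every read records its index, then count the reads whose index is neither the next sequential address nor a repeat of the previous one - the accesses a cache or prefetcher cannot hide.

```python
class CountingArray:
    """Array wrapper that counts non-sequential, non-repeated reads."""

    def __init__(self, data):
        self.data = list(data)
        self.last = None
        self.expensive = 0   # reads not amenable to caching/prefetch

    def __getitem__(self, i):
        # Sequential (last+1) and repeated (last) reads are "free";
        # anything else is charged as an expensive access.
        if self.last is not None and i not in (self.last, self.last + 1):
            self.expensive += 1
        self.last = i
        return self.data[i]

    def __len__(self):
        return len(self.data)

def linear_scan(a, target):
    for i in range(len(a)):
        if a[i] == target:
            return i
    return -1

def binary_search(a, target):
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

data = list(range(1024))
seq = CountingArray(data)
linear_scan(seq, 1000)       # every read is sequential
rnd = CountingArray(data)
binary_search(rnd, 1000)     # every probe jumps across the array
print(seq.expensive, rnd.expensive)
```

Under the flop-counting model the scan looks 100x more expensive than the search; under this model the scan charges zero expensive accesses while the search charges roughly log n of them, which is closer to how the two behave against a prefetching memory system.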


Regards,
Nick Maclaren.