Big OOO, SpMT, and possible designs [Computer Architecture]

From: ned on 17 May 2010 21:43

Hello all,

nmm1(a)cam.ac.uk wrote:

> In article <hsr6qe$i8f$1(a)news.eternal-september.org>,
> nedbrek <nedbrek(a)yahoo.com> wrote:
>>For example:

Don't think in terms of 1 or 2 big cores vs. 4 or 8 small cores. Think
32 "big" cores vs. 64 or 128 small cores. In this region, the many-core
curve is much flatter.

>>2) More cores use more bandwidth. Bandwidth can be expensive (additional
>>memory controllers and RAM chips all burn lots of power). You can think of
>>OOO as a technique to get more performance per memory access.
>
> Sorry, but that is NOT true. X performance on Y cores needs precisely
> the same bandwidth as XY performance on a single core, all other factors
> being the same. You are correct that some attempts at multithreading
> serial code increase the bandwidth requirement, but that's an artifact
> of the current approaches, and is dubiously a general rule.

If you think in terms of "I want 1e9 FMACs, that's 3 loads and 1 store
per FMAC" - yes. But remember the bandwidth requirements for the
instruction stream, and the communication overhead.
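
As a rough back-of-the-envelope sketch of the FMAC arithmetic above: the operand size (8-byte doubles) and the absence of any cache reuse are assumptions made here for illustration, not figures from the post, and the function name is hypothetical.

```python
def fmac_traffic_bytes_per_sec(fmacs_per_sec, loads=3, stores=1, operand_bytes=8):
    """Raw operand traffic if every load and store goes all the way to memory.

    operand_bytes=8 assumes double precision; no cache reuse is modelled.
    """
    return fmacs_per_sec * (loads + stores) * operand_bytes


# 1e9 FMACs per second, each needing 3 loads and 1 store.
data_traffic = fmac_traffic_bytes_per_sec(1e9)
print(f"{data_traffic / 1e9:.0f} GB/s of operand traffic")  # prints "32 GB/s of operand traffic"
```

This operand figure does not depend on how many cores deliver the FMACs. The instruction-stream and communication traffic come on top of it, and those are the terms that can grow with core count.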

>>3) Off chip bandwidth costs pins. Pins are expensive in themselves (and
>>limited). They also burn lots of power.
>
> True. But that's irrelevant to whether the chip has lots of slow
> cores or one fast one.

But if 4x cores require 2x bandwidth - you need 2x pins (or 2x faster
pins). And you need 2x the memory channels, and 2x the minimum memory
config.

Ned

From: Robert Myers on 17 May 2010 23:53

On May 17, 8:45 pm, Andy 'Krazy' Glew <ag-n...(a)patten-glew.net> wrote:

>
> Let's say you want to compare two scenarios
> a) Running at Pactive_hi for a small fraction of time, %Tactive_hi
> and sleeping at a low power Psleep for the rest of the time
> b) always running at Pactive_lo
>
> I.e. you are comparing
>
> Pactive_lo
> to
> Pactive_hi*%Tactive_hi + (1-%Tactive_hi)*Psleep + Ntrans * Ptrans
>
> You just solve the inequality, and find the situations in which it makes sense to power down.
>
> Hint: often %Tactive_hi can be < 1% of the time.
> And Pactive_lo is often > 1/6th of Pactive_hi.
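
A minimal sketch of that comparison, weighting the sleep term by the fraction of time not spent active (1 - %Tactive_hi). The function names and the numbers are hypothetical and chosen only to match the hints (active under 1% of the time, Pactive_lo at 1/6 of Pactive_hi); Ntrans * Ptrans is treated as an already time-averaged transition cost.

```python
def duty_cycled_power(p_active_hi, t_active_hi, p_sleep, n_trans=0.0, p_trans=0.0):
    """Average power of scenario (a): short bursts at high power, sleep otherwise.

    t_active_hi is the fraction of time spent active, between 0 and 1.
    n_trans * p_trans is the averaged cost of the sleep/wake transitions.
    """
    return p_active_hi * t_active_hi + (1.0 - t_active_hi) * p_sleep + n_trans * p_trans


def worth_powering_down(p_active_lo, **scenario_a):
    """True when scenario (a) uses less average power than always running at p_active_lo."""
    return duty_cycled_power(**scenario_a) < p_active_lo


# Hypothetical values in arbitrary power units.
scenario = dict(p_active_hi=6.0, t_active_hi=0.01, p_sleep=0.05, n_trans=10, p_trans=0.001)
print(duty_cycled_power(**scenario))
print(worth_powering_down(p_active_lo=1.0, **scenario))
```

With values like these the sleep power and the transition cost, not the active burst, decide the outcome.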

Thanks for spelling that out.

Robert.

From: Torben Ægidius Mogensen on 20 May 2010 05:02

nmm1(a)cam.ac.uk writes:

> In article <78b2b354-7835-4357-92e1-21700cc0c05a(a)z17g2000vbd.googlegroups.com>,
> Robert Myers <rbmyersusa(a)gmail.com> wrote:
>>On May 14, 12:11 am, MitchAlsup <MitchAl...(a)aol.com> wrote:
>>
>>> Thus, I conclude that:
>>> 6) running out of space to evolve killed off microarchitectural
>>> innovation.
>
> Only because they hamstrung themselves with the demented constraint
> that the only programs that mattered were made up of C/C++ spaghetti,
> written in as close to a pessimally efficient style as the O-O
> dogmatists could get to. Remove that, and there is still room to
> evolve.

Yup. The requirement to look good on benchmarks written in sequential,
hard-to-parallelise C has made it really hard to market parallel
computers except for multicores whose cores essentially each run their
own sequential program.

I guess it is a consequence of hardware now being much cheaper than
software. The cost of rewriting software can be much higher than the
cost of having to buy more expensive and slower hardware.

Now, if APL had been the dominant language through the 1990s, we would
have seen very different hardware now.

Torben

From: Niels Jørgen Kruse on 20 May 2010 07:40

Andy 'Krazy' Glew <ag-news(a)patten-glew.net> wrote:

> At the beginning of Willamette I remember Dave Sager coming back from an
> invitation only meeting - Copper Mountain? - of computer architects who
> all agreed that OOO could not be pushed further. Nobody asked my opinion.
> And, I daresay, that nobody at that conference had actually built a
> successful OOO processor; quite possibly, the only OOO experience at that
> conference was with the PPC 670.

Did you mean PPC 630?

--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

From: Morten Reistad on 20 May 2010 08:56

In article <7zfx1neyk0.fsf(a)ask.diku.dk>,
Torben Ægidius Mogensen <torbenm(a)diku.dk> wrote:
>nmm1(a)cam.ac.uk writes:
>
>> In article <78b2b354-7835-4357-92e1-21700cc0c05a(a)z17g2000vbd.googlegroups.com>,
>> Robert Myers <rbmyersusa(a)gmail.com> wrote:
>>>On May 14, 12:11 am, MitchAlsup <MitchAl...(a)aol.com> wrote:
>>>
>>>> Thus, I conclude that:
>>>> 6) running out of space to evolve killed off microarchitectural
>>>> innovation.
>>
>> Only because they hamstrung themselves with the demented constraint
>> that the only programs that mattered were made up of C/C++ spaghetti,
>> written in as close to a pessimally efficient style as the O-O
>> dogmatists could get to. Remove that, and there is still room to
>> evolve.
>
>Yup. The requirement to look good on benchmarks written in sequential,
>hard-to-parallelise C has made it really hard to market parallel
>computers except for multicores whose cores essentially each run their
>own sequential program.

May I suggest Apache and Squid web page serving, MySQL lookups and
stores, and phone calls through Asterisk as benchmarks?

All of these are well multithreaded, and fairly finely tuned to keep
locking down.

I work a bit with the last of these. The test is simple: how many
A-law based calls can you switch through Asterisk while keeping a
synthetic MOS score of 3.7 (the normal benchmark for an ISDN call)?

We have done 12700. The limiting factor is the Linux-to-hardware
layer.
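
To give a feel for the load behind that number, here is a rough sketch. G.711 A-law is 8000 samples per second at 8 bits each (64 kbit/s); the 20 ms packetization interval is an assumption (a common default, not stated in the post), and the figures count one media stream direction per call, payload only.

```python
CALLS = 12_700            # concurrent calls, from the post
SAMPLE_RATE_HZ = 8_000    # G.711 sample rate
BITS_PER_SAMPLE = 8       # G.711 A-law sample size
PTIME_MS = 20             # assumed packetization interval


def per_stream_load(ptime_ms=PTIME_MS):
    """Return (packets per second, payload bytes per packet) for one stream direction."""
    packets_per_sec = 1000 // ptime_ms
    payload_bytes = SAMPLE_RATE_HZ * BITS_PER_SAMPLE // 8 * ptime_ms // 1000
    return packets_per_sec, payload_bytes


packets_per_sec, payload_bytes = per_stream_load()
total_packets_per_sec = CALLS * packets_per_sec
total_payload_mbit = CALLS * SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1e6

print(f"{payload_bytes} payload bytes per packet")
print(f"{total_packets_per_sec} packets/s per stream direction")
print(f"{total_payload_mbit:.1f} Mbit/s of payload per stream direction")
```

Under these assumptions the payload bit rate is moderate but the packet rate is high and the packets are small, which is at least consistent with the bottleneck sitting in the path between the OS and the hardware rather than in the application.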

>I guess it is a consequence of hardware now being much cheaper than
>software. The cost of rewriting software can be much higher than the
>cost of having to buy more expensive and slower hardware.
>
>Now, if APL had been the dominant language through the 1990s, we would
>have seen very different hardware now.

Java became the dominant language, with its own special requirements.
We have seen very little hardware to support them.

-- mrr
