From: Andy 'Krazy' Glew on
On 5/19/2010 1:00 PM, Kai Harrekilde-Petersen wrote:
> Andy 'Krazy' Glew<ag-news(a)patten-glew.net> writes:

>> > I want to work on the computers that will run my contact lens
>> > displays. They gotta be low power. (Unless you want to extract circa
>> > 10W from the body somehow - buy our wearable computer system and lose
>> > weight!)
> Even if extracting 10W from the body were doable, you'd still have the
> formidable task of making sure that the dissipation of your 10W
> contact lens doesn't burn your eyes into charcoal.
>
> For contact-lens sized computers, I'd say significantly less than
> 1W. After all, 1W is a lot of heat, when applied directly to your
> skin!

A wearable computer system with contact lens displays would not be dissipating 10W into your eyeballs. The contact lens
displays would have to be much lower power, with most of the power dissipated by a wearable PC somewhere else - in your
clothes? in your backpack? your fanny pack?

There are good arguments that glasses-mounted displays for such truly personal computer systems will come first and will
dominate contact lenses for quite a while: (1) ease of communicating data from the PCs scattered about your person, and
(2) heat.



From: Andy 'Krazy' Glew on
On 5/19/2010 10:16 AM, Robert Myers wrote:

> To be fair, IBM nearly burned itself to the ground with cost-no-object
> research, Intel made a couple of ballsy bets that it survived only
> because it is Intel, and the government has completely run out of
> money for anything except trying to explain and fix the mistakes of
> its second-stringers and political hacks. The problem, though, is not
> that there are no possibilities left to explore.

I assume that the bets you say Intel survived were

(1) Itanium (VLIW)

(2) Pentium 4 (high frequency "fireball")

Neither was a bet on out-of-order instruction window size or microarchitecture. In fact, both were bets against OOO
microarchitecture.

For Itanium this is obvious.

For the Willamette "fireball" it is less obvious, but as someone who was there at the time that Willamette started, I
avow: Willamette was started with a viewpoint contrary to out-of-order. The first designs were
semi-statically-scheduled (scheduled by the front-end) and thereafter in-order or run-ahead. Out-of-order only crept
back into Willamette when these other approaches failed. But at no time was out-of-order the primary agenda item for
Willamette. The out-of-order microarchitecture work for Willamette was not about advancing the state of the art of
out-of-order; it was about keeping up with Willamette's fireball frequencies. Willamette's big microarchitecture bet
was multithreading, which, while not exactly opposed to OOO, is certainly not fully aligned with it.
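
To make the run-ahead idea concrete for readers who haven't seen it, here is a minimal sketch of my own - the trace
format, the latencies, and the cache model are all invented, and it is emphatically not the Willamette design. The
point is only the contrast: an in-order machine stalls for the full latency of every miss, while a run-ahead machine
keeps scanning past a miss so that later misses turn into prefetches.

# Toy comparison of stall-on-miss in-order execution vs. run-ahead.
# Everything here (trace format, latencies, cache model) is invented
# for illustration; this is NOT the Willamette design.

MISS_LATENCY = 100   # cycles for a load that misses the cache
HIT_LATENCY = 1

def run_inorder(trace, cache):
    """Stall the machine for the full latency of every miss."""
    cycles = 0
    for op, addr in trace:
        if op == "load":
            cycles += HIT_LATENCY if addr in cache else MISS_LATENCY
            cache.add(addr)
        else:
            cycles += 1
    return cycles

def run_runahead(trace, cache):
    """On a miss, keep scanning ahead (results discarded) so that later
    misses become prefetches.  Real run-ahead checkpoints state and only
    scans as far as the miss latency allows; here we simply scan the
    rest of the trace to keep the sketch short."""
    cycles = 0
    for i, (op, addr) in enumerate(trace):
        if op == "load" and addr not in cache:
            for op2, addr2 in trace[i + 1:]:
                if op2 == "load":
                    cache.add(addr2)          # prefetch under the miss
            cycles += MISS_LATENCY            # still pay for this miss
            cache.add(addr)
        elif op == "load":
            cycles += HIT_LATENCY
        else:
            cycles += 1
    return cycles

# Two independent misses: in-order pays ~200 cycles, run-ahead ~100,
# because the second miss is overlapped with the first.
trace = [("load", 0x100), ("alu", None), ("load", 0x200), ("alu", None)]
print("in-order :", run_inorder(trace, set()))
print("run-ahead:", run_runahead(trace, set()))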

At no time has Intel paid any heed to advancing the state of the art of out-of-order. Even Nehalem is derivative and
evolutionary. Next chips... ?

I must admit that I am puzzled as to why this happened. I thought that P6 showed (and is still showing) that OOO
microarchitecture can be successful. I would have expected Intel to bet on the proven winners, by doing it over again.
Didn't happen.
From: Andy 'Krazy' Glew on
On 5/19/2010 11:23 PM, Andy 'Krazy' Glew wrote:

> I must admit that I am puzzled as to why this happened. I thought that
> P6 showed (and is still showing) that OOO microarchitecture can be
> successful. I would have expected Intel to bet on the proven winners, by
> doing it over again. Didn't happen.

One hypothesis, based on observation:

Many non-x86 processor efforts failed or shrank at about this time: DEC went under, IBM downsized, and the RISC
vendors in general faltered.

Many refugees from these companies spread throughout the rest of the industry, including Intel and AMD, carrying with
them the attitude that of course OOO could not be pushed further.

At the beginning of Willamette I remember Dave Sager coming back from an invitation-only meeting - Copper Mountain? - of
computer architects who all agreed that OOO could not be pushed further. Nobody asked my opinion. And, I daresay,
nobody at that conference had actually built a successful OOO processor; quite possibly, the only OOO experience at that
conference was with the PPC 670.

From: Ken Hagan on
On Wed, 19 May 2010 22:12:00 +0100, ned <nedbrek(a)yahoo.com> wrote:

> The programs that matter are the ones customers have paid for (or paid
> to have developed). A lot of those programs are spaghetti. It's not
> our place to tell customers to rewrite their software. We are to serve
> them.

Up to a point. True, the processor's job is to run software, but the
software's job is to perform some function. If the software doesn't meet
the changing demands of the customer, it is perfectly reasonable to ask
whether the software could be rewritten rather than to pray for a hardware miracle.
From: Morten Reistad on
In article <ht1k70$lvu$1(a)news.eternal-september.org>,
ned <nedbrek(a)yahoo.com> wrote:
>nmm1(a)cam.ac.uk wrote:
>
>> In article <78b2b354-7835-4357-92e1-21700cc0c05a(a)z17g2000vbd.googlegroups.com>,
>> Robert Myers <rbmyersusa(a)gmail.com> wrote:
>>>On May 14, 12:11 am, MitchAlsup <MitchAl...(a)aol.com> wrote:

>> Only because they hamstrung themselves with the demented constraint
>> that the only programs that mattered were made up of C/C++ spaghetti,
>> written in as close to a pessimally efficient style as the O-O
>> dogmatists could get to. Remove that, and there is still room to
>> evolve.
>
>The programs that matter are the ones customers have paid for (or paid
>to have developed). A lot of those programs are spaghetti. It's not
>our place to tell customers to rewrite their software. We are to serve
>them.

Yes, the compatibility argument is important. But no one can do magic.
We are now at, or pretty close to, the end of the rope regarding
single-processor performance on von Neumann computers. We used
pipelining, out-of-order execution, and lots of other tricks to push this
envelope a hundredfold or more. Now we are up against the structure of
the logic expressed in the code itself.

There is no substitute for attacking that from the software side,
but we still have to support the old code. So we may try to encapsulate
the single-threaded code into virtual machines, and have many of those.

We still have to attack the three walls - memory, energy, and latency - and
the optima for coping with those are definitely not at one of the corners
of the graph, where the old software resides.

>>>When the race for single-threaded performance was still on, the path
>>>to further evolution seemed fairly obvious to me: open up the
>>>instruction window, get more instructions in flight, and let the
>>>processor claw its way forward by finding opportunities to speculate,
>>>perhaps by inspired guesses as to where to start a new (hardware-
>>>generated or compiler-hinted) thread, fixing the errors of mis-
>>>speculation on the fly.
>>>
>>>That is to say, I would have put my money on the hardware guys to find
>>>parallelism in the instruction stream while software guys were still
>>>dithering about language aesthetics. I thought this way all the way
>>>up to the 90nm step for the P4.
>>
>> As you know, I didn't. The performance/clock factor (which is what
>> the architecture delivers) hasn't improved much.

But it kept up, and designers managed to compensate for the effects
of getting closer to the latency wall.
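
For anyone who has not seen the window argument worked through, the sketch below is a crude limit study of my own -
the synthetic dependence trace and every number in it are made up for illustration. With an idealized machine whose
only constraint is a W-entry instruction window (unlimited issue width, unit latencies, perfect prediction),
perf/clock rises with W and then flattens once the window spans the program's dependence chains, which is roughly the
saturation being described above.

# Crude ILP limit study: idealized machine with unlimited issue width,
# unit-latency operations, and perfect branch prediction; the only
# resource is an instruction window of W entries.  The dependence
# trace and all numbers are invented for illustration.
import random

def cycles_with_window(deps, W):
    """deps[i] = indices of earlier instructions that i depends on."""
    finish = [0] * len(deps)
    for i, producers in enumerate(deps):
        ready = max((finish[p] for p in producers), default=0)
        if i >= W:                                # i cannot enter the window
            ready = max(ready, finish[i - W])     # until i-W has retired
        finish[i] = ready + 1
    return max(finish) if finish else 0

# Synthetic trace: each instruction depends on 0-2 producers drawn
# from the previous 64 instructions.
random.seed(0)
N = 10_000
deps = []
for i in range(N):
    k = random.choice((0, 1, 2)) if i else 0
    deps.append([random.randrange(max(0, i - 64), i) for _ in range(k)])

for W in (8, 16, 32, 64, 128, 256, 512):
    c = cycles_with_window(deps, W)
    print(f"window={W:4d}  IPC={N / c:5.2f}")
# IPC climbs with W and then flattens once the window comfortably
# covers the dependence distances in the trace.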

>Near the end of my uarch career, I came to realize that much of "the
>game" is keeping perf/clock from collapsing while ramping clock. At
>least, that is about the only thing that has been successful.

Indeed.

In the meantime the software guys took a big step backwards
in terms of speed by going for Java, running in a software-controlled
virtual machine that has a fantastically big footprint
compared to regular compiled code.

Java has about 15 times as big a footprint (in cache, memory, etc.) as
klh10, the emulator for the PDP-10.

I have been remotely involved in around 50 Java-based server setups,
around 5 with r€a££¥ £arg€ budg€t$. None managed to perform
to the satisfaction of their owners. Some were fudged with external
assists (like caches, multiple databases, etc). Around a third failed
outright on performance.

-- mrr