From: Quadibloc on
On Apr 21, 7:46 pm, Bengt Larsson <bengtl8....(a)telia.NOSPAMcom> wrote:

> Indeed, HP decides how long Itanium is alive. What else could they
> use? x86?

Well, now that the x86 architecture offers the same mainframe-like
RAS features that Itanium had all along, that, at least, is a
possibility.

John Savard
From: Bengt Larsson on
Robert Myers <rbmyersusa(a)gmail.com> wrote:

>On Apr 20, 9:19 pm, timcaff...(a)aol.com (Tim McCaffrey) wrote:
>
>>
>> But, it was pretty obvious that Itanium was dead 4 or 5 years ago. Why is
>> Intel still wasting money?
>>
>
>Because Itanium isn't dead. HP appears to be doing just fine with it.

Indeed, HP decides how long Itanium is alive. What else could they
use? x86? HP are doing the same thing as when they did HP-PA, except
they have outsourced microarchitecture design to Intel.
From: Morten Reistad on
In article <4BCB4C2A.8080601(a)patten-glew.net>,
Andy \"Krazy\" Glew <ag-news(a)patten-glew.net> wrote:
>On 4/18/2010 1:36 AM, nmm1(a)cam.ac.uk wrote:
>> As I have
>> posted before, I favour a heterogeneous design on-chip:
>>
>> Essentially uninterruptible, user-mode only, out-of-order CPUs
>> for applications etc.
>> Interruptible, system-mode capable, in-order CPUs for the kernel
>> and its daemons.
>
>This is almost opposite what I would expect.
>
>Out-of-order tends to benefit OS code more than many user codes. In-order coherent threading benefits mainly fairly
>stupid codes that run in user space, like multimedia.
>
>I would guess that you are motivated by something like the following:
>
>System code tends to have unpredictable branches, which hurt many OOO machines.
>
>In system code you may want to be able to respond to interrupts quickly. I am guessing that you believe that OOO has worse
>interrupt latency. That is a misconception: OOO tends to have better interrupt latency, since they usually redirect to
>the interrupt handler at retirement. However, they lose more work.

...... interesting perspectives deleted ....

This general approach of throwing resources at the CPU and at
the compiler so we can work around all kinds of stalls has rapidly
diminishing returns at this point, with our deep pipelines, pretty
large 2-4 level cache hierarchies, and code that is written without
regard to deep parallelism.

We can win the battle, but we will lose the war if we continue down
that path. We must let the facts sink in, and they are that the two
main challenges for modern processing are the "memory wall" and the
"watts per MIPS" challenge.

The memory wall is a profound problem, but bigger and better caches
can alleviate it. At this point, I mean lots and lots of cache, and
well-interconnected cache too.

Return to the RISC mindset: back down a little on per-CPU power,
and instead give us lots of CPUs, and lots and lots of cache.

It is amazing how well that works.

Then we will have to adapt software, which happens pretty fast
in the Open Source world nowadays, when there are real performance
gains to be had.

For the licensing problems, specifically Windows, perhaps a
hypervisor can address that: keep the core systems like databases,
transaction servers etc. running either under some second OS
or directly under the hypervisor, and let Windows be a window
onto the user code. And I am sure licensing will be adapted
if such designs threaten the revenue stream.

For the recalcitrant, single-thread code I would suggest taking
the autotranslation path: recode on the fly. The Alpha team
and Transmeta have proven that this is viable.

Or, we may keep a 2-core standard chip for the monolithic
code, and add a dozen smaller cores and a big cache for the
stuff that is already parallelized. This seems like the
path the GPU coders are taking. Just integrate the GPUs
with the rest of the system, and add a hypervisor.
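For what it's worth, on Linux one can already steer the monolithic
code onto a chosen core with thread affinity. A small sketch, assuming
the GNU pthreads affinity extension; which core number corresponds to
a "big" core is platform-specific and assumed here:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin a thread to one specific core, so single-thread code can stay
   on a fat core while parallel workers run on the small ones.
   Returns 0 on success, an errno value on failure. */
int pin_to_core(pthread_t thread, int core)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(thread, sizeof(set), &set);
}
```

A scheduler-aware runtime would call this once per worker at startup,
e.g. pin_to_core(pthread_self(), 0) for the legacy thread.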


-- mrr
From: nmm1 on
In article <4b0ga7-iqg.ln1(a)laptop.reistad.name>,
Morten Reistad <first(a)last.name> wrote:
>In article <4BCB4C2A.8080601(a)patten-glew.net>,
>Andy \"Krazy\" Glew <ag-news(a)patten-glew.net> wrote:
>
>This general approach of throwing resources at the cpu and at
>the compiler so we can work around all kinds of stalls has rapidly
>diminishing returns at this point, with our deep pipelines, pretty
>large 2-4 level cache hierarchies, and code that is written without
>regard to deep parallelism.
>
>We can win the battle, but we will lose the war if we continue down
>that path. We must let the facts sink in, and they are that the two
>main challenges for modern processing are the "memory wall" and the
>"watts per MIPS" challenge.

Agreed. And we must face up to the fact that a critical part of the
problem is that most of the programming languages and paradigms are
unsuitable for modern systems (as well as being dire for RAS).

>The memory wall is a profound problem, but bigger and better caches
>can alleviate it. At the current point, I mean lots and lots of
>caches, and well interconnected ones too.

I like preloading, but that needs a language and programming paradigm
where reasonably reliable preloading is feasible. We know that it
can be done, for some programs, and there are known techniques to
extend it (though not to all programs, of course).
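To make "reliable preloading" concrete (my example, not from the
thread): GCC and Clang expose an explicit prefetch builtin, so when
the access pattern is predictable the code can preload data a few
iterations ahead of use. The prefetch distance here is a tuning
guess, not a universal constant:

```c
#include <stddef.h>

/* How far ahead to prefetch; the right value depends on memory
   latency and loop cost, and is assumed here for illustration. */
#define PREFETCH_DISTANCE 8

long sum_with_prefetch(const long *a, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        /* Hint the hardware: read access (0), low temporal locality (1). */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 1);
        sum += a[i];
    }
    return sum;
}
```

This only works because the next address is computable from the loop
index; the thread's point is that many languages give the compiler no
such handle on pointer-chasing code.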

>Return to the risc mindset, and back down a little regarding cpu
>power, and rather give us lots of them, and lots and lots of cache.
>
>It is amazing how well that works.
>
>Then we will have to adapt software, which happens pretty fast
>in the Open Source world nowadays, when there are real performance
>gains to be had.

Don't bet on it :-( Changing the generated code, yes; changing the
language, usually; changing the language concepts and programming
paradigms, no.


Regards,
Nick Maclaren.
From: Morten Reistad on
In article <8u3s97-9bt2.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>nmm1(a)cam.ac.uk wrote:
>> Well, yes, but that's no different from any other choice. As I have
>> posted before, I favour a heterogeneous design on-chip:
>>
>> Essentially uninterruptible, user-mode only, out-of-order CPUs
>> for applications etc.
>> Interruptible, system-mode capable, in-order CPUs for the kernel
>> and its daemons.
>
>This forces the OS to effectively become a message-passing system, since
>every single OS call would otherwise require a pair of migrations
>between the two types of CPUs.

With modern transaction systems, somewhat loosely defined (which
includes most kernel, database and server code), we already have to
act as a message multiplexer between subsystems. It then becomes
critical to arbitrate and schedule the code onto the right CPUs, and
to get access to the right bits of cache.

Which is very close to actually doing it as message passing through
a blazingly fast fifo in the first place.
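One way such a blazingly fast FIFO might look, as a sketch only:
a lock-free ring buffer using C11 atomics, assuming exactly one
producer and one consumer per queue. The size and payload type are
illustrative, not anyone's actual design:

```c
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 256  /* must be a power of two */

struct ring {
    void *slot[RING_SIZE];
    _Atomic size_t head;  /* written only by the producer */
    _Atomic size_t tail;  /* written only by the consumer */
};

/* Enqueue one message; returns 0 if the ring is full. */
int ring_push(struct ring *r, void *msg)
{
    size_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (h - t == RING_SIZE)
        return 0;
    r->slot[h % RING_SIZE] = msg;
    /* Release: the payload store must be visible before the new head. */
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
    return 1;
}

/* Dequeue one message; returns NULL if the ring is empty. */
void *ring_pop(struct ring *r)
{
    size_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t h = atomic_load_explicit(&r->head, memory_order_acquire);

    if (t == h)
        return NULL;
    void *msg = r->slot[t % RING_SIZE];
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return msg;
}
```

Note that the fast path has no locks at all, only one acquire load and
one release store per operation, which is what makes it plausible as
the kernel-to-application boundary the thread is discussing; passing
pointers through it gives Terje's zero-copy variant for free.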

>I'm not saying this would be bad though, since actual data could still
>be passed as pointers...

It would possibly save a copy operation or two, but you still have
to do the cache and scheduling operations upon reference.

The time may have come for message passing systems.

-- mrr