From: Bill Todd on
Terje Mathisen wrote:
> On Oct 23, 8:11 pm, ga...(a)allegro.com (Gavin Scott) wrote:
>> For PA-RISC capability HP had very high hopes for dynamic translation.
>> One slide from fairly early on suggests they expected to get to 50%
>> of native performance using translation. In reality they failed to
>> scrounge up enough cleverness to do it well, and the PA-RISC
>> compatibility on IPF has always been poor enough that the performance
>> is commonly considered unacceptable even for business applications.
>
> That's interesting:
>
> IA64 seemed to have a close-to-complete superset of all PA-RISC
> features/instructions, including some very funky address shift/
> combination operations specifically claimed to be there to support
> PA-RISC features.
>
> The register set was so much larger that it could be mapped
> statically.
>
> If all this hw support didn't get at least 50%, then the clock rate
> must have been very disappointing (which it was, right?).

In a general sense, yes - but not by comparison with PA-RISC's clock
rate, which was always slower (only about 2/3 the Itanic clock rate
since 2003).

- bill
From: Mayan Moudgill on
Terje Mathisen wrote:
> On Oct 23, 1:45 pm, Mayan Moudgill <ma...(a)bestweb.net> wrote:
>
>>Andy "Krazy" Glew wrote:
>>
>>>E.g. Terje, you're known to be a Larrabee fan. Can you vectorize CABAC?
>>
>>Not a chance.
>>
>>
>>>For example: divide the image up into subblocks, and run CABAC on each
>>>subblock in parallel.
>>
>>Problem is with the standard. H.264 specifies that the frame is CABAC
>>encoded.
>
>
> Not quite:
>
> H.264 defines two alternate encoding schemes, of which CABAC gets the
> better compression, but it is fully compliant to use the other (I
> don't remember the name of it) if the encoder wants to.

VLC.

IIRC, it is also defined on a per-frame basis and also requires
inherently serial decoding (i.e. it is non-vectorizable); it just has a
better constant factor.
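
To make the serial dependence concrete, here is a toy sketch. It is not
H.264's CABAC (which works on integer ranges with table-driven context
states) and not its VLC scheme either; it is a generic adaptive binary
arithmetic coder using exact fractions. The names AdaptiveModel, encode
and decode are made up for illustration. The point is only that the
decoder cannot start on bit n until it has decoded bit n-1, because both
the interval and the probability estimate depend on every earlier bit.

```python
from fractions import Fraction


class AdaptiveModel:
    """Hypothetical single-context model: P(bit == 0) from counts so far."""

    def __init__(self):
        self.counts = [1, 1]  # start with one pseudo-observation of each bit

    def p_zero(self):
        return Fraction(self.counts[0], self.counts[0] + self.counts[1])

    def update(self, bit):
        self.counts[bit] += 1


def encode(bits):
    """Narrow [low, low + width) once per bit; return a value inside it."""
    model = AdaptiveModel()
    low, width = Fraction(0), Fraction(1)
    for bit in bits:
        zero_width = width * model.p_zero()
        if bit == 0:
            width = zero_width
        else:
            low += zero_width
            width -= zero_width
        model.update(bit)  # the next split depends on this update
    return low


def decode(code, n_bits):
    """Replay the encoder's interval narrowing, one bit at a time."""
    model = AdaptiveModel()
    low, width = Fraction(0), Fraction(1)
    bits = []
    for _ in range(n_bits):
        zero_width = width * model.p_zero()
        # The split point needs low, width and the model state, all of
        # which are only known once every earlier bit has been decoded.
        bit = 0 if code < low + zero_width else 1
        if bit == 0:
            width = zero_width
        else:
            low += zero_width
            width -= zero_width
        model.update(bit)
        bits.append(bit)
    return bits


message = [0, 0, 1, 0, 1, 1, 0, 0]
assert decode(encode(message), len(message)) == message
```

Each loop iteration in decode reads state written by the previous one,
so there is no independent work to hand to separate vector lanes unless
the stream is split into independently coded pieces.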

> However, since a decoder has to be able to handle CABAC as well, that
> limits the maximum bitrate that you can support in sw.
>
> Terje
From: jgd on
In article <46ednbx9zPp4JULXnZ2dnUVZ_tqdnZ2d(a)metrocastcablevision.com>,
billtodd(a)metrocast.net (Bill Todd) wrote:

> Why not? It ran x86 code natively in an integrated manner on a
> native Itanic OS. As with most things Merced the original cut wasn't
> impressive in terms of speed, but the relative sizes of the x86 and
> Itanic processors (especially given the amount of the chip area
> dedicated to cache) made it clear that full-fledged x86 cores could
> be included later if necessary as soon as the next process
> generations appeared.

I used it a bit. On both Merced and McKinley, the x86 had about one-third
of the throughput of native Itanium code: I was benchmarking with the
same source built both ways. The reasons for the poor performance seemed
to be:

(a) It was an x86 front-end driving the Itanium back-end execution
units. This didn't allow for the kind of speculative and out-of-order
execution that was normal in the x86 world by that time with the Pentium
Pro/II/III family, Athlon and Pentium 4. You were dropping back to
something that was essentially a fast-clocked 486.

(b) At least under Windows, you had to go through a complete execution
transition to Itanium mode and back again on every system call. This was
kind of slow, and meant that running the x86-hosted compilers that
generated Itanium code on an Itanium itself gave much less than 1/3 of
native performance.
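
A rough way to see how (b) drags a system-call-heavy program below the
one-third figure is a back-of-envelope model. This is only a sketch: the
function name, its parameters and every number fed to it are invented
for illustration and are not measurements. It assumes the kernel runs at
native speed either way, user code runs at one-third speed in x86 mode,
and each system call adds a fixed mode-switch cost.

```python
def relative_throughput(user_fraction, switches, switch_cost, user_speed=1 / 3):
    """Throughput in x86 mode relative to a native build (native run time = 1.0).

    user_fraction: share of the native run time spent in user code
    switches:      number of x86 <-> Itanium round trips (one per system call)
    switch_cost:   cost of one round trip, in units of the native run time
    user_speed:    x86-mode user-code speed relative to native (assumed 1/3)
    """
    kernel_fraction = 1.0 - user_fraction
    emulated_time = (
        user_fraction / user_speed      # user code slowed down
        + kernel_fraction               # kernel work is native either way
        + switches * switch_cost        # extra mode transitions
    )
    return 1.0 / emulated_time


# Hypothetical workloads; the inputs are made up.
compute_bound = relative_throughput(user_fraction=1.0, switches=0, switch_cost=0.0)
syscall_heavy = relative_throughput(user_fraction=0.7, switches=100_000, switch_cost=2e-5)
print(f"compute-bound: {compute_bound:.2f} of native")
print(f"syscall-heavy: {syscall_heavy:.2f} of native")
```

In this model a program only falls below one-third when the total
transition cost outweighs the time saved by the kernel portion running
natively, which is the situation a compiler doing lots of file I/O
would be in.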

The only other Itanium platform I ever used was HP-UX, where the x86 was
not significant. By the time people were asking for our software on
Itanium Linux, our answer was "That's going to cost you more than you
are willing to pay."

The kind of guys who take pride in being corporate "power users", who
often drive uptake of technology, even if they don't have much insight
into it, hit severe problems with Itanium. They thought "Wow, here's
this amazing new 64-bit thing that also runs my MS Office work", got
one, and found that Office had slowed down a lot for them. That kind of
ego-driven customer really hates being wrong, and holds it against the
platform, rather than questioning their own judgement. By contrast,
AMD64 gave them just what they wanted. These people can be quite
significant, even if they are basically idiots: my employers used to
belong to EDS, and there were regular corporate edicts against buying
Alpha boxes in the nineties and Itania around the turn of the
millennium, to prevent those guys wasting money. If you had a real need
for the kit, and could explain why, you could buy them through the
company, but it was a long explanation each time.

--
John Dallman, jgd(a)cix.co.uk, HTML mail is treated as probable spam.
From: jgd on
In article <7kfe6aF39sdhsU1(a)mid.individual.net>,
delcecchiofthenorth(a)gmail.com (Del Cecchi) wrote:
> "Bill Todd" <billtodd(a)metrocast.net> wrote in message
> > Save for the grace of AMD it still might have: without a credible,
> > inexpensive, and pervasive 64-bit alternative Intel could have just
> > waited until desktops began to demand 64-bit processors.

Yup. I remain grateful to AMD for saving me from a lifetime of Itanium
low-level debugging.

> I don't put the death of PA-RISC at Itanium's door, since HP was from
> all appearances one of the parents of the Itanium architecture and
> perhaps the one that sold it to Intel, rather than vice versa.
>
> They certainly were co-conspirators, so to speak.

As the Intel porting training course explained it in mid-1999, the
project had started as PA-RISC 3.0 at HP. HP had realised that it would
be too expensive to develop just for the PA-RISC replacement market, and
sought a partnership with Intel.

--
John Dallman, jgd(a)cix.co.uk, HTML mail is treated as probable spam.
From: Robert Myers on
On Oct 24, 3:08 am, Terje Mathisen <terje.wiig.mathi...(a)gmail.com>
wrote:
> On Oct 23, 8:11 pm, ga...(a)allegro.com (Gavin Scott) wrote:
>
> > For PA-RISC capability HP had very high hopes for dynamic translation.
> > One slide from fairly early on suggests they expected to get to 50%
> > of native performance using translation. In reality they failed to
> > scrounge up enough cleverness to do it well, and the PA-RISC
> > compatibility on IPF has always been poor enough that the performance
> > is commonly considered unacceptable even for business applications.
>
> That's interesting:
>
> IA64 seemed to have a close-to-complete superset of all PA-RISC
> features/instructions, including some very funky address shift/
> combination operations specifically claimed to be there to support
> PA-RISC features.
>
> The register set was so much larger that it could be mapped
> statically.
>
> If all this hw support didn't get at least 50%, then the clock rate
> must have been very disappointing (which it was, right?).
>
If I remember the numbers Anton provided, 50% per clock for untuned
code and a less than optimal compiler seems about right, even without
accounting for translation overhead, and I doubt that the existence of
a natural mapping to the instruction set provides much relief.

As I'm writing this, I'm wondering how code translators interact with
branch predictors. It seems like a hard problem to me, and Itanium
doesn't like surprises.

Robert.