From: nmm1 on
In article <4ADFDC40.6060001(a)patten-glew.net>,
Andy \"Krazy\" Glew <ag-news(a)patten-glew.net> wrote:
>Mayan Moudgill wrote:
>> About making windows bigger: my last work on this subject is a bit
>> dated, but, at that time, for most workloads, you pretty soon hit a
>> point of exponentially smaller returns. Path mispredicts & cache misses
>> were a couple of the gating factors, but so were niggling little details
>> such as store-queue sizes, retire resources & rename buffer sizes. There
>> is also the nasty issue of cache pollution on mispredicted paths.
>
>Not what I have seen. Unless square root law is what you call
>diminishing returns. Which it is, but there is a big difference between
>square root, and worse.

Very, very true.

>Branch prediction:
>
>(1) branch predictors *have* gotten a lot better, and will continue to
>get better for quite a few more years. Seznec's O-GEHL predictor opened
>up a whole new family of predictors, with extremely long history. The
>multilevel branch predictor techniques - some of which I pioneered but
>did not publish (apart from a thesis proposal for the Ph.D. I abandoned
>when my daughter was born; which Adam Butts also pushed; and which
>Daniel Jimenez published in combination with his neural net predictor) -
>provide a way in which you can get the accuracy of a big predictor, but
>the latency of a small predictor.

Multi-level branch prediction for a fixed number of simultaneous
paths gives a log law return, which is much, much worse than square
root.

>(2) In a really large window, many, most, branches that are predicted
>are actually repaired before the instruction retires.

Oh, you can do extremely well - if you can afford to increase the
number of simultaneous paths according to an exponential law in the
size of the window. Now, back in the real world ....

>(3) Recall that I am a fan of skip-ahead, speculative multithreading
>architectures such as Haitham Akkary's DMT. If you can't predict a
>branch, skip ahead to the next loop iteration or function return, and
>execute code that you know will be executed with high probability.
>Control independence.

You are STILL thinking serially! In a hell of a lot of codes, the
simultaneous paths don't remerge until after an arbitrary amount of
other code has been executed. That includes complicated O-O methods,
but is not limited to that.

>Cache misses:
>
>That's the whole point: you want to get as many cache misses outstanding
>as possible. MLP. Memory level parallelism.

It's one approach. However, taking a program with X cache misses
per Y executed instructions and converting it to one with A.X cache
misses per A.Y executed instructions (of which only Y are used)
isn't a great gain - except in benchmarketing terms :-(

>If you are serialized on the cache misses, e.g. in a linear linked list
>
>a) skip ahead to a piece of code that isn't. E.g. if you are pointer
>chasing in an inner loop, skip ahead to the next iteration of an outer
>loop. Or, to a next function.

Er, side-effects in the code you have skipped, which will generally
(in C++-like languages) include function calls?
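(For concreteness, the shape of code being discussed, with the data
structures invented purely for illustration.  Each inner walk is a
serialised pointer chase, but the outer iterations touch independent
chains, so a machine that can skip ahead to the next outer iteration
can have several chains' misses in flight at once.)

#include <stddef.h>

struct node { struct node *next; long value; };

long sum_all_chains(struct node **chains, size_t nchains)
{
    long total = 0;
    for (size_t i = 0; i < nchains; i++) {
        /* Outer iterations are independent: chains[i] and chains[i+1]
           can both be missing in the cache at the same time, if the
           machine manages to get that far ahead. */
        for (struct node *p = chains[i]; p != NULL; p = p->next) {
            /* Inner loop is serialised: the address of p->next is not
               known until the load of p completes. */
            total += p->value;
        }
    }
    return total;
}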

>OK, so linear linked lists are not a problem. What remains problematic
>are data structures that have high valency - e.g. a B-tree with 70
>children per node. Visited randomly. Hash tables that are accessed,
>one after the other: hash(A0) -> A1 => hash(A1) -> A2 => hash(A2) -> ...
> I don't know how to solve such "high valency" or "hash chasing" MLP
>problems. Except by skipping ahead, or running another thread.

And they're not the only important examples of such unpredictable
codes.
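(Again purely for illustration, a sketch of the hash-chasing pattern
above, with a made-up table and mixing function.  Every lookup's
address depends on the previous lookup's result, so the misses
serialise and there is nothing for a prefetcher to run ahead on.)

#include <stdint.h>

#define TABLE_SIZE (1u << 20)

static uint32_t table[TABLE_SIZE];     /* each entry names the next key */

static uint32_t hash(uint32_t key)     /* placeholder mixing function */
{
    key ^= key >> 16;
    key *= 0x7feb352dU;
    key ^= key >> 15;
    return key;
}

uint32_t chase(uint32_t a0, int steps)
{
    uint32_t a = a0;
    for (int i = 0; i < steps; i++)
        a = table[hash(a) % TABLE_SIZE];   /* serial dependence: the next
                                              index needs this load's result */
    return a;
}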

> > There is also the nasty issue of cache pollution on mispredicted paths.
>
>Again, not a problem I have seen, once you (a) have good branch
>predictors, (b) have a good memory dependency predictor, and (c) can
>skip ahead to code that you are more confident of.

Well, it's one I have seen, BADLY. Your conditions are right, but
your conjunction is wrong. "Once" should be "if". 'Tain't always
possible, and some of the codes where it isn't are the ones that
many important customers most want to optimise.

In particular, the programming styles used to write the GUIs and
'Internet' applications that send people up the wall with their
slowness (just as bad as it was in the 1960s, in many cases) fall
into that category. That's bad news.


Regards,
Nick Maclaren.
From: Andrew Reilly on
On Wed, 21 Oct 2009 17:31:08 -0700, Robert Myers wrote:

> On Oct 21, 6:32 pm, Andrew Reilly <andrew-newsp...(a)areilly.bpc-
> users.org> wrote:
>> On Wed, 21 Oct 2009 13:56:13 -0700, Robert Myers wrote:
>> > As crazy as that sounds, it's the only way I can make sense of
>> > Intel's idea that Itanium would replace x86 as a desktop chip.
>>
>> I don't think that it's as crazy as it sounds (today).  At the time
>> Microsoft had Windows NT running on MIPS and Alpha as well as x86: how
>> much effort would it be to run all of the other stuff through the
>> compiler too?
>
> Maybe Rob Warnock would tell us a little more candidly than he did
> before just how hard it was to wring those SpecFP numbers out of
> Itanium. I think he already said you needed a black belt in compiling,
> or something like that.

I'm sorry. I think that I completely missed your point (or vice-versa).
I don't believe that winning Spec performance numbers has anything to do
with success on the desktop. If intel had won the race to 64 bits *and*
had a good backwards-compatibility story, and Microsoft had gone along
for the ride (i.e., both had wanted to), they'd have done it. I don't think
that the Itanium plan makes for a compelling laptop processor, though.

> If you accept that proposition, then all you need to do is to get enough
> code to run well to convince everyone else that it's either make their
> code do well on the architecture or die. I'm pretty sure that Intel
> tried to convince developers that that was the future they should
> prepare for.

They can't have tried very hard, because it was pretty clear even then
that portability was important for client machines, and I don't recall
*ever* hearing a low-power story associated with ia64.

>> > To add spice to the mix of speculation, I suspect that Microsoft
>> > would have been salivating at the prospect, as it would have been a
>> > one-time opportunity for Microsoft, albeit with a huge expenditure of
>> > resources, to seal the doom of open source.
>>
>> How so?  Open source runs fine on the Itanium, in general.  (I think
>> that most of the large SGI Itanium boxes only run Linux, right?)
>
> RedHat Enterprise still supports Itanium, so far as I know. Open source
> depends on gcc, perhaps the cruftiest bit of code on the planet. Yes,
> gcc will run on Itanium, but with what level of performance?

Didn't SGI open-source their own in-house itanium compiler (open64 or
something like that)? Intel have their own compiler of course, and
that's both Linux and gcc compatible. Not sure if llvm does itanium. I
don't think that it was significant at the time, though.

> Could the
> open source community, essentially founded on x86, turn on a dime and
> compete with Microsoft running away with Itanium? Maybe with IBM's
> muscle behind Linux, open source would have stood a chance, but I'm not
> so sure. After all, IBM would always have preferred an Itanium-free
> world. Had I been at Microsoft, I might have seen a Wintanium future as
> really attractive.

I still don't understand the argument. Sure, itanium wasn't going to
make anything any easier for open source, but there doesn't seem to be
any particular win for Microsoft, either. It has never seemed to me that
winning benchmarks was the argument for using open source, anyway.

> It's true: Itanium never came close to meeting hardware goals, but
> whether Itanium was even possible at all was as much a matter of
> software and compilers as it was a matter of hardware, and Microsoft is
> all about software.

Sure, but they're not the *only* folk who are about software, and I don't
recall them being any kind of force behind itanium compiler tech. That
was coming from intel, HP, SGI and a few others.

--
Andrew
From: Ken Hagan on
On Thu, 22 Oct 2009 01:31:08 +0100, Robert Myers <rbmyersusa(a)gmail.com>
wrote:

> Yes, gcc will run on Itanium, but with what level of performance?

Non-zero, which is infinitely better than the alternative. (see below)

> Could the open source community, essentially founded on
> x86, turn on a dime and compete with Microsoft running away with
> Itanium?

How could they not?

The Windows eco-system is closed source. *Most* of Microsoft's customers
are not in a position to obtain recompiled versions of all their software.
In most cases, even the original vendor no longer has the source and it
doesn't matter how good Microsoft's compiler is if you don't have anything
to feed to it. Where the source exists, vendors regard recompilation as
a resale opportunity.

End-users will judge ANY rival architecture by its benchmark performance
when running an x86 binary, because *that's what they've got* and they've
typically invested far more in software than hardware.

The only remotely sane game plan for ia64 was that a combination of binary
translation for the apps plus vastly superior native performance for the
OS core would trump anything x86 could deliver. But OoO x86-es worked well
and the dream compilers for ia64 didn't, so it was game over.
From: Bill Todd on
nmm1(a)cam.ac.uk wrote:
> In article <1b3a5ckrqn.fsf(a)snowball.wb.pfeifferfamily.net>,
> Joe Pfeiffer <pfeiffer(a)cs.nmsu.edu> wrote:
>> Robert Myers <rbmyersusa(a)gmail.com> writes:
>>> On Oct 21, 8:16 pm, Bill Todd <billt...(a)metrocast.net> wrote:

....

>>>> Did you forget that the original plan (implemented in Merced and I'm
>>>> pretty sure McKinley as well) was to include x86 hardware on the chip to
>>>> run existing code natively?
>
> It wasn't in the original plan. It was in the first post-panic
> redesign.

If that's indeed the case, my bad. My impression is that it was already
on the chip when samples first came out in 1999, but I suppose that
decision could have occurred as late as 1997.

....

> The original plan was that ISA translation technology was advancing
> fast enough that they could convert x86 code to IA64 code and beat
> the best x86s by a factor of three. Like Alpha, only more so.

While the Alpha on-the-fly translation work was likely pretty mature by
1994-5, how much knowledge Intel would have had of it may be debatable.

- bill
From: Bill Todd on
Robert Myers wrote:
> On Oct 22, 12:00 am, Bill Todd <billt...(a)metrocast.net> wrote:
>> Robert Myers wrote:
>
>>> The die area may have been available, but I don't think the watts
>>> were. It's hard to remember with any accuracy what I knew when, but
>>> it's pretty easy to tell at least some of what Intel knew. By the
>>> second half of the nineties, Intel knew and briefed that power was
>>> going to be a big problem.
>> Not necessarily. Intel didn't have working silicon until some time in
>> 1998 and were holding out hope for power reductions before shipping
>> product well beyond that date (and further hope that McKinley would
>> achieve whatever power targets Merced failed to). The decision to
>> include the x86 core occurred far earlier (and Intel x86 cores at that
>> time were still relatively stingy in their power requirements).
>
> I'm sorry. I should have been more explicit. Intel never admitted
> that power was a problem for Itanium, but I have a Gelsinger
> (Otellini?) briefing somewhere that extrapolates emitted power per
> area to that of a space shuttle heat shield tile for x86, c.
> 1997-1998. It was clear that they were headed to multicore,
> especially since Patterson was a consultant and he'd been saying
> similar for several years by then.

That's true but doesn't bear upon the current discussion - in part
because it's a general statement pertaining to *all* architectures
rather than to the point under discussion here, in part because it's
looking forward beyond the time frames in question, in part because it
completely ignores what actually happened.

This particular sub-question involves whether in 1994-5, when the
"Itanic takes over the desktop not long after Y2K" strategy was being
touted (at least internally), Intel had any reason to believe that
running a full-fledged x86 core alongside an Itanic core on the same
chip would be impractical due to power concerns.

If Intel indeed believed in 1994-5 that multi-core approaches using
simpler, more efficient cores would be required *before there was room
on the Itanic die for a full-fledged x86 symbiotic core*, they most
likely would have taken steps to ship their first multi-core products
before early 2006 (the 65 nm Yonah; the mid-2005 'dual core' Pentium Ds
were in fact single core with two chips in the same package) or mid-2006
(the 90 nm Montecito Itanic). And neither of those Intel products (nor
for that matter any since that time) incorporated simplified cores to
reduce power consumption (unless you consider backing away from the
ill-conceived NetBurst architecture 'simplification'). Rather, Intel
just didn't know what to do with the additional transistors that
continuing process shrinks made available other than place additional
traditional, complex cores on the chip - though in some cases they've
had to throttle those cores back a bit to avoid generating more heat
than conventional means can dissipate, and they've certainly taken
steps to help those complex cores operate more efficiently than they
used to.

(In fact, the major change that took place during the decade-plus
between planning for Itanic desktop domination and Intel's introduction
of multi-core products was the explosion of relatively cool on-chip
cache, which helped dissipate additional power used by the core itself.)

In 1994-5 Intel had even less reason than it did later to believe that
power would become a real problem any time soon. It thought then that
Itanic would require *less* power than x86 by virtue of its simplified
(in-order) approach. Its x86 line had just introduced the relatively
efficient superscalar P5 architecture and was looking forward to the
similarly efficient OoO P6: AMD's K6 (the first competing x86 product
to challenge Intel's until-then-clear superiority) wouldn't ship until
1997, the even more threatening Athlon wouldn't ship until mid-1999, and
it thus seems unlikely that Intel had yet committed to the significantly
less efficient clock-rate-focused NetBurst architecture in 1994-5
(though it was starting to see Apple as competition back then).

There was plenty of room on the Itanic die for a full-fledged x86 core
at least by the time they got to the 130 nm (Madison) node, which was
still single-core. Had Itanic in fact used less power than even a
relatively efficient full-fledged x86 implementation (as was planned in
1994-5), and had Intel stuck with such relatively efficient full-fledged
x86 implementations (as probably also was planned in 1994-5), then the
resulting chip would have used significantly *less* power than Madison actually did.

So I see no reason whatsoever to suspect that in 1994-5 Intel would have
considered the power requirements of a symbiotic full-fledged x86 core
to be a problem for Itanic - though I'd certainly be receptive to
credible information that could challenge that view.

- bill