From: nedbrek on
Hello all,

"MitchAlsup" <MitchAlsup(a)aol.com> wrote in message
news:26c1c35a-d687-4bc7-82fd-0eef2df0f714(a)c7g2000vbc.googlegroups.com...
> There is risk in taking a few steps backwards, but there is reward in
> repositioning a microarchitecture so that it is tuned at the
> performance per power end of the spectrum. And I think it is unlikely
> that we can evolve from the Great Big OoO machines to the medium OoO
> (or partially ordered) machines the battery powered world wants/needs.
> If the x86 people are not going to figure out how to get into the 10mW-
> to-50mW power envelope with decent performance, then they are leaving
> a very big vacuum in which new architectures and microarchitectures
> will find fruitful hunting grounds. I sure would like to be on a team
> that did put x86s into this kind of power spectrum, and I think one
> could get a good deal of the performance of a modern low power (5W)
> laptop into that power range, given that as the overriding goal of the
> project.

According to
http://arstechnica.com/gadgets/news/2010/05/intel-fires-opening-salvo-in-x86-vs-arm-smartphone-wars.ars
Atom now idles at 21-23 mW. Assuming idle power is about half of active
power (total power being roughly half leakage, half dynamic), that gives
~50 mW active. Of course, the idle figure is probably measured in some
low-voltage, power-gated state...
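
Spelled out (just back-of-the-envelope; the 50/50 leakage/dynamic split is
my assumption, not a measured figure):

/* Rough estimate of Atom active power from the reported idle number.
   Assumes the idle figure is essentially leakage, and that leakage is
   about half of total active power -- both assumptions, not data. */
#include <stdio.h>

int main(void)
{
    double idle_mw = 22.0;          /* midpoint of the reported 21-23 mW */
    double leakage_fraction = 0.5;  /* assumed: half leakage, half dynamic */
    double active_mw = idle_mw / leakage_fraction;
    printf("estimated active power: ~%.0f mW\n", active_mw);  /* ~44, call it ~50 */
    return 0;
}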

The power envelope can be a moving target. Batteries get better, process
(sometimes) gets better. Give it a while, and what didn't fit before
suddenly does.

Does anyone want to talk Atom? I don't feel comfortable saying anything...
they are multi-threaded in-order. Right direction? Temporary half-step?

Ned


From: nedbrek on
Hello all,

<nmm1(a)cam.ac.uk> wrote in message
news:hskb6d$t96$1(a)smaug.linux.pwf.cam.ac.uk...
> In article
> <26c1c35a-d687-4bc7-82fd-0eef2df0f714(a)c7g2000vbc.googlegroups.com>,
> MitchAlsup <MitchAlsup(a)aol.com> wrote:
>>
>>I am saying that the way forward will necessarily take a step
>>backwards to less deep pipes and less wide windows and less overall
>>complexity in order to surmount (or better optimize for) the power
>>wall.
>
> It's also better for RAS and parallelism!

Aggressive in-order is bad for RAS (Itanium). Parallelism is a broad term;
obviously it is not better for extracting parallelism within a single
thread! :)

Ned


From: nmm1 on
In article <hslv9c$oav$1(a)news.eternal-september.org>,
nedbrek <nedbrek(a)yahoo.com> wrote:
><nmm1(a)cam.ac.uk> wrote in message
>news:hskb6d$t96$1(a)smaug.linux.pwf.cam.ac.uk...
>> In article
>> <26c1c35a-d687-4bc7-82fd-0eef2df0f714(a)c7g2000vbc.googlegroups.com>,
>> MitchAlsup <MitchAlsup(a)aol.com> wrote:
>>>
>>>I am saying that the way forward will necessarily take a step
>>>backwards to less deep pipes and less wide windows and less overall
>>>complexity in order to surmount (or better optimize for) the power
>>>wall.
>>
>> It's also better for RAS and parallelism!
>
>Aggressive in-order is bad for RAS (Itanium). Parallelism is a broad term;
>obviously it is not better for extracting parallelism within a single
>thread! :)

Eh?

The Itanic is most definitely NOT aggressively in-order, and its
most aggressively out-of-order aspect was dropped with the Merced.
The fact that its out-of-order properties are entirely different
to the now widespread behind-the-scenes action reordering doesn't
change that.

And I can assure you that it is NOT obvious that you get better
single-thread parallelism by deeper pipes and more complexity, or
even wider windows. It just looks that way at first glance.


Regards,
Nick Maclaren.
From: Bernd Paysan on
Andy 'Krazy' Glew wrote:
> 7) Cost of fabs. High cost => risk aversion.

Moore's Law on fab costs (they go up exponentially) leads to design
houses and fabs separating. Intel is about the only company left where
design and fab are closely tied together - the other companies with both
design and fab capability at least offer fab services to third parties.

So where's the risk? If you are a fabless design house, a state-of-the-art
process project costs you a few million in external costs plus internal
R&D. That is pretty affordable. If you are not fabless, the same project
costs you a few billion in fab costs, plus internal R&D. That is basically
death for anyone with less market power than Intel, which is why nobody
else does it anymore.

For this situation, ARM has not only the right product (targeted at low
power), it also has the right way to distribute that product (as IP,
including Verilog sources for those who want to tinker with it).

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
From: Andy 'Krazy' Glew on
On 5/15/2010 4:44 AM, nedbrek wrote:
> "Andy 'Krazy' Glew"<ag-news(a)patten-glew.net> wrote in message news:4BED6F7D.9040001(a)patten-glew.net...
>> On 5/14/2010 5:31 AM, nedbrek wrote:

>> But: when have academics ever really done advanced microarchitecture
>> research? Where did OOO come from: well, actually, Yale Patt and his
>> students kept OOO alive with HPSm, and advanced it. But it certainly was
>> not a popular academic field, it was a single research group. The hand-me-down
>> stories are that OOO was discriminated against for years, and had to
>> publish at second tier conferences.
>
> I know CMU worked on a uop scheme for VAX. Do you know if Bob Colwell was
> exposed to that while doing his graduate work? That would seem like a
> pretty big boost from academia... I can imagine this work getting panned in
> the journals.

I have no recollection of such CMU work, or of Bob talking about it during P6.

>
> Jim Smith did 2-bit branch predictors, Wikipedia says while at CDC. I'm so
> used to him being an academic... Yale and Patt take credit for the gshare
> scheme, although that looks like independent invention.

Yale *and* Patt? I think you mean Yeh and Patt, for gselect.

AFAIK gshare is due to McFarling.
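
For anyone who hasn't played with these, the flavor of the schemes is
roughly this (a toy sketch only -- table size, hashing, and index width
are invented, not any shipping design):

/* gshare-style predictor: a table of 2-bit saturating counters indexed
   by (branch PC xor global history).  Concatenating history bits with
   PC bits instead of xor-ing gives gselect; dropping the history
   entirely gives Smith's classic 2-bit counter scheme. */
#include <stdint.h>

#define PRED_BITS 12
#define PRED_SIZE (1u << PRED_BITS)

static uint8_t  counters[PRED_SIZE];   /* 2-bit counters, 0..3; start at 0 */
static uint32_t ghist;                 /* global taken/not-taken history */

static int predict_taken(uint32_t pc)
{
    uint32_t idx = ((pc >> 2) ^ ghist) & (PRED_SIZE - 1);
    return counters[idx] >= 2;         /* upper half of the counter = taken */
}

static void update(uint32_t pc, int taken)
{
    uint32_t idx = ((pc >> 2) ^ ghist) & (PRED_SIZE - 1);
    if (taken  && counters[idx] < 3) counters[idx]++;
    if (!taken && counters[idx] > 0) counters[idx]--;
    ghist = ((ghist << 1) | (taken ? 1u : 0u)) & (PRED_SIZE - 1);
}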


> Rotenberg gets credit for trace cache. I don't think we can blame the
> failure of the P4 trace cache on him... It seemed like pretty popular stuff
> in most circles.

Uri Weiser and Alex Peleg's patent is the first publication (if you consider a patent a publication; the law does) on
trace cache. From Wikipedia:

The earliest widely acknowledged academic publication of trace cache
was by Eric Rotenberg, Steve Bennett, and Jim Smith in their 1996 paper
"Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching."

An earlier publication is US Patent 5,381,533,
"Dynamic flow instruction cache memory organized around trace segments independent of virtual address line",
by Alex Peleg and Uri Weiser of Intel Corp., patent filed March 30, 1994,
a continuation of an application filed in 1992, later abandoned.

My own work in trace caches was done 1987-1990, and I brought it to Intel, where it was evaluated for the P6.
Admittedly, I was a part-time grad student under Hwu at the University of Illinois at the time, working for some of that
time at Gould and Motorola. Peleg and Weiser may have gotten the patent, but I am reasonably sure that I originated the
term "trace cache".




> But you are mostly right. Academics are no more immune to bandwagon fever
> than industry. It is worse, because academics are not constrained by
> economics or even (depending on their models) physics.

Tenure gives an academic the economic freedom to pursue unpopular lines of research.

Pre-tenure, the young academic must seek popular topics.