From: Brett Davis on
In article <qejbb594tjah6s64vff144lickg1m5erat(a)4ax.com>,
Emil Naepflein <netnewsegn(a)kabelmail.de> wrote:

> On Sun, 20 Sep 2009 06:25:09 GMT, Brett Davis <ggtgp(a)yahoo.com> wrote:
>
> >Of course when adding PREFETCH slows down your code, that benefit is
> >academic.
>
> I don't agree here. About 10 years ago I did a lot of performance
> optimizations for TCP checksum and bcopy on R10K CPUs. I got performance
> improvements for these functions of up to 90%, just by adding PREF
> instructions. In total this reduced CPU consumption per transferred TCP
> byte by about 30%.

Now I have to point out that the MIPS and PowerPC CPUs I work on are
modern embedded designs, and that PREFETCH on these chips is useless.

The MIPS R10K was a nosebleed high-end RISC chip, and likely implemented
PREFETCH in the memory/cache controller, as opposed to using up one of
the two read ports and crippling your memory accesses.
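
For concreteness, here is a minimal sketch of the kind of change Emil
describes - a copy loop with software prefetch hints. It is an
illustration only, using GCC's __builtin_prefetch rather than raw MIPS
PREF instructions; the 64-byte line size and four-line prefetch distance
are assumptions, and the right distance is exactly the machine-dependent
detail that makes the same hint a win on one chip and a loss on another.

#include <stddef.h>
#include <stdint.h>

/* Copy 'words' 64-bit words from src to dst, hinting the cache a few
   lines ahead of the loads.  __builtin_prefetch is only a hint and
   never faults, so reading a little past the end of src is harmless. */
void copy_with_prefetch(uint64_t *dst, const uint64_t *src, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        if ((i & 7) == 0)                     /* once per 64-byte line */
            __builtin_prefetch(&src[i + 32]); /* 4 lines (256 B) ahead */
        dst[i] = src[i];
    }
}

Whether this helps or hurts depends on whether the hint is handled by
the memory/cache controller or steals a load port from the real work.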

As to why PREFETCH is useless on Intel chips, that is outside my
experience base. One would think Intel could go to the expense of
implementing PREFETCH correctly. It could be used as a benchmark win
against AMD.

> Of course, this also depends on your hardware, and whether you operate
> on data in cache or in memory, and how the memory is organized (UMA,
> NUMA, ...).
From: "Andy "Krazy" Glew" on
Mayan Moudgill wrote:
>
> I've been reading comp.arch off and on for more than 20 years now. In
> the past few years the SNR has deteriorated considerably, and I was
> wondering why. Maybe people who used to post at comp.arch are on other
> forums? Maybe it's that I've gotten a little harder to impress? Then I
> thought about the quality of most papers at ISCA and Micro, the fact
> that both EDF and MPF have gone away, and I think the rot is not
> confined to just comp.arch.

Mayan, you would post this just as I am changing jobs, leaving Intel for
the second and last time. Not only have I been busy, but it probably
would not have been a smart thing for me to post while in transition.

But, the fact that I have left Intel says something: it says that I, at
least, don't see much opportunity to do interesting computer
architecture at Intel. Similarly, the fact that Mitch Alsup also posts
to this list, and is not at any CPU company that I am aware of, also
says something.


> So, what's going on? I'm sure part of it is that the latest generation of
> architects is talking at other sites.

If so, they haven't told me. (Sob!)

Dave Kanter may pitch realworldtech.com, and there's a lot of good stuff
there.

But as for me, I got my first real computer architecture job mainly
because Bob Colwell liked my posts on comp.arch. And I'll end it here.

In fact, making sure that I was allowed to post to comp.arch was a major
condition for me accepting my new job.


> However, equally important is that there are far fewer of them. The
> number of companies designing processors has gone down and there are
> fewer startups doing processors. So, fewer architects.

Certainly, fewer companies. Probably fewer teams, even though Intel now
has more teams than ever before doing CPUs: Oregon big-core, Israel
big-core, Atom and Lrb. Not to forget the Intel integrated graphics teams.

At Intel, there are probably more people called "computer architects"
now than ever before. But, the scope of the job has narrowed. There
are a dozen people, probably more, doing the job that I did as a single
person on P6.


> Within those processors there is less architecture (or micro
> architecture) being done; instead, the imperative that clock cycle has
> to be driven down leaves less levels of logic per cycle, which in turn
> means that the "architecture" has to be simpler. So, less to talk about.

Less architecture, I agree.

Not necessarily fewer levels of logic per cycle. The "right-hand turn"
turned away from such high-speed designs as Willamette and Prescott.
Mitch can talk to this.



> There is less low-hanging fruit around; most of the simpler and
> obviously beneficial ideas are known, and most other ideas are more
> complex and harder to explain/utilize.

I believe that there are good new ideas, in both single processor
microarchitecture, and in multiprocessor.

But we are in a period of retrenchment - one of the downward zigs of the
"sawtooth wave" that I described in my Stanford EE380 talk so many years
ago.

There are several reasons for this, including:

(1) what Fred Pollack called "The Valley of Death" for applications.
Many of the applications that I can imagine wanting improved - I want a
computer that can think, talk, anticipate my needs - are still a few
years out, maybe decades, in terms of CPU power, but also data access,
organization, and just plain programming.

(2) Low Power and Small Form Factor: Combine with this the fact that I
don't really want those applications on a desktop or laptop PC. I want
those applications on a cell phone - and the cell phone is the largest,
highest-power device I want. I want those applications on an ear bud
whispering into my ear. I want those applications on glasses drawing
into my eyes, or on smart contact lenses. In part we are waiting for
these new form factors to be created. In part, the new generation of
low-power devices - Atom, ARM - is recapitulating CPU evolution, in
much the same way microprocessors recapitulated the evolution of
mainframes and minicomputers. It's not clear if we ever really
surpassed them - while I think that the most recent members of the Intel
P6 family surpassed the most advanced IBM mainframe processors rumored
to be in the works at Poughkeepsie, I'm not sure. Anyway, ARM and Atom
reset to simple in-order processors, and are climbing the complexity
ladder again. ARM Cortex A9 has, at least, reached the OOO level. When
will they surpass the desktop and laptop microprocessors?

Mike Haertel says that the big value of Atom was allowing Intel to take
a step back, and then get onto the Moore's Law curve again, for a few years.

(3) Simple Applications and Parallelism: Also, since about 1999 many of
the most important applications have been simple: video, multimedia.
Relatively brute force algorithms. MPEG, rectangular blocks. Not model
based. Easy to run SIMD vectors on. Easy to parallelize. Not very
smart. Graphics algorithms have been much the same, in the earlier
generation of GPUs. We are just now getting to the point where flexibly
programmable shader engines are in GPUs.

Couple this to the fact that there are throughput applications, and we
have been in a space where there was more value, and certainly less
project and career risk, in increasing the number of cores - making
relatively minor modifications to the existing cores - than in improving
the cores themselves. And this will go on until the low-hanging fruit in
multicore is taken up - by 4 or 8 processors per chip. Beyond that...
well, most server guys don't want many more processors per chip; they
want more powerful processors. Somewhere, I suspect soon, multicore
will run out of steam. Although, as I have said before, there are
applications that can use many, many, CPUs. Graphics, if nothing else;
and there are others. So it may be that we switch our collective
mindshare from multicore to manycore.


> A larger number of decisions are being driven by the details of the
> process, libraries and circuit families. This stuff is less accessible
> to a non-practitioner, and probably proprietary to boot.
>
> A lot of the architecture that is being done is application-specific.
> Consequently, it's probably more apt to be discussed in
> comp.<application> than comp.arch. A lot of the trade-offs will make
> sense only in that context.
>
> Basically, I think the field has gotten more complicated and less
> accessible to the casual reader (or even the gifted, well-read amateur).
> The knowledge required of a computer architect has increased to the
> point that it's probably impossible to acquire even a *basic* grounding
> in computer architecture outside of actually working in the field
> developing a processor or _possibly_ studying with one of a few PhD
> programs. The field has gotten to the point where it _may_ require
> architects to specialize in different application areas; a lot of the
> skills transfer, but it still requires retraining to move from, say,
> general-purpose processors to GPU design.
>
> I look around and see a handful of guys posting who've actually been
> doing computer architecture. But it's a shrinking pool....
>
> Ah, well - I guess I can always go hang out at alt.folklore.computers.

I may have to do that as well. Work on my book-and-wiki-site.
From: "Andy "Krazy" Glew" on
Mayan Moudgill wrote:
>
> I've been reading comp.arch off and on for more than 20 years now. In
> the past few years the SNR has deteriorated considerably, and I was
> wondering why. Maybe people who used to post at comp.arch are on other
> formums? Maybe its that I've gotten a little harder to impress? Then I
> thought about the quality of most papers at ISCA and Micro, the fact
> that both EDF and MPF have gone away, and I think the rot is not
> confined to just comp.arch.

Mayan, you would post this just as I am changing jobs, leaving Intel for
the second and last time. Not only have I been busy, but it probably
would not have been a smart thing for me to post while in transition.

But, the fact that I have left Intel says something: it says that I, at
least, don't see much opportunity to do interesting computer
architecture at Intel. Similarly, the fact that Mitch Alsup also posts
to this list, and is not at any CPU company that I am aware of, also
says something.


> So, whats going on? I'm sure part of it is that the latest generation of
> architects is talking at other sites.

If so, they haven't told me. (Sob!)

Dave Kanter may pitch realworldtech.com, and there's a lot of good stuff
there.

But as for me, I got my first real computer architecture job mainly
because Bob Colwell liked my posts on comp.arch. And I'll end it here.

In fact, making sure that I was allowed to post to comp.arch was a major
condition for me accepting my new job.


> However, equally important is that there are far fewer of them. The
> number of companies designing processors has gone down and there are
> fewer startups doing processors. So, less architects.

Certainly, fewer companies. Probably fewer teams, even though Intel now
has more teams than ever before doing CPUs: Oregon big-core, Israel
big-core, Atom and Lrb. Not to forget the Intel integrated graphics teams.

At Intel, there are probably more people called "computer architects"
now than ever before. But, the scope of the job has narrowed. There
are a dozen people, probably more, doing the job that I did as a single
person on P6.


> Within those processors there is less architecture (or micro
> architecture) being done; instead, the imperative that clock cycle has
> to be driven down leaves less levels of logic per cycle, which in turn
> means that the "architecture" has to be simpler. So, less to talk about.

Less architecture, I agree.

Not necessarily less levels of logic per cycle. The "right hand turn:
turned away from such high speed design as Willamette and Prescott.
Mitch can talk to this.



> There is less low-hanging fruit around; most of the simpler and
> obviously beneficial ideas are known, and most other ideas are more
> complex and harder to explain/utilize.

I believe that there are good new ideas, in both single processor
microarchitecture, and in multiprocessor.

But we are in a period of retrenchment - one of the downward zigs of the
"sawtooth wave" that I described in my Stanford EE380 talk so many years
ago.

There are several reasons for this, including

(1) what Fred Pollack called "The Valley of Death" for applications.
Many of the applications that I can imagine wanting improved - I want a
computer that can think, talk, anticipate my needs - are still a few
years out, maybe decades, in terms of CPU power, but also data access,
organization, and just plain programming.

(2) Low Power and Small Form Factor: combine with this that I don't
really want those applications on a desktop or laptop PC. I want those
applications on a cell phone - and the cell phone is the largest,
highest power, device I want. I want those applications on an ear bud
whispering into my ear. I want those applications on glasses drawing
into my eyes, or on smart contact lenses. In part we are waiting for
these new form factors to be created. In part, the new generation of
low power devices - Atom, ARM - are recapitulating CPU evolution, in
much the same way microprocessors recapitulated the evolution of
mainframes and minicomputers. It's not clear if we ever really
surpassed them - while I think that the most recent members of the Intel
P6 family surpassed the most advanced IBM mainframe processors people
rumor about Poughkeepsie, I'm not sure. Anyway, ARM and Atom reset to
simple in-order processors, and are climbing the complexity ladder
again. ARM Cortex A9 has, at least, reached the OOO level. When will
they surpass the desktop and lapptop microprocessors?

Mike Haertel says that the big value of Atom was allowing Intel to take
a step back, and then get onto the Moore's Law curve again, for a few years.

(3) Simple Applications and Parallelism: Also, since about 1999 many of
the most important applications have been simple: video, multimedia.
Relatively brute force algorithms. MPEG, rectangular blocks. Not model
based. Easy to run SIMD vectors on. Easy to parallelize. Not very
smart. Graphics algorithms have been much the same, in the earlier
generation of GPUs. We are just now getting to the point where flexibly
programmable shader engines are in GPUs.

Couple this to the fact that there are throughput applications, and we
have been in a space where there was more value, and certainly less
project and career risk, in increasing the number of cores, making
relatively minor modifications to the existing cores, than in improving
the cores. And this will go on, until the low hanging fruit in
multicore is taken up - by 4 or 8 processors per chip. Beyond that...
well, most server guys don't want many more processors per chip; they
want more powerful processors. Somewhere, I suspect soon, multicore
will run out of steam. Although, as I have said before, there are
applications that can use many, many, CPUs. Graphics, if nothing else;
and there are others. So it may be that we switch our collective
mindshare from multicore to manycore.


> A larger number of decisions are being driven by the details of the
> process, libraries and circuit families. This stuff is less accessible
> to a non-practitioner, and probably propietary to boot.
>
> A lot of the architecture that is being done is application-specific.
> Consequently, its probably more apt to be discussed in
> comp.<application> than comp.arch. A lot of the trade-offs will make
> sense only in that context.
>
> Basically, I think the field has gotten more complicated and less
> accessible to the casual reader (or even the gifted well read amateur).
> The knowledge required of a computer architect have increased to the
> point that its probably impossible to acquire even a *basic* grounding
> in computer architecture outside of actually working in the field
> developing a processor or _possibly_ studying with one of a few PhD
> programs. The field has gotten to the point where it _may_ require
> architects to specialize in different application areas; a lot of the
> skills transfer, but it still requires retraining to move from, say,
> general-purpose processors to GPU design.
>
> I look around and see a handful of guys posting who've actually been
> doing computer architecture. But its a shrinking pool....
>
> Ah, well - I guess I can always go hang out at alt.folklore.computers.

I may have to do that as well. Work on my book-and-wiki-site.
From: "Andy "Krazy" Glew" on
Tim McCaffrey wrote:
> In article
> <da524b6d-bc4d-4ad7-9786-3672f7e9e52c(a)j19g2000yqk.googlegroups.com>,
> MitchAlsup(a)aol.com says...
>> On Sep 10, 10:04 pm, Mayan Moudgill <ma...(a)bestweb.net> wrote:
>>> Well, synchronization can be pretty easy to implement - depends on what
>>> you are trying to accomplish with it (barriers, exclusion, queues,
>>> etc.).
>> If it is so easy to implement then why are (almost) all
>> synchronization models at least BigO( n**2 ) in time, per unit of
>> observation? That is, it takes a minimum of n**2 memory accesses for 1
>> processor to recognize that it is the processor that can attempt to
>> make forward progress amongst n contending processors/threads.

Although my MS thesis was one of the first to make this observation of
O(n^2) work, it also points out that there are O(1) algorithms, chiefly
among the queue-based locks. I liked Graunke-Thakkar, but MCS gets the
acclaim.
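
To make the contrast concrete, here is a minimal sketch of an MCS-style
queue lock - the class of O(1) algorithm referred to above - written
with C11 atomics. It is my illustration, not code from the thesis or
from the MCS paper: each waiter spins only on a flag in its own queue
node, so a lock hand-off costs a constant number of coherence
transactions instead of having all n contenders pound on one shared word.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool                locked;
} mcs_node;

typedef struct {
    _Atomic(mcs_node *) tail;   /* last waiter in line, or NULL if free */
} mcs_lock;

void mcs_acquire(mcs_lock *lock, mcs_node *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    /* Atomically append ourselves; the old tail is our predecessor. */
    mcs_node *prev = atomic_exchange(&lock->tail, me);
    if (prev != NULL) {
        atomic_store(&prev->next, me);
        while (atomic_load(&me->locked))
            ;                   /* spin on our own node only */
    }
}

void mcs_release(mcs_lock *lock, mcs_node *me)
{
    mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        mcs_node *expected = me;
        /* No successor visible: try to mark the lock free. */
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
            return;
        /* A successor is mid-enqueue; wait for it to link itself in. */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;
    }
    atomic_store(&succ->locked, false);   /* pass ownership along */
}

Each thread passes its own mcs_node to acquire and the same node to the
matching release; the queue structure is what keeps the per-acquisition
traffic constant as n grows.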

From: "Andy "Krazy" Glew" on
Robert Myers wrote:
> Chrome creates a separate process for each tab, and I have *usually*
> been able to regain control by killing a single process.


Hallelujah!

Processes are the UNIX way.

I may have to start using Chrome.