From: Terje Mathisen on
Robert Myers wrote:
> Prefetch is hugely important, but how it actually works must involve a
> great deal of reverse-engineering on the part of competitors, because
> meaningful details never seem to be forthcoming from manufacturers.
> I'm assuming that Microsoft's compiler designers, for example, know
> lots of useful things that most others don't, and that they got them
> from the horse's mouth under an NDA.

I haven't seen a single x86-type CPU, from any manufacturer, where
letting the compiler issue PREFETCH instructions turns out to be a
general win.

Yes, they obviously do help a little with SPEC, particularly after the
compiler has been carefully tuned for those benchmarks, but I haven't
seen the same win in real-life code.

OTOH, hardware-based prefetch, in the form of stream detection in the
memory interface, is indeed a huge win, but the compiler isn't
involved at all.
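
For concreteness, here is roughly what compiler-issued prefetch looks
like at source level; a minimal sketch using the x86 _mm_prefetch
intrinsic, where the prefetch distance of 64 elements is an
illustrative guess, not a tuned value:

#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch(), _MM_HINT_T0 */

/* Sum an array, prefetching 64 elements ahead of the read. */
double sum_with_prefetch(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + 64 < n)
            _mm_prefetch((const char *)&a[i + 64], _MM_HINT_T0);
        s += a[i];
    }
    return s;
}

The irony is that the hardware stream detector covers exactly this
sequential pattern already, which is why the explicit PREFETCH so
rarely buys anything.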
>
> It must be frustrating to see so much semi-ignorant discussion, but
> the little gems that occasionally fall on the carpet are well worth it
> to some of us.
>
> Why *didn't* the P4 have a barrel shifter? Because the watts couldn't

To frustrate me and my asm code?

> be spared, I'm sure, but why was NetBurst jammed into that box? I'm
> sure there is an answer that doesn't involve hopelessly arcane
> details. Whether it's worth the time of any real computer architect
> to talk about it would have to be an individual decision.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: nmm1 on
In article <7gltkvF2lqp22U1(a)mid.individual.net>,
Del Cecchi <delcecchiofthenorth(a)gmail.com> wrote:
>
>The obstacle is that a modern processor chip design costs a LOT of
>money, tens to hundreds of millions of dollars. And that is just the
>processor chip.

Part of that is because of the complexity, and that would be one
thing to get away from. As you know, you should NEVER make the
first version too complicated, because the second will be beyond
description, and there won't be a third ....

But I agree that the entry fee for even a small chip, suitable for
field testing (as distinct from lab. testing), is not going to leave
any change from a ten-million-dollar note, and you will probably have
to unpeel several.

>You can probably count the number of folks willing to put up that kind
>of money on your fingers.

Notice the companies that I suggested could do this? Intel, IBM,
Hitachi, Samsung? :-)

It's a pity that the FPGA revolution never happened, but it seems
unlikely that a new design on an FPGA scale could steal enough of a
march on the commodity silicon to take off.


Regards,
Nick Maclaren.
From: Robert Myers on
On Sep 8, 3:11 am, n...(a)cam.ac.uk wrote:
> In article <af12055e-adbd-4d70-97b0-3380e2114...(a)s31g2000yqs.googlegroups..com>,
> Robert Myers  <rbmyers...(a)gmail.com> wrote:

> >I'm not a writer of browsers, but I suspect there is a ton of
> >embarrassing or nearly-embarrassing parallelism to exploit.
>
> I have a lot of experience with such applications, over several
> decades, and I am sure that there isn't.  If you investigate the
> time taken by such things, it is normal for most of it to go in
> critical paths.  Almost none of the protocols are either designed
> or suitable for parallelism.
>
I didn't say things would go a lot faster. ;-)

No way to revoke Amdahl's law.
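
For anyone keeping score, the law itself: with parallel fraction p
and n cores, speedup = 1 / ((1 - p) + p / n). A throwaway C sketch,
where p = 0.5 is a made-up illustration rather than a browser
measurement:

#include <stdio.h>

/* Amdahl's law: the serial fraction (1 - p) caps the speedup. */
static double amdahl(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    printf("p=0.5, n=8:  %.2fx\n", amdahl(0.5, 8));   /* ~1.78x */
    printf("p=0.5, n=64: %.2fx\n", amdahl(0.5, 64));  /* ~1.97x */
    return 0;
}

Even 64 cores barely clear 2x when half the work is serial.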

> >> >No longer
> >> >does the browser freeze because of some java script in an open tab.
>
> >> Oh, YEAH.  I use a browser that has been multi-threaded for a fair
> >> number of versions, and it STILL does that :-(
>
> >Yes, they sometimes do, but you can still regain control without
> >killing everything--if you know which process to kill. ;-)
>
> Eh?  When I said "multi-threaded", I meant multi-threaded.  If you
> kill the browser process, you lose EVERYTHING you are doing, in all
> of its tabs.  And I hope that you aren't imagining that you can kill
> one thread in a process, from outside, and expect the process to
> carry on.
>
Chrome creates a separate process for each tab, and I have *usually*
been able to regain control by killing a single process.

> >General parallelism is indeed very hard.  We differ in the estimation
> >of how much low-hanging fruit there is.
>
> I have been watching this area closely (and doing some work on it)
> for about 40 years, and have been actively and heavily involved for
> 15 years.  Neither I nor the major vendors nor the application
> developers think that there is much low-hanging fruit left.
>
> You may also have missed the point that most low-hanging fruit can
> be picked equally easily by writing the application to use multiple
> processes, and that also allows the use of distributed memory
> systems.  So, if there is masses of it, why have so few people
> tackled it in so many decades?
>
No particular payoff. One-process-per-tab browsers didn't appear
until multi-core processors were common on the desktop. I don't see
distributed processing as all that interesting without very expensive
interconnect hardware. Eight-way eight-core Xeons will be a
game-changer and, I suspect, relatively common (four-way eight-core
systems will be even more common).
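
For the record, the multi-process structure Nick describes is cheap
to set up on Unix; a minimal sketch, where the task count and the
trivial worker body are placeholder assumptions:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    const int ntasks = 4;
    for (int i = 0; i < ntasks; i++) {
        pid_t pid = fork();
        if (pid == 0) {               /* child: run one task */
            printf("worker %d (pid %d)\n", i, (int)getpid());
            _exit(0);
        } else if (pid < 0) {
            perror("fork");
            exit(EXIT_FAILURE);
        }
    }
    while (wait(NULL) > 0)            /* parent: reap all workers */
        ;
    return 0;
}

Because each worker has its own address space, the same shape moves
to a distributed-memory cluster by swapping fork() for a remote
spawn.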

Robert.

From: Noob on
Nick Maclaren wrote:

> Because you are running Macrosloth Bloatware - and even Linux seems
> to be competing on that front :-(

Are you reviling generic or custom Linux kernels? The ability to build custom
kernels is an important advantage of Linux over Windows. (Indeed, of open source
over closed source.)

What makes software bloatware? Code size? Run-time? Another metric?

Regards.
From: Noob on
Jim Haynes wrote:

> Today we don't see any demand for that kind of performance variation
> in the single processor. There's no market for brand new PCs with
> 486 CPUs.

What about so-called netbooks?

Consider the performance of Atom vs that of Nehalem.

http://en.wikipedia.org/wiki/Netbook
http://en.wikipedia.org/wiki/Intel_Atom
http://en.wikipedia.org/wiki/Intel_Nehalem_(microarchitecture)

Regards.