RISC load-store versus x86 Add from memory. [Computer Architecture]

Prev: Call for benchmarks: proposals by 30 June
Next: Vaporizing dust during chip manufacturing ?

From: Brett Davis on 19 Jun 2010 15:50

In article
<0922168e-6d6f-4480-85ec-fa5996c336a7(a)z10g2000yqb.googlegroups.com>,
MitchAlsup <MitchAlsup(a)aol.com> wrote:

> On Jun 18, 2:28 pm, r...(a)clozure.com (R. Matthew Emerson) wrote:
> > "nedbrek" <nedb...(a)yahoo.com> writes:
> > > But Mitch has it right. Architecture does not matter.
>
> I think it might be better to say: instruction sets don't matter; the
> rest of what we call architecture does matter, now and again.
>
> > You guys keep saying this, and maybe for the large majority of people it is
> > even true.
> >
> > But I still say that ISA makes a difference. As an example, our Common
> > Lisp implementation targeted only PowerPC for a long time.
>
> Here we have the classic mismatch of architecture and application. I
> might note that those machines that had the instruction set
> infrastructure to support <the various> LISPs did not end up surviving
> into the present (save, <ahem> SPARC). These architectures were also
> pretty good at Prolog, and at emulating other instruction sets.

PowerPC has some nice kitchen sink features that have multiple uses.
But I always thought of SPARC as just RISC with a register-windows mistake.

What RISC CPU features does LISP need besides lots of registers?

From: Bakul Shah on 19 Jun 2010 16:28

On 6/19/10 5:26 AM, nedbrek wrote:
> Sorry, the fully qualified statement should be "ISA does not make (much of) a
> difference to performance". It comes up so often, it gets abbreviated a
> lot. :)
>
> It can be a pain for compiler writers (especially ones trying to get the
> most performance). But y'all are 0% of the market.

The following may be relevant here.

Compile-time optimization means throwing away as much
generality as you can while preserving the semantics of the
operation IN CONTEXT. Since simple operations run very fast
on RISCs, and since the majority of instructions emitted by a
compiler (any compiler) are simple instructions anyway, and
since streams of simple instructions are easier to analyze
and improve (although at the cost of having more of them to
look at), the net result is faster programs.

[From a June 20, 1988 comp.compilers post by Dain Samples]
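
As a small illustration of "throwing away generality in context", here is a minimal C sketch. The function names are hypothetical and not from the quoted post; the point is only that once the divisor is known to be 8, the general divide can be replaced by a simpler operation with the same result for unsigned operands.

```c
#include <stdint.h>

/* Fully general: the divisor is unknown until run time, so a real
   divide is needed (and the caller must ensure divisor != 0). */
uint32_t scale_general(uint32_t x, uint32_t divisor) {
    return x / divisor;
}

/* In context the divisor is known to be 8. For unsigned values,
   dividing by 8 and shifting right by 3 give the same result, so the
   generality of a full divide can be discarded. */
uint32_t scale_by_8(uint32_t x) {
    return x >> 3;
}
```

A compiler that can see the constant will usually make this substitution itself; the two functions are written out by hand here only to make the before and after visible.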

This may be somewhat relevant: it looks like ARM is already
eating x86's lunch at the low end (90+% of cell phones have
them; 5 billion estimated to ship next year). Attack of the
Killer Micros: the energy efficient generation? :-)

From: nmm1 on 19 Jun 2010 16:29

In article <1jkcsdg.28nffa169emgN%nospam(a)ab-katrinedal.dk>,
Niels Jørgen Kruse <nospam(a)ab-katrinedal.dk> wrote:
>Andy 'Krazy' Glew <ag-news(a)patten-glew.net> wrote:
>
>> Unfortunately, there are several different types of integer overflow.
>> E.g. the overflow conditions for each of the following are different
>>
>> unsigned + unsigned
>> signed + signed
>> signed + unsigned
>
>Are you talking about D or some other language that is not C?

He doesn't need to. They're all different in C.

Regards,
Nick Maclaren.
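
One way to read the claim above is that each operand combination needs its own test. The following is a minimal C sketch, not anything posted in the thread: the function names are hypothetical, the mixed case assumes the result is wanted as an `int`, and it assumes the usual relation UINT_MAX == 2*INT_MAX + 1.

```c
#include <limits.h>
#include <stdbool.h>

/* unsigned + unsigned: C defines the result modulo 2^N, so "overflow"
   here means the true sum does not fit in an unsigned. */
bool uadd_overflows(unsigned a, unsigned b) {
    return a > UINT_MAX - b;
}

/* signed + signed: overflow is undefined behavior in C, so the check
   must be made before the addition, not after. */
bool sadd_overflows(int a, int b) {
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return false;
}

/* signed + unsigned, with the result wanted as an int (an assumption;
   plain C would convert a to unsigned and wrap). Since b >= 0 the sum
   can only be too large, never too small. */
bool mixed_add_overflows(int a, unsigned b) {
    if (a >= 0)
        return b > (unsigned)(INT_MAX - a);
    unsigned magnitude = 0u - (unsigned)a;  /* |a| without negating INT_MIN */
    return b > (unsigned)INT_MAX + magnitude;
}
```

The same bit patterns get different verdicts: all-ones plus one overflows as unsigned + unsigned (`uadd_overflows(UINT_MAX, 1u)` is true) but not as signed + signed (`sadd_overflows(-1, 1)` is false).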

From: Brett Davis on 19 Jun 2010 16:29

In article <hvi9bp$1v7$1(a)news.eternal-september.org>,
"nedbrek" <nedbrek(a)yahoo.com> wrote:
> "Brett Davis" <ggtgp(a)yahoo.com> wrote in message
> news:ggtgp-6935B6.02064019062010(a)news.isp.giganews.com...
> >
> > 2.1.2.6 Micro-fusion
> >
> > You can do the same with an OoO RISC chip, but it's harder; I believe
> > you would need an extra write port. I do not know of a RISC chip
> > that does fusion with reads, I do know that PowerPC does do some
> > Micro-fusion on other opcodes.
> >
> > We are back to my original question, is Add from Memory RISCier than RISC
> > for a hugely OoO design?
> >
> > (The real win is less than 50%, far less, you have to be starved for issue
> > slots.) The power savings is real, and important.
>
> I believe the sequence you are describing is:
> add r1 += [r2]
>
> The advantage CISC has is that the uop sequence looks like:
> ld tmp = [r2]
> add r1 += tmp
>
> Since tmp is not an architected register, it does not have to be preserved
> for an interrupt, or seen past the use in add (it is known dead). Thus, it
> can exist strictly in the bypass network (it is not allocated a rename
> register, it is not visible to later instructions [does not participate in
> renaming], and has no architected effects at retirement).
>
> The RISC sequence will always be (ld r3 = [r2]; add r1 += r3). r3 is live
> out, and must be architecturally visible. You can smash ops together,
> giving you r3,r1 = load-op [r2] + r1
>
> You can't say just "need an extra write port" unless you have a simple 5
> stage pipeline. In a modern machine, this means extra decode bits (in the
> scheduler and ROB), extra RAT ports, extra complexity come retirement time
> (do you allow every instruction to update two entries in the retirement
> register table?)

I forgot that on both PowerPC and x86 the Load Unit has its own register
write port, so you do have an extra write port, and I assume that on x86
the Load Unit is involved in the Add from Memory, if only for the bypass data.

So the real savings is having the Add instruction issue the [r2] register
read instead of an r3 register read, saving an issue slot and a cycle?

Yes, lots of complexity in a full OoO design.
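
For reference, the source-level pattern behind "add r1 += [r2]" can be sketched in C as below. The function name is hypothetical, and the assembly in the comments is illustrative only, written in the same informal notation used above; it is not actual compiler output, and real code generation depends on the compiler and target.

```c
#include <stddef.h>

/* Sum an array; each iteration is one "add from memory". */
long sum_array(const long *r2, size_t n) {
    long r1 = 0;
    for (size_t i = 0; i < n; i++) {
        /* x86-style (illustrative):
         *     add r1 += [r2 + i*8]    ; one instruction; the loaded
         *                             ; value never gets a register name
         * load-store RISC (illustrative):
         *     ld  r3 = [r2 + i*8]     ; r3 is an architected register
         *     add r1 += r3            ; and stays live after the add
         */
        r1 += r2[i];
    }
    return r1;
}
```

Both forms compute the same sum; the difference discussed in this thread is only whether the loaded temporary has to be named, renamed, and retired as architectural state.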

From: Anne & Lynn Wheeler on 19 Jun 2010 16:52

Brett Davis <ggtgp(a)yahoo.com> writes:
> PowerPC has some nice kitchen sink features that have multiple uses.
> But I always thought of SPARC as just RISC with a register-windows mistake.
>
> What RISC CPU features does LISP need besides lots of registers?

old email from long ago and far away referencing the lisp machine
group trying to get an 801:
http://www.garlic.com/~lynn/2003e.html#email790711

in that time-frame there was effort to replace large variety of
different internal microprocessors with 801. 801 iliad chips had
specialized features for aid in emulation.

--
42yrs virtualization experience (since Jan68), online at home since Mar1970
