From: Owen Shepherd on
Jeremy Linton wrote:
> Well, they don't hit all of your bullet points, but Marvell Kirkwood
> processors are currently available. They have DDR2/DDR3, 2-3 GigE ports,
> SATA ports, DMA engines, couple PCIe lanes etc.
>
> Having used a couple of these processors, they are more than capable.
> You probably don't want to run compute intensive applications on them,
> but they can easily keep a couple of GigE ports busy (80%+ utilization)
> serving as file servers. With a little creativity I'm sure you could use
> them for web servers, or any number of other tasks. Plus beyond the
> basics they have numerous useful on chip devices. For example hardware
> encryption or XOR operations in the DMA controllers.

If only Marvell would stop with their belief that data-sheets are trade
secrets...
From: Michael S on
On Jul 28, 6:15 am, Andy Glew <"newsgroup at comp-arch.net"> wrote:
> On 7/27/2010 11:34 AM, j...(a)cix.compulink.co.uk wrote:
>
> > In article
> > <284da124-7934-42bb-a58c-899a935a0...(a)5g2000yqz.googlegroups.com>,
> > gni...(a)gmail.com (gnirre) wrote:
>
> >> Will Microsofts [sic] design an ARM processor?
>
> > I doubt it very much. They probably want to put an ARM with custom
> > peripherals onto a chip in some piece of equipment. They do sell quite a
> > lot of electronics in various forms: this could well go into a Zune
> > successor, for example.
>
> If that was what they wanted, they could have bought a much cheaper
> license, that allows them to use an existing ARM core design, and build
> an SOC out of it.
>

AFAIR, both Motorola/Freescale and TI (and Apple too?) hold ARM
architecture licenses despite not only never having developed ARM cores
of their own, but never really having had concrete plans to do so.
Sometimes big corporations buy expensive things just because big
corporations buy expensive things.


From: Paul Gotch on
Owen Shepherd <owen.shepherd(a)e43.eu> wrote:
> > Of course the real question is whether they added conditionals
> > for marketing reasons, or because it actually helps performance
> > and/or code size...

> I'd expect it does help code size and performance to a degree (since
> it takes load off the branch predictor, both reducing the probability
> of a misprediction and allowing it to profile the rest of the code
> better)

LOL

Back when the original 26 bit ARM architecture was designed I don't
think Acorn had a marketing department to speak of.

Predicated instruction sets help code size if you have a compiler which
does if-conversion. Branch prediction isn't something the original ARMs
had; this was 1983...

-p
--
Paul Gotch
--------------------------------------------------------------------
From: Owen Shepherd on
Paul Gotch wrote:
>
> LOL
>
> Back when the original 26 bit ARM architecture was designed I don't
> think Acorn had a marketing department to speak of.
>
> Predicated instruction sets help code size if you have a compiler which
> does if-conversion. Branch prediction isn't something the original ARMs
> had; this was 1983...
>
> -p

I'm aware of the design of the ARM architecture; it is quite an interesting
story! However, the above discussion was in the context of the AVR32.

(And to those who'll say "26 bit? How short sighted!", note that Intel had
only just released the 286, and Acorn's previous computers had been 6502
based. 64MB was a lot of RAM back then, and it meant that the flags could be
pushed into the same slot as the PC - a great performance win back in those
days)
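
To make the arithmetic concrete, here is a minimal Python sketch of how a 26-bit PC and the flags can share one 32-bit word. The field positions (N, Z, C, V in bits 31..28, a word-aligned PC in bits 25..2, the mode in bits 1..0) are an assumption taken from the classic 26-bit ARM programmer's model, and the helper names `pack_r15` and `unpack_r15` are made up for illustration.

```python
# Sketch only: field positions are assumed from the 26-bit ARM
# programmer's model; pack_r15/unpack_r15 are hypothetical helpers.
PC_MASK = 0x03FFFFFC     # bits 25..2: word-aligned program counter
MODE_MASK = 0x00000003   # bits 1..0: processor mode
FLAG_SHIFT = 28          # bits 31..28: N, Z, C, V

ADDRESS_SPACE = 1 << 26  # 26 address bits give 64 MiB


def pack_r15(pc, nzcv, mode=0):
    """Combine a byte address, four condition flags and a mode in one word."""
    assert pc & ~PC_MASK == 0, "PC must be word aligned and below 64 MiB"
    assert 0 <= nzcv <= 0xF, "exactly four flag bits"
    return (nzcv << FLAG_SHIFT) | pc | (mode & MODE_MASK)


def unpack_r15(word):
    """Split the combined word back into (pc, nzcv, mode)."""
    return word & PC_MASK, (word >> FLAG_SHIFT) & 0xF, word & MODE_MASK


word = pack_r15(0x8000, 0b0100, mode=3)
print(hex(word), unpack_r15(word))
```

Because the return address and the flags live in the same word, a single store saves both and a single load restores both, which is the "same slot" saving mentioned above.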

(And I've had the pleasure of using a RISC OS machine. They really were
quite innovative for their time; for example, RISC OS had sub pixel anti-
aliasing before PCs even had vector fonts...)
From: nedbrek on
Hello all,

"Brett Davis" <ggtgp(a)yahoo.com> wrote in message
news:ggtgp-776ECA.23240209082010(a)news.isp.giganews.com...
>
> So is CMOVE still implemented internally as a branch?
> (I know this is crazy sounding, but that is what both did...)

The biggest problem with CMOV is the renamer (so, it is easy to handle for
an in-order machine).

Given the sequence
ld r4 = [r0]
add r1 += r3
cmov r1 = zf ? r1 : r4
sub r6 -= r1

When you rename the subtract, you need to connect it to either the
instruction producing r1 (the add) or the producer of r4 (based on the
flags, which have [potentially] a third producer).
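
A toy model may help show why this is awkward. The Python sketch below is only an illustration, not how any real renamer is built: it tracks, for each architectural register, the set of instructions that might have produced its current value. The tuple layout and the name `rename` are hypothetical, and the flags producer is left out to keep it short.

```python
# Toy rename table: architectural register -> set of possible producers.
# Purely illustrative; the flags dependency is deliberately not modeled.
def rename(program):
    table = {}   # register name -> set of candidate producer instructions
    deps = {}    # instruction name -> {source register: candidate producers}
    for name, dst, srcs, is_cmov in program:
        deps[name] = {s: table.get(s, {"initial"}) for s in srcs}
        if is_cmov:
            # The result is one of the two data sources; which one is
            # only known once the flags are known.
            table[dst] = (table.get(srcs[0], {"initial"})
                          | table.get(srcs[1], {"initial"}))
        else:
            table[dst] = {name}
    return deps


PROGRAM = [
    # name,  dest, sources,      is_cmov
    ("ld",   "r4", ["r0"],       False),
    ("add",  "r1", ["r1", "r3"], False),
    ("cmov", "r1", ["r1", "r4"], True),
    ("sub",  "r6", ["r6", "r1"], False),
]

deps = rename(PROGRAM)
print(sorted(deps["sub"]["r1"]))
```

A conventional rename table holds exactly one producer per register, so the two-element set for the subtract's r1 source is precisely the case it cannot express.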

Your options are:
1) Stall on the condition codes at the producer or the consumer (producer is
easier to implement, consumer gives better perf)

2) Predict at the cmov, the cmov then becomes the check, and flush if wrong

3) Hack the renamer to support two (or more!) producers and add support in
the execution core to bypass multiple producers (ouch!)

4) Emit select uops

For case 4 (assuming 2 sources per uop), you get
cmov ->
concat tmp = {flags,r1}
select r1 = tmp, r4

The concat connects the producer of flags and the producer of r1. The
select can then use the flags to select r1 or r4. Consumers of r1 depend on
the select.

If you have 3 sources per uop, you can do the select directly
select r1 = flags, r1, r4
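
As a sanity check on the decomposition, here is a small Python sketch that models the uops as plain functions and confirms that the two-uop form and the three-source form agree with the cmov semantics. The function names are hypothetical, and representing the concat result as a tuple is just a convenience of the model.

```python
# Functional model of the cmov -> select decomposition (illustrative only).
def cmov(zf, r1, r4):
    """Architectural semantics from the sequence above: r1 = zf ? r1 : r4."""
    return r1 if zf else r4


def concat(flags, r1):
    """2-source uop: bundle the flags and the r1 value into one temporary."""
    return (flags, r1)


def select2(tmp, r4):
    """2-source uop: use the bundled flags to pick the bundled r1 or r4."""
    flags, r1 = tmp
    return r1 if flags else r4


def select3(flags, r1, r4):
    """3-source uop: the same selection done in a single step."""
    return r1 if flags else r4


for zf in (False, True):
    two_uop = select2(concat(zf, 10), 20)
    assert two_uop == cmov(zf, 10, 20) == select3(zf, 10, 20)
```

In both forms every consumer of r1 depends on a single producer (the select), which is what keeps the rename table simple.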

Ned