From: Owen Shepherd on
Terje Mathisen wrote:

> Owen Shepherd wrote:
>> Certainly AVR32 and Thumb show between them that many of the
>> features can be kept; it's just a matter of value - note that
>> neither Thumb nor AVR32 are highly orthogonal architectures like
>> more traditional designs, as they're much more aimed towards
>> compilers.
>
> Huh?
>
> So are you saying that Thumb/AVR32 are non-orthogonal, because they are
> aimed at compiled code, or that 'more traditional designs' have that aim?
>
> The latter does make sense, but not to my non-native reading of your
> english. :-(
>
> Terje
>

More traditional RISC designs tend to be highly orthogonal; Thumb(2) and
AVR32 have a bunch of oddness about them because they are designed for
compilers to target (rather than with the expectation of humans writing
assembly by hand).

A couple of simple examples from Thumb2:
1. Registers r0-r7 are preferred to r8-r12*, because most instructions
use only 3 bits to encode each register operand (Thumb2 added a bunch
of longer encodings to make the upper registers accessible, but they're
32-bit instructions)
2. ARM has four addressing modes for STM/LDM: increment before,
increment after, decrement before, decrement after. Thumb only has
STM decrement before (STMDB) and LDM increment after (LDMIA). This
is, not coincidentally, the way the stack operates
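The 3-bit register fields are easy to see in a concrete encoding. Below is a sketch (in Python, with a helper name of my own choosing) of the 16-bit Thumb `ADDS Rd, Rn, Rm` encoding, where bits [15:9] are the fixed pattern 0b0001100 and each register gets exactly 3 bits - which is precisely why r8-r15 can't be reached:

```python
def encode_adds(rd, rn, rm):
    """Encode a 16-bit Thumb ADDS Rd, Rn, Rm (encoding T1):
    bits [15:9] = 0b0001100, then Rm in [8:6], Rn in [5:3], Rd in [2:0].
    Each register field is only 3 bits wide, so r8-r15 don't fit."""
    for r in (rd, rn, rm):
        if not 0 <= r <= 7:
            raise ValueError("only r0-r7 are encodable in 16-bit ADDS")
    return (0b0001100 << 9) | (rm << 6) | (rn << 3) | rd

print(hex(encode_adds(1, 2, 3)))  # ADDS r1, r2, r3 -> 0x18d1
```

Reaching r8-r12 requires one of the longer Thumb2 encodings, which spend 4 bits per register field but cost 32 bits of code space.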

Owen

* Remember that r13=SP, r14=LR, r15=PC, so they're somewhat less useful
from many perspectives
From: Owen Shepherd on
Brett Davis wrote:
> In article <PeJ7o.74094$gM.43799(a)hurricane>,
> ARM squandered opcode bits, it would have been better to save
> more bits for future extensions. They made the same mistake
> with THUMB 1, fully using the opcode map.
>
>> Much of the focus these days is on the Thumb2 instruction set.
>> This is still fully predicated, but does this by using special
>> instructions (Called "IT," for some reason beyond my knowledge).
>> However, the shifted add has gone away - mainly because under
>> Thumb doing a shift followed by an add takes as much code space
>> (and is the same speed).
>
> Opcode size has nothing to do with decode/execute rate, that
> was the mistake THUMB 1 made. If longer instructions took longer
> to execute then THUMB 2 would have the same sucky performance
> as THUMB 1.
> I believe the lack of shifted adds is due to simply running out
> of opcode space due to being mostly compatible with THUMB 1.
> Being THUMB 1 compatible was a mistake, and they know it now.
>
> They could have designed a nice opcode set that was trivially
> upgradable to 64 bits. Instead we get flavor of the year
> instruction sets. A clear sign that they do not know what they
> are doing, and yet they are the most successful of the RISC
> chip companies. (Besides IBM.)
>
For the targets Thumb1 was designed for, size definitely was speed.
Remember that even today a lot of ARMs are deployed with 16 bit
memory busses. Even with cache, this heavily penalises 32 bit code
unless it's very loopy.

I have to ask how you would expect them to add a new fully
interworking instruction set to the architecture? The only addressing
bit that could be used to select it - bit 0 - is already used for
Thumb. Thumb can't be removed (it's in the ARMv6 definition, and
piles upon piles of software depend upon its continued existence).
Your only option would be to make Thumb and Thumb2 mutually exclusive
in a process - and, well, that poses other issues.
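For context, the "bit 0" being referred to is the interworking convention: on a BX-style branch, bit 0 of the target address selects the instruction set and is then stripped to form the PC. A rough Python model (the function name is mine, not ARM's) shows why there's no spare encoding left for a third state:

```python
def interwork_branch(target):
    """Model of ARM/Thumb interworking on a BX-style branch:
    bit 0 of the branch target selects the execution state and is
    cleared to form the new PC. Both values of bit 0 are taken
    (0 = ARM, 1 = Thumb), so no address bit remains to select a
    hypothetical third instruction set the same way."""
    state = "Thumb" if target & 1 else "ARM"
    pc = target & ~1
    return state, pc

print(interwork_branch(0x8001))  # ('Thumb', 32768)
print(interwork_branch(0x8000))  # ('ARM', 32768)
```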

No, Thumb2 is nowhere near perfect. But saying that ARM can't design
instruction sets is like saying Intel can't. Both ARM and x86 are old
and have evolved; wasteful use of the opcode space is unfortunately
common, particularly when you're aiming at minimizing code size.

(I wonder what MIPS16e is like? I'll have to look at it. Hopefully
they've learned some lessons from ARM's mess.)

>> I haven't investigated AVR32 in depth, but a friend of mine who
>> has worked with it and ARM heavily has, and it can pretty much
>> be said that it's very, very similar to Thumb. I also note that
>> AVR32 "Version 2" adds predication to a bunch of data processing
>> opcodes; and that a bunch of them also provide pre-shifted
>> second operands.
>
> I stand corrected twice in one week.
> The 2 bit shift is kinda small, forgot about it.
> Version 2 adds conditionals to all the common math opcodes,
> and stores. This rocks, really liking this architecture.
>
> Of course the real question is whether they added conditionals
> for marketing reasons, or because it actually helps performance
> and/or code size...

I'd expect it does help code size and performance to a degree (since
it takes load off the branch predictor, both reducing the probability
of a misprediction and allowing it to profile the rest of the code
better)
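The branch-predictor argument can be made concrete with a toy model. Below, a short `if (x < 0) x = -x;` is written two ways in Thumb2-style mnemonics - once with a forward branch, once if-converted with an IT block - and a trivial counter (my own illustrative helper, over a deliberately tiny mnemonic subset) tallies what the predictor has to track:

```python
BRANCHES = {"b", "beq", "bne", "bge", "blt"}  # tiny illustrative subset

def dynamic_branches(insns):
    """Count instructions the branch predictor must track."""
    return sum(1 for i in insns if i.split()[0] in BRANCHES)

# `if (x < 0) x = -x;` two ways (Thumb2-ish mnemonics):
branching  = ["cmp r0, #0", "bge over", "rsb r0, r0, #0"]
predicated = ["cmp r0, #0", "it lt", "rsblt r0, r0, #0"]

print(dynamic_branches(branching))   # 1 - one entry in the predictor
print(dynamic_branches(predicated))  # 0 - nothing to mispredict
```

Same instruction count either way here, but the predicated version removes a hard-to-predict short forward branch entirely.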

>> Neither I nor he has investigated RX yet.
>
> RX is a boring variable width RISC chip, besides the
> memcopy/string opcodes. (Shades of x86)
> Big lesson there maybe?
>
> But having byte sized opcode parts is a poor idea, unless
> your long range plan is to add x86 compatibility.
> Even then I do not think it is a good idea.
> Lack of conditionals and shifted adds points at a x86 goal?

I expect byte-sized instruction parts are a bad idea too, though it
depends upon how it's handled. If you can get the full size of the
instruction from the first byte, then perhaps it will work. I think
that, in general, you'll end up wasting more bits overall if you go
that route.
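Thumb2 itself takes the "length from the leading unit" approach, just at halfword granularity: the top five bits of the first 16-bit halfword say whether a second halfword follows. A sketch:

```python
def thumb2_insn_length(first_hw):
    """Return instruction length in bytes given the first 16-bit
    halfword. Per the Thumb2 encoding, a first halfword whose top
    five bits are 0b11101, 0b11110 or 0b11111 begins a 32-bit
    instruction; any other value is a complete 16-bit instruction."""
    return 4 if (first_hw >> 11) in (0b11101, 0b11110, 0b11111) else 2

print(thumb2_insn_length(0x1800))  # 16-bit ADDS -> 2
print(thumb2_insn_length(0xF000))  # first half of a 32-bit BL -> 4
```

The decoder learns the full length from the very first fetch unit, which keeps the fetch/decode pipeline simple compared with schemes where the length emerges byte by byte.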

16-bit parts seem to work well enough anyway (Thumb2 code is generally
1.1x the size of x86 code; you should obviously be able to make a new
ISA's code smaller than either, since both ARM and x86 are horribly messy)

Owen
From: Andy Glew "newsgroup at comp-arch.net" on
On 8/8/2010 11:23 PM, Terje Mathisen wrote:
> Owen Shepherd wrote:
>> Certainly AVR32 and Thumb show between them that many of the
>> features can be kept; it's just a matter of value - note that
>> neither Thumb nor AVR32 are highly orthogonal architectures like
>> more traditional designs, as they're much more aimed towards
>> compilers.
>
> Huh?
>
> So are you saying that Thumb/AVR32 are non-orthogonal, because they are
> aimed at compiled code, or that 'more traditional designs' have that aim?
>

One of the original motivations of RISC was that a regular, orthogonal,
instruction set might be easier for compilers to deal with.

Now the line goes that a compiler can deal with an irregular,
non-orthogonal, instruction set.
From: Anne & Lynn Wheeler on

Andy Glew <"newsgroup at comp-arch.net"> writes:
> One of the original motivations of RISC was that a regular,
> orthogonal, instruction set might be easier for compilers to deal
> with.
>
> Now the line goes that a compiler can deal with an irregular,
> non-orthogonal, instruction set.

the other scenario ... I've periodically claimed that John's motivation
was to go to the opposite complexity extreme of the (failed) future
system effort for 801 in the mid-70s ... not only simplifying
instruction set ... but also making various hardware/compiler/software
complexity trade-offs ... decreasing hardware complexity and
compensating with more sophisticated compilers and software.

another example was lack of hardware protection (reduced hardware
complexity) compensated for by compiler that only generated correct code
.... and closed operating system that would only load correct programs.

this was the displaywriter follow-on from the early 80s with romp
(chip), pl.8 (compiler) and cp.r (operating system). when that product
got killed, the group looked around for another market for the box and
hit on the unix workstation market. they got the company that did the
port to the pc for pc/ix ... to do one for their box ... and marketed it
as aix & pc/rt. one issue was that the unix & c environment is
significantly different than "only correct programs" and "closed
operating system" from the original design (requiring at least some
additions to the hardware for the different paradigm/environment).

misc. past posts mentioning 801, iliad, risc, romp, rios, power,
power/pc, etc
http://www.garlic.com/~lynn/subtopic.html#801

misc. past posts mentioning (failed) future system effort
http://www.garlic.com/~lynn/submain.html#futuresys

.... trivia ... predating romp was effort to replace the large variety of
internal microprocessors (used in controllers and for low/mid range
processor engines) with 801 ... some number of 801 Iliad chips
configured for that purpose.

an example was the original as/400 (replacing the s/38) was going to be
801 iliad chip ... but when that ran into trouble ... a custom cisc
chip was quickly produced for the product. as/400 did finally move off
cisc to an 801 power/pc variant a decade or so later.

--
virtualization experience starting Jan1968, online at home since Mar1970
From: Jeremy Linton on
On 8/2/2010 11:44 PM, Brett Davis wrote:
> The idea of a low power ARM server sounds good on paper, but lets look
> at the facts:
> No gigabit ethernet support.
> No 10 gigabit ethernet support, at all.
> No server class 10 gigabit ethernet. (Fast DMA.)
> A feeble 32 bit memory interface, not 64, (low end everyone)
> not 96, (Intel) not 128. (AMD)
> RAM support for obsolete DDR, or at most DDR2, without the
> memory controller support to actually stream at those rates.
> A 32 bit address space, not 64, too small to be called a
> desktop much less a server today.
> No support for large memory configurations. I have 12 gigs of RAM
> in my desktop, and only half the slots are full.
> A 32 bit address space can only talk to 2 gigs of RAM.

Well, they don't hit all of your bullet points, but Marvell Kirkwood
processors are currently available. They have DDR2/DDR3, 2-3 GigE ports,
SATA ports, DMA engines, a couple of PCIe lanes, etc.

Having used a couple of these processors, I can say they are more than
capable. You probably don't want to run compute-intensive applications
on them, but they can easily keep a couple of GigE ports busy (80%+
utilization) serving as file servers. With a little creativity I'm sure
you could use them for web servers, or any number of other tasks. Plus,
beyond the basics, they have numerous useful on-chip devices - for
example, hardware encryption, or XOR operations in the DMA controllers.