From: jacko on

Göran Bilski wrote:

> Frank Buss wrote:
> > Göran Bilski wrote:
> >
> >
> >>If the interesting part is to create this solution without any time
> >>limits than you should create most from scratch.
> >
> >
> > Yes, this is what I'm planning.
> >
> > I have another idea for a CPU, very RISC-like. The bits of an instruction
> > are something like micro-instructions:
> >
> > There are two internal 16-bit registers, r1 and r2, on which the core can
> > perform operations, and 6 "normal" 16-bit registers. The first 2 bits of an
> > instruction define the meaning of the rest:
> >
> > 2 bits: operation:
> > 00 load internal register 1
> > 01 load internal register 2
> > 10 execute operation
> > 11 store internal register 1
> >
> > I think it is a good idea to use 8 bits for one instruction instead of
> > using non-byte-aligned instructions, so we have 6 bits for the operation.
> > Some useful operations:
> >
> > 6 bits: execute operation:
> > r1 = r1 and r2
> > r1 = r1 or r2
> > r1 = r1 xor r2
> > cmp(r1, r2)
> > r1 = r1 + r2
> > r1 = r1 - r2
> > pc = r1
> > pc = r1, if c=0
> > pc = r1, if c=1
> > pc = r1, if z=0
> > pc = r1, if z=1
> >
> > For the load and store micro instructions, we have 6 bits for encoding the
> > place on which the load and store acts:
> >
> > 6 bits place:
> >   1 bit: transfer width (0=8, 1=16 bits)
> >   2 bits source/destination:
> >     00: register
> >       3 bits: register index
> >     01: immediate
> >       1 bit: width of immediate value (0=8, 1=16 bits)
> >       next 1 or 2 bytes: immediate number (8/16 bits)
> >     10: memory address in register
> >       3 bits: register index
> >     11: address
> >       1 bit: width of address (0=8, 1=16 bits)
> >       next 1 or 2 bytes: address (8/16 bits)
> >
> > The transfer width and the value need not be the same. E.g. 1010xx
> > means that the next byte is loaded into the internal register and the
> > upper 8 bits are set to 0.
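A minimal Python sketch of how a decoder for the byte format quoted above might look. It assumes the 2-bit operation sits in the top two bits and that the 6-bit place field is laid out MSB-first as transfer width, mode, then the 3 low bits, which is consistent with the "1010xx" example. 16-bit operands are assumed big-endian because the byte order isn't specified. The function name `decode` and the returned field names are hypothetical.

```python
# Hypothetical decoder for the 8-bit instruction format described above.
# Assumed layout: bits 7-6 = operation, bits 5-0 = place field (or ALU op),
# place field = [transfer width:1][mode:2][register index / operand width:3].

OPS = {0b00: "load r1", 0b01: "load r2", 0b10: "execute", 0b11: "store r1"}
MODES = {0b00: "register", 0b01: "immediate", 0b10: "memory[reg]", 0b11: "memory[addr]"}


def decode(code: bytes, pc: int = 0) -> tuple[dict, int]:
    """Decode the instruction at code[pc]; return (fields, address of next instruction)."""
    byte = code[pc]
    op = byte >> 6
    low6 = byte & 0x3F
    pc += 1
    if op == 0b10:
        # "execute": the low 6 bits select one of up to 64 operations.
        return {"op": OPS[op], "operation": low6}, pc
    mode = (low6 >> 3) & 0b11
    low3 = low6 & 0b111
    fields = {
        "op": OPS[op],
        "transfer_width": 16 if low6 & 0b100000 else 8,
        "mode": MODES[mode],
    }
    if mode in (0b00, 0b10):
        # Register or memory-via-register: low 3 bits are a register index.
        fields["register"] = low3
    else:
        # Immediate or absolute address: bit 2 selects a 1- or 2-byte operand.
        nbytes = 2 if low3 & 0b100 else 1
        # Byte order of 16-bit operands is not specified; big-endian assumed.
        fields["operand"] = int.from_bytes(code[pc:pc + nbytes], "big")
        pc += nbytes
    return fields, pc


# "load r1 immediate with 5": 16-bit transfer, 8-bit immediate (the 1010xx case)
print(decode(bytes([0x28, 0x05])))
```

The decoder only needs a few bit-field extractions and one branch on the mode, which fits the "very few gates" argument.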
> >
> > But for this reduced instruction set a compiler would be a good idea. Or
> > different layers of assembler. I'll try to translate my first CPU design,
> > which needed 40 bytes:
> >
> > ; swap 6 byte source and destination MACs
> > .base = 0x1000
> > p1: .dw 0
> > p2: .dw 0
> > tmp: .db 0
> > move #5, p1
> > move #11, p2
> > loop: move.b (p1), tmp
> > move.b (p2), (p1)
> > move.b tmp, (p2)
> > sub.b p2, #1
> > sub.b p1, #1
> > bcc.b loop
> >
> > With my new instruction set it could be written like this (the normal
> > registers 0 and 1 are constant 0 and 1):
> >
> > load r1 immediate with 5
> > store r1 to register 2
> > load r1 immediate with 11
> > store r1 to register 3
> > loop: load r1 from memory address in register 2
> > load r2 from memory address in register 3
> > store r1 to memory address in register 3
> > store r2 to memory address in register 2
> > load r1 from register 3
> > load r2 from register 1
> > operation r1 = r1 - r2
> > store r1 to register 3
> > load r1 from register 2
> > operation r1 = r1 - r2
> > store r1 to register 2
> > load r1 immediate with loop
> > operation pc = r1, if c=0
> >
> > This is 20 bytes long. As you can see, there are micro optimizations
> > possible, like for the last two register decrements, where the subtrahend
> > needs to be loaded only once.
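When hand-translating between instruction sets, it helps to have a reference model of what the original 68000-style loop computes. A sketch, assuming the frame starts at address 0 so that p1 and p2 act as byte offsets, and that sub.b sets the carry flag on borrow (as on the 68000), so bcc.b falls through once p1 wraps from 0 to 0xFF. The function name `swap_macs` is hypothetical.

```python
def swap_macs(frame: bytearray) -> int:
    """Reference model of the 68000-style loop: swap frame[0:6] with frame[6:12].

    Assumes p1/p2 are byte offsets into the frame and that sub.b sets the
    carry flag on borrow, so bcc.b falls through once p1 wraps from 0 to 0xFF.
    Returns the number of loop iterations.
    """
    p1, p2 = 5, 11                       # move #5, p1 / move #11, p2
    iterations = 0
    while True:
        tmp = frame[p1]                  # move.b (p1), tmp
        frame[p1] = frame[p2]            # move.b (p2), (p1)
        frame[p2] = tmp                  # move.b tmp, (p2)
        p2 = (p2 - 1) & 0xFF             # sub.b p2, #1
        carry = p1 == 0                  # sub.b p1, #1 borrows only from 0
        p1 = (p1 - 1) & 0xFF
        iterations += 1
        if carry:                        # bcc.b loop: repeat while carry clear
            return iterations


frame = bytearray(range(14))             # dummy header bytes 0..13
print(swap_macs(frame), list(frame[:12]))
```

Running both the original and a translated program against this model on the same dummy frame is a cheap way to check that the translation terminates after six iterations.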
> >
> > I think this instruction set could be implemented with very few gates,
> > compared to other instruction sets, and the memory usage is low, too.
> > Another advantage: 64 different instructions are possible and orthogonal
> > higher levels are easy to implement with it, because the load and store
> > operations work on all possible places. Speed would not be the fastest, but
> > this is no problem for my application.
> >
> > The only problem is that you need a C compiler or something like this,
> > because writing assembler with this reduced instruction set looks like it
> > will be no fun.
> >
> > Instead of 16 bits, 32 bits or more would be easy to implement with generic
> > parameters for this core.
> >
>
> One thing to keep in mind is handling arithmetic wider than 16 bits.
> That will usually introduce some kind of carry bit (stored where?).
> You seem to have c and z bits somewhere, but you will need two versions of
> each instruction, one which uses the carry and one which doesn't

or you will have to clear the carry when you want to add without carry.
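To make the carry question concrete, here is a sketch of a 32-bit add built from 16-bit ALU steps, in both styles mentioned: separate "add" and "add with carry" instructions, or a single add-with-carry preceded by clearing the carry. The `Alu` class and all method names are hypothetical.

```python
MASK16 = 0xFFFF


class Alu:
    """Toy 16-bit ALU with a single carry flag (names are hypothetical)."""

    def __init__(self) -> None:
        self.c = 0

    def add(self, a: int, b: int) -> int:
        """Add without carry in; sets c from the carry out."""
        total = a + b
        self.c = total >> 16
        return total & MASK16

    def adc(self, a: int, b: int) -> int:
        """Add with carry in; sets c from the carry out."""
        total = a + b + self.c
        self.c = total >> 16
        return total & MASK16

    def clc(self) -> None:
        """Clear the carry flag."""
        self.c = 0


def add32_two_opcodes(alu: Alu, x: int, y: int) -> int:
    # Style 1: separate "add" and "add with carry" instructions.
    lo = alu.add(x & MASK16, y & MASK16)
    hi = alu.adc(x >> 16, y >> 16)
    return (hi << 16) | lo


def add32_clear_carry(alu: Alu, x: int, y: int) -> int:
    # Style 2: only "add with carry" exists, so clear the carry first.
    alu.clc()
    lo = alu.adc(x & MASK16, y & MASK16)
    hi = alu.adc(x >> 16, y >> 16)
    return (hi << 16) | lo


alu = Alu()
alu.c = 1                                # stale carry left over from earlier code
print(hex(add32_clear_carry(alu, 0x0001FFFF, 1)))
```

Style 2 saves an opcode but costs an extra instruction per multi-word add; forgetting the `clc` would let a stale carry leak into the low word.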

> Running more than just simple programs in real-time applications
> requires interrupt support, which messes things up considerably in the
> control part of the processor.

a register swap for interrupt processing is the easiest.

> Do you consider using only absolute branching or also doing relative
> branching?

either would work, but relative has a code-size advantage, and absolute
has an execution-speed advantage.
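The trade-off shows up in how the target is formed. A sketch assuming a 16-bit address space, a signed 8-bit relative offset measured from the address of the next instruction (an assumption; CPUs differ here), and a 2-byte absolute operand. The function names are hypothetical.

```python
def encode_relative(next_pc: int, target: int) -> int:
    """Return the 8-bit offset byte for a relative branch, or raise if out of reach."""
    offset = target - next_pc
    if not -128 <= offset <= 127:
        raise ValueError(f"target {target:#06x} out of reach from {next_pc:#06x}")
    return offset & 0xFF


def relative_target(next_pc: int, offset_byte: int) -> int:
    """Sign-extend the offset and add it to the PC (needs an adder in the PC path)."""
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    return (next_pc + offset) & 0xFFFF


def absolute_target(hi: int, lo: int) -> int:
    """Absolute branch: the two operand bytes are loaded straight into the PC."""
    return (hi << 8) | lo


# A backward branch to 0x1006 from a branch whose next instruction is at 0x1010.
off = encode_relative(0x1010, 0x1006)    # 1 operand byte instead of 2
print(hex(off), hex(relative_target(0x1010, off)), hex(absolute_target(0x10, 0x06)))
```

The relative form saves a byte per branch but limits the reach to -128..+127 and puts an adder between the fetch and the PC; the absolute form is a plain load.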

> If you really want a processor which is code efficient,
> you might want to look at a stack machine.
> If I were to create a tiny tiny processor with little area and
> efficient code, I would do a stack machine.
> They are much nastier to program, but they can be implemented very
> efficiently.
>

search for MSL16 as a compact example of a stack machine. I would use
slightly different ops and things if I did it.

2/ ??? I'd have full bit reversal instead,
and get rid of the subtract.

umm??
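To illustrate why stack machines encode so compactly: most ops take their operands from the stack, so they need no register fields. A toy interpreter sketch; the op set is hypothetical and is not MSL16's actual instruction set, though it includes a Forth-style `2/` (arithmetic shift right by one).

```python
MASK16 = 0xFFFF


def run(program: list[tuple], max_steps: int = 10_000) -> list[int]:
    """Interpret a toy zero-operand stack machine; return the final data stack.

    Only "lit", "jz" and "jmp" carry an operand; every other op works
    purely on the top of the stack, which is what keeps the encoding small.
    """
    stack: list[int] = []
    pc = 0
    for _ in range(max_steps):           # step limit guards against runaway loops
        if pc >= len(program):
            break
        op, *arg = program[pc]
        pc += 1
        if op == "lit":
            stack.append(arg[0] & MASK16)
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "drop":
            stack.pop()
        elif op == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op in ("add", "and", "xor"):
            b, a = stack.pop(), stack.pop()
            result = {"add": a + b, "and": a & b, "xor": a ^ b}[op]
            stack.append(result & MASK16)
        elif op == "2/":
            # Forth-style arithmetic shift right by one (sign bit preserved).
            t = stack.pop()
            stack.append((t >> 1) | (t & 0x8000))
        elif op == "jz":
            if stack.pop() == 0:
                pc = arg[0]
        elif op == "jmp":
            pc = arg[0]
        else:
            raise ValueError(f"unknown op {op!r}")
    return stack


print(run([("lit", 3), ("lit", 4), ("add",), ("lit", 1), ("xor",)]))
```

In hardware, each of the operand-free ops could fit in a few bits, so several of them can be packed into one instruction word.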

cheers
jacko

From: Ray Andraka on
Jim Granville wrote:
> Frank Buss wrote:
> <snip>
>
>> The only problem is that you need a C compiler or something like this,
>> because writing assembler with this reduced instruction set looks like it
>> will be no fun.
>
>
> Since this is a very specific application, do you have a handle on the
> code size yet?
>
> Another angle to this, would be to choose the smallest CPU for which a
> C compiler exists.
>
> Here, Freescale's new RS08 could be a reasonable candidate?
>
> Or choose another, more complex core and then scan the compiled output
> to check the opcode usage, and subset that.
>
> -jg
>
>
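The subsetting approach can start with a simple opcode histogram. A sketch that counts mnemonics in assembler source; it assumes `label:` prefixes, `;` comments and `.`-directives as in Frank's listing, so real compiler output (for example a disassembly) would need its own parsing. The function name `opcode_histogram` is hypothetical.

```python
from collections import Counter


def opcode_histogram(lines: list[str]) -> Counter:
    """Count mnemonics in assembler source lines."""
    counts: Counter = Counter()
    for line in lines:
        code = line.split(";", 1)[0].strip()          # drop comments
        if ":" in code:                                 # drop a leading label
            code = code.split(":", 1)[1].strip()
        if not code or code.startswith("."):           # blank line or directive
            continue
        counts[code.split()[0].lower()] += 1
    return counts


listing = """\
; swap 6 byte source and destination MACs
.base = 0x1000
p1: .dw 0
p2: .dw 0
tmp: .db 0
move #5, p1
move #11, p2
loop: move.b (p1), tmp
move.b (p2), (p1)
move.b tmp, (p2)
sub.b p2, #1
sub.b p1, #1
bcc.b loop
"""
for mnemonic, n in opcode_histogram(listing.splitlines()).most_common():
    print(f"{mnemonic:8} {n}")
```

Run over a whole compiled application, the mnemonics that never appear are candidates to drop from the subset.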

Quite a while back I designed a small microcontroller for a Xilinx
XC4000E series part that used approximately 80 LUTs and ran at, IIRC, 105
MHz; I think it was in a 4020XL. It was a simple RISC machine that was
sort of a cross between a PIC microcontroller and an RCA 1802. It had a
register file with 16 registers like the 1802, and had a small
instruction set similar to a PIC. If I recall correctly, it was a
Harvard architecture. The ISA was specifically designed for the FPGA
architecture.

Anyway, the difficult part about it was that it had no programming tools
to support it. We did write a crude assembler for it, but that was
about as far as we took it. The point is, the hardware and ISA design
is only part of the job. The tools development is as big a piece as the
processor design itself.
From: JJ on

Frank Buss wrote:
> For implementing the higher level protocols for my Spartan 3E starter kit
> TCP/IP stack implementation, I plan to use a CPU, because I think this
> needs fewer gates than pure VHDL. The instruction set could be limited,
> because more instructions and fewer gates is good, and it doesn't need to be
> fast, so I can design a very orthogonal CPU, which maybe needs even fewer
> gates. The first draft:
>
> http://www.frank-buss.de/vhdl/cpu.html
>
> It is some kind of a 68000 clone, but much simpler. What do you think of it?
> Any ideas to reduce the instruction set even more, without the drawback of
> needing more instructions for a given task?
>
> --
> Frank Buss, fb(a)frank-buss.de
> http://www.frank-buss.de, http://www.it4-systems.de

I did a google for <tiny tcp stack> and saw lots of things

I was looking specifically for Adam Dunkels; he gets a lot of press on
OSNews and other sites for his various embedded OS projects.

His uIP stack claims to be the world's smallest stack; it uses 4-5 KB of
code space and only a few hundred bytes of RAM. uIP has been ported to a
wide range of systems and used in many commercial projects. He mentions ABB,
Altera, BMW, Cisco Systems, Ericsson, GE, HP, Volvo Technology, Xilinx.
lwIP is a bigger, faster version of uIP.

http://www.sics.se/~adam/

Besides uIP, he also has a tiny OS, Contiki, and a ProtoThreads package.

John Jakson
transputer_guy

From: Jim Granville on
radarman wrote:
>>>Maybe you are right and the core and programs are smaller with Forth, I'll
>>>think about it. Really useful is that it is simple to write an interactive
>>>read-eval-print loop in Forth (like in Lisp), so that you can program and
>>>debug a system over RS232.
>>>
>
>
> Simpler solution - have the microcode FSM push the flags to the stack.
> It's a simple alteration, and saves a lot of heartache. I have
> contemplated even pushing the entire context to the stack, since I can
> burst write from the FSM a lot faster than I can with individual
> PSH/POP instructions, but I figure that would be overkill.
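A behavioural sketch of the interrupt entry/exit sequence described above: the microcode pushes the PC and then the flags, and the return pops them in reverse order. The class, the single vector address and the push order are assumptions for illustration.

```python
class TinyCpu:
    """Behavioural model of microcoded interrupt entry/exit (hypothetical names)."""

    def __init__(self, vector: int) -> None:
        self.pc = 0
        self.flags = 0
        self.stack: list[int] = []
        self.vector = vector              # single interrupt vector, assumed

    def take_interrupt(self) -> None:
        # Microcode FSM sequence: push PC, push flags, jump to the handler.
        self.stack.append(self.pc)
        self.stack.append(self.flags)
        self.pc = self.vector

    def return_from_interrupt(self) -> None:
        # Reverse order: pop flags, then PC.
        self.flags = self.stack.pop()
        self.pc = self.stack.pop()


cpu = TinyCpu(vector=0x0040)
cpu.pc, cpu.flags = 0x0123, 0b01          # arbitrary PC and flag bits
cpu.take_interrupt()
cpu.flags = 0b10                          # the handler clobbers the flags
cpu.return_from_interrupt()
print(hex(cpu.pc), bin(cpu.flags))
```

Because the flags are saved by the FSM, the handler can freely use compare and arithmetic instructions without a separate push-flags opcode.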

For someone doing a fully custom/own assembler/compiler:

The tiniest CPUs do not need a stack, and interrupts do not need to be
re-entrant, so a faster context switch is to re-map the registers, flags
(and even the PC?) onto a different area in BRAM.
You can share this resource by having INTs re-map top-down and calls
re-map bottom-up, with a hardware trap when they collide :)
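A sketch of the bank bookkeeping for this scheme, tracking only which BRAM context bank is current. Assumptions: bank 0 holds the main program, calls claim banks upward from 1, interrupts claim banks downward from the top, and the hardware trap is modelled as an exception. All names are hypothetical.

```python
class ContextTrap(Exception):
    """Stands in for the hardware trap raised when call and interrupt banks meet."""


class BankedContexts:
    """Which BRAM bank holds the current registers/flags (hypothetical model).

    Bank 0 holds the main program; calls claim banks upward from 1,
    interrupts claim banks downward from the top.
    """

    def __init__(self, num_banks: int) -> None:
        self.next_call = 1
        self.next_int = num_banks - 1
        self.active = [0]                 # nesting order; active[-1] is current

    @property
    def current(self) -> int:
        return self.active[-1]

    def _claim(self, bank: int) -> None:
        # No free bank left between the two growing regions: trap.
        if self.next_call > self.next_int:
            raise ContextTrap("call and interrupt contexts collided")
        self.active.append(bank)

    def call(self) -> None:
        self._claim(self.next_call)
        self.next_call += 1

    def interrupt(self) -> None:
        self._claim(self.next_int)
        self.next_int -= 1

    def ret(self) -> None:
        if self.active.pop() != self.next_call - 1:
            raise RuntimeError("return does not match the innermost call")
        self.next_call -= 1

    def rti(self) -> None:
        if self.active.pop() != self.next_int + 1:
            raise RuntimeError("RTI does not match the innermost interrupt")
        self.next_int += 1


ctx = BankedContexts(num_banks=4)
ctx.call()
ctx.interrupt()
ctx.call()
print("current bank:", ctx.current)
try:
    ctx.call()
except ContextTrap as err:
    print("trap:", err)
```

A context switch then costs only an update of the bank pointer, with no register copying.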

-jg

From: Jim Granville on
Gran Bilski wrote:
>
> One thing to keep in mind is handling arithmetic wider than 16 bits.
> That will usually introduce some kind of carry bit (stored where?).
> You seem to have c and z bits somewhere, but you will need two versions of
> each instruction, one which uses the carry and one which doesn't.
>
> Running more than just simple programs in real-time applications
> requires interrupt support, which messes things up considerably in the
> control part of the processor.
>
> Do you consider using only absolute branching or also doing relative
> branching?
>
> If you really want a processor which is code efficient,
> you might want to look at a stack machine.
> If I were to create a tiny tiny processor with little area and
> efficient code, I would do a stack machine.
> They are much nastier to program, but they can be implemented very
> efficiently.

One stack machine that is still small, but could help greatly with
software flows (being an already-defined standard), is the Instruction
List language of IEC 61131-3

http://www.3s-software.com/index.shtml?CoDeSys_IL

and

http://en.wikipedia.org/wiki/Instruction_list

in a 'full' system, this supports many data type sizes, but you could
nominate a single size for this task.

IEC conventions:

  Prefix   Meaning
  %        variable
  I        Input location
  Q        Output location
  M        Memory location
  (none)   Single bit (default)
  X        Single bit
  B        Byte (8 bits)
  W        Word (16 bits)
  D        Double (32 bits)
  L        Long (64 bits)
  N        Nibble (4 bits)
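As an example of how compact this notation is to handle, here is a sketch parser for directly represented addresses such as %IX0.3 or %MW10, using the prefixes listed above (the nibble entry is taken from that list as-is). The function name and return shape are hypothetical.

```python
import re

# Size prefixes and widths as listed above ("" means the single-bit default).
SIZE_BITS = {"": 1, "X": 1, "B": 8, "W": 16, "D": 32, "L": 64, "N": 4}
AREAS = {"I": "input", "Q": "output", "M": "memory"}
DIRECT_ADDRESS = re.compile(r"%([IQM])([XBWDLN]?)(\d+(?:\.\d+)*)")


def parse_direct_address(text: str) -> tuple[str, int, tuple[int, ...]]:
    """Split e.g. '%IX0.3' into (area, width in bits, address path)."""
    m = DIRECT_ADDRESS.fullmatch(text.strip().upper())
    if m is None:
        raise ValueError(f"not a direct address: {text!r}")
    area, size, path = m.groups()
    return AREAS[area], SIZE_BITS[size], tuple(int(part) for part in path.split("."))


for addr in ("%IX0.3", "%QB2", "%MW10", "%I1.7"):
    print(addr, parse_direct_address(addr))
```

If a single data size is nominated for the task, the size letter collapses to a constant and only the area and address need decoding.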


Opcode encoding is quite small, and you can see some commonality with
PicoBlaze and Mico8 cores.

You do need to define a memory-reach limit quite early in any CPU design.

-jg