CPU design [FPGA]

From: jacko on 22 Aug 2006 00:12

Jim Granville wrote:

> Frank Buss wrote:
> <snip>
> > The only problem is that you need a C compiler or something like this,
> > because writing assembler with this reduced instruction set looks like it
> > will be no fun.
>

Just got Quartus II. After half an hour it seems OK, once I set the top-level entity!!

I wonder if the Avalon SOPC includes USB?

Not sure if there is a C compiler for it.

Well, at least I have a VHDL compiler now, which looks good.

Must start on the micron design soon.

From: Göran Bilski on 22 Aug 2006 02:57

Frank Buss wrote:
> Göran Bilski wrote:
>
>
>>If the interesting part is to create this solution without any time
> >>limits then you should create most from scratch.
>
>
> Yes, this is what I'm planning.
>
> I have another idea for a CPU, very RISC-like. The bits of an instruction
> are something like micro-instructions:
>
> There are two internal 16 bit registers, r1 and r2, on which the core can
> perform operations and 6 "normal" 16 bit registers. The first 2 bits of an
> instruction define the meaning of the rest:
>
> 2 bits: operation:
> 00 load internal register 1
> 01 load internal register 2
> 10 execute operation
> 11 store internal register 1
>
> I think it is a good idea to use 8 bits for one instruction instead of
> using non-byte-aligned instructions, so we have 6 bits for the operation.
> Some useful operations:
>
> 6 bits: execute operation:
> r1 = r1 and r2
> r1 = r1 or r2
> r1 = r1 xor r2
> cmp(r1, r2)
> r1 = r1 + r2
> r1 = r1 - r2
> pc = r1
> pc = r1, if c=0
> pc = r1, if c=1
> pc = r1, if z=0
> pc = r1, if z=1
>
> For the load and store micro instructions, we have 6 bits for encoding the
> place on which the load and store acts:
>
> 6 bits place:
>   1 bit: transfer width (0=8, 1=16 bits)
>   2 bits: source/destination:
>     00: register:
>       3 bits: register index
>     01: immediate:
>       1 bit: width of immediate value (0=8, 1=16 bits)
>       next 1 or 2 bytes: immediate number (8/16 bits)
>     10: memory address in register:
>       3 bits: register index
>     11: address:
>       1 bit: width of address (0=8, 1=16 bits)
>       next 1 or 2 bytes: address (8/16 bits)
>
> The transfer width and the width of the value need not be the same. E.g.
> 1010xx means that the next byte is loaded into the internal register and
> the upper 8 bits are set to 0.
>
> But for this reduced instruction set a compiler would be a good idea. Or
> different layers of assembler. I'll try to translate my first CPU design,
> which needed 40 bytes:
>
> ; swap 6 byte source and destination MACs
> .base = 0x1000
> p1: .dw 0
> p2: .dw 0
> tmp: .db 0
> move #5, p1
> move #11, p2
> loop: move.b (p1), tmp
> move.b (p2), (p1)
> move.b tmp, (p2)
> sub.b p2, #1
> sub.b p1, #1
> bcc.b loop
>
> With my new instruction set it could be written like this (the normal
> registers 0 and 1 are constant 0 and 1) :
>
> load r1 immediate with 5
> store r1 to register 2
> load r1 immediate with 11
> store r1 to register 3
> loop: load r1 from memory address in register 2
> load r2 from memory address in register 3
> store r1 to memory address in register 3
> store r2 to memory address in register 2
> load r1 from register 3
> load r2 from register 1
> operation r1 = r1 - r2
> store r1 to register 3
> load r1 from register 2
> operation r1 = r1 - r2
> store r1 in register 2
> operation pc = loop if c=0
>
> This is 20 bytes long. As you can see, there are micro optimizations
> possible, like for the last two register decrements, where the subtrahend
> needs to be loaded only once.
>
> I think this instruction set could be implemented with very few gates,
> compared to other instruction sets, and the memory usage is low, too.
> Another advantage: 64 different instructions are possible and orthogonal
> higher levels are easy to implement with it, because the load and store
> operations work on all possible places. Speed would not be the fastest, but
> this is no problem for my application.
>
> The only problem is that you need a C compiler or something like this,
> because writing assembler with this reduced instruction set looks like it
> will be no fun.
>
> Instead of 16 bits, 32 bits or more are easy to implement with generic
> parameters for this core.
>
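
To make the quoted format concrete, here is a minimal Python sketch of a decoder for one instruction byte. It assumes the fields are packed most-significant-bit first in the order listed (class in bits 7-6, then the transfer width bit, the 2-bit source/destination and the remaining 3 bits); the thread does not fix the bit order, and the function names are hypothetical.

```python
# Hypothetical decoder for the 8-bit instruction format quoted above.
# Assumed layout (MSB first): bits 7-6 class, bits 5-0 place or operation.
CLASSES = {0b00: "load r1", 0b01: "load r2", 0b10: "execute", 0b11: "store r1"}


def decode_place(place: int) -> dict:
    """Split the 6-bit place field into its sub-fields."""
    width = 16 if (place >> 5) & 1 else 8   # transfer width bit
    mode = (place >> 3) & 0b11              # source/destination
    low = place & 0b111                     # remaining 3 bits
    if mode == 0b00:
        return {"width": width, "mode": "register", "reg": low}
    if mode == 0b10:
        return {"width": width, "mode": "memory at register", "reg": low}
    # Immediate or absolute address: the top remaining bit selects an
    # 8- or 16-bit operand that follows in the instruction stream.
    operand_width = 16 if (low >> 2) & 1 else 8
    name = "immediate" if mode == 0b01 else "address"
    return {"width": width, "mode": name,
            "operand_width": operand_width, "extra_bytes": operand_width // 8}


def decode(byte: int) -> dict:
    cls = (byte >> 6) & 0b11
    rest = byte & 0b111111
    if cls == 0b10:
        return {"class": "execute", "op": rest}
    return {"class": CLASSES[cls], **decode_place(rest)}
```

With this ordering, `decode(0b00101000)` reports a 16-bit load of r1 from an 8-bit immediate, which is the 1010xx case from the quote (upper 8 bits zero-filled).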

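The swap loop from the quoted translation can be sketched in Python at the level of the pseudo-assembly, with one comment per listed step. Assumptions not stated in the thread: c is set when a subtraction borrows, loads and stores leave the flags unchanged, and memory is a plain `bytearray` addressed from 0. Loading the branch target is not modelled.

```python
MASK = 0xFFFF


def sub(r1: int, r2: int):
    """r1 = r1 - r2 with 16-bit wrap; returns (result, c, z), c = borrow."""
    diff = r1 - r2
    result = diff & MASK
    return result, int(diff < 0), int(result == 0)


def swap_macs(mem: bytearray) -> int:
    """Swap mem[0..5] with mem[6..11]; return the number of loop passes."""
    regs = [0, 1, 0, 0, 0, 0]   # registers 0 and 1 hold the constants 0 and 1
    regs[2] = 5                 # load r1 immediate 5; store r1 to register 2
    regs[3] = 11                # load r1 immediate 11; store r1 to register 3
    passes = 0
    while True:
        r1 = mem[regs[2]]       # load r1 from memory address in register 2
        r2 = mem[regs[3]]       # load r2 from memory address in register 3
        mem[regs[3]] = r1       # store r1 to memory address in register 3
        mem[regs[2]] = r2       # store r2 to memory address in register 2
        r1, r2 = regs[3], regs[1]   # load r1 from reg 3, r2 from reg 1 (=1)
        r1, c, _ = sub(r1, r2)      # r1 = r1 - r2
        regs[3] = r1                # store r1 to register 3
        r1 = regs[2]                # load r1 from register 2 (r2 still 1)
        r1, c, _ = sub(r1, r2)      # r1 = r1 - r2
        regs[2] = r1                # store r1 to register 2
        passes += 1
        if c != 0:                  # pc = loop if c=0, otherwise fall through
            return passes
```

Starting from bytes 0 to 11, the loop runs six times and exits when register 2 borrows below 0, leaving the two 6-byte groups exchanged.
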
One thing to keep in mind is handling arithmetic wider than 16 bits.
That will usually introduce some kind of carry bit (stored where?).
You seem to have c and z bits somewhere, but you will need two versions of
each instruction: one that uses the carry and one that doesn't.
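
A short Python sketch of why the carry matters: a 32-bit addition on a 16-bit datapath is a plain add of the low halves followed by an add-with-carry of the high halves, i.e. the two instruction variants mentioned above. The helper names are illustrative only.

```python
MASK16 = 0xFFFF


def add16(a: int, b: int, carry_in: int = 0):
    """16-bit add; returns (sum, carry_out)."""
    total = a + b + carry_in
    return total & MASK16, total >> 16


def add32(x: int, y: int) -> int:
    """32-bit add built from two 16-bit adds (wraps at 32 bits)."""
    lo, c = add16(x & MASK16, y & MASK16)          # plain add, produces c
    hi, _ = add16(x >> 16, y >> 16, carry_in=c)    # add with carry
    return (hi << 16) | lo
```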

Running more than just simple programs in real-time applications
requires interrupt support, which messes things up considerably in the
control part of the processor.

Do you consider using only absolute branching or also doing relative
branching?

If you really want a processor that is code efficient,
you might want to look at a stack machine.
If I were to create a tiny, tiny processor with little area and
efficient code, I would do a stack machine.
They are much nastier to program, but they can be implemented very
efficiently.
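
For comparison, a minimal stack-machine sketch in Python. Every opcode is one byte and takes its operands from the stack, so no register or place field is needed; the opcode set and numbering are arbitrary choices for this example, not those of any particular processor.

```python
# Opcode numbers are arbitrary for this sketch.
LIT, ADD, SUB, MUL, DUP, SWAP, DROP = range(7)


def run(code: bytes) -> list:
    """Execute one-byte opcodes on a 16-bit data stack; return the stack."""
    stack = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if op == LIT:                 # the next byte is an 8-bit literal
            stack.append(code[pc])
            pc += 1
        elif op in (ADD, SUB, MUL):   # binary ops pop two, push one
            b = stack.pop()
            a = stack.pop()
            if op == ADD:
                value = a + b
            elif op == SUB:
                value = a - b
            else:
                value = a * b
            stack.append(value & 0xFFFF)
        elif op == DUP:
            stack.append(stack[-1])
        elif op == SWAP:
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == DROP:
            stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack
```

`(2 + 3) * 4` encodes as `LIT 2 LIT 3 ADD LIT 4 MUL`: five one-byte opcodes plus three literal bytes, with no operand fields at all.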

Göran

From: Frank Buss on 22 Aug 2006 05:34

Göran Bilski wrote:

> You seem to have c and z bits somewhere, but you will need two versions of
> each instruction: one that uses the carry and one that doesn't.

Yes, I have carry and zero flags. To make the implementation of the core
easier, I think I'll use one bit of the instruction word to determine whether
the flags are updated or not.
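
A Python sketch of the execute operations with such a flag-update bit, passed here as the `update_flags` argument. The flag semantics are assumptions: c is the carry out of `add` or the borrow of `sub`/`cmp`, logical operations produce c=0, and `cmp` only changes the flags.

```python
MASK = 0xFFFF


def execute(op: str, r1: int, r2: int, flags: dict, update_flags: bool) -> int:
    """Return the new r1; write flags["c"] and flags["z"] only if asked."""
    if op == "and":
        raw = r1 & r2
    elif op == "or":
        raw = r1 | r2
    elif op == "xor":
        raw = r1 ^ r2
    elif op == "add":
        raw = r1 + r2
    elif op in ("sub", "cmp"):
        raw = r1 - r2
    else:
        raise ValueError(f"unknown operation {op!r}")
    result = raw & MASK
    if update_flags:
        flags["c"] = int(raw > MASK or raw < 0)  # carry of add, borrow of sub/cmp
        flags["z"] = int(result == 0)
    return r1 if op == "cmp" else result         # cmp leaves r1 unchanged
```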

> Running more than just simple programs in real-time applications
> requires interrupt support, which messes things up considerably in the
> control part of the processor.

Why? I think I can implement a "call" instruction like in 68000:

r2=pc
pc=r1

In the subroutine I can save r2 if I need a deeper call stack.

Interrupts could be implemented by saving the PC in a special
register and restoring it with a special return instruction.
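
Both mechanisms can be sketched in Python as follows. The `Core` class, `shadow_pc` and the method names are hypothetical; the sketch assumes pc already points at the next instruction when a call or interrupt is taken, and `ret` simply jumps through r2 (the listed operations only offer pc = r1, so a real return would need an extra move).

```python
class Core:
    """Hypothetical control-flow state of the proposed core."""

    def __init__(self):
        self.pc = 0
        self.r1 = 0
        self.r2 = 0
        self.shadow_pc = 0    # special register holding the interrupted pc

    def call(self):
        """r2 = pc; pc = r1: the return address goes to r2, not to a stack."""
        self.r2, self.pc = self.pc, self.r1

    def ret(self):
        """Return from a subroutine by jumping back through r2."""
        self.pc = self.r2

    def interrupt(self, vector: int):
        """Save pc in the shadow register and jump to the handler."""
        self.shadow_pc, self.pc = self.pc, vector

    def return_from_interrupt(self):
        """The special return instruction: restore the saved pc."""
        self.pc = self.shadow_pc
```

With a single shadow register, a second interrupt taken before `return_from_interrupt` would overwrite `shadow_pc`, so interrupts would have to stay disabled inside the handler, and the handler must also preserve r1, r2 and the flags.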

> Do you consider using only absolute branching or also doing relative
> branching?

64 instructions are possible, so relative branching is a good idea, and I'll
use the same concept: one bit decides whether a branch is absolute or
relative.
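
A Python sketch of the target computation for a branch with one absolute/relative bit. Assumptions: the relative form uses a signed 8-bit offset counted from the address of the next instruction, and the pc wraps at 16 bits.

```python
def branch_target(next_pc: int, operand: int, relative: bool) -> int:
    """Compute the new pc for an absolute or pc-relative branch."""
    if not relative:
        return operand & 0xFFFF                               # absolute address
    offset = operand - 0x100 if operand & 0x80 else operand   # sign-extend 8 bits
    return (next_pc + offset) & 0xFFFF                        # wrap at 16 bits
```

A short backward loop such as the MAC swap fits its branch target in one offset byte instead of a 16-bit absolute address.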

> If you really want a processor that is code efficient,
> you might want to look at a stack machine.
> If I were to create a tiny, tiny processor with little area and
> efficient code, I would do a stack machine.
> They are much nastier to program, but they can be implemented very
> efficiently.

I've written a simple Forth implementation for Java, and programming in
Forth is just different, not more difficult:

http://www.frank-buss.de/forth/

The MARC4 from Atmel uses qForth:

http://www.atmel.com/journal/documents/issue5/pg46_48_Atmel_5_CodePatch_A.pdf

Maybe you are right and both the core and the programs are smaller with
Forth; I'll think about it. What is really useful is that an interactive
read-eval-print loop (as in Lisp) is simple to write in Forth, so you can
program and debug a system over RS232.
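
A minimal Python sketch of such a Forth-style outer interpreter: each word of an input line (which on the target would arrive over RS232) is looked up in the dictionary and executed, and anything else is parsed as a number. The word set is tiny and `: name ... ;` definitions are stored as token lists, a simplification of a real Forth dictionary.

```python
class Forth:
    """Toy outer interpreter: dictionary lookup, else number."""

    def __init__(self):
        self.stack = []
        self.out = []
        self.words = {
            "+": lambda: self.push(self.pop() + self.pop()),
            "*": lambda: self.push(self.pop() * self.pop()),
            "dup": lambda: self.push(self.stack[-1]),
            ".": lambda: self.out.append(str(self.pop())),
        }

    def push(self, value: int):
        self.stack.append(value)

    def pop(self) -> int:
        return self.stack.pop()

    def execute(self, tokens):
        for tok in tokens:
            if tok in self.words:
                self.words[tok]()
            else:
                self.push(int(tok))   # raises ValueError for unknown words

    def eval_line(self, line: str) -> str:
        """Interpret one input line and return what '.' printed."""
        tokens = line.split()
        i = 0
        while i < len(tokens):
            if tokens[i] == ":":      # ": name body ;" defines a new word
                end = tokens.index(";", i)
                name, body = tokens[i + 1], tokens[i + 2:end]
                self.words[name] = lambda body=body: self.execute(body)
                i = end + 1
            else:
                self.execute([tokens[i]])
                i += 1
        result, self.out = " ".join(self.out), []
        return result
```

After `: square dup * ;`, the line `5 square .` prints 25. Unknown words raise `ValueError` here, where an interactive system would report the error and keep running.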

--
Frank Buss, fb(a)frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: Martin Schoeberl on 22 Aug 2006 05:57

>
> Forth looks interesting, too: http://www.ultratechnology.com/f21cpu.html
>
and Java also: http://www.jopdesign.com/

could not resist ;-)

Martin

From: Frank Buss on 22 Aug 2006 06:16

Martin Schoeberl wrote:

> and Java also: http://www.jopdesign.com/

You have tested both: a "normal" instruction set and a stack machine. For
the stack machine you wrote that it is two times faster. What about code
size and the size of the core?

I've downloaded your code, and it looks like it is implemented very close to
the hardware instead of using arbitrary VHDL and letting the synthesizer
decide how to implement it. A good idea for my implementation :-)

--
Frank Buss, fb(a)frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
