From: Walter Banks on

> Jim Granville wrote:
>
> > The tiniest CPUs do not need a stack, and interrupts do not need to be
> > re-entrant, so a faster context switch is to re-map the Registers, Flags
> > (and even PC ? ) onto a different area in BRAM.
> > You can share this resource by INTs re-map top-down, and calls re-map
> > bottom up - with a hardware trap when they collide :)
>
> Once you see clearly the relationship between features and
> cost, a lot can be removed.
>
> Interrupts can be removed at extremely low cost to applications. Neither the
> Microchip PIC12 nor the Freescale RS08 has interrupts. In the
> RS08 C compiler we developed some software IP to go, where possible,
> into a power-down mode and launch execution threads that are compiled
> to run to completion.
>
> The threads are typically short and, as a side effect, run to completion
> makes local re-use easy.
>
> C compilers implemented for small processors work well without either
> a data or subroutine return stack. Two of the processors we have written
> compilers for in the last couple of years both used an addressable return
> register. Flow control analysis in the compiler makes nested subroutines
> transparent to the user.
>
> The instruction set reduction in the RS08 from the S08 parent had a
> 4-6% impact on application performance.
>
> Walter..
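
The register re-map idea quoted above can be modelled in a few lines of software. The following is only a sketch under stated assumptions: the class name, the bank count, the registers per bank and the method names are all hypothetical and do not come from any of the posters' designs. It shows calls taking banks from the bottom of one block RAM, interrupts taking banks from the top, and a trap when the two regions meet.

```python
class BankCollision(Exception):
    """Models the hardware trap raised when the two regions meet."""


class BankedRegisterFile:
    """Toy model of register banks that all live in one block RAM.

    Calls take banks upward from bank 0, interrupts take banks
    downward from the top. Sizes are illustrative assumptions.
    """

    def __init__(self, num_banks=8, regs_per_bank=4):
        self.regs_per_bank = regs_per_bank
        self.bram = [0] * (num_banks * regs_per_bank)
        self.call_top = 0            # highest bank taken by a call
        self.int_bottom = num_banks  # lowest bank taken by an interrupt
        self.active = [("call", 0)]  # nesting history, last entry is current

    def _index(self, reg):
        bank = self.active[-1][1]
        return bank * self.regs_per_bank + reg

    def read(self, reg):
        return self.bram[self._index(reg)]

    def write(self, reg, value):
        self.bram[self._index(reg)] = value

    def call(self):
        nxt = self.call_top + 1
        if nxt >= self.int_bottom:
            raise BankCollision("call region ran into interrupt region")
        self.call_top = nxt
        self.active.append(("call", nxt))

    def interrupt(self):
        nxt = self.int_bottom - 1
        if nxt <= self.call_top:
            raise BankCollision("interrupt region ran into call region")
        self.int_bottom = nxt
        self.active.append(("int", nxt))

    def leave(self):
        """Return from the current call or interrupt and release its bank."""
        if len(self.active) == 1:
            raise RuntimeError("nothing to return from")
        kind, _ = self.active.pop()
        if kind == "call":
            self.call_top -= 1
        else:
            self.int_bottom += 1
```

In this model a context switch is only a change of the active bank number; no register contents are copied. In hardware that would correspond to changing the upper address bits presented to the block RAM.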

From: Martin Schoeberl on
>> What do you mean by 'very close to the hardware'? I try to
>> avoid vendor specific library elements as much as possible and
>> stay with plain VHDL. If you mean that the VHDL coding style
>> is more hardware oriented, then I agree.
>
> Yes, this is what I meant, e.g. figures 5.6 to 5.9 of your thesis, where
> you describe the processor pipeline with gates and which is implemented
> like this in VHDL. But maybe this is the normal case and I'm just too new to
> VHDL to write and interconnect components in this way.
>
> http://www.jopdesign.com/thesis/thesis.pdf

Nice that you read it ;-)

>
>> I started directly
>> in an FPGA implementation and did almost no simulation.
>
> Why not? When I was implementing my CRC32 check for my network core, I've
> tested the algorithm with a VHDL testbench (ethernet packet send and
> receive works at 10 Mbit and 100 Mbit on my Spartan 3E starter kit now).
> The turnaround times are faster with simulation and it is very easy to
> debug it, instead of debugging a synthesized core in hardware. The same was
> true for my DS2432 ROM id reader, where I wrote the testbench first
> and then implemented the reader.
> http://www.frank-buss.de/vhdl/spartan3e.html

Ok, the main reason for not using simulation was simply that
I had no ModelSim and the Quartus simulator was a pain (actually
I started with MaxPlus II). However, I wrote my own kind of
debugging device using the printer port on the PC. I clocked the
design with the printer port and read back the interesting
signals with a small state machine. Kind of crazy ;-)

Now, a lot has changed. E.g. ModelSim for Xilinx is free. So
there is now a testbench for JOP available that you can use
with ModelSim XE. For all FPGA specific parts (on-chip memories)
I wrote plain VHDL models. So you can now debug with ModelSim XE
and compile for Altera....

And I agree, simulation can save you a lot of time (and sometimes
waste a lot of time - I still like to look at the code until I
find the issue).

Martin
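
For the CRC32 check mentioned in the quoted text, a software reference model is one common way to produce expected values for a testbench. The sketch below assumes the standard Ethernet CRC-32 (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF), processed one bit at a time as a serial hardware implementation might do it. The function name is made up for illustration, and nothing here is taken from Frank's actual core.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32 as used for the Ethernet frame check sequence."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; apply the polynomial when a 1 falls out.
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    frame = b"123456789"
    # Cross-check the bit-serial model against the library implementation.
    assert crc32_bitwise(frame) == zlib.crc32(frame)
    print(hex(crc32_bitwise(frame)))
```

The values from such a model can be written to a file and compared against the simulated core's output, one frame per line.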



From: Jim Granville on
Walter Banks wrote:

>
> Jim Granville wrote:
>
>
>>The tiniest CPUs do not need a stack, and interrupts do not need to be
>>re-entrant, so a faster context switch is to re-map the Registers, Flags
>>(and even PC ? ) onto a different area in BRAM.
>>You can share this resource by INTs re-map top-down, and calls re-map
>>bottom up - with a hardware trap when they collide :)
>
>
> Once you see clearly the relationship between features and
> cost, a lot can be removed.
>
> Interrupts can be removed at extremely low cost to applications. Neither the
> Microchip PIC12 nor the Freescale RS08 has interrupts. In the
> RS08 C compiler we developed some software IP to go, where possible,
> into a power-down mode and launch execution threads that are compiled
> to run to completion.
>
> The threads are typically short and, as a side effect, run to completion
> makes local re-use easy.
>
> C compilers implemented for small processors work well without either
> a data or subroutine return stack. Two of the processors we have written
> compilers for in the last couple of years both used an addressable return
> register. Flow control analysis in the compiler makes nested subroutines
> transparent to the user.
>
> The instruction set reduction in the RS08 from the S08 parent had a
> 4-6% impact on application performance.
>
> Walter..

Hi Walter,
Have you ever thought about doing a Compiler+FPGA_CPU (+Sim+Debug?)
bundle?

-jg
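
Walter's point about compiling without a return stack can be illustrated with a small call-graph analysis. This is a guess at the general technique, not a description of his compiler: if the call graph has no recursion, every function that makes calls can be given a fixed memory slot in which to save the return register, and functions that are never active at the same time can share a slot. The function name and the example graph are hypothetical.

```python
def assign_return_slots(call_graph, root):
    """Map each calling function to a fixed slot for its saved return address.

    The slot number is the function's longest-path depth below `root`.
    Two functions that are active at the same time always lie on one
    call chain, so their depths (and slots) differ. Leaf functions make
    no calls, never overwrite the return register, and need no slot.
    """
    depth = {}
    on_path = set()

    def visit(fn, d):
        if fn in on_path:
            raise ValueError(f"recursion through {fn!r} needs a real stack")
        if depth.get(fn, -1) >= d:
            return  # already reached at this depth or deeper
        depth[fn] = d
        on_path.add(fn)
        for callee in call_graph.get(fn, ()):
            visit(callee, d + 1)
        on_path.discard(fn)

    visit(root, 0)
    return {fn: d for fn, d in depth.items() if call_graph.get(fn)}


if __name__ == "__main__":
    graph = {
        "main": ["read_adc", "report"],
        "report": ["format", "send"],
        "format": ["send"],
        "send": [],
        "read_adc": [],
    }
    print(assign_return_slots(graph, "main"))
```

The number of distinct slots equals the deepest chain of non-leaf calls, which is known at compile time, so no stack pointer is needed at run time.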



From: PeteS on
Frank Buss wrote:
> PeteS wrote:
>
> > Do you want a processor you can simply instantiate, or are you willing
> > to tweak so you get the features you want? If so, you could take one of
> > the less ambitious cores and adjust the instruction set to optimise it
> > for your application.
>
> Adjusting the instruction set to the problem domain is a good idea. I'll
> try to write the functions first, maybe using domain-specific instructions
> (like a block copy command), and then I'll implement the core for it.
>
> --
> Frank Buss, fb(a)frank-buss.de
> http://www.frank-buss.de, http://www.it4-systems.de

I did exactly this in a previous job. PicoBlaze was nice, but there
were things it did not have, and conversely things I would never use.

So I did the code (pseudocode first) and then designed the device to do
the necessary functions at the microcode level. Because my problem
domain was very constrained, I needed only 16 instructions (I like it
when I get nice numbers like that as a solution) to do what I needed.

Then I wrote (well, I changed :) an assembler to program it.

Worked very well, and took about half the space of a PicoBlaze,
including a DMAC engine (excluding the memory interface which was there
anyway).

Cheers

PeteS
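
A table-driven assembler for a 16-instruction core can be very small. The sketch below is not PeteS's tool. The mnemonics, the 12-bit word layout (4-bit opcode, 8-bit operand) and the `;` comment syntax are all invented for illustration.

```python
# Hypothetical instruction set: list position = 4-bit opcode.
MNEMONICS = [
    "NOP", "LOAD", "STORE", "ADD", "SUB", "AND", "OR", "XOR",
    "SHL", "SHR", "JMP", "JZ", "JNZ", "IN", "OUT", "HALT",
]
OPCODES = {name: code for code, name in enumerate(MNEMONICS)}


def assemble(source):
    """Two-pass assembler: collect labels, then encode 12-bit words."""
    statements = []
    labels = {}
    # Pass 1: strip comments, record the address of each label.
    for raw in source.splitlines():
        text = raw.split(";", 1)[0].strip()
        if not text:
            continue
        if text.endswith(":"):
            labels[text[:-1]] = len(statements)
            continue
        statements.append(text)
    # Pass 2: encode each statement as (opcode << 8) | operand.
    words = []
    for text in statements:
        parts = text.split()
        opcode = OPCODES[parts[0].upper()]
        operand = 0
        if len(parts) > 1:
            arg = parts[1]
            operand = labels[arg] if arg in labels else int(arg, 0)
        if not 0 <= operand <= 0xFF:
            raise ValueError(f"operand out of range in {text!r}")
        words.append((opcode << 8) | operand)
    return words


if __name__ == "__main__":
    program = """
    start:
        LOAD 0x10   ; load a constant
        SUB 1
        JNZ start   ; loop until zero
        HALT
    """
    print([hex(w) for w in assemble(program)])
```

Changing the instruction set then only means editing the mnemonic table, which fits the approach of tailoring the core to the problem domain.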

From: jacko on

PeteS wrote:
> Frank Buss wrote:
> > PeteS wrote:
> >
> > > Do you want a processor you can simply instantiate, or are you willing
> > > to tweak so you get the features you want? If so, you could take one of
> > > the less ambitious cores and adjust the instruction set to optimise it
> > > for your application.
> >
> > Adjusting the instruction set to the problem domain is a good idea. I'll
> > try to write the functions first, maybe using domain-specific instructions
> > (like a block copy command), and then I'll implement the core for it.
> >
> > --
> > Frank Buss, fb(a)frank-buss.de
> > http://www.frank-buss.de, http://www.it4-systems.de
>
> I did exactly this in a previous job. PicoBlaze was nice, but there
> were things it did not have, and conversely things I would never use.
>
> So I did the code (pseudocode first) and then designed the device to do
> the necessary functions at the microcode level. Because my problem
> domain was very constrained, I needed only 16 instructions (I like it
> when I get nice numbers like that as a solution) to do what I needed.
>
> Then I wrote (well, I changed :) an assembler to program it.
>
> Worked very well, and took about half the space of a PicoBlaze,
> including a DMAC engine (excluding the memory interface which was there
> anyway).
>
> Cheers
>
> PeteS

AHDL for a two-register NOP, INC, DEC, WRITE unit:

http://indi.joox.net (link to Quartus II files, BIREGU.bdf)

Good for interruptible stack pointers.