From: kendor on
hello there

for a measuring utility (running at 100 MHz) I need a 42-bit counter
whose value is used by several sub-blocks of my design. As a first, somewhat
dirty solution I have implemented it as follows. Since this approach
needs a large number of FFs and leads to long delay times (bit 0 to bit 41),
I am looking for an alternative. I was thinking about using Block RAM
(Spartan-3) to reduce routing effort and delay times (see also
http://courses.ece.illinois.edu/ece412/References/datasheets/xapp463.pdf).

Has anyone ever done such a thing or do you have any suggestions on solving
my task?

current code:
-------------------------------------
# I have to use std_logic_unsigned, since numeric_std limits integers to
the normal 4-byte width (32 bits), which is not enough for 42 bits
(overflow, ...)

# ...
GENERIC (
  t : NATURAL := 42;  --! counter width
  wd: NATURAL := 5    --! divider (clk/(2*wd))
);

# ...
ARCHITECTURE rtl OF worldtimeCtr IS
  SIGNAL cnt: std_logic_vector(t-1 downto 0);
BEGIN
  PROCESS(clk,rst)
    VARIABLE temp : NATURAL RANGE 0 to wd;
  BEGIN
    IF(rst='0')THEN
      cnt <= (others =>'0');
      temp := 0;
    ELSIF(clk'event and clk='1')THEN
      IF(en='1' and temp = wd)THEN
        temp := 0;
        cnt <= STD_LOGIC_VECTOR(cnt + 1);
      END IF;
      temp := temp+1;
    END if;

  END process;
  o_worldtime <= cnt;
END rtl;

# ...
-------------------------------------
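
A note on the numeric_std remark above: as far as I know, the 32-bit limit
applies to the integer type (typically 32 bits in most tools), not to
numeric_std's unsigned vectors, which can be any width. The following is a
minimal sketch of the same counter using numeric_std, not a drop-in
replacement. The port types are assumptions (the original entity is not
shown), the prescaler signal `pre` is a hypothetical name, and unlike the
posted code the prescaler here only advances while en is high.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity worldtimeCtr is
  generic (
    t  : positive := 42;  -- counter width
    wd : positive := 5    -- clk cycles per count (assumed meaning)
  );
  port (
    clk         : in  std_logic;
    rst         : in  std_logic;  -- asynchronous, active low as in the post
    en          : in  std_logic;
    o_worldtime : out std_logic_vector(t-1 downto 0)
  );
end entity worldtimeCtr;

architecture rtl of worldtimeCtr is
  signal cnt : unsigned(t-1 downto 0);
  signal pre : natural range 0 to wd-1;  -- hypothetical prescaler register
begin
  process (clk, rst)
  begin
    if rst = '0' then
      cnt <= (others => '0');
      pre <= 0;
    elsif rising_edge(clk) then
      if en = '1' then
        if pre = wd-1 then
          pre <= 0;
          cnt <= cnt + 1;  -- unsigned + natural; the result stays t bits wide
        else
          pre <= pre + 1;
        end if;
      end if;
    end if;
  end process;

  o_worldtime <= std_logic_vector(cnt);
end architecture rtl;
```

Only the constant 1 is an integer here; the 42-bit value itself never passes
through the integer type, so no overflow arises from the width.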

thank you in advance

kendor


From: Rob Gaddi on
On Fri, 04 Dec 2009 12:15:24 -0600
"kendor" <jonas.reber(a)bfh.ch> wrote:

> hello there
>
> for a measuring utility (running at 100 MHz) I need a 42-bit counter
> whose value is used by several sub-blocks of my design. As a first,
> somewhat dirty solution I have implemented it as follows. Since this
> approach needs a large number of FFs and leads to long delay times
> (bit 0 to bit 41), I am looking for an alternative. I was thinking
> about using Block RAM (Spartan-3) to reduce routing effort and delay
> times (see also
> http://courses.ece.illinois.edu/ece412/References/datasheets/xapp463.pdf).
>
> Has anyone ever done such a thing or do you have any suggestions on
> solving my task?

Another option would be to pipeline the block into, say, 3 segments of
14 bits apiece, so that you don't have that one LONG carry chain
trying to propagate up the whole thing.

Depending on how willing your toolchain is to rebalance registers (ISE
11 _may_ be smart enough), you might just be able to add a few stages
of pipeline delay on the output of the entire 42 bits, and let it push
things around across the logic. Otherwise you'd have to code it
manually, which isn't the end of the world.
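
One possible way to code the three-segment version by hand is sketched
below. It is only an illustration under stated assumptions: the entity name,
port names and the lo_full/mid_full flags are all hypothetical. Each 14-bit
segment has its own short carry chain, and the carry between segments is
replaced by a registered flag that is computed one count early, so the
42-bit value still reads as a plain binary count on every clock.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity pipelined_ctr42 is  -- hypothetical name
  port (
    clk : in  std_logic;
    rst : in  std_logic;  -- asynchronous, active low as in the original post
    ce  : in  std_logic;  -- count enable
    q   : out std_logic_vector(41 downto 0)
  );
end entity pipelined_ctr42;

architecture rtl of pipelined_ctr42 is
  constant ALL_ONES   : unsigned(13 downto 0) := (others => '1');
  constant ONE_BEFORE : unsigned(13 downto 0) := ALL_ONES - 1;

  signal lo, mid, hi : unsigned(13 downto 0);
  signal lo_full     : std_logic;  -- registered: lo is all ones right now
  signal mid_full    : std_logic;  -- registered: mid is all ones right now
  signal full        : unsigned(41 downto 0);
begin
  process (clk, rst)
  begin
    if rst = '0' then
      lo       <= (others => '0');
      mid      <= (others => '0');
      hi       <= (others => '0');
      lo_full  <= '0';
      mid_full <= '0';
    elsif rising_edge(clk) then
      if ce = '1' then
        lo <= lo + 1;
        -- Look ahead one count so the flag is valid when lo is all ones.
        if lo = ONE_BEFORE then
          lo_full <= '1';
        else
          lo_full <= '0';
        end if;

        if lo_full = '1' then      -- lo wraps to zero on this edge
          mid <= mid + 1;
          if mid = ONE_BEFORE then
            mid_full <= '1';
          else
            mid_full <= '0';
          end if;

          if mid_full = '1' then   -- mid wraps to zero on this edge too
            hi <= hi + 1;
          end if;
        end if;
      end if;
    end if;
  end process;

  full <= hi & mid & lo;
  q    <= std_logic_vector(full);
end architecture rtl;
```

The longest register-to-register path is then a 14-bit increment or a 14-bit
compare against a constant, instead of a 42-bit carry chain.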

--
Rob Gaddi, Highland Technology
Email address is currently out of order
From: Gabor on
On Dec 4, 1:15 pm, "kendor" <jonas.re...(a)bfh.ch> wrote:
> hello there
>
> for a measuring utility (running at 100 MHz) I need a 42-bit counter
> whose value is used by several sub-blocks of my design. As a first, somewhat
> dirty solution I have implemented it as follows. Since this approach
> needs a large number of FFs and leads to long delay times (bit 0 to bit 41),
> I am looking for an alternative. I was thinking about using Block RAM
> (Spartan-3) to reduce routing effort and delay times (see also
> http://courses.ece.illinois.edu/ece412/References/datasheets/xapp463.pdf).
>
> Has anyone ever done such a thing or do you have any suggestions on solving
> my task?

If you mean the input clock is running at 100 MHz, then after
your prescaler (temp) your 42-bit count runs at 1/6 of 100 MHz,
if I read this code correctly. That means the entire counter
has a multicycle propagation delay to itself of about 60 ns.
Did you try adding a FROM:TO-style timing constraint to
let the tools know this?
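
A multicycle constraint is only safe if the counter registers really update
at most once every N clocks. One way to make that explicit is to derive a
registered one-clock tick from the prescaler and use it as the only enable of
the wide counter, as in the sketch below. The entity name, the generics WIDTH
and DIV, and the signals pre and tick are hypothetical. Note also that in the
posted code temp is cleared and then incremented in the same clock, so the
count appears to advance every wd clocks rather than every wd+1; the timing
budget should be taken from the real enable period.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tick_ctr is  -- hypothetical name
  generic (
    WIDTH : positive := 42;
    DIV   : positive := 5   -- clk cycles between updates; use 2 or more
  );
  port (
    clk : in  std_logic;
    rst : in  std_logic;    -- asynchronous, active low as in the post
    en  : in  std_logic;
    q   : out std_logic_vector(WIDTH-1 downto 0)
  );
end entity tick_ctr;

architecture rtl of tick_ctr is
  signal pre  : natural range 0 to DIV-1;
  signal tick : std_logic;  -- registered clock enable for the wide counter
  signal cnt  : unsigned(WIDTH-1 downto 0);
begin
  process (clk, rst)
  begin
    if rst = '0' then
      pre  <= 0;
      tick <= '0';
      cnt  <= (others => '0');
    elsif rising_edge(clk) then
      tick <= '0';          -- default; overridden once per DIV enabled clocks
      if en = '1' then
        if pre = DIV-1 then
          pre  <= 0;
          tick <= '1';
        else
          pre <= pre + 1;
        end if;
      end if;

      if tick = '1' then
        cnt <= cnt + 1;     -- the only place cnt changes
      end if;
    end if;
  end process;

  q <= std_logic_vector(cnt);
end architecture rtl;
```

With DIV of 2 or more, two tick pulses are always at least DIV clocks apart,
so the paths from cnt back to cnt have DIV clock periods to settle. Blocks
that read the counter on every clock are not covered by that argument and
would still need single-cycle timing.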

Regards,
Gabor
From: Curt Johnson on
kendor wrote:
> hello there
>
> for a measuring utility (running at 100 MHz) I need a 42-bit counter
> whose value is used by several sub-blocks of my design. As a first, somewhat
> dirty solution I have implemented it as follows. Since this approach
> needs a large number of FFs and leads to long delay times (bit 0 to bit 41),
> I am looking for an alternative. I was thinking about using Block RAM
> (Spartan-3) to reduce routing effort and delay times (see also
> http://courses.ece.illinois.edu/ece412/References/datasheets/xapp463.pdf).
>
> Has anyone ever done such a thing or do you have any suggestions on solving
> my task?
>
<snip>
>
> thank you in advance
>
> kendor
>

Do you need a binary output?
Before carry chains, I used linear feedback shift registers for wide
counters and converted the result to binary in software.
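
To illustrate the idea, here is a minimal sketch of an LFSR used as a
counter. It is deliberately only 16 bits wide, using the widely published
maximal-length polynomial x^16 + x^14 + x^13 + x^11 + 1; a 42-bit version
would need feedback taps taken from a published table, which I have not
filled in here. The entity and port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity lfsr_ctr16 is  -- hypothetical name
  port (
    clk : in  std_logic;
    rst : in  std_logic;  -- asynchronous, active low
    ce  : in  std_logic;  -- count enable
    q   : out std_logic_vector(15 downto 0)
  );
end entity lfsr_ctr16;

architecture rtl of lfsr_ctr16 is
  signal sr : std_logic_vector(15 downto 0);
begin
  process (clk, rst)
  begin
    if rst = '0' then
      sr <= x"0001";  -- any non-zero seed; all zeros would lock up
    elsif rising_edge(clk) then
      if ce = '1' then
        -- Shift right; the new MSB is the XOR of the tapped bits.
        sr <= (sr(0) xor sr(2) xor sr(3) xor sr(5)) & sr(15 downto 1);
      end if;
    end if;
  end process;

  q <= sr;
end architecture rtl;
```

The next state depends on only four bits regardless of width, so there is no
carry chain at all. The price is that the states come in pseudo-random
order, and the sequence visits 2^16 - 1 states rather than 2^16, which is why
the value has to be converted to binary afterwards.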

Curt
From: Peter Alfke on
I would use the DSP48 circuit. It easily runs at well over 100 MHz in
Spartan-6 and much faster in Virtex-5 and Virtex-6.
No need for pre-scaling or fancy carry tricks. It's all done for you!
Look at the short description in the Spartan-6 User Guide Lite:

"Each DSP48A1 slice consists of a dedicated 18 x 18 bit two's
complement multiplier and a 48-bit accumulator, both capable of
operating at 250 MHz. The DSP48A1 slice provides extensive pipelining
and extension capabilities that enhance speed and efficiency of many
applications, even beyond digital signal processing, such as wide
dynamic bus shifters, memory address generators, wide bus
multiplexers, and memory-mapped I/O register files.

The accumulator can also be used as a synchronous up/down counter."
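
As a rough sketch of how such a counter might be described so that the tools
can place it in a DSP48 slice: the code below is a plain 48-bit counter with
a synchronous reset (the DSP48 registers only have synchronous resets). The
entity name and ports are hypothetical, and the use_dsp48 attribute is my
understanding of the XST synthesis constraint of that name; whether a given
tool version actually maps the counter into the slice should be checked in
the synthesis report. Note that the original Spartan-3 family has no DSP48
slices, so this applies to Spartan-3A DSP, Spartan-6 and the Virtex parts.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity dsp_ctr is  -- hypothetical name
  port (
    clk : in  std_logic;
    rst : in  std_logic;  -- synchronous, active high
    ce  : in  std_logic;  -- count enable
    q   : out std_logic_vector(47 downto 0)
  );
end entity dsp_ctr;

architecture rtl of dsp_ctr is
  signal acc : unsigned(47 downto 0) := (others => '0');

  -- Assumed XST constraint asking for a DSP48 implementation.
  attribute use_dsp48 : string;
  attribute use_dsp48 of acc : signal is "yes";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        acc <= (others => '0');
      elsif ce = '1' then
        acc <= acc + 1;
      end if;
    end if;
  end process;

  q <= std_logic_vector(acc);
end architecture rtl;
```

For the 42-bit case the upper six bits of q can simply be left unconnected.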

Peter Alfke