Block RAM unusually long setup time? [FPGA]


From: John_H on 1 Jun 2010 06:34

You're still concerned about this? Maybe you don't yet understand the
issues from how Gabor explained the situation.

FPGAs have flexible routing resources able to implement generic logic
interconnects. The placement and routing of logic will determine
explicitly how fast the FPGA can possibly run. The earliest days of
FPGA place & route may have seen a stronger attempt at getting "best
times" but runtimes were miserable and results were often short of
complete. A competing tool adopted a "just enough" approach to place
& route, coming up with solutions which meet "at least" the
constraints given to the tools. The quality of results improved
significantly and Xilinx eventually bought the technology.

The "just enough" philosophy results in placements and routes that
meet the constraints given to the tool but do not strive to improve
upon those numbers. If the clock period is such that the registers
feeding the BlockRAM don't need to have the minimum achievable delays,
they typically won't. If you expect to have small setup and hold
times from I/O pins to the BlockRAM, there's a fundamental disconnect:
the setup and hold times are "internal times" for the FPGA and do not
include the system level implementation of the I/O pins and global
clock buffers. If implementing with I/O pins, the tools will try
their best to meet the setup and hold constraints the user provides
but no better and will often adjust the input delays of the various
pins (including the clock) to help attain those numbers.
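
To make the distinction concrete, here is a minimal sketch of how a pin-referenced setup/hold figure is usually built up from the internal BlockRAM numbers plus the data and clock path delays. The function name and every delay value below are hypothetical, chosen only for illustration; they are not taken from the datasheet or from any timing report.

```python
def pin_setup_hold(t_su, t_h, data_max, data_min, clk_max, clk_min):
    """Return (setup, hold) referenced to the device pins, in ns.

    Worst-case setup pairs the slowest data path with the fastest clock path.
    Worst-case hold pairs the slowest clock path with the fastest data path.
    """
    setup = data_max + t_su - clk_min
    hold = clk_max + t_h - data_min
    return round(setup, 3), round(hold, 3)


# Hypothetical internal BlockRAM requirement: no path delays at all.
internal = pin_setup_hold(t_su=0.4, t_h=0.1,
                          data_max=0.0, data_min=0.0,
                          clk_max=0.0, clk_min=0.0)

# Same hypothetical BlockRAM, now seen from the pins: pad-to-RAM data delay
# of 4.0 to 5.0 ns and pad-to-RAM clock delay of 2.0 to 2.5 ns.
at_pins = pin_setup_hold(t_su=0.4, t_h=0.1,
                         data_max=5.0, data_min=4.0,
                         clk_max=2.5, clk_min=2.0)

print("internal:", internal)
print("at pins: ", at_pins)
```

With these made-up delays the pin-level setup grows well beyond the internal value and the hold goes negative, even though the BlockRAM's own requirement has not changed.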

Understanding the timing models means getting to know the chip better
at the silicon level. If you understand I/O, clocking, CLBs, and
routing, you're well on your way to interpreting timing results
properly. Being able to take the internal timing numbers for an FPGA
and apply those before you design takes a higher level of understanding
often acquired from reading the FPGA user guide, app notes, and
running through timing analysis with the timing details turned on in
the logic path analysis.

The tool tries to give the user what's needed, not what's "best."
Even then there are limits based on what *can* be implemented within
the constraints of placement and routing.

From: Ed McGettigan on 1 Jun 2010 11:26

On Jun 1, 12:06 am, Sharath Raju <brshar...(a)gmail.com> wrote:
> On May 29, 1:16 am, Sharath Raju <brshar...(a)gmail.com> wrote:
>
> > On May 28, 7:05 pm, Gabor <ga...(a)alacron.com> wrote:
>
> > > On May 28, 9:47 am, Sharath Raju <brshar...(a)gmail.com> wrote:
>
> > > > I am afraid I forgot to include the code in the previous email:
>
> > > > DBR : Core512 port map (
> > > > -- Ram A
> > > > ena => ENA,
> > > > enb => ENA,
> > > > wea => WE,
> > > > web => WE,
> > > > ssra => SSR,
> > > > ssrb => SSR,
> > > > clka => CLOCK,
> > > > clkb => CLOCK,
> > > > addra => addr_1,
> > > > addrb => addr_2,
> > > > douta => DOUT(71 downto 36),
> > > > doutb => DOUT(35 downto 0),
> > > > dina => DIN(71 downto 36),
> > > > dinb => DIN(35 downto 0)
> > > > );
>
> > > > -- Address Declaration
> > > > addr_1 <= '0' & ADDR(7 downto 0);
> > > > addr_2 <= '1' & ADDR(7 downto 0);
>
> > > > The code isn't much. Essentially, I am trying to present a memory of
> > > > 256 locations X 72 bits wide, whereas the BLOCK RAM is physically
> > > > 512 locations X 36 bits wide.
>
> > > > On May 28, 6:31 pm, Sharath Raju <brshar...(a)gmail.com> wrote:
>
> > > > > Hello,
>
> > > > > We are working on a project which involves using BLOCK RAMs. Since we
> > > > > were new to Block RAMs, I (my colleague actually) instantiated a BLOCK
> > > > > RAM in VHDL using Xilinx's Block RAM IP core.
>
> > > > > The question is regarding timing:
>
> > > > > The datasheet for the target Spartan 3ADSP XC3SD1800-4 device
> > > > > specifies the best case (setup + hold) time to be less than 1 ns, and
> > > > > the maximum frequency of operation to be 280 MHz. Worst case figures
> > > > > are not specified.
>
> > > > > However, we checked the static timing report and found the setup
> > > > > times for the data, address and control signals to be approximately 4
> > > > > ns.
>
> > > > > > Why is there such a substantial difference?
>
> > > The static timing report includes clock to output delays
> > > of the driver as well as routing delays in addition to
> > > the actual Tsu of the RAM itself. This should be broken
> > > into individual parts and well described in the timing
> > > report. Generally speaking, you should always assume
> > > that routing delays will constitute a significant
> > > portion of your timing budget for any path. According
> > > to Xilinx, the tools target 60% / 40% as a goal for
> > > logic delay / routing delay.
>
> > > HTH,
> > > Gabor
>
> > Thanks, Gabor. I shall check the static timing report in more detail
> > for the routing and clock-to-out delays.
>
> I checked the timing report. It explicitly mentions the setup time to
> be about 4 ns.
>
> Here is a section of the report:
>
> Data Sheet report:
> -----------------
> All values displayed in nanoseconds (ns)
>
> Setup/Hold to clock CLOCK
> ------------+------------+------------+------------------+--------+
>             |  Setup to  |  Hold to   |                  | Clock  |
> Source      | clk (edge) | clk (edge) |Internal Clock(s) | Phase  |
> ------------+------------+------------+------------------+--------+
> ADDR<0>     |    0.792(R)|    0.598(R)|CLOCK_BUFGP       |   0.000|
> ADDR<1>     |    1.335(R)|    0.164(R)|CLOCK_BUFGP       |   0.000|
> ADDR<2>     |    0.574(R)|    0.773(R)|CLOCK_BUFGP       |   0.000|
> ADDR<3>     |    1.590(R)|   -0.040(R)|CLOCK_BUFGP       |   0.000|
> ADDR<4>     |    0.729(R)|    0.648(R)|CLOCK_BUFGP       |   0.000|
> ADDR<5>     |    2.400(R)|   -0.688(R)|CLOCK_BUFGP       |   0.000|
> ADDR<6>     |    2.837(R)|   -1.037(R)|CLOCK_BUFGP       |   0.000|
> ADDR<7>     |    3.441(R)|   -1.521(R)|CLOCK_BUFGP       |   0.000|
>
> The complete report can be accessed here: http://sites.google.com/site/brsharath/DBR.twr?attredirects=0&d=1
> and here is the source: http://sites.google.com/site/brsharath/512x36.vhd?attredirects=0&d=1

This report is for external IO (your ADDR<*> ports) timing relative to
an external clock (CLOCK_BUFGP). These paths are variable depending
on where the IOs are placed, where the BlockRAMs are placed and the net
delays between the two. These paths are not the same as the data
sheet values for the BlockRAM that specify the internal component
timing.
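
One way to see this in the quoted report is to add each pin's setup and hold figures. The sketch below does only that arithmetic on the eight ADDR rows copied from the Data Sheet report; the dictionary and variable names are illustrative. Reading the result as a similar-sized capture window shifted by a different pad-to-BlockRAM delay on each pin is an interpretation, not something the report states.

```python
# (setup_ns, hold_ns) per pin, copied from the "Setup/Hold to clock CLOCK" table.
report = {
    "ADDR<0>": (0.792, 0.598),
    "ADDR<1>": (1.335, 0.164),
    "ADDR<2>": (0.574, 0.773),
    "ADDR<3>": (1.590, -0.040),
    "ADDR<4>": (0.729, 0.648),
    "ADDR<5>": (2.400, -0.688),
    "ADDR<6>": (2.837, -1.037),
    "ADDR<7>": (3.441, -1.521),
}

# Setup + hold is the width of the window in which the pin must be stable.
windows = {pin: round(setup + hold, 3) for pin, (setup, hold) in report.items()}

setups = [setup for setup, _ in report.values()]
setup_spread = round(max(setups) - min(setups), 3)
window_spread = round(max(windows.values()) - min(windows.values()), 3)

for pin, (setup, hold) in report.items():
    print(f"{pin}: setup {setup:6.3f}  hold {hold:6.3f}  window {windows[pin]:5.3f}")
print(f"setup spread  : {setup_spread} ns")
print(f"window spread : {window_spread} ns")
```

The setup values differ by almost 2.9 ns from pin to pin, while the setup-plus-hold windows differ by less than 0.6 ns. Pins with a larger setup have a correspondingly smaller or negative hold, which is what you would expect if the variation comes mostly from path delay rather than from the BlockRAM itself.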

Ed McGettigan
--
Xilinx Inc.

From: Sharath Raju on 3 Jun 2010 11:01

On Jun 1, 3:34 pm, John_H <newsgr...(a)johnhandwork.com> wrote:

Thanks for your detailed reply! It cleared up some misconceptions.
