From: Nico Coesel on
Gabor <gabor(a)alacron.com> wrote:

>On May 21, 6:19 pm, Philip Pemberton <usene...(a)philpem.me.uk> wrote:
>> OK, this is nuts...
>>
>> With ISE Synthesizer set up like this:
>>   Optimisation Goal:   AREA
>>   Optimisation Effort: NORMAL
>>
>> The core works fine (the timing is a little out, but not bad enough to
>> pooch the whole thing). If I set it up like this:
>>   Optimisation Goal:   SPEED
>>   Optimisation Effort: NORMAL
>>
>> Then the whole thing stops working -- it outright fails to read/write the
>> SDRAM. I can access the SDRAM controller's cache (32 bytes of the current
>> page), but accessing an out-of-page address returns garbage.
>>
>> If I do the same thing on Quartus? Well, the timing looks better in SPEED
>> mode, but it still works fine on the DE1.
>>
>> What the *bleep* is going on?
>>
>> --

>As for SPEED vs. AREA, in Xilinx FPGAs you very often
>get the best overall timing results using AREA optimization
>rather than speed. This is probably because the route
>portion of your total path delay is large. This shows up
>in larger designs and larger parts especially since the
>worst case routing delays grow with the design size.

Actually this is a bit of a black art. I also get good results by
adjusting the 'pack factor' (IIRC) which puts related logic closer
together. IMHO it takes some trial and error to find the optimum place
& route settings for a design which gets close to the limits of the
FPGA regarding speed and/or size.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico(a)nctdevpuntnl (punt=.)
--------------------------------------------------------------
From: Philip Pemberton on
On Sat, 22 May 2010 20:11:25 -0700, Gabor wrote:

> As others have mentioned, you probably have some unconstrained paths
> causing timing violations. [...]

OK, I've just set up these constraints:

#Created by Constraints Editor (xc3s700a-ft256-4) - 2010/05/21
NET "CLOCK" TNM_NET = CLOCK;
TIMESPEC TS_CLOCK = PERIOD "CLOCK" 25 MHz HIGH 50%;
#Created by Constraints Editor (xc3s700a-ft256-4) - 2010/05/23
INST "SDRAM_A<0>" TNM = sdram_outs;
INST "SDRAM_A<1>" TNM = sdram_outs;
INST "SDRAM_A<2>" TNM = sdram_outs;
INST "SDRAM_A<3>" TNM = sdram_outs;
INST "SDRAM_A<4>" TNM = sdram_outs;
INST "SDRAM_A<5>" TNM = sdram_outs;
INST "SDRAM_A<6>" TNM = sdram_outs;
INST "SDRAM_A<7>" TNM = sdram_outs;
INST "SDRAM_A<8>" TNM = sdram_outs;
INST "SDRAM_A<9>" TNM = sdram_outs;
INST "SDRAM_A<10>" TNM = sdram_outs;
INST "SDRAM_A<11>" TNM = sdram_outs;
INST "SDRAM_BA<0>" TNM = sdram_outs;
INST "SDRAM_BA<1>" TNM = sdram_outs;
INST "SDRAM_CAS_N" TNM = sdram_outs;
INST "SDRAM_CKE" TNM = sdram_outs;
INST "SDRAM_CLK" TNM = sdram_outs;
INST "SDRAM_CS_N" TNM = sdram_outs;
INST "SDRAM_DQ<0>" TNM = sdram_outs;
INST "SDRAM_DQ<1>" TNM = sdram_outs;
INST "SDRAM_DQ<2>" TNM = sdram_outs;
INST "SDRAM_DQ<3>" TNM = sdram_outs;
INST "SDRAM_DQ<4>" TNM = sdram_outs;
INST "SDRAM_DQ<5>" TNM = sdram_outs;
INST "SDRAM_DQ<6>" TNM = sdram_outs;
INST "SDRAM_DQ<7>" TNM = sdram_outs;
INST "SDRAM_DQ<8>" TNM = sdram_outs;
INST "SDRAM_DQ<9>" TNM = sdram_outs;
INST "SDRAM_DQ<10>" TNM = sdram_outs;
INST "SDRAM_DQ<11>" TNM = sdram_outs;
INST "SDRAM_DQ<12>" TNM = sdram_outs;
INST "SDRAM_DQ<13>" TNM = sdram_outs;
INST "SDRAM_DQ<14>" TNM = sdram_outs;
INST "SDRAM_DQ<15>" TNM = sdram_outs;
INST "SDRAM_DQ<16>" TNM = sdram_outs;
INST "SDRAM_DQ<17>" TNM = sdram_outs;
INST "SDRAM_DQ<18>" TNM = sdram_outs;
INST "SDRAM_DQ<19>" TNM = sdram_outs;
INST "SDRAM_DQ<20>" TNM = sdram_outs;
INST "SDRAM_DQ<21>" TNM = sdram_outs;
INST "SDRAM_DQ<22>" TNM = sdram_outs;
INST "SDRAM_DQ<23>" TNM = sdram_outs;
INST "SDRAM_DQ<24>" TNM = sdram_outs;
INST "SDRAM_DQ<25>" TNM = sdram_outs;
INST "SDRAM_DQ<26>" TNM = sdram_outs;
INST "SDRAM_DQ<27>" TNM = sdram_outs;
INST "SDRAM_DQ<28>" TNM = sdram_outs;
INST "SDRAM_DQ<29>" TNM = sdram_outs;
INST "SDRAM_DQ<30>" TNM = sdram_outs;
INST "SDRAM_DQ<31>" TNM = sdram_outs;
INST "SDRAM_DQM<0>" TNM = sdram_outs;
INST "SDRAM_DQM<1>" TNM = sdram_outs;
INST "SDRAM_DQM<2>" TNM = sdram_outs;
INST "SDRAM_DQM<3>" TNM = sdram_outs;
INST "SDRAM_RAS_N" TNM = sdram_outs;
INST "SDRAM_WE_N" TNM = sdram_outs;
#Created by Constraints Editor (xc3s700a-ft256-4) - 2010/05/23
TIMEGRP "sdram_outs" OFFSET = OUT 10 ns AFTER "CLOCK";
TIMEGRP "sdram_outs" OFFSET = IN 10 ns VALID 10 ns BEFORE "CLOCK";

Now I can build the core with OPTIMIZE=area or OPTIMIZE=speed, and it
works fine.
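
For anyone who would rather not maintain 56 near-identical INST lines by hand, here is a minimal sketch of generating them with a script. It assumes the bus widths shown in the listing above (12 address bits, 2 bank bits, 32 data bits, 4 DQM bits); the function name sdram_tnm_lines is made up for illustration and is not part of any Xilinx tool.

```python
def sdram_tnm_lines(group="sdram_outs"):
    """Return UCF 'INST ... TNM' lines for the SDRAM pins listed in the post.

    Bus widths are taken from the constraint listing above; adjust them
    for a different memory part.
    """
    buses = [("SDRAM_A", 12), ("SDRAM_BA", 2), ("SDRAM_DQ", 32), ("SDRAM_DQM", 4)]
    singles = ["SDRAM_CAS_N", "SDRAM_CKE", "SDRAM_CLK",
               "SDRAM_CS_N", "SDRAM_RAS_N", "SDRAM_WE_N"]

    # Expand each bus into its individual bit names, e.g. SDRAM_A<0> .. SDRAM_A<11>.
    names = [f"{bus}<{bit}>" for bus, width in buses for bit in range(width)]
    names += singles

    return [f'INST "{name}" TNM = {group};' for name in names]


if __name__ == "__main__":
    print("\n".join(sdram_tnm_lines()))
```

The output can be pasted into the UCF ahead of the two TIMEGRP lines. The order of the lines differs from what the Constraints Editor wrote, which should not matter since each line is an independent assignment.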

Question: do these timing constraints look sane? I figured since I'm
using a 270-degree shifted version of a DCM'd version of the input clock,
the timing settings should be around a quarter of Tclk_period (Clk period
is 40ns for 25MHz, so that would be 10ns).
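
The arithmetic behind that 10ns figure, as a small sketch. It only converts frequency and phase shift into nanoseconds; it says nothing about whether 10ns is the right number for a given SDRAM's setup and hold requirements. The function names are invented for illustration.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def phase_delay_ns(freq_mhz, phase_deg):
    """Delay of a phase-shifted clock edge relative to the unshifted edge."""
    return clock_period_ns(freq_mhz) * (phase_deg % 360) / 360.0


def lead_before_next_edge_ns(freq_mhz, phase_deg):
    """How far the shifted edge sits ahead of the next unshifted edge."""
    return clock_period_ns(freq_mhz) - phase_delay_ns(freq_mhz, phase_deg)


if __name__ == "__main__":
    for mhz in (25, 50):
        print(mhz, "MHz:",
              clock_period_ns(mhz), "ns period,",
              phase_delay_ns(mhz, 270), "ns delay at 270 degrees,",
              lead_before_next_edge_ns(mhz, 270), "ns before the next edge")
```

At 25MHz the 270-degree edge lands 30ns after the unshifted edge, that is 10ns (a quarter period) before the next one. At 50MHz the same quarter period shrinks to 5ns, so the constraint values would need revisiting if the clock is raised.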

CLOCK is the 25MHz crystal input, MCLK is the output from the first DCM
(a *25, /25 "multiplier" that effectively acts as a buffer and duty cycle
corrector). SDRAM_CLK is an output from the FPGA to the SDRAM, which is
sourced from the CLK270 output of the second DCM.

Thanks,
--
Phil.
usenet10(a)philpem.me.uk
http://www.philpem.me.uk/
If mail bounces, replace "10" with the last two digits of the current year
From: Brian Drummond on
On 23 May 2010 09:14:47 GMT, Philip Pemberton <usenet10(a)philpem.me.uk>
wrote:

>On Sat, 22 May 2010 20:11:25 -0700, Gabor wrote:
>
>> As others have mentioned, you probably have some unconstrained paths
>> causing timing violations. [...]
>
>OK, I've just set up these constraints:

>#Created by Constraints Editor (xc3s700a-ft256-4) - 2010/05/23
>TIMEGRP "sdram_outs" OFFSET = OUT 10 ns AFTER "CLOCK";
>TIMEGRP "sdram_outs" OFFSET = IN 10 ns VALID 10 ns BEFORE "CLOCK";
>
>Now I can build the core with OPTIMIZE=area or OPTIMIZE=speed, and it
>works fine.
>
>Question: do these timing constraints look sane? I figured since I'm
>using a 270-degree shifted version of a DCM'd version of the input clock,
>the timing settings should be around a quarter of Tclk_period (Clk period
>is 40ns for 25MHz, so that would be 10ns).

Given such a slow clock they look OK.

But seeing that has prompted some memories (it's a few years since I set
up constraints for SDR SDRAM).

The key to getting good I/O timing is to ensure the tools place the I/O
registers in the right place - the IOBs rather than the core logic. Then
there is no routing involved, and the constraints really only act as a
sanity check. (at 200MHz they may alert you to the wrong output
standard)

If some of your registers were in the IOBs and others weren't, the
latter are subject to additional routes of random lengths, and here the
constraints WILL help, by forcing PAR to keep these routes down. (and
10ns should be easily achievable).

Look at the I/O report near the end of the Map Report (.mrp) file.
For each I/O pin you will see a lot of information including the I/O
standard, and the registers in the IOB for that pin. For an output pin
(e.g. address) I want to see OFF or OUTFF in that list. For an I/O pin
(data) I want to see IFF/INFF, OFF/OUTFF and ENBFF which tristates the
pin. (Signal names seem to have changed with tool versions)
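
Checking dozens of pins by eye gets tedious, so here is a rough sketch of scripting the check. It assumes the I/O table in the .mrp is pipe-delimited with a header row that starts with "IOB Name" and has a column whose title begins with "Reg"; the exact layout varies between tool versions, and the sample rows below are made up for illustration rather than copied from a real report.

```python
def iob_registers(report_text, reg_column="Reg"):
    """Map each IOB name to the contents of its register column.

    Assumes a pipe-delimited table whose header row starts with 'IOB Name'.
    """
    header = None
    reg_idx = None
    result = {}
    for line in report_text.splitlines():
        row = line.strip()
        if not (row.startswith("|") and row.endswith("|")):
            continue  # separator lines and ordinary text are skipped
        cells = [cell.strip() for cell in row[1:-1].split("|")]
        if header is None:
            if cells[0].lower().startswith("iob name"):
                matches = [i for i, c in enumerate(cells) if c.startswith(reg_column)]
                if not matches:
                    raise ValueError("no register column found in header")
                header, reg_idx = cells, matches[0]
            continue
        if len(cells) == len(header):
            result[cells[0]] = cells[reg_idx]
    return result


# Hypothetical excerpt, trimmed to a few columns for the example.
SAMPLE = """\
+-------------+------+-----------+-------------+---------+
| IOB Name    | Type | Direction | IO Standard | Reg (s) |
+-------------+------+-----------+-------------+---------+
| SDRAM_A<0>  | IOB  | OUTPUT    | LVCMOS33    | OFF1    |
| SDRAM_DQ<0> | IOB  | BIDIR     | LVCMOS33    | IFF1    |
| SDRAM_CKE   | IOB  | OUTPUT    | LVCMOS33    |         |
+-------------+------+-----------+-------------+---------+
"""

if __name__ == "__main__":
    regs = iob_registers(SAMPLE)
    unregistered = [name for name, value in regs.items() if not value]
    print("Pins with no IOB register:", unregistered)
```

Any SDRAM pin that turns up in the "no IOB register" list, other than ones that are deliberately unregistered, is a candidate for the duplicate-register treatment described below.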

Getting what you want can take some fiddling. For example, you may need
to duplicate registers in your code; one to feed the pins and another to
use the signal internally. Then you need to convince the synthesis tool
to leave them alone; apply the "equivalent-register-removal = no"
attribute to the appropriate regs. And check the .MRP file. Loop until
done.

A few tool versions ago, you also needed to replicate the tristate
signal for each ENBFF, and ensure it was the right polarity (active low)
but this may have been improved.

Downside to all this is that while you have REALLY GOOD external
timings, you have lengthened the internal routes by a few ns. So I keep
heavy processing hidden behind a second register where that is likely to
be a problem.

At 25MHz, feel free to ignore all the above, but it may help to see some
of what's going on beneath the hood.

- Brian
From: Philip Pemberton on
On Sun, 23 May 2010 11:07:37 +0100, Brian Drummond wrote:

> Given such a slow clock they look OK.

Always good to know :)

I'm toying with the idea of running the SDRAM controller faster than the
CPU core (the limiter is the CPU -- it manages about 60MHz on a Cyclone2
IIRC; Xst reckons about 47MHz for the entire SoC on a Spartan3A
XC3S700A-4C).

> But seeing that has prompted some memories (it's a few years since I set
> up constraints for SDR SDRAM).

Yeah, it seems a lot of folk have moved on to DDR or DDR2. SDR-SDRAM seems
to have the edge in ease-of-use, but loses out on raw speed. But that
said, neither of them can match an SRAM clock-for-clock because of the
refresh, precharge and select cycles, and the access latency.

Although the caching in the sdram_wb core makes that a bit of a moot
point, especially for sequential WISHBONE accesses.

> Look at the I/O report near the end of the Map Report (.mrp) file. For
> each I/O pin you will see a lot of information including the I/O
> standard, and the registers in the IOB for that pin. For an output pin
> (e.g. address) I want to see OFF or OUTFF in that list. For an I/O pin
> (data) I want to see IFF/INFF, OFF/OUTFF and ENBFF which tristates the
> pin. (Signal names seem to have changed with tool versions)

Oh, that explains a lot!

The "broken" version shows blanks under "Reg(s)" for all the SDRAM pins.
The "working" version shows a mix of "OFF1", "IFF1" and blank (only
SDRAM_CLK and SDRAM_CKE are blank, which is fair enough -- CLK comes from
the DCM, CKE is grounded).

Thanks, I'd looked at the Map report, but previously didn't really know
what I was looking for, which explains why I didn't pick up on the FFs
not being pushed into the IOBs...

It seems I set "Pack I/O Registers into IOBs" to "Yes" on the working
version (which causes A LOT of warnings), while it's set to "Auto" in the
"broken" version. Can I force FFs in the IOBs in the UCF constraints, or
do I need to do that with a "// synthesis IOB=FORCE" constraint in the
Verilog source?

> At 25MHz, feel free to ignore all the above, but it may help to see some
> of what's going on beneath the hood.

Well, I'm trying it out at 25MHz because I figure the lower my master
clock is, the easier it's going to be to make the thing work. Then once
it's working, I can look into making it work on a faster clock. Ideally
I'd like to get it going at 50MHz or so -- a lot of processing is going
to happen in the FPGA (using hardware implementations of the algorithms
I'm using) but the CPU (a hacked up version of the LatticeMico32) will be
doing a lot of the integer work, framebuffer updating, and so on.

Plan #2 is to rig up an LCD controller that can act as a WISHBONE master,
then wire that up to one of the spare master ports on the CONMAX bus
arbiter. Then I can use any area of main RAM as the framebuffer, and do
away with the messy business of having a separate framebuffer RAM.

If any of you guys want to see this code, let me know and I'll stick it
online. It's pretty ropey code, but it might do as an example to show how
to make the LM32 work on non-Lattice hardware (and how to make the
toolchain behave itself).

On a final note: the ISSI datasheet for the RAM chip appears to be
outright WRONG. It specifies 4096 refresh cycles per 64ms, but if the
refresh rate is that low I get data readback errors. If I use the refresh
rate for the Industrial-graded chip (4096 per 32ms), or even 4096 cycles
per 50us, then it works fine... Yes, I'm using a "Commercial" grade part,
not the "Industrial" part. Unless mine has been mismarked....
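
To put numbers on those refresh rates, a small sketch that turns "N refresh cycles per window" into an average interval between refreshes and a clock count at 25MHz. It assumes refreshes are spread evenly over the window (distributed refresh); the function names are invented for illustration.

```python
def refresh_interval_us(refresh_cycles, window_ms):
    """Average time between refresh commands, in microseconds."""
    return window_ms * 1000.0 / refresh_cycles


def clocks_between_refreshes(refresh_cycles, window_ms, clock_mhz):
    """Whole controller clocks available between refreshes (rounded down)."""
    return int(refresh_interval_us(refresh_cycles, window_ms) * clock_mhz)


if __name__ == "__main__":
    for label, window_ms in (("64ms window", 64), ("32ms window", 32)):
        print(label,
              refresh_interval_us(4096, window_ms), "us per refresh,",
              clocks_between_refreshes(4096, window_ms, 25), "clocks at 25MHz")
```

For 4096 cycles per 64ms this gives one refresh every 15.625us (390 clocks at 25MHz), and for 32ms one every 7.8125us (195 clocks). Taken literally, 4096 cycles per 50us would be about 12ns per refresh, which is shorter than a single 40ns clock period, so that last figure may have been meant in a different unit.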

--
Phil.
usenet10(a)philpem.me.uk
http://www.philpem.me.uk/
If mail bounces, replace "10" with the last two digits of the current year
From: Brian Drummond on
On 23 May 2010 18:21:25 GMT, Philip Pemberton <usenet10(a)philpem.me.uk>
wrote:

>It seems I set "Pack I/O Registers into IOBs" to "Yes" on the working
>version (which causes A LOT of warnings), while it's set to "Auto" in the
>"broken" version. Can I force FFs in the IOBs in the UCF constraints, or
>do I need to do that with a "// synthesis IOB=FORCE" constraint in the
>Verilog source?

UCF is a bit too late for synthesis... the only tool that reads it is
NGDbuild, aka "Translate", which embeds the UCF information in other
files passed downstream.

I don't do Verilog but it makes sense that there's an equivalent to
setting attributes for such things in VHDL. And applying them directly
to the correct signals will save warnings elsewhere...

Be aware that XST is finicky though. Your "FORCE" attributes may merely
result in "constraint is being ignored" warnings unless everything else
lines up right (duplicate regs not being optimised away) so if you don't
get what you expect in the .mrp, check the synth report carefully...

- Brian