Advice on Xilinx Spelunking [FPGA]

From: Brian Drummond on 25 May 2010 20:36

On Tue, 25 May 2010 14:32:59 -0700, Rob Gaddi
<rgaddi(a)technologyhighland.com> wrote:

>I've got a Spartan 6 design that I'm working with under ISE 11.5. A
>code block that I would expect to take up about 200 LUTs is taking 800
>instead. 600 LUTs wouldn't be the end of the world, except I'm planning
>to replicate this block 32 times, which puts me well over the top.
>
>So the question becomes where are all of the LUTs going?

> Then I tried looking
>through the technology schematic instead. The viewer took forever to
>open the schematic, and when I finally got it open it took better than a
>minute any time I wanted to refresh the screen. Needless to say, this
>got me nowhere.

Rather than use the technology viewer, I've had better luck reading the
post-synthesis netlist in a text editor!

I'm not necessarily recommending that approach, but it has its uses. You
could quickly search for the first few instances of "ram_k_hi", then
every instance of "ram_k_hi<whatever>(63)" to see if, e.g., the LUT RAMs
have been duplicated to give you enough ports.
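
If the netlist is large, the same search can be scripted. This is only a rough sketch in Python, assuming the post-synthesis netlist has already been exported to plain text; the helper name count_instances and the sample fragment are made up for illustration and do not reflect real XST output.

```python
import re
from collections import Counter

def count_instances(netlist_text, stem):
    """Count how often each name starting with `stem` appears in the text."""
    # Names may carry suffixes and bus indices, e.g. ram_k_hi_x(63) or ram_k_hi<3>.
    pattern = re.compile(re.escape(stem) + r"[\w<>()\[\]]*")
    counts = Counter()
    for line in netlist_text.splitlines():
        for name in pattern.findall(line):
            counts[name] += 1
    return counts

# Hypothetical netlist fragment, for illustration only.
sample = """\
inst ram_k_hi1 RAM64X1D
inst ram_k_hi2 RAM64X1D
inst ram_k_hi_dup1 RAM64X1D
net ram_k_hi1 ram_k_hi2
"""

for name, n in sorted(count_instances(sample, "ram_k_hi").items()):
    print(name, n)
```

More distinct names than you expected for one RAM would be a hint that it has been duplicated.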

But my recommendation would be divide and conquer on that block; it's
not large. For example, comment or "generate" out the coefficient
readback module and see how the size changes. Or "generate" out the
whole lot then re-introduce it a block at a time, comparing the synth
result with your expectations.

Have you allowed for the size of the coefficient rams - 3x64-bit as far
as I can tell from the posted code? Or how are the 4 ports of the quad
port RAM organised? With more than 1 write port, that can get complex
and inefficient...

- Brian

From: Rob Gaddi on 25 May 2010 20:45

On 5/25/2010 5:36 PM, Brian Drummond wrote:
> On Tue, 25 May 2010 14:32:59 -0700, Rob Gaddi
> <rgaddi(a)technologyhighland.com> wrote:
>
>> I've got a Spartan 6 design that I'm working with under ISE 11.5. A
>> code block that I would expect to take up about 200 LUTs is taking 800
>> instead. 600 LUTs wouldn't be the end of the world, except I'm planning
>> to replicate this block 32 times, which puts me well over the top.
>>
>> So the question becomes where are all of the LUTs going?
>
>> Then I tried looking
>> through the technology schematic instead. The viewer took forever to
>> open the schematic, and when I finally got it open it took better than a
>> minute any time I wanted to refresh the screen. Needless to say, this
>> got me nowhere.
>
> Rather than use the technology viewer, I've had better luck reading the
> post-synthesis netlist in a text editor!
>
> I'm not necessarily recommending that approach, but it has its uses. You
> could quickly search for the first few instances of "ram_k_hi", then
> every instance of "ram_k_hi<whatever>(63)" to see if e.g. the LUT RAMs
> have been duplicated to give you enough ports.
>
> But my recommendation would be divide and conquer on that block; it's
> not large. For example, comment or "generate" out the coefficient
> readback module and see how the size changes. Or "generate" out the
> whole lot then re-introduce it a block at a time, comparing the synth
> result with your expectations.
>
> Have you allowed for the size of the coefficient rams - 3x64-bit as far
> as I can tell from the posted code? Or how are the 4 ports of the quad
> port RAM organised? With more than 1 write port, that can get complex
> and inefficient...
>
> - Brian

The quad port only became a quad port because XST decided to implement
the reset logic on its own dedicated write port rather than just have
one write port and feed it from an AND gate.

It turns out that, if I just comment out the reset logic, the
utilization drops to 236 LUTs. It must have been implementing something
truly awful to try to get that extra write port in. Why it thought it
needed it in the first place I'll never know, but at least I'm back on
track now.

--
Rob Gaddi, Highland Technology
Email address is currently out of order

From: Nial Stewart on 26 May 2010 04:54

> It turns out that, if I just comment out the reset logic, the utilization drops to 236 LUTs. It
> must have been implementing something truly awful to try to get that extra write port in. Why it
> thought it needed it in the first place I'll never know, but at least I'm back on track now.

Rob, some(/most) templates for inferring RAMs don't work if you have a
reset defined.

Nial.

From: Brian Drummond on 26 May 2010 06:59

On Tue, 25 May 2010 17:45:33 -0700, Rob Gaddi
<rgaddi(a)technologyhighland.com> wrote:

>On 5/25/2010 5:36 PM, Brian Drummond wrote:
>> On Tue, 25 May 2010 14:32:59 -0700, Rob Gaddi
>> <rgaddi(a)technologyhighland.com> wrote:
>>
>>> I've got a Spartan 6 design that I'm working with under ISE 11.5. A
>>> code block that I would expect to take up about 200 LUTs is taking 800
>>> instead.
>> Or how are the 4 ports of the quad
>> port RAM organised? With more than 1 write port, that can get complex
>> and inefficient...

>The quad port only became a quad port because XST decided to implement
>the reset logic on its own dedicated write port rather than just have
>one write port and feed it from an AND gate.
>
>It turns out that, if I just comment out the reset logic, the
>utilization drops to 236 LUTs.

Glad you found it.
Implementing the reset externally, as you described, is the sort of trick
that is occasionally necessary to get round XST limitations.

Another option is to eliminate the reset and write all those zeroes
across the Wishbone bus.

If you think that XST can be usefully improved in this area, submit a
testcase to Webcase.

- Brian

From: Rob Gaddi on 26 May 2010 12:05

On 5/26/2010 1:54 AM, Nial Stewart wrote:
>> It turns out that, if I just comment out the reset logic, the utilization drops to 236 LUTs. It
>> must have been implementing something truly awful to try to get that extra write port in. Why it
>> thought it needed it in the first place I'll never know, but at least I'm back on track now.
>
>
> Rob, some(/most) templates for inferring RAMs don't work if you have a
> reset defined.
>
>
> Nial.
>

The reset logic was sequential, i.e. reset address 0, then reset address
1, one per clock until the entire RAM was cleared. The intention was
that the whole sweep would take place on the normal write port of the
RAM, which wasn't being used while it was in the reset state.
Apparently it didn't work out that way.
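
For clarity, here is the intended behaviour as a small Python model. It is not the posted HDL and says nothing about what XST will infer; the class name, the depth of 64 and the port names are assumptions made for illustration. The point is that the reset sweep and the normal writes are muxed onto one write port, so only one (enable, address, data) triple ever reaches the RAM per clock.

```python
class SinglePortCoeffRam:
    """Behavioural model: one write port shared by a reset sweep and normal writes."""

    def __init__(self, depth=64):
        self.mem = [None] * depth   # None stands for undefined power-up contents
        self.reset_addr = 0         # address counter for the reset sweep
        self.resetting = True       # the model starts in the reset state

    def clock(self, we=False, addr=0, data=0):
        """Advance one clock and return the (we, addr, data) driven onto the write port."""
        if self.resetting:
            # The sweep owns the write port: clear one address per clock.
            port = (True, self.reset_addr, 0)
            if self.reset_addr == len(self.mem) - 1:
                self.resetting = False
            else:
                self.reset_addr += 1
        else:
            # Normal operation: the caller's write goes straight through.
            port = (we, addr, data)
        wr_en, wr_addr, wr_data = port
        if wr_en:
            self.mem[wr_addr] = wr_data
        return port


ram = SinglePortCoeffRam(depth=64)
for _ in range(64):
    ram.clock(we=True, addr=5, data=0xAB)   # ignored while the sweep is running
ram.clock(we=True, addr=5, data=0xAB)       # first normal write after the sweep
print(ram.mem[5], ram.mem[6])
```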

--
Rob Gaddi, Highland Technology
Email address is currently out of order
