From: radarman on
This isn't strictly an FPGA question, but I figured someone here might
be able to point me in the right direction.

I am designing a board with an Altera EP3C40 in the 240-pin QFP and a
Cypress CY7C1792 SRAM in the 100-pin QFP. I would like to
operate the SRAM at 200MHz, so I know the routing needs to be somewhat
careful. (I'm internally "dual-porting" the SRAM, and each port needs
to run at 100MHz)

Right now, I have the SRAM on the flip-side of the board from the
FPGA. The RAM has a rectangular footprint, which means that some of
the traces are proportionally longer than others, but the routing is
fairly tight, with traces between 250 and 750 mils. Naturally, every
signal is going through a via in this design, but the vias are
literally right next to the pad, so the top-level trace is practically
non-existent for most signals.

The questions are,
1) Do I need to further tighten these up? I have some room left under
the SRAM to lengthen traces (not much, but I might could improve the
delta by 10-20%),
2) Should I try to make the clock line equal the longest non-clock
signal, or leave it at its natural length, which is about midway
(400mil point-to-point)?
3) With the traces this short, does it still make sense to source
terminate the clock? I'm guessing yes, but the density is getting
pretty high around this thing.

I've only done one other "high speed" design, with a Gig-E PHY, but I
was able to get all of the signals to within +/- 5 mils on that board.
It's also not entirely tested yet, so before I spin another board
running even faster, I'd like to get it right.

Note, this is a personal project, so I'm trying to avoid BGA's.

Thanks!
From: -jg on
You can reality-check this with a trace-delay ballpark of between
150 ps/inch and 190 ps/inch.
The clock signal is always the most important, and I have seen
designs generate both CLK and !CLK to lower EMC.
Everything else should change on the other edge, so balancing only
gets critical on very tight time budgets.
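
As a rough illustration of that reality check, the sketch below applies the 150 and 190 ps/inch ballpark to the lengths given in the original post (clock about 400 mils, other signals 250 to 750 mils). The function names are made up for this example, and the result covers only trace length, not via or package delay.

```python
# Ballpark trace delays quoted in the thread, in ps per inch.
DELAY_PS_PER_INCH = (150.0, 190.0)

def flight_time_ps(length_mils, ps_per_inch):
    """One-way propagation delay in picoseconds for a trace length in mils."""
    return length_mils / 1000.0 * ps_per_inch

def worst_case_skew_ps(clock_mils, shortest_mils, longest_mils, ps_per_inch):
    """Largest delay difference between the clock and any other signal."""
    clock = flight_time_ps(clock_mils, ps_per_inch)
    return max(abs(flight_time_ps(shortest_mils, ps_per_inch) - clock),
               abs(flight_time_ps(longest_mils, ps_per_inch) - clock))

for rate in DELAY_PS_PER_INCH:
    skew = worst_case_skew_ps(400, 250, 750, rate)
    print(f"{rate:.0f} ps/inch: worst-case skew to clock = {skew:.1f} ps")
```

With these inputs the worst case is the 750 mil trace against the 400 mil clock, which comes to a few tens of picoseconds at either delay figure.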
From: KJ on
On Mar 26, 5:13 pm, radarman <jsham...(a)gmail.com> wrote:
>
> The questions are,
> 1) Do I need to further tighten these up? I have some room left under
> the SRAM to lengthen traces (not much, but I might could improve the
> delta by 10-20%),

6 inches is approximately 1 ns of delay => your 250 - 750 mils is
approximately 40 - 125 ps one way. That is about 0.4% - 1.25% of a
10 ns (100MHz) period, or 0.8% - 2.5% of the 5 ns period at 200MHz;
double that for the round trip...worth keeping track of, but
likely not a cause for concern. Clock to output delays and setup time
requirements are going to chew up much more of the timing budget.
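
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch using the "6 inches per ns" rule of thumb from this post; the helper names are invented for the example, and the two clock rates are the 100MHz per-port rate and the 200MHz SRAM rate mentioned in the original question.

```python
PS_PER_INCH = 1000.0 / 6.0  # "6 inches is approximately 1 ns"

def delay_ps(length_mils):
    """One-way trace delay in picoseconds."""
    return length_mils / 1000.0 * PS_PER_INCH

def percent_of_period(length_mils, clock_mhz):
    """One-way trace delay as a percentage of the clock period."""
    period_ps = 1e6 / clock_mhz
    return 100.0 * delay_ps(length_mils) / period_ps

for mhz in (100, 200):
    shortest = percent_of_period(250, mhz)
    longest = percent_of_period(750, mhz)
    print(f"{mhz} MHz: {shortest:.2f}% to {longest:.2f}% of the period, one way")
```

The 100MHz row lands at roughly 0.4% to 1.25% of the period, and the 200MHz row is twice that, since the same delay is measured against a period half as long.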

> 2) Should I try to make the clock line equal the longest non-clock
> signal, or leave it at its natural length, which is about midway
> (400mil point-to-point)?

Clock line lengths should be matched to other clock line lengths
(which is not your situation), not other data signals. Leave it at
the natural length.

> 3) With the traces this short, does it still make sense to source
> terminate the clock? I'm guessing yes, but the density is getting
> pretty high around this thing.
>

A better question to ask yourself is "If I find out later that I need
to terminate the clock, how the heck am I going to do it since I
didn't provision for one?" Viewed that way, the answer should be
obvious.

Series terminate with a ~22 ohm resistor and you'll not have any
worries about signal quality.
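
For context on where a value like ~22 ohms can come from: the usual aim of a series (source) termination is for the driver's output impedance plus the resistor to roughly equal the trace impedance. The sketch below assumes a 50 ohm trace and a 28 ohm driver; both numbers are hypothetical placeholders, not taken from the FPGA datasheet or from this board's stackup.

```python
def series_resistor_ohms(trace_z0_ohms, driver_rout_ohms):
    """Series resistor that brings the source impedance up to the trace impedance.

    Returns 0 if the driver is already at or above the trace impedance.
    """
    return max(trace_z0_ohms - driver_rout_ohms, 0.0)

# Hypothetical values for illustration only.
print(series_resistor_ohms(50.0, 28.0))  # 22.0
```

In practice the driver impedance varies with drive-strength setting, voltage and temperature, so the footprint matters more than the exact starting value; the resistor can be swapped once the board is on the bench.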

Since the runs are so short anyway, I'd suggest routing the clock on
the surface layers only. A fair-sized percentage of the route will be
surface trace anyway because of the parts placement you've described,
so there is no reason not to make it 100% surface. Then the only
impedance discontinuity is the one via that takes you from top to
bottom.

Kevin Jennings
From: John_H on
I'm a little surprised that there are no cares about clock versus data
from others. CARE!

Your synchronous SRAM doesn't have a DLL like SDRAMs. There's one
clock that you provide, nothing provided back.

The amount of length matching required is determined by your timing
budget. You NEED to put together a timing budget to make sure your
clock and data are related better than second cousins once removed.

My *opinion* is that the traces are so extremely short that there would
be little benefit from terminations or matching any better than what
you have. The reality may be that there's no WAY you could get the
speed you're looking for if you generate the clock FROM the FPGA
without some extra work. I believe what I did in my own sync SRAM
hookup was to feed the clock to the memory chip AND take the clock
from the I/O back to the internals so the clock-to-out delay wasn't a
concern; the "input clock" and data aligned. At least they did before
the mapper started routing the signal back through an internal logic
path rather than the actual pad.

What is your clock source? If you have a 200MHz clock feeding both
the SRAM and the FPGA externally, you have a common external reference
to work from. Getting the input sampling and clock to out times to
behave may be difficult. Timing budget!

As for terminations, series terminations are typically used to deal
with reflections. You won't get a reflection off 750 mils. But you
might get a cleaner clock if there's series impedance (even if it
doesn't damp reflections) and/or an AC load impedance for the clock.
Resistors are pretty tiny these days; if it's hobbyist stuff, get a
microscope or get your nose dangerously close to your soldering iron
by using a loupe. We used 0402 caps between pads under the balls on a
BGA for decoupling, surely you could put an 0402 or perhaps even a
"large" 0603 inline near the escapes on your QFP.
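
Whether a 750 mil trace is short enough to ignore reflections depends on the edge rate of the driver. A commonly quoted rule of thumb treats a trace as electrically short when its one-way delay is no more than about one sixth of the signal rise time. The sketch below applies that rule; the 170 ps/inch delay, the ratio of 6 and the rise times are all assumptions for illustration, not measured or datasheet values.

```python
def is_electrically_short(length_mils, rise_time_ps, ps_per_inch=170.0, ratio=6.0):
    """True if the one-way delay is at most rise_time / ratio (rule of thumb)."""
    one_way_ps = length_mils / 1000.0 * ps_per_inch
    return one_way_ps * ratio <= rise_time_ps

# Hypothetical rise times; the real figure depends on the I/O standard
# and drive strength selected in the FPGA.
for rise_ps in (500.0, 1000.0):
    verdict = "short" if is_electrically_short(750, rise_ps) else "not short"
    print(f"750 mils with a {rise_ps:.0f} ps edge: {verdict}")
```

By this rule a slow edge leaves the longest trace comfortably lumped, while a fast edge does not, which is one more reason to leave a resistor footprint in the clock line.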

The timing budget is crucial to your ability to achieve timing for
both reads and writes. The data's there in the SRAM data sheet. You
need to figure out what you need to make it happen for you. You have
clock delays in and out, clock-to-out and setup/hold to deal with as
well as absolute delays and delay skew. It can be done but you won't
get it working without working the budget first.
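
A timing budget of the kind described here can start as a very small calculation. The sketch below is one possible layout for a single launch-to-capture path (FPGA to SRAM for writes, SRAM to FPGA for reads). The class and field names are invented for this example, and every number in it is a placeholder; the real values have to come from the SRAM data sheet and the FPGA timing reports.

```python
from dataclasses import dataclass

@dataclass
class PathBudget:
    period_ns: float      # clock period (5.0 ns at 200MHz)
    tco_max_ns: float     # launching device, slowest clock-to-output
    tco_min_ns: float     # launching device, fastest clock-to-output
    flight_ns: float      # board trace delay from launch pin to capture pin
    setup_ns: float       # capturing device setup requirement
    hold_ns: float        # capturing device hold requirement
    clock_skew_ns: float  # capture clock arrival minus launch clock arrival

    def setup_slack_ns(self):
        """Margin left before the next capturing edge; must be positive."""
        return (self.period_ns + self.clock_skew_ns
                - self.tco_max_ns - self.flight_ns - self.setup_ns)

    def hold_slack_ns(self):
        """Margin after the same capturing edge; must be positive."""
        return (self.tco_min_ns + self.flight_ns
                - self.clock_skew_ns - self.hold_ns)

# Placeholder numbers only, NOT from any data sheet.
path = PathBudget(period_ns=5.0, tco_max_ns=3.0, tco_min_ns=1.0,
                  flight_ns=0.125, setup_ns=1.4, hold_ns=0.4,
                  clock_skew_ns=0.0)
print(f"setup slack: {path.setup_slack_ns():.3f} ns")
print(f"hold slack:  {path.hold_slack_ns():.3f} ns")
```

Running the same calculation once for the write direction and once for the read direction, with worst-case figures in each, shows quickly whether the trace delays matter or whether clock-to-out and setup dominate.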
From: John Adair on
Short is always good but often more important at these speeds is
difference in length or propagation time. Keeping the skew down
between signals allows the use of a single clock shifting mechanism
to make capture on return relatively simple to implement and even to
make an auto-training mechanism.

One trick for these types of devices is to have a clock loop driven
by an I/O and returned by a different I/O. This gives something that
can be used as part of a timing lock loop but tracks changes in the
device I/O for voltage and temperature etc..

John Adair
Enterpoint Ltd. - Home of Raggedstone2. The Spartan-6 PCIe Development
Board.