From: Antti on
On Dec 22, 7:02 am, John McCaskill <jhmccask...(a)gmail.com> wrote:
> On Dec 21, 3:12 pm, Antti <antti.luk...(a)googlemail.com> wrote:
>
> > On Dec 21, 10:21 pm, Peter Alfke <al...(a)sbcglobal.net> wrote:
>
> > > On Dec 21, 11:58 am, Antti <antti.luk...(a)googlemail.com> wrote:
>
> > > > On Dec 21, 9:50 pm, Peter Alfke <al...(a)sbcglobal.net> wrote:
>
> > > > > On Dec 21, 9:30 am, Antti <antti.luk...(a)googlemail.com> wrote:
>
> > > > > > On Dec 21, 7:20 pm, Ed McGettigan <ed.mcgetti...(a)xilinx.com> wrote:
>
> > > > > > > On Dec 21, 3:01 am, Antti <antti.luk...(a)googlemail.com> wrote:
>
> > > > > > > > On Dec 21, 12:56 pm, Symon <symon_bre...(a)hotmail.com> wrote:
>
> > > > > > > > > Antti wrote:
>
> > > > > > > > > > > Xilinx Coregen FIFO, dual clock, most options disabled, only FULL and EMPTY
> > > > > > > > > > > flags present.
>
> > > > > > > > > > signals at input correct, as expected (checked with ChipScope)
> > > > > > > > > > signals at output:
> > > > > > > > > > - double value
> > > > > > > > > > - missing 1, 2 or 3 values
> > > > > > > > > > - FIFO will read out random number of OLD entries, this could be 4
> > > > > > > > > > values, or 50% of the FIFO old values
>
> > > > > > > > > I know you will have read this.
>
> > > > > > > > > Can you think of any reason why the Xilinx work-around wouldn't work
> > > > > > > > > because of your specific implementation? It seems to have different
> > > > > > > > > work-arounds depending on whether the read clock is faster or slower
> > > > > > > > > than the write clock. Do your clocks change frequency?
>
> > > > > > > > > Are you sure your clocks don't have any glitches? The reset also?
> > > > > > > > > Power's OK? Is your office made of Cobalt 60?
>
> > > > > > > > > HTH., Syms.
>
> > > > > > > > > 1) I entered the clock figures in the FIFO16 implementation, but the
> > > > > > > > > error also happens with BRAM-based FIFOs that do not need workarounds
> > > > > > > > > 2) Clocks DO NOT CHANGE ever, one is MGT recovered clock 125MHz write,
> > > > > > > > > one is PLB clock 62.5MHz read
> > > > > > > > > 3) Power OK? Well the problem happens at 2 different sites, hm yes it
> > > > > > > > > could still be a power problem
>
> > > > > > > > > 4) My office is not made of Cobalt 60, ... and it's cold here too
>
> > > > > > > > > Antti
>
> > > > > > > Are you sure that this is a FIFO issue and not something else?  Some
> > > > > > > things to think about.
>
> > > > > > > 1) The recovered clock from the MGT is a bit noisy as it moves as the
> > > > > > > CDR moves.  Why are you using this instead of the REFCLK source?
>
> > > > > > > 2) It seems like you have a PLB core that is reading from the FIFO,
> > > > > > > could the problem be in this?
>
> > > > > > > Ed McGettigan
> > > > > > > --
> > > > > > > Xilinx Inc.
>
> > > > > > Well the MGT datapath and clock system is not done by me, and the guy
> > > > > > says it is OK all the way it is connected.
>
> > > > > > > yes, it is very hard to believe that all THREE types of coregen
> > > > > > > FIFOs fail with about the same symptoms, but in all
> > > > > > > 3 cases ChipScope sees correct data going into the fifo, and trash coming out
>
> > > > > > > the system can span up to 100 boards, all synced to the master unit; the
> > > > > > > local refclk is not fully in sync with the clock of
> > > > > > > the master unit, so I see no way to use this clock to synchronise the
> > > > > > > fifo?
>
> > > > > > Antti
> > > > > > > PS I just received an attempt to collect the reward, using a non-Xilinx
> > > > > > > FIFO implementation; I will let you all know
> > > > > > > the test results
>
> > > > > Antti
> > > > > If I remember right (I am no longer at Xilinx) the FIFO is NOT
> > > > > designed for unequal data width of write and read. (Reason: possible
> > > > > ambiguity of Full and EMPTY)
> > > > > Since you use two clocks that are roughly 2:1 in frequency, I hope
> > > > > that you do not try to have double width on one of the ports.
> > > > > The FIFO must have the same width on both ports. You must design the
> > > > > width conversion outside the FIFO. That little circuit will be
> > > > > synchronous and thus quite simple.
> > > > > Peter Alfke
>
> > > > well the FIFO is 9b in 9b out so it should work?
> > > > at least this is what i hoped...
>
> > > > > we did not suspect the FIFO as the problem at first
> > > > > so spent a LOT of time looking for the problem AROUND the FIFOs
> > > > > but.. at least based on what i can see from CS snapshots on fifo
> > > > > inputs and outputs, the only explanation i have is that the FIFOs
> > > > > are just going mad,
>
> > > > > of course one option is that it's my doing, but i have someone
> > > > who is in better shape looking over the code as well, and he
> > > > sees no issues there either. I know the FIFOs should work
> > > > so there must be some explanation, but so far failing to see it.
>
> > > > Antti
> > > > PS thank you Peter for the response
>
> > > OK, Antti,
> > > so you have the same port width, but one clock is about twice as fast
> > > as the other.
> > > How do you stop the 125 MHz write clock from filling up the FIFO,
> > > since you read at only 62 MHz ?
> > > I hope you are not gating the clock, but rather run it continuously
> > > and use WE to stop the writing.
> > > Yes, many of these suggestions are well below your level, but stupid
> > > problems need stupid investigations.
> > > Cheers
> > > Peter
>
> > > I am a level below ground right now; the project is just driving me nuts.
> > > Slowly.
> > To work for months, and end up with Xilinx saying:
> > The man who could have helped you, left Xilinx last friday. Your
> > situation is unsupportable.
> > Well we got out of that situation.
> > To end up in the new ones.
>
> > The FIFO is never over filled by design.
> > The fiber link is 99% IDLE sending usually only short 10byte packets
> > over the link.
>
> > > For testing I generate 10 byte packets with the MOUSE, so about 1 per second, so
> > > there is no doubt
> > > the FIFO is never near full at all.
>
> > Last results:
> > > - ALL 3 types of Xilinx FIFOs: same style of errors, about the same error
> > > rate
> > > - VHDL FIFO sent by a CAF reader, uses gray counters: about TEN TIMES
> > > FEWER errors than the Xilinx implementation, but still all the different types
> > > of error did occur: missing values, and the FIFO outputting a large chunk of
> > > OLD values, that is, the read pointer changing by some random value
>
> > again, I did not design the MGT clocking and the overall MGT
> > subsystem, the people who did are either unreachable or unable to
> > > provide any help beyond saying that the implementation (connection of
> > the FIFO) is done properly. It is also what I have figured out so far,
> > but.. well somewhere must be problem.
>
> > Antti
>
> Hello Antti,
>
> With four different FIFOs all failing, it is not likely that they are
> the source of the problem, just where the symptoms are showing up, as
> if you did not already know that.
>
> If you still want suggestions, here are a few.
>
> First, I always consider having an error condition I can trigger on to
> be worth its weight in gold and you apparently have one in the FIFO.
> Put in ChipScope with multiple ILAs observing one of the FIFOs that
> you have source code for.  Use whatever you are currently triggering
> on to trigger the other ILAs.  Put one on the write clock domain, and
> one on the read clock domain.  Have them look at all of the IOs, as
> well as the counters and other logic in the FIFO.  I doubt that you
> will find a problem with the FIFO, but something will look wrong and
> give you a clue to follow.
>
> Also use separate ILAs to watch the read and write clocks.  I am
> always suspicious of IO clocks, I have seen too many problems with
> them. If one of those clocks is having a problem, and you are using
> that clock as the clock for the ILA, you will not see the clock
> problem with that ILA. Since you are using the recovered clock instead
> of the reference clock (which you can do, and is how we do it), I
> would pay extra attention to it.  Oversample the read and write
> clocks by either using one faster clock, or multiple ILAs running on
> multiple phases of a faster clock.  On a Virtex-4FX, we have multiple
> MGTs/EMACs running GigE.  We use the 125 MHz reference clock instead
> of the recovered clock so we only have one 125 MHz clock to deal
> with.  We feed it through a PMCD to generate the 62.5 MHz clock so
> that they are not asynchronous.  That gives us a bit less to have to
> deal with.
>
> Do you have access to a digital storage oscilloscope?  If so,  run the
> ILA trigger out of the FPGA and use that to trigger the scope. Use it
> to look at the clocks and power supplies, and anything else that the
> other test turned up.
>
> Use the timing analyzer to look for unconstrained paths. Look for any
> cross clock domain buses that have more than a cycle of skew on them.
> I have not seen that cause problems yet, but I use FROM-TO constraints
> to minimize skew to prevent a gray coded bus from having more than a
> cycle of skew crossing domains and causing problems.  I don't think it
> is a high probability, but your symptoms remind me of the time we
> wrote our own FIFO that had different read and write widths and
> incremented the Gray code counter by two. That would cause two bits to
> change at a time, and eventually that would cause it to fail.
>
> Good luck, and remember that if it was easy, it would not be called
> hardware.
>
> John McCaskill
> www.FasterTechnology.com

Thank you John,

Antti
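
For anyone following along, John's last point about a Gray code counter stepping by two can be sketched in a few lines of Python. This is only an illustration and not code from any of the FIFOs discussed: the helper names (to_gray, from_gray, possible_samples, decoded) are made up here, and the model assumes that each changing bit can independently be seen as old or new by the other clock domain.

```python
def to_gray(n: int) -> int:
    """Binary to reflected Gray code."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Reflected Gray code back to binary."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def possible_samples(old: int, new: int) -> set:
    """Every code the receiving domain could latch mid-transition,
    assuming each changing bit independently shows its old or new value."""
    diff = old ^ new
    changing = [1 << i for i in range(diff.bit_length()) if (diff >> i) & 1]
    seen = set()
    for mask in range(1 << len(changing)):
        value = old
        for i, bit in enumerate(changing):
            if (mask >> i) & 1:
                value ^= bit
        seen.add(value)
    return seen


def decoded(old_ptr: int, new_ptr: int) -> list:
    """Pointer values the other side could decode while old_ptr -> new_ptr is in flight."""
    codes = possible_samples(to_gray(old_ptr), to_gray(new_ptr))
    return sorted(from_gray(g) for g in codes)


if __name__ == "__main__":
    print("step by 1 (7 -> 8):", decoded(7, 8))
    print("step by 2 (1 -> 3):", decoded(1, 3))
```

With a step of one, the receiver can only ever decode the old or the new pointer (7 or 8). With a step of two, two bits are in flight, and under the assumption above the receiver can also decode 0 or 2, neither of which the counter ever held during that transition.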



From: glen herrmannsfeldt on
Antti <antti.lukats(a)googlemail.com> wrote:
(big snip)

> 1 the FIFO is supposed to be SIMPLEST possible MGT receiver, FIFO
> wr_en is active when the incoming char is not IDLE.

> 2 Latency is absolutely NO issue, PPC is pulling the data extremely slowly
> anyway :(

> 3 rd_en almost do not care, well currently it is wrong, 1 clock too
> late so PPC doesn't pull the last value from fifo (it is pulled when
> new data comes in), but this minor issue really does not explain the
> error where the fifo reads out half of the old values

If the fifo is empty half the time, then half the time you will
be reading the wrong value.

-- glen
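
To make glen's remark concrete, here is a toy Python model of what a read from an empty FIFO can do. It is a sketch under loud assumptions: NaiveFifo is a hypothetical ring buffer with no underflow guard at all, and it is not a model of the Coregen FIFOs, whose behavior on a read while empty should be checked in their documentation.

```python
class NaiveFifo:
    """Hypothetical ring buffer that blindly honors every read request."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = [0] * depth
        self.wr = 0  # write pointer, wraps at depth
        self.rd = 0  # read pointer, wraps at depth

    def empty(self):
        return self.wr == self.rd

    def write(self, value):
        self.mem[self.wr] = value
        self.wr = (self.wr + 1) % self.depth

    def read(self):
        # No check for empty: the pointer advances regardless.
        value = self.mem[self.rd]
        self.rd = (self.rd + 1) % self.depth
        return value

    def drain(self):
        """Read until the empty flag says stop, as a well-behaved consumer would."""
        out = []
        while not self.empty():
            out.append(self.read())
        return out


if __name__ == "__main__":
    fifo = NaiveFifo(depth=8)
    for v in range(8):          # earlier traffic leaves old data in memory
        fifo.write(v)
        fifo.read()
    for v in (100, 101, 102):   # one short packet
        fifo.write(v)
    print("packet:", fifo.drain())
    print("one read while empty:", fifo.read())
    print("next drain:", fifo.drain())
```

In this toy model a single read while empty pushes the read pointer past the write pointer, so the FIFO no longer looks empty and the next drain returns seven stale entries, including a second copy of the packet just read.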
From: Matthieu Michon on
To follow on, here are some of my thoughts:


- I would try to limit the scope of this issue by using a chain of three async identical FIFOs (with the control signals properly forwarded: the point is to make the whole thing transparent, although with increased cycle latency)
[MGT] ---> [FIFO #1] ---> [FIFO #2] ---> [FIFO #3] ---> [PLB]
MGT, FIFO #1 (both ports), inbound port of FIFO #2 @ 125 MHz
Outbound port of FIFO #2, FIFO #3 (both ports), and PLB @ 62.5 MHz
The FIFO #1 and #3 are useless but they may experience the issue you are facing, bringing up interesting facts such as knowing which port is going south.
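
Once the data at each point in the chain has been captured (for example with ChipScope exports), locating the first stage that corrupts it is a simple comparison. The Python sketch below is hypothetical helper code, not part of any Xilinx tool: the function names and the capture format (a list of name and word-list pairs, ordered from source to sink and aligned to the same first word) are assumptions made for illustration.

```python
def first_divergence(reference, observed):
    """Index of the first differing word, or None if the streams match."""
    for i, (a, b) in enumerate(zip(reference, observed)):
        if a != b:
            return i
    if len(reference) != len(observed):
        # One stream is a prefix of the other: words are missing or extra.
        return min(len(reference), len(observed))
    return None


def first_bad_stage(captures):
    """captures: [(probe_name, [words...]), ...] ordered from source to sink.
    Returns (probe_name, index) for the first probe that disagrees with the
    source capture, or None if every probe matches."""
    (_, reference), *rest = captures
    for name, data in rest:
        index = first_divergence(reference, data)
        if index is not None:
            return name, index
    return None


if __name__ == "__main__":
    sent = [1, 2, 3, 4, 5]
    captures = [
        ("MGT", sent),
        ("FIFO #1 out", [1, 2, 3, 4, 5]),
        ("FIFO #2 out", [1, 2, 2, 4, 5]),   # made-up example of a doubled value
        ("FIFO #3 out", [1, 2, 2, 4, 5]),
    ]
    print(first_bad_stage(captures))
```

With the made-up captures in the example, the first mismatch is reported at the output of FIFO #2, word index 2, which would point at the clock-crossing stage rather than the synchronous ones.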


Also I guess that you already went through the obvious items:

- I would check and re-check __myself__ the clocking scheme inside and outside the FPGA
- Same thing with power-supply
- Check that all I/O pads are LOC'ed (I once had an unconstrained pad due to a typo inside the UCF file, nasty things followed)
- Check that the FIFO reset is performed correctly (all clock stable, FIFO state is idle) and meets the required duration
- A good sleep, cold shower and breakfast are very effective when dealing with tough issues !!


--
Matthieu Michon <prenom.nom(a)gmail.com>
From: Brian Drummond on
On Tue, 22 Dec 2009 09:36:39 +0100, Matthieu Michon <prenom.nom(a)gmail.com>
wrote:

>To follow on, here are some of my thoughts:
>
>
>- I would try to limit the scope of this issue by using a chain of three async identical FIFOs (with the control signals properly forwarded: the point is to make the whole thing transparent, although with increased cycle latency)
> [MGT] ---> [FIFO #1] ---> [FIFO #2] ---> [FIFO #3] ---> [PLB]
> MGT, FIFO #1 (both ports), inbound port of FIFO #2 @ 125 MHz
> Outbound port of FIFO #2, FIFO #3 (both ports), and PLB @ 62.5 MHz
>The FIFO #1 and #3 are useless but they may experience the issue you are facing, bringing up interesting facts such as knowing which port is going south.
>
>
>Also I guess that you already went through the obvious items:
>
>- I would check and re-check __myself__ the clocking scheme inside and outside the FPGA

This triggered one thought.

when checking the clocking arrangements, have you done so in the technology
view? (or the post-synthesis netlist, which I find MUCH easier to search?)

Somewhere between ISE7.1 and ISE10.1, XST changed the way it inferred clock
buffers, so that a correct design in 7.1 became incorrect in 10.1...

So if the non-Antti part was verified in a previous existence, and imported to
this design, something may have changed even though the source is identical.

In my case it manifested as a named clock which inferred an IBUFG into (a) logic
and (b) a DCM to generate related clocks - correctly in 7.1.

ISE10.1 inferred the IBUFG for the named clock - to the logic only; taking the
DCM feed from the IBUF part (ahead of the BUFG) - thus the "related" clocks were
skewed ahead of the logic clock by a few ns.

This took a while to find, since it was in stable "proven" code, and shook me up
a bit. It was definitely not what I asked for, but not quite a synthesis bug...

And one case where explicitly instantiating Xilinx-specific black boxes proved
to be necessary.

I have no idea if this is related to your problem, but the weight of evidence
does suggest some common problem causing all the FIFOs to fail.


Alternatively: think about a simple clock domain crosser in registers, (depth =
1) either ahead of or after a synchronous FIFO. (After is easier, because it is
controlled by the slow PLB).

Even if it doesn't work, it gives you a probe point between the FIFO and the
clock crosser, which will hopefully exonerate one of them...

- Brian


From: John_H on
On Dec 22, 12:00 am, Antti <antti.luk...(a)googlemail.com> wrote:
> On Dec 22, 6:40 am, John_H <newsgr...(a)johnhandwork.com> wrote:
>
> > Sometimes the simpler things can get in the way of complex issues.
>
> > Are you certain your read enable and write enables are showing up
> > relative to the correct data?
> > It seems some people expect the read enable to indicate the valid data
> > is being removed from the FIFO while others believe the read enable
> > should produce valid data on the following clock.
>
> > Double check where the documentation says the valid data should be
> > relative to the enable pulse especially for the read, but check the
> > write as well.
> > ___
>
> > How deep do you want your FIFO?
> > Is latency an issue?
> > Do you want rd_en to indicate you're taking valid data or that the
> > next clock is valid?
> > You want wr_en to be present in the same clock cycle as the din,
> > right?
>
> > Long time no post (partly because I miss having a real newsreader),
> > - John_H
>
> Hi John,
>
> 1 the FIFO is supposed to be SIMPLEST possible MGT receiver, FIFO
> wr_en is active when the incoming char is not IDLE.
> 2 Latency is absolutely NO issue, PPC is pulling the data extremely slowly
> anyway :(
> 3 rd_en almost do not care, well currently it is wrong, 1 clock too
> late so PPC doesn't pull the last value from fifo (it is pulled when
> new data comes in), but this minor issue really does not explain the
> error where the fifo reads out half of the old values
>
> Antti

If the rd_en is one cycle off, the data during that cycle is
undefined.

If the rd_en is active for two cycles, the data extracted will be
precisely one cycle off for the rd_en pulses after the first.

The specific FIFO implementation may provide what looks like valid
data - or not - during the first of those consecutive rd_en pulses.

I would *love* to know how much data is "good" versus "bad" with the
rd_en realigned.

- John_H
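
The rd_en alignment question can be illustrated with a small cycle-by-cycle Python model. It is a sketch, assuming a FIFO whose dout changes on the clock edge after rd_en is sampled; whether a particular Coregen configuration behaves this way has to be checked against its documentation. The function names are made up for the example.

```python
from collections import deque


def dout_per_cycle(contents, rd_en_pattern, initial_dout=None):
    """Value visible on dout during each cycle, for a FIFO that updates
    dout on the clock edge ending a cycle in which rd_en was high."""
    queue = deque(contents)
    dout = initial_dout
    seen = []
    for rd_en in rd_en_pattern:
        seen.append(dout)            # what a consumer sees during this cycle
        if rd_en and queue:          # edge at the end of the cycle
            dout = queue.popleft()
    return seen


def consumer(seen, rd_en_pattern, delay):
    """Words latched by a consumer that samples dout `delay` cycles after rd_en."""
    return [seen[i + delay]
            for i, en in enumerate(rd_en_pattern)
            if en and i + delay < len(seen)]


if __name__ == "__main__":
    rd_en = [1, 1, 1, 0, 0]
    seen = dout_per_cycle([10, 11, 12], rd_en)
    print("sampled with rd_en:   ", consumer(seen, rd_en, 0))
    print("sampled one cycle on: ", consumer(seen, rd_en, 1))
```

Under that assumption, a consumer that samples in the same cycle as rd_en gets one stale word first and never picks up the last word of the packet, while a consumer that samples one cycle later gets 10, 11, 12. That matches the "last value is pulled when new data comes in" symptom, though it does not by itself explain the larger bursts of old data.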