Please help, Xilinx FIFO problem! [FPGA]

From: Ed McGettigan on 21 Dec 2009 18:12

On Dec 21, 1:12 pm, Antti <antti.luk...(a)googlemail.com> wrote:
> On Dec 21, 10:21 pm, Peter Alfke <al...(a)sbcglobal.net> wrote:
>
>
>
>
>
> > On Dec 21, 11:58 am, Antti <antti.luk...(a)googlemail.com> wrote:
>
> > > On Dec 21, 9:50 pm, Peter Alfke <al...(a)sbcglobal.net> wrote:
>
> > > > On Dec 21, 9:30 am, Antti <antti.luk...(a)googlemail.com> wrote:
>
> > > > > On Dec 21, 7:20 pm, Ed McGettigan <ed.mcgetti...(a)xilinx.com> wrote:
>
> > > > > > On Dec 21, 3:01 am, Antti <antti.luk...(a)googlemail.com> wrote:
>
> > > > > > > On Dec 21, 12:56 pm, Symon <symon_bre...(a)hotmail.com> wrote:
>
> > > > > > > > Antti wrote:
>
> > > > > > > > > > Xilinx Coregen FIFO, dual clock, most options disabled, only FULL and EMPTY
> > > > > > > > > > flags present.
>
> > > > > > > > > signals at input correct, as expected (checked with ChipScope)
> > > > > > > > > signals at output:
> > > > > > > > > - double value
> > > > > > > > > - missing 1, 2 or 3 values
> > > > > > > > > - FIFO will read out random number of OLD entries, this could be 4
> > > > > > > > > values, or 50% of the FIFO old values
>
> > > > > > > > I know you will have read this.
>
> > > > > > > > Can you think of any reason why the Xilinx work-around wouldn't work
> > > > > > > > because of your specific implementation? It seems to have different
> > > > > > > > work-arounds depending on whether the read clock is faster or slower
> > > > > > > > than the write clock. Do your clocks change frequency?
>
> > > > > > > > Are you sure your clocks don't have any glitches? The reset also?
> > > > > > > > Power's OK? Is your office made of Cobalt 60?
>
> > > > > > > > HTH., Syms.
>
> > > > > > > > 1) I entered the clock figures in the FIFO16 implementation, but the
> > > > > > > > error also happens with BRAM-based FIFOs that do not need workarounds
> > > > > > > > 2) Clocks DO NOT CHANGE ever, one is the MGT recovered clock, 125MHz write,
> > > > > > > > one is the PLB clock, 62.5MHz read
> > > > > > > > 3) Power OK? Well the problem happens at 2 different sites, hm yes it
> > > > > > > > could still be a power problem
>
> > > > > > > > 4) My office is not made of Cobalt 60, ... and it's cold here too
>
> > > > > > > > Antti
>
> > > > > > Are you sure that this is a FIFO issue and not something else? Some
> > > > > > things to think about.
>
> > > > > > 1) The recovered clock from the MGT is a bit noisy as it moves as the
> > > > > > CDR moves. Why are you using this instead of the REFCLK source?
>
> > > > > > 2) It seems like you have a PLB core that is reading from the FIFO,
> > > > > > could the problem be in this?
>
> > > > > > Ed McGettigan
> > > > > > --
> > > > > > Xilinx Inc.
>
> > > > > Well the MGT datapath and clock system is not done by me, and the guy
> > > > > says it is OK all the way it is connected.
>
> > > > > > yes, it is very hard to believe that all THREE types of Coregen
> > > > > > FIFOs fail with about the same symptoms, but in all
> > > > > > 3 cases ChipScope sees correct data going into the FIFO, and trash coming out
>
> > > > > > the system can span up to 100 boards, all synced to the master unit; the
> > > > > > local refclk is not fully in sync with the clock of
> > > > > > the master unit, so I see no way to use this clock to synchronise the
> > > > > > FIFO?
>
> > > > > > Antti
> > > > > > PS I just received an attempt to collect the reward, using a non-Xilinx
> > > > > > FIFO implementation; I will let you all know
> > > > > > the test results
>
> > > > Antti
> > > > If I remember right (I am no longer at Xilinx) the FIFO is NOT
> > > > designed for unequal data width of write and read. (Reason: possible
> > > > ambiguity of Full and EMPTY)
> > > > Since you use two clocks that are roughly 2:1 in frequency, I hope
> > > > that you do not try to have double width on one of the ports.
> > > > The FIFO must have the same width on both ports. You must design the
> > > > width conversion outside the FIFO. That little circuit will be
> > > > synchronous and thus quite simple.
> > > > Peter Alfke
>
> > > well the FIFO is 9b in 9b out so it should work?
> > > at least this is what i hoped...
>
> > > we did not suspect the FIFO as problem at first
> > > so spent LOT of time looking for the problem AROUND the FIFOS
> > > > but.. at least based on what I can see from CS snapshots on the FIFO
> > > > inputs and outputs, the only explanation I have is that the FIFOs
> > > > are just going mad,
>
> > > > of course one option is that it is my doing, but I have someone
> > > who is in better shape looking over the code as well, and he
> > > sees no issues there either. I know the FIFOs should work
> > > so there must be some explanation, but so far failing to see it.
>
> > > Antti
> > > PS thank you Peter for the response
>
> > OK, Antti,
> > so you have the same port width, but one clock is about twice as fast
> > as the other.
> > How do you stop the 125 MHz write clock from filling up the FIFO,
> > since you read at only 62 MHz ?
> > I hope you are not gating the clock, but rather run it continuously
> > and use WE to stop the writing.
> > Yes, many of these suggestions are well below your level, but stupid
> > problems need stupid investigations.
> > Cheers
> > Peter
>
> I am level below ground right now the project is just driving me nuts.
> slowly.
> To work for months, and end up with Xilinx saying:
> The man who could have helped you, left Xilinx last friday. Your
> situation is unsupportable.
> Well we got out of that situation.
> To end up in the new ones.
>
> The FIFO is never overfilled, by design.
> The fiber link is 99% IDLE, usually sending only short 10-byte packets
> over the link.
>
> For testing I generate 10-byte packets with the MOUSE, so about 1 per second, so
> there is no doubt
> the FIFO is never near full at all.
>
> Last results:
> - ALL 3 types of Xilinx FIFOs: same style of errors, about the same error
> rate
> - VHDL FIFO sent by a CAF reader, uses Gray counters: about TEN TIMES
> FEWER errors than the Xilinx implementation, but still all the different types
> of error did occur: missing values, and the FIFO outputting a large chunk of
> OLD values, that is, the read pointer changing by some random value
>
> again, I did not design the MGT clocking and the overall MGT
> subsystem; the people who did are either unreachable or unable to
> provide any help beyond saying that the implementation (connection of
> the FIFO) is done properly. That is also what I have figured out so far,
> but.. well, somewhere there must be a problem.
>
> Antti

Since you can't get further on the MGT clocking circuit topology, what
about the 62.5 MHz read clock? How is this generated? It sounds like
it could be glitching.

In one of your other posts, you had mentioned that ChipScope had shown
that the write data was correct and that the read data wasn't. Did
you have two separate ILA cores with the 125 MHz and 62.5 MHz clocks
when you did this testing?

Ed McGettigan
--
Xilinx Inc
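
A note on the Gray counters mentioned in the quoted text above: asynchronous FIFOs usually pass their read and write pointers between clock domains in Gray code because consecutive Gray values differ in exactly one bit. A pointer sampled mid-transition then resolves to either the old or the new value, whereas a binary pointer can change several bits at once and be caught in a state it never held. The Python sketch below only illustrates that property; the function names are hypothetical and it is not the code of any FIFO discussed in this thread.

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary count to its reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Convert a reflected Gray code back to the binary count."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


WIDTH = 4                 # pointer width in bits (illustrative value)
DEPTH = 1 << WIDTH        # number of pointer states, including wrap-around

# Bits that toggle on each increment, wrap-around included.
gray_steps = [
    bits_changed(bin_to_gray(i), bin_to_gray((i + 1) % DEPTH))
    for i in range(DEPTH)
]
binary_steps = [bits_changed(i, (i + 1) % DEPTH) for i in range(DEPTH)]

print("Gray bits toggled per step:  ", gray_steps)
print("binary bits toggled per step:", binary_steps)
```

Gray coding only protects the pointer crossing itself. It does nothing against a glitching clock, which is the kind of fault the questions above are aimed at.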

From: John_H on 21 Dec 2009 23:40

On Dec 21, 4:43 pm, Antti <antti.luk...(a)googlemail.com> wrote:
> On Dec 21, 11:29 pm, n...(a)puntnl.niks (Nico Coesel) wrote:
>
>
>
> > Antti <antti.luk...(a)googlemail.com> wrote:
> > >On Dec 21, 3:21 pm, John McCaskill <jhmccask...(a)gmail.com> wrote:
> > >> On Dec 21, 5:42 am, Antti <antti.luk...(a)googlemail.com> wrote:
>
> > >> > On Dec 21, 1:29 pm, "maxascent" <maxasc...(a)yahoo.co.uk> wrote:
>
> > >> > > >On Dec 21, 12:32 pm, "maxascent" <maxasc...(a)yahoo.co.uk> wrote:
> > >> > > >> Well once you have written and tested your own fifo then you would have it
> > >> > > >> for any other project. It seems like you have wasted a lot of time already
> > >> > > >> trying to fix the Xilinx version so I don't see that you have anything to
> > >> > > >> lose by creating your own.
>
> > >> > > >> Jon
>
> > >> > > >If you REALLY need to do something else, when your time is at an absolute
> > >> > > >premium
> > >> > > >And if the system is working (except occasional errors, about 2 of fiber
> > >> > > >packets are corrupt)
> > >> > > >Then you do not go replacing Xilinx validated FIFO solutions with your
> > >> > > >own, if there are other options.
>
> > >> > > >If 2 completely different FIFO implementations both have the same error,
> > >> > > >you think a 3rd one would instantly work? Could be, yes.
>
> > >> > > >Antti
>
> > >> > > In my opinion people tend to use coregen far too often. Looking through
> > >> > > some of Xilinx code it is awful. I went down the route of writing my own
> > >> > > fifos not because I had a problem with Xilinx fifos but because I believe a
> > >> > > fifo written by myself is a lot more flexible and simulates faster than the
> > >> > > Xilinx version. I also know to as good a degree as I can test that it will
> > >> > > work 100%.
> > >> > > I don't really think you can say that their fifos have been validated 100%
> > >> > > if they have to release patches for them.
>
> > >> > > Jon
>
> > >> > Dear Jon,
>
> > >> > I do not feel to be in health right now to write this fifo, so here is
> > >> > the deal:
>
> > >> >   component mgt_fifo
> > >> >     port (
> > >> >       din    : in  std_logic_vector(8 downto 0);
> > >> >       rd_clk : in  std_logic;
> > >> >       rd_en  : in  std_logic;
> > >> >       rst    : in  std_logic;
> > >> >       wr_clk : in  std_logic;
> > >> >       wr_en  : in  std_logic;
> > >> >       dout   : out std_logic_vector(8 downto 0);
> > >> >       empty  : out std_logic;
> > >> >       full   : out std_logic);
> > >> >   end component;
>
> > >> > if you can write fifo that i can "drop in" and the Xilinx FIFO error
> > >> > is gone,
> > >> > then i will stand up, go to postal office and send you 1000 EUR by
> > >> > western union.
> > >> > If 1000 EUR is not enough, name your price, i will consider it.
> > >> > there is no price on the health of our family
>
> > >> > condition is: DROP IN, WORKS, if i need to troubleshoot, then no pay.
>
> > >> > Antti
>
> > >> Hello Antti,
>
> > >> If you want to try a different implementation of a FIFO, you can get
> > >> the one that the FSL bus uses out of the EDK pcores directory at
> > >> C:\Xilinx\11.1\EDK\hw\XilinxProcessorIPLib\pcores\fsl_v20_v2_11_a\hdl\vhdl.
>
> > >> There are multiple implementations, including an async BRAM based one
> > >> that has the same ports as you list above, except that it uses exist
> > >> instead of empty on the read port.
>
> > >> That said, I don't expect a third implementation to work instantly
> > >> when the previous two implementations had the same error. This FIFO
> > >> has the full source to it, so it is straight forward to see how it
> > >> works, and add ChipScope to observe what is happening around the time
> > >> of the error.
>
> > >> If you have not used it before, FPGA editor has the ability to find a
> > >> ChipScope ILA core, and change what is connected to it. That can make
> > >> it much quicker to follow the trail of clues since you avoid having to
> > >> go through a full place and route every time you want to look at
> > >> something different.
>
> > >> Is your 62.5 MHz clock a divided version of the 125 MHz clock? You
> > >> mention that the 125 MHz is the recovered clock from the MGT, but
> > >> there are other options. When we did our GigE interface, we used a
> > >> 125 MHz clock from the MGT, but it was not the recovered clock, but
> > >> the local MGT PLL. This let us use the same 125 MHz clock for all
> > >> four GigE interfaces and a PMCD to generate a 62.5 MHz clock that is
> > >> phase aligned with the 125 MHz clock.
>
> > >> Regards,
>
> > >> John McCaskill
> > >> www.FasterTechnology.com
>
> > >Hi
>
> > >I have tried all 3 variants possible with coregen,
> > >all 3 have similar errors
>
> > >and no, the clocks are not divided versions; the 125MHz comes from
> > >the master over fiber
> > >the master could be 100 hops away, the 62.5MHz is derived from a local
> > >oscillator
>
> > >so the frequencies are very close but not synchronous
>
> > >Antti
> > >who has to give up, at least for a while :(
> > >good advice still welcome, if there is any hope or idea how to fix the
> > >issue
> > >and yes it could be power supply issue at the end of the day also
>
> > I always write my own fifo's to keep things simple. I keep a write
> > pointer, read pointer and number of elements counter in the domain
> > with the highest clock frequency. I don't cross the clock domain
> > inside the fifo instead I create an interface which does the clock
> > domain crossing. I also use an early full signal (say max. elements -X
> > depending on the expected latency). This allows for fast FIFOs (no
> > Gray code counters) with very little logic.
>
> > The control logic looks like this:
>
> > if read then read_ptr++;
> > if write then write_ptr++;
> > if (read=true and write=false) num_elements--;
> > if (write=true and read=false) num_elements++;
>
> > if (num_elements>=(MAX_ELEMENTS-X)) full=true; else full=false;
> > if (num_elements==0) empty=true; else empty=false;
>
> > The external logic must refrain from reading the FIFO when it is empty
> > and from writing it when it is full.
>
> > Besides: could your problem be a timing constraint problem? Did you
> > specify the amount of time signals may travel from one clock domain to
> > the other? The Xilinx tools are not doing this automatically!
>
> > --
> > Failure does not prove something is impossible, failure simply
> > indicates you are not using the right tools...
> > "If it doesn't fit, use a bigger hammer!"
> > --------------------------------------------------------------
>
> hi
>
> I was already thinking of writing a "simplified FIFO" that would
> work under the conditions it is used in; the read is done by PPC software
> polling, so never too often
>
> well the clock domains are fully async, so the clock edges of the read-
> write
> can have any phase they like
>
> so I assumed if the read and write clock are constrained then it is
> enough?
>
> Antti

Sometimes the simpler things can get in the way of complex issues.

Are you certain your read enable and write enables are showing up
relative to the correct data?
It seems some people expect the read enable to indicate the valid data
is being removed from the FIFO while others believe the read enable
should produce valid data on the following clock.

Double check where the documentation says the valid data should be
relative to the enable pulse especially for the read, but check the
write as well.
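
To make the two expectations concrete, here is a small cycle-level Python sketch. It assumes two idealised read interfaces with hypothetical class names: one where dout is registered and only becomes visible the clock after rd_en, and one where the next word is already on dout and rd_en merely acknowledges it. Which of these a given core implements depends on how it was configured, so treat this as an illustration of the off-by-one effect, not as a model of the Coregen FIFO.

```python
from collections import deque


class StandardReadFifo:
    """dout is registered: data requested now is visible on the following clock."""

    def __init__(self, items):
        self.mem = deque(items)
        self.dout = None

    def clock(self, rd_en):
        visible = self.dout              # what the reader sees during this cycle
        if rd_en and self.mem:
            self.dout = self.mem.popleft()
        return visible


class FallThroughFifo:
    """The next word is already on dout; rd_en acknowledges that it was taken."""

    def __init__(self, items):
        self.mem = deque(items)

    def clock(self, rd_en):
        visible = self.mem[0] if self.mem else None
        if rd_en and self.mem:
            self.mem.popleft()
        return visible


def sample_with_rd_en(fifo, rd_en_seq):
    """Reader that latches dout in the same cycle in which it asserts rd_en."""
    captured = []
    for en in rd_en_seq:
        visible = fifo.clock(en)
        if en:
            captured.append(visible)
    return captured


rd_en = [1, 1, 1, 0]
standard_result = sample_with_rd_en(StandardReadFifo([10, 20, 30]), rd_en)
fallthrough_result = sample_with_rd_en(FallThroughFifo([10, 20, 30]), rd_en)

print("standard read:    ", standard_result)
print("fall-through read:", fallthrough_result)
```

A reader written for one convention and connected to the other sees a stale first value and never collects the last word, which shifts everything by one position. It does not by itself produce large bursts of old data.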
___

How deep do you want your FIFO?
Is latency an issue?
Do you want rd_en to indicate you're taking valid data or that the
next clock is valid?
You want wr_en to be present in the same clock cycle as the din,
right?

Long time no post (partly because I miss having a real newsreader),
- John_H
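
The counter-based control logic in Nico Coesel's quoted message above can be sketched as a single-clock behavioural model in Python. The class name, the default depth of 8 and the early-full margin of 2 are assumptions made for illustration (the margin plays the role of X in the pseudocode). The clock-domain crossing that his scheme places outside the FIFO is not modelled, and the guards against reading when empty or writing when completely full are an addition for safety in this sketch.

```python
class CounterFifo:
    """Single-clock FIFO with read/write pointers and an element counter."""

    def __init__(self, max_elements=8, early_margin=2):
        self.max_elements = max_elements
        self.early_margin = early_margin      # "X" in the quoted pseudocode
        self.mem = [None] * max_elements
        self.read_ptr = 0
        self.write_ptr = 0
        self.num_elements = 0

    @property
    def full(self):
        # Early full: asserted early_margin entries before the memory is full.
        return self.num_elements >= self.max_elements - self.early_margin

    @property
    def empty(self):
        return self.num_elements == 0

    def clock(self, write=False, din=None, read=False):
        """Advance one clock cycle; return the word read, or None."""
        do_read = read and self.num_elements > 0
        do_write = write and self.num_elements < self.max_elements
        dout = None
        if do_read:
            dout = self.mem[self.read_ptr]
            self.read_ptr = (self.read_ptr + 1) % self.max_elements
        if do_write:
            self.mem[self.write_ptr] = din
            self.write_ptr = (self.write_ptr + 1) % self.max_elements
        # A simultaneous read and write leaves the count unchanged.
        self.num_elements += int(do_write) - int(do_read)
        return dout


fifo = CounterFifo(max_elements=8, early_margin=2)
for value in range(6):
    fifo.clock(write=True, din=value)
early_full_after_six = fifo.full
drained = [fifo.clock(read=True) for _ in range(6)]

print("early full after six writes:", early_full_after_six)
print("drained:", drained, "empty:", fifo.empty)
```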

From: Antti on 21 Dec 2009 23:53

On Dec 22, 1:12 am, Ed McGettigan <ed.mcgetti...(a)xilinx.com> wrote:
> Since you can't get further on the MGT clocking circuit topology, what
> about the 62.5 MHz read clock? How is this generated? It sounds like
> it could be glitching.
>
> In one of your other posts, you had mentioned that ChipScope had shown
> that the write data was correct and that the read data wasn't. Did
> you have two separate ILA cores with the 125 MHz and 62.5 MHz clocks
> when you did this testing?
>
> Ed McGettigan
> --
> Xilinx Inc

Peter, Ed, and others

* yes, both clocks are running all the time

* 125MHz is coming from the MGT (recovered clock); there is no gating

* the 62.5MHz clock is the PLB clock directly; there is no gating

* the 62.5MHz read strobe is generated by an edge detect that produces a
1-clk-wide pulse on PLB reads

* I used separate ILA cores in different clock domains

* Routing the 125MHz clock out to an external scope would not show the
internal signal as it is seen by the FIFO module; besides, the IOB
characteristics would filter out something and introduce a delay, so
the measurement would not be likely to show anything. OTOH the
ChipScope inside the FPGA also doesn't tell much about the clock,
except that the data, if clocked with the selected clock, is latched
properly under the same conditions where the FIFO does go crazy.

* The design occupies about 80% of all available resources of a Virtex-4
FX40. In order to see the error, I have to start 2 units with GbE to
the first one and a fiber link between the two, and send specific
commands to the master unit, where the packets are processed by the PPC
running custom firmware, which then triggers the condition in the slave
where the problem can be seen. If anyone says this kind of system
can be simulated with meaningful results, I am all ears to know the
setup for this. It doesn't make sense to simulate the Xilinx FIFOs; they
are almost certain not to exhibit the observed fault behavior.

the possibilities I still see are:

1) one of the clocks has something really "bad" in it, I do not even
know what it could be: 1.2 GHz ringing? short bursts of some very high
frequency that do not trigger CS but do trigger the FIFO?
2) Xilinx tools are missing the timing so badly that all 4 types of
FIFOs exhibit a similar error, but the parallel-connected CS core doesn't?
3) Problem with power supply?
4) Unspecified technical problem?
5) Me needing to sign off from this project to preserve my sanity?

It could be 5; it can be that the problem is there but I constantly
connect CS to some other clock and the FIFOs are on some other clock
that has the problem. Well, I have asked for help, and a fellow engineer has
looked over the clock routing and what he has said is that it is all
OK the way it is right now. Maybe he needs a break as well.

Antti
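
For readers unfamiliar with the edge-detect read strobe described in this post, a minimal Python sketch of the idea follows. It assumes the PLB read request shows up as a level held high for several rd_clk cycles, which is an assumption and not something stated in the thread; the function name is hypothetical. The detector compares the current sample with the previous one and emits a single-cycle pulse on each 0-to-1 transition, so each bus read pops exactly one FIFO word.

```python
def rising_edge_pulses(level_samples):
    """Return a one-cycle pulse for every 0 -> 1 transition of the input."""
    previous = 0                      # registered copy of the input, reset to 0
    pulses = []
    for current in level_samples:
        pulses.append(1 if (current and not previous) else 0)
        previous = current            # update the register at the clock edge
    return pulses


# Two bus reads, each holding the request high for more than one clock.
plb_read_level = [0, 1, 1, 1, 0, 0, 1, 1, 0]
rd_en = rising_edge_pulses(plb_read_level)

print("request level:", plb_read_level)
print("rd_en pulses: ", rd_en)
```

This only works cleanly when the request level and the detector run on the same clock. A request that originates in another clock domain would need to be synchronised before the edge detect.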

From: Antti on 22 Dec 2009 00:00

On Dec 22, 6:40 am, John_H <newsgr...(a)johnhandwork.com> wrote:
> Sometimes the simpler things can get in the way of complex issues.
>
> Are you certain your read enable and write enables are showing up
> relative to the correct data?
> It seems some people expect the read enable to indicate the valid data
> is being removed from the FIFO while others believe the read enable
> should produce valid data on the following clock.
>
> Double check where the documentation says the valid data should be
> relative to the enable pulse especially for the read, but check the
> write as well.
> ___
>
> How deep do you want your FIFO?
> Is latency an issue?
> Do you want rd_en to indicate you're taking valid data or that the
> next clock is valid?
> You want wr_en to be present in the same clock cycle as the din,
> right?
>
> Long time no post (partly because I miss having a real newsreader),
> - John_H

Hi John,

1. The FIFO is supposed to be the SIMPLEST possible MGT receiver; FIFO
wr_en is active when the incoming char is not IDLE.
2. Latency is absolutely NO issue; the PPC is pulling the data extremely
slowly anyway :(
3. rd_en: I almost do not care. Currently it is wrong, 1 clock too
late, so the PPC doesn't pull the last value from the FIFO (it is pulled
when new data comes in), but this minor issue really does not explain the
error where the FIFO reads out half of the old values.
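
To make the rd_en timing question concrete, here is a small Python sketch of the two read conventions John_H describes. It is a simplified cycle model under stated assumptions, not the behaviour of any particular Coregen core: the function names are made up, "standard" means the output register updates on the clock edge after rd_en is sampled, and "first-word fall-through" means the head word is already on the output when rd_en is asserted. Check the core's datasheet for the real timing.

```python
from collections import deque


def read_standard(words, rd_en_pattern):
    """Standard read: dout changes on the clock edge AFTER rd_en is sampled.

    Returns what a consumer captures if it samples dout in the same cycle
    in which it asserts rd_en, plus the word left sitting in the register.
    """
    fifo = deque(words)
    dout = None   # output register; holds its value between reads
    seen = []
    for rd_en in rd_en_pattern:
        if rd_en:
            seen.append(dout)       # same-cycle sample: still the previous word
        if rd_en and fifo:
            dout = fifo.popleft()   # becomes visible only in the next cycle
    return seen, dout


def read_fwft(words, rd_en_pattern):
    """First-word fall-through: the head word is on dout before rd_en."""
    fifo = deque(words)
    seen = []
    for rd_en in rd_en_pattern:
        if rd_en and fifo:
            seen.append(fifo.popleft())  # rd_en acknowledges the visible word
    return seen


if __name__ == "__main__":
    print(read_standard([10, 20, 30], [1, 1, 1]))  # ([None, 10, 20], 30)
    print(read_fwft([10, 20, 30], [1, 1, 1]))      # [10, 20, 30]
```

In the standard case the consumer is always one word behind and the last word stays in the output register until another read happens, which resembles the off-by-one described in point 3. It would not, on its own, produce a burst of old values.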

Antti

From: John McCaskill on 22 Dec 2009 00:02

On Dec 21, 3:12 pm, Antti <antti.luk...(a)googlemail.com> wrote:
> On Dec 21, 10:21 pm, Peter Alfke <al...(a)sbcglobal.net> wrote:
>
>
>
> > On Dec 21, 11:58 am, Antti <antti.luk...(a)googlemail.com> wrote:
>
> > > On Dec 21, 9:50 pm, Peter Alfke <al...(a)sbcglobal.net> wrote:
>
> > > > On Dec 21, 9:30 am, Antti <antti.luk...(a)googlemail.com> wrote:
>
> > > > > On Dec 21, 7:20 pm, Ed McGettigan <ed.mcgetti...(a)xilinx.com> wrote:
>
> > > > > > On Dec 21, 3:01 am, Antti <antti.luk...(a)googlemail.com> wrote:
>
> > > > > > > On Dec 21, 12:56 pm, Symon <symon_bre...(a)hotmail.com> wrote:
>
> > > > > > > > Antti wrote:
>
> > > > > > > > > Xilinx Coregen FIFO, dual clock, most options disable, only FULL EMPTY
> > > > > > > > > flags present.
>
> > > > > > > > > signals at input correct, as expected (checked with ChipScope)
> > > > > > > > > signals at output:
> > > > > > > > > - double value
> > > > > > > > > - missing 1, 2 or 3 values
> > > > > > > > > - FIFO will read out random number of OLD entries, this could be 4
> > > > > > > > > values, or 50% of the FIFO old values
>
> > > > > > > > I know you will have read this.
>
> > > > > > > > Can you think of any reason why the Xilinx work-around wouldn't work
> > > > > > > > because of your specific implementation? It seems to have different
> > > > > > > > work-arounds depending on whether the read clock is faster or slower
> > > > > > > > than the write clock. Do your clocks change frequency?
>
> > > > > > > > Are you sure your clocks don't have any glitches? The reset also?
> > > > > > > > Power's OK? Is your office made of Cobalt 60?
>
> > > > > > > > HTH., Syms.
>
> > > > > > > 1) I entered the clock figures in the FIFO16 implementation, but the
> > > > > > > error also happens with BRAM-based FIFOs that do not need workarounds
> > > > > > > 2) Clocks DO NOT CHANGE ever, one is MGT recovered clock 125MHz write,
> > > > > > > one is PLB clock 62.5MHz read
> > > > > > > 3) Power OK? Well the problem happens at 2 different sites, hm yes it
> > > > > > > could still be a power problem
>
> > > > > > > 4) My office is not of Cobalt 60, ... and its cold here too
>
> > > > > > > Antti
>
> > > > > > Are you sure that this is a FIFO issue and not something else? Some
> > > > > > things to think about.
>
> > > > > > 1) The recovered clock from the MGT is a bit noisy as it moves as the
> > > > > > CDR moves. Why are you using this instead of the REFCLK source?
>
> > > > > > 2) It seems like you have a PLB core that is reading from the FIFO,
> > > > > > could the problem be in this?
>
> > > > > > Ed McGettigan
> > > > > > --
> > > > > > Xilinx Inc.
>
> > > > > Well the MGT datapath and clock system is not done by me, and the guy
> > > > > says it is OK all the way it is connected.
>
> > > > > yes, it is very hard to believe that all THREE types of coregen
> > > > > FIFOs fail with about the same symptoms, but in all
> > > > > 3 cases ChipScope sees correct data going into the FIFO, and trash coming out
>
> > > > > the system can span up to 100 boards, all synced to the master unit; the
> > > > > local refclk is not fully in sync with the clock of
> > > > > the master unit, so I see no way to use this clock to synchronise the
> > > > > FIFO?
>
> > > > > Antti
> > > > > PS I just received an attempt to collect the reward, using a non-
> > > > > Xilinx FIFO implementation; I will let you all know
> > > > > the test results
>
> > > > Antti
> > > > If I remember right (I am no longer at Xilinx) the FIFO is NOT
> > > > designed for unequal data width of write and read. (Reason: possible
> > > > ambiguity of Full and EMPTY)
> > > > Since you use two clocks that are roughly 2:1 in frequency, I hope
> > > > that you do not try to have double width on one of the ports.
> > > > The FIFO must have the same width on both ports. You must design the
> > > > width conversion outside the FIFO. That little circuit will be
> > > > synchronous and thus quite simple.
> > > > Peter Alfke
>
> > > well the FIFO is 9b in 9b out so it should work?
> > > at least this is what i hoped...
>
> > > we did not suspect the FIFO as the problem at first
> > > so spent a LOT of time looking for the problem AROUND the FIFOs
> > > but.. at least based on what I can see from CS snapshots on FIFO
> > > inputs and outputs, the only explanation I have is that the FIFOs
> > > are just going mad,
>
> > > of course one option is that its me doing, but i have someone
> > > who is in better shape looking over the code as well, and he
> > > sees no issues there either. I know the FIFOs should work
> > > so there must be some explanation, but so far failing to see it.
>
> > > Antti
> > > PS thank you Peter for the response
>
> > OK, Antti,
> > so you have the same port width, but one clock is about twice as fast
> > as the other.
> > How do you stop the 125 MHz write clock from filling up the FIFO,
> > since you read at only 62 MHz ?
> > I hope you are not gating the clock, but rather run it continuously
> > and use WE to stop the writing.
> > Yes, many of these suggestions are well below your level, but stupid
> > problems need stupid investigations.
> > Cheers
> > Peter
>
> I am a level below ground right now; the project is just driving me nuts,
> slowly.
> To work for months, and end up with Xilinx saying:
> The man who could have helped you, left Xilinx last friday. Your
> situation is unsupportable.
> Well we got out of that situation.
> To end up in the new ones.
>
> The FIFO is never over filled by design.
> The fiber link is 99% IDLE sending usually only short 10byte packets
> over the link.
>
> For testing I generate 10-byte packets with the MOUSE, so about 1 per
> second, so there is no doubt
> the FIFO is never near full at all.
>
> Last results:
> - ALL 3 types of Xilinx FIFOs: same style of errors, about the same error
> rate
> - VHDL FIFO sent by a CAF reader, uses Gray counters: about TEN TIMES
> FEWER errors than the Xilinx implementation, but still all the different types
> of error did occur: missing values, and the FIFO outputting a large chunk of
> OLD values, that is, the read pointer changing by some random value
>
> again, I did not design the MGT clocking and the overall MGT
> subsystem; the people who did are either unreachable or unable to
> provide any help beyond saying that the implementation (connection of
> the FIFO) is done properly. That is also what I have figured out so far,
> but.. well, somewhere there must be a problem.
>
> Antti

Hello Antti,

With four different FIFOs all failing, it is not likely that they are
the source of the problem, just where the symptoms are showing up, as
if you did not already know that.

If you still want suggestions, here are a few.

First, I always consider having an error condition I can trigger on to
be worth its weight in gold and you apparently have one in the FIFO.
Put in ChipScope with multiple ILAs observing one of the FIFOs that
you have source code for. Use whatever you are currently triggering
on to trigger the other ILAs. Put one on the write clock domain, and
one on the read clock domain. Have them look at all of the IOs, as
well as the counters and other logic in the FIFO. I doubt that you
will find a problem with the FIFO, but something will look wrong and
give you a clue to follow.

Also use separate ILAs to watch the read and write clocks. I am
always suspicious of IO clocks, I have seen too many problems with
them. If one of those clocks is having a problem, and you are using
that clock as the clock for the ILA, you will not see the clock
problem with that ILA. Since you are using the recovered clock instead
of the reference clock (which you can do, and is how we do it), I
would pay extra attention to it. Oversample the read and write
clocks by either using one faster clock, or multiple ILAs running on
multiple phases of a faster clock. On a Virtex-4FX, we have multiple
MGTs/EMACs running GigE. We use the 125 MHz reference clock instead
of the recovered clock so we only have one 125 MHz clock to deal
with. We feed it through a PMCD to generate the 62.5 MHz clock so
that they are not asynchronous. That gives us a bit less to have to
deal with.

Do you have access to a digital storage oscilloscope? If so, run the
ILA trigger out of the FPGA and use that to trigger the scope. Use it
to look at the clocks and power supplies, and anything else that the
other tests turned up.

Use the timing analyzer to look for unconstrained paths. Look for any
cross clock domain buses that have more than a cycle of skew on them.
I have not seen that cause problems yet, but I use FROM-TO constraints
to minimize skew to prevent a gray coded bus from having more than a
cycle of skew crossing domains and causing problems. I don't think it
is a high probability, but your symptoms remind me of the time we
wrote our own FIFO that had different read and write widths and
incremented the Gray code counter by two. That would cause two bits to
change at a time, and eventually that would cause it to fail.
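
To illustrate the Gray code point, here is a minimal Python sketch. It is not taken from any of the FIFOs discussed: `WIDTH` and all function names are made up for the demo, and the skew model simply assumes that each changing bit may be captured either before or after its transition. With a step of one, the receiving domain can only ever see the old or the new pointer. With a step of two, two bits change and the intermediate captures decode to values that are neither.

```python
WIDTH = 4                      # pointer width in bits (assumed for the demo)
MASK = (1 << WIDTH) - 1


def bin_to_gray(n: int) -> int:
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def possible_samples(old: int, new: int) -> set:
    """Every word a register could capture while the bus moves old -> new,
    if each changing bit is seen independently as old or new."""
    diff = old ^ new
    changing = [1 << i for i in range(WIDTH) if diff & (1 << i)]
    samples = set()
    for combo in range(1 << len(changing)):
        value = old
        for j, bit in enumerate(changing):
            if combo & (1 << j):
                value ^= bit
        samples.add(value)
    return samples


def decoded_samples(n: int, step: int) -> set:
    """Binary pointer values the other clock domain might decode."""
    old = bin_to_gray(n & MASK)
    new = bin_to_gray((n + step) & MASK)
    return {gray_to_bin(s) for s in possible_samples(old, new)}


if __name__ == "__main__":
    print(sorted(decoded_samples(2, 1)))  # [2, 3]
    print(sorted(decoded_samples(2, 2)))  # [2, 3, 4, 5]
```

In the step-of-two case the pointer moves from 2 to 4, yet the other domain can momentarily decode 5, a location that has not been written yet. The same mechanism applies if bus skew across the clock domains exceeds one cycle, even with a correct step of one.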

Good luck, and remember that if it were easy, it would not be called
hardware.

John McCaskill
www.FasterTechnology.com
