From: Peter Alfke on
I only claimed that you lose 80% of the performance improvement of the
next generation (I mean differentially), not 80% of the whole
performance.
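A short Python sketch with made-up numbers (the function name and figures are hypothetical, not Peter's) makes the distinction concrete. Losing 80% of a generation's *improvement* still leaves the design faster than before, which is very different from being 80% slower overall.

```python
def retained_performance(base, gain_pct, lost_pct):
    """Performance left when a design gives up lost_pct percent of the
    *improvement* a new generation offers, rather than lost_pct percent of
    its total performance. All numbers here are hypothetical."""
    improvement = base * gain_pct / 100  # what the new generation adds
    return base + improvement * (100 - lost_pct) / 100


# Hypothetical: the next generation is 50% faster and a generic design
# forgoes 80% of that gain.
print(retained_performance(100, 50, 80))  # 110.0, not 100 * 0.2 = 20
```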

This debate can go on and on. I suppose we both made our points...
Peter

On Oct 31, 12:33 pm, "KJ" <Kevin.Jenni...(a)Unisys.com> wrote:
> To take an example, and using your numbers, are you suggesting that the
> performance of a Xilinx DDR controller implemented using the Wishbone
> interface would be 80% slower than the functionally identical DDR
> controller that Xilinx has? If so, why is that? If not then what
> point were you trying to make?

From: Peter Alfke on
KJ, you like standards.
We just finished implementing PCI Express. When I look at the complexity
of that standard, I just cringe. I cannot fathom why one needs so much
stuff to communicate data. But then I am an old, frugal, and basic guy who
believes in simplicity.
Talking about a FIFO, what other standard interface do you want, except
data-in, data-out, 2 clocks, 2 enables, 4 flags and perhaps a reset?
Isn't that about as generic as it can get? Why would Altera do it
differently, except that they don't have a hard-coded 550 MHz one...
:-(
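The FIFO port list above maps onto a small behavioral model. A minimal Python sketch, assuming the four flags are full, empty, almost-full and almost-empty (the class, method and threshold names are hypothetical), with each clock domain modeled only as its own method and no synchronization latency between them:

```python
from collections import deque


class FifoModel:
    """Behavioral sketch of a FIFO with data in/out, two clocks, two enables,
    four status flags and a reset. Each clock domain is one method; real
    dual-clock FIFOs add synchronization latency that is not modeled here."""

    def __init__(self, depth=16, almost_full_gap=2, almost_empty_gap=2):
        self.depth = depth
        self.almost_full_gap = almost_full_gap    # hypothetical threshold
        self.almost_empty_gap = almost_empty_gap  # hypothetical threshold
        self.reset()

    def reset(self):
        self.mem = deque()

    # The four flags, derived from the current fill level.
    @property
    def full(self):
        return len(self.mem) == self.depth

    @property
    def empty(self):
        return not self.mem

    @property
    def almost_full(self):
        return len(self.mem) >= self.depth - self.almost_full_gap

    @property
    def almost_empty(self):
        return len(self.mem) <= self.almost_empty_gap

    def write_clock_edge(self, wr_en, data_in):
        """One write-clock edge: store data_in if enabled and not full."""
        if wr_en and not self.full:
            self.mem.append(data_in)

    def read_clock_edge(self, rd_en):
        """One read-clock edge: return data_out if enabled and not empty."""
        if rd_en and not self.empty:
            return self.mem.popleft()
        return None
```

In this sketch a write while full or a read while empty is simply ignored; some hardware FIFOs also report such overflow or underflow, which is left out here.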
I vote for smarter synthesis tools that interpret your intentions in
the best possible way.
Peter Alfke

On Oct 31, 12:33 pm, "KJ" <Kevin.Jenni...(a)Unisys.com> wrote:
> Peter Alfke wrote:
> > Real progress comes from better integration of popular functions.
> > That's why we now include "hard-coded" FIFO and ECC controllers in the
> > BlockRAM, Ethernet and PCIe controllers, multi-gigabit transceivers,
> > and microprocessors.
> None of that is precluded, I'm just saying that I haven't heard why it
> could not be accomplished within a standard framework. Why would the
> entity (i.e. the interface) for brand X's FIFO with ECC, Ethernet,
> blah, blah, blah, not use a standard user side interface in addition to
> the external standards? Besides facilitating movement (which is not
> the only concern) it promotes ease of use in the first place.
>
> > Clock control with DCMs and PLLs, as well as
> > configurable 75-ps incremental I/O delays are lower-level examples.
> I agree, those are good examples of some of the easiest things that
> could have a standardized interface... although I don't think you
> really agree with my reading of what you wrote ;)
>
> > These features increase the value of our FPGAs, but they definitely are
> > not generic.
> I said standardized, not 'generic'. I was discussing the interface to
> that nifty wiz bang item and saying that the interface could be
> standardized, the implementation is free to take as much advantage of
> the part as it wishes.
>
> > If a user wants to treat our FPGAs in a generic way, so that the design
> > can painlessly be migrated to our competitor, all these powerful,
> > cost-saving and performance-enhancing features (from either X or A)
> > must be avoided. That negates 80% of any progress from generation to
> > generation. Most users might not want to pay that price.
> My point was to agree on a standard interface for given functionality
> not some dumbed down generic vanilla implementation of that function.
>
> To take an example, and using your numbers, are you suggesting that the
> performance of a Xilinx DDR controller implemented using the Wishbone
> interface would be 80% slower than the functionally identical DDR
> controller that Xilinx has? If so, why is that? If not then what
> point were you trying to make?
>
> > And remember, standards are nice and necessary for interfacing between
> > chips, but they always lag the "cutting edge" by several years.
> I don't think any of the FPGA vendors target only the 'cutting edge'
> designs. I'm pretty sure that most of their revenue and profit comes
> from designs that are not 'cutting edge' so that would give you those
> 'several years' to get the standardized IP in place.
>
> > Have you ever attended the bickering at a standards meeting?...
> Stop bickering so much. The IC guys cooperate and march to the
> drumbeat of the IC roadmap whether they think it is possible or not at
> that time (but also recognizing what the technology hurdles to get
> there are). There is precedent for cooperation in the industry.
>
> > Cutting edge FPGAs will become ever less generic.
> Again, my point was standardization of the entity of the IP, not
> whether it is 'generic'.
>
> > That's a fact of life, and it helps you build better and less costly
> > systems.
> But not supported by anything you've said here. Again, my point was
> for a given function, why can't the interface to that component be
> standardized? Provide an example to bolster your point (as I've
> suggested with the earlier comments regarding the Wishbone/Xilinx DDR
> controller example).
>
> KJ

From: Ray Andraka on
KJ,

This is actually a fairly common usage model for the Xilinx dual-port
RAMs. It lets you, for example, store two words per clock on one port and
read them one word per clock on the opposite port, perhaps at a faster
clock rate. The data width and address width vary inversely, so that
there are always 18K or 16K bits in the memory (18K for the widths that
include the parity bits). For example, if you set one port for 36-bit
width, that port has a depth of 512 words. If you then set the other
port for 18-bit width, it has a 1K depth, and the extra address bit (the
extra bits are added at the LSBs) essentially selects the low or high
half of the 36-bit word for access through the 18-bit port. Similarly,
a 9-bit-wide port is 2K deep and accesses a 9-bit slice of that 36-bit
word for each access, with the slice selected by the 2 LSBs of the
9-bit-wide port's address.
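The mixed-width addressing described above can be modeled as one flat bit array that each port slices at its own width. A minimal Python sketch under stated assumptions: it models only the 16K data bits (so the 36/18/9-bit ports become 32/16/8 bits; the parity bits are assumed to map the same way in a separate 2K-bit array and are omitted), and it assumes the lowest narrow-port address selects the least significant slice. The class and method names are hypothetical; check the device user guide for the actual bit ordering.

```python
TOTAL_BITS = 16 * 1024  # data bits only; parity bits are not modeled


class MixedWidthRam:
    """Behavioral sketch of a dual-port RAM whose ports see the same bit
    array at different widths. Clocks and enables are not modeled."""

    def __init__(self, total_bits=TOTAL_BITS):
        self.bits = [0] * total_bits

    def depth(self, width):
        # Width and depth vary inversely: width * depth == total bits.
        if len(self.bits) % width:
            raise ValueError("width must divide the total bit count")
        return len(self.bits) // width

    def write(self, width, addr, value):
        if not 0 <= addr < self.depth(width):
            raise IndexError("address out of range")
        base = addr * width  # narrow-port address LSBs pick the slice
        for i in range(width):
            self.bits[base + i] = (value >> i) & 1

    def read(self, width, addr):
        if not 0 <= addr < self.depth(width):
            raise IndexError("address out of range")
        base = addr * width
        return sum(self.bits[base + i] << i for i in range(width))


ram = MixedWidthRam()
ram.write(32, 5, 0xDEADBEEF)    # one word through the wide port
print(hex(ram.read(16, 10)))    # 0xbeef: low half, extra LSB = 0
print(hex(ram.read(16, 11)))    # 0xdead: high half, extra LSB = 1
```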

I've found the easiest way to deal with the dual-port memories is to
instantiate the primitives. Xilinx has made it far easier with the
Virtex-4, which has a common BRAM element for all aspect ratios with
generics on it to define the width. Previously, you needed to
instantiate the specific primitive with the right aspect ratio on each
port. I found it easiest to develop a wrapper for the memory that uses
the width of the address and data to select the BRAM aspect ratio and
instantiates as many as are needed to obtain the data width; that way the
hard work is done just once. This is especially true with the older-style
primitives.
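The selection logic of such a wrapper can be sketched in a few lines of Python. This assumes the RAMB16-style aspect ratios 16Kx1, 8Kx2, 4Kx4, 2Kx9, 1Kx18 and 512x36, and a simple rule (hypothetical, not necessarily the one Ray's wrapper uses): take the widest ratio deep enough for the address width, then place BRAMs side by side for the data width. Memories deeper than one BRAM are out of scope.

```python
import math

# Assumed RAMB16-style aspect ratios as (depth, width). The 9-, 18- and
# 36-bit widths include the parity bits (18K bits); the others are 16K.
ASPECT_RATIOS = [(16384, 1), (8192, 2), (4096, 4), (2048, 9), (1024, 18), (512, 36)]


def plan_bram(addr_bits, data_width):
    """Pick an aspect ratio and BRAM count for a hypothetical memory wrapper.

    Chooses the widest aspect ratio whose depth still covers 2**addr_bits
    words, then places enough BRAMs side by side to reach data_width.
    Returns (port_width, bram_count).
    """
    depth = 2 ** addr_bits
    widths = [w for d, w in ASPECT_RATIOS if d >= depth]
    if not widths:
        # Deeper memories need several BRAMs plus address decoding,
        # which this sketch does not cover.
        raise ValueError("memory deeper than one BRAM")
    width = max(widths)
    return width, math.ceil(data_width / width)


print(plan_bram(9, 64))   # (36, 2): 512 deep, two 36-bit BRAMs side by side
print(plan_bram(12, 8))   # (4, 2): 4K deep forces the 4-bit ratio
```

The widest-fitting ratio minimizes the BRAM count for a given depth in this simple model; a real wrapper might also weigh unused bits or mix ratios.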

From: Jim Granville on
Peter Alfke wrote:

> KJ, you like standards.
> We just finished implementing PCI Express. When I look at the complexity
> of that standard, I just cringe. I cannot fathom why one needs so much
> stuff to communicate data. But then I am an old, frugal, and basic guy who
> believes in simplicity.

Could this have been made any faster by relaxing some of the standard?
(And would that have a cost, like interoperability?)

-jg

From: Peter Alfke on
There is also a way to use the two ports as two completely independent
half-size RAMs, by making sure that the two ports never overlap their
addressing. The division does not even have to be 50:50, and the widths
can differ, as can the clocks and enables. There is some room for
creativity...
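Whether two port regions really are independent comes down to whether their bit ranges overlap. A minimal Python sketch, assuming the same flat-bit-array view of a 16K-bit data array (parity ignored) and a hypothetical 75:25 split with different widths on the two ports; the function names are illustrative only:

```python
DATA_BITS = 16 * 1024  # assumed data capacity of one BRAM, parity ignored


def bit_range(width, first_addr, n_words):
    """Half-open range of array bits [start, end) that a port of the given
    width touches when it uses n_words consecutive addresses from first_addr."""
    start = first_addr * width
    end = start + n_words * width
    if first_addr < 0 or n_words < 0 or end > DATA_BITS:
        raise ValueError("region falls outside the RAM")
    return start, end


def regions_disjoint(a, b):
    """True if two (start, end) bit ranges never overlap."""
    return a[1] <= b[0] or b[1] <= a[0]


# Hypothetical 75:25 split: port A (16 bits wide) owns words 0..767,
# port B (8 bits wide) owns words 1536..2047.
port_a = bit_range(16, 0, 768)     # (0, 12288)
port_b = bit_range(8, 1536, 512)   # (12288, 16384)
print(regions_disjoint(port_a, port_b))  # True
```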
Peter Alfke

On Oct 31, 6:49 pm, Ray Andraka <r...(a)andraka.com> wrote:
> KJ,
>
> This is actually a fairly common usage model for the Xilinx dual-port
> RAMs. It lets you, for example, store two words per clock on one port and
> read them one word per clock on the opposite port, perhaps at a faster
> clock rate. The data width and address width vary inversely, so that
> there are always 18K or 16K bits in the memory (18K for the widths that
> include the parity bits). For example, if you set one port for 36-bit
> width, that port has a depth of 512 words. If you then set the other
> port for 18-bit width, it has a 1K depth, and the extra address bit (the
> extra bits are added at the LSBs) essentially selects the low or high
> half of the 36-bit word for access through the 18-bit port. Similarly,
> a 9-bit-wide port is 2K deep and accesses a 9-bit slice of that 36-bit
> word for each access, with the slice selected by the 2 LSBs of the
> 9-bit-wide port's address.
>
> I've found the easiest way to deal with the dual-port memories is to
> instantiate the primitives. Xilinx has made it far easier with the
> Virtex-4, which has a common BRAM element for all aspect ratios with
> generics on it to define the width. Previously, you needed to
> instantiate the specific primitive with the right aspect ratio on each
> port. I found it easiest to develop a wrapper for the memory that uses
> the width of the address and data to select the BRAM aspect ratio and
> instantiates as many as are needed to obtain the data width; that way the
> hard work is done just once. This is especially true with the older-style
> primitives.