From: Eric on
Hi All,

I've recently published a paper exploring how to implement memories
with multiple read and write ports on existing FPGAs. I figured it
might be of interest to some.

Summary, paper, slides, and example code are here:
http://www.eecg.utoronto.ca/~laforest/multiport/index.html

There are no patents or other additional IP encumbrances on the code.
If you have any comments or other feedback, I'd like to hear it.

Eric LaForest
PhD student, ECE Dept.
University of Toronto
http://www.eecg.utoronto.ca/~laforest/
From: Eric on
On Apr 21, 6:26 am, John_H <newsgr...(a)johnhandwork.com> wrote:
> Could you mention here or on your page what you mean by
> "multipumping?"  If you mean time multiplexed access, I can see why
> multipumping is bad.  [The "pure logic" approach also isn't obvious.]

Yes, multipumping is time-multiplexing. It's not entirely bad, as
there may be speed margin left over that you can trade for area using
multipumping. It is also useful if you have few ports or low speed
requirements.
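To make that trade concrete, here is a minimal Python behavioral sketch of multipumping. It assumes a single physical RAM port clocked `pump` times faster than the system clock, so that it can serve `pump` logical ports per system cycle. The class and method names are hypothetical, and the model ignores the clock-domain and timing issues a real design has to handle.

```python
from typing import List, Optional, Tuple

# One logical-port request: ("r", addr, None) or ("w", addr, data).
Request = Tuple[str, int, Optional[int]]


class MultipumpedMemory:
    """Hypothetical model: one physical RAM port, time-multiplexed
    among several logical ports within each external clock cycle."""

    def __init__(self, depth: int, pump: int) -> None:
        self.mem = [0] * depth
        self.pump = pump  # internal (fast) cycles per external cycle

    def external_cycle(self, requests: List[Request]) -> List[Optional[int]]:
        # The physical port does one access per internal cycle,
        # so at most `pump` logical ports fit into one external cycle.
        if len(requests) > self.pump:
            raise ValueError("more logical ports than internal cycles")
        results: List[Optional[int]] = []
        for op, addr, data in requests:  # served strictly in order
            if op == "w":
                self.mem[addr] = data
                results.append(None)
            else:
                results.append(self.mem[addr])
        return results


mem = MultipumpedMemory(depth=8, pump=2)
print(mem.external_cycle([("w", 3, 42), ("r", 3, None)]))  # [None, 42]
```

Because requests are served in order, a read scheduled after a write in the same external cycle sees the new data; that ordering is a choice of this sketch. The cost is that the RAM must run `pump` times faster than the system, which is why the approach only pays off with few ports or spare speed margin.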

Pure logic refers simply to using only the reconfigurable fabric of
the FPGA to implement the memory. It's not a very scalable
solution. :)

> Do you update the LVT in the same way I might update the RAM value in
> a many-write BlockRAM?

No. We've had several independent mentions of using XOR, but we hadn't
heard of it at the time. We'll be looking at it in the future. The LVT
is implemented in pure logic and has multiple read and write ports
which can all work simultaneously. It remains practical because it is
very narrow (log2 of the number of write ports, rounded up, instead of
the full data word width).
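As a rough illustration of why the narrow LVT matters, this Python sketch counts only the storage bits that would live in the FPGA fabric. `lvt_width` computes ceil(log2(M)) for M write ports with exact integer arithmetic. The function names and the example sizes (1024 words, 32-bit data, 4 write ports) are arbitrary assumptions, and the count ignores the multiplexing logic that any multiported pure-logic memory also needs.

```python
def lvt_width(write_ports: int) -> int:
    """Bits per LVT entry: ceil(log2(write_ports)), computed exactly."""
    if write_ports < 1:
        raise ValueError("need at least one write port")
    return (write_ports - 1).bit_length()


def fabric_storage_bits(depth: int, data_width: int, write_ports: int):
    """Return (bits held by the LVT, bits a full-width pure-logic memory needs)."""
    return depth * lvt_width(write_ports), depth * data_width


lvt_bits, full_bits = fabric_storage_bits(depth=1024, data_width=32, write_ports=4)
print(lvt_bits, full_bits)  # 2048 32768
```

With 4 write ports each LVT entry is 2 bits instead of 32, so the fabric holds 16 times fewer storage bits; the data words themselves sit in Block RAMs.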

> Aside from wide data, however, I don't see (without going into the
> attachments on that page) how updating the LVT is any different than
> updating the memory in the first place.

The LVT manages a bunch of Block RAMs, each with only one write port
and one read port, making them all behave as a single multiported
memory. The LVT simply keeps track of which port last wrote to each
address. Since the actual data is stored in Block RAMs, the end result
is faster and more area efficient than other approaches.
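A minimal Python sketch of the LVT bookkeeping just described. It assumes all write ports update the table in the same cycle and that no two ports write the same address at once (this model simply rejects that case). The class name and the reset value (port 0 owning every address) are hypothetical choices for illustration, not taken from the paper.

```python
from typing import List, Tuple


class LiveValueTable:
    """Hypothetical model: remembers, per address, which write port wrote it last."""

    def __init__(self, depth: int) -> None:
        # Assumed reset state: every address reads as written by port 0.
        self.last_writer = [0] * depth

    def write_cycle(self, writes: List[Tuple[int, int]]) -> None:
        """Apply one cycle of (write_port, addr) pairs, all concurrently."""
        addrs = [addr for _, addr in writes]
        if len(addrs) != len(set(addrs)):
            raise ValueError("two ports writing one address is not modelled")
        for port, addr in writes:
            # Only the port number is stored, never the data word.
            self.last_writer[addr] = port

    def lookup(self, addr: int) -> int:
        """Which write port's Block RAM bank holds the live value for addr."""
        return self.last_writer[addr]


lvt = LiveValueTable(depth=8)
lvt.write_cycle([(0, 2), (1, 5)])  # ports 0 and 1 write in the same cycle
print(lvt.lookup(2), lvt.lookup(5))  # 0 1
```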

Please let me know if you have more questions.

Eric
From: rickman on
On Apr 22, 10:52 am, Eric <eric.lafor...(a)gmail.com> wrote:

I guess I don't understand what you are accomplishing with this.
Block RAMs in FPGAs are almost always multiported. Maybe not N-way
ported, but you assume they are single ported when they are dual
ported.

Can you give a general overview of what you are doing without using
jargon? I took a look and didn't get it at first glance.

Rick
From: Eric on
On Apr 22, 12:36 pm, rickman <gnu...(a)gmail.com> wrote:
> I guess I don't understand what you are accomplishing with this.
> Block rams in FPGAs are almost always multiported.  Maybe not N way
> ported, but you assume they are single ported when they are dual
> ported.

But what if you want more ports, say 2-write/4-read, without wait
states?

I assume them to be "simple dual-port", which means one write port
and one read port, both operating concurrently. It is also possible to
run them in "true dual-port" mode, where each port can either read or
write in a cycle. Some of the designs in the paper do that.

> Can you give a general overview of what you are doing without using
> jargon?  I took a look and didn't get it at first glance.

OK. Let me try:

Assume a big, apparently multiported memory of some given capacity and
number of ports. Inside it, I use a small multiported memory
implemented using only the fabric of an FPGA, which stores only the
number of the write port which wrote last to a given address. Thus
this small memory is of the same depth as the whole memory, but much
narrower, hence it scales better.

When you read at a given address from the big memory, internally you
use that address to look up which write port wrote there last, and use
that information to steer the read to the correct internal memory bank
which will hold the data you want. These banks are built up from
multiple Block RAMs so as to have one write port each, and as many
read ports as the big memory appears to have.
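The read and write paths just described can be sketched as a Python behavioral model. It assumes each internal RAM is simple dual-port (one write, one read), that reads in a cycle return the data from before that cycle's writes, and that no two write ports hit the same address in one cycle; the class and method names are hypothetical. Following the description above, write port w owns one bank made of N replicas so that each of the N read ports has its own copy, giving M x N RAMs in total (each of which may itself need several physical Block RAMs if it is deep or wide).

```python
from typing import List, Optional, Tuple


class LvtMultiportMemory:
    """Hypothetical model of an M-write/N-read memory built from
    one-write/one-read RAMs steered by a live value table (LVT)."""

    def __init__(self, depth: int, write_ports: int, read_ports: int) -> None:
        self.m, self.n = write_ports, read_ports
        # banks[w][r] is replica r of the bank owned by write port w.
        self.banks = [[[0] * depth for _ in range(read_ports)]
                      for _ in range(write_ports)]
        self.lvt = [0] * depth  # last write port for each address

    def ram_count(self) -> int:
        return self.m * self.n

    def cycle(self, writes: List[Optional[Tuple[int, int]]],
              read_addrs: List[int]) -> List[int]:
        """writes[w] is (addr, data) or None; read_addrs[r] is read port r's address."""
        if len(writes) != self.m or len(read_addrs) != self.n:
            raise ValueError("one entry per port expected")
        addrs = [req[0] for req in writes if req is not None]
        if len(addrs) != len(set(addrs)):
            raise ValueError("write conflicts are not modelled")
        # Read path: the LVT picks the bank, read port r uses replica r.
        out = [self.banks[self.lvt[a]][r][a] for r, a in enumerate(read_addrs)]
        # Write path: port w writes all replicas of its own bank, then the LVT.
        for w, req in enumerate(writes):
            if req is None:
                continue
            addr, data = req
            for replica in self.banks[w]:
                replica[addr] = data
            self.lvt[addr] = w
        return out


mem = LvtMultiportMemory(depth=16, write_ports=2, read_ports=4)
mem.cycle([(5, 111), (9, 222)], [0, 0, 0, 0])
print(mem.cycle([None, (5, 333)], [5, 9, 5, 0]))  # [111, 222, 111, 0]
print(mem.cycle([None, None], [5, 5, 5, 5]))      # [333, 333, 333, 333]
print(mem.ram_count())                            # 8
```

The second call shows the read-before-write choice of this sketch: address 5 still returns port 0's value in the cycle where port 1 overwrites it.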

The net result is a memory which appears to have multiple read and
write ports which can all work simultaneously, but which leaves the
bulk of the storage to Block RAMs instead of the FPGA fabric, which
makes for better speed and smaller area.

Does that help?

Eric
From: John_H on
On Apr 22, 1:55 pm, Eric <eric.lafor...(a)gmail.com> wrote:

I appreciate the elaboration here in the newsgroup.

The "true dual port" nature of the BlockRAMs allows one independent
address on each of the two ports with a separate write enable for each
port. The behavior of the BlockRAM can be modified to provide read
data based on the new write data, old data, or no change in the read
data value from last cycle (particularly helpful for multi-pumping).
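For readers unfamiliar with those three behaviors, here is a small Python sketch of a single BlockRAM port under each read-during-write mode. The enum member names mirror the WRITE_FIRST, READ_FIRST, and NO_CHANGE values of Xilinx's WRITE_MODE attribute; the classes themselves are hypothetical and model only one port with a registered read output.

```python
from enum import Enum


class RdwMode(Enum):
    WRITE_FIRST = "new data"  # output shows the data being written
    READ_FIRST = "old data"   # output shows the contents before the write
    NO_CHANGE = "hold"        # output keeps last cycle's value during a write


class BramPort:
    """Hypothetical single-port model with registered read output."""

    def __init__(self, depth: int, mode: RdwMode) -> None:
        self.mem = [0] * depth
        self.mode = mode
        self.dout = 0

    def clock(self, addr: int, we: bool, din: int = 0) -> int:
        if not we:
            self.dout = self.mem[addr]
        else:
            old = self.mem[addr]
            self.mem[addr] = din
            if self.mode is RdwMode.WRITE_FIRST:
                self.dout = din
            elif self.mode is RdwMode.READ_FIRST:
                self.dout = old
            # NO_CHANGE: dout is deliberately left untouched.
        return self.dout


for mode in RdwMode:
    port = BramPort(depth=4, mode=mode)
    port.clock(0, we=True, din=9)  # store 9 at address 0
    port.clock(1, we=False)        # read address 1 (still 0)
    print(mode.name, port.clock(0, we=True, din=4))
# WRITE_FIRST 4, READ_FIRST 9, NO_CHANGE 0
```

NO_CHANGE is the mode that suits multipumping: the result of an earlier read survives while the same port performs a write in a later internal cycle.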

For an M write, N read memory, your approach appears to need M x (N+1)
memories since you can have M writes all happening at the same time N
accesses are made to the same "most recently written" memory. Please
correct me if I'm wrong. This is the same number of memories required
with the XOR approach but without the LVT overhead. The time delay in
reading the LVT and multiplexing the memories feels like it would be
cumbersome. While this might not add "wait states," it appears the
system would not be able to run terribly quickly. XORs are pretty
quick.
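For comparison, the XOR approach mentioned here can be modelled behaviorally in Python. This is a sketch under stated assumptions, not a reference design: each write port w owns a bank, a write stores the data XORed with the other banks' contents at that address, and a read XORs all banks, which cancels everything except the most recent data. The class name is hypothetical; the model assumes no two ports write the same address in one cycle, and it hides the hardware cost of reading the other banks during a write, which is what forces the banks to be replicated and adds a read-before-write step.

```python
from functools import reduce
from operator import xor
from typing import List, Optional, Tuple


class XorMultiportMemory:
    """Hypothetical behavioral model of an XOR-based multi-write memory."""

    def __init__(self, depth: int, write_ports: int) -> None:
        self.banks = [[0] * depth for _ in range(write_ports)]

    def read(self, addr: int) -> int:
        # XOR of all banks: the other banks' terms cancel out.
        return reduce(xor, (bank[addr] for bank in self.banks), 0)

    def cycle(self, writes: List[Optional[Tuple[int, int]]]) -> None:
        """writes[w] is (addr, data) or None for write port w."""
        if len(writes) != len(self.banks):
            raise ValueError("one entry per write port expected")
        addrs = [req[0] for req in writes if req is not None]
        if len(addrs) != len(set(addrs)):
            raise ValueError("write conflicts are not modelled")
        updates = []
        for w, req in enumerate(writes):
            if req is None:
                continue
            addr, data = req
            # In hardware this needs a read of every other bank at addr.
            others = reduce(
                xor, (b[addr] for i, b in enumerate(self.banks) if i != w), 0)
            updates.append((w, addr, data ^ others))
        for w, addr, value in updates:  # commit all writes together
            self.banks[w][addr] = value


mem = XorMultiportMemory(depth=8, write_ports=2)
mem.cycle([(3, 0xAA), (4, 0x55)])
mem.cycle([None, (3, 0x0F)])  # port 1 overwrites what port 0 wrote
print(hex(mem.read(3)), hex(mem.read(4)))  # 0xf 0x55
```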

There are always more ways to approach a problem than any one group
can come up with. Kudos on your effort to bring a better approach to
a tough system-level issue for difficult designs.