From: Frank van Eijkelenburg on
Hi,

I have a custom made PCIe board with a Virtex 5 FPGA on which I
implemented a DMA unit which uses the PCIe endpoint block plus v1.14.
I also implemented simple read/write operations from the PC to the
board (the board responds with completion TLPs). The read/write
operations are working, but DMA is not.

The board is installed in a PC running 64-bit Windows 7. An
application allocates virtual memory and passes the memory block to
the driver. The driver locks the memory and converts the virtual
addresses into physical addresses. These physical addresses are
written to the FPGA.

When I start a DMA operation, I can see in ChipScope the correct
physical addresses in the TLP header. However, I do not see the
correct values in the allocated memory. What can I do to check where
it is going wrong?

Another question is about the memory request TLPs. What should I use,
32-bit or 64-bit write requests? Or do I have to check at runtime
whether the physical memory address is below or above 4 GB (and use
32-bit or 64-bit requests respectively)?


Thanks in advance,

Frank
From: maxascent on
I have done a similar design myself but using Windows 7, 32-bit. I use
32-bit TLPs and have had no problems with the design. BTW, I used
WinDriver to generate the device driver.

Jon

From: Michael S on
On Jul 1, 5:03 pm, Frank van Eijkelenburg <fei.technolut...(a)gmail.com>
wrote:
>
> Another question is about the memory request TLPs. What should I use,
> 32-bit or 64-bit write requests? Or do I have to check at runtime
> whether the physical memory address is below or above 4 GB (and use
> 32-bit or 64-bit requests respectively)?
>
> Thanks in advance,
>
> Frank

Memory accesses below 4 GB have to use 3DW (= 32-bit address) TLP headers.
4DW TLP headers addressing memory below 4 GB are prohibited by the PCIe
standard, although they do happen to work on some chipsets, e.g.
the Intel 5000P/5000X series.

From: Charles Gardiner on
Frank van Eijkelenburg schrieb:
> Hi,
>
> I have a custom made PCIe board with a Virtex 5 FPGA on which I
> implemented a DMA unit which uses the PCIe endpoint block plus v1.14.
> I also implemented simple read/write operations from the PC to the
> board (the board responds with completion TLPs). The read/write
> operations are working, but DMA is not.
>
> The board is installed in a PC running 64-bit Windows 7. An
> application allocates virtual memory and passes the memory block to
> the driver. The driver locks the memory and converts the virtual
> addresses into physical addresses. These physical addresses are
> written to the FPGA.

How are you doing this? Normally, an application allocates a buffer using malloc()
or new and gets a handle to the driver using CreateFile(). You then use
WriteFile(hDevice, Buffer, ...), ReadFile(hDevice, Buffer, ...) or
DeviceIoControl() to initiate a transfer to/from the device. That's the
application side.

On the driver (kernel) side, I would strongly recommend that you write a KMDF-based
driver. Download the Windows WDK; all it costs is your email address. (You have to
log in over Microsoft Connect, last time I looked.) There are lots of examples
there, including for PCI(e)-based DMA. To (very quickly) summarise, your driver
requests the scatter/gather list describing the buffers above (see
WdfDmaTransactionInitializeUsingRequest() in the WDK API docs as a starting point)
and passes the elements to your hardware one by one, which then does DMA in or out.
With a call to WdfRequestComplete() the buffers are released by the kernel and your
application can reuse them or free them as required. (This is of course all
considerably more than a day's work, by the way.)

You do not have to explicitly lock down the buffer yourself. Windows does this for
you while the I/O request is active (from ReadFile()/WriteFile() in your app up to
WdfRequestComplete() in the driver).

>
> When I start a DMA operation, I can see in ChipScope the correct
> physical addresses in the TLP header. However, I do not see the
> correct values in the allocated memory. What can I do to check where
> it is going wrong?
>

In this case, I would first doubt whether the addresses are correct.

> Another question is about the memory request TLPs. What should I use,
> 32-bit or 64-bit write requests? Or do I have to check at runtime
> whether the physical memory address is below or above 4 GB (and use
> 32-bit or 64-bit requests respectively)?
>

The PCIe spec says: a transfer below 4 GB must use a 3 DWord header, and a transfer
at or above 4 GB must use a 4 DWord header, i.e. a 4 DWord header with
address[63:32] set to zero is invalid.

>
> Thanks in advance,
>
> Frank
From: Frank van Eijkelenburg on

The way it works is as follows:
- the application allocates the memory (malloc).
- a pointer to this memory is passed to the driver (custom-made
driver).
- the driver creates a scatter-gather list by using the
GetScatterGatherList method of the DMA_ADAPTER object.
- the driver writes each entry of the scatter-gather list (which
contains a physical address and length) to the FPGA.
- the FPGA receives data (through another interface) and writes this
data to the memory of the PC by DMA (it just generates write
requests).
- after writing the data the FPGA generates an interrupt over PCIe (not
working yet, but we know when the FPGA has finished a transaction).

I now understand that I have to check at runtime whether the physical
address is below or above 4 GB and use a 3 DW or 4 DW TLP header
respectively. I will change that in the FPGA and give it a try.

As for the addresses, these are correct. We ran the following test:
the application writes to the buffer through its virtual address, and
the driver reads the same memory through the physical addresses. The
driver reads back exactly what the application wrote.

Any other suggestions?

Frank