From: Nico Coesel on 13 Apr 2008 16:54

John Larkin <jjlarkin(a)highNOTlandTHIStechnologyPART.com> wrote:

>Hi,
>
>I'm working on a proposal to design a box that will control a
>scientific gadget. Our box will output frequency sweeps, arbitrary
>waveforms, a couple of dozen voltages that can be changed/ramped per
>user desires, and some discrete logic levels and triggers.
>
>One architecture would pack an Intel-cpu SBC and a custom board in a
>2U rack box. The SBC would talk gigabit ethernet to the customer's
>system and PCI to our board.
>
>Something like this, maybe:
>
>http://us.kontron.com/index.php?id=226&cat=527&productid=1726
>
>Our board would have a PCI interface driving a biggish FIFO, say 8k
>deep by 48 bits wide, inside an FPGA. A simple state machine/latch/mux
>thing repacks the 32-bit pci transfers into the input of the 48-bit
>wide fifo. The output side of the FIFO would be driving a fairly
>simple state machine; each fifo word has an opcode field and a data
>field, with different opcodes feeding various devices connected to the
>physics... dds synthesizers, ttl outputs, whatever. The state machine
>that unloads the fifo would run at 128 MHz, but one opcode is WAIT, so
>we can slow down operations to match the realtime needs of the
>experiment and reduce the average fifo feed rate.
>
>OK, we finally get to a question: If we run some flavor of Linux on
>the SBC, what's a good strategy for keeping the fifo loaded? Assuming
>that we have the recipe for an entire experimental shot in program
>ram, some tens of megabytes maybe, we could...
>
>1. Have the fifo logic interrupt the cpu when the fifo is, say, half
>empty. The isr would compute how empty the fifo actually is at that
>instant and set up a short dma transfer to top it off.
>
>2. A task (or isr) would be run periodically, a thousand times per
>second might work, and it would be responsible for topping off the
>fifo, either dma or maybe just poking in the data in a loop.
>
>3. Best, if possible: set up a single DMA transfer to do the entire
>shot. That involves a dma controller that understands that the target
>is sometimes busy, and retries after getting bounced. I know the pci
>bus has hooks for split transfers, but I don't know if standard
>Intel-type dma controllers can work in this mode.

With PCI there is no DMA the way DMA used to be. A lot of people get
confused here. PCI is about moving data between memory areas quickly.
The idea behind PCI is that you set up a transfer from one memory area
to another and are told when the transfer is done.

>4. If it's a dual-core cpu, is it hard (under Linux) to assign one cpu
>to just do the fifo transfers?
>
>5. Other ideas?

Yes. Make the card a PCI master. You can prepare a buffer, lock the
buffer for PCI access, tell the card where to fetch the data and off it
goes. All it takes to feed the next buffer into the card is an
interrupt when the current buffer is nearly done, so the driver can
prepare a new one. PCI is designed to do burst transfers. If you don't
use burst transfers, the bandwidth will decrease dramatically. Worse,
the CPU will have to wait for each transfer to finish, which consumes
huge amounts of CPU cycles.

-- 
Programming in Almere? E-mail nico(a)nctdevpuntnl (punt=.)
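The quoted post mentions a state machine that repacks 32-bit PCI transfers into 48-bit FIFO words. As a rough host-side illustration of that framing (not the actual design), the sketch below packs 48-bit words into a stream of 32-bit words and unpacks them again. The bit order (low bits first) and the function names `pack_48_to_32` and `unpack_32_to_48` are assumptions made for this example; the thread does not specify either.

```python
MASK48 = (1 << 48) - 1
MASK32 = (1 << 32) - 1


def pack_48_to_32(words48):
    """Pack 48-bit FIFO words into 32-bit bus words, low bits first.

    The bit ordering is an assumption for illustration only.
    """
    acc = 0      # bit accumulator holding not-yet-emitted bits
    nbits = 0    # number of valid bits currently in acc
    out = []
    for w in words48:
        if not 0 <= w <= MASK48:
            raise ValueError("word does not fit in 48 bits")
        acc |= w << nbits
        nbits += 48
        while nbits >= 32:
            out.append(acc & MASK32)
            acc >>= 32
            nbits -= 32
    if nbits:
        out.append(acc & MASK32)  # final word is zero-padded
    return out


def unpack_32_to_48(words32, count):
    """Inverse of pack_48_to_32: recover `count` 48-bit words."""
    acc = 0
    nbits = 0
    out = []
    for w in words32:
        acc |= (w & MASK32) << nbits
        nbits += 32
        while nbits >= 48 and len(out) < count:
            out.append(acc & MASK48)
            acc >>= 48
            nbits -= 48
    return out
```

With this framing, every two 48-bit FIFO words occupy exactly three 32-bit bus words, so a shot program with an even number of opcodes needs no padding.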
From: Joel Koltner on 13 Apr 2008 19:25

John,

"John Larkin" <jjlarkin(a)highNOTlandTHIStechnologyPART.com> wrote in message
news:6t1204lakho2cj19kit5ub50r5p96jg7cg(a)4ax.com...
> We could buy an FPGA pci soft core (or use one of the public ones) or
> even just use a PLX chip to handshake the PCI transactions for the
> fpga.

FYI, I've used the old PLX9054 (before PCI Express took over the world), and it
was a *very* nice chip. The board was, essentially, a frame grabber with 4GB
of DRAM going through an FPGA containing its own "2D slicing" DMA engine (so
that a camera looking at multiple logical "windows" could have each window
appear as a contiguous stream of pixels) which fed the DMA engine in the
PLX9054. From the end-user's perspective then, what would happen would be:

1) User would request a particular frame buffer, that would already have been
set up such that on the "local bus" (the address/data bus connecting the
PLX9054 and the FPGA) sequential addresses would grab the correct pixels. The
user would want that frame buffer transferred into a contiguous buffer in
their own user-mode memory space.
2) The device driver for the frame grabber would ask Windows for all the
*physical* addresses of that user's frame buffer, since of course in many
cases Windows had run off and used a large number of discontiguous physical
memory pages to create the user's (virtual) contiguous buffer.
3) For the benefit of the PLX9054, the device driver builds a "scatter-gather"
list in the PC's memory, where each list entry just contains information such
as the number of bytes to transfer, the physical address to transfer to, the
local bus address to transfer from, and whether or not this is the last entry
in the list.
4) The device driver writes to the appropriate control registers in the
PLX9054... and it does the rest! Poof! (An interrupt was generated when it
finished.)

In other words, the PLX9054 would start walking through the scatter-gather
list, automatically creating read requests on the local bus and write requests
on the PCI bus as needed, keeping its own internal FIFOs full (it had some
modest-sized ones... maybe 64 or 128 bytes? -- I've forgotten), and breaking
the write requests into multiple pieces as needed to keep the PCI bus protocol
happy. On quality motherboards, we got ~80MB/s, which was considered pretty
decent given the 33MHz/32 bit PCI bus architecture of the day.

It was really pretty impressive. The only caveat was that it couldn't
transfer more than 16MB or thereabouts in one complete setup, so in software
we just broke apart any larger transfers into multiple 16MB transfers (since
transferring 16MB took about 200ms anyway, the additional overhead of a few
microseconds setting up the next transfer was negligible).

I imagine the sequence of steps above is quite similar in Linux. Although
I've never written a Linux device driver, I've been told that they're actually
simpler in many ways than Windows device drivers are. If you end up using
Windows, it's absolutely worthwhile to drop the ~$3k or so to send the guy
who's going to write the device driver to the week-long classes by, e.g., OSR
to learn how to do so.

My main point here is that going with a chip such as those from PLX gives you
one heck of a lot of power that would otherwise take a LOT of time and effort
to implement yourself. Although for a high-volume project it probably makes
sense to go with a soft PCI Core for the FPGA, for low volumes I'm a big
believer in using someone else's "all in one" IC.

---Joel
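The scatter-gather steps in the post above can be sketched roughly as follows. This is only an illustration of the idea, not the PLX9054's real descriptor layout or any driver API: the `SgEntry` fields mirror the four items the post lists, while the 4096-byte page size, the names `build_sg_list` and `split_transfer`, and the merging of physically adjacent pages are assumptions added for the example.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096                  # assumed page size
MAX_TRANSFER = 16 * 1024 * 1024   # "16MB or thereabouts" per setup, per the post


@dataclass
class SgEntry:
    """Hypothetical list entry; field names are illustrative only."""
    byte_count: int   # number of bytes to transfer
    pci_addr: int     # physical address in host memory
    local_addr: int   # address on the card's local bus
    last: bool        # True for the final entry in the list


def build_sg_list(page_addrs, total_bytes, local_base=0, page_size=PAGE_SIZE):
    """Build a scatter-gather list from physical page addresses.

    Physically adjacent pages are merged into one entry (an optimisation
    assumed here, not something the post describes).
    """
    entries = []
    remaining = total_bytes
    local = local_base
    for addr in page_addrs:
        if remaining <= 0:
            break
        n = min(page_size, remaining)
        prev = entries[-1] if entries else None
        if prev is not None and prev.pci_addr + prev.byte_count == addr:
            prev.byte_count += n          # extend the previous entry
        else:
            entries.append(SgEntry(n, addr, local, False))
        local += n
        remaining -= n
    if remaining > 0:
        raise ValueError("not enough pages for the requested length")
    if entries:
        entries[-1].last = True
    return entries


def split_transfer(total_bytes, max_bytes=MAX_TRANSFER):
    """Split a large transfer into (offset, length) chunks of at most max_bytes."""
    chunks = []
    offset = 0
    while offset < total_bytes:
        n = min(max_bytes, total_bytes - offset)
        chunks.append((offset, n))
        offset += n
    return chunks
```

A driver following the post's approach would call something like `split_transfer` first, then build and start one list per chunk, waiting for the completion interrupt between chunks. At the quoted rate, 16MB in about 200ms works out to roughly 80MB/s, so the per-chunk setup cost is small by comparison.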
From: Hal Murray on 13 Apr 2008 19:41

>One architecture would pack an Intel-cpu SBC and a custom board in a
>2U rack box. The SBC would talk gigabit ethernet to the customer's
>system and PCI to our board.
>
>Something like this, maybe:
>
>http://us.kontron.com/index.php?id=226&cat=527&productid=1726

Several odds and ends...

There are several Linux distributions targeted at running without
a hard disk. That avoids the heat, space, and the unreliability
of a hard disk. Here is one. There are others.
  http://www.linuxonastick.com/
Almost everything gets copied to ram at boot time. /etc is still
on disk. Maybe a few others. If you want files preserved over
booting you have to think about it.

There are Flash disk modules that plug into 40/44 pin IDE sockets.
(no ribbon cables) Works well with above. The 40 pin versions
need power, typically from an IDE connector. Google for
>disk on module<.

Modern FPGAs don't get along with 5V PCI. You can save yourself
a pile of kludgery if your target is 3V PCI. I think 66 MHz
PCI is 3V. The board above is 5V.

If your box has room for an old/big CD (rather than the modern thin
ones), you can get LCD modules that will fit in that slot. That
lets you display the MAC address (for use with BOOTP) or key in an IP
Address to get your box off the ground. After that you can use
ssh/web or whatever. No keyboard or display required at all.
(They might be handy for debugging, but ssh generally works
fine for me.)

-- 
These are my opinions, not necessarily my employer's. I hate spam.
From: John Larkin on 13 Apr 2008 22:45

On Sun, 13 Apr 2008 16:25:55 -0700, "Joel Koltner"
<zapwireDASHgroups(a)yahoo.com> wrote:

>My main point here is that going with a chip such as those from PLX gives you
>one heck of a lot of power that would otherwise take a LOT of time and effort
>to implement yourself. Although for a high-volume project it probably makes
>sense to go with a soft PCI Core for the FPGA, for low volumes I'm a big
>believer in using someone else's "all in one" IC.
>
>---Joel
>

Yup, I'm leaning towards using a PLX chip as the PCI interface. I
didn't know they were that smart!

I suspect we can persuade Linux and our application to make the shot
program (the opcodes we poke into the fpga FIFO) physically contiguous
in real memory.

Thanks

John
From: krw on 13 Apr 2008 22:58
In article <r4h5045s5dlicuohipc5b9l1nrpkqjc7cc(a)4ax.com>,
jjlarkin(a)highNOTlandTHIStechnologyPART.com says...

> Yup, I'm leaning towards using a PLX chip as the PCI interface. I
> didn't know they were that smart!

They'll save you a TON of work. PCI isn't easy, though PLX makes it
(relatively) easy. I also highly recommend the MindShare books as
reference.

> I suspect we can persuade Linux and our application to make the shot
> program (the opcodes we poke into the fpga FIFO) physically contiguous
> in real memory.

-- 
Keith