in [DSP]

Prev: Control System: Routh-Hurwitz Stability Criterion with GUI MATLAB V3.2
Next: Some questions about radix-4 FFT algorithm using fixed-point arithmetic
From: pallav on 8 Feb 2010 02:08 On Feb 8, 12:32 am, kevin <kevinjmc... (a)netscape.net> wrote:> On Feb 7, 11:28 pm, kevin <kevinjmc... (a)netscape.net> wrote:> > > If I recall, a output of a radix 2 butterfly can achieve a maximum > > value of 2.1414... , > > I meant to write: a maximum of 2.414... > > Dang! Now it's even later than before. > > Kevin McGee Hi, Thank you for your response. I will search Google / IEEE Transactions and try to find some papers on this fixed-point implementation. Regarding my hardware implementation, for starters, I was just thinking of a fully-parallel architecture where all the data comes in parallel, block FFT is computed, and then, all the outputs are available (also in parallel). Obviously, this will have a huge number of I/Os but its very simple. Alternatively, the data could come in sequentially and I just buffer it. After 64 clock cycles, I start the FFT computation for a number of clock cycles, and then a transpose stage where one point is available every clock cycle (another 64 cycles). There are probably more efficient ways but I'd like to keep it simple initially. Again, I'll search a few papers to search for some simple architectures. Understanding the radix-4 FFT is not a problem as I seem to have a good grasp on that. I will have to search in some books for the maximum value an output can take in a radix-4 FFT as that will determine the number of bits to use for the integer part. Just out of curiosity: I notice that Altera/Xilinx provide FFT IP generators and their interface seems to have one port for data in and one port for data out. Now say there are a large number of points (maybe 1024?). So does the computation start only when all the data inputs have been stored/buffered internally (and thus, the block FFT computation can start)? Thank you again for your time/help. Kind regards.
From: kevin on 8 Feb 2010 20:47
John Monro wrote: > I say 2.828 Maybe we can average these values :-) Yep, 2.8284... is the maximum for a DIF algorithm butterfly, and 2.4142... is the maximum for a DIT butterfly. Late last night, after reading what I posted, I realized that the number in my first post was wrong and quickly did the calculation using a DIT butterfly. I forgot about the DIF case. Thats why I sometimes look this kind of thing up, because I remember a letter to the editor of EDN magazine many years back that mentioned a number of hazards when doing FFT arithmetic. I actually found my copy of it today! (naturally, I found it in my folder for EDN, and not in the one for FFT error analysis). FFT algorithm contains overflow hazard by Dr. J. W. Locke, plus response by J. Niehaus, EDN magazine, Feb. 23, 1984, pps. 29, 30 and 34. The above contains some further items about maximum bit growth and scaling issues in stages of an FFT. John, it turns out that we were both right! Thanks for pointing out the DIF case. As for pallav, here are a few of the papers about error analysis you might try to obtain: Tran-Thong and Bede Liu, Fixed-Point Fast Fourier Transform Error Analysis, IEEE Trans. on Acoustics, Speech and Signal Proc., vol. ASSP-24, no. 6, Dec. 1976, pps. 563-573. E. O. Brigham and L. R. Cecchini, A Nomogram for Determining FF System Dynamic Range, IEEE Intl Conf. on Acoustics, Speech and Signal Proc., pps. 623-627. P. D. Welch, A Fixed-Point Fast Fourier Transform Error Analysis, IEEE Trans. on Audio and Electroacoustics, vol. AU-17, no. 2, June 1969, pps. 151-157. And a book: L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing, Sect. 10.5, pps. 587-594, Introduction to Quantization Effects in FFT Algorithms, Prentice-Hall, 1975. There are dozens more, but I dont want to spend too much time typing them in because there are other things to add (and Im a slow typist). As I mentioned yesterday, the butterflies in a conventional FFT are sequential in/sequential out. So if youre simulating an FFT butterfly in MATLAB, you have to be aware of the fact that your result has been bit reversed somewhere to get all the outputs in sequence. If youre trying to do things in hardware, youre going to have to mimic that process. But most FFT architectures do bit reversal at the front or back end of the algorithm (and some that do it internally because of the way buffers feed adders/subtractors and multipliers). How you implement something in hardware or software can be very different. > Just out of curiosity: I notice that Altera/Xilinx provide FFT IP > generators and their interface seems to have one port for data in and > one port for data out. Now say there are a large number of points > (maybe 1024?). So does the computation start only when all the data > inputs have been stored/buffered internally (and thus, the block FFT > computation can start)? FFT architectures are made up of memory and arithmetic units (AUs) plus control. They can be broadly categorized into: single AU, single column AUs (does one stage at a time), pipelines (one AU per column) and systolic. The pipelines are particularly good at processing the most data in the least amount of time with a reasonable amount of hardware resources. Basically, they overlap data transfer with calculation. Note that for a sequential-in radix-2 graph, you can start calculating the butterflies of the first column after receiving the first N/2+1 input points. Outputs are fed immediately to a buffer for the second stage. When that buffer gets enough data, the second stage AU starts calculating, etc. I cant tell you specifically how the Altera/Xilinx implement their FFT cores without looking at the specs. I just scanned Alteras site, and one of the figures shows the even/odd graph that is typical of a bit reversed in/sequential out algorithm. So they probably load all the data into a buffer first. The bit reversal at the front end can easily be accomplished by proper loading/unloading of the buffer. But I dont know if the data is processed internally by a single AU or multiple AUs per column, or pipeline. Id have to look at it further. The Rabiner and Gold book is a good place to start for pipeline architectures. You might also refer to any one of the hundreds of papers and patents available to learn about the many variations of FFT processors. For example, you might want to read section 2 (description of the prior art) in the PDF file of the following patent: http://www.freepatentsonline.com/4534009.pdf (Hmmm. I think I know that guy). The architecture is the only one I know of that achieves 100 per cent AU efficiency with the minimum amount of memory. And an 8 point version of it would make for a good 8 point butterfly processor. But maybe what you want to do requires something different. Youll have to explore a lot of the prior art to get up to speed on it (and theres quite a bit of published material). Kevin McGee |