From: pallav on
On Feb 8, 12:32 am, kevin <kevinjmc...(a)netscape.net> wrote:
> On Feb 7, 11:28 pm, kevin <kevinjmc...(a)netscape.net> wrote:
>
> > If I recall, a output of a radix 2 butterfly can achieve a maximum
> > value of 2.1414... ,
>
> I meant to write: a maximum of 2.414...
>
> Dang!  Now it's even later than before.
>
> Kevin McGee

Hi,

Thank you for your response. I will search Google / IEEE Transactions
and try to find some papers on this fixed-point implementation.
Regarding my hardware implementation, for starters, I was just
thinking of a fully-parallel architecture where all the data comes in
parallel, block FFT is computed, and then, all the outputs are
available (also in parallel). Obviously, this will have a huge number
of I/Os but its very simple. Alternatively, the data could come in
sequentially and I just buffer it. After 64 clock cycles, I start the
FFT computation for a number of clock cycles, and then a transpose
stage where one point is available every clock cycle (another 64
cycles). There are probably more efficient ways but I'd like to keep
it simple initially. Again, I'll search a few papers to search for
some simple architectures.

Understanding the radix-4 FFT is not a problem as I seem to have a
good grasp on that. I will have to search in some books for the
maximum value an output can take in a radix-4 FFT as that will
determine the number of bits to use for the integer part.

Just out of curiosity: I notice that Altera/Xilinx provide FFT IP
generators and their interface seems to have one port for data in and
one port for data out. Now say there are a large number of points
(maybe 1024?). So does the computation start only when all the data
inputs have been stored/buffered internally (and thus, the block FFT
computation can start)?

Thank you again for your time/help.

Kind regards.
From: kevin on

John Monro wrote:

> I say 2.828 Maybe we can average these values :-)

Yep, 2.8284... is the maximum for a DIF algorithm butterfly, and
2.4142... is the maximum for a DIT butterfly. Late last night, after
reading what I posted, I realized that the number in my first post was
wrong and quickly did the calculation using a DIT butterfly. I forgot
about the DIF case. That’s why I sometimes look this kind of thing
up, because I remember a letter to the editor of EDN magazine many
years back that mentioned a number of hazards when doing FFT
arithmetic. I actually found my copy of it today! (naturally, I found
it in my folder for EDN, and not in the one for ‘FFT error analysis’).

“FFT algorithm contains overflow hazard” by Dr. J. W. Locke, plus
response by J. Niehaus, EDN magazine, Feb. 23, 1984, pps. 29, 30 and
34.

The above contains some further items about maximum bit growth and
scaling issues in stages of an FFT.

John, it turns out that we were both right! Thanks for pointing out
the DIF case.

As for pallav, here are a few of the papers about error analysis you
might try to obtain:

Tran-Thong and Bede Liu, “Fixed-Point Fast Fourier Transform Error
Analysis,” IEEE Trans. on Acoustics, Speech and Signal Proc., vol.
ASSP-24, no. 6, Dec. 1976, pps. 563-573.

E. O. Brigham and L. R. Cecchini, “A Nomogram for Determining FF
System Dynamic Range,” IEEE Int’l Conf. on Acoustics, Speech and
Signal Proc., pps. 623-627.

P. D. Welch, “A Fixed-Point Fast Fourier Transform Error Analysis,”
IEEE Trans. on Audio and Electroacoustics, vol. AU-17, no. 2, June
1969, pps. 151-157.

And a book:

L. R. Rabiner and B. Gold, Theory and Application of Digital Signal
Processing, Sect. 10.5, pps. 587-594, Introduction to Quantization
Effects in FFT Algorithms, Prentice-Hall, 1975.

There are dozens more, but I don’t want to spend too much time typing
them in because there are other things to add (and I’m a slow
typist). As I mentioned yesterday, the butterflies in a conventional
FFT are sequential in/sequential out. So if you’re simulating an FFT
butterfly in MATLAB, you have to be aware of the fact that your result
has been bit reversed somewhere to get all the outputs in sequence.
If you’re trying to do things in hardware, you’re going to have to
mimic that process. But most FFT architectures do bit reversal at the
front or back end of the algorithm (and some that do it internally
because of the way buffers feed adders/subtractors and multipliers).
How you implement something in hardware or software can be very
different.

> Just out of curiosity: I notice that Altera/Xilinx provide FFT IP
> generators and their interface seems to have one port for data in and
> one port for data out. Now say there are a large number of points
> (maybe 1024?). So does the computation start only when all the data
> inputs have been stored/buffered internally (and thus, the block FFT
> computation can start)?

FFT architectures are made up of memory and arithmetic units (AU’s)
plus control. They can be broadly categorized into: single AU, single
column AU’s (does one stage at a time), pipelines (one AU per column)
and systolic. The pipelines are particularly good at processing the
most data in the least amount of time with a reasonable amount of
hardware resources. Basically, they overlap data transfer with
calculation. Note that for a sequential-in radix-2 graph, you can
start calculating the butterflies of the first column after receiving
the first N/2+1 input points. Outputs are fed immediately to a buffer
for the second stage. When that buffer gets enough data, the second
stage AU starts calculating, etc.

I can’t tell you specifically how the Altera/Xilinx implement their
FFT cores without looking at the specs. I just scanned Altera’s site,
and one of the figures shows the even/odd graph that is typical of a
bit reversed in/sequential out algorithm. So they probably load all
the data into a buffer first. The bit reversal at the front end can
easily be accomplished by proper loading/unloading of the buffer. But
I don’t know if the data is processed internally by a single AU or
multiple AU’s per column, or pipeline. I’d have to look at it
further.

The Rabiner and Gold book is a good place to start for pipeline
architectures. You might also refer to any one of the hundreds of
papers and patents available to learn about the many variations of FFT
processors. For example, you might want to read section 2
(description of the prior art) in the PDF file of the following
patent:

http://www.freepatentsonline.com/4534009.pdf

(Hmmm. I think I know that guy). The architecture is the only one I
know of that achieves 100 per cent AU efficiency with the minimum
amount of memory. And an 8 point version of it would make for a good
8 point butterfly processor.

But maybe what you want to do requires something different. You’ll
have to explore a lot of the prior art to get up to speed on it (and
there‘s quite a bit of published material).

Kevin McGee