From: chris on
The FFTW source code and design are a good reference, but for a new
hardware design they may not provide much insight, because FFTW was
designed to be an optimal FFT for current processors (i.e., Intel,
etc.). There is an interesting foreword in IEEE SP Magazine from the
last couple of months that touches on the subject of FFTW.

If you are experimenting with a new hardware design you have much more
flexibility in your data flow, computation, etc. You will have to
balance resources, size, power consumption, etc. If you do some
searches you will find interesting implementations based on CORDIC,
quaternion blocks, and others.

Good Luck.
From: markt on
>I found interesting C implementations in FFTW; do you think they have
>the required computation and implementation simplicity for my work?

Hard to say. As Chris noted, if you're developing your own hardware
processor it's a different game. FFTW optimizes itself based on what it
finds to be the fastest for your specific platform. It has a variety of
builtins that are "best" for most of the mainstream processors. I
particularly worked on porting the FFTW code to work with the SIMD
processor in a MIPS-based system on a chip (Broadcom's quad-core 1480). If
you don't have a specific architecture, then FFTW isn't really applicable
in that sense. FFTW has just about every major algorithm in its code base,
but its C code is generated by an OCaml program (as I recall), which makes
it difficult to decipher for a port. When I ported the SIMD code to the 1480, I had to actually
take the PowerPC code and substitute all of my assembly primitives, with
some modification in the routines, to get that to work. Of course, after
all that work Code Sourcery comes out with a public port NOW, rather than
18 months ago. ;)

Mark
From: ARH on
On May 5, 8:54 am, "markt" <tak...(a)pericle.com> wrote:
> >I found interesting C implementations in FFTW; do you think they have
> >the required computation and implementation simplicity for my work?
>
> Hard to say.  As Chris noted, if you're developing your own hardware
> processor it's a different game.  FFTW optimizes itself based on what it
> finds to be the fastest for your specific platform.  It has a variety of
> builtins that are "best" for most of the mainstream processors.  I
> particularly worked on porting the FFTW code to work with the SIMD
> processor in a MIPS-based system on a chip (Broadcom's quad-core 1480).  If
> you don't have a specific architecture, then FFTW isn't really applicable
> in that sense.  FFTW has just about every major algorithm in its code base,
> but its C code is generated by an OCaml program (as I recall), which makes
> it difficult to decipher for a port.  When I ported the SIMD code to the 1480, I had to actually
> take the PowerPC code and substitute all of my assembly primitives, with
> some modification in the routines, to get that to work.  Of course, after
> all that work Code Sourcery comes out with a public port NOW, rather than
> 18 months ago. ;)
>
> Mark

Hi Mark

Thanks for your reply. It seems you have done hard but interesting work;
congratulations!

ARH

From: ARH on
Hi Chris

Sorry for my late reply. I am new to FFT and DSP algorithms, so I tried
to find out more about the CORDIC algorithm on the wiki. I found that it
was developed to calculate hyperbolic and trigonometric functions when
no hardware multiplier is available, but I have no such limitation in
my design, so I want to use multipliers to improve matrix
multiplication performance. Regarding quaternions, I found no related
text on the net; I think they would be an extension of complex
numbers!
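
For readers weighing the same trade-off, here is a minimal sketch of rotation-mode CORDIC, the variant usually used for sine and cosine. It is an illustration under stated assumptions, not code from any poster or library: the function name cordic_sin_cos and the default of 24 iterations are hypothetical choices, and floating-point multiplies by 2**-i stand in for the bit shifts a fixed-point hardware design would use.

```python
import math

def cordic_sin_cos(angle, iterations=24):
    """Approximate (sin, cos) of angle in radians, for |angle| <= pi/2.

    Hypothetical helper for illustration. Each step rotates the vector
    (x, y) by +/- atan(2**-i), which needs only shifts and adds in hardware.
    """
    # Table of the elementary rotation angles atan(2^-i).
    atan_table = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Every micro-rotation stretches the vector; pre-divide by the total gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0   # rotate toward zero residual angle
        shift = 2.0 ** -i             # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * atan_table[i]
    return y, x  # (sin, cos)
```

Calling cordic_sin_cos(0.5) should agree with math.sin(0.5) and math.cos(0.5) to roughly six decimal places. The same rotation can apply an FFT twiddle factor to a complex sample without a multiplier, which is why CORDIC shows up in multiplier-free FFT hardware; with multipliers available, an ordinary complex multiply is the simpler choice.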

I think it would be better for me to forget about optimized FFT
algorithms, because their complexity isn't suitable for starting a
design. So I prefer to start with a very simple FFT algorithm. Would
you mind helping me figure this out from the jungle of information on
the net?


Regards
ARH

On May 4, 4:21 pm, chris <chris.fel...(a)gmail.com> wrote:
> The FFTW source code and design are a good reference, but for a new
> hardware design they may not provide much insight, because FFTW was
> designed to be an optimal FFT for current processors (i.e., Intel,
> etc.). There is an interesting foreword in IEEE SP Magazine from the
> last couple of months that touches on the subject of FFTW.
>
> If you are experimenting with a new hardware design you have much more
> flexibility in your data flow, computation, etc. You will have to
> balance resources, size, power consumption, etc. If you do some
> searches you will find interesting implementations based on CORDIC,
> quaternion blocks, and others.
>
> Good Luck.

From: markt on
>I think it would be better for me to forget about optimized FFT
>algorithms, because their complexity isn't suitable for starting a
>design. So I prefer to start with a very simple FFT algorithm. Would
>you mind helping me figure this out from the jungle of information on
>the net?

There's another thread on the first page or two here that gets into some
of the concepts involved in optimizing an FFT. Steve Johnson, one of the
authors of FFTW, has made several posts on the subject. The term "optimal"
ultimately needs to be defined relative to some metric, such as FLOP count
or speed. Fewer FLOPs do not always mean faster execution, for a variety of
reasons.

The simplest FFT design (sort of) is a standard radix-2 decomposition,
assuming your data set has a power-of-2 length. Each doubling of the data
size adds another column of radix-2 butterflies (log2(N) columns for N
points), but the butterflies are easy to construct (the "hard" part is
determining the mapping from column to column).
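
As a rough software model of that structure, here is a minimal sketch of an iterative radix-2 decimation-in-time FFT. It is one common textbook arrangement, not FFTW's method and not necessarily the layout a hardware design would pick: the input is reordered by bit reversal first, then each pass of the outer loop is one column of butterflies. The names bit_reverse_permute and fft_radix2 are hypothetical.

```python
import cmath

def bit_reverse_permute(x):
    """Return a copy of x with element i moved to the bit-reversed index of i."""
    n = len(x)
    bits = n.bit_length() - 1          # log2(n) for a power-of-2 length
    out = [0j] * n
    for i in range(n):
        rev, v = 0, i
        for _ in range(bits):          # reverse the low `bits` bits of i
            rev = (rev << 1) | (v & 1)
            v >>= 1
        out[rev] = x[i]
    return out

def fft_radix2(x):
    """Forward DFT of a power-of-2-length sequence via radix-2 butterflies."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    a = bit_reverse_permute([complex(v) for v in x])
    half = 1
    while half < n:                    # one pass per column: log2(n) passes
        for start in range(0, n, 2 * half):
            for k in range(half):
                # Twiddle factor for this butterfly.
                w = cmath.exp(-2j * cmath.pi * k / (2 * half))
                top = a[start + k]
                bot = a[start + k + half] * w
                a[start + k] = top + bot          # butterfly: sum
                a[start + k + half] = top - bot   # butterfly: difference
        half *= 2
    return a
```

The column-to-column mapping is carried by the index arithmetic: in each column, element start + k pairs with element start + k + half, and half doubles from one column to the next. In hardware the twiddle factors would normally come from a lookup table rather than being recomputed per butterfly.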

Mark