From: Lemon Tree on
If your application can tolerate it, maybe you could use the fast
Walsh-Hadamard transform (WHT) instead?
I have (x86) code here:
http://code.google.com/p/lemontree/downloads/list
Nvidia have code for both the Walsh-Hadamard transform and the FFT on
their site. You can look up the specs.
There is a book here that has code for the WHT:
http://www.jjj.de/fxt/
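
For anyone who just wants to see what the transform does before digging
through the links above, here is a minimal sketch in Python. It is not the
code from any of those sources. The function name fwht is made up for this
example, and it assumes natural (Hadamard) ordering, no normalization, and
an input length that is a power of two. The appeal over the FFT is that the
butterflies use only additions and subtractions, with no multiplications or
twiddle factors.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform sketch (hypothetical helper).

    Assumptions: natural (Hadamard) ordering, unnormalized output,
    and len(values) is a power of two. Returns a new list.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    out = list(values)  # work on a copy so the input is left untouched
    h = 1
    while h < n:
        # Combine blocks of size 2*h with add/subtract butterflies.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


if __name__ == "__main__":
    x = [1, 0, 1, 0, 0, 1, 1, 0]
    y = fwht(x)
    print(y)
    # Applying the unnormalized transform twice scales the input by n.
    print([v // len(x) for v in fwht(y)])
```

Since the transform is unnormalized, applying it twice returns the input
scaled by the length n, so dividing by n after the second pass gives the
inverse. The cost is n*log2(n) additions and subtractions.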