This floating point complex FFT package is in C, with assembler assist for especially fast execution on the Intel Pentium. The core of the Pentium assembler assist is a radix 4 FFT routine that has been heavily hand-optimized for the Pentium's floating point pipeline.
This package requires the GCC (Gnu C Compiler) and assembler (gas). Both ELF and a.out formats are supported; ifdefs in the asm code automatically take care of the different external symbol name formats.
Usage notes:
0. Before compiling, set the variable MAXPOINTS in fft.h to the largest number of points you will need in your FFTs. This dimensions the internal twiddle factor array.
1. At run time, you must call the function fftinit() before doing your first FFT. This computes the twiddle factors. You need only do this once.
2. The user entry point names are of the form fftn(x), where n is a power of 4 (e.g., 4, 16, 64, 256, etc) and x is an array of complex variables of appropriate size. The complex data type is defined in fft.h (which you must include in your program). The results are computed "in place", i.e., they overwrite the input array.
3. The complex arguments are in single-precision floating point. Even though the Pentium can perform double-precision operations at the same speed, I chose single-precision to halve cache memory requirements. This makes the code considerably faster on larger buffers while keeping enough signal resolution for most applications.
4. Note that the FFT routines leave their frequency components in bit-reversed order. If you want the components in correct order, you must call fftswap() immediately after the fft. I made this a separate function to speed up applications like carrier search (SETI, etc) that need not operate on in-order frequency components.
The test programs ffttime and ffttest are provided as demos of how to call the FFT functions. The ffttest program initializes a buffer to a sine wave, takes the FFT, reorders the frequency components into the correct order and prints them. The ffttime program repeatedly calls the FFT function (without swapping) on a single buffer as a timing test. (Invoke this one with the "time" command as a prefix.)
Speed test: on a P133 laptop, the Pentium version of the code does a 256-complex-point floating point FFT in about 216 microseconds. The pure C version (no assembler speedups) compiled with the gcc -O4 and -ffast-math options takes about 351 microseconds.
October 17, 1997
Phil Karn, KA9Q
email the Webmaster | entire website copyright © The SETI League, Inc. this page last updated 23 November 2002 |
Top of Page |