Contents
Previous
Next
Fastest Fourier Transform in the West.
Version : 2.1.3
Author(s) : M. Frigo and S. G. Johnson,
License : GPL
Website : http://www.fftw.org/
Disk space required for installation is 811.00 Kb
Summary
FFTW is a comprehensive collection of fast C routines for computing the
discrete Fourier
transform (DFT) in one or more dimensions, of both real and complex
data, and of arbitrary input size. FFTW also includes parallel
transforms for both shared- and distributed-memory
systems. We assume herein that the reader is already familiar with
the properties and uses of the DFT that are relevant to her
application. Otherwise, see e.g. The Fast Fourier Transform by
E. O. Brigham (Prentice-Hall, Englewood Cliffs, NJ, 1974). Our web
page also has links to FFT-related information online.
FFTW is usually faster (and sometimes much faster) than all other
freely-available Fourier transform programs found on the Net. For
transforms whose size is a power of two, it compares
favorably with the FFT codes in Sun's Performance Library and IBM's
ESSL library, which are targeted at specific machines. Moreover, FFTW's
performance is portable. Indeed, FFTW is
unique in that it automatically adapts itself to your machine, your
cache, the size of your memory, the number of registers, and all the
other factors that normally make it impossible to optimize
a program for more than one machine. An extensive comparison of
FFTW's performance with that of other Fourier transform codes has been
made. The results are available on the Web at the
benchFFT home page.
In order to use FFTW effectively, you need to understand one basic
concept of FFTW's internal structure. FFTW does not used a fixed
algorithm for computing the transform, but it can adapt
the DFT algorithm to details of the underlying hardware in order to
achieve best performance. Hence, the computation of the transform is
split into two phases. First, FFTW's planner is
called, which "learns" the fastest way to compute the transform on
your machine. The planner produces a data structure called a plan that
contains this information. Subsequently, the plan is
passed to FFTW's executor, along with an array of input data. The
executor computes the actual transform, as dictated by the plan. The
plan can be reused as many times as needed. In
typical high-performance applications, many transforms of the same
size are computed, and consequently a relatively-expensive
initialization of this sort is acceptable. On the other hand, if
you need a single transform of a given size, the one-time cost of
the planner becomes significant. For this case, FFTW provides fast
planners based on heuristics or on previously computed
plans.
The pattern of planning/execution applies to all four operation
modes of FFTW, that is, I) one-dimensional complex transforms (FFTW),
II) multi-dimensional complex transforms
(FFTWND), III) one-dimensional transforms of real data (RFFTW), IV)
multi-dimensional transforms of real data (RFFTWND). Each mode comes
with its own planner and executor.
Besides the automatic performance adaptation performed by the
planner, it is also possible for advanced users to customize FFTW for
their special needs. As distributed, FFTW works most
efficiently for arrays whose size can be factored into small primes
(2, 3, 5, and 7), and uses a slower general-purpose routine for other
factors. FFTW, however, comes with a code generator
that can produce fast C programs for any particular array size you
may care about. For example, if you need transforms of size 513 =
19*33, you can customize FFTW to support the factor 19
efficiently.
FFTW can exploit multiple processors if you have them. FFTW comes
with a shared-memory implementation on top of POSIX (and similar)
threads, as well as a distributed-memory
implementation based on MPI. We also provide an experimental
parallel implementation written in Cilk, the superior programming tool
of choice for discriminating hackers (Olin Shivers). (See
the Cilk home page.)
Contents
Previous
Next