Zoom FFT

Basic principle of the Zoom FFT

The zoom FFT (Fast Fourier Transform) is a signal processing technique used to analyse a portion of a spectrum at high resolution. Fig. 1a shows the spectrum of a real signal, with the region of interest shaded. The steps to apply the zoom FFT to this region are as follows:

Frequency translate to shift the frequency range of interest down to near 0 Hz (DC), as shown is Fig. 1b.
Low pass filter to prevent aliasing when subsequently sampled at a lower sample rate, see Fig 1c.
Re-sample at a lower rate.
FFT the re-sampled data. Multiple blocks of data are needed to have an FFT of the same length. The resulting spectrum will now have a much smaller resolution bandwidth, compared to an FFT of non-translated data, as shown in Fig. 1d.

Figure 1. Schematic diagram of the Zoom FFT process.
a) Original spectrum with region of interest shaded.
b) Spectrum after frequency translation.
c) Spectrum after low pass (anti-alias) filtering.
d) Spectrum after sub-sampling (decimation).

Background Information: Spectrum of a real signal

If a time domain signal is real ie. imaginary components are 0, such as generated by a physical process, then its spectrum will consist of an EVEN real part and an ODD imaginary part. EVEN real components means that the amplitude of the real values at corresponding positive and negative frequencies are equal. ODD imaginary components means the imaginary negative frequency components have equal amplitude but opposite sign to the positive frequency values. Fig 2 shows the spectrum of a real signal.

Figure 2. Spectrum of real time domain signal. Its real part is EVEN and imaginary part is ODD.

Figure 3. Spectrum of cos(ωt), a real function with EVEN real Fourier transform consisting of two delta functions one at ω and one at -ω.

Figure 4. Spectrum of sin(ωt), a real function with ODD imaginary Fourier transform consisting of two delta functions one at ω and one at -ω.

Frequency translation

Moving a signal to a different frequency has long been used in amplitude modulated (AM) radio signals where an audio frequency signal is mixed with (multiplied by) a high frequency carrier. Multiplying signals in the time domain is equivalent to convolving their spectra in the frequency domain. So a copy of the audio spectrum appears centered on the carrier frequency in the RF part of the spectrum.

So if ω is the (angular) frequency in the center of the region to be analyzed, then multiplying by, say, cos(-ωt) can be expected to move the region of interest down in frequency centered on 0 frequency. Fig 5 shows the result of multiplying by cos(-ωt).

Figure 5. Schematic diagram of the convolution of spectra of cos(ωt) and a real signal spectrum. Resulting spectrum show that two copies of the signal spectrum are generated.

Unfortunately, a second, frequency reversed copy of the spectrum is also there. Since the Fourier transform of cos(-ωt) has two delta functions, one at -ω and one at +ω. The convolution with -ω delta function effectively moved the signal spectrum down in frequency as desired, but the convolution with the +ω delta function moved a second copy of the signal spectrum up in frequency placing the signals -ve frequencies around -ω up to 0 Hz.

The second copy of the signal spectrum must be eliminated, as it corrupts the information around the 0 Hz. To do this, the convolution needs to be done with a single delta function. This can be achieved by using a sum of cos(ωt) and sin(ωt). sin(ωt) has the same frequency components as the cos(ωt) but has an ODD spectrum, ie. it has a negative delta function at ω and a positive delta function at -ω, but these lie in the imaginary domain. Multiplying sin(ωt) by i would convert all imaginary points to negative real equivalent (i*i = -1), see Fig 6. Thus cos(ωt) - isin(ωt) should have a frequency spectrum comprising just a single real delta function at -ω, see Fig 7.

Figure 6. Spectrum of cos(ωt) in blue, and the spectrum of -i_*sin(ωt) in green.

Figure 7. Spectrum of cos(ωt)-isin(ωt) consisting of a delta function at -ω.

Multiplying the signal by cos(ωt) - isin(ωt) is equivalent to convolving its spectrum with the single delta function at -ω producing the shifted spectrum required.

Figure 8. Spectrum of frequency shifted (complex) time domain signal.

Note that:

cos(ωt) - isin(ωt) = e^-iωt

so the formula is usually written as:

Y = X * e^-i(ωt)

Frequency translation algorithm

If the digital time domain real signal is represented by X_k, where the subscript 'k' represents successive time domain samples, then the array X(real) has all the signal samples. No X(imag) array is necessary as all values would be 0 for a real input signal. The frequency shifted signal, Y, will be a complex time domain signal and is given by:

Y_k = X_k(real) * cos(ωt)_k - iX_k(real) * sin(ωt)_k.

Y_k(real) = X_k(real) * cos(ωt)_k
Y_k(imag) = X_k(real) * -sin(ωt)_k.

As in all complex number manipulation, the addition of real and imaginary components is only notation, the two arrays hold the real and imaginary components and corresponding elements from each array are treated as the components of a single complex value in all calculations. The multiplication by i is implicit, by placing the data in the Y(imag) array all subsequent calculations will treat Y(imag) array elements as having an i coefficient.

Frequency translation source code

Using JavaScript as a demonstration language, here is the code to frequency translate a time domain signal.

var Np = yReal.length;      // number of data points in the data buffer
var dT = 1.0/Fsmp;          // Fsmp is the sample frequency
// Fc = center frequency of spectral region to be zoom analysed

for (var k=0; k<Np; k++)
{
  yReal[k] = xReal[k]*Math.cos(2*Math.PI*Fc*k*dT);
  yImag[k] = -xReal[k]*Math.sin(2*Math.PI*Fc*k*dT);
}

Frequency translation of complex time data

If the time domain data to be frequency translated is already complex, ie. the imaginary components of the samples, X(imag) are not all 0, then the multiply by e^-iωt becomes a complex multiply:

Input:

X_k(real) + iX_k(imag)

Output:

Y_k = (X_k(real) + iX_k(imag)) * (cos(ωt)_k - isin(ωt)_k)

ie.

Y_k(real) = X_k(real).cos(ωt)_k + X_k(imag).sin(ωt)_k
Y_k(imag) = X_k(imag).cos(ωt)_k - X_k(real).sin(ωt)_k

Low pass filter

Multi-pass filtering

The choice of filter characteristics will depend on the amount of 'zooming' required. An increase in resolution of 4 or 5 may be sufficient. On the other hand, zoom factors of 50 or more may be required. The most flexible way to address the filter stage of the Zoom FFT is to use a short FIR (finite impulse response) anti-aliasing filter and decimate (sub-sample) the output data, then filter again, usually with the same FIR coefficients, and decimate again. Typically the frequency resolution is increased by a factor of about 4 for each pass.

Filter parameter selection

An example may be the easiest way to illustrate the technique. Assume the following values:

Input sample frequency = 65536 Hz (power of 2 make the frequency bin spacing nice numbers)
Decimation rate  = 4              (save every 4th point, first pass decimation rate is 16384 Hz)
Output dynamic range = 96dB       (give a clean signal for a 16 bit DAC)

The anti-aliasing filter is required before decimation, the filter cutoff frequency should be chosen to be the highest frequency possible without pushing the transition region so high in frequency that its alias, folding back from the Nyquist rate, reaches the filter pass band. Aliases at higher frequencies will fall in the pass band but they will be attenuated by at least 96dB. The filter cut off frequency lies in the middle of the pass band edge and the stop band edge so the cut off frequency should be chosen as Fs/2. With this cut off frequency half of the transition region will alias back from Fs/2 falling wholly within the unaliased half of the transition region. Spectrum analysers typically maintain a pass band for display and measurement that comprises 400 point from the 512 positive frequencies of a 1024 point FFT. This means that with a sampling frequency of 65536 the frequency bin spacing is 64Hz so the 400 bin pass band will span 6400Hz (after decimation), and half the transition band is 1792Hz (8192-6400).

The width of the transition region is determined by the number of points in the FIR filter. The filter length should be as small as possible for efficiency, i.e. small compared to the length of the FFTs that are used in the filtering process. If a 1024 point FFT is used a filter width of around 100 points would be acceptable.

The FIR filter design using the Kaiser-Bessel window method is suitable for this type of situation. A description of this method and an interactive filter design page is available at the FilterDesign page. Various values of cut off frequency and filter length may be tried, and the resulting filter spectrum displayed.

Selecting 115 point filter gives a half transition width of 1760Hz, so the cut off frequency of 8.19kHz gives a pass band 0 to 6400Hz as desired.

Fig 9 shows the filter frequency response as it would be seen in the FilterDesign page. Fig 10 shows how the filter cutoff frequency and filter transition bandwidth will affect the frequency response of the signal after decimation.

Figure 9. Anti-aliasing filter design.

Figure 10. Anti-aliasing filter spectrum (blue) and its alias (red) after decimation.

Frequency domain filtering

The impulse response of the desired filter having been calculated, it remains to do the filtering. This may be done by direct convolution of the time domain data with the impulse response of the FIR filter, a computationally intensive method. Alternatively the equivalence of convolution in the time domain to multiplication in the frequency domain may be used to exploit the efficiency of the FFT algorithm.

The steps in frequency domain filtering are as follows:

Transform the FIR filter impulse response to the frequency domain using the FFT.
Take the next block of time domain signal data, transform to the frequency domain using the FFT.
Multiply signal spectrum and the FIR filter spectrum point by point.
Inverse FFT the result, converting the signal back to the time domain. Save the filtered data to the output buffer.

The last three operations are then repeated until the input is consumed.

Overlap-save filtering

When directly convolving in the time domain, the first and last few points of the input data do not get filtered correctly. This may be explained as follows. If the length of the impulse response of the filter, M, is 113 points, then Np = (M-1)/2 is 56 points. Convolution takes 56 points before and 56 points after each input data point, and multiplies them by the corresponding points in the filter impulse response. These product values are then summed to generate a single filtered data point. Clearly point number Np (=56) is the first point in the buffer with 56 valid points preceding it, similarly the point 57 places from the end of the buffer is the last that can be correctly filtered.

In the frequency domain filtering, the FFT of a data block produces the Fourier transform of a signal corresponding to the block of time domain points repeated again and again, butted together. So the filtering of the 56 points at the start of the FFT buffer use data points from the end of the buffer as part of the convolution. Similarly the last 56 use data from the beginning of the buffer in their filtering. As a result, 113 of the points of each FFT buffer are not correctly filtered and should be discarded. If the FFT is 1024 points long then only 912 (1024-112) points should be saved. To compensate the next 1024 point selected from the input should overlap the previous set by 112 points to avoid 112 point gaps in the output buffer. Once again it is not possible to correctly filter the first and last 56 points in the input data, but the overlap-save method correctly generates all the data that time domain convolution can generate.

The remaining vital point about the overlap-save method is that the set of points from the output buffer are not corrupt and should be saved, will vary depending on the placement of the impulse response in the FFT buffer prior to its transformation to the frequency domain. If the impulse response, say 113 points long, is placed in the first 113 points of the FFT buffer, then the after multiplication and inverse FFT, the uncorrupted data will be in the block from point 113 to 1023. Note that this positioning if the impulse response has introduced a phase shift in the output data relative to the input data buffer. With the impulse response position as described, the 113th point in FFT is actually the filtered version of the 56^th in the input buffer. This may not matter, but if a signal is being extracted and its arrival time is to be measured from the start of the input data buffer this may be quite significant. As a result, the recommended positioning of the impulse response in the FFT buffer is to place the center point of the impulse response at the start, bin 0, of the FFT buffer and the points 0 to 55 of the impulse response are placed at the back of the FFT buffer (see Fig 13). This introduces no phase shift if the bin numbers are maintained between input and output buffer copies. ie. the uncorrupted data points after inverse FFT are from 56 to 1023-56, which should be save to points 56 to 1023-56 in the output buffer if a phase shift is to be avoided. The first 56 points of the output buffer should be ignored.

Fig 11 shows a schematic diagram of overlap-save filtering. The FIR impulse response is shown as green areas of a time domain buffer this is then FFT'd. The filtering proceeds by taking a the FFT of a slab of data from the long input buffer (shown in red) these frequency domain data are multiplied by the FFT of the filter impulse response. The inverse FFT is then taken and the uncorrupted center section (shown in blue) of the result is saved in the output buffer. The next slab of the input data overlaps the previous section of the input buffer (as shown by the darker red block). This process is then repeated.

Figure 11. Schematic diagram of overlap-save filtering.

Sub-sampling

After anti-alias filtering, the time domain data is sub-sampled. Assuming a sub-sample rate of 4, the buffer length of the signal data is now ¼ of the original length. If there is sufficient data, the filter sub-sample process can be repeated. This zooms-in further on the region of interest (now near 0Hz).

The anti-aliasing filter coefficients required to allow a second sub-sampling by 4 are identical to those previously generated. The sampling frequency is 16384Hz (65536/4) and the low pass cutoff frequency is 2048Hz (8192/4). So filtering the sub-sampled signal a second time with the same coefficients will give a spectrum suitable for decimation at a frequency of 4096Hz. This will have a clean pass band of 1600Hz (6400/4).

Zoom FFT

If the two pass filtering were done as described, and then the FFT of 1024 points of the data is taken, then the spectrum from 0 to 1600Hz will be a zoomed view of the original region of interest. The frequency resolution of this zoomed spectrum will be 4Hz (4096/16) sixteen times finer than the original 64Hz (65536/1024).

To demonstrate the complete process a data buffer of 20000 points has been created, it contains random noise along with two high frequency tones, close together, around 19 kHz.

Fig 12 shows the sample spectrum of the raw data buffer.

Figure 12. Spectrum of raw data with signal noticeable peak near 19 kHz.

Here is the source code of the zoom FFT process. The zoom frequency is 19 kHz, two passes of the anti-alias filter, sub-sample process are then carried out. The filter coefficients are those calculated above, the sub-sample factor is 4, so the frequency resolution is improved from 128 Hz to 8 Hz.

Fig 13 shows the spectrum of the data from the zoom FFT process.

Figure 13. Spectrum of the signal from Fig 13 after zoom FFT processing. The feature near 19kHz is resolved to be two tones, one at 18811 and one at 18925 Hz.

Zoom FFT example source code

The source code to generate the Zoom FFT output whose spectrum is shown in figure 13, is given below.

function genTone2(buf)
{
  const fs = 65536,
        f1 = 18811,
        f2 = 18925,
        dT = 1/fs,
        PI = Math.PI;

  for (let i=0; i<buf.length; i++)
  {
    buf[i] = Math.sin(2*PI*f1*i*dT)+Math.sin(2*PI*f2*i*dT)+Math.random()+Math.random()-1;
  }
}

function zoomFFT(cvsIDa, cvsIDb)
{
  const Fc = 19000,    // zoom frequency
        tSize = 1024,  // FFT size 
        Fs = 65536,    // sample freq
        Fa = 0,        // anti-aliasing filter parameters
        Fb = 8190,
        M = 115,
        Att = 96,
        coeffs = calcFilter(Fs, Fa, Fb, M, Att),  // generate anti-aliasing filter
        Np = (M-1)/2,  // half filter length of data at front and back of buffer is corrupted 
        sigBuf = new Array(24000),
        opReal = [], opImag = [],
        aReal = [], aImag = [];
  let tBufLen = sigBuf.length;  // signal buffer length

  genTone2(sigBuf);    // generate the test signal

  // Zoom FFT starts here
  freqTranslate(sigBuf, aReal, aImag, tBufLen, Fs, Fc);
  tBufLen = overlapSaveFilter(coeffs, aReal, aImag, opReal, opImag);  // anti-alias filter
  // 1st filter, sub-sample pass
  for (let ip=Np, op=0; ip<tBufLen; ip+=4, op++)    // sub-sample by 4
  {
    aReal[op] = opReal[ip];
    aImag[op] = opImag[ip];
  }
  tBufLen = overlapSaveFilter(coeffs, aReal, aImag, opReal, opImag);  // anti-alias filter  
  // 2nd filter, sub-sample pass
  aReal.length = 0; // reset array
  aImag.length = 0;
  for (let ip=Np, op=0; ip<tBufLen; ip+=4, op++)  // sub-sample by 4
  {
    aReal[op] = opReal[ip]; 
    aImag[op] = opImag[ip];
  }
  // zoomed complex time domain data now in aReal, aImag

  // for diagnostics only, grab a buffer full of time data
  plotSpectrum(cvsIDa, sigBuf, [], tSize, Fs);
  // for diagnostics only, grab a buffer full of zoomed (complex) time data
  plotSpectrum(cvsIDb, aReal, aImag, tSize, Fs/16, Fc);
}