The Butterfly Algorithm

Danny Hermes (dhermes)

February 11, 2015

You can find the source of these slides on GitHub.

In [1]:
%matplotlib inline
from make_partition_plots import make_butterfly
make_butterfly()

Outline

  • Background for Butterfly
  • Motivation from DFT
  • Details of Algorithm and Speedup
  • Code and Comparison

Setting the Stage

$$\widehat{f}(\xi) = \mathcal{F}f(\xi) = \int_{\mathbb{R}} \text{exp}\left(- 2 \pi x \xi \sqrt{-1}\right) f(x) \, dx$$

$$\widehat{f}_k = \sum_{j = 0}^{N - 1} \text{exp}\left(- \frac{2 \pi k j \sqrt{-1}}{N}\right) f_j$$

In [2]:
from IPython import display

display.Audio(filename='resources/bluewhale.wav')
Out[2]:

Blue Whale

In [3]:
import numpy as np


def load_whale():
    # http://www.mathworks.com/help/matlab/math/fast-fourier-transform-fft.html
    data = np.load('resources/bluewhale.npz')
    X = data['X']
    sampling_rate = int(data['rate'])

    blue_whale_begin = 24500 - 1
    blue_whale_end = 31000 - 1

    blue_whale_call = X[blue_whale_begin:blue_whale_end + 1]
    size_call = len(blue_whale_call)

    # Pad signal with zeros up to the next power of 2.
    N = int(2**np.ceil(np.log2(size_call)))

    blue_whale_call = np.hstack([
        blue_whale_call,
        np.zeros(N - len(blue_whale_call)),
    ])
    return sampling_rate, blue_whale_call
In [4]:
sampling_rate, blue_whale_call = load_whale()

N = len(blue_whale_call)
time_base = np.arange(N) * 10.0 / sampling_rate
In [5]:
from matplotlib import pyplot as plt


plt.plot(time_base, blue_whale_call)
plt.title('Blue Whale B Call')
plt.xlim((time_base[0], time_base[-1]))
plt.xlabel('Time (seconds)')
plt.ylabel('Amplitude')
plt.show()
In [6]:
import time


start = time.time()
dft_whale_call = np.fft.fft(blue_whale_call, n=N)
fft_duration = time.time() - start

message = r'\text{Duration with FFT optimized: } %2.10f' % (fft_duration,)
display.display(display.Math(message))
$$\text{Duration with FFT optimized: } 0.0009500980$$
In [7]:
re_sampled_time = np.arange(len(dft_whale_call)) * sampling_rate / (10.0 * N)
amplitude = np.abs(dft_whale_call) / N

plt.plot(re_sampled_time[:N/2], amplitude[:N/2])

plt.xlabel('Frequency (Hz)')
plt.ylabel('Amplitude')
plt.title('Component Frequencies')
plt.show()

From DFT to Butterflies

Real Live Butterflies

$$\widehat{f}_k = \sum_{j = 0}^{N - 1} K(t_k, s_j) \, f_j$$

In [8]:
from make_partition_plots import get_random_intervals
from make_partition_plots import naive_interaction


S_VALUES, T_VALUES = get_random_intervals(16)
naive_interaction(S_VALUES, T_VALUES, N=16)

$$\widehat{f}(t) = \sum_{s} K(t, s) \, D(s)$$

DFT as Butterfly

$$t_k = 2 \pi k, \quad s_j = \frac{j}{N}, \quad K(t, s) = e^{- t s \sqrt{-1}}$$

Naive gets Quadratic

We see that in the most naive implementation, the runtime is quadratic.

In [9]:
def compute_f_hat(f, t, s, kernel_func):
    f_hat = np.zeros(f.shape, dtype=np.complex128)
    for k in xrange(len(f)):
        # Vectorized update.
        f_hat[k] = np.sum(kernel_func(t[k], s) * f)

    return f_hat
In [10]:
# We expect a "slowdown factor" of N / log N
expected_quadratic = fft_duration * N / np.log2(N)

message = r'\text{Expected Quadratic Run-time: } %2.10f' % (expected_quadratic,)
display.display(display.Math(message))
$$\text{Expected Quadratic Run-time: } 0.5987079327$$
In [11]:
N_vals = np.arange(N, dtype=np.float64)
t = 2 * np.pi * N_vals
s = N_vals / N


def dft_kernel(t, s):
    return np.exp(- 1.0j * t * s)


start = time.time()
f_hat = compute_f_hat(blue_whale_call, t, s, dft_kernel)
naive_duration = time.time() - start

message = r'\text{Actual Naive Duration: } %2.10f' % (naive_duration,)
display.display(display.Math(message))
$$\text{Actual Naive Duration: } 7.0778009892$$

Is the naive implementation correct?

In [12]:
error = np.linalg.norm(f_hat - dft_whale_call, ord=2)

sig, exponent = str(error).split('e')
expr = r'\|e\|_2 = %s \cdot 10^{%s}' % (sig, exponent)
display.display(display.Math(expr))
$$\|e\|_2 = 3.65989629865 \cdot 10^{-10}$$

Doing Better than Quadratic

In [13]:
naive_interaction(S_VALUES, T_VALUES, N=16)