sillysaurusx · 3 years ago
There’s an alternate way to save space: the discrete Hartley transform.

https://news.ycombinator.com/item?id=27386319

https://twitter.com/theshawwn/status/1400383554673065984

  dht(x) = fft( (1 + 1j)*x ).real / area(x)**0.5

  area(x) = np.prod(np.shape(x))
The dht is its own inverse, which is a neat trick. No need for an inverse fft. In other words, dht(dht(x)) == x.

Notice that the output is the real component of the fft. That means the storage required is exactly identical to the input signal. And since the input signal is copied to the real and imaginary components (1 + 1j)*x, the intermediate representation size is also equal to the input signal size. (Just take sin and cos of the input signal.)

The dht has almost all the same properties as the fft. You can blur an image by taking a hamming window and multiplying. You can resize an image with fftshift + pad with zeros + fftshift. Etc.

In practice the dht seems to have all the benefits of fft with very few downsides. I’m amazed it isn’t more popular.
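The formula above can be checked directly with numpy — a minimal 1-D sketch (the normalization by sqrt(N) is what makes it exactly involutive):

```python
import numpy as np

def dht(x):
    # real part of the fft of (1+1j)*x, normalized by sqrt(N),
    # exactly as in the formula above
    x = np.asarray(x, dtype=float)
    return np.fft.fft((1 + 1j) * x).real / np.sqrt(x.size)

x = np.random.default_rng(0).standard_normal(64)
# the dht is its own inverse: dht(dht(x)) == x
assert np.allclose(dht(dht(x)), x)
```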

enriquto · 3 years ago
The convolution theorem is much uglier for the Hartley transform; I guess this is the main reason. You can see the Fourier transform as a simple variant of the Hartley transform where linear filtering can be done directly in the frequency domain, without all the fuss of the Hartley convolution formula.

Another advantage of Fourier with respect to Hartley is that derivatives become products with polynomials. Thus the Fourier transform turns linear PDEs into trivial algebraic equations. For the Hartley transform you get more complicated functional equations, so it's not as useful.
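To see how much uglier: the Hartley convolution theorem needs both X_k and the flipped spectrum X_{N-k}. A numerical sketch, using an unnormalized dht:

```python
import numpy as np

def dht(x):
    # unnormalized discrete Hartley transform: Re(F) - Im(F)
    return np.fft.fft((1 + 1j) * x).real

def rev(X):
    # X_{(N-k) mod N}, the "flipped" spectrum the theorem needs
    return np.roll(X[::-1], 1)

rng = np.random.default_rng(0)
x, y = rng.standard_normal(32), rng.standard_normal(32)

# circular convolution via the (simple) Fourier convolution theorem
z = np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)).real

# Hartley version: Z_k = (X_k*(Y_k + Y_{N-k}) + X_{N-k}*(Y_k - Y_{N-k})) / 2
X, Y = dht(x), dht(y)
Z = 0.5 * (X * (Y + rev(Y)) + rev(X) * (Y - rev(Y)))
assert np.allclose(Z, dht(z))
```

Compare with the Fourier side, which is just a pointwise product.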

> You can blur an image by taking a hamming window and multiplying.

It doesn't seem like you can? (Unless your image has some symmetries).

sillysaurusx · 3 years ago
Here's an image: https://i.imgur.com/hNCwisy.png

Here's fftshift(dht2(img)): https://i.imgur.com/EB6yIb4.png

Here's a hamming window: https://i.imgur.com/NsGKb0X.png

Here's a hamming window raised to power of 20: https://i.imgur.com/ejCviD7.png

Here's the result of multiplying that with the transformed image: https://i.imgur.com/trHdWZS.png

And here's dht(fftshift(that)): https://i.imgur.com/of4bvhv.png

Presto, blurry image. No symmetries required.

We can increase the blur factor by changing the power from 20 to 200: https://i.imgur.com/yqkR3qb.png

And 2000 (megablur): https://i.imgur.com/7h9patv.png

The code I used to blur:

    def blur(img, factor):
        x = dht2(img)                    # 2-D dht (self-inverse)
        x = fftshift(x)                  # move DC to the center
        x = x * hamming(img) ** factor   # attenuate away from the center
        x = fftshift(x)                  # undo the shift
        x = dht2(x)                      # no inverse transform needed
        return x
There don't seem to be any limitations -- anything you can do with FFT, you can do here. And since the DHT isn't complex, you don't need to deal with phase/amplitude.

It's kind of like working with wavelet coefficients in a JPEG-style transform, except wavelets are a lot harder to work with than fft signals (which is why people use fft instead of wavelets everywhere). But dht is the best of both worlds -- all the power of fft, none of the hassle of complex values.
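Here's a self-contained numpy version for anyone who wants to try it. hamming2d is my guess at the hamming() helper -- a separable 2-D Hamming window:

```python
import numpy as np

def dht2(img):
    # 2-D version of the formula upthread; self-inverse with this normalization
    return np.fft.fft2((1 + 1j) * img).real / np.sqrt(img.size)

def hamming2d(shape):
    # separable 2-D Hamming window -- an assumption for the hamming() helper
    return np.outer(np.hamming(shape[0]), np.hamming(shape[1]))

def blur(img, factor):
    x = np.fft.fftshift(dht2(img))          # move DC to the center
    x = x * hamming2d(img.shape) ** factor  # crush high frequencies
    return dht2(np.fft.fftshift(x))         # dht2 is its own inverse

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
assert np.allclose(dht2(dht2(img)), img)    # self-inverse in 2-D too
assert np.var(blur(img, 20)) < np.var(img)  # blurring removes energy
```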

goldenkey · 3 years ago
To explain further, the HT is unable to represent complex-valued functions - it is only a complete system for real-valued functions, that is, R -> R. So naturally it's simpler, because it's less expressive.

Let me wow you since HT is not even on the map of exotic or alternative wavelet decompositions.

For example, Hermite transform: https://en.wikipedia.org/wiki/Hermite_transform

or if looking in the discrete space: https://en.wikipedia.org/wiki/Hadamard_transform

See: https://en.wikipedia.org/wiki/Orthogonal_functions

I'm amazed that folks think Fourier series is so special when it's just one of many in a deluge of complete function systems.

kragen · 3 years ago
the fourier transform is very special indeed because of what its basis functions are the eigenvectors of
tails4e · 3 years ago
Genuinely curious - How does this save space? If the input was real, the first operation (inside the fft call) makes it complex, which doubles the storage space. Sure the output is real again, but the intermediate storage was 2x the input and output.
sillysaurusx · 3 years ago
Since the input is copied to the real and imaginary components, you don’t actually need to copy it at all.

Think of it like having a pointer (in the C sense) to the real and imaginary image buffers. Set both pointers to the input signal, then run your FFT algo on that.

The key is that it minimizes memory bandwidth, not computation. It’s highly cache coherent.

wdkrnls · 3 years ago
What is `area(x)` here? Some kind of numerical integration like the trapezoid rule? Can someone please provide example code which can actually be tested?
HappyPanacea · 3 years ago
The author links to another blog post called "A nice approximation of the norm of a 2D vector" [1]. Recently I learned from YouTube about the alpha max plus beta min algorithm [2], which is a better approximation: it uses one max and a linear combination to achieve a largest error of 3.96%, compared to the approximation in the blog post, which uses two maxes and has 5.3% error. I also wrote a comment on his blog, so hopefully he will see it.

[1]: https://klafyvel.me/blog/articles/approximate-euclidian-norm... [2]: https://en.wikipedia.org/wiki/Alpha_max_plus_beta_min_algori...
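The algorithm is tiny -- a sketch using the "optimal" minimax coefficients from the Wikipedia page:

```python
import math

def approx_norm(x, y):
    # alpha max plus beta min: |v| ~= alpha*max(|x|,|y|) + beta*min(|x|,|y|)
    a, b = abs(x), abs(y)
    alpha = 2 * math.cos(math.pi / 8) / (1 + math.cos(math.pi / 8))  # ~0.96043
    beta = 2 * math.sin(math.pi / 8) / (1 + math.cos(math.pi / 8))   # ~0.39782
    return alpha * max(a, b) + beta * min(a, b)

# worst-case relative error over all angles is ~3.96%
worst = max(abs(approx_norm(math.cos(t / 1000), math.sin(t / 1000)) - 1)
            for t in range(0, 1571))
assert worst < 0.04
```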

rocqua · 3 years ago
From Wikipedia it looks like both the article's approximation and this method have octagonal level sets. I was wondering what the actual difference was.

Looking into it a bit, I came across the error graph [2] from the article[1]. The main difference seems to be that the octagon used by the article is 'too small'. The maximum error would be smaller with a larger octagon. Currently there is an undershoot of 5.2 % and an overshoot of 2.5%. To reduce the maximum error, you need undershoot and overshoot to be equal.

Another notable difference is that the article's octagons have a different rotation than the alpha max plus beta min algorithm's. The article's octagons cross the axes perpendicularly, whilst the alternative algorithm's are rotated by 360/16 = 22.5 degrees, so they have vertices at the axes. That certainly changes at which angles the approximations are better, which might matter in practice.

What I wasn't able to find quickly is the average error for the article's approximation. Would be interesting to compare that to the 2.41% for the better algorithm.

[1] https://klafyvel.me/blog/articles/approximate-euclidian-norm... [2] https://klafyvel.me/assets/blog/articles/approximate-euclidi...

adgjlsfhk1 · 3 years ago
The difference is the error metric: the article is minimizing L2 error instead of maximum error.
utopcell · 3 years ago
Little known fact: FFT is an O(n log n) algorithm, but incremental FFT is O(n), meaning that if you want to compute the FFT of the past n points of a stream of samples, it only takes O(n) to update it for every new incoming sample.
jononor · 3 years ago
O(n) for every new sample is O(n^2) when seen over time/samples?
artemisart · 3 years ago
Yes but you have no choice for streaming/real-time applications or anytime you need results for each sample, where doing the full recompute would be O(n^2 log n) total.
fhars · 3 years ago
No, it is O(mn) if your stream is m samples long, compared to O(m n lg(n)) for the alternative.
rencrisa · 3 years ago
@utopcell do you have a reference for incremental FFT? I'm not sure I understand what you're implying.

Given a polynomial f(X), obtaining f's evaluations over the n-th roots of unity takes O(nlogn) using FFT. Are you stating that obtaining g(X) != f(X) evaluations over the n-th roots of unity should take O(n) time assuming some precomputation derived in the FFT for f?

utopcell · 3 years ago
If X_k is the coefficient for the sample window (x_0, x_1, .., x_{n-1}) and X'_k is the coefficient for window (x_1, .., x_n), then X'_k = (X_k + x_n - x_0)exp(2 pi j k/n).
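A quick numpy check of this sliding-window update (note the imaginary unit in the exponent -- each bin gets rotated by one bin's worth of phase):

```python
import numpy as np

def slide(X, x_old, x_new):
    # O(n) update: drop x_old, add x_new, rotate bin k by exp(2j*pi*k/n)
    n = X.size
    return (X + x_new - x_old) * np.exp(2j * np.pi * np.arange(n) / n)

rng = np.random.default_rng(0)
x = rng.standard_normal(48)
n = 32
X = np.fft.fft(x[:n])             # one full O(n log n) FFT up front
for i in range(n, x.size):
    X = slide(X, x[i - n], x[i])  # then O(n) per new sample
assert np.allclose(X, np.fft.fft(x[-n:]))
```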
jaygreco · 3 years ago
One tidbit I got out of this that is not directly related to the content of the post is that there are apparently some high-quality instructables out there, and that people spend time writing them!

99% of the instructables I've come across in the last 10 or so years have been low-effort junk that isn't even worth reading. The ones the author linked are genuinely great.

nsajko · 3 years ago
The Oogoo articles on Instructables were eye-opening for me.
JKCalhoun · 3 years ago
Maybe should have done it in a GitHub README.
spuz · 3 years ago
It would be interesting to know what the side project that required FFT on the Arduino was. I'd also be interested in which other applications the FFT algorithm has on Arduino given the performance shown. Could you create a real time guitar tuner for example?
Gordonjcp · 3 years ago
You wouldn't use an FFT for a guitar tuner, you'd just time zero crossings.
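A zero-crossing pitch estimate is a few lines -- a sketch (this assumes a reasonably clean signal; real guitar audio would want low-pass filtering first):

```python
import math

def estimate_freq(samples, fs):
    # average period between rising zero crossings
    rising = [i for i in range(1, len(samples))
              if samples[i - 1] < 0 <= samples[i]]
    if len(rising) < 2:
        return None
    return fs * (len(rising) - 1) / (rising[-1] - rising[0])

fs = 8000
tone = [math.sin(2 * math.pi * 82.4 * n / fs) for n in range(fs)]  # 1 s of low E
f = estimate_freq(tone, fs)
assert abs(f - 82.4) < 0.5
```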

I did use an Arduino to detect "key tones" on a wireline radio controller, sampling at 16kHz (ADC running flat out on one channel) and implementing Goertzel's Method to detect the 2175Hz guard tone and 1950Hz and 1850Hz keying tones, and a twin-T notch filter to remove the 2175Hz tone from the audio. The controller sent a short burst of 2175Hz followed by a short burst of one of the lower tones, to key up the transmitter on the correct channel, then the presence of any audio on the line (including the "low level guard tone", 2175Hz but much quieter) kept the transmitter on until you'd finished speaking.

It was sensitive enough to detect the low-level guard tone even when the audio over it was quite loud.

As far as I can tell, it's been running for a good ten years now and I've never been called back in to fix it.

Edit: just to be clear, the notch filter was analogue, in the audio path! I did make a similar device to process audio from an RTP stream between devices linking radios that did everything digitally, but that required a Raspberry Pi with a massive amount of CPU horsepower and an almost limitless amount of RAM - it used almost 128MB!
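Goertzel's method is why this fits on a small micro: one bin costs a multiply and two adds per sample, no FFT needed. A sketch for the 2175 Hz guard tone at 16 kHz (the block length of 640 is my choice for illustration, not necessarily what the real device used):

```python
import math

def goertzel_power(samples, fs, f_target):
    # squared magnitude of a single DFT bin, O(N) with one real coefficient
    N = len(samples)
    k = round(N * f_target / fs)           # nearest integer bin
    coeff = 2 * math.cos(2 * math.pi * k / N)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

fs, N = 16000, 640                          # 640 samples -> 25 Hz bins
guard = [math.sin(2 * math.pi * 2175 * n / fs) for n in range(N)]
# the 2175 Hz detector fires; the 1850 Hz detector stays quiet
assert goertzel_power(guard, fs, 2175) > 1000 * goertzel_power(guard, fs, 1850)
```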

mjochim · 3 years ago
Apart from the fact that the FFT is way more complex than needed for a guitar tuner, I also believe it's not even really possible to use a Fourier transform for a useful guitar tuner, because you won't achieve good enough frequency resolution.

I would be interested if I am missing any possible tricks here. But the low E string on a guitar is about 82.4 Hz. For a useful basic tuner you would need a precision of maybe 5 cents (a cent is a hundredth of a semitone), or roughly 0.25 Hz (roughly 1 Hz for the high E string). Frequency resolution with the FFT improves when you analyse a longer signal. For 0.25 Hz resolution, you need more than two seconds of audio. Two seconds where the pitch doesn't change. I don't think you'd get anywhere with such a design.

I’m not sure if this could be changed with a higher sample rate. With a higher sample rate, you have more samples per second to feed into the FFT, but also the upper boundary for frequency increases, so I believe frequency resolution just stays the same. I’d need to look up the formulas, though. Maybe with extreme zero padding? Or any other tricks?

joshvm · 3 years ago
One example is audio visualisers, though you can buy chips like the MSGEQ7 that will output amplitudes at a range of fixed frequencies.

See for example: http://elm-chan.org/works/akilcd/report_e.html - Elm Chan's fftavr library has been around for ages now.

dragontamer · 3 years ago
Yeah, I know Arduino is really low power and cheap, but these days you can spend like $35 to get a Rasp. Pi (or equivalent) small computer for actual compute girth for an application. Alternatively, a UART-to-wireless link could connect your data-stream to a server somewhere for stronger processing.

My guess is that you'd need something with incredibly small power usage that must run on the Arduino only.

A lot of FFT projects I'm thinking of are just... crappy SDRs. Maybe audio is "slow" enough that an Arduino's FFT is fine, but most applications are probably doable with just regular digital filters.

femto · 3 years ago
Think about using the Despain algorithm, derived by recursively decomposing an N-point FFT into FFTs of size sqrt(N). You end up with a bunch of 4-point FFTs where the twiddle factors are rotations by multiples of pi/2, which can be implemented by swapping real/imaginary parts and changing signs. Non-degenerate twiddle factors still occur between those 4-point FFTs. In this way the number of maths operations can be reduced. The cost is that the input and output are both in a weird order.
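To illustrate the multiplier-free 4-point stage (not the full Despain decomposition): the twiddles are +/-1 and +/-j, so each butterfly is just adds, sign flips, and real/imag swaps:

```python
import numpy as np

def fft4(x):
    # 4-point DFT with no general multiplies: twiddles are +/-1 and +/-1j
    a, b, c, d = x
    t0, t1 = a + c, a - c
    t2, t3 = b + d, b - d
    # multiplying by -1j is a real/imag swap plus a sign change
    return np.array([t0 + t2, t1 - 1j * t3, t0 - t2, t1 + 1j * t3])

rng = np.random.default_rng(0)
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
assert np.allclose(fft4(x), np.fft.fft(x))
```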
lloydatkinson · 3 years ago
Really this should be titled "for AVR" - Arduino covers a broad range of microcontrollers with very different capabilities. As much as some people deliberately try to suggest Arduino is a microcontroller, it isn't.
ChuckNorris89 · 3 years ago
True, but Arduino sounds better and drives clicks, as is well known.

Most readers, even on HN, don't know exactly what AVR is. It could very well be Automatic Voltage Regulator.

JKCalhoun · 3 years ago
> abhilash_patel's instructable is a Great instructable

Christ, I hate Instructables. A site with such shitty presentation that should be so much better.

Years ago I tried using an FFT library for the Arduino to convert audio to frequency domain in order to "bucket" the audio spectrum and (hint: music visualizer) light up some LEDs.

The Arduino+FFT combo was not, however, up to the task, and I ended up going with a dedicated IC to do the audio frequency bucketing.

I suspect these days a better FFT or hardware like the Teensy could likely handle the job.
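The bucketing step itself is small once you have an FFT that keeps up -- a sketch (the bucket count and the Hann window are arbitrary choices here):

```python
import numpy as np

def led_buckets(samples, n_buckets=8):
    # windowed magnitude spectrum folded into a few bands for an LED bar
    mag = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
    bands = np.array_split(mag[1:], n_buckets)   # drop the DC bin
    return [float(b.mean()) for b in bands]

n = 256
tone = np.sin(2 * np.pi * 20 * np.arange(n) / n)  # energy near bin 20
assert int(np.argmax(led_buckets(tone))) == 1     # bin 20 lands in band 1 of 8
```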

nsajko · 3 years ago
> Christ, I hate Instructables. A site with such shitty presentation that should be so much better.

Yeah, the proper way of using Instructables is to download the PDF version of the article.