Readit News
pontus · 5 years ago
Is there anything deep here? There's a parametrized representation of a function and it's being interpreted as a neural network, why is this surprising? It's like saying that Newton's second law is a neural network: F=ma can be written log(F) = log(m) + log(a). Aha, this is a neural network! The inputs are m and a, the first layer is sparsely connected with log activation functions, the second layer is fully connected with an exponential activation function:

F = exp(c1 * o1 + c2 * o2)
o1 = log(c3 * a + c4 * m)
o2 = log(c5 * a + c6 * m)

If you feed it enough data you'll find c1=c2=c3=c6=1 and c4=c5=0.

But saying that Newton's second law is a Neural Network, while correct, seems a bit deceptive in that it's not a deep idea at all.
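The toy example above is easy to check empirically. Here is a minimal NumPy sketch (variable names are made up for illustration) that fits the log-linear "network" by gradient descent on synthetic F = ma data and recovers weights of 1 for both log(m) and log(a):

```python
import numpy as np

# Synthetic (m, a, F) data obeying Newton's second law, F = m * a.
rng = np.random.default_rng(0)
m = rng.uniform(1.0, 10.0, size=1000)
a = rng.uniform(1.0, 10.0, size=1000)
F = m * a

# In log space the law is linear: log F = w1*log(m) + w2*log(a).
# Fit w by plain gradient descent, as a linear "network" would.
X = np.column_stack([np.log(m), np.log(a)])
y = np.log(F)
w = np.zeros(2)
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
    w -= 0.1 * grad

print(w)  # both weights converge to ~1.0, i.e. F = m^1 * a^1
```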

montebicyclelo · 5 years ago
Op here, IMO the "deepest" bit is [1] - the network learns the DFT in order to reconstruct the signal, and is not explicitly trained on the DFT values from the FFT.

Admittedly, I should have mentioned that any linear transform can be considered to be a single layer neural network (if you want to see the world through a neural network lens), and will add this to the post at some point.

In fact, I have a series of posts planned, which will reveal that well known algorithms/models are actually neural networks...

[1] https://sidsite.com/posts/fourier-nets/#learning-the-fourier...
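The "any linear transform is a single-layer network" point can be made concrete in a few lines of NumPy: the DFT is one matrix multiply with fixed complex weights, no bias and no activation, and the result matches NumPy's FFT:

```python
import numpy as np

# The DFT of a length-N signal is a single matrix multiply:
# a "layer" with fixed complex weights W[k, n] = exp(-2j*pi*k*n/N),
# no bias, no activation function.
N = 16
n = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)

x = np.random.default_rng(1).normal(size=N)
assert np.allclose(W @ x, np.fft.fft(x))  # matches NumPy's FFT exactly
```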

hctaw · 5 years ago
> I should have mentioned that any linear transform can be considered to be a single layer neural network

I think this should be turned around. A single layer neural network can be considered a linear mapping (but not necessarily an orthogonal transform or change of basis, like the DFT).

A clearer example of this is adaptive filters, which are trained in real time using gradient descent.

This is an important distinction because thinking of "X is a Neural Net" doesn't provide meaningful insight, whereas "Neural Nets with X properties are a case of linear dynamic systems, here's an example of how we can equate one linear transformation to a neural net" leads you to deeper conclusions on the analysis and synthesis of ANNs in the context of dynamics - which encompasses a much larger surface area than the DFT.
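To illustrate the adaptive-filter point, here is a minimal LMS (least mean squares) sketch in NumPy; the "unknown" system and step size are invented for the example. Each incoming sample triggers one stochastic gradient step on the instantaneous squared error:

```python
import numpy as np

# LMS adaptive filter: per-sample gradient descent on the squared error,
# here identifying an unknown 4-tap FIR filter from input/output samples.
rng = np.random.default_rng(0)
true_taps = np.array([0.5, -0.3, 0.2, 0.1])  # hypothetical unknown system
x = rng.normal(size=5000)
d = np.convolve(x, true_taps)[: len(x)]      # desired (observed) output

w = np.zeros(4)    # adaptive weights
mu = 0.05          # step size
buf = np.zeros(4)  # most recent input samples, newest first
for xi, di in zip(x, d):
    buf = np.roll(buf, 1)
    buf[0] = xi
    e = di - w @ buf     # instantaneous error
    w += mu * e * buf    # LMS update = one stochastic gradient step

print(w)  # converges to true_taps
```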

mochomocha · 5 years ago
You might be interested in this line of work: https://eng.uber.com/neural-networks-jpeg/

Training straight from DCT coefficients to avoid spending time learning a similar representation in the bottom layers of the net. I've personally toyed with something similar on GANs to gauge the computational benefits of not doing convolutions in the bottom layers of a net but learning directly in a FFT-like compressed space instead.
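A sketch of the "train from DCT coefficients" idea, with everything built from scratch in NumPy rather than any particular framework: construct the orthonormal 8x8 DCT-II matrix used by JPEG and apply it to an image block; a net would then train on `coeffs` instead of raw pixels.

```python
import numpy as np

# Orthonormal 8x8 DCT-II matrix (the JPEG transform).
N = 8
k = np.arange(N)[:, None]
n = np.arange(N)[None, :]
D = np.sqrt(2 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
D[0] /= np.sqrt(2)  # rescale the DC row so D is orthonormal

block = np.random.default_rng(0).uniform(size=(N, N))  # stand-in 8x8 image block
coeffs = D @ block @ D.T                               # 2-D DCT-II of the block

assert np.allclose(D @ D.T, np.eye(N))       # orthonormal transform
assert np.allclose(D.T @ coeffs @ D, block)  # perfectly invertible
```

Because the transform is orthonormal and invertible, no information is lost by feeding the net `coeffs`; the hope is only that the frequency-domain representation is easier to learn from.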

aesthesia · 5 years ago
As far as I can tell, you're still using a fixed inverse DFT as the reconstruction layer, so it's not just rediscovering the DFT on its own. Instead of learning a linear transformation from input-output pairs, it's learning the inverse of a linear transformation when that transformation is given as an oracle. It's not terribly surprising that this works, although there are probably some interesting issues of numerical conditioning in the general case.
suvakov · 5 years ago
Actually, "learns" here means fitting the inverse linear transformation, a.k.a. the inverse matrix. He defines the inverse FT matrix and uses gradient descent to numerically converge to its inverse. It's nothing more than an inefficient way to solve a system of linear equations.
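That point is easy to demonstrate with a tiny made-up example: gradient descent on ||WA - I||^2 converges to A^{-1}, which is indeed just an expensive linear solve.

```python
import numpy as np

# "Learning" the inverse transform = gradient descent on ||W A - I||^2,
# which converges to A^{-1} (an inefficient way to solve a linear system).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])  # small well-conditioned example matrix

W = np.zeros((2, 2))
for _ in range(2000):
    grad = (W @ A - np.eye(2)) @ A.T  # gradient of the squared error
    W -= 0.1 * grad

assert np.allclose(W, np.linalg.inv(A))
```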
volta83 · 5 years ago
> But saying that Newton's second law is a Neural Network, while correct, seems a bit deceptive in that it's not a deep idea at all.

I guess the point is that neither is the idea of a Neural Network.

Nasrudith · 5 years ago
A neural network can also be set up to unconditionally return a fixed value, with no learning feedback. So could many arbitrarily complex arrangements that do massive amounts of work only to discard it and return a constant, which is why I don't think lower bounds on capabilities are very informative. An upper bound on what an approach is capable of is more useful. Say, no matter how vast a lookup table is, it will never return a different value for the same input, regardless of prior sequence.
korijn · 5 years ago
Perhaps it is the compsci glasses talking, but this is just one very specific instance where someone figured out a way to map the DFT problem to a neural net. I agree that it is unfortunate that it is being presented as some kind of big discovery, and that the fundamental lesson is either still undiscovered by the author or just unclearly communicated, but there is still good intention in there (sharing something you've learned and eliciting feedback).
windsignaling · 5 years ago
It's a stretch to call it a neural network. It was already well-known that the Fourier transform can be seen as a matrix multiply which minimizes some least squares problem.

This can be found somewhere in S.M. Kay's Fundamentals of Statistical Signal Processing: Estimation Theory (Vol 1), Detection Theory (Vol 2).

Imnimo · 5 years ago
I think this is the sort of thing that is very obvious if you are already comfortable with neural networks and the FFT. But if you're only comfortable with neural networks, and the FFT feels like arcane magic, this exercise might be very instructive.
candiodari · 5 years ago
Well the point is that it does not make sense to DFT data before feeding it into a multilayered neural network (or calculating the force generated by mass and acceleration as you point out). Those formulas make no sense: the network can just learn them on the fly.

In fact you'll find that this does not just work for the Fourier transform, but for any FIR filter (and some other classes). Neural networks can therefore deal with signals and construct low-pass filters, high-pass filters, band-stop filters, ... as required for the task at hand, without the network (or its designer) having any idea at all what is happening.

I mean, there are some basic assumptions this reasoning makes (the main one being that you need to feed in many discretized values from a time window).

Of course, a problem remains: local optima. Just because a neural network can construct a filterbank or do a DFT doesn't mean that it will actually do it when the situation warrants it. If there's a local optimum without filters ... well, you may get unlucky. If there are many local optima without filters ... sucks to be you.

nonameiguess · 5 years ago
I don't think I can agree with that. You do a Fourier transform when the data you're working with doesn't easily (or at all) admit certain operations in the time domain, but does in the frequency domain. If you already know that to be the case, preprocessing with an FFT is a better idea than hoping a neural network with enough layers uses a few of those to much less efficiently perform a DFT. Always take advantage of pre-existing knowledge of structure in your data. With the FFT especially, depending on how your data is being ingested, you might be able to use specialized DSPs that implement the FFT directly in hardware. These are cheap and easy to find since they're used in frequency-division multiplexing.
monocasa · 5 years ago
Which is weird, because biological neural nets seem to have some evolutionary pressure to do hardware Fourier transforms before neurons even get involved.

You can see this most clearly in the auditory system where the incoming signal is transformed into the frequency domain by the cochlea before the signal is received by the epithelial cells.

Neurons absolutely love working in the frequency domain, but they seem to prefer to not be the ones to do the binning in the first place.

ska · 5 years ago
> Those formulas make no sense: the network can just learn them on the fly.

> Just because [...] can construct a filterbank or do a DFT, doesn't mean that it will actually do it when the situation warrants it.

These statements seem in conflict, no?

xyzzy21 · 5 years ago
The "deep" part is to realize what this really means in terms of the limitations of ML/NNs!
haecceity · 5 years ago
"Can be represented as a neural network" is not the same as "is a neural network", no?
namelessone · 5 years ago
I think the issue here is that almost anything can be represented as a neural network. You could create a neural network that does an XOR operation, for example. There is nothing new about this.
threatripper · 5 years ago
The Fourier transformation is a special case of a linear transformation. Linear neural networks can represent any linear transformation. Both can also be expressed as matrix multiplication.

It's not really all that surprising if you know some of the math behind neural networks, matrix algebra, linear transformations or the Fourier transformation.

amcoastal · 5 years ago
I don't think it's surprising, but it's definitely a cool little exercise. Somewhat aside from the content of the post, I could see some use cases where specifically implementing DFT layers in your architectures could lead to improved performance over using just the raw inputs or activations. Noisy datasets, or synthetic-data-based transfer learning, come to mind as potential uses for the DFT as a step in your ML pipeline.
threatripper · 5 years ago
There is the FFT in TensorFlow already: https://www.tensorflow.org/api_docs/python/tf/signal/fft
thearn4 · 5 years ago
I've come to internalize the Fourier transform as a type of sparse matrix algorithm. I.e. something that implements a particular matrix-vector product without requiring explicit construction of said matrix.
srean · 5 years ago
And pushing the parentheses around to factor out common subexpressions, essentially exploiting the distributive law.
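That factoring is exactly the radix-2 FFT recursion. A short sketch (for power-of-two lengths only) that splits the DFT sum into even and odd halves and reuses the shared subexpressions, matching NumPy's FFT:

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 FFT: factor the DFT sum into even/odd halves,
    reusing common subexpressions via the distributive law."""
    N = len(x)
    if N == 1:
        return x.astype(complex)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    tw = np.exp(-2j * np.pi * np.arange(N // 2) / N)  # twiddle factors
    return np.concatenate([even + tw * odd, even - tw * odd])

x = np.random.default_rng(0).normal(size=64)  # length must be a power of two
assert np.allclose(fft_radix2(x), np.fft.fft(x))
```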
enchiridion · 5 years ago
Do you mean the FFT? I've been trying to wrap my head around the Fourier transform lately, and I can see the matrix connection to an FFT, but not the Fourier transform in general.
amatic · 5 years ago
On the other hand, if you don't yet know a lot of NN math, like me, it is super surprising. Definitely a cool way to connect FFT and NNs.
amelius · 5 years ago
Yes but this article shows that the matrix coefficients of the Fourier transform can be learned.
threatripper · 5 years ago
This is also not really surprising for a single layer neural network.
madhadron · 5 years ago
That's because "learning" in neural networks is a fancy way of saying "curve fitting."
cochne · 5 years ago
This is kind of silly. Neural networks are universal function approximators, meaning any function "is" (more accurately, "can be represented by") a neural network. In the DFT case, it is a linear function so we could get an exact representation. Though there is also nothing stopping you from just saying, f(x) is a neural network, because I can choose my activation function to be f.
TheRealPomax · 5 years ago
If silly but interesting things weren't allowed, there would be no point to having a website like this. The article is a neat exploration of two things that many, many people don't realise are related because they simply don't know the maths involved in either of the two subjects.
vletal · 5 years ago
Maybe, if the title was not so overblown, the pushback against the article's premise would not be so hard.
visarga · 5 years ago
> Neural networks are universal function approximators

What about discontinuous functions?

estebarb · 5 years ago
Every function can be made continuous if you add one dimension to its output for defined/not defined.
hyperbovine · 5 years ago
Continuous functions are dense in <pick your favorite function space> so yes.
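A quick illustration of the "dense in your favorite function space" point: a single steep sigmoid (one "neuron", with the steepness standing in for a trained weight) approximates the discontinuous step function to arbitrarily small average error, even though the pointwise error at the jump stays 0.5.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for very steep inputs.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

x = np.linspace(-1, 1, 2001)
step = (x > 0).astype(float)
approx = sigmoid(1000.0 * x)  # steepness plays the role of a trained weight

# Mean absolute error is tiny even though the sup-norm error at 0 is 0.5:
# that is L^p approximation of a discontinuous function by a continuous one.
print(np.mean(np.abs(approx - step)))
```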
volta83 · 5 years ago
> Neural networks are universal function approximators,

Do we have tight error-bound proofs for neural networks as approximators?

phonebucket · 5 years ago
That was a fun read. While it might be unsurprising to some, it's a testament to the breadth of applications of modern machine learning frameworks.

I enjoyed this sentence in particular.

> This should look familiar, because it is a neural network layer with no activation function and no bias.

I thought it should look familiar because it's matrix multiplication. That it looks like a neural network layer first and foremost to some is maybe a sign of the times.

6gvONxR4sf7o · 5 years ago
> it's a testament to breadth of applications of modern machine learning frameworks.

More like a testament to the breadth of applications of linear algebra. It is absolutely remarkable what we're able to compute and analyze in the form of y = Ax (Hilbert spaces are a wild invention).

But it really isn't a testament to modern ML frameworks in any way. The Fourier transform has been easy to compute/fit in this exact way (Fourier = linear problem + solving linear problems by optimization) with modern-at-the-time frameworks for over two centuries.

cantagi · 5 years ago
Great post!

I once had to port a siamese neural network from Tensorflow to Apple's CoreML to make it run on an iPhone. Siamese neural networks have a cross convolution step which wasn't something CoreML could handle. But CoreML could multiply two layer outputs element-wise.

I implemented it using a Fourier transform (not a fast Fourier transform), with separate re and im parts, since a Fourier transform is just a matrix multiplication, and convolution is element-wise multiplication in the Fourier domain. Unsurprisingly, it was very slow.
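That trick in miniature, as a NumPy sketch: convolution in the signal domain equals element-wise multiplication in the Fourier domain. Zero-padding both inputs to the full output length avoids circular wrap-around, so the result matches direct convolution.

```python
import numpy as np

# Convolution via the Fourier domain: FFT both inputs (zero-padded),
# multiply element-wise, inverse FFT. Matches direct convolution.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
k = rng.normal(size=7)

L = len(a) + len(k) - 1  # full linear-convolution length
via_fft = np.fft.ifft(np.fft.fft(a, L) * np.fft.fft(k, L)).real
assert np.allclose(via_fft, np.convolve(a, k))
```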

photonemitter · 5 years ago
I'll leave these here:
https://en.wikipedia.org/wiki/Multiresolution_analysis
https://en.wikipedia.org/wiki/Discrete_wavelet_transform

And as a lot of people have mentioned here, the DFT is pretty much implicit in neural networks already because of the mathematics, especially in convolutional/correlational neural networks, which often make use of the convolution theorem ("just" Fourier coefficient multiplication) to do the convolution.

Extending this post, it seems more interesting to look more generally at the correspondence with wavelet transforms.

dplavery92 · 5 years ago
>especially in convolutional/correlation neural networks, which often make use of the convolution theorem to do the convolution

Is this true? With the learned filters being so much smaller than the input imagery/signals, and with "striding" operations and different boundary conditions being wrapped into these algorithms, it doesn't seem like a natural fit.

Aardwolf · 5 years ago
So is the identity transform...
srean · 5 years ago
I wish your comment were voted up. Where do we go next? Look, addition is a neural network; look, weighted average is a neural network; look, linear regression, logistic regression, and Poisson regression are neural networks ...
Certhas · 5 years ago
I have definitely seen regressions referred to as linear machine learning. Not even kidding.
visarga · 5 years ago
> Where do we go next ?

Neural nets are all you need[*].

[*] if what you need is non-robust black boxes

omarhaneef · 5 years ago
Here is the special sauce: "We can consider the discrete Fourier transform (DFT) to be an artificial neural network: it is a single layer network, with no bias, no activation function, and particular values for the weights. The number of output nodes is equal to the number of frequencies we evaluate."

A single layer neural network is a sum of products, the basic Fourier equation is a sum of products.

In this view there are lots of single layer neural networks out there. For me, it’s the training algorithm (backprop) that sets apart the neural net.

nerdponx · 5 years ago
In my mind, the layers make the network "non-trivial". A Fourier transform is only a neural network in a trivial, definitional sense.
hyperman1 · 5 years ago
The activation function is what keeps layers separated. When it isn't there, a pair of layers devolves to a matrix multiplication, which can be replaced by its resulting matrix, a single layer.
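The collapse is easy to see numerically: with no nonlinearity between them, two linear layers compute exactly the same function as one layer whose weight matrix is the product W2 @ W1.

```python
import numpy as np

# Without an activation function, two linear layers are one linear layer.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 3))  # layer 1 weights
W2 = rng.normal(size=(2, 5))  # layer 2 weights
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x)    # apply layer 1, then layer 2
one_layer = (W2 @ W1) @ x     # single collapsed layer
assert np.allclose(two_layers, one_layer)
```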
omarhaneef · 5 years ago
That is a thoughtful distinction, too. I think you're right.
29athrowaway · 5 years ago
Biological neural networks do not use backpropagation, or at least, there's no evidence of that yet.

Backpropagation is a placeholder for stuff we don't understand.

etienne618 · 5 years ago
It seems that backpropagation might be a good abstraction for biological learning rules after all: "local predictive coding converges asymptotically (and in practice rapidly) to exact backprop gradients on arbitrary computation graphs using only local learning rules", from https://arxiv.org/abs/2006.04182
cl3misch · 5 years ago
Isn't "backpropagation" just a synonym for the partial derivative of the scalar cost function with respect to the weights? And in the mathematical formulation of biological neural networks these derivatives can't be computed analytically. Your comment sounds like "backpropagation" is some kind of natural phenomenon.
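On that reading, backpropagation is just the chain rule computing d(cost)/d(weights). A small sketch: for one linear layer with squared error, the analytic "backprop" gradient matches a finite-difference check.

```python
import numpy as np

# Backprop for a single linear layer with squared-error loss:
# the chain-rule gradient of 0.5*||Wx - y||^2 w.r.t. W is (Wx - y) x^T.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))
x = rng.normal(size=3)
y = rng.normal(size=2)

def loss(W):
    return 0.5 * np.sum((W @ x - y) ** 2)

analytic = np.outer(W @ x - y, x)  # gradient via the chain rule

# Central finite-difference check, one weight at a time.
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(2):
    for j in range(3):
        Wp = W.copy(); Wp[i, j] += eps
        Wm = W.copy(); Wm[i, j] -= eps
        numeric[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```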

ta988 · 5 years ago
Not exactly the same kind of backpropagation but signals fly the other way in neurons as well https://en.wikipedia.org/wiki/Neural_backpropagation