Tensors, the geometric tool that solved Einstein's relativity problem

If you have any linear algebra background, then the definition of a tensor is straightforward: given a vector space V over a field K (in physics, K = R or C), a tensor T is a multilinear (i.e. linear in each argument) function from vectors and dual vectors in V to numbers in K. That's it! A type (p, q) tensor T takes p vectors and q dual vectors as arguments (p+q is often called the rank of T but is ambiguous compared to the type).

(If you're unfamiliar with the definition of dual vector, it's even simpler: it's just a linear function from V to K.)

mr_mitm · 2 years ago

Yes, very simple, except that when physicists say "tensor", they mean tensor fields, on smooth, curved manifolds, in at least four dimensions, often with a Lorentz metric. Things stop being simple quickly.

adrian_b · 2 years ago

It does not matter on what set a tensor field is defined.

A tensor field is not a tensor, but the value of a tensor field at any point is a tensor, which satisfies the definition given above, exactly like the value of a vector field at any point is a vector.

The "fields" are just functions.

There are physics books that do not give the easier to understand definition given above, but they give an equivalent, but more obscure, definition of a tensor, by giving the transformation rules for its contravariant components and for its covariant components at a change of the reference system.

The word "tensor" with the current meaning has been used for the first time by Einstein and he has not given any explanation for this word choice. The theory of tensors that Einstein has learned had not used the word "tensor".

Before Einstein, the word "tensor" (coined by Hamilton) was used in physics with the meaning of "symmetrical matrix", because the geometric (affine) transformation of a body that is determined by the multiplication with a symmetric matrix extends (or compresses) the body towards certain directions (the axes that correspond to a rotation that would diagonalize the symmetric matrix). The word "tensor" in the old sense was applied only to what is called now "symmetric tensor of the second order" (which remains the most important kind of the tensors that are neither vectors nor scalars).

g15jv2dp · 2 years ago

That's incorrect if V is infinite-dimensional. A (0,1)-tensor is just supposed to be an element of V but with your definition you get an element of the bidual of V. Which is not isomorphic to V when dim V is infinite. And even when dim V is finite, you need to choose a basis of V to find an isomorphism with the bidual. From a math point of view, that's just no good.

senderista · 2 years ago

No, the isomorphism between V and V** (for finite-dimensional V) is canonical. The canonical isomorphism T:V->V** is easy to construct: map a vector v in V to the element of V** which takes an element w from V* and applies it to v: T(v)(w) = w(v).

btilly · 2 years ago

You are right about infinite dimensions, wrong about finite dimensions. V and V* are naturally isomorphic for finite dimensions.

In finite dimensions, V and V* are isomorphic, but not naturally so. The isomorphism requires additional information. You can specify a basis to get the isomorphism, but many bases will give the same isomorphism. The exact amount of information that you need is a metric. If you have a metric, then every orthonormal basis in that metric will give the same isomorphism.

cfgauss2718 · 2 years ago

Can you provide some examples of important tensors in physics for which the underlying vector space is infinite dimensional? I’m most familiar with the setting of tensor fields on manifolds, in which case the vector bundle consists of finite dimensional vector spaces. Nevertheless, I suppose in the absence of a pseudo-Riemannian metric one lacks a natural isomorphism between vectors/dual vectors. Does this “bidual” distinction arise in that case as well?

will-burner · 2 years ago

The definition may be simple, but it's not very concrete and I'd argue that makes it not strait forward. While examples of vector spaces can be very concrete (think R, R^2, R^30), I struggle to think of a concrete example of a multilinear function from vectors and dual vectors in V to numbers in K. On top of that when working with tensors, you don't usually use the definition os a multilinear function at least as far as I remember.

cshimmin · 2 years ago

A simple example of a multilinear function is the inner (a.k.a dot) product <a, b>: it takes a vector (b), and a dual vector (a^T), and returns a number. In tensor notation it's typically written δ_ij.

It's multilinear because it's linear in each of its arguments separately: <ca, b> = c<a,b> and <a, cb> = c<a,b>.

Another simple but less obvious example is a rotation (orthogonal) matrix. It takes a vector as an input, and returns a vector. But a vector itself can be thought of as a linear function that takes a dual vector and returns a number (via the inner product, above!). So, applying the rotation matrix to a vector is a sort of "currying" on the multilinear map, while the matrix alone can be considered a function that takes a vector and a dual vector, and returns a number.

In functional notation, you can consider your rotation matrix to be a function (V x V*) -> K, which can in turn be considered a function V -> (V* -> K), where V* is the dual space of V.

adrian_b · 2 years ago

In physics, the first and even now the most important application of multilinear functions, a.k.a. tensors, is in the properties of anisotropic solids.

A solid can be anisotropic, i.e. with properties that depend on the direction, either because it is crystalline or because there are certain external influences, like a force or an electric field or a magnetic field that are applied in a certain direction.

In (linear) anisotropic solids, a vector property that depends on another vector property is no longer collinear with the source, but it has another direction, so the output vector is a bilinear function of the input vector and of the crystal orientation, i.e. it is obtained by the multiplication with a matrix. This happens for various mechanical, optical, electric or magnetic properties.

When there are more complex effects, which connect properties from different domains, like piezoelectricity, which connects electric properties with mechanical properties, then the matrices that describe vector transformations, a.k.a. tensors of the second order, may depend on other such tensors of the second order, so the corresponding dependence is described by a tensor of the fourth order.

So the tensors really appear in physics as multilinear functions, which compute the answers to questions like "if I apply a voltage on the electrodes deposited on a crystal in this positions, which will be the direction and magnitude of the displacements of certain parts of the crystal". While in isotropic media you can have relationships between vectors that are described by scalars and relationships between scalars that are also described by scalars, the corresponding relationships for anisotropic media become much more complicated and the simple scalars are replaced everywhere by tensors of various orders.

What in an isotropic medium is a simple proportionality becomes a multilinear function in an anisotropic medium.

The distinction between vectors and dual vectors appears only when the coordinate system does not use orthogonal axes, which makes all computations much more complicated.

The anisotropic solids have become extremely important in modern technology. All the high-performance semiconductor devices are made with anisotropic semiconductor crystals.

cfgauss2718 · 2 years ago

Here’s maybe a useful example. Consider a scalar potential function F on R^3 that describes some nonlinear spring law. At a point p=(x,y,z), the differential dF can be thought of as a (1,0) tensor measuring the spring force. It acts on a particle at p moving with velocity v to give the instantaneous work of the particle on the spring dF(p)(v). Now, suppose that we want to know how this quantity changes when we vary the x coordinate. The x coordinate is also a function of p, we can represent its differential as dx, which is a co-vector(field). The quantity that captures this change can be thought of as a (1,1) tensor field, which is related to the stiffness of the spring potential in the x direction at each point p. In the usual undergraduate setting, this tensor field is given as the hessian of F, call this H. The action of this tensor looks like the product u^T H(p) v, where in our case, u^T = dx(p) = [1 0 0]. A good giveaway for when a “co-vector” appears in a tensor calculation is whenever there is a “row vector” in a matrix operation (most people identify “column” vectors with proper vectors). It’s helpful in this case that “row” rhymes with “co-“.

klodolph · 2 years ago

I think part of this is “if you have a linear algebra background”. There are a few different explanations of tensors, and different explanations make sense for different people.

tel · 2 years ago

Not really to push back as I do agree that this is a bit trickier to get an intuition for than the OP suggests, but the most trivial concrete example of a (1, 1) tensor would just be the evaluation function (v, f) |-> f(v), which, given a metric, corresponds to the inner product.

ndriscoll · 2 years ago

I think the people who find this definition to be mysterious are really looking for (borrowing from Ravi Vakil[0]) "why is a tensor" rather than "what is a tensor". In that case, a better answer IMO is that it's the "most generic" way to multiply vectors that's compatible with the linear structure: "v times w" is defined to be the symbol "v⊗w". There is no meaning to that symbol.

But these things are vectors, so you could write e.g. v = a⋅x+b⋅y, and then you want e.g. (a⋅x+b⋅y)⊗w = ax⊗w + by⊗w, and so on.

So in some sense, the quotient space construction[1] gives a better "why". It says

* I want to multiply vectors in V and W. So let's just start by writing down that "v times w" is the symbol "v⊗w", and I want to have a vector space, so take the vector space generated by all of these symbols.

* But I also want that (v_1+v_2)⊗w = v_1⊗w + v_2⊗w

* And I also want that v⊗(w_1+w_2) = v⊗w_1 + v⊗w_2

* And I also want that (sv)⊗w = s(v⊗w) = v⊗(sw)

And that's it. However you want to concretely define tensors, they ought to be "a way to multiply vectors that follows those rules". Quotienting is a generic technique to say "start with this object, and add this additional rule while keeping all of the others".

Another way to say this is that the tensor algebra is the "free associative algebra": it's a way to multiply vectors where the only rules you have to reduce expressions are the ones you needed to have.

[0] https://www.youtube.com/live/mqt1f8owKrU?t=500

[1] https://en.wikipedia.org/wiki/Tensor_product#As_a_quotient_s...

senderista · 2 years ago

That abstract approach tends to be how mathematicians view the tensor product (there is also a categorical construction), but I don't find it very helpful for understanding what tensors do, or why they are useful in physics. With the "multilinear map" definition, taking the tensor product T of tensors U and V just means evaluating U and V respectively on the arguments of T and multiplying their outputs. Extend this definition by linearity and you have the tensor product of spaces of tensors.

DemocracyFTW2 · 2 years ago

A monad is just a monoid in the category of endofunctors, what’s the problem?

bjourne · 2 years ago

Well, I can write a definition that is both easier to understand and shorter than yours:

A tensor is a multi-dimensional array.

ithinkso · 2 years ago

This is actually a harmful definition, both (1,1) and (0,2) tensors can be written as a matrix but they are very different. It's like calling vector an array but vectors require vector space and arrays are just arrays. It doesn't help that std::vector is very common in CS but 'pushing back' to a mathematical vector just doesn't make any sense

eigenket · 2 years ago

This only works in finite dimensions, which for mathematicians excludes pretty much all of the interesting cases.

Koshkin · 2 years ago

More simply, a tensor is an element of a tensor product of linear spaces. (And a monad is simply a monoid in the category of endofunctors.)

bratwurst3000 · 2 years ago

you could also call it a matrix to modify a matrix in a vector space. not 100% correct. xD

AnotherGoodName · 2 years ago

For those without a strong math background but more of a programmers background;

You know the matrices you work with in 2D or 3D graphics environments that you can apply to vectors or even other matrices to more easily transform (rotate, translate, scale)?

Well tensors are the generalisation of this concept. If you’ve noticed 2D games transformation matrices seem similar (although much simpler) to 3D games transformation mateices you’ve probably wondered what it’d look like for a 4D spacetime or even more complex scenarios. Well you’ve now started thinking about tensors.

jesuslop · 2 years ago

To add a bit, kudos to root-parent boil-down. Programmers have already the good representation, and call it n-dimensional array, that being a list of lists of lists ... (repeat n times) ... of lists of numbers. The only nuisance is that what programmers call dimension, math people call it rank. It is the sizes of those nested lists what math people call dimensions. It's all set up for a comedy of errors. Also in math the rank is split to make explicit how much of the rank-many arguments are vectors and how many are dual vectors. You'd say something like this is a rank 7 tensor, 3 times covariant (3 vector arguments) and 4 times contravariant (4 dual vector arguments) summing 7 total arguments. I'm assuming a fixed base, so root-parent map determines a number array.

andrewla · 2 years ago

Slight disagree here -- matrices are enough for transformations in 2, 3, 4, and 100 dimensions. Tensors are not arrays with more rows and columns; they are higher dimensional objects -- more indices, not greater range of indices.

mitthrowaway2 · 2 years ago

Is there a difference between a 4x4 matrix and a 4x4 tensor?

paulpauper · 2 years ago

this is still not it. 4d would simply be a 4x4 matrix instead of 3x3

tensors are something which no one has been able to fully or adequately describe. I think you simply have to treat them as a set of operations and not try to map or force them unto existing concepts like linear algebra or matrices. they are similar but otherwise something completely different.

pyinstallwoes · 2 years ago

Is there a similar "gimbal lock" problem that people ran into?

I've always thought the use of "Tensor" in the "TensorFlow" library is a misnomer. I'm not too familiar with ML/theory, is there a deeper geometric meaning to the multi-dimensional array of numbers we are multiplying or is "MatrixFlow" a more appropriate name?

adrian_b · 2 years ago

Since the beginning of computer technology, "array" is the term that has been used for any multi-dimensional array, with "vectors" and "matrices" being special kinds of arrays. An exception was COBOL, which had a completely different terminology in comparison with the other programming languages of that time. Among the long list of differences between COBOL and the rest were e.g. "class" instead of "type" and "table" instead of "array". Some of the COBOL terminology has been inherited by languages like SQL or Simula 67 (hence the use of "class" in OOP languages).

A "tensor", as used in mathematics in physics is not any array, but it is a special kind of array, which is associated with a certain coordinate system and which is transformed by special rules whenever the coordinate system is changed.

The "tensor" in TensorFlow is a fancy name for what should be called just "array". When an array is bidimensional, "matrix" is an appropriate name for it.

twothreeone · 2 years ago

I agree. Just like NumPy's Einsum. "Multi-Array Flow" doesn't sound sexy and associating your project with a renowned physicist's name gives your project that "we solve big science problems" vibe by association. Very pretentious, very predictable, and very cringe.

MathMonkeyMan · 2 years ago

The joke I learned in a Physics course is "a vector is something that transforms like a vector," and "a tensor is something that transforms like a tensor." It's true, though.

The physicist's tensor is a matrix of functions of coordinates that transform in a prescribed way when the coordinates are transformed. It's a particular application of the chain rule from calculus.

I don't know why the word "tensor" is used in other contexts. Google says that the etymology of the word is:

> early 18th century: modern Latin, from Latin tendere ‘to stretch’.

So maybe the different senses of the word share the analogy of scaling matrices.

ogogmad · 2 years ago

The mathematical definition is 99% equivalent to the physical one. I find that the physical one helps to motivate the mathematical one by illustrating the numerical difference between the basis-change transformation for (1,0)- and (0,1)-tensors. The mathematical one is then simpler and more conceptual once you've understood that motivation. The concept of a tensor really belongs to linear algebra, but occurs mostly in differential geometry.

There is still a "1% difference" in meaning though. This difference allows a physicist to say "the Christoffel symbols are not a tensor", while a mathematician would say this is a conflation of terms.

TensorFlow's terminology is based on the rule of thumb that a "vector" is really a 1D array (think column vector), a "matrix" is really a 2D array, and a "tensor" is then an nD array. That's it. This is offensive to physicists especially, but ¯\_(ツ)_/¯

Koshkin · 2 years ago

> something that transforms

Well, they don't, it is their components that do (under a change of the coordinate system).

itishappy · 2 years ago

The tensors in tensorflow are often higher dimensional. Is a 3d block of numbers (say 1920x1080x3) still a matrix? I would argue it's not. Are there transformation rules for matrices?

You're totally correct that the tensors in tensorflow do drop the geometric meaning, but there's precedence there from how CS vs math folk use vectors.

andrewla · 2 years ago

Matrices are strictly two-dimensional arrays (together with some other properties, but for a computer scientist that's it). Tensors are the generalization to higher dimensional arrays.

blt · 2 years ago

There is no geometric meaning. It's a really bad name.

dannymi · 2 years ago

In the first example on https://www.tensorflow.org/api_docs/python/tf/math/multiply you can see that they use the Hadamard product (not the matrix product):

    x = tf.constant(([1, 2, 3, 4]))
    tf.math.multiply(x, x)
    <tf.Tensor: shape=(4,), dtype=..., numpy=array([ 1,  4,  9, 16], dtype=int32)>

I could stop right here since it's a counterexample to x being a matrix (with a matrix product defined on it; P.S. try tf.matmul(x, x)--it will fail; there's no .transpose either). But that's only technically correct :)

So let's look at tensorflow some more:

The tensorflow tensors should transform like vectors would under change of coordinate system.

In order to see that, let's do a change of coordinate system. To summarize the stuff below: If L1 and W12 are indeed tensors, it should be true that A L1 W12 A^-1 = L1 W12.

Try it (in tensorflow) and see whether the new tensor obeys the tensor laws after the transformation. Interpret the changes to the nodes as covariant and the changes to the weights as contravariant:

    import tensorflow as tf
    # Initial outputs of one layer of nodes in your neural network
    L1 = tf.constant([2.5, 4, 1.2], dtype=tf.float32)
    # Our evil transformation matrix (coordinate system change)
    A = tf.constant([[2, 0, 0], [0, 1, 0], [0, 0, 0.2]], dtype=tf.float32)
    # Weights (no particular values; "random")
    W12 = tf.constant(
        [[-1, 0.4, 1.5],
         [0.8, 0.5, 0.75],
         [0.2, -0.3, 1]], dtype=tf.float32
    )
    # Covariant tensor nature; varying with the nodes
    L1_covariant = tf.matmul(A, tf.reshape(L1, [3, 1]))
    A_inverse = tf.linalg.inv(A)
    # Contravariant tensor nature; varying against the nodes
    W12_contravariant = tf.matmul(W12, A_inverse)
    # Now derive the inputs for the next layer using the transformed node outputs and weights
    L2 = tf.matmul(W12_contravariant, L1_covariant)
    # Compare to the direct way
    L2s = tf.matmul(W12, tf.reshape(L1, [3, 1]))
    #assert L2 == L2s

A tensor (like a vector) is actually a very low-level object from the standpoint of linear algebra. It's not hard at all to make something a tensor. Think of it like geometric "assembly language".

In comparison, a matrix is rank 2 (and not all matrices represent tensors). That's it. No rank 3, rank 4, rank 1 (!!). So what does a matrix help you, really?

If you mean that the operations in tensorflow (and numpy before it) aren't beautiful or natural, I agree. It still works, though. If you want to stick to ascii and have no indices on names, you can't do much better (otherwise, use Cadabra[1]--which is great). For example, it was really difficult to write the stuff above without using indices and it's really not beautiful this way :(

More detail on https://medium.com/@quantumsteinke/whats-the-difference-betw...

See also http://singhal.info/ieee2001.pdf for a primer on information science, including its references, for vector spaces with an inner product that are usually used in ML. The latter are definitely geometry.

[1] https://cadabra.science/ (also in mogan or texmacs) - Einstein field equations also work there and are beautiful

andrewla · 2 years ago

In TensorFlow the tf.matmul function or the @ operator perform matrix multiplication. Element-wise multiplication ends up being useful for a lot of paralellizable computation but should not be confused with matrix multiplication.