Linear algebra

I love minutiae because they—like Leibniz's monads—so often reflect important features of the larger world around them. This is particularly true in mathematics, where minor details of definitions can reflect years of collective thought and refinement by generations of mathematicians, and where changes to those details can have major effects on the resulting theory. In this post, let's examine some of the minutiae we encounter in basic linear algebra.

Zero

I'll assume you already know what a vector space is. So you know that a vector space always has a zero vector—in particular, there's no such thing as an empty vector space. In fact, a zero vector is all we need to form a vector space, namely the zero space: \[\{0\}\] Look how beautiful it is. It beckons you to diligently verify that all the vector space axioms hold for it. The zero space is also the smallest subspace of any vector space. Wonderful.
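If you do take up that invitation, the entire check fits in a single line: the operations on \(\{0\}\) are forced to be \[0+0=0,\qquad -0=0,\qquad \alpha\cdot 0=0\quad(\alpha\in F)\] and every axiom is an equation between elements of a one-element set, so it holds automatically.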

It might seem silly or useless, but the zero space is actually very useful in linear algebra—just like the number zero is useful in arithmetic. It took a long time in the history of mathematics for the number zero to be fully recognized and accepted, but we're way past all that now. So we shouldn't fear or avoid the zero vector space either—we should warmly embrace it.

The zero space has some unique properties. Apart from being the smallest vector space, it's the only space on which the zero transformation is injective, for example: on any nonzero space, the zero transformation sends some nonzero vector to \(0\), just as it sends \(0\) to \(0\), so it can't be injective.

Span

If \(V\) is a vector space over a field \(F\) and \(S\) is a subset of \(V\), recall that the span of \(S\) in \(V\) is the smallest subspace of \(V\) containing \(S\)—namely, the intersection of all the subspaces of \(V\) containing \(S\). Equivalently, it's the set of all (finite) linear combinations of elements of \(S\) with coefficients in \(F\)—these things: \[\sum_{i=1}^n\alpha_is_i=\alpha_1s_1+\cdots+\alpha_ns_n\qquad(\alpha_i\in F,\ s_i\in S)\] There are some beautiful minutiae to behold right here: the first definition of span is "top down" in that it describes the span of \(S\) as a big intersection. This type of idea goes back to Gottlob Frege with his definition of the natural numbers—in modern language, the set of natural numbers is the smallest set containing the number \(0\) and closed under the successor operation \(n\mapsto n+1\), which is just the intersection of all such sets, given that at least one such set exists. By contrast, the second definition of span is "bottom up" in that it explicitly constructs the objects in the span (linear combinations) from primitive objects (scalars, vectors). These two definitions are equivalent, which is glorious.
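To see the two descriptions agree in a tiny example, take \(S=\{(1,0)\}\) in \(\mathbb{R}^2\): \[\operatorname{span}\,S=\{\,\alpha(1,0)\mid\alpha\in\mathbb{R}\,\}=\bigcap\{\,W\subseteq\mathbb{R}^2\mid W\text{ is a subspace and }(1,0)\in W\,\}\] Bottom up, the span is the \(x\)-axis, the set of scalar multiples of \((1,0)\); top down, every subspace containing \((1,0)\) contains all of its scalar multiples, so the \(x\)-axis sits inside every such subspace and is itself one of them, making it the intersection.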

Now, we recall that the empty set is a thing. It's the unique set having no elements, and it's denoted by \(\{\}\) or \(\varnothing\). It's a subset of every set. Like zero, it shouldn't be feared. What's the span of the empty set of vectors in \(V\)? At first blush we might think the span is also empty—no vectors, nothing spanned, right? Wrong! The span is by definition the smallest subspace of \(V\) containing the empty set, which is just the smallest subspace of \(V\), which is just our old friend the zero space. Neat! By the above, this means that the zero vector can be written as a linear combination of vectors from the empty set. How's that possible? Like this: \[0=\] The thing on the right is the empty linear combination. This also might seem silly but it's not. In fact, in any additive monoid \((M,+,0)\)—a set \(M\) together with an associative binary operation \(+\) and zero element \(0\), like the vectors in a vector space—we can unambiguously define the sum of any finite (ordered) sequence of elements, and it's a standard convention that the empty sum is taken to be the zero element \(0\). This convention is natural in the context of the monoid because adding nothing to something is the same as adding zero to it. Likewise in a multiplicative monoid \((N,\cdot,1)\) the empty product is taken to be the identity element \(1\). The important thing is that there's no logical contradiction or inconsistency here, so we don't need to be afraid of anything or forbid the empty set of vectors when taking a span in a vector space.
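For what it's worth, the convention is baked into Python too; here's a quick check (assuming Python 3.8+ for math.prod):

    import math

    print(sum([]))        # the empty sum is 0, the additive identity
    print(math.prod([]))  # the empty product is 1, the multiplicative identity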

Independence

I'll assume you've encountered the concept of linear independence before. If the vectors \(v_1,\ldots,v_n\in V\) are linearly independent, what does the property of linear independence actually apply to? Is each individual vector \(v_i\) itself linearly independent? No. Each vector is independent of the other vectors, so the property must apply to the plurality somehow. Here's how some books try to capture this:

Definition ( 🤔 ): The set \(\{v_1,\ldots,v_n\}\) is linearly independent if whenever \[\alpha_1v_1+\cdots+\alpha_nv_n=0\] for \(\alpha_1,\ldots,\alpha_n\in F\), then \(\alpha_1=\cdots=\alpha_n=0\).

On this definition, linear independence applies to the set of vectors, so we must interpret the condition as holding for any enumeration \(v_1,\ldots,v_n\) of the vectors in the set. But what happens if we take \(v_1=v_2=v\) for some fixed vector \(v\ne 0\) and consider the set \(\{v_1,v_2\}\), for example? This set is not linearly independent on this definition because \[1\cdot v_1+(-1)\cdot v_2=v-v=0\] but \(\pm 1\ne 0\). On the other hand, \(\{v_1,v_2\}=\{v\}\), so the set consists of a single nonzero vector, and intuitively such a singleton set should be linearly independent.

This definition only works if we assume that the vectors \(v_1,\ldots,v_n\) are distinct. Some authors miss this. Some explicitly write it in the definition. Others might say "Oh, whenever we write a set like \(\{v_1,\ldots,v_n\}\) we implicitly assume that \(v_1,\ldots,v_n\) are distinct."—but that's a gross convention for sets and those people should be shamed. We can instead just bite the bullet and work with multisets, or more commonly lists:

Definition: The list \((v_1,\ldots,v_n)\) is linearly independent if whenever \[\alpha_1v_1+\cdots+\alpha_nv_n=0\] for \(\alpha_1,\ldots,\alpha_n\in F\), then \(\alpha_1=\cdots=\alpha_n=0\).

Notice \((v,v)\ne(v)\), and \((v,v)\) is never linearly independent, while \((v)\) is linearly independent if and only if \(v\ne 0\). No need to fuss over distinctness here. Lists also have the advantage of recording the order of the vectors, which is often important in linear algebra. What exactly is a list like \((v_1,\ldots,v_n)\), as an object? It depends on your foundations. You could take it as a primitive. You could view it as an element of the cartesian product set \(V^n=V\times\cdots\times V\) (\(n\) factors). Alternatively, you could view it as a function \(f:\{1,\ldots,n\}\to V\) where \(f(i)=v_i\) for \(1\le i\le n\). Importantly, there's an empty list, which is denoted \(()\).
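If you like to compute, here's a minimal numpy sketch of the list definition over \(\mathbb{R}\), using the standard fact that a list of \(n\) vectors is linearly independent exactly when the matrix with those vectors as rows has rank \(n\) (the helper name is_independent is just for illustration):

    import numpy as np

    def is_independent(vectors):
        """Decide linear independence of a list of real vectors.

        A list of n vectors is independent iff the matrix whose rows are
        those vectors has rank n. The empty list is handled separately
        and is independent, as argued below.
        """
        if len(vectors) == 0:
            return True
        A = np.array(vectors, dtype=float)
        return np.linalg.matrix_rank(A) == len(vectors)

    v = [1.0, 0.0]
    print(is_independent([v, v]))  # False: (v, v) is never independent
    print(is_independent([v]))     # True:  (v) is independent since v != 0
    print(is_independent([]))      # True:  the empty list

The usual floating-point caveat applies: numerical rank is only as trustworthy as the tolerance numpy uses.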

The empty set—or the empty list, if you swing that way—is linearly independent. Verifying this from the definition is an amusing exercise in vacuous implication. Recall that the only linear combination of the empty set of vectors is the empty linear combination, which has no coefficients and is equal to zero. Since there aren't any coefficients, it's vacuously true that they're all equal to zero. (If you doubt this, try to exhibit a nonzero one.) Therefore we have linear independence of the empty set. Again, the important thing is that there's no logical inconsistency here.

Recalling that a basis is a linearly independent and spanning set—or equivalently a minimal spanning set, or a maximal linearly independent set—and that dimension is the size of any basis, we have a happy little theorem:

Theorem: The empty set is the unique basis for the zero vector space. The dimension of the zero vector space is zero.

Can you feel the pleasure of the minutiae yet? Can you feel it?! This theorem is a minor piece of the following major theorem of linear algebra, which involves the use of possibly infinite sets or lists for bases:

Theorem: Every vector space has a basis.

Note well: it's every vector space. Every. Fucking. One. We harbor no irrational fear of zero in the finite-dimensional case, and no fear of the axiom of choice in the infinite-dimensional case. No vector space is our enemy, no matter how big or how small.

Eigenvalues and eigenvectors

There's lots of minutiae around eigenvalues and eigenvectors that can cause trouble. Recall that an eigenvalue of a linear transformation \(T:V\to V\) is a scalar \(\lambda\in F\) with an eigenvector \(v\in V\) for which \(Tv=\lambda v\). Right? Well, maybe. First, remember we always have \[T0=0=\lambda\cdot 0\] for every \(\lambda\in F\), so we need some sort of restriction somewhere to prevent every scalar from being an eigenvalue.

One approach is to require the existence of some \(v\ne 0\) in the definition above for \(\lambda\) to be considered an eigenvalue, then take all the vectors in the eigenspace \[E_{\lambda}=\{\,v\in V\mid Tv=\lambda v\,\}\] to be the eigenvectors corresponding to \(\lambda\). In this case, if \(\lambda\) is an eigenvalue, then the zero vector is an eigenvector corresponding to \(\lambda\), so we can truthfully say that the eigenvectors corresponding to \(\lambda\) form a subspace—namely \(E_{\lambda}\). This approach isn't very popular, probably because it's confusing—the zero vector can be an eigenvector but can't be the only eigenvector for an eigenvalue—and you frequently need to exclude the zero eigenvector in theorems. Another more popular approach is to require \(v\ne 0\) always in the definition above, so eigenvectors are nonzero. In this case the eigenspace is no longer the set of eigenvectors, but is the set of eigenvectors together with the zero vector. Oh well.

Can the zero scalar be an eigenvalue? Not on the zero space because there are no nonzero vectors there, but otherwise absolutely—for example it's an eigenvalue of the zero transformation on a nonzero space, and more generally of any transformation which isn't injective. In the finite-dimensional case, the eigenvalues of \(T\) are the scalars \(\lambda\in F\) satisfying the equation \[\det(\lambda I-T)=0\] where \(I:V\to V\) is the identity transformation. This is true even on the zero space! The determinant on the left defines a polynomial expression in \(\lambda\) with coefficients in \(F\) called the characteristic polynomial of \(T\), denoted \[p_T(\lambda)=\det(\lambda I-T)\] So the eigenvalues of \(T\) are the roots of the characteristic polynomial of \(T\) in \(F\). Careful here! There may be other roots of the characteristic polynomial living in an extension field of \(F\)—for example if \(F=\mathbb{R}\), which is not algebraically closed, there may be other roots in \(\mathbb{C}\). We have adopted a "geometric" definition of eigenvalue by requiring that eigenvalues live in the scalar field \(F\), but some authors adopt an "algebraic" definition in which an eigenvalue is any root of the characteristic polynomial.
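For a concrete finite-dimensional example, here's a small sympy sketch of the computation, with \(T(x,y)=(y,0)\) on \(F^2\), which isn't injective:

    import sympy as sp

    lam = sp.symbols('lambda')
    T = sp.Matrix([[0, 1],
                   [0, 0]])          # T(x, y) = (y, 0): not injective

    p = (lam * sp.eye(2) - T).det()  # det(lambda*I - T)
    print(sp.expand(p))              # lambda**2: the characteristic polynomial
    print(sp.solve(p, lam))          # [0]: the zero scalar is an eigenvalue

The zero scalar shows up as the (double) root of the characteristic polynomial \(\lambda^2\), as promised.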

Another thing. What exactly is the "polynomial expression" \(p_T(\lambda)\) above? Those are weasel words. Is it a formal polynomial in the ring \(F[\lambda]\)? That would mean \(\lambda\) is an indeterminate, not a scalar in \(F\), and then what is \(\lambda I-T\)? A polynomial whose coefficients are linear transformations, apparently. But what's the determinant of that? If \(I\) and \(T\) were instead matrices in \(F^{n\times n}\), then \(\lambda I-T\) would be a polynomial with matrix coefficients, or equivalently a matrix with polynomial entries—an element of \(F^{n\times n}[\lambda]\cong F[\lambda]^{n\times n}\). We could certainly take the determinant of the latter to obtain a polynomial in \(F[\lambda]\), and some authors do. But now we're venturing into different territory. The ring \(F[\lambda]\) isn't a field, so by working with such matrices we're leaving the familiar land of vector spaces and entering the wild world of modules. Also if we work directly with polynomials in \(F^{n\times n}[\lambda]\), we have to be careful because the coefficients don't commute in general, and substitution isn't a homomorphism in general. Of course we could instead work over the extension field \(F(\lambda)\) of rational functions, but we'd still have to make sense of that and show that we do obtain a polynomial. Maybe we should head back home?
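Before heading back, here's the non-commutativity trap made concrete (my own example, with two matrices that don't commute):

    import sympy as sp

    # Two matrices that don't commute: AB != BA.
    A = sp.Matrix([[0, 1], [0, 0]])
    B = sp.Matrix([[0, 0], [1, 0]])

    # The indeterminate is central, so (x - A)(x - B) = x^2 - (A + B)x + AB
    # as a formal polynomial with matrix coefficients. Substituting a matrix
    # C for x on the right-hand side gives:
    def p(C):
        return C**2 - (A + B) * C + A * B

    print(p(B))            # the zero matrix: B acts like a root
    print(p(A))            # equals AB - BA, which is nonzero here
    print(A * B - B * A)   # same matrix as p(A)

If evaluation at \(A\) were a ring homomorphism, it would send the product \((x-A)(x-B)\) to \((A-A)(A-B)=0\); instead it produces \(AB-BA\ne 0\).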

We could alternatively take the "expression" \(p_T(\lambda)\) to be a polynomial function—that is, the function mapping each scalar \(\lambda\in F\) to the scalar \(\det(\lambda I-T)\in F\), which is induced by a formal polynomial. This is fine provided that the field \(F\) is infinite—like \(\mathbb{R}\) or \(\mathbb{C}\), or any field of characteristic zero—but is problematic for finite fields because a formal polynomial is not uniquely determined by a polynomial function over a finite field—for example over \(\mathbb{Z}_2\), both the zero polynomial and the nonzero polynomial \(\lambda^2+\lambda\) induce the zero function since \(0^2+0=0\) and \(1^2+1=0\). We need a formal polynomial to talk about certain formal algebraic properties which are important in linear algebra, such as degrees and multiplicities of roots. Luckily if you're just using linear algebra to build bridges or something, you'll probably be fine working over \(\mathbb{R}\) or \(\mathbb{C}\), and if you're working with finite fields you'll probably be comfortable with more abstract algebra.
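Here's that \(\mathbb{Z}_2\) collapse as a one-line check, with arithmetic mod 2 standing in for the field:

    # The nonzero formal polynomial x^2 + x induces the zero function on Z_2:
    print([(x**2 + x) % 2 for x in (0, 1)])  # prints [0, 0]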

Isometries

Things get even more fun with inner product spaces over \(\mathbb{R}\) and \(\mathbb{C}\). (I'll assume you already know what those are, and we'll restrict attention to finite-dimensional spaces.) Consider for example the notion of an isometry, which is a map that preserves the geometrical structure of a space. Many (most?) linear algebra books define it like this:

Definition: A linear transformation \(T:V\to V\) of an inner product space \(V\) is an isometry if \[(Tv,Tw)=(v,w)\] for all \(v,w\in V\).

Here I'm using parentheses to denote the inner product. This definition just says that an isometry preserves the inner product, and it works for both real and complex spaces. In the real case, such transformations are also confusingly called orthogonal, although the condition is stronger than mere preservation of orthogonality; in the complex case, they're called unitary. Using polarization identities, it's easy to see for linear \(T\) that the condition in the definition is equivalent to \(\|Tv\|=\|v\|\) for all \(v\in V\), where the norm here is that induced by the inner product.
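For the record, the real polarization identity in question is \[(v,w)=\tfrac{1}{4}\left(\|v+w\|^2-\|v-w\|^2\right)\] so a linear \(T\) with \(\|Tv\|=\|v\|\) for all \(v\) preserves the right-hand side (apply it to \(Tv\) and \(Tw\), using linearity to pull \(T\) inside the norms), hence preserves the inner product; the converse direction is immediate from \(\|v\|^2=(v,v)\). The complex case works the same way with a four-term identity involving factors of \(i\).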

Some books instead define the concept of isometry like this:

Definition ( 🙄 ): A map (not assumed to be linear!) \(T:V\to V\) of an inner product space \(V\) is an isometry if \[\|Tv-Tw\|=\|v-w\|\] for all \(v,w\in V\).

This definition just says that an isometry preserves distance between vectors. Is it equivalent to the first definition? It's easy to see that if \(T\) is linear and preserves the inner product—and so preserves the norm—then it preserves distance: \[\|Tv-Tw\|=\|T(v-w)\|=\|v-w\|\] So if \(T\) satisfies the first definition, then it satisfies the second. But the converse is trivially false—for example any translation by a nonzero vector preserves distance but isn't linear!
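Spelled out: writing \(T_b(v)=v+b\) for translation by a fixed vector \(b\ne 0\), we get \[\|T_b(v)-T_b(w)\|=\|(v+b)-(w+b)\|=\|v-w\|\] for all \(v,w\in V\), yet \(T_b(0)=b\ne 0\), so \(T_b\) isn't linear.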

What if in the second definition we also require that \(T(0)=0\)? It turns out that in the real case, this is enough to ensure linearity of \(T\), and hence that \(T\) preserves the norm and satisfies the first definition. But this isn't enough for the complex case! For example, conjugation in \(\mathbb{C}\) preserves distance and the zero vector, but is conjugate-linear, not linear. For this reason the second definition is slightly awkward in the context of linear algebra, and should be reserved for authors who hate their readers—or those with a fetish for affine geometry.
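To check the conjugation example, view \(\mathbb{C}\) as a one-dimensional complex inner product space with \((z,w)=z\bar w\) and take \(T(z)=\bar z\). Then \[|T(z)-T(w)|=|\overline{z-w}|=|z-w|\qquad\text{and}\qquad T(0)=0,\] but \(T(iz)=\overline{iz}=-i\bar z=-iT(z)\ne iT(z)\) whenever \(z\ne 0\), so \(T\) isn't linear.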

Conclusion

That's enough. There are other interesting minutiae in linear algebra—including myriad notational issues—but this post is already long. What are the takeaways here? Attend to minutiae! Don't fear zero! Feel the pleasure!