/ home / articles / matrices Matrices for Dummies
Introduction
Matrices are very powerful mathematical tools
applicable in a wide range of real world problems.
Intuitively, the main advantage in using matrices is that
they allow the logical manipulation of very large sets of
numbers at once. More rigorously, they allow the manipulation
(and solution) of linear equation systems which
in turn reppresent real world problems.
Just as example, the voltages and currents
in an electronic board with 1000 components can be described
by a very huge system of linear equations. 1000 components
imply 1000 unknown voltages and 1000 unknown currents. Attempting
to find such a large number of unknown variables without a method
is substantially a suicide. With the use of matrices such a system
can be solved in a very elegant way.
Today, matrices are used not simply for solving systems of simultaneous linear equations,
but also for describing the quantum mechanics of atomic structure, designing computer
game graphics, analyzing relationships, and even plotting complicated dance steps :)
In this article we try to develop our mathematical tool step by step.
Definition
A matrix is an ordered, bidimensional collection
of mathematical expressions usually rapresented as a rectangular table.
The horizontal lines in a matrix are called rows and the vertical
lines are called columns. A matrix with m rows and
n columns is called an m-by-n matrix (written mxn)
and m and n are called its dimensions. The dimensions of a matrix
are always given with the number of rows first, then the number of columns.
If the number of rows of a matrix equals the number of columns (m = n) then
the matrix is said to be square otherwise it's just rectangular.
Square matrixes have several interesting properties that we'll talk about later.
The entry of a matrix A that lies in the i-th row and the j-th column is called
the i,j entry or (i,j)-th entry of A. This is written as ai,j,
aij or A[i,j]. The row is always noted first, then the column.
A special matrix with one of the dimensions set to 1 is called vector.
A 1xn matrix is called row vector
while a mx1 matrix is called column vector.
If the entries of a matrix are all real numbers then the matrix is said
to be real. If the entries are complex numbers then
the matrix is too said to be complex. If the entries
are polynomials then (guess what?) the matrix is said to be polynomial too.
The entries of a matrix usually have some associated meaning but for now
let's just say they are mathematical expressions (maybe numbers) and
concentrate on matrix manipulation.
Let's play with it
We define the matrix sum as an operation that given
two mxn matrices A,B returns a mxn matrix C with entries that are sums
of the corresponding entries in A and B. Please note that the sum
is defined only for matrices of exactly the same dimensions: we say
that such matrices are sum-compatible.
For sum-compatible matrices it's obvious that
and
We define the scalar multiplication as an operation
that given a mxn matrix A and a scalar expression K returns a mxn matrix B
with each entry made of the corresponding entry of A multiplied by K.
It's again obvious that for sum-compatible matrices A,B and any scalar expression k
and for any matrix A and any couple of scalar expressions
k1, k2
Food for thoughs: Multiplication by scalar is commutative
if the underlying ring (of expressions) is commutative. This is true when the expressions
are (real or complex) numbers or polynomials, that is most real-world cases in that matrices
are applicable. However, the matrix algebra can be applied also to non commutative rings
(for example quaternions) where the multiplication by scalar must be splitted in two
different operations: left multiplication and right multiplication.
Not that obvious
We define the matrix multiplication as an operation
that given a mxp matrix A and a pxn matrix B
returns a mxn matrix C with element i,j computed
as the scalar vector product of the i-th row of A and the j-th column
of B.
Note that the matrix multiplication is well defined only for
couples in that the left matrix has the number of columns equal to the
number of rows of the right matrix. We say such two matrices
to be multiplication-compatible.
Food for thoughs: the multiplication
of two nxn matrices processes 2 n2
entries. However there is no known algorithm with computational
cost of O(n2). Most algorithm
are O(n3) and the most clever
implementations are O(n2.8).
An O(n2.376) algorithm has been proposed
by Coppersmith and Winograd but its implicit factor hidden by the O() notation
is so big that its implementation is worthwile only if we're going to
multiply matrices with n that is out of our current computing possibilities.
It's very easy to show that (and here comes the non obvious) the matrix
multiplication is generally not commutative, that is
except for very few special cases. The (square) matrices for that
are said to commute and must satisfy strict rules
on their elements.
The non commutativity of the matrix multiplication makes
the algebraic manipulation to become non trivial and causes
infinite headcaches to engineering students.
However, we're lucky since the associative and distributive properties
still apply and it can be proven that the following equations are all true
(given that the matrices involved are multiplication-compatible and
the underlying ring is commutative).
Transpose
We define the transpose of a mxn matrix A
as a nxm matrix B obtained from A by swapping rows with columns.
The transpose of a matrix A is often written as AT
or as A'.
Note that swapping rows with means effectively swapping
the order of indices of each element. The element aij
of the matrix A becomes the element aji of
the transpose.
Food for thoughs: This property is interesting in computer
matrix processing. To apply an algorithm to the transpose of a matrix instead of the original
one we can simply swap the parameters of all the matrix element access functions...
A matrix whose transpose is equal to itself is called a symmetric matrix;
that is, A is symmetric if AT = A. Note that A must be
square to be symmetric and internally the elements must satisfy the relation
aij = aji.
It's easy to show that
for any matrix A, thus the transposition is a self-inverse operation.
Also for two matrices with the same dimensions
If the matrices A and B are multiplication-compatible then
Note that the order of multiplication is inverted.
And finally taking the transpose of a scalar (1x1 matrix) is a null operation
The identity
A particular square matrix that commutes with all other
matrices of the same size is the identity matrix.
The identity matrix has all unit elements on
its main diagonal.
It's easy to prove that
and thus the identity matrix is the "unity" element of the matrix algebra
and the multiplication by the identity matrix is an idempotent operation.
Obviously the transpose of an identity matrix is still an identity matrix.
The inverse
Given a square matrix A we define the inverse matrix of A
as the matrix that when multiplied by A gives the identity matrix as result.
The inverse matrix is usually written as A-1.
The inverse matrix does not necessairly exist. A matrix that has no inverse
is said to be non invertible or singular.
Note that A and its inverse (when it exists) do commute.
Food for thoughs: for non square matrices
we can define the left (A-1A=I) and the
right inverse (AA-1)=I. Such inverses
have few real world applications...
It can be shown that the inverse of a matrix is again invertible and that
for any invertible matrix A and that
for any invertible matrix A and any non null scalar k.
It can be also proven that
for invertible matrices A and B of the same size. Note that the order
of factors is inverted and the formula is very similar to the one
that involves transposition.
Finding the inverse of a matrix is a very common highly intensive computational task.
There are several algorithms that implement this operation and many of
them operate better on matrices with elements that satisfy certain properties
or conformations.
Matrices and systems of linear equations
A very large application of matrices is in solving
systems of linear equations.
The above system of m linear equations in n unknown variables can be rewritten
by using matrices as follows
The rows of the A matrix on the left contain the coefficients
of the linear equations of the system. The unknown variables and
the known terms are collected in row vectors x and b.
Solving the system means finding the set of values x1,x2,...,xn
that satisfy all the equations at the same time.
In most cases (if the field of the matrix elements is
infinite) exactly one of the following statements is true:
- the system is undetermined (the set of solutions is empty)
- the system is overdetermined (the set of solutions contains infinitely many elements)
- the system is exactly determined (the set of solutions contains contain exactly one element)
The most interesting case is obviously the one with exactly one solution.
The Rouche'-Capelli theorem states that for a system of linear equations with
n unknown variables to be exactly determined there must be at
least n linearly independent equations.
Since more equations are unnecessary then the most interesting
linear systems are described by a square coefficient matrix.
By using trivial manipulations we can now show that by inverting
the matrix of coefficients we can solve the system.
We multiply boths sides of the system equation by the inverse of A
Since by definition
so the formula above can be rewritten as
and since
then trivially
The last formula is very important since it provides the
real motivation for studying the inverse matrices. If we're
able to find the inverse of A, we can solve the system.
Determinants
The definition of the matrix determinant is quite scary.
Given a square nxn matrix A, it's determinant is defined as
where E(n) is the set of permutations
of the numbers {1,2,....n}, P is a single permutation taken out of that set and sgn(P)
is a function that returns +1 if the permutation P is
even
and -1 if the permutation P is odd.
P(i), then, is the i-th element of the permutation P.
The formula above is also known as Liebniz formula.
and is better explained with an example. Consider the following 2x2 matrix.
In this case n is 2 and thus E(n) is the set of the possible
permutations of numbers {1,2}. E(n) obviously contains only
two elements: the trivial "null" permutation {1,2} (which is even)
and the permutation {2,1} (which is odd).
The Liebniz formula expands then to a sum of two elements
dictated by the two permutations just described.
Each element of the sum is a product of n elements
of the matrix taken from distinct columns. The row of each element
is choosen by the permutation.
In the first product we move along the columns (indexed by i)
from left to right and take the first and the second row (null
permutation). This means a11 and a22.
Since the permutation used is even, then the sign of this product is positive.
In the second product we move along the columns
from left to right and take the second and then the first row
(the odd permutation). This means a21 and a12.
Since the permutation is odd, then the sign of this product is negative.
The determinant of the 2x2 matrix is then
This is rather easy to remember if you note that it's the
sum of the products of the two diagonals of the 2x2 matrix.
The product is positive if the diagonal goes "up" from left
to right while it's negative if the diagonal goes "down"
from left to right.
The formula for 3x3 matrices is more complex
and it contains 6 elements.
If you look close you can still spot the same pattern.
There are positive diagonal lines that go down from left
to right and negative diagonal lines that go up
from left to right.
The formula for 4x4 matrices becomes really complex
and is rarely written explicitly. For larger n values
the formula becomes very difficult to use because
it's hard to enumerate all the permutations of n numbers.
Other methods for finding the determinant exist
and later we'll probably spot some of them
but now let's look at some of the determinant properties.
It can be shown that for square matrices A and B, any scalar r
and the square unit matrix I the following properties are true
Invertible matrices and their determinants
Since for an invertible matrix A
then by property 4 of the previous paragraph
and since
then
which implies that
and that
Conversely, it can be shown that a non-zero determinant
for matrix A implies that the inverse of A exists.
The proof requires the notion of the rank of a matrix
which we haven't covered so I'm going to omit it here.
However, the two deductions lead us to the following theorem:
"A matrix is singular if and only if its determinant is zero"
or alternatively
"A matrix is invertible if and only if its determinant is non zero"
Food for thoughs:
As a rule of thumb, almost all square matrices are invertible. Over
the field of real numbers, this can be made precise as follows: the set
of singular n-by-n matrices, is a null set, i.e., has Lebesgue measure zero.
Intuitively, this means that if you pick a random square matrix over
the reals, the probability that it will be singular is zero.
In practice however, one may encounter non-invertible matrices. And in numerical
calculations, matrices which are invertible, but close to a non-invertible
matrix, can still be problematic; such matrices are said to be ill conditioned.
|