16. Direct methods for solving linear systems#
Saleh Rezaeiravesh, saleh.rezaeiravesh@manchester.ac.uk
Department of Mechanical and Aerospace Engineering, The University of Manchester, Manchester, UK
16.1. Intended Learning Outcomes#
After reading and running this notebook, the students should be able to:
Use
numpybuilt-in functions to do matrix algebra required for direct solution of linear systems.Develop Python functions/scripts based on different direct methods for solving linear systems.
16.2. Import libraries#
In this notebook, we rely on the methods provided by numpy and numpy.linalg. However, for the LU decomposition, we require scipy.linalg.
import numpy as np
import numpy.linalg as la
import scipy.linalg as sc
16.3. System of linear equations#
Consider the following system of linear equations,
where,
\(a_{ij}\): known coefficients, \(i,j=1,2,\cdots,n\),
\(b_i\): right-hand side values, \(i=1,2,\ldots,n\)
\(x_j\): unknown solutions, \(j=1,2,\ldots,n\).
We can rewrite this system in a matrix form:
where,
\(\mathbf{A}\) is an \(n\times n\) matrix (coefficient matrix),
\(\mathbf{x}\) is an \(n\times 1\) matrix (solution vector),
\(\mathbf{b}\) is an \(n\times 1\) matrix (right-hand-side (RHS) vector).
In the following sections, we demonstrate how to use built-in functions and write your own scripts to solve linear systems adopting direct methods. For the iterative methods, refer to this notebook.
16.4. Matrix inversion method#
If the matrix \(\mathbf{A}\) is invertible, i.e. \(\mathbf{A}^{-1}\) exists, then the above linear system can be directly solved:
This approach is usually cost-effective only for small systems (i.e. low \(n\) values). As the number of unknown increases, the computational cost of matrix inversion quickly increases, urging for adoptive more efficient methods, particularly the iterative ones.
To apply this method in Python using the numpy built-in functions, we have two options:
Compute \(\mathbf{A}^{-1}\) by
numpy.linalg(A)and multiply it by \(\mathbf{b}\).Directly use
x = numpy.linalg.solve(A,b). In this case, \(\mathbf{A}\) must be a full rank matrix, meaning that all its rows (equivalently, all its columns) must be linearly independent.
Example: Solve the linear system with the following \(\mathbf{A}\) and \(\mathbf{b}\) using the direct matrix inversin method.
A = np.asarray([[4,2,-0.5,2],
[2,3,1,0],
[0,-1,2,0.5],
[1,0,1,5]])
b = np.array([21,69,34,22])
We could directly compute \(\mathbf{x}\) from \(\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}\):
x1 = la.inv(A) @ b
print(x1)
[ 1.74229692 13.80672269 24.0952381 -0.767507 ]
Let’s use x = numpy.linalg.solve(A,b):
x2 = la.solve(A,b)
print(x2)
[ 1.74229692 13.80672269 24.0952381 -0.767507 ]
Clearly, the two functions return exactly the same solution vector of size 4. Note that, before using solve, it is better to check if the matrix \(\mathbf{A}\) is full rank. This can be done by numpy.linalg.matrix_rank(A):
print(la.matrix_rank(A))
4
Exercise: Do an experimentation and measure the computational time required to solve the linear system (1) by the direct matrix inversion method for different values of \(n\), say \(n=4, 6, 10, 20, 50, 100\). Plot of the required computational time versus \(n\). What can you conclude about how the cost of the direct inversion method increases with \(n\)?
16.5. Cramer’s rule#
Consider the linear system (1). The Cramer’s rule to find the solutions of this system reads as,
where,
\(D=\det(\mathbf{A})\),
\(D_i\) is the determinent of \(\mathbf{A}\) where its \(i\)-th column is replaced by \(\mathbf{b}\) (RHS vector).
Applicability of the Cramer’s rule:
if \(D\neq 0\), then the system is consistent.
if \(D=0\),
if \(D_i\neq 0\) for at least one \(i=1,2,\cdots,n\), then system is inconsistent and has no solution.
if \(D_i=0\) for all \(i=1,2,\cdots,n\), then the Cramer’s rule is not applicable.
Implementation of the Cramer’s rule in a Python function is straightforward (just follow the above formula). Below, we have done this considering the above points on the applicability of the Cramer’s rule:
def cramer(A,b):
"""
Find all slution of Ax = b using Cramer's rule.
"""
D = la.det(A)
n = A.shape[0]
x = np.zeros(n)
for i in range(n):
#form a copy of A with its i-th column replaced by b
Ai = np.copy(A)
Ai[:,i] = b #replace i-th column of A by b
Di = la.det(Ai) #||Ai||
if D == 0:
if Di == 0:
raise ValueError("Cramer's rule not applicable as det(A)=Di=0!")
else:
raise ValueError("System is inconsistent with no solution, det(A)=0, Di non-zero!")
else:
x[i] = Di/D
return x
Example: Apply the Cramer’s rule to find the solution of the system defined in the previous example:
xc = cramer(A,b)
print(xc)
[ 1.74229692 13.80672269 24.0952381 -0.767507 ]
Clearly, the solution is the same as what we got from the direct matrix inversion method in the previous exmple.
16.6. Gaussian elimination method#
The aim of the Gaussian eliminatin method is to reduce a linear system \(\mathbf{A}\mathbf{x}=\mathbf{b}\) to a new system \(\mathbf{U}\mathbf{x}=\mathbf{c}\) where the coefficient matrix \(\mathbf{U}\) is upper-triangular:
The conversion is done by equivalent transformation through the following operations:
Multiplication of an equation (row) by a scalar and adding the result to another equation (row).
Swapping columns of the coefficient matrix.
These operations are also called row reduction which is applied in the following Python function.
def rowReduction(A,b):
"""
Row reduction applied to Aaug=[A|b] in Gaussian elimination to make A triangular.
The original linear system is Ax=b.
Args:
`A`: numpy array of shape (n,n), coefficient matrix
`b`: numpy array of shape (n,), right-hand side vector
Returns:
`Aaug`: numpy array of shape (n,n+1), reduced [A|b]
"""
n = A.shape[0]
Aaug = np.hstack((A,b[:,None])) #stacking b to the end of matrix A to get a n x (n+1) matrix
for i in range(n): #loop over columns
for j in range(i+1,n): #loop over rows
m = -Aaug[j,i] / Aaug[i,i]
Aaug[j,i:] += m * Aaug[i,i:]
return Aaug
Let’s apply it to the following 4 x 4 system:
A = np.asarray([[4,2,-0.5,2],
[2,3,1,0],
[0,-1,2,0.5],
[1,0,1,5]])
b = np.array([21,69,34,22])
Aaug = rowReduction(A,b)
print(Aaug)
print(Aaug.shape)
[[ 4.00000000e+00 2.00000000e+00 -5.00000000e-01 2.00000000e+00
2.10000000e+01]
[ 0.00000000e+00 2.00000000e+00 1.25000000e+00 -1.00000000e+00
5.85000000e+01]
[ 0.00000000e+00 0.00000000e+00 2.62500000e+00 0.00000000e+00
6.32500000e+01]
[ 0.00000000e+00 0.00000000e+00 -2.22044605e-16 4.25000000e+00
-3.26190476e+00]]
(4, 5)
The resulting Aaug matrix is \(4 \times 5\) which after excluding its last column, gives us the upper-triangular matrix \(\mathbf{U}\).
After obtaining the new system \(\mathbf{Ux}=\mathbf{c}\), it can be solved through backward substitution:
The following function implements this backward substitution to find the solution \(\mathbf{x}\) of the original linear system \(\mathbf{Ax}=\mathbf{b}\):
def backSub(Aaug):
"""
Back substitution using Aaug=[U|c], where U is upper-triangular coefficient matrix
and c the RHS vector.
Args:
`Aaug`: numpy array of shape (n,n+1), reduced [A|b]
Returns:
`x`: numpy array of shape (n,), solution vector
"""
U=Aaug[:,:-1] #upper-triangular part of Aaug
c=Aaug[:,-1] #RHS vector
n=len(c)
x = np.zeros(n);
x[-1] = c[-1]/U[-1,-1] #x_n
for i in range(n-2,-1,-1): #backward loop to find x_{n-1}, ..., x_1
s = 0.
for j in range(i+1,n):
s += U[i,j]*x[j]
x[i] = (c[i] - s)/U[i,i]
return x
We can apply this to the above Aaug to find the solution vector \(\mathbf{x}\). We validate the results against the direct matrix inversion method:
xGE = backSub(Aaug)
print('solution by Gaussian Elimination method:',xGE)
x = la.solve(A,b)
print('solution by the direct inversion method:',x)
solution by Gaussian Elimination method: [ 1.74229692 13.80672269 24.0952381 -0.767507 ]
solution by the direct inversion method: [ 1.74229692 13.80672269 24.0952381 -0.767507 ]
16.7. LU decomposition method#
As explained in this notebook, an \(n\times n\) matrix \(\mathbf{A}\) can be decomposed as,
where \(\mathbf{L}\) and \(\mathbf{U}\) are lower- and upper-triangular \(n\times n\) matrices, respectively, and \(\mathbf{P}\) is an \(n\times n\) permutation matrix.
If \(\mathbf{A}\) is diagonally-dominant, then \(\mathbf{P}=\mathbf{I}\).
If \(\mathbf{A}\) is not diagonally-dominant, then \(\mathbf{P}\neq\mathbf{I}\). In this case, \(\mathbf{P}\) specified how the rows of the original linear system are switched to make the matrix \(\mathbf{A}\) diagonally dominant.
After \(\mathbf{A}\) is decomposed by the LU decomposition method, the linear system (1) can be re-written as two linear systems of the same size:
The solution of these two linear systems can be done by:
Forward substitution to solve \(\mathbf{L} \mathbf{y} = \mathbf{b}' \):
Backward substitution to solve \(\mathbf{U} \mathbf{x} = \mathbf{y}\):
The LU decomposition of \(\mathbf{A}\) and the above steps 1 and 2 can be written in a Python function to create a LU-decomposition solver for the linear system (1):
def LUmethod(A,b):
"""
Solution of Ax=b using the LU decomposition method
"""
n = A.shape[0]
#LU decomposition of A
P, L, U = sc.lu(A) #LU factorisation of A
b = la.inv(P) @ b #b' = P^{-1} b
#solving Ly=b' by forward substitution
y = np.zeros(n)
y[0] = b[0]/L[0,0]
for i in range(1,n):
s = 0.
for j in range(i):
s += L[i,j]*y[j]
y[i] = (b[i] - s)/L[i,i]
#solving Ux=y by backward substitution
x = np.zeros(n)
x[-1] = y[-1]/U[-1,-1]
for i in range(n-2,-1,-1):
s = 0.
for j in range(i+1,n):
s += U[i,j]*x[j]
x[i] = (y[i] - s)/U[i,i]
return x
Example: The LU method can be applied to a linear system with a diagonally-dominant \(\mathbf{A}\):
A = np.asarray([[4,2,-0.5,2],
[2,3,1,0],
[0,-1,2,0.5],
[1,0,1,5]])
b = np.array([21,69,34,22])
xlu = LUmethod(A,b)
print('solution by the LU decomp. method:',xlu)
x = la.solve(A,b)
print('solution by the direct inversion method:',x)
solution by the LU decomp. method: [ 1.74229692 13.80672269 24.0952381 -0.767507 ]
solution by the direct inversion method: [ 1.74229692 13.80672269 24.0952381 -0.767507 ]
Example: The LU method can be applied to a linear system with an \(\mathbf{A}\) that is not diagonally dominant:
A = np.array([[1.,-5.,6.],[9.,4.,7.],[-2.,5.,-1.]])
b = np.array([2.,4., -1.])
xlu = LUmethod(A,b)
print('solution by the LU decomp. method:',xlu)
x = la.solve(A,b)
print('solution by the direct inversion method:',x)
solution by the LU decomp. method: [ 0.26644737 -0.04276316 0.25328947]
solution by the direct inversion method: [ 0.26644737 -0.04276316 0.25328947]
16.8. Cholesky decomposition method#
As explained in the previous section, if \(\mathbf{A}\) is square positive definite (all eigenvalues are positive) and,
Hermitian (if some elements are complex numbers), or
symmetric (if all elements are real)
then, it can be decomposed as,
where \(\mathbf{L}\) is a lower triangular matrix of the same size as \(\mathbf{A}\). This is called Cholesky decomposition of \(\mathbf{A}\).
If the Cholesky decomposition is possible for \(\mathbf{A}\), then the linear system (1) can be rewitten as,
This results in two linear system of the same size as the original system:
where \(\mathbf{U}=\mathbf{L}^T\). The two linear systems (3) can be solved using the forward and backward substitution, exactly similar to (2).
def Choleskymethod(A,b):
"""
Solution of Ax=b using the Cholesky decomposition method
"""
n = A.shape[0]
#Cholesky decomposition of A
L = la.cholesky(A) #L: lower triangular matrix, A=LL^T
U = L.T
#solving Ly=b' by forward substitution
y = np.zeros(n)
y[0] = b[0]/L[0,0]
for i in range(1,n):
s = 0.
for j in range(i):
s += L[i,j]*y[j]
y[i] = (b[i] - s)/L[i,i]
#solving Ux=y by backward substitution
x = np.zeros(n)
x[-1] = y[-1]/U[-1,-1]
for i in range(n-2,-1,-1):
s = 0.
for j in range(i+1,n):
s += U[i,j]*x[j]
x[i] = (y[i] - s)/U[i,i]
return x
Let’s validate our function:
A = np.array([[4,-12,16],[-12,37,-43],[16,-43,98]])
b = np.array([-1,2,5])
xCh = Choleskymethod(A,b)
print('solution by the Cholesky decomp. method:',xCh)
x = la.solve(A,b)
print('solution by the direct inversion method:',x)
solution by the Cholesky decomp. method: [-32.80555556 -8.77777778 1.55555556]
solution by the direct inversion method: [-32.80555556 -8.77777778 1.55555556]