In this topic we compute with the precision matrix for the AR1 model component.
library(INLA)
We model $u$ as an autoregressive model of order 1 over time $t$. This is described as $u_t = \rho u_{t-1} + \epsilon_t$, where $\epsilon_t$ is iid Gaussian noise.
The two hyper-parameters for this component are $\rho$ and the marginal standard deviation $\sigma_u$.
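To get some intuition for this process, here is a minimal simulation sketch (not part of the original text) using base R's arima.sim; with unit-variance innovations, the marginal variance of the stationary process is $1/(1-\rho^2)$.
set.seed(1)
# simulate a long AR1 path with rho = 0.95 and unit-variance innovations
u.sim = arima.sim(model = list(ar = 0.95), n = 1e4)
var(u.sim)  # should be roughly 1/(1 - 0.95^2), i.e. about 10.3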
What this topic is about is the $Q$ matrix in $u \sim \mathcal{N}(0, Q^{-1})$.
By the definition of the precision matrix, $\log(\pi(u)) = c - \frac{1}{2} u^\top Q u$.
We rewrite the definition of our model component as a function of $\epsilon$: $\epsilon_t = u_t - \rho u_{t-1}$, giving the joint distribution $\log(\pi(u)) = c - \frac{1}{2}(\epsilon_2^2 + \epsilon_3^2 + \epsilon_4^2 + \ldots + \epsilon_T^2)$. Further, $\log(\pi(u)) = c - \frac{1}{2}\left((u_2 - \rho u_1)^2 + (u_3 - \rho u_2)^2 + (u_4 - \rho u_3)^2 + \ldots + (u_T - \rho u_{T-1})^2\right)$.
Matching this term to the $Q$ matrix, we get $u^\top Q u = u_2^2 - 2\rho u_2 u_1 + \rho^2 u_1^2 + u_3^2 - 2\rho u_3 u_2 + \rho^2 u_2^2 + u_4^2 - 2\rho u_4 u_3 + \rho^2 u_3^2 + \ldots$
By experimenting a bit with the quadratic form $u^\top Q u$, we see that the only way to make a symmetric $Q$ is by defining the following $Q$.
N = 10
rho = 0.95
# tridiagonal precision matrix: 1 + rho^2 on the diagonal, -rho off the diagonal
Q = matrix(0, N, N)
diag(Q) = 1 + rho^2
for (i in 1:(N-1)) {
  Q[i, i+1] = -rho
  Q[i+1, i] = -rho
}
print(Q)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1.90 -0.95 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## [2,] -0.95 1.90 -0.95 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## [3,] 0.00 -0.95 1.90 -0.95 0.00 0.00 0.00 0.00 0.00 0.00
## [4,] 0.00 0.00 -0.95 1.90 -0.95 0.00 0.00 0.00 0.00 0.00
## [5,] 0.00 0.00 0.00 -0.95 1.90 -0.95 0.00 0.00 0.00 0.00
## [6,] 0.00 0.00 0.00 0.00 -0.95 1.90 -0.95 0.00 0.00 0.00
## [7,] 0.00 0.00 0.00 0.00 0.00 -0.95 1.90 -0.95 0.00 0.00
## [8,] 0.00 0.00 0.00 0.00 0.00 0.00 -0.95 1.90 -0.95 0.00
## [9,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -0.95 1.90 -0.95
## [10,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -0.95 1.90
A confusing part of the AR1 model is that we must give the first value the correct marginal distribution to make the process stationary; see inla.doc("ar1") for details. (The stationary marginal of $u_1$ has precision $1-\rho^2$, which together with the $\rho^2$ already on the diagonal gives exactly 1; the last value appears in only one innovation term, so its diagonal entry is 1 as well.) We fix this now:
Q[1,1] = 1
Q[N,N] = 1
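As a quick sanity check (a small sketch added here, not part of the original text), we can verify numerically that this corrected $Q$ reproduces the quadratic form derived above, together with the stationary term $(1-\rho^2)u_1^2$ for the first value:
set.seed(1)
u = rnorm(N)
# quadratic form computed from the precision matrix
qf.Q = as.numeric(t(u) %*% Q %*% u)
# stationary term for u_1 plus the sum of squared innovations
qf.eps = (1 - rho^2) * u[1]^2 + sum((u[2:N] - rho * u[1:(N-1)])^2)
print(c(qf.Q, qf.eps))  # the two numbers should agree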
For comparison later, we now create a function for Q, summarising our computations so far.
precision.ar1 = function(N, rho){
  # AR1 precision matrix with unit innovation variance, including the
  # stationarity correction in the (1,1) and (N,N) corners
  Q = matrix(0, N, N)
  diag(Q) = 1 + rho^2
  for (i in 1:(N-1)) {
    Q[i, i+1] = -rho
    Q[i+1, i] = -rho
  }
  Q[1,1] = 1
  Q[N,N] = 1
  return(Q)
}
The matrix Q we have created stores all the zeroes! Let us convert this to a sparse matrix and compare them.
Q = precision.ar1(10, 0.9)
print(Q)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1.0 -0.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
## [2,] -0.9 1.8 -0.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0
## [3,] 0.0 -0.9 1.8 -0.9 0.0 0.0 0.0 0.0 0.0 0.0
## [4,] 0.0 0.0 -0.9 1.8 -0.9 0.0 0.0 0.0 0.0 0.0
## [5,] 0.0 0.0 0.0 -0.9 1.8 -0.9 0.0 0.0 0.0 0.0
## [6,] 0.0 0.0 0.0 0.0 -0.9 1.8 -0.9 0.0 0.0 0.0
## [7,] 0.0 0.0 0.0 0.0 0.0 -0.9 1.8 -0.9 0.0 0.0
## [8,] 0.0 0.0 0.0 0.0 0.0 0.0 -0.9 1.8 -0.9 0.0
## [9,] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -0.9 1.8 -0.9
## [10,] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -0.9 1.0
Q = precision.ar1(10, 0.9)
Q = as(Q, "sparseMatrix")
print(Q)
## 10 x 10 sparse Matrix of class "dsCMatrix"
##
## [1,] 1.0 -0.9 . . . . . . . .
## [2,] -0.9 1.8 -0.9 . . . . . . .
## [3,] . -0.9 1.8 -0.9 . . . . . .
## [4,] . . -0.9 1.8 -0.9 . . . . .
## [5,] . . . -0.9 1.8 -0.9 . . . .
## [6,] . . . . -0.9 1.8 -0.9 . . .
## [7,] . . . . . -0.9 1.8 -0.9 . .
## [8,] . . . . . . -0.9 1.8 -0.9 .
## [9,] . . . . . . . -0.9 1.8 -0.9
## [10,] . . . . . . . . -0.9 1.0
The “.” signifies that the zeroes are not stored. How is this done instead? The indices $(i, j)$ together with their values $x_{i,j}$ are stored as three lists.
str(Q)
## Formal class 'dsCMatrix' [package "Matrix"] with 7 slots
## ..@ i : int [1:19] 0 0 1 1 2 2 3 3 4 4 ...
## ..@ p : int [1:11] 0 1 3 5 7 9 11 13 15 17 ...
## ..@ Dim : int [1:2] 10 10
## ..@ Dimnames:List of 2
## .. ..$ : NULL
## .. ..$ : NULL
## ..@ x : num [1:19] 1 -0.9 1.81 -0.9 1.81 -0.9 1.81 -0.9 1.81 -0.9 ...
## ..@ uplo : chr "U"
## ..@ factors : list()
As you see, this “dsCMatrix” uses p (column pointers) instead of j, storing the entries in a compressed, slightly different way.
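If you prefer the plain (i, j, x) view, you can convert to the Matrix package's triplet representation (a small sketch, not in the original text); note that for this symmetric class only one triangle is stored.
Qt = as(Q, "TsparseMatrix")
# stored entries as explicit (i, j, x) triplets (slots are 0-based, hence the +1)
head(cbind(i = Qt@i + 1, j = Qt@j + 1, x = Qt@x))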
What is all the fuss about?
for (N in c(10, 1E2, 1E3, 5E3)) {
  Q = precision.ar1(N, 0.9)
  os1 = round(object.size(Q)/1000)
  Q = as(Q, "sparseMatrix")
  os2 = round(object.size(Q)/1000)
  print(paste0("For N is ", N, " we go from ", os1, " kb to ", os2, " kb"))
}
## [1] "For N is 10 we go from 1 kb to 2 kb"
## [1] "For N is 100 we go from 80 kb to 5 kb"
## [1] "For N is 1000 we go from 8000 kb to 30 kb"
## [1] "For N is 5000 we go from 2e+05 kb to 142 kb"
Creating a full matrix and then using as.sparse is bad practice. You should create the i, j, x indices/values and use ?sparseMatrix. For large problems, your entire inference algorithm might be quicker than writing down the full matrix even once.
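As an illustration of that advice, here is a small sketch (not from the original text) that builds the same AR1 precision matrix directly from (i, j, x) triplets with Matrix::sparseMatrix, never forming the dense matrix; the name precision.ar1.sparse is just illustrative.
library(Matrix)
precision.ar1.sparse = function(N, rho) {
  # upper triangle only: the diagonal plus the first off-diagonal
  i = c(1:N, 1:(N-1))
  j = c(1:N, 2:N)
  x = c(1, rep(1 + rho^2, N - 2), 1, rep(-rho, N - 1))
  sparseMatrix(i = i, j = j, x = x, dims = c(N, N), symmetric = TRUE)
}
max(abs(precision.ar1.sparse(10, 0.9) - precision.ar1(10, 0.9)))  # should print 0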
The basic part of computing with precision matrices is being able to do the Cholesky factorisation, $Q = L L^\top$.
Q = precision.ar1(1000, 0.99)
L = chol(Q)
print(sum(abs(Q- t(L)%*%L)))
## [1] 0
We know that $\det(Q) = \det(L^\top)\det(L) = \det(L)^2$. It is very important to know that we never want to compute determinants, only log-determinants, just as we never compute probabilities, only log-probabilities. This is because of numerical stability (if you don't take the log, most things are infinite or zero).
$\log\det(Q) = 2\log\det(L)$. How do we find the determinant of $L$? Since the Cholesky factor is lower or upper triangular, we can just take the product of the diagonal! If you look at the matrix L, it is actually upper triangular, since that is the R default.
$$\log\det(Q) = 2\sum_i \log(L_{i,i})$$
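As a quick check (a sketch, not part of the original text), this agrees with base R's determinant(), which also works on the log scale:
logdet.chol = 2 * sum(log(diag(L)))
logdet.ref = as.numeric(determinant(Q, logarithm = TRUE)$modulus)
print(c(logdet.chol, logdet.ref))  # the two values should agree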
Assume that we want to compute the (joint) density at $u_i = 0.1$ for all $i$ simultaneously (as an example).
The general formula $\log(\pi(u)) = \frac{1}{2}\log|(2\pi)^{-1}Q| - \frac{1}{2}\vec{u}^\top Q \vec{u}$ gives us $\log(\pi(u = 0.1)) = -\frac{N}{2}\log(2\pi) + \frac{1}{2}\log|Q| - \frac{1}{2}\, 0.1^\top Q\, 0.1$.
In code this is:
pt1 = rep(0.1, nrow(Q))
log.prob.pt1 = -nrow(Q)/2*log(2*pi) + 0.5*2*sum(log(diag(L))) - 0.5 *pt1 %*% Q %*% pt1
print(log.prob.pt1)
## [,1]
## [1,] -921
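If the mvtnorm package is available (an assumption; it is not used in the original text), we can cross-check this value against its dense multivariate normal log-density:
library(mvtnorm)
S = solve(Q)        # dense covariance matrix; fine at this size, expensive in general
S = (S + t(S))/2    # symmetrise away tiny numerical asymmetries
print(dmvnorm(pt1, sigma = S, log = TRUE))  # should be close to log.prob.pt1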
To sample from $Q$ we just define $u$ as the solution to $L u = z$, where $z$ is a vector of iid Gaussians $\mathcal{N}(0,1)$. Then $\mathrm{Prec}(u) = L^\top L = Q$. We can write the sampling code as follows.
set.seed(2017)
z.sample = rnorm(nrow(Q))
u.sample = solve(L, z.sample)
plot(u.sample, type="l")
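Since L is upper triangular, a dedicated triangular solver gives the same sample more cheaply than a general solve (a minor efficiency sketch, not part of the original code):
u.sample2 = backsolve(L, z.sample)     # exploits the triangular structure of L
print(max(abs(u.sample - u.sample2)))  # should be (numerically) zero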