G13 Chapter Introduction : NAG Library CL Interface, Mark 28

Let the given time series be

x_{1}, x_{2}, \dots, x_{n}

, where

n

is its length. The structure which is intended to be investigated, and which may be most evident to the eye in a graph of the series, can be broadly described as:

(a) trends, linear or possibly higher-order polynomial;
(b) seasonal patterns, associated with fixed integer seasonal periods. The presence of such seasonality and the period will normally be known a priori. The pattern may be fixed, or slowly varying from one season to another;
(c) cycles or waves of stable amplitude and period $p$ (from peak to peak). The period is not necessarily an integer, the corresponding absolute frequency (cycles/time unit) being $f = 1 / p$ and the angular frequency $ω = 2 π f$. The cycle may be of pure sinusoidal form like $\sin (ω t)$, or the presence of higher harmonic terms may be indicated, e.g., by asymmetry in the wave form;
(d) quasi-cycles, i.e., waves of fluctuating period and amplitude; and
(e) irregular statistical fluctuations and swings about the overall mean or trend.

Trends, seasonal patterns, and cycles might be regarded as deterministic components following fixed mathematical equations, and the quasi-cycles and other statistical fluctuations as stochastic and describable by short-term correlation structure. For a finite dataset it is not always easy to discriminate between these two types, and a common description using the class of autoregressive integrated moving-average (ARIMA) models is now widely used. The form of these models is that of difference equations (or recurrence relations) relating present and past values of the series. You are referred to Box and Jenkins (1976) for a thorough account of these models and how to use them. We follow their notation and outline the recommended steps in ARIMA model building for which functions are available.

2.1.1 Transformations

If the variance of the observations in the series is not constant across the range of observations it may be useful to apply a variance-stabilizing transformation to the series. A common situation is for the variance to increase with the magnitude of the observations and in this case typical transformations used are the log or square root transformation. A range-mean plot or standard deviation-mean plot provides a quick and easy way of detecting non-constant variance and of choosing, if required, a suitable transformation. These are plots of either the range or standard deviation of successive groups of observations against their means.
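The points of a range-mean plot can be computed in a few lines. The following is a minimal sketch in Python, not a Library function: the name `range_mean_points` and the choice of non-overlapping groups of a fixed size are assumptions made for illustration.

```python
from statistics import mean


def range_mean_points(series, group_size):
    """Return (mean, range) pairs for successive non-overlapping groups."""
    points = []
    # Only complete groups are used; a trailing partial group is dropped.
    for start in range(0, len(series) - group_size + 1, group_size):
        group = series[start:start + group_size]
        points.append((mean(group), max(group) - min(group)))
    return points


# Hypothetical series whose spread grows with its level.
x = [1, 2, 3, 10, 20, 30, 100, 200, 300]
for group_mean, group_range in range_mean_points(x, 3):
    print(group_mean, group_range)
```

If the plotted ranges increase roughly in proportion to the means, as in this made-up series, a log transformation is the usual candidate.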

2.1.2 Differencing operations

These may be used to simplify the structure of a time series.

First-order differencing, i.e., forming the new series

\nabla x_{t} = x_{t} - x_{t - 1}

will remove a linear trend. First-order seasonal differencing

\nabla_{s} x_{t} = x_{t} - x_{t - s}

eliminates a fixed seasonal pattern.

These operations reflect the fact that it is often appropriate to model a time series in terms of changes from one value to another. Differencing is also, therefore, appropriate when the series has something of the nature of a random walk, which is by definition the accumulation of independent changes.

Differencing may be applied repeatedly to a series, giving

w_{t} = \nabla^{d} \nabla_{s}^{D} x_{t}

where

d

and

D

are the orders of differencing. The derived series

w_{t}

will be shorter, of length

N = n - d - s \times D

, and extend for

t = 1 + d + s \times D, \dots, n .
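The differencing operations above can be sketched as follows. This is an illustrative Python fragment, not a Library function; the name `difference` and its argument order are assumptions.

```python
def difference(x, d=0, D=0, s=1):
    """Apply D seasonal differences of period s and d first-order differences."""
    w = list(x)
    for _ in range(D):
        # Seasonal differencing: w_t = x_t - x_{t-s}
        w = [w[t] - w[t - s] for t in range(s, len(w))]
    for _ in range(d):
        # First-order differencing: w_t = x_t - x_{t-1}
        w = [w[t] - w[t - 1] for t in range(1, len(w))]
    return w


# Hypothetical series: a linear trend plus a fixed pattern of period 4.
pattern = [5, -1, 3, 0]
x = [t + pattern[t % 4] for t in range(12)]
w = difference(x, d=1, D=1, s=4)
print(len(w), w)
```

The result has length n - d - s × D, and for this made-up series both the trend and the seasonal pattern are removed.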

2.1.3 Sample autocorrelations

Given that a series has (possibly as a result of simplifying by differencing operations) a homogeneous appearance throughout its length, fluctuating with approximately constant variance about an overall mean level, it is appropriate to assume that its statistical properties are stationary. For most purposes the correlations

ρ_{k}

between terms

x_{t}, x_{t + k}

or

w_{t}, w_{t + k}

separated by lag

k

give an adequate description of the statistical structure and are estimated by the sample autocorrelation function (ACF)

r_{k}

, for

k = 1, 2, \dots .

As described by Box and Jenkins (1976), these may be used to indicate which particular ARIMA model may be appropriate.
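As an illustration, the sample ACF may be computed as below. The text does not state the estimator, so this sketch assumes the commonly used form r_k = c_k / c_0 with c_k = (1/n) Σ (x_t - x̄)(x_{t+k} - x̄); the function name `sample_acf` is hypothetical.

```python
def sample_acf(x, max_lag):
    """Return [r_1, ..., r_max_lag] using autocovariances with divisor n."""
    n = len(x)
    xbar = sum(x) / n
    dev = [v - xbar for v in x]
    c0 = sum(v * v for v in dev) / n
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
        for k in range(1, max_lag + 1)
    ]


print(sample_acf([1, 2, 3, 4, 5], 2))
```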

2.1.4 Partial autocorrelations

The information in the autocorrelations,

ρ_{k}

, may be presented in a different light by deriving from them the coefficients of the partial autocorrelation function (PACF)

ϕ_{k, k}

, for

k = 1, 2, \dots .

ϕ_{k, k}

measures the correlation between

x_{t}

and

x_{t + k}

conditional upon the intermediate values

x_{t + 1}, x_{t + 2}, \dots, x_{t + k - 1}

. The corresponding sample values

{\hat{ϕ}}_{k, k}

give further assistance in the selection of ARIMA models.

Both the ACF and the PACF may be computed rapidly, particularly in comparison with the time taken to estimate ARIMA models.

2.1.5 Finite lag predictor coefficients and error variances

The partial autocorrelation coefficient

ϕ_{k, k}

is determined as the final parameter in the minimum variance predictor of

x_{t}

in terms of

x_{t - 1}, x_{t - 2}, \dots, x_{t - k}

x_{t} = ϕ_{k, 1} x_{t - 1} + ϕ_{k, 2} x_{t - 2} + \dots + ϕ_{k, k} x_{t - k} + e_{k, t}

where

e_{k, t}

is the prediction error, and the first subscript

k

of

ϕ_{k, i}

and

e_{k, t}

emphasizes the fact that the parameters will alter as

k

increases. Moderately good estimates

{\hat{ϕ}}_{k, i}

of

ϕ_{k, i}

are obtained from the sample autocorrelation function (ACF), and after calculating the partial autocorrelation function (PACF) up to lag

L

, the successive values

v_{1}, v_{2}, \dots, v_{L}

of the prediction error variance estimates,

v_{k} = var (e_{k, t})

, are available, together with the final values of the coefficients

{\hat{ϕ}}_{L, 1}, {\hat{ϕ}}_{L, 2}, \dots, {\hat{ϕ}}_{L, L}

. If

x_{t}

has nonzero mean,

\bar{x}

, it is adequate to use

x_{t} - \bar{x}

in place of

x_{t}

in the prediction equation.

Although Box and Jenkins (1976) do not place great emphasis on these prediction coefficients, their use is advocated for example by Akaike (1971), who recommends selecting an optimal order of the predictor as the lag for which the final prediction error (FPE) criterion

(1 + k / n) {(1 - k / n)}^{-1} v_{k}

is a minimum.
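The partial autocorrelations, predictor coefficients, error variances and FPE criterion described above can be sketched together. The following Python fragment assumes the Durbin-Levinson recursion as the computational route (the text does not name an algorithm), and the names `durbin_levinson` and `fpe_order` are hypothetical.

```python
def durbin_levinson(r, c0):
    """r[k-1] holds r_k for k = 1..L; c0 is the variance of the series.

    Returns (pacf, v, phi): pacf[k-1] = phi_{k,k}, v[k-1] = v_k, and
    phi = [phi_{L,1}, ..., phi_{L,L}], the coefficients at the final lag.
    """
    phi, pacf, v = [], [], []
    v_prev = c0
    for k in range(1, len(r) + 1):
        num = r[k - 1] - sum(phi[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j] for j in range(k - 1))
        phi_kk = num / den
        # phi_{k,j} = phi_{k-1,j} - phi_{k,k} * phi_{k-1,k-j}
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        v_prev *= 1.0 - phi_kk ** 2
        pacf.append(phi_kk)
        v.append(v_prev)
    return pacf, v, phi


def fpe_order(v, n):
    """Return the lag k in 1..L minimising (1 + k/n) / (1 - k/n) * v_k (L < n)."""
    fpe = [(1 + k / n) / (1 - k / n) * vk for k, vk in enumerate(v, 1)]
    return 1 + fpe.index(min(fpe))


# Autocorrelations of a hypothetical AR(1) process with parameter 0.5.
pacf, v, phi = durbin_levinson([0.5, 0.25, 0.125], c0=1.0)
print(pacf, v, fpe_order(v, n=100))
```

For these autocorrelations the partial autocorrelations vanish beyond lag 1, so the error variance stops decreasing and the FPE criterion selects order 1.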

2.1.6 ARIMA models

The correlation structure in stationary time series may often be represented by a model with a small number of parameters belonging to the autoregressive moving-average (ARMA) class. If the stationary series

w_{t}

has been derived by differencing from the original series

x_{t}

, then

x_{t}

is said to follow an ARIMA model. Taking

w_{t} = \nabla^{d} x_{t}

, the (non-seasonal) ARIMA

(p, d, q)

model with

p

autoregressive parameters

ϕ_{1}, ϕ_{2}, \dots, ϕ_{p}

and

q

moving-average parameters

θ_{1}, θ_{2}, \dots, θ_{q}

, represents the structure of

w_{t}

by the equation

w_{t} = ϕ_{1} w_{t - 1} + \dots + ϕ_{p} w_{t - p} + a_{t} - θ_{1} a_{t - 1} - \dots - θ_{q} a_{t - q},

(1)

where

a_{t}

is an uncorrelated series (white noise) with mean

0

and constant variance

σ_{a}^{2}

. If

w_{t}

has a nonzero mean

c

, then this is allowed for by replacing

w_{t}, w_{t - 1}, \dots

by

w_{t} - c, w_{t - 1} - c, \dots

in the model. Although

c

is often estimated by the sample mean of

w_{t}

, this is not always optimal.

A series generated by this model will only be stationary provided restrictions are placed on

ϕ_{1}, ϕ_{2}, \dots, ϕ_{p}

to avoid unstable growth of

w_{t}

. These are called stationarity constraints. The series

a_{t}

may also be usefully interpreted as the linear innovations in

x_{t}

(and in

w_{t}

), i.e., the error if

x_{t}

were to be predicted using the information in all past values

x_{t - 1}, x_{t - 2}, \dots

, provided also that

θ_{1}, θ_{2}, \dots, θ_{q}

satisfy invertibility constraints. This allows the series

a_{t}

to be regenerated by rewriting the model equation as

a_{t} = w_{t} - ϕ_{1} w_{t - 1} - \dots - ϕ_{p} w_{t - p} + θ_{1} a_{t - 1} + \dots + θ_{q} a_{t - q} .

(2)
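Equation (2) translates directly into a recursion. The sketch below is illustrative only: the function name `arma_innovations` is hypothetical, the mean c is taken as zero, and all terms before the start of the series are simply set to zero, which is a cruder treatment than the one described for model estimation.

```python
def arma_innovations(w, phi, theta):
    """Regenerate a_t from w_t by equation (2); pre-sample terms are taken as 0."""
    a = []
    for t in range(len(w)):
        value = w[t]
        for i, coef in enumerate(phi, start=1):
            if t - i >= 0:
                value -= coef * w[t - i]      # - phi_i * w_{t-i}
        for j, coef in enumerate(theta, start=1):
            if t - j >= 0:
                value += coef * a[t - j]      # + theta_j * a_{t-j}
        a.append(value)
    return a


# Hypothetical ARMA(1, 1) parameters and a short series.
print(arma_innovations([1.0, 0.0, 0.1, 2.0, -0.6], phi=[0.5], theta=[0.3]))
```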

If a series has only short-term correlation, i.e.,

r_{k}

is not significant beyond some low lag

q

(see Box and Jenkins (1976) for the statistical test), then the pure moving-average model

MA (q)

is appropriate, with no autoregressive parameters, i.e.,

p = 0 .

Autoregressive parameters are appropriate when the autocorrelation function (ACF) pattern decays geometrically, or with a damped sinusoidal pattern which is associated with quasi-periodic behaviour in the series. If the sample partial autocorrelation function (PACF)

{\hat{ϕ}}_{k, k}

is significant only up to some low lag

p

, then a pure autoregressive model

AR (p)

is appropriate, with

q = 0

. Otherwise moving-average terms will need to be introduced, as well as autoregressive terms.

The seasonal ARIMA

(p, d, q, P, D, Q, s)

model allows for correlation at lags which are multiples of the seasonal period

s

. Taking

w_{t} = \nabla^{d} \nabla_{s}^{D} x_{t}

, the series is represented in a two-stage manner via an intermediate series

e_{t}

w_{t} = Φ_{1} w_{t - s} + \dots + Φ_{P} w_{t - s \times P} + e_{t} - Θ_{1} e_{t - s} - \dots - Θ_{Q} e_{t - s \times Q}

(3)

e_{t} = ϕ_{1} e_{t - 1} + \dots + ϕ_{p} e_{t - p} + a_{t} - θ_{1} a_{t - 1} - \dots - θ_{q} a_{t - q}

(4)

where

Φ_{i}

and

Θ_{i}

are the seasonal parameters and

P

and

Q

are the corresponding orders. Again,

w_{t}

may be replaced by

w_{t} - c .

2.1.7 ARIMA model estimation

In theory, the parameters of an ARIMA model are determined by a sufficient number of autocorrelations

ρ_{1}, ρ_{2}, \dots

. Using the sample values

r_{1}, r_{2}, \dots

in their place it is usually (but not always) possible to solve for the corresponding ARIMA parameters.

These are rapidly computed but are not fully efficient estimates, particularly if moving-average parameters are present. They do provide useful preliminary values for an efficient but relatively slow iterative method of estimation. This is based on the least squares principle by which parameters are chosen to minimize the sum of squares of the innovations

a_{t}

, which are regenerated from the data using (2), or the reverse of (3) and (4) in the case of seasonal models.

Lack of knowledge of terms on the right-hand side of (2), when

t = 1, 2, \dots, \max (p, q)

, is overcome by introducing

q

unknown series values

w_{0}, w_{1}, \dots, w_{q - 1}

which are estimated as nuisance parameters, and using correction for transient errors due to the autoregressive terms. If the data

w_{1}, w_{2}, \dots, w_{N} = w

is viewed as a single sample from a multivariate Normal density whose covariance matrix

V

is a function of the ARIMA model parameters, then the exact log-likelihood of the parameters is, up to an additive constant,

- \frac{1}{2} \log | V | - \frac{1}{2} w^{T} V^{- 1} w .

The least squares criterion as outlined above is equivalent to using the quadratic form

Q F = w^{T} V^{- 1} w

as an objective function to be minimized. Neglecting the term

- \frac{1}{2} \log | V |

yields estimates which differ very little from the exact likelihood estimates except in small samples, or in seasonal models with a small number of whole seasons contained in the data. In these cases bias in moving-average parameters may cause them to stick at the boundary of their constraint region, resulting in failure of the estimation method.

Approximate standard errors of the parameter estimates and the correlations between them are available after estimation.

The model residuals,

{\hat{a}}_{t}

, are the innovations resulting from the estimation and are usually examined for the presence of autocorrelation as a check on the adequacy of the model.

2.1.8 ARIMA model forecasting

An ARIMA model is particularly suited to extrapolation of a time series. The model equations are simply used for

t = n + 1, n + 2, \dots

, replacing the unknown future values of

a_{t}

by zero. This produces future values of

w_{t}

, and if differencing has been used this process is reversed (the so-called integration part of ARIMA models) to construct future values of

x_{t} .

Forecast error limits are easily deduced.

This process requires knowledge only of the model orders and parameters together with a limited set of the terms

a_{t - i}, e_{t - i}, w_{t - i}, x_{t - i}

which appear on the right-hand side of the models (3) and (4) (and the differencing equations) when

t = n

. It does not require knowledge of the whole series.

We call this the state set. It is conveniently constituted after model estimation. Moreover, if new observations

x_{n + 1}, x_{n + 2}, \dots

come to hand, then the model equations can easily be used to update the state set before constructing forecasts from the end of the new observations. This is particularly useful when forecasts are constructed on a regular basis. The new innovations

a_{n + 1}, a_{n + 2}, \dots

may be compared with the residual standard deviation,

σ_{a}

, of the model used for forecasting, as a check that the model is continuing to forecast adequately.
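The forecasting recursion for a non-seasonal model can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the Library's method: the name `forecast_arima` is hypothetical, the mean c is taken as zero, pre-sample terms are set to zero, and the whole series is reprocessed rather than a state set being maintained.

```python
def forecast_arima(x, phi, theta, d, steps):
    """Forecast `steps` values of x from an ARIMA(p, d, q) model with c = 0."""
    # Difference d times, keeping the last value at each level so that the
    # differencing can be reversed (the integration step).
    levels = []
    w = list(x)
    for _ in range(d):
        levels.append(w[-1])
        w = [w[t] - w[t - 1] for t in range(1, len(w))]

    # Regenerate the innovations with equation (2).
    a = []
    for t in range(len(w)):
        ar = sum(c * w[t - i] for i, c in enumerate(phi, 1) if t - i >= 0)
        ma = sum(c * a[t - j] for j, c in enumerate(theta, 1) if t - j >= 0)
        a.append(w[t] - ar + ma)

    forecasts = []
    for _ in range(steps):
        t = len(w)
        ar = sum(c * w[t - i] for i, c in enumerate(phi, 1) if t - i >= 0)
        ma = sum(c * a[t - j] for j, c in enumerate(theta, 1) if t - j >= 0)
        w.append(ar - ma)   # equation (1) with the future a_t replaced by zero
        a.append(0.0)
        value = w[-1]
        for k in range(d - 1, -1, -1):
            levels[k] += value   # undo one level of differencing
            value = levels[k]
        forecasts.append(value)
    return forecasts


print(forecast_arima([8.0], phi=[0.5], theta=[], d=0, steps=3))
```

Only the last few values of each intermediate series are actually used inside the forecasting loop, which is why a limited state set suffices in practice.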

2.1.9 Exponential smoothing

Exponential smoothing is a relatively simple method of short-term forecasting for a time series. A variety of different smoothing methods are possible, including: single exponential, Brown's double exponential, linear Holt (also called double exponential smoothing in some references), additive Holt–Winters and multiplicative Holt–Winters. The choice of smoothing method used depends on the characteristics of the time series. If the mean of the series is only slowly changing then single exponential smoothing may be suitable. If there is a trend in the time series, which itself may be slowly changing, then linear Holt smoothing may be suitable. If there is a seasonal component to the time series, e.g., daily or monthly data, then one of the two Holt–Winters methods may be suitable.

For a time series

y_{t}

, for

t = 1, 2, \dots, n

, the five smoothing functions are defined by the following:

Single Exponential Smoothing

\begin{matrix} m_{t} & = α y_{t} + (1 - α) m_{t - 1} \\ {\hat{y}}_{t + f} & = m_{t} \\ var ({\hat{y}}_{t + f}) & = var (ε_{t}) (1 + (f - 1) α^{2}) \end{matrix}

Brown Double Exponential Smoothing

\begin{matrix} m_{t} & = α y_{t} + (1 - α) m_{t - 1} \\ r_{t} & = α (m_{t} - m_{t - 1}) + (1 - α) r_{t - 1} \\ {\hat{y}}_{t + f} & = m_{t} + ((f - 1) + 1 / α) r_{t} \\ var ({\hat{y}}_{t + f}) & = var (ε_{t}) (1 + \sum_{i = 0}^{f - 1} {(2 α + (i - 1) α^{2})}^{2}) \end{matrix}

Linear Holt Smoothing

\begin{matrix} m_{t} & = α y_{t} + (1 - α) (m_{t - 1} + ϕ r_{t - 1}) \\ r_{t} & = γ (m_{t} - m_{t - 1}) + (1 - γ) ϕ r_{t - 1} \\ {\hat{y}}_{t + f} & = m_{t} + \sum_{i = 1}^{f} ϕ^{i} r_{t} \\ var ({\hat{y}}_{t + f}) & = var (ε_{t}) (1 + \sum_{i = 1}^{f - 1} {(α + \frac{α γ ϕ (ϕ^{i} - 1)}{(ϕ - 1)})}^{2}) \end{matrix}

Additive Holt–Winters Smoothing

\begin{matrix} m_{t} & = α (y_{t} - s_{t - p}) + (1 - α) (m_{t - 1} + ϕ r_{t - 1}) \\ r_{t} & = γ (m_{t} - m_{t - 1}) + (1 - γ) ϕ r_{t - 1} \\ s_{t} & = β (y_{t} - m_{t}) + (1 - β) s_{t - p} \\ {\hat{y}}_{t + f} & = m_{t} + (\sum_{i = 1}^{f} ϕ^{i} r_{t}) + s_{t - p} \\ var ({\hat{y}}_{t + f}) & = var (ε_{t}) (1 + \sum_{i = 1}^{f - 1} ψ_{i}^{2}) \\ ψ_{i} & = \begin{cases} 0 & \text{if } i \geq f \\ α + \frac{α γ ϕ (ϕ^{i} - 1)}{(ϕ - 1)} & \text{if } i \bmod p \neq 0 \\ α + \frac{α γ ϕ (ϕ^{i} - 1)}{(ϕ - 1)} + β (1 - α) & \text{otherwise} \end{cases} \end{matrix}

Multiplicative Holt–Winters Smoothing

\begin{matrix} m_{t} & = α y_{t} / s_{t - p} + (1 - α) (m_{t - 1} + ϕ r_{t - 1}) \\ r_{t} & = γ (m_{t} - m_{t - 1}) + (1 - γ) ϕ r_{t - 1} \\ s_{t} & = β y_{t} / m_{t} + (1 - β) s_{t - p} \\ {\hat{y}}_{t + f} & = (m_{t} + \sum_{i = 1}^{f} ϕ^{i} r_{t}) \times s_{t - p} \\ var ({\hat{y}}_{t + f}) & = var (ε_{t}) (\sum_{i = 0}^{\infty} \sum_{j = 0}^{p - 1} {(ψ_{j + i p} \frac{s_{t + f}}{s_{t + f - j}})}^{2}) \end{matrix}

and

ψ

is defined as in the additive Holt–Winters smoothing,

where

m_{t}

is the mean,

r_{t}

is the trend and

s_{t}

is the seasonal component at time

t

with

p

being the seasonal order. The

f

-step ahead forecasts are given by

{\hat{y}}_{t + f}

and their variances by

var ({\hat{y}}_{t + f})

. The term

var (ε_{t})

is estimated as the mean deviation.

The parameters,

α, β

and

γ

control the amount of smoothing. The nearer these parameters are to one, the greater the emphasis on the current data point. Generally these parameters take values in the range

0.1

to

0.3

. The linear Holt and two Holt–Winters smoothers include an additional parameter,

ϕ

, which acts as a trend dampener. For

0.0 < ϕ < 1.0

the trend is dampened and for

ϕ > 1.0

the forecast function has an exponential trend;

ϕ = 0.0

removes the trend term from the forecast function and

ϕ = 1.0

does not dampen the trend.
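The linear Holt recursions given above can be sketched in a few lines of Python. The function name `linear_holt` and the requirement to supply the starting values m_0 and r_0 explicitly are assumptions made for illustration; how the starting values are chosen is not covered here.

```python
def linear_holt(y, alpha, gamma, phi, m0, r0, horizon):
    """Run linear Holt smoothing over y and return the f-step forecasts, f = 1..horizon."""
    m, r = m0, r0
    for value in y:
        m_prev = m
        # m_t = alpha * y_t + (1 - alpha) * (m_{t-1} + phi * r_{t-1})
        m = alpha * value + (1 - alpha) * (m_prev + phi * r)
        # r_t = gamma * (m_t - m_{t-1}) + (1 - gamma) * phi * r_{t-1}
        r = gamma * (m - m_prev) + (1 - gamma) * phi * r
    # y_hat_{t+f} = m_t + (phi + phi^2 + ... + phi^f) * r_t
    return [m + sum(phi ** i for i in range(1, f + 1)) * r
            for f in range(1, horizon + 1)]


# Hypothetical data following an exact linear trend, with no dampening.
print(linear_holt([2.0, 4.0, 6.0, 8.0, 10.0], 0.3, 0.2, 1.0, m0=0.0, r0=2.0, horizon=2))
```

With ϕ = 0.0 the trend term drops out of the forecast function, so every forecast equals the final smoothed mean.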

For all methods, values for

α, β, γ

and

ϕ

can be chosen by trying different values and then visually comparing the results by plotting the fitted values alongside the original data. Alternatively, for single exponential smoothing a suitable value for

α

can be obtained by fitting an

ARIMA (0, 1, 1)

model. For Brown's double exponential smoothing and linear Holt smoothing with no dampening (i.e.,

ϕ = 1.0

), suitable values for

α

and, in the case of linear Holt smoothing,

γ

can be obtained by fitting an

ARIMA (0, 2, 2)

model. Similarly, the linear Holt method, with

ϕ \neq 1.0

, can be expressed as an

ARIMA (1, 2, 2)

model and the additive Holt–Winters, with no dampening (

ϕ = 1.0

), can be expressed as a seasonal ARIMA model with order

p

of the form

ARIMA (0, 1, p + 1) (0, 1, 0)

. There is no similar procedure for obtaining parameter values for the multiplicative Holt–Winters method, or the additive Holt–Winters method with

ϕ \neq 1.0

. In these cases parameters could be selected by minimizing a measure of fit using nonlinear optimization.

2.1.10 Change point analysis

Given a time series

y_{1 : n} = \{y_{j} : j = 1, 2, \dots, n\}

, a change point

τ

is a place or time point such that the segment of the series up to

τ

, i.e.,

y_{1 : τ}

, follows one distribution and the segment after

τ

, i.e.,

y_{τ + 1 : n}

, follows a different distribution. This idea can be extended to

m

change points, in which case

τ = \{τ_{i} : i = 1, 2, \dots, m\}

becomes a vector of ordered (strictly monotonic increasing) change points with

1 \leq τ_{i} \leq n

and

τ_{m} = n

. The

i

th segment, therefore, consists of

y_{τ_{i - 1} + 1 : τ_{i}}

where, for ease of notation, we define

τ_{0} = 0

. A change point problem is, therefore, twofold: estimating

m

, the number of change points (and hence the number of segments), and estimating

τ

, the locations of those change points.

Given a cost function,

C (y_{τ_{i - 1} + 1 : τ_{i}})

, one formulation of the change point problem can be expressed as the solution to:

\underset{m, τ}{minimize} \sum_{i = 1}^{m} (C (y_{τ_{i - 1} + 1 : τ_{i}}) + β)

(5)

where

β

is a penalty term used to control the number of change points. Two methods of solving equation (5) are available: the PELT algorithm and binary segmentation. The Pruned Exact Linear Time (PELT) algorithm of Killick et al. (2012) is a tree-based method which is guaranteed to return the optimal solution to (5) as long as there exists a constant

K

such that

C (y_{(u + 1) : v}) + C (y_{(v + 1) : w}) + K \leq C (y_{(u + 1) : w})

(6)

for all

u < v < w

. Unlike PELT, binary segmentation is an iterative method that only results in an approximate solution to (5). A description of the binary segmentation algorithm can be found in Section 3 in g13ndc and g13nec.
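To make formulation (5) concrete, the sketch below solves it exactly by plain dynamic programming (optimal partitioning). This is not the PELT algorithm, which adds a pruning step to the same recursion, and it is not binary segmentation. The cost function, the sum of squared deviations from the segment mean, and the name `optimal_partition` are assumptions chosen for illustration.

```python
def optimal_partition(y, beta):
    """Return [tau_1, ..., tau_m] (with tau_m = n) minimising (5)."""
    n = len(y)
    # Prefix sums allow each segment cost to be evaluated in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, value in enumerate(y, 1):
        s1[i] = s1[i - 1] + value
        s2[i] = s2[i - 1] + value * value

    def cost(u, v):
        """Sum of squared deviations of y_{u+1:v} from its mean."""
        total = s1[v] - s1[u]
        return (s2[v] - s2[u]) - total * total / (v - u)

    # best[t] is the minimum of (5) restricted to y_{1:t}; last[t] is the
    # end of the previous segment in that optimal solution.
    best = [0.0] + [float("inf")] * n
    last = [0] * (n + 1)
    for t in range(1, n + 1):
        for u in range(t):
            candidate = best[u] + cost(u, t) + beta
            if candidate < best[t]:
                best[t], last[t] = candidate, u

    tau, t = [], n
    while t > 0:          # backtrack from the end of the series
        tau.append(t)
        t = last[t]
    return tau[::-1]


# Hypothetical series with a single shift in mean after the fourth point.
print(optimal_partition([0, 0, 0, 0, 10, 10, 10, 10], beta=1.0))
```

The double loop costs O(n²) cost evaluations; increasing β makes additional segments more expensive and so reduces the number of change points returned.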

2.2 Univariate Spectral Analysis

In describing a time series using spectral analysis the fundamental components are taken to be sinusoidal waves of the form

R \cos (ω t + ϕ)

, which for a given angular frequency

ω

, with

0 \leq ω \leq π

, is specified by its amplitude

R > 0

and phase

ϕ

, with

0 \leq ϕ < 2 π

. Thus in a time series of

n

observations it is not possible to distinguish more than

n / 2

independent sinusoidal components. The frequency range

0 \leq ω \leq π

is limited to the shortest wavelength of two sampling units because any wave of higher frequency is indistinguishable upon sampling from (or is aliased with) a wave within this range. Spectral analysis follows the idea that for a series made up of a finite number of sine waves the amplitude of any component at frequency

ω

is given to order

1 / n

by

R^{2} = (\frac{1}{n^{2}}) {| \sum_{t = 1}^{n} x_{t} e^{i ω t} |}^{2} .

2.2.1 The sample spectrum

For a series

x_{1}, x_{2}, \dots, x_{n}

this is defined as

f^{*} (ω) = (\frac{1}{2 n π}) {| \sum_{t = 1}^{n} x_{t} e^{i ω t} |}^{2},

the scaling factor now being chosen in order that

2 \int_{0}^{π} f^{*} (ω) d ω = σ_{x}^{2},

i.e., the spectrum indicates how the sample variance (

σ_{x}^{2}

) of the series is distributed over components in the frequency range

0 \leq ω \leq π .

It may be demonstrated that

f^{*} (ω)

is equivalently defined in terms of the sample ACF

r_{k}

of the series as

f^{*} (ω) = (\frac{1}{2 π}) (c_{0} + 2 \sum_{k = 1}^{n - 1} c_{k} \cos k ω),

where

c_{k} = σ_{x}^{2} r_{k}

are the sample autocovariance coefficients.
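The equivalence of the two definitions can be checked numerically. In the sketch below the function names are hypothetical, the series is assumed to have been mean-corrected first, and the autocovariances are computed with divisor n; under those assumptions the two expressions agree to rounding error.

```python
import cmath
import math


def mean_correct(x):
    xbar = sum(x) / len(x)
    return [v - xbar for v in x]


def spectrum_direct(x, omega):
    """f*(omega) = |sum_t x_t exp(i omega t)|^2 / (2 n pi), t = 1..n."""
    n = len(x)
    total = sum(v * cmath.exp(1j * omega * t) for t, v in enumerate(x, 1))
    return abs(total) ** 2 / (2 * n * math.pi)


def spectrum_from_autocov(x, omega):
    """f*(omega) = (c_0 + 2 sum_{k=1}^{n-1} c_k cos(k omega)) / (2 pi)."""
    n = len(x)
    c = [sum(x[t] * x[t + k] for t in range(n - k)) / n for k in range(n)]
    total = c[0] + 2 * sum(c[k] * math.cos(k * omega) for k in range(1, n))
    return total / (2 * math.pi)


x = mean_correct([1.0, 3.0, 2.0, 5.0, 4.0])   # hypothetical data
print(spectrum_direct(x, 0.7), spectrum_from_autocov(x, 0.7))
```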

If the series

x_{t}

does contain a deterministic sinusoidal component of amplitude

R

, this will be revealed in the sample spectrum as a sharp peak of approximate width

π / n

and height

(n / 2 π) R^{2}

. This is called the discrete part of the spectrum, the variance

R^{2}

associated with this component being in effect concentrated at a single frequency.

If the series

x_{t}

has no deterministic components, i.e., is purely stochastic being stationary with autocorrelation function (ACF)

r_{k}

, then with increasing sample size the expected value of

f^{*} (ω)

converges to the theoretical spectrum – the continuous part

f (ω) = (\frac{1}{2 π}) (γ_{0} + 2 \sum_{k = 1}^{\infty} γ_{k} \cos (ω k)),

where

γ_{k}

are the theoretical autocovariances.

The sample spectrum does not however converge to this value but at each frequency point fluctuates about the theoretical spectrum with an exponential distribution, being independent at frequencies separated by an interval of

2 π / n

or more. Various devices are, therefore, employed to smooth the sample spectrum and reduce its variability. Much of the strength of spectral analysis derives from the fact that the error limits are multiplicative so that features may still show up as significant in a part of the spectrum which has a generally low level, whereas they are completely masked by other components in the original series. The spectrum can help to distinguish deterministic cyclical components from the stochastic quasi-cycle components which produce a broader peak in the spectrum. (The deterministic components can be removed by regression and the remaining part represented by an ARIMA model.)

A large discrete component in a spectrum can distort the continuous part over a large frequency range surrounding the corresponding peak. This may be alleviated at the cost of slightly broadening the peak by tapering a portion of the data at each end of the series with weights which decay smoothly to zero. It is usual to correct for the mean of the series and for any linear trend by simple regression, since they would similarly distort the spectrum.

2.2.2 Spectral smoothing by lag window

The estimate is calculated directly from the sample autocovariances

c_{k}

f (ω) = (\frac{1}{2 π}) (c_{0} + 2 \sum_{k = 1}^{M - 1} w_{k} c_{k} \cos k ω),

the smoothing being induced by the lag window weights

w_{k}

which extend up to a truncation lag

M

which is generally much less than

n

. The smaller the value of

M

, the greater the degree of smoothing, the spectrum estimates being independent only at a wider frequency separation indicated by the bandwidth

b

which is proportional to

1 / M

. It is wise, however, to calculate the spectrum at intervals appreciably less than this. Although greater smoothing narrows the error limits, it can also distort the spectrum, particularly by flattening peaks and filling in troughs.
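A lag window estimate can be sketched as follows. The window shape is not specified in the text, so this fragment assumes the Bartlett (triangular) window w_k = 1 - k / M purely as an example; the function name `lag_window_spectrum` is hypothetical.

```python
import math


def lag_window_spectrum(x, omega, M):
    """Smoothed spectrum estimate at frequency omega with truncation lag M."""
    n = len(x)
    xbar = sum(x) / n
    dev = [v - xbar for v in x]
    # Sample autocovariances c_0, ..., c_{M-1} with divisor n.
    c = [sum(dev[t] * dev[t + k] for t in range(n - k)) / n for k in range(M)]
    # Bartlett lag window weights (an assumed choice of window).
    weights = [1.0 - k / M for k in range(M)]
    total = c[0] + 2 * sum(weights[k] * c[k] * math.cos(k * omega)
                           for k in range(1, M))
    return total / (2 * math.pi)


# Evaluate on a grid of frequencies for a hypothetical series.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 3.0, 2.0, 4.0]
print([lag_window_spectrum(x, math.pi * j / 8, M=4) for j in range(9)])
```

Reducing M uses fewer autocovariances and gives a smoother estimate; with M = 1 the estimate is flat at c_0 / (2π).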

2.2.3 Direct spectral smoothing

The unsmoothed sample spectrum is calculated for a fine division of frequencies, then averaged over intervals centred on each frequency point for which the smoothed spectrum is required. This is usually at a coarser frequency division. The bandwidth corresponds to the width of the averaging interval.

2.3 Linear Lagged Relationships Between Time Series

We now consider the context in which one time series, called the dependent or output series,

y_{1}, y_{2}, \dots, y_{n}

, is believed to depend on one or more explanatory or input series, e.g.,

x_{1}, x_{2}, \dots, x_{n}

. This dependency may follow a simple linear regression, e.g.,

y_{t} = v x_{t} + n_{t}

or more generally may involve lagged values of the input

y_{t} = v_{0} x_{t} + v_{1} x_{t - 1} + v_{2} x_{t - 2} + \dots + n_{t} .

The sequence

v_{0}, v_{1}, v_{2}, \dots

is called the impulse response function (IRF) of the relationship. The term

n_{t}

represents that part of

y_{t}

which cannot be explained by the input, and it is assumed to follow a univariate ARIMA model. We call

n_{t}

the (output) noise component of

y_{t}

, and it includes any constant term in the relationship. It is assumed that the input series,

x_{t}

, and the noise component,

n_{t}

, are independent.

The part of

y_{t}

which is explained by the input is called the input component

z_{t}

z_{t} = v_{0} x_{t} + v_{1} x_{t - 1} + v_{2} x_{t - 2} + \dots

so that

y_{t} = z_{t} + n_{t} .

The eventual aim is to model both these components of

y_{t}

on the basis of observations of

y_{1}, y_{2}, \dots, y_{n}

and

x_{1}, x_{2}, \dots, x_{n}

. In applications to forecasting or control both components are important. In general there may be more than one input series, e.g.,

x_{1, t}

and

x_{2, t}

, which are assumed to be independent, with corresponding components

z_{1, t}

and

z_{2, t}

, so

y_{t} = z_{1, t} + z_{2, t} + n_{t} .

2.3.1 Transfer function models

In a similar manner to that in which the structure of a univariate series may be represented by a finite-parameter ARIMA model, the structure of an input component may be represented by a transfer function (TF) model with delay time

b

, together with

p

autoregressive-like parameters

δ_{1}, δ_{2}, \dots, δ_{p}

and

q + 1

moving-average-like parameters

ω_{0}, ω_{1}, \dots, ω_{q}

z_{t} = δ_{1} z_{t - 1} + δ_{2} z_{t - 2} + \dots + δ_{p} z_{t - p} + ω_{0} x_{t - b} - ω_{1} x_{t - b - 1} - \dots - ω_{q} x_{t - b - q} .

(7)

If

p > 0

this represents an impulse response function (IRF) which is infinite in extent and decays with geometric and/or sinusoidal behaviour. The parameters

δ_{1}, δ_{2}, \dots, δ_{p}

are constrained to satisfy a stability condition identical to the stationarity condition of autoregressive models. There is no constraint on

ω_{0}, ω_{1}, \dots, ω_{q} .
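Equation (7) can be applied as a simple recursion. In this illustrative Python sketch the name `tf_response` is hypothetical and all input and response values before the observation period are taken as zero, which is an assumption. Applying the function to a unit impulse returns the leading IRF coefficients.

```python
def tf_response(x, delta, omega, b):
    """Compute z_t from equation (7); pre-period x and z are taken as 0."""
    z = []
    for t in range(len(x)):
        # Autoregressive-like part: delta_1 z_{t-1} + ... + delta_p z_{t-p}
        value = sum(d * z[t - i] for i, d in enumerate(delta, 1) if t - i >= 0)
        # Moving-average-like part: omega_0 x_{t-b} - omega_1 x_{t-b-1} - ...
        for j, w in enumerate(omega):
            if t - b - j >= 0:
                sign = 1.0 if j == 0 else -1.0
                value += sign * w * x[t - b - j]
        z.append(value)
    return z


# IRF of a hypothetical model with b = 1, p = 1, q = 0.
print(tf_response([1, 0, 0, 0, 0], delta=[0.5], omega=[2.0], b=1))
```

With p > 0 the printed coefficients start after the delay and then decay geometrically, illustrating an IRF of infinite extent.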

2.3.2 Cross-correlations

An important tool for investigating how an input series

x_{t}

affects an output series

y_{t}

is the sample cross-correlation function (CCF)

r_{x y} (k)

, for

k = 0, 1, \dots

, between the series. If

x_{t}

and

y_{t}

are (jointly) stationary time series this is an estimator of the theoretical quantity

ρ_{x y} (k) = corr (x_{t}, y_{t + k}) .

The sequence

r_{y x} (k)

, for

k = 0, 1, \dots

, is distinct from

r_{x y} (k)

, though it is possible to interpret

r_{y x} (k) = r_{x y} (- k) .

When the series

y_{t}

and

x_{t}

are believed to be related by a transfer function (TF) model, the CCF is determined by the impulse response function (IRF)

v_{0}, v_{1}, v_{2}, \dots

and the autocorrelation function (ACF) of the input

x_{t} .

In the particular case when

x_{t}

is an uncorrelated series or white noise (and is uncorrelated with any other inputs):

ρ_{x y} (k) \propto v_{k}

and the sample CCF can provide an estimate of

v_{k}

{\tilde{v}}_{k} = (s_{y} / s_{x}) r_{x y} (k)

where

s_{y}

and

s_{x}

are the sample standard deviations of

y_{t}

and

x_{t}

, respectively.
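The sample CCF and the resulting IRF estimate can be sketched as below. The estimator of the cross-covariance (divisor n, both series mean-corrected) is an assumption, since the text does not define it, and the name `ccf_and_irf` is hypothetical. The IRF estimate is only meaningful when the input is close to white noise.

```python
import math


def ccf_and_irf(x, y, max_lag):
    """Return (r, v): r[k] = r_xy(k) and v[k] = (s_y / s_x) * r_xy(k), k = 0..max_lag."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    dx = [value - xbar for value in x]
    dy = [value - ybar for value in y]
    s_x = math.sqrt(sum(value * value for value in dx) / n)
    s_y = math.sqrt(sum(value * value for value in dy) / n)
    r = [
        sum(dx[t] * dy[t + k] for t in range(n - k)) / n / (s_x * s_y)
        for k in range(max_lag + 1)
    ]
    v = [s_y / s_x * r_k for r_k in r]
    return r, v


# Hypothetical input, with an output that is exactly three times the input.
x = [1.0, 2.0, 4.0, 3.0, 5.0]
y = [3.0 * value for value in x]
print(ccf_and_irf(x, y, max_lag=2))
```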

In theory the IRF coefficients

v_{b}, \dots, v_{b + p + q}

determine the parameters in the TF model, and using

{\tilde{v}}_{k}

to estimate

v_{k}

it is possible to solve for preliminary estimates of

δ_{1}, δ_{2}, \dots, δ_{p}

and

ω_{0}, ω_{1}, \dots, ω_{q} .

2.3.3 Prewhitening or filtering by an ARIMA model

In general an input series

x_{t}

is not white noise, but may be represented by an ARIMA model with innovations or residuals

a_{t}

which are white noise. If precisely the same operations by which

a_{t}

is generated from

x_{t}

are applied to the output

y_{t}

to produce a series

b_{t}

, then the transfer function relationship between

y_{t}

and

x_{t}

is preserved between

b_{t}

and

a_{t}

. It is then possible to estimate

{\tilde{v}}_{k} = (s_{b} / s_{a}) r_{a b} (k) .

The procedure of generating

a_{t}

from

x_{t}

(and

b_{t}

from

y_{t}

) is called prewhitening or filtering by an ARIMA model. Although

a_{t}

is necessarily white noise, this is not generally true of

b_{t} .

2.3.4 Multi-input model estimation

The term multi-input model is used for the situation when one output series

y_{t}

is related to one or more input series

x_{j, t}

, as described in Section 2.3. If for a given input the relationship is a simple linear regression, it is called a simple input; otherwise it is a transfer function input. The error or noise term follows an ARIMA model.

Given that the orders of all the transfer function models and the ARIMA model of a multi-input model have been specified, the various parameters in those models may be (simultaneously) estimated.

The procedure used is closely related to the least squares principle applied to the innovations in the ARIMA noise model.

The innovations are derived for any proposed set of parameter values by calculating the response of each input to the transfer functions and then evaluating the noise

n_{t}

as the difference between this response (combined for all the inputs) and the output. The innovations are derived from the noise using the ARIMA model in the same manner as for a univariate series, and as described in Section 2.1.6.

In estimating the parameters, consideration has to be given to the lagged terms in the various model equations which are associated with times prior to the observation period, and are, therefore, unknown. The function descriptions provide the necessary detail as to how this problem is treated.

Also, as described in Section 2.1.7 the sum of squares criterion

S = \sum a_{t}^{2}

is related to the quadratic form in the exact log-likelihood of the parameters:

- \frac{1}{2} \log | V | - \frac{1}{2} w^{T} V^{- 1} w .

Here

w

is the vector of appropriately differenced noise terms, and

w^{T} V^{- 1} w = S / σ_{a}^{2},

where

σ_{a}^{2}

is the innovation variance parameter.

The least squares criterion is, therefore, identical to minimization of the quadratic form, but is not identical to exact likelihood. Because

V

may be expressed as

M σ_{a}^{2}

, where

M

is a function of the ARIMA model parameters, substitution of

σ_{a}^{2}

by its maximum likelihood (ML) estimator yields a concentrated (or profile) likelihood which is a function of

{| M |}^{1 / N} S .

N

is the length of the differenced noise series

w

, and

| M | = \det M .

Use of the above quantity, called the deviance,

D

, as an objective function is preferable to the use of

S

alone, on the grounds that it is equivalent to exact likelihood, and yields estimates with better properties. However, there is an appreciable computational penalty in calculating

D

, and in large samples it differs very little from

S

, except in the important case of seasonal ARIMA models where the number of whole seasons within the data length must also be large.

You are given the option of taking the objective function to be either

S

or

D

, or a third possibility, the marginal likelihood. This is similar to exact likelihood but can counteract bias in the ARIMA model due to the fitting of a large number of simple inputs.

Approximate standard errors of the parameter estimates and the correlations between them are available after estimation.

The model residuals

{\hat{a}}_{t}

are the innovations resulting from the estimation, and they are usually examined for the presence of either autocorrelation or cross-correlation with the inputs. Absence of such correlation provides some confirmation of the adequacy of the model.

2.3.5 Multi-input model forecasting

A multi-input model may be used to forecast the output series provided future values (possibly forecasts) of the input series are supplied.

Construction of the forecasts requires knowledge only of the model orders and parameters, together with a limited set of the most recent variables which appear in the model equations. This is called the state set. It is conveniently constituted after model estimation. Moreover, if new observations

y_{n + 1}, y_{n + 2}, \dots

of the output series and

x_{n + 1}, x_{n + 2}, \dots

of (all) the independent input series become available, then the model equations can easily be used to update the state set before constructing forecasts from the end of the new observations. The new innovations

a_{n + 1}, a_{n + 2}, \dots

generated in this updating may be used to monitor the continuing adequacy of the model.

2.3.6 Transfer function model filtering

In many time series applications it is desired to calculate the response (or output) of a transfer function (TF) model for a given input series.

Smoothing, detrending, and seasonal adjustment are typical applications. You must specify the orders and parameters of a TF model for the purpose being considered. This may then be applied to the input series.

Again, problems may arise due to ignorance of the input series values prior to the observation period. The transient errors which can arise from this may be substantially reduced by using ‘backforecasts’ of these unknown observations.

2.4 Multivariate Time Series

Multi-input modelling represents one output time series in terms of one or more input series. Although there are circumstances in which it may be more appropriate to analyse a set of time series by modelling each one in turn as the output series with the remainder as inputs, there is a more symmetric approach in such a context, in which all the series are modelled jointly. The models used are known as vector autoregressive moving-average (VARMA) models.

2.4.1 Differencing and transforming a multivariate time series

As in the case of a univariate time series, it may be useful to simplify the series by differencing operations which may be used to remove linear or seasonal trends, thus ensuring that the resulting series to be used in the model estimation is stationary. It may also be necessary to apply transformations to the individual components of the multivariate series in order to stabilize the variance. Commonly used transformations are the log and square root transformations.

2.4.2 Model identification for a multivariate time series

Multivariate analogues of the autocorrelation and partial autocorrelation functions are available for analysing a set of

k

time series,

x_{i, 1}, x_{i, 2}, \dots, x_{i, n}

, for

i = 1, 2, \dots, k

, thereby making it possible to obtain some understanding of a suitable VARMA model for the observed series.

It is assumed that the time series have been differenced if necessary, and that they are jointly stationary. The lagged correlations between all possible pairs of series, i.e.,

ρ_{i j l} = corr (x_{i, t}, x_{j, t + l})

are then taken to provide an adequate description of the statistical relationships between the series. These quantities are estimated by sample auto- and cross-correlations

r_{i j l}

. For each

l

these may be viewed as elements of a (lagged) autocorrelation matrix.

Thus consider the vector process

x_{t}

(with elements

x_{i t}

) and lagged autocovariance matrices

Γ_{l}

with elements

σ_{i} σ_{j} ρ_{i j l}

where

σ_{i}^{2} = var (x_{i, t})

. Correspondingly,

Γ_{l}

is estimated by the matrix

C_{l}

with elements

s_{i} s_{j} r_{i j l}

where

s_{i}^{2}

is the sample variance of

x_{i t}

.

For a series with short-term cross-correlation only, i.e.,

r_{i j l}

is not significant beyond some low lag

q

, then the pure vector

MA (q)

model, with no autoregressive parameters, i.e.,

p = 0

, is appropriate.
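The sample cross-correlation matrices described above can be sketched as follows. This is a minimal illustration in Python, not a Library function: the name cross_correlation_matrix, the divisor n used for the covariances, and the data are all assumptions of the sketch.

```python
def cross_correlation_matrix(series, lag):
    """Return R_l with elements r_ijl = corr(x_{i,t}, x_{j,t+l}), for lag >= 0."""
    k = len(series)
    n = len(series[0])
    means = [sum(x) / n for x in series]
    dev = [[v - m for v in x] for x, m in zip(series, means)]
    # Sample standard deviations s_i (divisor n, an assumed convention).
    sd = [(sum(d * d for d in x) / n) ** 0.5 for x in dev]
    r = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            c = sum(dev[i][t] * dev[j][t + lag] for t in range(n - lag)) / n
            r[i][j] = c / (sd[i] * sd[j])
    return r


x1 = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0]
x2 = [2.0, 1.0, 3.5, 2.0, 5.5, 4.0, 6.5, 5.0]  # roughly x1 delayed by one step
R0 = cross_correlation_matrix([x1, x2], 0)
R1 = cross_correlation_matrix([x1, x2], 1)
```

Inspecting such matrices for increasing lags shows the lag beyond which the elements become negligible.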

The correlation matrices provide a description of the joint statistical properties of the series. It is also possible to calculate matrix quantities which are closely analogous to the partial autocorrelations of univariate series (see Section 2.1.4). Wei (1990) discusses both the partial autoregression matrices proposed by Tiao and Box (1981) and partial lag correlation matrices.

In the univariate case the partial autocorrelation function (PACF) between

x_{t}

and

x_{t + l}

is the correlation coefficient between the two after removing the linear dependence on each of the intervening variables

x_{t + 1}, x_{t + 2}, \dots, x_{t + l - 1}

. This partial autocorrelation may also be obtained as the last regression coefficient associated with

x_{t}

when regressing

x_{t + l}

on its

l

lagged variables

x_{t + l - 1}, x_{t + l - 2}, \dots, x_{t}

. Tiao and Box (1981) extended this method to the multivariate case to define the partial autoregression matrix. Heyse and Wei (1985) also extended the univariate definition of the PACF to derive the correlation matrix between the vectors

x_{t}

and

x_{t + l}

after removing the linear dependence on each of the intervening vectors

x_{t + 1}, x_{t + 2}, \dots, x_{t + l - 1}

, the partial lag correlation matrix.

Note that the partial lag correlation matrix is a correlation coefficient matrix since each of its elements is a properly normalized correlation coefficient. This is not true of the partial autoregression matrices (except in the univariate case for which the two types of matrix are the same). The partial lag correlation matrix at lag

1

also reduces to the regular correlation matrix at lag

1

; this is not true of the partial autoregression matrices (again except in the univariate case).

Both the above share the same cut-off property for autoregressive processes; that is, for an autoregressive process of order

p

, the terms of the matrix at lags

p + 1

and greater are zero. Thus if the sample partial cross-correlations are significant only up to some low lag

p

then a pure vector

AR (p)

model is appropriate with

q = 0

. Otherwise moving-average terms will need to be introduced as well as autoregressive terms.

Under the hypothesis that

x_{t}

is an autoregressive process of order

l - 1

,

n

times the sum of the squared elements of the partial lag correlation matrix at lag

l

is asymptotically distributed as a

χ^{2}

variable with

k^{2}

degrees of freedom where

k

is the dimension of the multivariate time series. This provides a diagnostic aid for determining the order of an autoregressive model.

The partial autoregression matrices may be found by solving a multivariate version of the Yule–Walker equations to find the autoregression matrices, using the final regression matrix coefficient as the partial autoregression matrix at that particular lag.

The basis of these calculations is a multivariate autoregressive model:

x_{t} = ϕ_{l, 1} x_{t - 1} + \dots + ϕ_{l, l} x_{t - l} + e_{l, t}

where

ϕ_{l, 1}, ϕ_{l, 2}, \dots, ϕ_{l, l}

are matrix coefficients, and

e_{l, t}

is the vector of errors in the prediction. These coefficients may be rapidly computed using a recursive technique which requires, and simultaneously furnishes, a backward prediction equation:

x_{t - l - 1} = ψ_{l, 1} x_{t - l} + ψ_{l, 2} x_{t - l + 1} + \dots + ψ_{l, l} x_{t - 1} + f_{l, t}

(in the univariate case

ψ_{l, i} = ϕ_{l, i}

).

The forward prediction equation coefficients,

ϕ_{l, i}

, are of direct interest, together with the covariance matrix

D_{l}

of the prediction errors

e_{l, t}

. The calculation of these quantities for a particular maximum equation lag

l = L

involves calculation of the same quantities for increasing values of

l = 1, 2, \dots, L

.

The quantities

v_{l} = \det D_{l} / \det Γ_{0}

may be viewed as generalized variance ratios, and provide a measure of the efficiency of prediction (the smaller the better). The reduction from

v_{l - 1}

to

v_{l}

which occurs on extending the order of the predictor to

l

may be represented as

v_{l} = v_{l - 1} (1 - ρ_{l}^{2})

where

ρ_{l}^{2}

is a multiple squared partial autocorrelation coefficient associated with

k^{2}

degrees of freedom.

Sample estimates of all the above quantities may be derived by using the series covariance matrices

C_{l}

, for

l = 1, 2, \dots, L

, in place of

Γ_{l}

. The best lag for prediction purposes may be chosen as that which yields the minimum final prediction error (FPE) criterion:

FPE (l) = v_{l} \times \frac{(1 + l k^{2} / n)}{(1 - l k^{2} / n)} .
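The selection of a lag by the FPE criterion can be sketched as below. The squared partial correlations are made-up values, the starting value v_{0} = 1 is an assumption of the sketch, and the function names are hypothetical.

```python
def variance_ratios(rho_sq):
    """v_l = v_{l-1} (1 - rho_l^2), starting from v_0 = 1 (assumed)."""
    v, out = 1.0, []
    for r2 in rho_sq:
        v *= 1.0 - r2
        out.append(v)
    return out  # out[l - 1] holds v_l


def fpe(v_l, lag, k, n):
    """FPE(l) = v_l (1 + l k^2 / n) / (1 - l k^2 / n)."""
    penalty = lag * k * k / n
    return v_l * (1.0 + penalty) / (1.0 - penalty)


rho_sq = [0.60, 0.30, 0.02, 0.01]  # made-up values for illustration
k, n = 2, 100
v = variance_ratios(rho_sq)
scores = {lag: fpe(v[lag - 1], lag, k, n) for lag in range(1, len(v) + 1)}
best_lag = min(scores, key=scores.get)
```

With these values the small reductions in v_{l} at lags 3 and 4 do not outweigh the penalty factor, so lag 2 is selected.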

An alternative method of estimating the sample partial autoregression matrices is by using multivariate least squares to fit a series of multivariate autoregressive models of increasing order.

2.4.3 VARMA model estimation

The cross-correlation structure of a stationary multivariate time series may often be represented by a model with a small number of parameters belonging to the VARMA class. If the stationary series

w_{t}

has been derived by transforming and/or differencing the original series

x_{t}

, then

w_{t}

is said to follow the VARMA model:

w_{t} = ϕ_{1} w_{t - 1} + \dots + ϕ_{p} w_{t - p} + ε_{t} - θ_{1} ε_{t - 1} - \dots - θ_{q} ε_{t - q},

where

ε_{t}

is a vector of uncorrelated residual series (white noise) with zero mean and constant covariance matrix

Σ

.

ϕ_{1}, ϕ_{2}, \dots, ϕ_{p}

are the

p

autoregressive (AR) parameter matrices and

θ_{1}, θ_{2}, \dots, θ_{q}

are the

q

moving-average (MA) parameter matrices. If

w_{t}

has a nonzero mean

μ

, then this can be allowed for by replacing

w_{t}, w_{t - 1}, \dots

by

w_{t} - μ, w_{t - 1} - μ, \dots

in the model.

A series generated by this model will only be stationary provided restrictions are placed on

ϕ_{1}, ϕ_{2}, \dots, ϕ_{p}

to avoid unstable growth of

w_{t}

. These are stationarity constraints. The series

ε_{t}

may also be usefully interpreted as the linear innovations in

w_{t}

, i.e., the error if

w_{t}

were to be predicted using the information in all past values

w_{t - 1}, w_{t - 2}, \dots

, provided also that

θ_{1}, θ_{2}, \dots, θ_{q}

satisfy what are known as invertibility constraints. This allows the series

ε_{t}

to be generated by rewriting the model equation as

ε_{t} = w_{t} - ϕ_{1} w_{t - 1} - \dots - ϕ_{p} w_{t - p} + θ_{1} ε_{t - 1} + \dots + θ_{q} ε_{t - q} .
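The rewritten model equation can be applied recursively, as in the following Python sketch. It is an illustration under stated assumptions rather than the method used by the Library: values of w_{t} and ε_{t} before the start of the series are taken as zero, and the function names are hypothetical.

```python
def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def varma_simulate(eps, phi, theta):
    """w_t = sum_i phi_i w_{t-i} + eps_t - sum_j theta_j eps_{t-j} (zero pre-sample)."""
    w = []
    for t in range(len(eps)):
        x = list(eps[t])
        for i, ph in enumerate(phi, start=1):
            if t - i >= 0:
                x = [a + b for a, b in zip(x, mat_vec(ph, w[t - i]))]
        for j, th in enumerate(theta, start=1):
            if t - j >= 0:
                x = [a - b for a, b in zip(x, mat_vec(th, eps[t - j]))]
        w.append(x)
    return w


def varma_innovations(w, phi, theta):
    """eps_t = w_t - sum_i phi_i w_{t-i} + sum_j theta_j eps_{t-j} (zero pre-sample)."""
    eps = []
    for t in range(len(w)):
        e = list(w[t])
        for i, ph in enumerate(phi, start=1):
            if t - i >= 0:
                e = [a - b for a, b in zip(e, mat_vec(ph, w[t - i]))]
        for j, th in enumerate(theta, start=1):
            if t - j >= 0:
                e = [a + b for a, b in zip(e, mat_vec(th, eps[t - j]))]
        eps.append(e)
    return eps


phi = [[[0.5, 0.1], [0.0, 0.3]]]    # one AR matrix, so p = 1
theta = [[[0.4, 0.0], [0.2, 0.1]]]  # one MA matrix, so q = 1
eps_in = [[1.0, 0.0], [0.5, -1.0], [-0.3, 0.2], [0.1, 0.4]]
w = varma_simulate(eps_in, phi, theta)
eps_out = varma_innovations(w, phi, theta)  # recovers eps_in up to rounding
```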

The method of maximum likelihood (ML) may be used to estimate the parameters of a specified VARMA model from the observed multivariate time series together with their standard errors and correlations.

The residuals from the model may be examined for the presence of autocorrelations as a check on the adequacy of the fitted model.

2.4.4 VARMA model forecasting

Forecasts of the series may be constructed using a multivariate version of the univariate method. Efficient methods are available for updating the forecasts each time new observations become available.

2.5 Cross-spectral Analysis

The relationship between two time series may be investigated in terms of their sinusoidal components at different frequencies. At frequency

ω

a component of

y_{t}

of the form

R_{y} (ω) \cos (ω t - ϕ_{y} (ω))

has its amplitude

R_{y} (ω)

and phase lag

ϕ_{y} (ω)

estimated by

R_{y} (ω) e^{i ϕ_{y} (ω)} = \frac{1}{n} \sum_{t = 1}^{n} y_{t} e^{i ω t}

and similarly for

x_{t}

. In the univariate analysis only the amplitude was important – in the cross analysis the phase is important.

2.5.1 The sample cross-spectrum

This is defined by

f_{x y}^{*} (ω) = \frac{1}{2 π n} (\sum_{t = 1}^{n} y_{t} e^{i ω t}) (\sum_{t = 1}^{n} x_{t} e^{- i ω t}) .

It may be demonstrated that this is equivalently defined in terms of the sample cross-correlation function (CCF),

r_{x y} (k)

, of the series as

f_{x y}^{*} (ω) = \frac{1}{2 π} \sum_{k = - (n - 1)}^{n - 1} c_{x y} (k) e^{i ω k}

where

c_{x y} (k) = s_{x} s_{y} r_{x y} (k)

is the cross-covariance function.
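The equivalence of the two definitions can be checked numerically, as in the following sketch. It assumes that both series have been mean-corrected and that the cross-covariance uses the divisor n, i.e., c_{x y} (k) = \frac{1}{n} \sum_{t} x_{t} y_{t + k}; the function names and data are hypothetical.

```python
import cmath
import math


def centre(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def cross_spectrum_direct(x, y, omega):
    """Product of the two finite Fourier sums, scaled by 1 / (2 pi n)."""
    n = len(x)
    sy = sum(v * cmath.exp(1j * omega * t) for t, v in enumerate(y, start=1))
    sx = sum(v * cmath.exp(-1j * omega * t) for t, v in enumerate(x, start=1))
    return sy * sx / (2.0 * math.pi * n)


def cross_covariance(x, y, k):
    """c_xy(k) = (1/n) sum_t x_t y_{t+k} for mean-corrected series."""
    n = len(x)
    return sum(x[t] * y[t + k] for t in range(max(0, -k), min(n, n - k))) / n


def cross_spectrum_from_ccf(x, y, omega):
    n = len(x)
    total = sum(cross_covariance(x, y, k) * cmath.exp(1j * omega * k)
                for k in range(-(n - 1), n))
    return total / (2.0 * math.pi)


x = centre([2.0, 4.0, 3.0, 7.0, 6.0, 5.0, 8.0, 9.0])
y = centre([1.0, 2.5, 4.5, 3.0, 7.5, 6.0, 5.5, 8.0])
f_direct = cross_spectrum_direct(x, y, 0.9)
f_ccf = cross_spectrum_from_ccf(x, y, 0.9)  # agrees with f_direct up to rounding
```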

2.5.2 The amplitude and phase spectrum

The cross-spectrum is specified by its real part or cospectrum

c f^{*} (ω)

and imaginary part or quadrature spectrum

q f^{*} (ω)

, but for the purpose of interpretation the cross-amplitude spectrum and phase spectrum are useful:

A^{*} (ω) = | f_{x y}^{*} (ω) |, ϕ^{*} (ω) = \arg (f_{x y}^{*} (ω)) .

If the series

x_{t}

and

y_{t}

contain deterministic sinusoidal components of amplitudes

R_{y}, R_{x}

and phases

ϕ_{y}, ϕ_{x}

at frequency

ω

, then

A^{*} (ω)

will have a peak of approximate width

π / n

and height

(n / 2 π) R_{y} R_{x}

at that frequency, with corresponding phase

ϕ^{*} (ω) = ϕ_{y} - ϕ_{x}

. This supplies no information that cannot be obtained from the two series separately. The statistical relationship between the series is better revealed when the series are purely stochastic and jointly stationary, in which case the expected value of

f_{x y}^{*} (ω)

converges with increasing sample size to the theoretical cross-spectrum

f_{x y} (ω) = \frac{1}{2 π} \sum_{k = - \infty}^{\infty} γ_{x y} (k) e^{i ω k}

where

γ_{x y} (k) = cov (x_{t}, y_{t + k})

. The sample spectrum, as in the univariate case, does not converge to the theoretical spectrum without some form of smoothing which either implicitly (using a lag window) or explicitly (using a frequency window) averages the sample spectrum

f_{x y}^{*} (ω)

over wider bands of frequency to obtain a smoothed estimate

{\hat{f}}_{x y} (ω)

.

2.5.3 The coherency spectrum

If there is no statistical relationship between the series at a given frequency, then

f_{x y} (ω) = 0

, and the smoothed estimate

{\hat{f}}_{x y} (ω)

, will be close to

0

. This is assessed by the squared coherency between the series:

\hat{W} (ω) = \frac{{| {\hat{f}}_{x y} (ω) |}^{2}}{{\hat{f}}_{x x} (ω) {\hat{f}}_{y y} (ω)}

where

{\hat{f}}_{x x} (ω)

is the corresponding smoothed univariate spectrum estimate for

x_{t}

, and similarly for

y_{t}

. The coherency can be treated as a squared multiple correlation. It is similarly invariant in theory not only to simple scaling of

x_{t}

and

y_{t}

, but also to filtering of the two series, and provides a useful test statistic for the relationship between autocorrelated series. Note that without smoothing,

{| f_{x y}^{*} (ω) |}^{2} = f_{x x}^{*} (ω) f_{y y}^{*} (ω),

so the coherency is

1

at all frequencies, just as a correlation is

1

for a sample of size

1

. Thus smoothing is essential for cross-spectrum analysis.
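The degenerate behaviour of the unsmoothed coherency is easy to confirm numerically. The following sketch, with hypothetical function names and arbitrary data, evaluates the raw sample spectra at a single frequency and forms their ratio.

```python
import cmath
import math


def fourier_sum(series, omega):
    return sum(v * cmath.exp(1j * omega * t) for t, v in enumerate(series, start=1))


def raw_spectra(x, y, omega):
    """Return the unsmoothed (f_xx*, f_yy*, f_xy*) at frequency omega."""
    n = len(x)
    jx, jy = fourier_sum(x, omega), fourier_sum(y, omega)
    scale = 1.0 / (2.0 * math.pi * n)
    # For a real series, sum x_t e^{-i w t} is the conjugate of sum x_t e^{i w t}.
    return scale * abs(jx) ** 2, scale * abs(jy) ** 2, scale * jy * jx.conjugate()


x = [0.3, -1.2, 0.8, 2.1, -0.5, 0.0, 1.4, -0.9]
y = [1.0, 0.2, -0.7, 0.5, 1.8, -1.1, 0.6, 0.4]  # unrelated to x by construction
fxx, fyy, fxy = raw_spectra(x, y, 1.3)
raw_coherency = abs(fxy) ** 2 / (fxx * fyy)  # 1 up to rounding, whatever the data
```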

2.5.4 The gain and noise spectrum

If

y_{t}

is believed to be related to

x_{t}

by a linear lagged relationship as in Section 2.3, i.e.,

y_{t} = v_{0} x_{t} + v_{1} x_{t - 1} + v_{2} x_{t - 2} + \dots + n_{t},

then the theoretical cross-spectrum is

f_{x y} (ω) = V (ω) f_{x x} (ω)

where

V (ω) = G (ω) e^{i ϕ (ω)} = \sum_{k = 0}^{\infty} v_{k} e^{i k ω}

is called the frequency response of the relationship.

Thus if

x_{t}

were a sinusoidal wave at frequency

ω

(and

n_{t}

were absent),

y_{t}

would be similar but multiplied in amplitude by

G (ω)

and shifted in phase by

ϕ (ω)

. Furthermore, the theoretical univariate spectrum

f_{y y} (ω) = G {(ω)}^{2} f_{x x} (ω) + f_{n} (ω)

where

n_{t}

, with spectrum

f_{n} (ω)

, is assumed independent of the input

x_{t}

.

Cross-spectral analysis thus furnishes estimates of the gain

\hat{G} (ω) = | {\hat{f}}_{x y} (ω) | / {\hat{f}}_{x x} (ω)

and the phase

\hat{ϕ} (ω) = \arg ({\hat{f}}_{x y} (ω)) .

From these representations of the estimated frequency response

\hat{V} (ω)

, parametric transfer function (TF) models may be recognized and selected. The noise spectrum may also be estimated as

{\hat{f}}_{y ∣ x} (ω) = {\hat{f}}_{y y} (ω) (1 - \hat{W} (ω))

a formula which reflects the fact that in essence a regression is being performed of the sinusoidal components of

y_{t}

on those of

x_{t}

over each frequency band.
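The gain, phase, coherency and noise spectrum formulae can be collected in a small routine, sketched here in Python with a hypothetical function name. The input values are constructed from an assumed frequency response V = 1.5 e^{0.4 i}, input spectrum 2.0 and noise spectrum 0.5, so that the outputs can be checked against the quantities used to build them.

```python
import cmath


def frequency_response_estimates(fxx, fyy, fxy):
    """fxx, fyy: real smoothed spectra; fxy: complex smoothed cross-spectrum."""
    gain = abs(fxy) / fxx
    phase = cmath.phase(fxy)
    coherency = abs(fxy) ** 2 / (fxx * fyy)
    noise = fyy * (1.0 - coherency)
    return gain, phase, coherency, noise


f_xx = 2.0
V = cmath.rect(1.5, 0.4)             # assumed gain 1.5 and phase 0.4
f_n = 0.5                            # assumed noise spectrum
f_xy = V * f_xx                      # f_xy = V f_xx
f_yy = abs(V) ** 2 * f_xx + f_n      # f_yy = G^2 f_xx + f_n
gain, phase, coherency, noise = frequency_response_estimates(f_xx, f_yy, f_xy)
```

Here the noise estimate reproduces the assumed noise spectrum exactly, because the inputs satisfy the theoretical relationships without estimation error.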

Interpretation of the frequency response may be aided by extracting from

\hat{V} (ω)

estimates of the impulse response function (IRF)

{\hat{v}}_{k}

. It is assumed that there is no anticipatory response between

y_{t}

and

x_{t}

, i.e., no coefficients

v_{k}

with

k = - 1, - 2, \dots

are needed (their presence might indicate feedback between the series).

2.5.5 Cross-spectrum smoothing by lag window

The estimate of the cross-spectrum is calculated from the sample cross-covariances as

{\hat{f}}_{x y} (ω) = \frac{1}{2 π} \sum_{k = - M + S}^{M + S} w_{k - S} c_{x y} (k) e^{i ω k} .

The lag window

w_{k}

extends up to a truncation lag

M

as in the univariate case, but its centre is shifted by an alignment lag

S

usually chosen to coincide with the peak cross-correlation. This is equivalent to an alignment of the series for peak cross-correlation at lag

0

, and reduces bias in the phase estimation.

The selection of the truncation lag

M

, which fixes the bandwidth of the estimate, is based on the same criteria as for univariate series, and the same choice of

M

and window shape should be used as in univariate spectrum estimation to obtain valid estimates of the coherency, gain, etc., and test statistics.

2.5.6 Direct smoothing of the cross-spectrum

The computations are exactly as for smoothing of the univariate spectrum except that allowance is made for an implicit alignment shift

S

between the series.

2.6 Kalman Filters

2.6.1 Linear State Space Models

Kalman filtering provides a method for the analysis of multidimensional time series. The underlying model is:

X_{t + 1} = A_{t} X_{t} + B_{t} W_{t}

(8)

Y_{t} = C_{t} X_{t} + V_{t}

(9)

where

X_{t}

is the unobserved state vector,

Y_{t}

is the observed measurement vector,

W_{t}

is the state noise,

V_{t}

is the measurement noise,

A_{t}

is the state transition matrix,

B_{t}

is the noise coefficient matrix and

C_{t}

is the measurement coefficient matrix at time

t

. The state noise and the measurement noise are assumed to be uncorrelated with zero mean and covariance matrices:

E {W_{t} W_{t}^{T}} = Q_{t} and E {V_{t} V_{t}^{T}} = R_{t} .

If the system matrices

A_{t}, B_{t}, C_{t}

and the covariance matrices

Q_{t}, R_{t}

are known then Kalman filtering can be used to compute the minimum variance estimate of the stochastic variable

X_{t}

.

The estimate of

X_{t}

given observations

Y_{1}

to

Y_{t - 1}

is denoted by

{\hat{X}}_{t ∣ t - 1}

with state covariance matrix

E {{\hat{X}}_{t ∣ t - 1} {\hat{X}}_{t ∣ t - 1}^{T}} = P_{t ∣ t - 1}

while the estimate of

X_{t}

given observations

Y_{1}

to

Y_{t}

is denoted by

{\hat{X}}_{t ∣ t}

with covariance matrix

E {{\hat{X}}_{t ∣ t} {\hat{X}}_{t ∣ t}^{T}} = P_{t ∣ t}

.

The update of the estimate,

{\hat{X}}_{t + 1 ∣ t}

, from time

t

to time

t + 1

, is computed in two stages.

First, the update equations are

{\hat{X}}_{t ∣ t} = {\hat{X}}_{t ∣ t - 1} + K_{t} r_{t}, P_{t ∣ t} = (I - K_{t} C_{t}) P_{t ∣ t - 1}

where the residual

r_{t} = Y_{t} - C_{t} {\hat{X}}_{t ∣ t - 1}

has an associated covariance matrix

H_{t} = C_{t} P_{t ∣ t - 1} C_{t}^{T} + R_{t}

, and

K_{t}

is the Kalman gain matrix with

K_{t} = P_{t ∣ t - 1} C_{t}^{T} H_{t}^{- 1} .

The second stage is the one-step-ahead prediction equations given by

{\hat{X}}_{t + 1 ∣ t} = A_{t} {\hat{X}}_{t ∣ t}, P_{t + 1 ∣ t} = A_{t} P_{t ∣ t} A_{t}^{T} + B_{t} Q_{t} B_{t}^{T} .

These two stages can be combined to give the one-step-ahead update-prediction equations

{\hat{X}}_{t + 1 ∣ t} = A_{t} {\hat{X}}_{t ∣ t - 1} + A_{t} K_{t} r_{t} .
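For a scalar state and a scalar measurement the two stages reduce to a few lines of arithmetic. The following Python sketch is an illustration only; the function name kalman_step and the numerical values are assumptions, and the residual is named resid to keep it distinct from the measurement noise variance r.

```python
def kalman_step(x_pred, p_pred, y, a, b, c, q, r):
    """One recursion, starting from X_{t|t-1} and P_{t|t-1} (all scalars)."""
    resid = y - c * x_pred                 # residual r_t
    h = c * p_pred * c + r                 # residual variance H_t
    gain = p_pred * c / h                  # Kalman gain K_t
    x_filt = x_pred + gain * resid         # X_{t|t}
    p_filt = (1.0 - gain * c) * p_pred     # P_{t|t}
    x_next = a * x_filt                    # X_{t+1|t}
    p_next = a * p_filt * a + b * q * b    # P_{t+1|t}
    return x_filt, p_filt, x_next, p_next, resid, gain


x_filt, p_filt, x_next, p_next, resid, gain = kalman_step(
    x_pred=0.0, p_pred=1.0, y=1.2, a=0.9, b=1.0, c=1.0, q=0.5, r=0.25)
# The combined update-prediction equation gives the same predicted state.
x_next_combined = 0.9 * 0.0 + 0.9 * gain * resid
```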

The above equations thus provide a method for recursively calculating the estimates of the state vectors

{\hat{X}}_{t ∣ t}

and

{\hat{X}}_{t + 1 ∣ t}

and their covariance matrices

P_{t ∣ t}

and

P_{t + 1 ∣ t}

from their previous values. This recursive procedure can be viewed in a Bayesian framework as being the updating of the prior by the data

Y_{t}

.

The initial values

{\hat{X}}_{1 ∣ 0}

and

P_{1 ∣ 0}

are required to start the recursion. For stationary systems,

P_{1 ∣ 0}

can be computed from the following equation:

P_{1 ∣ 0} = A_{1} P_{1 ∣ 0} A_{1}^{T} + B_{1} Q_{1} B_{1}^{T},

which can be solved by iterating on the equation. For

{\hat{X}}_{1 ∣ 0}

the value

E {X}

can be used if it is available.
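Iterating on the equation can be sketched as follows for small systems held as nested lists. The starting value P = 0, the stopping tolerance and the function names are assumptions of this sketch; convergence requires the transition matrix to be stable.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def stationary_covariance(a, b, q, tol=1e-13, max_iter=10000):
    """Iterate P <- A P A' + B Q B' from P = 0 until successive iterates agree."""
    at = transpose(a)
    bqb = matmul(matmul(b, q), transpose(b))
    n = len(a)
    p = [[0.0] * n for _ in range(n)]
    for _ in range(max_iter):
        apa = matmul(matmul(a, p), at)
        new = [[apa[i][j] + bqb[i][j] for j in range(n)] for i in range(n)]
        change = max(abs(new[i][j] - p[i][j]) for i in range(n) for j in range(n))
        p = new
        if change < tol:
            return p
    raise RuntimeError("iteration did not converge; is A stable?")


A = [[0.5, 0.2], [0.0, 0.3]]
B = [[1.0, 0.0], [0.0, 1.0]]
Q = [[1.0, 0.0], [0.0, 2.0]]
P10 = stationary_covariance(A, B, Q)
p_scalar = stationary_covariance([[0.9]], [[1.0]], [[0.5]])[0][0]
```

In the scalar case the limit is B^{2} Q / (1 - A^{2}), which provides a simple check on the iteration.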

2.6.1.1 The information filter

An alternative set of Kalman filter equations can be constructed which use the inverse of the covariance matrices. These matrices (e.g.,

P_{t + 1 ∣ t}^{−1}

) are also positive semidefinite and are termed information matrices.

Although the information filter has the disadvantage that it requires the inverses

A_{t}^{- 1}

and

R_{t}^{- 1}

to be computed, it is preferable to the covariance filter in situations where there is no (very little) information concerning the initial state of the system. In these circumstances the covariance filter will fail because the initial state covariance matrix

P_{0 ∣ - 1}

is infinite (very large), whilst the corresponding information filter initial state

P_{0 ∣ - 1}^{−1} = 0

(very small) incurs no such difficulties.

The information filter recursion is described by the following equations

P_{t + 1 ∣ t}^{−1} = [I - N_{t} {B^{T}}_{t}] M_{t}

(10)

P_{t + 1 ∣ t + 1}^{−1} = P_{t + 1 ∣ t}^{−1} + C_{t + 1}^{T} R_{t + 1}^{−1} C_{t + 1}

(11)

where M_{t} = {(A_{t}^{- 1})}^{T} P_{t ∣ t}^{−1} A_{t}^{- 1}

and N_{t} = M_{t} B_{t} {[Q_{t}^{- 1} + {B^{T}}_{t} M_{t} B_{t}]}^{−1}

{\hat{a}}_{t + 1 ∣ t} = [I - N_{t} {B^{T}}_{t}] {(A_{t}^{- 1})}^{T} {\hat{a}}_{t ∣ t}

(12)

{\hat{a}}_{t + 1 ∣ t + 1} = {\hat{a}}_{t + 1 ∣ t} + C_{t + 1}^{T} R_{t + 1}^{−1} Y_{t + 1}

(13)

where {\hat{a}}_{t + 1 ∣ t} = P_{t + 1 ∣ t}^{−1} {\hat{X}}_{t + 1 ∣ t}

(14)

and {\hat{a}}_{t + 1 ∣ t + 1} = P_{t + 1 ∣ t + 1}^{−1} {\hat{X}}_{t + 1 ∣ t + 1} .

(15)
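For scalars, equations (10) and (11) can be checked against the covariance form of the filter, as in this sketch. The function names and numerical values are assumptions made for illustration.

```python
def information_predict(p_filt_inv, a, b, q):
    """Equation (10) for scalars: P_{t+1|t}^{-1} = (1 - N_t B_t) M_t."""
    m = p_filt_inv / (a * a)               # M_t
    n = m * b / (1.0 / q + b * m * b)      # N_t
    return (1.0 - n * b) * m


def information_update(p_pred_inv, c, r):
    """Equation (11) for scalars: add the information in the new measurement."""
    return p_pred_inv + c * c / r


# Covariance form for comparison: P_{t+1|t} = A P_{t|t} A + B Q B.
p_filt, a, b, q = 0.2, 0.9, 1.0, 0.5
p_pred_cov = a * p_filt * a + b * q * b
p_pred_inv = information_predict(1.0 / p_filt, a, b, q)  # equals 1 / p_pred_cov

# With no prior information the recursion still starts without difficulty.
start_inv = information_update(0.0, c=1.0, r=0.25)
```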

2.6.1.2 Computational methods

To improve the stability of the computations the square root algorithm is used. One recursion of the square root covariance filter algorithm can be summarised as follows:

(\begin{matrix} R_{t}^{1 / 2} & C_{t} S_{t} & 0 \\ 0 & A_{t} S_{t} & B_{t} Q_{t}^{1 / 2} \end{matrix}) U = (\begin{matrix} H_{t}^{1 / 2} & 0 & 0 \\ G_{t} & S_{t + 1} & 0 \end{matrix})

where

U

is an orthogonal transformation triangularizing the left-hand pre-array to produce the right-hand post-array,

S_{t}

is the lower triangular Cholesky factor of the state covariance matrix

P_{t + 1 ∣ t}

,

Q_{t}^{1 / 2}

and

R_{t}^{1 / 2}

are the lower triangular Cholesky factors of the covariance matrices

Q

and

R

, and

H_{t}^{1 / 2}

is the lower triangular Cholesky factor of the covariance matrix of the residuals. The relationship between the Kalman gain matrix,

K_{t}

, and

G_{t}

is given by

A_{t} K_{t} = G_{t} {(H_{t}^{1 / 2})}^{−1} .

To improve the efficiency of the computations when the matrices

A_{t}, B_{t}

and

C_{t}

do not vary with time the system can be transformed to give a simpler structure. The transformed state vector is

U^{*} X

where

U^{*}

is the transformation that reduces the matrix pair

(A, C)

to lower observer Hessenberg form. That is, the matrix

U^{*}

is computed such that the compound matrix

(\begin{matrix} C {U^{*}}^{T} \\ U^{*} A {U^{*}}^{T} \end{matrix})

is a lower trapezoidal matrix. The transformations need only be computed once at the start of a series, and the covariance matrices

Q_{t}

and

R_{t}

can still be time-varying.

2.6.1.3 The square root information filter

The time-varying square root information Kalman filter (g13ecc) provided by this chapter requires the construction of the following block matrix pre-array and block matrix post-array.

\begin{matrix} U_{2} (\begin{matrix} Q_{t}^{- 1 / 2} & 0 & 0 \\ S_{t}^{- 1} A_{t}^{- 1} B_{t} & S_{t}^{- 1} A_{t}^{- 1} & S_{t}^{- 1} {\hat{X}}_{t ∣ t} \\ 0 & R_{t + 1}^{- 1 / 2} C_{t + 1} & R_{t + 1}^{- 1 / 2} Y_{t + 1} \end{matrix}) & = (\begin{matrix} F_{t + 1}^{- 1 / 2} & * & * \\ 0 & S_{t + 1}^{−1} & ξ_{t + 1 ∣ t + 1} \\ 0 & 0 & E_{t + 1} \end{matrix}) \\ (Pre-array) & (Post-array) \end{matrix}

(16)

where the asterisk represents elements that are not required,

U_{2}

is an orthogonal transformation triangularizing the pre-array and

F_{t + 1}

, the matrix containing the innovations in the process noise, is given by

F_{t + 1}^{−1} = Q_{t}^{- 1} + {B^{T}}_{t} M_{t} B_{t} .

The matrices

P_{t ∣ t}^{−1}, Q_{t}^{- 1}, F_{t + 1}^{−1}

and

R_{t}^{- 1}

have been Cholesky factorized as follows:

P_{t ∣ t}^{−1} = {(S_{t}^{- 1})}^{T} S_{t}^{- 1}

Q_{t}^{- 1} = {(Q_{t}^{- 1 / 2})}^{T} Q_{t}^{- 1 / 2}

R_{t}^{- 1} = {(R_{t}^{- 1 / 2})}^{T} R_{t}^{- 1 / 2}

F_{t + 1}^{−1} = {(F_{t + 1}^{- 1 / 2})}^{T} F_{t + 1}^{- 1 / 2}

where the right factors are upper triangular.

The new state estimate is computed via

X_{t + 1 ∣ t + 1} = S_{t + 1} ξ_{t + 1 ∣ t + 1} .

(17)

That this method is computationally equivalent to equations (10)–(15) can be demonstrated by transposing (16), ‘squaring’ the right-hand side to eliminate the orthogonal matrix

U_{2}

and then, after performing a block Cholesky decomposition, equating block matrix elements on either side. It can similarly be shown that transposition of rows 2 and 3 of the pre-array, as occurs in function g13edc, does not affect the elements in the resultant post-array.

2.6.1.4 Time invariant condensed square root filters

When the system matrices

A, B

and

C

are time invariant, it can be advantageous to perform initial unitary transformations to ‘condense’ them (create as many zeros as possible) and thereby significantly reduce the number of floating-point operations required by the algorithm. Essentially this entails creating an appropriate unitary transformation matrix

U

and solving for the new state vector

X_{t} = U X

in the transformed reference frame. After the required number of Kalman filter iterations have been performed the back transformation

X = U^{T} X_{t}

provides the estimated state vector in the original reference frame. It can be shown that the transformed system matrices for the covariance filter are given by

{U A U^{T}, U B, C U^{T}}

, which are in agreement with the arguments required by g13ebc. It can similarly be shown that the system matrices describing the corresponding transformed information filter are

{U A^{- 1} U^{T}, U B, C U^{T}}

. These correspond to the arguments used by g13edc (

U A^{- 1} U^{T}, U A^{- 1} B, C U^{T}

), where the second matrix is input as the product of

U A^{- 1} U^{T}

and

U B

. It should be noted that in the transformed frame the covariance matrix

P_{t ∣ t}^{'}

is related to the original covariance matrix via the similarity transformation

P_{t ∣ t}^{'} = U P_{t ∣ t} U^{T} ({(P_{t ∣ t}^{'})}^{−1} = U (P_{t ∣ t}^{−1}) U^{T})

. This means that, for square root Kalman filter functions, the appropriate Cholesky factor of

P_{t ∣ t}^{'}

must be input.

The condensed matrix forms used by the functions in this chapter are lower observer Hessenberg form, in the case of g13ebc, where the compound matrix

(\begin{matrix} U A U^{T} \\ C U^{T} \end{matrix})

is lower trapezoidal and upper controller Hessenberg form, in the case of g13edc, where the compound matrix

(U B ∣ U A U^{T})

is upper trapezoidal.

Both g13ebc and g13edc contain the block matrix

(\begin{matrix} & C U^{T} \\ U B & U A U^{T} \end{matrix})

within their pre-array, and the structure of this matrix (for

n = 6, m = 3

and

p = 2

) is illustrated below for both Hessenberg forms.

Lower observer Hessenberg

(\begin{matrix} & & & x & 0 & 0 & 0 & 0 & 0 \\ & & & x & x & 0 & 0 & 0 & 0 \\ x & x & x & x & x & x & 0 & 0 & 0 \\ x & x & x & x & x & x & x & 0 & 0 \\ x & x & x & x & x & x & x & x & 0 \\ x & x & x & x & x & x & x & x & x \\ x & x & x & x & x & x & x & x & x \\ x & x & x & x & x & x & x & x & x \end{matrix}) .

Upper controller Hessenberg

(\begin{matrix} & & x & x & x & x & x & x \\ & & x & x & x & x & x & x \\ & & x & x & x & x & x & x \\ x & x & x & x & x & x & x & x \\ 0 & x & x & x & x & x & x & x \\ 0 & 0 & x & x & x & x & x & x \\ 0 & 0 & 0 & x & x & x & x & x \\ 0 & 0 & 0 & 0 & x & x & x & x \\ 0 & 0 & 0 & 0 & 0 & x & x & x \end{matrix}) .

2.6.1.5 Model fitting and forecasting

If the state space model contains unknown parameters,

θ

, these can be estimated using maximum likelihood (ML). Assuming that

W_{t}

and

V_{t}

are normal variates the log-likelihood for observations

Y_{t}

, for

t = 1, 2, \dots, n

, is given by

constant - \frac{1}{2} \sum_{t = 1}^{n} \ln (\det (H_{t})) - \frac{1}{2} \sum_{t = 1}^{n} r_{t}^{T} H_{t}^{- 1} r_{t} .

Optimal estimates for the unknown model parameters

θ

can then be obtained by using a suitable optimizer function to maximize the likelihood function.
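For a scalar, time-invariant model the log-likelihood can be accumulated during the filter recursion, as in the sketch below. The additive constant is omitted, the crude grid search merely stands in for a proper optimizer, and the function name, data and fixed parameter values are all assumptions of the sketch.

```python
import math


def kalman_loglik(y, a, b, c, q, r, x0, p0):
    """Log-likelihood, up to the additive constant, for a scalar model."""
    x_pred, p_pred = x0, p0
    loglik = 0.0
    for obs in y:
        resid = obs - c * x_pred
        h = c * p_pred * c + r
        loglik -= 0.5 * (math.log(h) + resid * resid / h)
        gain = p_pred * c / h
        x_filt = x_pred + gain * resid
        p_filt = (1.0 - gain * c) * p_pred
        x_pred = a * x_filt
        p_pred = a * p_filt * a + b * q * b
    return loglik


y = [0.8, 1.1, 0.7, 0.2, -0.3, -0.6, -0.2, 0.4, 0.9, 1.0]  # made-up data


def objective(a):
    # Only the transition coefficient a is treated as unknown here.
    return kalman_loglik(y, a, 1.0, 1.0, 0.5, 0.25, 0.0, 1.0)


grid = [i / 10.0 for i in range(-9, 10)]
best_a = max(grid, key=objective)
```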

Once the model has been fitted, forecasting can be performed by using the one-step-ahead prediction equations. The one-step-ahead prediction equations can also be used to ‘jump over’ any missing values in the series.

2.6.1.6 Kalman filter and time series models

Many commonly used time series models can be written as state space models. A univariate

ARMA (p, q)

model can be cast into the following state space form:

\begin{matrix} x_{t} & = A x_{t - 1} + B ε_{t} \\ w_{t} & = C x_{t} \end{matrix}

A = (\begin{matrix} ϕ_{1} & 1 & & & \\ ϕ_{2} & & 1 & & \\ \vdots & & & \ddots & \\ ϕ_{r - 1} & & & & 1 \\ ϕ_{r} & 0 & 0 & \cdots & 0 \end{matrix}), B = (\begin{matrix} 1 \\ - θ_{1} \\ - θ_{2} \\ \vdots \\ - θ_{r - 1} \end{matrix}) and C^{T} = (\begin{matrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{matrix}),

where

r = \max (p, q + 1)

.
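The construction of these matrices for a univariate model can be sketched as follows. The function names are hypothetical, the state is started at zero, and the check against the ARMA difference equation likewise assumes zero values before the start of the series.

```python
def arma_state_space(phi, theta):
    """Return (A, B, C) for ARMA(p, q), padding phi and theta with zeros up to r."""
    p, q = len(phi), len(theta)
    r = max(p, q + 1)
    phi_pad = list(phi) + [0.0] * (r - p)
    theta_pad = list(theta) + [0.0] * (r - 1 - q)
    a = [[0.0] * r for _ in range(r)]
    for i in range(r):
        a[i][0] = phi_pad[i]          # first column holds phi_1, ..., phi_r
        if i + 1 < r:
            a[i][i + 1] = 1.0         # ones on the superdiagonal
    b = [1.0] + [-t for t in theta_pad]
    c = [1.0] + [0.0] * (r - 1)
    return a, b, c


def simulate_state_space(a, b, c, eps):
    """x_t = A x_{t-1} + B eps_t, w_t = C x_t, starting from a zero state."""
    r = len(b)
    x = [0.0] * r
    out = []
    for e in eps:
        x = [sum(a[i][j] * x[j] for j in range(r)) + b[i] * e for i in range(r)]
        out.append(sum(ci * xi for ci, xi in zip(c, x)))
    return out


A, B, C = arma_state_space([0.5, -0.3], [0.4])   # ARMA(2, 1), so r = 2
eps = [1.0, -0.5, 0.3, 0.8, -1.2, 0.1]
w = simulate_state_space(A, B, C, eps)
```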

The representation for a

k

-variate

ARMA (p, q)

series (VARMA) is very similar to that given above, except now the state vector is of length

k r

and the

ϕ

and

θ

are now

k \times k

matrices and the 1s in

A, B

and

C

are now the identity matrix of order

k

. If

p < r

or

q + 1 < r

then the appropriate

ϕ

or

θ

matrices are set to zero, respectively.

Since the compound matrix

(\begin{matrix} C \\ A \end{matrix})

is already in lower observer Hessenberg form (i.e., it is lower trapezoidal with zeros in the top right-hand triangle) the invariant Kalman filter algorithm can be used directly without the need to generate a transformation matrix

U^{*}

.

2.6.2 Nonlinear State Space Models

A nonlinear state space model, with additive noise, can, at time

t

, be described by:

\begin{matrix} x_{t + 1} & = F (x_{t}) + v_{t} \\ y_{t} & = H (x_{t}) + u_{t} \end{matrix}

(18)

where

x_{t}

represents the unobserved state vector of length

m_{x}

and

y_{t}

the observed measurement vector of length

m_{y}

. The process noise is denoted

v_{t}

, which is assumed to have mean zero and covariance structure

Σ_{x}

, and the measurement noise by

u_{t}

, which is assumed to have mean zero and covariance structure

Σ_{y}

. The two nonlinear functions,

F

and

H

may be time dependent. Two methods are commonly used to analyse nonlinear state space models: the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF).

The EKF solves the nonlinear state space model by first linearising the set of equations given in (18) using a first-order Taylor expansion around

{\hat{x}}_{t}

(the estimate of the state vector at time

t

given the full data:

y_{1}, y_{2}, \dots, y_{t}

) in the case of

F

and around

{\hat{x}}_{\bar{t}}

(the estimate of the state vector at time

t

given the partial data:

y_{1}, y_{2}, \dots, y_{t - 1}

) in the case of

H

. This leads to the linear state space model:

\begin{matrix} x_{t + 1} & \approx (F^{'}) x_{t} + v_{t} + F ({\hat{x}}_{t}) - (F^{'}) {\hat{x}}_{t} \\ y_{t} - H ({\hat{x}}_{\bar{t}}) + (H^{'}) {\hat{x}}_{\bar{t}} & \approx (H^{'}) x_{t} + u_{t} \end{matrix}

where

\begin{matrix} F^{'} & = {\frac{\partial F (x)}{\partial x} |}_{x = {\hat{x}}_{t}} \\ H^{'} & = {\frac{\partial H (x)}{\partial x} |}_{x = {\hat{x}}_{\bar{t}}} \end{matrix}

This linear state space model can then be solved using the standard Kalman Filter. See Haykin (2001) for more details.
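Applying the standard Kalman filter to the linearised model gives, for a scalar state and measurement, the recursion sketched below. The function name ekf_step, the example functions F and H, and the numerical values are assumptions made for illustration; this is not the algorithm of any particular Library function.

```python
import math


def ekf_step(x_pred, p_pred, y, f, f_prime, h, h_prime, sigma_x, sigma_y):
    """One scalar EKF recursion: update with y_t, then predict to t + 1."""
    hp = h_prime(x_pred)                       # H' at the partial-data estimate
    gain = p_pred * hp / (hp * p_pred * hp + sigma_y)
    x_filt = x_pred + gain * (y - h(x_pred))   # full-data estimate
    p_filt = (1.0 - gain * hp) * p_pred
    fp = f_prime(x_filt)                       # F' at the full-data estimate
    x_next = f(x_filt)                         # prediction through F itself
    p_next = fp * p_filt * fp + sigma_x
    return x_filt, p_filt, x_next, p_next


# An arbitrary nonlinear example: F(x) = x + 0.1 sin(x), H(x) = x^2.
result = ekf_step(
    x_pred=1.0, p_pred=0.5, y=1.3,
    f=lambda x: x + 0.1 * math.sin(x), f_prime=lambda x: 1.0 + 0.1 * math.cos(x),
    h=lambda x: x * x, h_prime=lambda x: 2.0 * x,
    sigma_x=0.1, sigma_y=0.2)
```

When F and H are linear the recursion reduces to the ordinary Kalman filter, which gives a convenient check on an implementation.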

Unlike the EKF, the UKF of Julier and Uhlmann (1997) does not attempt to linearise the problem; rather, it uses a minimal set of carefully chosen points, called sigma points, to capture the mean and covariance of the underlying Gaussian random variables. These points are then propagated through the nonlinear functions, giving an estimate of the transformed mean and covariance. A brief description of the UKF can be found in Section 3 in g13ekc.

2.7 GARCH Models

2.7.1 ARCH models and their generalizations

Rather than modelling the mean (for example using regression models) or the autocorrelation (by using ARMA models) there are circumstances in which the variance of a time series needs to be modelled. This is common in financial data modelling where the variance (or standard deviation) is known as volatility. The ability to forecast volatility is a vital part in deciding the risk attached to financial decisions like portfolio selection. The basic model for relating the variance at time

t

to the variance at previous times is the autoregressive conditional heteroskedastic (ARCH) model. The standard ARCH model is defined as

\begin{matrix} y_{t} ∣ ψ_{t - 1} \sim N (0, h_{t}), \\ h_{t} = α_{0} + \sum_{i = 1}^{q} α_{i} ε_{t - i}^{2}, \end{matrix}

where

ψ_{t}

is the information up to time

t

and

h_{t}

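is the conditional variance. Because the series has zero conditional mean in this model, the shocks are simply $ε_{t} = y_{t}$ .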

In a similar way to that in which autoregressive (AR) models were generalized to ARMA models, the ARCH models have been generalized to a GARCH model (see Engle (1982), Bollerslev (1986) and Hamilton (1994)):

h_{t} = α_{0} + \sum_{i = 1}^{q} α_{i} ε_{t - i}^{2} + \sum_{i = 1}^{p} β_{i} h_{t - i} .

This can be combined with a regression model:

y_{t} = b_{0} + \sum_{i = 1}^{k} b_{i} x_{i t} + ε_{t},

where

ε_{t} ∣ ψ_{t - 1} \sim N (0, h_{t})

and where

x_{i t}

, for

i = 1, 2, \dots, k

, are the exogenous variables.
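The GARCH(p, q) variance recursion can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `garch_variance` is hypothetical, and the use of a single value `h0` for all pre-sample variances and squared shocks is an assumption of the example, not a statement about how the library initialises the recursion.

```python
def garch_variance(eps, alpha0, alpha, beta, h0):
    """Conditional variances h_1..h_T of a GARCH(p, q) model.

    eps    : shocks eps_1..eps_T
    alpha0 : constant term
    alpha  : [alpha_1, ..., alpha_q]
    beta   : [beta_1, ..., beta_p]
    h0     : stand-in for pre-sample h_t and eps_t^2 (assumption of this sketch)
    """
    q, p = len(alpha), len(beta)
    h = []
    for t in range(len(eps)):
        ht = alpha0
        for i in range(1, q + 1):
            # Squared shock at lag i, or the pre-sample stand-in.
            e2 = eps[t - i] ** 2 if t - i >= 0 else h0
            ht += alpha[i - 1] * e2
        for i in range(1, p + 1):
            # Conditional variance at lag i, or the pre-sample stand-in.
            hp = h[t - i] if t - i >= 0 else h0
            ht += beta[i - 1] * hp
        h.append(ht)
    return h
```

Setting `beta = []` gives the ARCH(q) model as a special case.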

The above models assume that the change in variance,

h_{t}

, is symmetric with respect to the shocks, that is, that a large negative value of

ε_{t - 1}

has the same effect as a large positive value of

ε_{t - 1}

. A frequently observed effect is that a large negative value

ε_{t - 1}

often leads to a greater variance than a large positive value. The following three asymmetric models represent this effect in different ways using the parameter

γ

as a measure of the asymmetry.

Type I AGARCH(

p, q

)

h_{t} = α_{0} + \sum_{i = 1}^{q} α_{i} {(ε_{t - i} + γ)}^{2} + \sum_{i = 1}^{p} β_{i} h_{t - i} .

Type II AGARCH(

p, q

)

h_{t} = α_{0} + \sum_{i = 1}^{q} α_{i} {(| ε_{t - i} | + γ ε_{t - i})}^{2} + \sum_{i = 1}^{p} β_{i} h_{t - i} .

GJR-GARCH(

p, q

), or Glosten, Jagannathan and Runkle GARCH (see Glosten et al. (1993))

h_{t} = α_{0} + \sum_{i = 1}^{q} (α_{i} + γ I_{t - i}) ε_{t - i}^{2} + \sum_{i = 1}^{p} β_{i} h_{t - i},

where

I_{t} = 1

if

ε_{t} < 0

and

I_{t} = 0

if

ε_{t} \geq 0

.

The first assumes that the effects of the shocks are symmetric about

- γ

rather than zero, so that for

γ < 0

the effect of negative shocks is increased and the effect of positive shocks is decreased. Both the Type II AGARCH and the GJR GARCH (see Glosten et al. (1993)) models introduce asymmetry by increasing the value of the coefficient of

ε_{t - 1}^{2}

for negative values of

ε_{t - 1}

. In the case of the Type II AGARCH the effect is multiplicative while for the GJR GARCH the effect is additive.

(Note that in the case of GJR GARCH,

γ

needs to be positive to inflate variance after negative shocks while for Type I and Type II AGARCH,

γ

needs to be negative.)
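The sign conventions in the note above can be checked numerically. The Python sketch below evaluates the contribution of a single lagged shock to $h_{t}$ under each of the three asymmetric models; the function names are hypothetical and the parameter values in the usage note are arbitrary.

```python
def type1_term(eps, alpha, gamma):
    """Type I AGARCH: alpha * (eps + gamma)^2."""
    return alpha * (eps + gamma) ** 2


def type2_term(eps, alpha, gamma):
    """Type II AGARCH: alpha * (|eps| + gamma * eps)^2."""
    return alpha * (abs(eps) + gamma * eps) ** 2


def gjr_term(eps, alpha, gamma):
    """GJR-GARCH: (alpha + gamma * I) * eps^2, with I = 1 when eps < 0."""
    indicator = 1.0 if eps < 0 else 0.0
    return (alpha + gamma * indicator) * eps ** 2
```

For example, with `alpha = 0.1` a shock of -2 contributes more than a shock of +2 when `gamma = -0.5` in the Type I and Type II terms, and when `gamma = 0.05` in the GJR term.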

A third type of GARCH model is the exponential GARCH (EGARCH). In this model the variance relationship is on the log scale and hence asymmetric.

\ln (h_{t}) = α_{0} + \sum_{i = 1}^{q} α_{i} z_{t - i} + \sum_{i = 1}^{q} ϕ_{i} (| z_{t - i} | - E [| z_{t - i} |]) + \sum_{i = 1}^{p} β_{i} \ln (h_{t - i}),

where

z_{t} = \frac{ε_{t}}{\sqrt{h_{t}}}

and

E [| z_{t - i} |]

denotes the expected value of

| z_{t - i} |

. Note that the

ϕ_{i}

terms represent a symmetric contribution to the variance while the

α_{i}

terms give an asymmetric contribution.

2.7.2 Fitting GARCH models

The models are fitted by maximizing the conditional log-likelihood. For the Normal distribution the conditional log-likelihood is, apart from an additive constant,

- \frac{1}{2} \sum_{i = 1}^{T} (\log (h_{i}) + \frac{ε_{i}^{2}}{h_{i}}) .

For the Student's

t

-distribution the function is more complex. An approximation to the standard errors of the parameter estimates is computed from the Fisher information matrix.
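A minimal Python sketch of evaluating the Gaussian conditional log-likelihood is given below. It is written with the usual sign and includes the constant $- \frac{T}{2} \log (2 π)$ , so that larger values indicate a better fit. The function name `gaussian_cond_loglik` is hypothetical; the sketch only evaluates the criterion for given shocks and variances and does not perform the maximization.

```python
import math


def gaussian_cond_loglik(eps, h):
    """Gaussian conditional log-likelihood for shocks eps and variances h.

    eps : shocks eps_1..eps_T
    h   : conditional variances h_1..h_T (all positive)
    """
    T = len(eps)
    # Data-dependent part: sum of log(h_i) + eps_i^2 / h_i.
    core = sum(math.log(hi) + e * e / hi for e, hi in zip(eps, h))
    return -0.5 * T * math.log(2.0 * math.pi) - 0.5 * core
```

In a fitting procedure the variances `h` would themselves be recomputed from the model parameters at each trial value of the parameters.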

2.8 Inhomogeneous Time Series

If we denote a generic univariate time series as a sequence of pairs of values

(z_{i}, t_{i})

, for

i = 1, 2, \dots

where the

z

's represent an observed scalar value and the

t

's the time that the value was observed, then in a standard time series analysis, as discussed in other sections of this introduction, it is assumed that the series being analysed is homogeneous, that is the sampling times are regularly spaced with

t_{i} - t_{i - 1} = δ

for some value

δ

. In many real world applications this assumption does not hold, that is, the series is inhomogeneous.

Standard time series analysis techniques cannot be used on an inhomogeneous series without first preprocessing the series to construct an artificial homogeneous series by, for example, resampling the series at regular intervals. Zumbach and Müller (2001) introduced a series of operators that can be used to extract robust information directly from the inhomogeneous time series. In this context, robust information means that the results should be essentially independent of minor changes to the sampling mechanism used when collecting the data, for example, changing a number of time stamps or adding or removing a few observations.

The basic operator available for inhomogeneous time series is the exponential moving average (EMA). This operator has a single parameter,

τ

, and is an average operator with an exponentially decaying kernel given by:

\frac{e^{- t / τ}}{τ} .

This gives rise to the following iterative formula:

EMA [τ; z] (t_{i}) = μ EMA [τ; z] (t_{i - 1}) + (ν - μ) z_{i - 1} + (1 - ν) z_{i}

where

μ = e^{- α} \quad \text{and} \quad α = \frac{t_{i} - t_{i - 1}}{τ} .

The value of

ν

depends on the method of interpolation chosen. Three interpolation methods are available:

1.	Previous point:	$ν = 1$ .
2.	Linear:	$ν = (1 - μ) / α$ .
3.	Next point:	$ν = μ$ .
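The iterative formula and the three interpolation choices can be sketched in Python as follows. The function name `ema`, the `interp` keyword and the choice of starting value (the first observation, unless `init` is given) are assumptions of this example rather than a description of any library interface. The sketch assumes strictly increasing time stamps.

```python
import math


def ema(t, z, tau, interp="linear", init=None):
    """EMA[tau; z] evaluated at each t_i using the iterative formula.

    t, z   : time stamps (strictly increasing) and observed values
    tau    : decay parameter of the exponential kernel
    interp : "previous", "linear" or "next"
    init   : EMA value at t_0; defaults to z_0 (assumption of this sketch)
    """
    out = [z[0] if init is None else init]
    for i in range(1, len(z)):
        alpha = (t[i] - t[i - 1]) / tau
        mu = math.exp(-alpha)
        if interp == "previous":
            nu = 1.0
        elif interp == "linear":
            nu = (1.0 - mu) / alpha
        elif interp == "next":
            nu = mu
        else:
            raise ValueError("unknown interpolation method: " + interp)
        out.append(mu * out[-1] + (nu - mu) * z[i - 1] + (1.0 - nu) * z[i])
    return out
```

Because the three weights $μ$ , $ν - μ$ and $1 - ν$ sum to one, a constant series is left unchanged by the operator whichever interpolation method is used.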

Given the EMA, a number of other operators can be defined, including:

(i) $m$ -Iterated Exponential Moving Average, defined as

$EMA [τ, m; z] = EMA [τ; EMA [τ, m - 1; z]] \quad \text{where} \quad EMA [τ, 1; z] = EMA [τ; z] .$
(ii)Moving Average (MA), defined as

$MA [τ, m_{1}, m_{2}; z] (t_{i}) = \frac{1}{m_{2} - m_{1} + 1} \sum_{j = m_{1}}^{m_{2}} EMA [\tilde{τ}, j; z] (t_{i}) \quad \text{where} \quad \tilde{τ} = \frac{2 τ}{m_{2} + m_{1}}$
(iii)Moving Norm (MNorm), defined as

$MNorm (τ, m, p; z) = {MA [τ, 1, m; {| z |}^{p}]}^{1 / p}$
(iv)Moving Variance (MVar), defined as

$MVar (τ, m, p; z) = MA [τ, 1, m; {| z - MA [τ, 1, m; z] |}^{p}]$
(v)Moving Standard Deviation (MSD), defined as

$MSD (τ, m, p; z) = {MA [τ, 1, m; {| z - MA [τ, 1, m; z] |}^{p}]}^{1 / p}$
(vi)Differential ( $Δ$ ), defined as

$Δ [τ, α, β, γ; z] = γ (EMA [α τ, 1; z] + EMA [α τ, 2; z] - 2 EMA [α β τ, 4; z])$
(vii)Volatility, defined as

$Volatility [τ, τ^{'}, m, p; z] = MNorm (τ / 2, m, p; Δ [τ^{'}; z])$

A discussion of each of these operators, their use and, in some cases, alternative definitions is given in Zumbach and Müller (2001).
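To show how the compound operators build on the EMA, the Python sketch below implements the $m$ -iterated EMA and the moving average (MA). It is a simplified illustration: all names are hypothetical, linear interpolation is used at every iteration, and each EMA is started from the first value of its input, none of which should be read as the library's behaviour.

```python
import math


def ema(t, z, tau):
    """EMA[tau; z] with linear interpolation, started at z_0 (assumption)."""
    out = [z[0]]
    for i in range(1, len(z)):
        a = (t[i] - t[i - 1]) / tau
        mu = math.exp(-a)
        nu = (1.0 - mu) / a
        out.append(mu * out[-1] + (nu - mu) * z[i - 1] + (1.0 - nu) * z[i])
    return out


def iterated_ema(t, z, tau, m):
    """EMA[tau, m; z]: the EMA operator applied m times in succession."""
    out = list(z)
    for _ in range(m):
        out = ema(t, out, tau)
    return out


def moving_average(t, z, tau, m1, m2):
    """MA[tau, m1, m2; z]: mean of the j-iterated EMAs for j = m1..m2."""
    tau_tilde = 2.0 * tau / (m2 + m1)
    total = [0.0] * len(z)
    current = list(z)
    for j in range(1, m2 + 1):
        current = ema(t, current, tau_tilde)   # EMA[tau_tilde, j; z]
        if j >= m1:
            total = [s + c for s, c in zip(total, current)]
    return [s / (m2 - m1 + 1) for s in total]
```

The intermediate results are reused from one iteration to the next, which is why access to every $EMA [τ, j; z]$ is convenient when computing the MA or the differential operator.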

3 Recommendations on Choice and Use of Available Functions

3.1 Univariate Analysis

3.1.1 ARMA-type Models

ARMA-type modelling usually follows the methodology made popular by Box and Jenkins. It consists of four steps: identification, model fitting, model checking and forecasting. The availability of functions for each of these four steps is given below.

(a)Model identification
The function g13auc may be used in obtaining either a range-mean or standard deviation-mean plot for a series of observations, which may be useful in detecting the need for a variance-stabilizing transformation. g13auc computes the range or standard deviation and the mean for successive groups of observations that may then be used to produce a scatter plot of range against mean or of standard deviation against mean.

The function g13aac may be used to difference a time series. The $N = n - d - s \times D$ values of the differenced time series, which extends over $t = 1 + d + s \times D, \dots, n$ , are stored in the first $N$ elements of the output array.

The function g13abc may be used for direct computation of the autocorrelations. It requires the time series as input, after optional differencing by g13aac.
An alternative is to use g13cac, which uses the fast Fourier transform (FFT) to carry out the convolution for computing the autocovariances. Circumstances in which this is recommended are
(i) if the main aim is to calculate the smoothed sample spectrum;
(ii) if the series length and maximum lag for the autocorrelations are both very large, in which case appreciable computing time may be saved.
For more precise recommendations, see Gentleman and Sande (1966). In this case the autocorrelations $r_{k}$ need to be obtained from the autocovariances $c_{k}$ using $r_{k} = c_{k} / c_{0}$ .

The function g13acc computes the partial autocorrelation function (PACF) and prediction error variance estimates from an input autocorrelation function (ACF). Note that g13dnc, which is designed for multivariate time series, may also be used to compute the PACF together with $χ^{2}$ statistics and their significance levels.

Finite lag predictor coefficients are also computed by the function g13acc. It may have to be used twice, firstly with a large value for the maximum lag $L$ in order to locate the optimum final prediction error (FPE) lag, then again with $L$ reset to this lag.

The function g13dxc may be used to check that the autoregressive (AR) part of the model is stationary and that the moving-average (MA) part is invertible.
(b)Model estimation
ARIMA models may be fitted using the function g13bec. This function can fit both simple ARIMA models and more complex multi-input models. There is a choice of using least squares or maximum likelihood (ML) estimation.

The function g13ddc is primarily designed for fitting vector ARMA models to multivariate time series but may also be used in a univariate mode. It allows the use of either the exact or conditional likelihood estimation criterion, and allows you to fit non-multiplicative seasonal models which are not available in g13bec.
(c)Model checking
The function g13asc calculates the correlations in the residuals from a model fitted by g13bec. In addition the standard errors and correlations of the residual autocorrelations are computed along with a portmanteau test for model adequacy. g13asc can be used after a univariate model has been fitted by g13bec, but care must be taken in selecting the correct inputs to g13asc. Note that if g13ddc has been used to fit a non-multiplicative seasonal model to a univariate series then g13dsc may be used to check the adequacy of the model.
(d)Forecasting using an ARIMA model
The function g13bjc can be used to compute forecasts using a specified ARIMA model using the observed values of the series. If some further observations $x_{n + 1}, x_{n + 2}, \dots$ have come to hand since model estimation (and there is no desire to re-estimate the model using the extended series), then g13bgc can be used to update the state set using the new observations, prior to forecasting from the end of the extended series. The original series is not required.
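The two basic identification computations described in step (a), differencing and the conversion of autocovariances to autocorrelations, can be illustrated with a short Python sketch. It does not call the library; the function names and the divisor $n$ used for the sample autocovariances are assumptions of the example.

```python
def difference(x, d=0, D=0, s=1):
    """Apply d non-seasonal and D seasonal (period s) differences to x.

    The result has N = n - d - s * D values.
    """
    w = list(x)
    for _ in range(d):
        w = [w[i] - w[i - 1] for i in range(1, len(w))]
    for _ in range(D):
        w = [w[i] - w[i - s] for i in range(s, len(w))]
    return w


def autocorrelations(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag via r_k = c_k / c_0."""
    n = len(x)
    mean = sum(x) / n
    # Sample autocovariances c_0..c_max_lag with divisor n (assumption).
    c = [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n
        for k in range(max_lag + 1)
    ]
    return [ck / c[0] for ck in c[1:]]
```

For a series of length 20 with `d=1`, `D=1` and `s=4`, the differenced series has $20 - 1 - 4 \times 1 = 15$ values.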

3.1.2 Exponential smoothing

A variety of different smoothing methods are provided by g13amc, including: single exponential, Brown's double exponential, linear Holt (also called double exponential smoothing in some references), additive Holt–Winters and multiplicative Holt–Winters. The choice of smoothing method used depends on the characteristics of the time series. If the mean of the series is only slowly changing then single exponential smoothing may be suitable. If there is a trend in the time series, which itself may be slowly changing, then double exponential smoothing may be suitable. If there is a seasonal component to the time series, e.g., daily or monthly data, then one of the two Holt–Winters methods may be suitable.
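To make the distinction between the first two cases concrete, the following Python sketch implements single exponential smoothing and linear Holt smoothing in their textbook forms. The function names, the parameter names `alpha` and `gamma`, and the treatment of initial values are assumptions of the example and may not match the parameterisation used by g13amc.

```python
def single_exponential(x, alpha, level0=None):
    """Smoothed level after each observation; suits a slowly changing mean."""
    level = x[0] if level0 is None else level0
    levels = []
    for xt in x:
        level = alpha * xt + (1.0 - alpha) * level
        levels.append(level)
    return levels


def holt_linear(x, alpha, gamma, level0, trend0):
    """Linear Holt smoothing; returns one-step-ahead forecasts and final state."""
    level, trend = level0, trend0
    forecasts = []
    for xt in x:
        forecasts.append(level + trend)        # forecast made before seeing xt
        new_level = alpha * xt + (1.0 - alpha) * (level + trend)
        trend = gamma * (new_level - level) + (1.0 - gamma) * trend
        level = new_level
    return forecasts, level, trend
```

On an exactly linear series with matching initial level and slope, the Holt forecasts reproduce the data, whereas single exponential smoothing lags behind the trend.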

3.1.3 Change point analysis

Four functions are available for change point analysis, two implementing the PELT algorithm (g13nac and g13nbc) and two binary segmentation (g13ndc and g13nec). Of these, g13nac and g13ndc have six pre-defined cost functions based on the log-likelihood of the Normal, Gamma, Exponential and Poisson distributions. In the case of the Normal distribution changes in the mean, standard deviation or both can be investigated. The remaining two functions, g13nbc and g13nec take a user-supplied cost function.

Binary segmentation only returns an approximate solution to the change point problem as defined in equation (5). It is, therefore, recommended that the PELT algorithm is used in most cases. However, for long time series the binary segmentation algorithm may give a marked improvement in terms of speed especially if the maximum depth for the iterative process (mdepth) is set to a low value.
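The idea behind binary segmentation can be sketched as follows. This Python example is an illustration under stated assumptions, not the algorithm used by g13ndc or g13nec: the cost of a segment is taken to be the sum of squared deviations from the segment mean (changes in mean only), a split is accepted when it reduces the total cost by more than a penalty `beta`, and `max_depth` plays a role analogous to a maximum depth for the iterative process. All names are hypothetical.

```python
def sse(x, a, b):
    """Cost of segment x[a:b]: sum of squared deviations from its mean."""
    seg = x[a:b]
    m = sum(seg) / len(seg)
    return sum((v - m) ** 2 for v in seg)


def binary_segmentation(x, beta, min_len=2, max_depth=10):
    """Return sorted change points k, where x[k] starts a new segment."""
    found = []

    def split(a, b, depth):
        # Stop if the depth limit is reached or the segment is too short.
        if depth > max_depth or b - a < 2 * min_len:
            return
        base = sse(x, a, b)
        best_gain, best_k = 0.0, None
        for k in range(a + min_len, b - min_len + 1):
            gain = base - sse(x, a, k) - sse(x, k, b)
            if gain > best_gain:
                best_gain, best_k = gain, k
        # Accept the best split only if it beats the penalty, then recurse.
        if best_k is not None and best_gain > beta:
            found.append(best_k)
            split(a, best_k, depth + 1)
            split(best_k, b, depth + 1)

    split(0, len(x), 1)
    return sorted(found)
```

Each split is chosen greedily without revisiting earlier decisions, which is why the result is only approximate, and limiting `max_depth` bounds the amount of work done on long series.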

3.2 Univariate Spectral Analysis

Two functions are available, g13cac carrying out smoothing using a lag window and g13cbc carrying out direct frequency domain smoothing. Both can take as input the original series, but g13cac alone can use the sample autocovariances as alternative input. This has some computational advantage if a variety of spectral estimates needs to be examined for the same series using different amounts of smoothing.

However, the real choice in most cases will be which of the four shapes of lag window in g13cac to use, or whether to use the trapezium frequency window of g13cbc. The references may be consulted for advice on this, but the two most recommended lag windows are the Tukey and Parzen. The Tukey window has a very small risk of supplying negative spectrum estimates; otherwise, for the same bandwidth, both give very similar results, though the Parzen window requires a higher truncation lag (more autocorrelation function (ACF) values).

The frequency window smoothing procedure of g13cbc with a trapezium shape parameter

p ≃ \frac{1}{2}

generally gives similar results for the same bandwidth as lag window methods with a slight advantage of somewhat less distortion around sharp peaks, but suffering a rather less smooth appearance in fine detail.
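For reference, the two most commonly recommended lag windows can be written down directly. The Python sketch below uses the standard textbook definitions of the Tukey (Tukey-Hanning) and Parzen windows and applies them to a set of autocovariances; the function names and the $1 / (2 π)$ scaling of the spectrum are assumptions of the example, and scaling conventions differ between references.

```python
import math


def tukey_window(u):
    """Tukey-Hanning lag window for u = k / M."""
    return 0.5 * (1.0 + math.cos(math.pi * u)) if u <= 1.0 else 0.0


def parzen_window(u):
    """Parzen lag window for u = k / M."""
    if u <= 0.5:
        return 1.0 - 6.0 * u * u + 6.0 * u ** 3
    if u <= 1.0:
        return 2.0 * (1.0 - u) ** 3
    return 0.0


def smoothed_spectrum(c, M, omega, window):
    """Lag-window spectrum estimate at angular frequency omega.

    c : autocovariances c_0, c_1, ...; M : truncation lag (window cut-off)
    """
    total = c[0]
    for k in range(1, min(M, len(c))):
        total += 2.0 * window(k / M) * c[k] * math.cos(omega * k)
    return total / (2.0 * math.pi)
```

Comparing the two windows at the same value of $k / M$ shows that the Parzen weights decay faster, which is consistent with it needing a higher truncation lag to achieve the same bandwidth.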

3.3 Linear Lagged Relationships Between Time Series

The availability of functions for each of four steps: identification, model fitting, model checking and forecasting, is given below.

(a)Model identification
Normally use g13bcc for direct computation of cross-correlations, from which cross-covariances may be obtained by multiplying by $s_{y} s_{x}$ , and impulse response estimates (after prewhitening) by multiplying by $s_{y} / s_{x}$ , where $s_{y}, s_{x}$ are the sample standard deviations of the series.

An alternative is to use g13ccc, which exploits the fast Fourier transform (FFT) to carry out the convolution for computing cross-covariances. The criteria for this are the same as given in Section 3.1.1 for calculation of autocorrelations. The impulse response function may also be computed by spectral methods without prewhitening using g13cgc.

g13bac may be used to prewhiten or filter a series by an ARIMA model.

g13bbc may be used to filter a time series using a transfer function model.
(b)Estimation of multi-input model parameters
The function g13bdc is used to obtain preliminary estimates of transfer function model parameters. The model orders and an estimate of the impulse response function (see Section 3.2) are required.

The simultaneous estimation of the transfer function model parameters for the inputs, and ARIMA model parameters for the output, is carried out by g13bec.

This function requires values of the output and input series, and the orders of all the models. Any differencing implied by the model is carried out internally.

The function also requires the maximum number of iterations to be specified, and returns the state set for use in forecasting.
(c) Multi-input model checking
The function g13asc, primarily designed for univariate time series, can be used to test the residuals from an input-output model.
(d)Forecasting using a multi-input model
The function g13bjc can be used to compute forecasts for a specified multi-input model using the observed values of the series. Forecasts for the input series have to be provided.
(e)Filtering a time series using a transfer function model
The function for this purpose is g13bbc.

3.4 Multivariate Time Series

The availability of functions for each of four steps: identification, model fitting, model checking and forecasting, is given below.

(a)Model identification
The function g13dlc may be used to difference the series. You must supply the differencing parameters for each component of the multivariate series. The order of differencing for each individual component does not have to be the same. The function may also be used to apply a log or square root transformation to the components of the series.

The function g13dmc may be used to calculate the sample cross-correlation or cross-covariance matrices. It requires a set of time series as input. You may request either the cross-covariances or cross-correlations.

The function g13dnc computes the partial lag correlation matrices from the sample cross-correlation matrices computed by g13dmc, and the function g13dpc computes the least squares estimates of the partial autoregression matrices and their standard errors. Both functions compute a series of $χ^{2}$ statistics that aid the determination of the order of a suitable autoregressive model. g13dbc may also be used in the identification of the order of an autoregressive model. The function computes multiple squared partial autocorrelations and predictive error variance ratios from the sample cross-correlations or cross-covariances computed by g13dmc.

The function g13dxc may be used to check that the autoregressive part of the model is stationary and that the moving-average part is invertible.
(b)Estimation of VARMA model parameters
The function for this purpose is g13ddc. This function requires a set of time series to be input, together with values for $p$ and $q$ . You must also specify the maximum number of likelihood evaluations to be permitted and which parameters (if any) are to be held at their initial (user-supplied) values. The fitting criterion is either exact maximum likelihood (ML) or conditional maximum likelihood.

g13ddc is primarily designed for estimating relationships between time series. It may, however, easily be used in univariate mode for non-seasonal and non-multiplicative seasonal ARIMA model estimation. The advantage is that it allows (optional) use of the exact maximum likelihood (ML) estimation criterion, which is not available in g13bec. The conditional likelihood option is recommended for those models in which the parameter estimates display a tendency to become stuck at points on the boundary of the parameter space. When one of the series is known to be influenced by all the others, but the others in turn are mutually independent and do not influence the output series, then g13bec (the transfer function (TF) model fitting function) may be more appropriate to use.
(c)VARMA model checking
g13dsc calculates the cross-correlation matrices of residuals for a model fitted by g13ddc. In addition the standard errors and correlations of the residual correlation matrices are computed along with a portmanteau test for model adequacy.
(d)Forecasting using a VARMA model
The function g13djc may be used to construct a chosen number of forecasts using the model estimated by g13ddc. The standard errors of the forecasts are also computed. A reference vector is set up by g13djc so that should any further observations become available the existing forecasts can be efficiently updated using g13dkc. On a call to g13dkc the reference vector itself is also updated so that g13dkc may be called again each time new observations are available.

3.5 Cross-spectral Analysis

Two functions are available for the main step in cross-spectral analysis. To compute the cospectrum and quadrature spectrum estimates using smoothing by a lag window, g13ccc should be used. It takes as input either the original series or cross-covariances which may be computed in a previous call of the same function or possibly using results from g13bcc. As in the univariate case, this gives some advantage if estimates for the same series are to be computed with different amounts of smoothing.

The window shape should be the same as that already used in univariate spectrum estimation for the series.

For direct frequency domain smoothing, g13cdc should be used; the degree of smoothing should be chosen using the same considerations as for univariate estimation.

The cross-amplitude and squared coherency spectrum estimates are calculated, together with upper and lower confidence bounds, using g13cec. For input the cross-spectral estimates from either g13ccc or g13cdc and corresponding univariate spectra from either g13cac or g13cbc are required.

The gain and phase spectrum estimates are calculated together with upper and lower confidence bounds using g13cfc. The required input is as for g13cec above.

The noise spectrum estimates and impulse response function estimates are calculated together with multiplying factors for confidence limits on the former, and the standard error for the latter, using g13cgc. The required input is again the same as for g13cec above.

3.6 Kalman Filtering

3.6.1 Linear state space models

There are four main functions available for Kalman filtering, covering both the covariance and information filters with either a time-varying or a time-invariant filter. For covariance filters the functions are g13eac for the time-varying filter and g13ebc for the time-invariant filter, while the equivalents for the information filter are g13ecc and g13edc respectively. In addition, for use with the time-invariant filters, the function g13ewc provides the required transformation to lower or upper observer Hessenberg form while g13exc provides the transformation to lower or upper controller Hessenberg form.

3.6.2 Nonlinear state space models

Two functions are available for analysing a nonlinear state space model: g13ejc and g13ekc. The difference between the two functions is how the nonlinear functions,

F

and

H

are supplied, with g13ejc using reverse communication and g13ekc using direct communication. See Section 7 in How to Use the NAG Library for a description of the terms reverse and direct communication.

As well as having the additional flexibility inherent in reverse communication functions g13ejc also offers an alternative method of generating the sigma points utilized by the Unscented Kalman Filter (UKF), potentially allowing for additional information to be propagated through the state space model. However, due to the increased complexity of the interface it is recommended that g13ekc is used unless this additional flexibility is definitely required.

3.7 GARCH Models

The main choice in selecting a type of GARCH model is whether the data is symmetric or asymmetric and if asymmetric what form of asymmetry should be included in the model.

A symmetric ARCH or GARCH model can be fitted by g13fac and the volatility forecast by g13fbc. For asymmetric data the choice is between the type of asymmetry as described in Section 2.7.

GARCH Type	Fit	Forecast
Type I	g13fac	g13fbc
Type II	g13fcc	g13fdc
GJR	g13fec	g13ffc

All functions allow the option of including regressor variables in the model.

3.8 Inhomogeneous Time Series

The following functions deal with inhomogeneous time series: g13mec, g13mfc and g13mgc.

Both g13mec and g13mfc calculate the

m

-iterated exponential moving average (EMA). In most cases g13mec can be used, which returns

EMA [τ, m; z]

for a given value of

m

, overwriting the input data. Sometimes it is advantageous to have access to the intermediate results, for example when calculating the differential operator, in which case g13mfc can be used, which can return

EMA [τ, i; z]

, for

i = 1, 2, \dots, m

. g13mfc can also be used if you do not wish to overwrite the input data.

The last function, g13mgc, should be used if you require the moving average (MA), moving norm (MNorm), moving variance (MVar) or moving standard deviation (MSD). Other operators can be calculated by calling a combination of these three functions and the use of simple mathematics (additions, subtractions, etc.).

3.9 Time Series Simulation

There are functions available in Chapter G05 for generating a realization of a time series from a specified model: g05phc for univariate time series and g05pjc for multivariate time series. There is also a suite of functions for simulating GARCH models: g05pdc, g05pec and g05pfc. The function g05pmc can be used to simulate data from an exponential smoothing model.

4 Functionality Index

ARMA modelling,

ACF

g13abc

diagnostic checking

g13asc

Dickey–Fuller unit root test

g13awc

differencing

g13aac

mean/range

g13auc