naginterfaces.library.tsa.uni_arima_estim

naginterfaces.library.tsa.uni_arima_estim(mr, par, c, x, iex, igh, ist, kpiv, zsp, kzsp, kfc=1, piv=None, nit=100, data=None, io_manager=None)

uni_arima_estim fits a seasonal autoregressive integrated moving average (ARIMA) model to an observed time series, using a nonlinear least squares procedure incorporating backforecasting. Parameter estimates are obtained, together with appropriate standard errors. The residual series is returned, together with the information that uni_arima_update() and uni_arima_forecast_state() need in order to forecast the time series.

The estimation procedure is iterative, starting with initial parameter values such as may be obtained using uni_arima_prelim(). It continues until a specified convergence criterion is satisfied, or until a specified number of iterations has been carried out. The progress of the procedure can be monitored by means of a user-supplied function.

For full information please refer to the NAG Library document for g13ae:

https://support.nag.com/numeric/nl/nagdoc_31/flhtml/g13/g13aef.html

Parameters

mr : int, array-like, shape $(7)$

The orders vector $(p, d, q, P, D, Q, s)$ of the ARIMA model whose parameters are to be estimated. $p$, $q$, $P$ and $Q$ refer respectively to the number of autoregressive ($ϕ$), moving average ($θ$), seasonal autoregressive ($Φ$) and seasonal moving average ($Θ$) parameters. $d$, $D$ and $s$ refer respectively to the order of non-seasonal differencing, the order of seasonal differencing and the seasonal period.

par : float, array-like, shape $(npar)$

The initial estimates of the $p$ values of the $ϕ$ parameters, the $q$ values of the $θ$ parameters, the $P$ values of the $Φ$ parameters and the $Q$ values of the $Θ$ parameters, in that order.

c : float

If $kfc = 0$, $c$ must contain the expected value, $c$, of the differenced series.

If $kfc = 1$, $c$ must contain an initial estimate of $c$.

x : float, array-like, shape $(nx)$

The $n$ values of the original undifferenced time series, so that $n = nx$.

iex : int

The dimension of the arrays $ex$, $exr$ and $al$.

igh : int

The dimension of the arrays $g$ and $sd$.

The second dimension of the arrays $h$ and $hc$.

ist : int

The dimension of the array $st$.

kpiv : int

Must be nonzero if the progress of the optimization is to be monitored using $piv$. Otherwise $kpiv$ must contain $0$.

zsp : float, array-like, shape $(4)$

When $kzsp = 1$, the four elements of $zsp$ must contain the four values used to guide the search procedure. These are as follows.

$zsp[0]$ contains $α$, the value used to constrain the magnitude of the search procedure steps.

$zsp[1]$ contains $β$, the multiplier which regulates the value $α$.

$zsp[2]$ contains $δ$, the value of the stationarity and invertibility test tolerance factor.

$zsp[3]$ contains $γ$, the value of the convergence criterion.

If $kzsp \neq 1$ on entry, default values for $zsp$ are supplied by the function.

These are $0.001$, $10.0$, $1000.0$ and $\max(100 \times \text{machine precision}, 0.0000001)$ respectively.
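As a rough illustration of the defaults and of the constraints listed under errno 3 further down, the following sketch builds the default vector and checks a user-supplied one. The helper names `default_zsp` and `check_zsp` are hypothetical, and `sys.float_info.epsilon` is only an assumed stand-in for the library's machine precision (either way, 100 times it is far below 0.0000001 in double precision).

```python
import sys


def default_zsp():
    """Documented defaults [alpha, beta, delta, gamma] used when kzsp != 1."""
    # Assumption: Python's double-precision epsilon stands in for NAG's
    # machine precision; the max() then evaluates to 0.0000001.
    eps = sys.float_info.epsilon
    return [0.001, 10.0, 1000.0, max(100.0 * eps, 0.0000001)]


def check_zsp(zsp):
    """Return a list of violated constraints (empty if zsp is acceptable)."""
    alpha, beta, delta, gamma = zsp
    problems = []
    if not alpha > 0.0:
        problems.append("zsp[0] must be > 0.0")
    if not beta > 1.0:
        problems.append("zsp[1] must be > 1.0")
    if not delta >= 1.0:
        problems.append("zsp[2] must be >= 1.0")
    if not 0.0 <= gamma < 1.0:
        problems.append("zsp[3] must satisfy 0.0 <= zsp[3] < 1.0")
    return problems
```

Running such a check before the call is optional; the function itself reports the same violations through errno 3.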

kzsp : int

The value $1$ if the function is to use the input values of $zsp$, and any other value if the default values of $zsp$ are to be used.

kfc : int, optional

Must be set to $1$ if the constant, $c$, is to be estimated and $0$ if it is to be held fixed at its initial value.

piv : None or callable piv(mr, par, c, kfc, icount, s, g, h, itc, zsp, data=None), optional

Note: if this argument is None then a NAG-supplied facility will be used.

$piv$ is used to monitor the progress of the optimization.

Parameters

mr : int, ndarray, shape $(7)$: $mr$, defined as for uni_arima_estim.
par : float, ndarray, shape $(npar)$: $par$, defined as for uni_arima_estim.
c : float: $c$, defined as for uni_arima_estim.
kfc : int: $kfc$, defined as for uni_arima_estim.
icount : int, ndarray, shape $(6)$: $icount$, defined as for uni_arima_estim.
s : float: $s$, defined as for uni_arima_estim.
g : float, ndarray, shape $(igh)$: $g$, defined as for uni_arima_estim.
h : float, ndarray, shape $(:, igh)$: $h$, defined as for uni_arima_estim.
itc : int: $itc$, defined as for uni_arima_estim.
zsp : float, ndarray, shape $(4)$: $zsp$, defined as for uni_arima_estim.
data : arbitrary, optional, modifiable in place: User-communication data for callback functions.
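A monitoring callback with this signature might look like the sketch below. It simply records the iteration count and residual sum of squares in a list passed through `data`. The choice of a list for `data` and the name `history` are assumptions made for illustration; the callback is assumed to return nothing.

```python
def piv(mr, par, c, kfc, icount, s, g, h, itc, zsp, data=None):
    """Record (iteration, residual sum of squares) on each monitoring call."""
    # Assumption: the caller passed a plain list as `data`; because data is
    # modifiable in place, appending here is visible after the fit returns.
    if data is not None:
        data.append((int(itc), float(s)))


# Standalone demonstration with dummy arguments (no NAG call involved).
history = []
piv([1, 0, 0, 0, 0, 0, 0], [0.5], 0.0, 1, [0] * 6, 12.5, [0.0], [[0.0]], 3,
    [0.001, 10.0, 1000.0, 1e-7], data=history)
```

To have the estimator call it, `kpiv` must be nonzero and the same list would be supplied as the `data` argument of uni_arima_estim.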

nit : int, optional

The maximum number of iterations to be performed.

data : arbitrary, optional

User-communication data for callback functions.

io_manager : FileObjManager, optional

Manager for I/O in this routine.
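The arguments above can be assembled along the following lines. This is a minimal sketch, not part of the library: `build_estim_args` is a hypothetical helper, the sizes `iex`, `igh` and `ist` are set to the smallest values consistent with the `icount` definitions and the `ist` constraint documented on this page, and the actual call is left as a comment because it requires an installed and licensed naginterfaces.

```python
def build_estim_args(x, mr, par, c=0.0, kfc=1, nit=100):
    """Assemble keyword arguments for uni_arima_estim (hypothetical helper)."""
    p, d, q, P, D, Q, s = mr
    if len(par) != p + q + P + Q:
        raise ValueError("par must hold p + q + P + Q initial estimates")
    n = len(x)
    nback = q + Q * s  # number of backforecasts
    return dict(
        mr=list(mr),
        par=list(par),
        c=c,
        x=list(x),
        iex=nback + n,                              # values held in ex, exr, al
        igh=nback + len(par) + kfc,                 # parameters being estimated
        ist=P * s + d + D * s + q + max(p, Q * s),  # size of the state set
        kpiv=0,                                     # no monitoring
        zsp=[0.001, 10.0, 1000.0, 0.0000001],       # not used since kzsp != 1
        kzsp=0,                                     # request default zsp
        kfc=kfc,
        nit=nit,
    )


# Sketch of the call itself, assuming naginterfaces is available:
# from naginterfaces.library import tsa
# result = tsa.uni_arima_estim(**build_estim_args(x, mr, par))
args = build_estim_args(list(range(30)), [1, 1, 1, 0, 1, 1, 4], [0.1, 0.2, 0.3])
```

For the orders $(1, 1, 1, 0, 1, 1, 4)$ and 30 observations this gives `iex = 35`, `igh = 9` and `ist = 10`.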

Returns

par : float, ndarray, shape $(npar)$

The latest values of the estimates of these parameters.

c : float

If $kfc = 0$, $c$ is unchanged.

If $kfc = 1$, $c$ contains the latest estimate of $c$.

Therefore, if $c$ and $kfc$ are both zero on entry, there is no constant correction.

icount : int, ndarray, shape $(6)$

Size of various output arrays.

$icount[0]$

Contains $q + (Q \times s)$, the number of backforecasts.

$icount[1]$

Contains $n - d - (D \times s)$, the number of differenced values.

$icount[2]$

Contains $d + (D \times s)$, the number of values of reconstitution information.

$icount[3]$

Contains $n + q + (Q \times s)$, the number of values held in each of the series $ex$, $exr$ and $al$.

$icount[4]$

Contains $n - d - (D \times s) - p - q - P - Q - kfc$, the number of degrees of freedom associated with $S$.

$icount[5]$

Contains $icount[0] + npar + kfc$, the number of parameters being estimated.

These values are always computed regardless of the exit value of $errno$.
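The six counts follow directly from the orders vector, the series length and $kfc$. The sketch below reproduces the formulas above; `expected_icount` is a hypothetical helper name, not a library function.

```python
def expected_icount(mr, n, kfc):
    """Compute the six documented icount values from mr, n and kfc."""
    p, d, q, P, D, Q, s = mr
    npar = p + q + P + Q
    nback = q + Q * s          # icount[0]: backforecasts
    ndiff = n - d - D * s      # icount[1]: differenced values
    nrecon = d + D * s         # icount[2]: reconstitution values
    return [
        nback,
        ndiff,
        nrecon,
        n + nback,             # icount[3]: values in ex, exr and al
        ndiff - npar - kfc,    # icount[4]: degrees of freedom of S
        nback + npar + kfc,    # icount[5]: parameters being estimated
    ]


counts = expected_icount([1, 1, 1, 0, 1, 1, 4], n=30, kfc=1)
```

Note that the first three counts always sum to the fourth, which is why $ex$ holds exactly $icount[3]$ values.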

ex : float, ndarray, shape $(iex)$

The extended differenced series which is made up of:

$icount[0]$ backforecast values of the differenced series.

$icount[1]$ actual values of the differenced series.

$icount[2]$ values of reconstitution information.

The total number of these values held in $ex$ is $icount[3]$.

If the function exits because of a faulty input parameter, the contents of $ex$ will be indeterminate.

exr : float, ndarray, shape $(iex)$

The values of the model residuals, made up of:

$icount[0]$ residuals corresponding to the backforecasts in the differenced series.

$icount[1]$ residuals corresponding to the actual values in the differenced series.

The remaining $icount[2]$ values contain zeros.

If the function exits with $errno$ holding a value other than $0$ or $9$, the contents of $exr$ will be indeterminate.

al : float, ndarray, shape $(iex)$

The intermediate series which is made up of:

$icount[0]$ intermediate series values corresponding to the backforecasts in the differenced series.

$icount[1]$ intermediate series values corresponding to the actual values in the differenced series.

The remaining $icount[2]$ values contain zeros.

If an exception is raised, the contents of $al$ will be indeterminate.

s : float

The residual sum of squares after the latest series of parameter estimates has been incorporated into the model. If the function exits with a faulty input parameter, $s$ contains zero.

g : float, ndarray, shape $(igh)$

The latest value of the derivatives of $S$ with respect to each of the parameters being estimated (backforecasts, $par$ parameters, and where relevant the constant, in that order). The contents of $g$ will be indeterminate if the function exits with a faulty input parameter.

sd : float, ndarray, shape $(igh)$

The standard deviations corresponding to each of the parameters being estimated (backforecasts, $par$ parameters, and where relevant the constant, in that order).

If the function exits with $errno$ containing a value other than $0$ or $9$, or if the required number of iterations is zero, the contents of $sd$ will be indeterminate.

h : float, ndarray, shape $(1 + q + (Q \times s) + npar + kfc, igh)$

The second derivative of $S$ and correlation coefficients:

(a) the latest values of an approximation to the second derivative of $S$ with respect to each of the $(q + Q \times s + npar + kfc)$ parameters being estimated (backforecasts, $par$ parameters, and where relevant the constant, in that order), and
(b) the correlation coefficients relating to each pair of these parameters.

These are held in a matrix defined by the first $(q + Q \times s + npar + kfc)$ rows and the first $(q + Q \times s + npar + kfc)$ columns of $h$. (Note that $icount[5]$ contains the value of this expression.) The values of (a) are contained in the upper triangle, and the values of (b) in the strictly lower triangle.

These correlation coefficients are zero during intermediate printout using $piv$, and indeterminate if $errno$ contains on exit a value other than $0$ or $9$.

All the contents of $h$ are indeterminate if the required number of iterations is zero.

The $(q + (Q \times s) + npar + kfc + 1)$th row of $h$ is used internally as workspace.
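Because the two quantities share one array, it can be convenient to unpack them into two symmetric matrices. The following NumPy sketch does this for the leading block of $h$; `split_h` is a hypothetical helper, and the small array used to exercise it is made-up illustrative data, not library output.

```python
import numpy as np


def split_h(h, nparams):
    """Split the leading nparams x nparams block of h into two symmetric
    matrices: second-derivative approximation and correlation matrix."""
    block = np.asarray(h, dtype=float)[:nparams, :nparams]
    # Upper triangle (including the diagonal) holds the second derivatives.
    second_deriv = np.triu(block) + np.triu(block, k=1).T
    # Strictly lower triangle holds the correlation coefficients.
    lower = np.tril(block, k=-1)
    corr = lower + lower.T
    np.fill_diagonal(corr, 1.0)  # a parameter correlates perfectly with itself
    return second_deriv, corr


# Illustrative 2-parameter example; the last row mimics the workspace row.
h_demo = np.array([[4.0, 2.0, 0.0],
                   [0.5, 9.0, 0.0],
                   [7.0, 7.0, 7.0]])
second_deriv, corr = split_h(h_demo, nparams=2)
```

In practice `nparams` would be taken from $icount[5]$.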

st : float, ndarray, shape $(ist)$

The $nst$ values of the state set array. If the function exits with $errno$ containing a value other than $0$ or $9$, the contents of $st$ will be indeterminate.

nst : int

The number of values in the state set array $st$.

itc : int

The number of iterations performed.

zsp : float, ndarray, shape $(4)$

$zsp$ contains the values, default or otherwise, used by the function.

isf : int, ndarray, shape $(4)$

Contains success/failure indicators, one for each of the four types of parameter in the model (autoregressive, moving average, seasonal autoregressive, seasonal moving average), in that order.

Each indicator has the interpretation:

$-2$	On entry parameters of this type have initial estimates which do not satisfy the stationarity or invertibility test conditions.
$-1$	The search procedure has failed to converge because the latest set of parameter estimates of this type is invalid.
$0$	No parameter of this type is in the model.
$1$	Valid final estimates for parameters of this type have been obtained.
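When reporting results it may help to translate these indicators into text. The sketch below is one possible way to do so; the names `ISF_MEANING`, `PARAMETER_TYPES` and `describe_isf` are hypothetical, and the messages are shortened paraphrases of the table above.

```python
ISF_MEANING = {
    -2: "initial estimates fail the stationarity or invertibility test",
    -1: "search failed to converge: latest estimates of this type are invalid",
    0: "no parameter of this type is in the model",
    1: "valid final estimates obtained",
}

PARAMETER_TYPES = (
    "autoregressive",
    "moving average",
    "seasonal autoregressive",
    "seasonal moving average",
)


def describe_isf(isf):
    """Map each of the four isf indicators to a readable message."""
    return {name: ISF_MEANING[int(flag)]
            for name, flag in zip(PARAMETER_TYPES, isf)}


report = describe_isf([1, -2, 0, 1])
```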

Raises

NagValueError

(errno $1$ )

On entry, $kfc = ⟨value⟩$.

Constraint: $kfc = 0$ or $1$.

(errno $1$ )

The orders vector $mr$ is invalid.

(errno $1$ )

On entry, $npar = ⟨value⟩$.

Constraint: $npar = p + q + P + Q$.

(errno $2$ )

The model is over-parameterised.

(errno $3$ )

On entry, $nit = ⟨value⟩$.

Constraint: $nit \geq 0$.

(errno $3$ )

On entry, $zsp[3] = ⟨value⟩$.

Constraint: $0.0 \leq zsp[3] < 1.0$.

(errno $3$ )

On entry, $zsp[2] = ⟨value⟩$.

Constraint: $zsp[2] \geq 1.0$.

(errno $3$ )

On entry, $zsp[1] = ⟨value⟩$.

Constraint: $zsp[1] > 1.0$.

(errno $3$ )

On entry, $zsp[0] = ⟨value⟩$.

Constraint: $zsp[0] > 0.0$.

(errno $4$ )

On entry, $ist = ⟨value⟩$ and the minimum size $required = ⟨value⟩$.

Constraint: $ist \geq (P \times s) + d + (D \times s) + q + \max(p, Q \times s)$.

Warns

NagAlgorithmicWarning

(errno $6$ )

On entry, $ldh = ⟨value⟩$.

Constraint: $ldh \geq q + (Q \times s) + npar + kfc$.

(errno $6$ )

On entry, $igh = ⟨value⟩$.

Constraint: $igh \geq q + (Q \times s) + npar + kfc$.

(errno $6$ )

On entry, $iex = ⟨value⟩$.

Constraint: $iex \geq q + (Q \times s) + nx$.

(errno $7$ )

A failure in the search procedure has occurred.

(errno $8$ )

Failure to invert $H$ .

(errno $9$ )

Unable to calculate the latest estimates of the backforecasts.

(errno $10$ )

Satisfactory parameter estimates could not be obtained for all parameter types in the model.

Notes

No equivalent traditional C interface for this routine exists in the NAG Library.

The time series $x_{1}, x_{2}, \dots, x_{n}$ supplied to uni_arima_estim is assumed to follow a seasonal autoregressive integrated moving average (ARIMA) model defined as follows:

\nabla^{d} \nabla_{s}^{D} x_{t} - c = w_{t},

where $\nabla^{d} \nabla_{s}^{D} x_{t}$ is the result of applying non-seasonal differencing of order $d$ and seasonal differencing of seasonality $s$ and order $D$ to the series $x_{t}$ , as outlined in the description of uni_diff(). The differenced series is then of length $N = n - d^{'}$ , where $d^{'} = d + (D \times s)$ is the generalized order of differencing. The scalar $c$ is the expected value of the differenced series, and the series $w_{1}, w_{2}, \dots, w_{N}$ follows a zero-mean stationary autoregressive moving average (ARMA) model defined by a pair of recurrence equations. These express $w_{t}$ in terms of an uncorrelated series $a_{t}$ , via an intermediate series $e_{t}$ . The first equation describes the seasonal structure:

w_{t} = Φ_{1} w_{t - s} + Φ_{2} w_{t - 2 \times s} + \dots + Φ_{P} w_{t - P \times s} + e_{t} - Θ_{1} e_{t - s} - Θ_{2} e_{t - 2 \times s} - \dots - Θ_{Q} e_{t - Q \times s} .

The second equation describes the non-seasonal structure. If the model is purely non-seasonal the first equation is redundant and $e_{t}$ above is equated with $w_{t}$ :

e_{t} = ϕ_{1} e_{t - 1} + ϕ_{2} e_{t - 2} + \dots + ϕ_{p} e_{t - p} + a_{t} - θ_{1} a_{t - 1} - θ_{2} a_{t - 2} - \dots - θ_{q} a_{t - q} .
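The differencing step $\nabla^{d} \nabla_{s}^{D} x_{t}$ that precedes these two recurrences can be sketched in a few lines of NumPy. This is an illustration only: `difference` is a hypothetical helper rather than the library's uni_diff(), and it does not subtract the constant $c$.

```python
import numpy as np


def difference(x, d, D, s):
    """Apply d non-seasonal and D seasonal (period s) differences to x."""
    w = np.asarray(x, dtype=float)
    for _ in range(d):
        w = w[1:] - w[:-1]   # x_t - x_{t-1}
    for _ in range(D):
        w = w[s:] - w[:-s]   # x_t - x_{t-s}
    return w


# A quadratic trend is removed by two non-seasonal differences.
quad = difference([t * t for t in range(10)], d=2, D=0, s=0)
# A pure period-4 pattern is removed by one seasonal difference.
seas = difference([1, 2, 3, 4] * 3, d=0, D=1, s=4)
```

Each result has length $N = n - d - (D \times s)$, here 8 in both cases.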

Estimates of the model parameters defined by

ϕ_{1}, ϕ_{2}, \dots, ϕ_{p}, θ_{1}, θ_{2}, \dots, θ_{q}, Φ_{1}, Φ_{2}, \dots, Φ_{P}, Θ_{1}, Θ_{2}, \dots, Θ_{Q}

and (optionally) $c$ are obtained by minimizing a quadratic form in the vector $w = (w_{1}, w_{2}, \dots, w_{N})^{'}$.

This is $QF = w^{'} V^{-1} w$, where $V$ is the covariance matrix of $w$, and is a function of the model parameters. This matrix is not explicitly evaluated, since $QF$ may be expressed as a ‘sum of squares’ function. When moving average parameters $θ_{i}$ or $Θ_{i}$ are present, so that the generalized moving average order $q^{'} = q + s \times Q$ is positive, backforecasts $w_{1 - q^{'}}, w_{2 - q^{'}}, \dots, w_{0}$ are introduced as nuisance parameters. The ‘sum of squares’ function may then be written as

S(pm) = \sum_{t = 1 - q^{'}}^{N} a_{t}^{2} - \sum_{t = 1 - q^{'} - p^{'}}^{- q^{'}} b_{t}^{2},

where $pm$ is a combined vector of parameters, consisting of the backforecasts followed by the ARMA model parameters.

The terms $a_{t}$ correspond to the ARMA model residual series $a_{t}$ , and $p^{'} = p + s \times P$ is the generalized autoregressive order. The terms $b_{t}$ are only present if autoregressive parameters are in the model, and serve to correct for transient errors introduced at the start of the autoregression.

The equations defining $a_{t}$ and $b_{t}$ are precisely:

$e_{t} = w_{t} - Φ_{1} w_{t - s} - Φ_{2} w_{t - 2 \times s} - \dots - Φ_{P} w_{t - P \times s} + Θ_{1} e_{t - s} + Θ_{2} e_{t - 2 \times s} + \dots + Θ_{Q} e_{t - Q \times s}$, for $t = 1 - q^{'}, 2 - q^{'}, \dots, N$.

$a_{t} = e_{t} - ϕ_{1} e_{t - 1} - ϕ_{2} e_{t - 2} - \dots - ϕ_{p} e_{t - p} + θ_{1} a_{t - 1} + θ_{2} a_{t - 2} + \dots + θ_{q} a_{t - q}$, for $t = 1 - q^{'}, 2 - q^{'}, \dots, N$.

$f_{t} = w_{t} - Φ_{1} w_{t + s} - Φ_{2} w_{t + 2 \times s} - \dots - Φ_{P} w_{t + P \times s} + Θ_{1} f_{t - s} + Θ_{2} f_{t - 2 \times s} + \dots + Θ_{Q} f_{t - Q \times s}$, for $t = (1 - q^{'} - s \times P), (2 - q^{'} - s \times P), \dots, (- q^{'} + p)$.

$b_{t} = f_{t} - ϕ_{1} f_{t + 1} - ϕ_{2} f_{t + 2} - \dots - ϕ_{p} f_{t + p} + θ_{1} b_{t - 1} + θ_{2} b_{t - 2} + \dots + θ_{q} b_{t - q}$, for $t = (1 - q^{'} - p^{'}), (2 - q^{'} - p^{'}), \dots, (- q^{'})$.

For all four of these equations, the following conditions hold:

$w_{i} = 0$ if $i < 1 - q^{'}$

$e_{i} = 0$ if $i < 1 - q^{'}$

$a_{i} = 0$ if $i < 1 - q^{'}$

$f_{i} = 0$ if $i < 1 - q^{'} - s \times P$

$b_{i} = 0$ if $i < 1 - q^{'} - p^{'}$
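The first two recursions, for $e_{t}$ and $a_{t}$, can be sketched as follows. This is a simplified illustration under stated assumptions: the input `w_ext` is taken to be the mean-corrected differenced series already extended by its $q^{'}$ backforecasts (so index 0 corresponds to $t = 1 - q^{'}$), values before the start are treated as zero as in the conditions above, and the correction terms $f_{t}$ and $b_{t}$ are not computed. `arma_residuals` is a hypothetical name.

```python
import numpy as np


def arma_residuals(w_ext, phi, theta, Phi, Theta, s):
    """Compute the intermediate series e and residual series a from w_ext."""
    w = np.asarray(w_ext, dtype=float)
    m = len(w)
    e = np.zeros(m)
    a = np.zeros(m)

    def lag(series, k, j):
        # Value j steps before position k, or zero before the series start.
        return series[k - j] if k - j >= 0 else 0.0

    # Seasonal equation: e_t from w_t and earlier e values at seasonal lags.
    for k in range(m):
        e[k] = (w[k]
                - sum(Phi[i] * lag(w, k, (i + 1) * s) for i in range(len(Phi)))
                + sum(Theta[i] * lag(e, k, (i + 1) * s) for i in range(len(Theta))))
    # Non-seasonal equation: a_t from e_t and earlier a values.
    for k in range(m):
        a[k] = (e[k]
                - sum(phi[i] * lag(e, k, i + 1) for i in range(len(phi)))
                + sum(theta[i] * lag(a, k, i + 1) for i in range(len(theta))))
    return e, a


# AR(1) with phi = 0.5: a geometric decay leaves a single initial shock.
e_ar, a_ar = arma_residuals([1.0, 0.5, 0.25], [0.5], [], [], [], s=0)
# MA(1) with theta = 0.5: a unit impulse gives a_t = 0.5 ** t.
e_ma, a_ma = arma_residuals([1.0, 0.0, 0.0], [], [0.5], [], [], s=0)
```

Summing the squares of `a` gives the first term of $S$; the library additionally subtracts the $b_{t}$ correction when autoregressive parameters are present.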

Minimization of $S$ with respect to $pm$ uses an extension of the algorithm of Marquardt (1963).

The first derivatives of $S$ with respect to the parameters are calculated as

2 \times \sum a_{t} \times a_{t, i} - 2 \sum b_{t} \times b_{t, i} = 2 \times G_{i},

where $a_{t, i}$ and $b_{t, i}$ are derivatives of $a_{t}$ and $b_{t}$ with respect to the $i$ th parameter.

The second derivative of $S$ is approximated by

2 \times \sum a_{t, i} \times a_{t, j} - 2 \times \sum b_{t, i} \times b_{t, j} = 2 \times H_{i j} .

Successive parameter iterates are obtained by calculating a vector of corrections $dpm$ by solving the equations

(H + α \times D) \times dpm = - G,

where $G$ is a vector with elements $G_{i}$ , $H$ is a matrix with elements $H_{i j}$ , $α$ is a scalar used to control the search and $D$ is the diagonal matrix of $H$ .

The new parameter values are then $pm + dpm$.

The scalar $α$ controls the step size, to which it is inversely related.

If a step results in new parameter values which give a reduced value of $S$ , then $α$ is reduced by a factor $β$ . If a step results in new parameter values which give an increased value of $S$ , or in ARMA model parameters which in any way contravene the stationarity and invertibility conditions, then the new parameters are rejected, $α$ is increased by the factor $β$ , and the revised equations are solved for a new parameter correction.

This action is repeated until either a reduced value of $S$ is obtained, or $α$ reaches the limit of $10^{9}$ , which is used to indicate a failure of the search procedure.

This failure may be due to a badly conditioned sum of squares function or to too strict a convergence criterion. Convergence is deemed to have occurred if the fractional reduction in the residual sum of squares in successive iterations is less than a value $γ$ , while $α < 1.0$ .
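The search logic described above can be illustrated on a generic least squares problem. The sketch below is not the library's implementation: it omits the $b_{t}$ correction terms and the stationarity and invertibility checks, treats a step that leaves $S$ unchanged as acceptable, and uses hypothetical names (`marquardt_search`, `resid`, `jac`). The defaults for `alpha`, `beta` and `gamma` mirror the documented $zsp$ defaults.

```python
import numpy as np


def marquardt_search(resid, jac, pm0, alpha=0.001, beta=10.0,
                     gamma=1e-7, nit=100):
    """Damped least squares search: solve (H + alpha*D) dpm = -G per step."""
    pm = np.asarray(pm0, dtype=float)
    a = resid(pm)
    S = float(a @ a)
    for itc in range(1, nit + 1):
        J = jac(pm)                  # J[t, i] = derivative of a_t w.r.t. pm_i
        G = J.T @ a                  # G_i = sum_t a_t * a_{t,i}
        H = J.T @ J                  # H_ij = sum_t a_{t,i} * a_{t,j}
        D = np.diag(np.diag(H))      # diagonal matrix of H
        while True:
            dpm = np.linalg.solve(H + alpha * D, -G)
            a_new = resid(pm + dpm)
            S_new = float(a_new @ a_new)
            if S_new <= S:           # step not rejected (assumption: ties pass)
                break
            alpha *= beta            # rejected: shorten the step and retry
            if alpha > 1e9:
                raise RuntimeError("search failed: alpha reached 1e9")
        reduction = (S - S_new) / S if S > 0.0 else 0.0
        pm, a, S = pm + dpm, a_new, S_new
        alpha /= beta                # accepted: allow a longer step next time
        if reduction < gamma and alpha < 1.0:
            return pm, S, itc        # converged
    return pm, S, nit                # iteration limit reached


# Illustrative straight-line fit with made-up data.
t = np.arange(5.0)
y = np.array([2.1, 4.9, 8.2, 10.9, 14.1])
pm_hat, S_hat, itc = marquardt_search(
    lambda pm: pm[0] + pm[1] * t - y,
    lambda pm: np.column_stack([np.ones_like(t), t]),
    [0.0, 0.0],
)
```

For this linear example the iterates approach the ordinary least squares solution within a few steps; a nonlinear residual function would simply be supplied through `resid` and `jac`.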

The stationarity and invertibility conditions are tested to within a specified tolerance multiple $δ$ of machine accuracy. Upon convergence, or completion of the specified maximum number of iterations without convergence, statistical properties of the estimates are derived. In the latter case the sequence of iterates should be checked to ensure that convergence is adequate for practical purposes, otherwise these properties are not reliable.

The estimated residual variance is

erv = S_{\min} / df,

where $S_{\min}$ is the final value of $S$, and the residual number of degrees of freedom is given by

df = N - p - q - P - Q \quad (- 1 \text{ if } c \text{ is estimated}).

The covariance matrix of the vector of estimates $pm$ is given by

erv \times H^{- 1},

where $H$ is evaluated at the final parameter values.

From this expression are derived the vector of standard deviations, and the correlation matrix for the whole parameter set. These are asymptotic approximations.
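A small NumPy sketch of these three steps (residual variance, covariance matrix, then standard deviations and correlations) is given below. `estimate_statistics` is a hypothetical helper and the inputs are illustrative numbers; it takes $H$ as a full symmetric matrix, not the packed layout of the returned array $h$.

```python
import numpy as np


def estimate_statistics(S_min, df, H):
    """Return (erv, standard deviations, correlation matrix) from S_min and H."""
    H = np.asarray(H, dtype=float)
    erv = S_min / df                    # estimated residual variance
    cov = erv * np.linalg.inv(H)        # asymptotic covariance of estimates
    sd = np.sqrt(np.diag(cov))          # standard deviations
    corr = cov / np.outer(sd, sd)       # correlation matrix
    return erv, sd, corr


erv, sd, corr = estimate_statistics(8.0, 4, [[2.0, 0.0], [0.0, 8.0]])
```

With these inputs the residual variance is 2, the standard deviations are 1 and 0.5, and the parameters are uncorrelated.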

The differenced series $w_{t}$ (now uncorrected for the constant), intermediate series $e_{t}$ and residual series $a_{t}$ are all available upon completion of the iterations over the range (extended by backforecasts)

t = 1 - q^{'}, 2 - q^{'}, \dots, N .

The values $a_{t}$ can only properly be interpreted as residuals for $t \geq 1 + p^{'} - q^{'}$ , as the earlier values are corrupted by transients if $p^{'} > 0$ .

In consequence of the manner in which differencing is implemented, the residual $a_{t}$ is the one step ahead forecast error for $x_{t + d^{'}}$ .

For convenient application in forecasting, the following quantities constitute the ‘state set’, which contains the minimum amount of time series information needed to construct forecasts:

the differenced series $w_{t}$ , for $(N - s \times P) < t \leq N$ ,
the $d^{'}$ values required to reconstitute the original series $x_{t}$ from the differenced series $w_{t}$ ,
the intermediate series $e_{t}$ , for $(N - \max(p, Q \times s)) < t \leq N$ ,
the residual series $a_{t}$ , for $(N - q) < t \leq N$ .

This state set is available upon completion of the iterations. The function may be used purely for the construction of this state set, given a previously estimated model and time series $x_{t}$ , by requesting zero iterations. Backforecasts are estimated, but the model parameter values are unchanged. If later observations become available and it is desired to update the state set, uni_arima_update() can be used.
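Counting the values in each of the four components listed above gives the size of the state set, which agrees with the lower bound on $ist$ stated under errno 4. The sketch below performs that count; `state_set_sizes` is a hypothetical helper and makes no claim about the order in which the library stores the components in $st$.

```python
def state_set_sizes(mr):
    """Count the values contributed by each state set component."""
    p, d, q, P, D, Q, s = mr
    sizes = {
        "differenced series w": s * P,           # N - s*P < t <= N
        "reconstitution values": d + D * s,      # d' values
        "intermediate series e": max(p, Q * s),  # N - max(p, Q*s) < t <= N
        "residual series a": q,                  # N - q < t <= N
    }
    return sizes, sum(sizes.values())


sizes, total = state_set_sizes([1, 1, 1, 0, 1, 1, 4])
```

For the orders $(1, 1, 1, 0, 1, 1, 4)$ the total is 10.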

References

Box, G E P and Jenkins, G M, 1976, Time Series Analysis: Forecasting and Control, (Revised Edition), Holden–Day

Marquardt, D W, 1963, An algorithm for least squares estimation of nonlinear parameters, J. Soc. Indust. Appl. Math. (11), 431
