naginterfaces.library.correg.robustm_corr_huber

naginterfaces.library.correg.robustm_corr_huber(x, eps, maxit=150, nitmon=0, tol=5e-05, io_manager=None)

robustm_corr_huber computes a robust estimate of the covariance matrix for an expected fraction of gross errors.

For full information, please refer to the NAG Library document for g02hk:

https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g02/g02hkf.html

Parameters

x : float, array-like, shape $(n, m)$

$x[i - 1, j - 1]$ must contain the $i$th observation for the $j$th variable, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$.

eps : float

$ϵ$, the expected fraction of gross errors in the sample.

maxit : int, optional

The maximum number of iterations that will be used during the calculation of the covariance matrix.

nitmon : int, optional

Indicates the amount of information on the iteration that is printed.

$\mathrm{nitmon} > 0$

The values of $A$, $θ$ and $δ$ (see Accuracy) will be printed at the first and every $\mathrm{nitmon}$ iterations.

$\mathrm{nitmon} \leq 0$

No iteration monitoring is printed.

When printing occurs the output is directed to the file object associated with the advisory I/O unit (see FileObjManager).

tol : float, optional

The relative precision for the final estimates of the covariance matrix.

io_manager : FileObjManager, optional

Manager for I/O in this routine.
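The parameters above can be put together in a minimal call sketch. It assumes a working, licensed installation of naginterfaces. The helper names `make_sample` and `robust_covariance`, the data values, and the choice `eps=0.1` are invented for illustration; only the function name, the argument order, and the three returned values are taken from the signature documented here.

```python
def make_sample():
    """Return illustrative data: 10 observations (rows) on 3 variables (columns).

    The numbers are made up for this sketch; one row is deliberately extreme.
    """
    return [
        [2.1, 5.0, 11.3],
        [3.4, 4.2, 12.9],
        [2.8, 4.9, 10.7],
        [3.9, 3.8, 13.5],
        [2.5, 5.3, 11.8],
        [3.1, 4.5, 12.2],
        [2.9, 4.7, 11.1],
        [3.6, 4.0, 13.0],
        [2.3, 5.1, 10.9],
        [9.5, 0.4, 25.0],  # an outlying observation
    ]


def robust_covariance(x, eps=0.1, maxit=150, tol=5e-05):
    """Call robustm_corr_huber and return (cov, theta, nit).

    The import is done inside the function so that this module can still be
    loaded on a machine where naginterfaces is not installed.
    """
    from naginterfaces.library import correg

    # The documented return values are unpacked in their documented order.
    cov, theta, nit = correg.robustm_corr_huber(x, eps, maxit=maxit, tol=tol)
    return cov, theta, nit
```

With the library available, `cov, theta, nit = robust_covariance(make_sample())` would give the packed covariance estimate, the location estimate, and the iteration count. No output is shown here because the values depend on the installed library.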

Returns

cov : float, ndarray, shape $(m \times (m + 1) / 2)$

A robust estimate of the covariance matrix, $C$. The upper triangular part of the matrix $C$ is stored packed by columns. $C_{ij}$ is returned in $\mathrm{cov}[(j \times (j - 1) / 2 + i) - 1]$, $i \leq j$.

theta : float, ndarray, shape $(m)$

The robust estimate of the location parameters $θ_{j}$, for $j = 1, 2, \dots, m$.

nit : int

The number of iterations performed.
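The packed storage of the covariance estimate can be expanded into a full symmetric matrix with the index formula given above. The following is a small sketch in plain Python; the helper names `packed_index` and `unpack_cov` are hypothetical and are not part of naginterfaces.

```python
def packed_index(i, j):
    """0-based position in the packed array of C[i, j], for 1-based i <= j."""
    if not 1 <= i <= j:
        raise ValueError("requires 1 <= i <= j")
    return j * (j - 1) // 2 + i - 1


def unpack_cov(cov, m):
    """Expand a packed upper triangle (stored by columns) to a full m x m matrix."""
    if len(cov) != m * (m + 1) // 2:
        raise ValueError("cov must have m * (m + 1) / 2 elements")
    full = [[0.0] * m for _ in range(m)]
    for j in range(1, m + 1):
        for i in range(1, j + 1):
            value = cov[packed_index(i, j)]
            full[i - 1][j - 1] = value  # upper triangle
            full[j - 1][i - 1] = value  # mirror into the lower triangle
    return full
```

For $m = 3$ the packed order is $C_{11}, C_{12}, C_{22}, C_{13}, C_{23}, C_{33}$, so `unpack_cov([1, 2, 3, 4, 5, 6], 3)` yields `[[1, 2, 4], [2, 3, 5], [4, 5, 6]]`.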

Raises

NagValueError

(errno $1$)

On entry, $\mathrm{maxit} = \langle\mathit{value}\rangle$.

Constraint: $\mathrm{maxit} > 0$.

(errno $1$)

On entry, $\mathrm{tol} = \langle\mathit{value}\rangle$.

Constraint: $\mathrm{tol} > 0.0$.

(errno $1$)

On entry, $\mathrm{eps} = \langle\mathit{value}\rangle$.

Constraint: $0.0 \leq \mathrm{eps} \leq 1.0$.

(errno $1$)

On entry, $n = \langle\mathit{value}\rangle$ and $m = \langle\mathit{value}\rangle$.

Constraint: $n \geq m$.

(errno $1$)

On entry, $n = \langle\mathit{value}\rangle$.

Constraint: $n \geq 2$.

(errno $1$)

On entry, $m = \langle\mathit{value}\rangle$.

Constraint: $m \geq 1$.

(errno $2$)

On entry, a variable has a constant value, i.e., all elements in column $\langle\mathit{value}\rangle$ of $x$ are identical.

(errno $3$)

The iterative procedure to find $C$ has failed to converge in $\mathrm{maxit}$ iterations.

(errno $4$)

The iterative procedure to find $C$ has become unstable. This may happen if the value of $\mathrm{eps}$ is too large for the sample.
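The input constraints behind errno 1 and errno 2 can be checked before calling the routine. The sketch below mirrors those documented constraints in plain Python. The function name `check_inputs` is hypothetical, `x` is assumed to be a rectangular list of rows, and the convergence conditions (errno 3 and 4) cannot be checked in advance.

```python
def check_inputs(x, eps, maxit=150, tol=5e-05):
    """Return a list of violated input constraints (empty if none are found)."""
    problems = []
    n = len(x)
    m = len(x[0]) if n > 0 else 0

    # Constraints reported as errno 1.
    if maxit <= 0:
        problems.append("maxit > 0")
    if tol <= 0.0:
        problems.append("tol > 0.0")
    if not 0.0 <= eps <= 1.0:
        problems.append("0.0 <= eps <= 1.0")
    if m < 1:
        problems.append("m >= 1")
    if n < 2:
        problems.append("n >= 2")
    if n < m:
        problems.append("n >= m")

    # Constraint reported as errno 2: no variable may be constant.
    for j in range(m):
        column = [row[j] for row in x]
        if all(value == column[0] for value in column):
            problems.append(f"column {j + 1} of x is constant")
    return problems
```

An empty list means none of these constraints is violated; it does not guarantee that the iteration will converge.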

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

For a set of $n$ observations on $m$ variables in a matrix $X$, a robust estimate of the covariance matrix, $C$, and a robust estimate of location, $θ$, are given by

$$C = τ^{2} (A^{T} A)^{-1},$$

where $τ^{2}$ is a correction factor and $A$ is a lower triangular matrix found as the solution to the following equations:

$$z_{i} = A (x_{i} - θ),$$

$$\frac{1}{n} \sum_{i=1}^{n} w(\| z_{i} \|_{2}) z_{i} = 0,$$

and

$$\frac{1}{n} \sum_{i=1}^{n} u(\| z_{i} \|_{2}) z_{i} z_{i}^{T} - I = 0,$$

where
$x_{i}$ is a vector of length $m$ containing the elements of the $i$th row of $x$,
$z_{i}$ is a vector of length $m$,
$I$ is the identity matrix and $0$ is the zero matrix,
and $w$ and $u$ are suitable functions.
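To make the two estimating equations concrete, the sketch below evaluates their left-hand sides for given $A$, $θ$, $u$ and $w$. It is an illustration only and does not reproduce the iterative solver used by the routine. The function name `estimating_equation_residuals` is hypothetical, and `u` and `w` are passed in as arbitrary callables.

```python
import math


def estimating_equation_residuals(x, a, theta, u, w):
    """Evaluate the left-hand sides of the location and scatter equations.

    x     : list of n rows, each of length m
    a     : m x m lower triangular matrix as a list of rows
    theta : list of m location parameters
    u, w  : callables taking the Euclidean norm of z_i
    Returns (location_residual, scatter_residual); both are zero at a solution.
    """
    n = len(x)
    m = len(theta)
    location = [0.0] * m
    scatter = [[0.0] * m for _ in range(m)]
    for row in x:
        d = [row[k] - theta[k] for k in range(m)]
        # z_i = A (x_i - theta); only the lower triangle of A is used.
        z = [sum(a[r][k] * d[k] for k in range(r + 1)) for r in range(m)]
        t = math.sqrt(sum(v * v for v in z))
        wt, ut = w(t), u(t)
        for r in range(m):
            location[r] += wt * z[r] / n
            for c in range(m):
                scatter[r][c] += ut * z[r] * z[c] / n
    for r in range(m):
        scatter[r][r] -= 1.0  # subtract the identity matrix
    return location, scatter
```

With unit weights, `x = [[-1.0], [1.0]]`, `a = [[1.0]]` and `theta = [0.0]`, both residuals are zero, which corresponds to the ordinary mean and an unscaled variance of one.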

robustm_corr_huber uses weight functions:

$$u(t) = \begin{cases} \dfrac{a_{u}}{t^{2}}, & \text{if } t < a_{u}^{2} \\ 1, & \text{if } a_{u}^{2} \leq t \leq b_{u}^{2} \\ \dfrac{b_{u}}{t^{2}}, & \text{if } t > b_{u}^{2} \end{cases}$$

and

$$w(t) = \begin{cases} 1, & \text{if } t \leq c_{w} \\ \dfrac{c_{w}}{t}, & \text{if } t > c_{w} \end{cases}$$

for constants $a_{u}$, $b_{u}$ and $c_{w}$.
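The two weight functions translate directly into code. The sketch below follows the piecewise definitions exactly as stated here. The names `huber_u` and `huber_w` are hypothetical, and the constants must be supplied by the caller, since the way the routine derives them from $ϵ$ is not reproduced.

```python
def huber_u(t, a_u, b_u):
    """Weight u(t) as defined piecewise above.

    No special handling is provided for t == 0, where the first branch
    would divide by zero.
    """
    if t < a_u ** 2:
        return a_u / t ** 2
    if t > b_u ** 2:
        return b_u / t ** 2
    return 1.0


def huber_w(t, c_w):
    """Weight w(t): 1 up to c_w, then decaying as c_w / t."""
    return 1.0 if t <= c_w else c_w / t
```

For example, with the arbitrary choice `c_w = 2.0`, `huber_w(1.0, 2.0)` is `1.0` and `huber_w(4.0, 2.0)` is `0.5`, so observations far from the centre are down-weighted.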

These functions solve a minimax problem considered by Huber (see Huber (1981)). The values of $a_{u}$, $b_{u}$ and $c_{w}$ are calculated from the expected fraction of gross errors, $ϵ$ (see Huber (1981) and Marazzi (1987)). The expected fraction of gross errors is the estimated proportion of outliers in the sample.

To make the estimate asymptotically unbiased under a Normal model, a correction factor, $τ^{2}$, is calculated (see Huber (1981) and Marazzi (1987)).
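Once $A$ and $τ^{2}$ are known, the covariance estimate follows from $C = τ^{2} (A^{T} A)^{-1}$. The sketch below carries out that final step in plain Python, using the fact that $(A^{T} A)^{-1} = A^{-1} A^{-T}$ for a nonsingular lower triangular $A$. The function name `covariance_from_a` is hypothetical, and both inputs are assumed to be given; the computation of $τ^{2}$ itself is not shown.

```python
def covariance_from_a(a, tau2):
    """Return C = tau2 * (A^T A)^{-1} for a lower triangular matrix A."""
    m = len(a)
    # Invert A column by column with forward substitution (A * inv = I).
    inv = [[0.0] * m for _ in range(m)]
    for col in range(m):
        for row in range(col, m):
            rhs = 1.0 if row == col else 0.0
            partial = sum(a[row][k] * inv[k][col] for k in range(col, row))
            inv[row][col] = (rhs - partial) / a[row][row]
    # (A^T A)^{-1} = A^{-1} (A^{-1})^T, scaled by the correction factor.
    return [
        [tau2 * sum(inv[r][k] * inv[c][k] for k in range(m)) for c in range(m)]
        for r in range(m)
    ]
```

For `a = [[1.0, 0.0], [1.0, 1.0]]` and `tau2 = 2.0`, $A^{T} A$ is `[[2, 1], [1, 1]]`, its inverse is `[[1, -1], [-1, 2]]`, and the function returns `[[2.0, -2.0], [-2.0, 4.0]]`.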

The matrix $C$ is calculated using robustm_corr_user_deriv(). Initial estimates of $θ_{j}$, for $j = 1, 2, \dots, m$, are given by the median of the $j$th column of $X$ and the initial value of $A$ is based on the median absolute deviation (see Marazzi (1987)). robustm_corr_huber is based on routines in ROBETH; see Marazzi (1987).
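The starting values described above rest on column medians and median absolute deviations. The sketch below computes both quantities per column with the standard library. The function name `column_medians_and_mads` is hypothetical, the deviations are left unscaled, and how the routine turns them into the initial $A$ is not specified here, so that step is omitted.

```python
from statistics import median


def column_medians_and_mads(x):
    """Return (medians, mads) for each column of x, given as a list of rows."""
    m = len(x[0])
    medians, mads = [], []
    for j in range(m):
        column = [row[j] for row in x]
        med = median(column)
        medians.append(med)
        # Median absolute deviation about the column median (unscaled).
        mads.append(median([abs(value - med) for value in column]))
    return medians, mads
```

For the rows `[1, 10], [2, 20], [3, 30], [4, 40], [100, 50]`, the medians are `3` and `30` and the deviations are `1` and `10`; the extreme value `100` in the first column has no effect on either quantity.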

References

Huber, P J, 1981, Robust Statistics, Wiley

Marazzi, A, 1987, Weights for bounded influence regression in ROBETH, Cah. Rech. Doc. IUMSP, No. 3 ROB 3, Institut Universitaire de Médecine Sociale et Préventive, Lausanne
