hide long namesshow long names
hide short namesshow short names
Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

NAG Toolbox: nag_correg_robustm_corr_user (g02hm)

 Contents

    1  Purpose
    2  Syntax
    7  Accuracy
    9  Example

Purpose

nag_correg_robustm_corr_user (g02hm) computes a robust estimate of the covariance matrix for user-supplied weight functions. The derivatives of the weight functions are not required.

Syntax

[user, covar, a, wt, theta, nit, ifail] = g02hm(ucv, indm, x, a, theta, 'user', user, 'n', n, 'm', m, 'bl', bl, 'bd', bd, 'maxit', maxit, 'nitmon', nitmon, 'tol', tol)
[user, covar, a, wt, theta, nit, ifail] = nag_correg_robustm_corr_user(ucv, indm, x, a, theta, 'user', user, 'n', n, 'm', m, 'bl', bl, 'bd', bd, 'maxit', maxit, 'nitmon', nitmon, 'tol', tol)
Note: the interface to this routine has changed since earlier releases of the toolbox:
At Mark 23: nitmon and tol were made optional
At Mark 22: n was made optional

Description

For a set of n observations on m variables in a matrix X, a robust estimate of the covariance matrix, C, and a robust estimate of location, θ, are given by
C=τ2ATA-1,  
where τ2 is a correction factor and A is a lower triangular matrix found as the solution to the following equations.
zi=Axi-θ  
1n i= 1nwzi2zi=0  
and
1ni=1nuzi2zi ziT -vzi2I=0,  
where xi is a vector of length m containing the elements of the ith row of X,
zi is a vector of length m,
I is the identity matrix and 0 is the zero matrix.
and w and u are suitable functions.
nag_correg_robustm_corr_user (g02hm) covers two situations:
(i) vt=1 for all t,
(ii) vt=ut.
The robust covariance matrix may be calculated from a weighted sum of squares and cross-products matrix about θ using weights wti=uzi. In case (i) a divisor of n is used and in case (ii) a divisor of i=1nwti is used. If w.=u., then the robust covariance matrix can be calculated by scaling each row of X by wti and calculating an unweighted covariance matrix about θ.
In order to make the estimate asymptotically unbiased under a Normal model a correction factor, τ2, is needed. The value of the correction factor will depend on the functions employed (see Huber (1981) and Marazzi (1987)).
nag_correg_robustm_corr_user (g02hm) finds A using the iterative procedure as given by Huber; see Huber (1981).
Ak=Sk+IAk-1  
and
θjk=bjD1+θjk- 1,  
where Sk=sjl, for j=1,2,,m and l=1,2,,m is a lower triangular matrix such that
sjl= -minmaxhjl/D2,-BL,BL, j>l -minmax12hjj/D2-1,-BD,BD, j=l ,  
where and BD and BL are suitable bounds.
The value of τ may be chosen so that C is unbiased if the observations are from a given distribution.
nag_correg_robustm_corr_user (g02hm) is based on routines in ROBETH; see Marazzi (1987).

References

Huber P J (1981) Robust Statistics Wiley
Marazzi A (1987) Weights for bounded influence regression in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 3 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

Parameters

Compulsory Input Parameters

1:     ucv – function handle or string containing name of m-file
ucv must return the values of the functions u and w for a given value of its argument.
[user, u, w] = ucv(t, user)

Input Parameters

1:     t – double scalar
The argument for which the functions u and w must be evaluated.
2:     user – Any MATLAB object
ucv is called from nag_correg_robustm_corr_user (g02hm) with the object supplied to nag_correg_robustm_corr_user (g02hm).

Output Parameters

1:     user – Any MATLAB object
2:     u – double scalar
The value of the u function at the point t.
3:     w – double scalar
The value of the w function at the point t.
2:     indm int64int32nag_int scalar
Indicates which form of the function v will be used.
indm=1
v=1.
indm1
v=u.
3:     xldxm – double array
ldx, the first dimension of the array, must satisfy the constraint ldxn.
xij must contain the ith observation on the jth variable, for i=1,2,,n and j=1,2,,m.
4:     am×m+1/2 – double array
An initial estimate of the lower triangular real matrix A. Only the lower triangular elements must be given and these should be stored row-wise in the array.
The diagonal elements must be 0, and in practice will usually be >0. If the magnitudes of the columns of X are of the same order, the identity matrix will often provide a suitable initial value for A. If the columns of X are of different magnitudes, the diagonal elements of the initial value of A should be approximately inversely proportional to the magnitude of the columns of X.
Constraint: aj×j-1/2+j0.0, for j=1,2,,m.
5:     thetam – double array
An initial estimate of the location argument, θj, for j=1,2,,m.
In many cases an initial estimate of θj=0, for j=1,2,,m, will be adequate. Alternatively medians may be used as given by nag_univar_robust_1var_median (g07da).

Optional Input Parameters

1:     user – Any MATLAB object
user is not used by nag_correg_robustm_corr_user (g02hm), but is passed to ucv. Note that for large objects it may be more efficient to use a global variable which is accessible from the m-files than to use user.
2:     n int64int32nag_int scalar
Default: the first dimension of the array x.
n, the number of observations.
Constraint: n>1.
3:     m int64int32nag_int scalar
Default: the dimension of the array theta and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
m, the number of columns of the matrix X, i.e., number of independent variables.
Constraint: 1mn.
4:     bl – double scalar
Default: 0.9
The magnitude of the bound for the off-diagonal elements of Sk, BL.
Constraint: bl>0.0.
5:     bd – double scalar
Default: 0.9
The magnitude of the bound for the diagonal elements of Sk, BD.
Constraint: bd>0.0.
6:     maxit int64int32nag_int scalar
Default: 150
The maximum number of iterations that will be used during the calculation of A.
Constraint: maxit>0.
7:     nitmon int64int32nag_int scalar
Default: 0
Indicates the amount of information on the iteration that is printed.
nitmon>0
The value of A, θ and δ (see Accuracy) will be printed at the first and every nitmon iterations.
nitmon0
No iteration monitoring is printed.
When printing occurs the output is directed to the current advisory message channel (See nag_file_set_unit_advisory (x04ab).)
8:     tol – double scalar
Default: 5e-5
The relative precision for the final estimate of the covariance matrix. Iteration will stop when maximum δ (see Accuracy) is less than tol.
Constraint: tol>0.0.

Output Parameters

1:     user – Any MATLAB object
2:     covarm×m+1/2 – double array
A robust estimate of the covariance matrix, C. The upper triangular part of the matrix C is stored packed by columns (lower triangular stored by rows), that is Cij is returned in covarj×j-1/2+i, ij.
3:     am×m+1/2 – double array
The lower triangular elements of the inverse of the matrix A, stored row-wise.
4:     wtn – double array
wti contains the weights, wti=uzi2, for i=1,2,,n.
5:     thetam – double array
Contains the robust estimate of the location argument, θj, for j=1,2,,m.
6:     nit int64int32nag_int scalar
The number of iterations performed.
7:     ifail int64int32nag_int scalar
ifail=0 unless the function detects an error (see Error Indicators and Warnings).

Error Indicators and Warnings

Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

   ifail=1
On entry,n1,
orm<1,
orn<m,
orldx<n.
   ifail=2
On entry,tol0.0,
ormaxit0,
ordiagonal element of a=0.0,
orbl0.0,
orbd0.0.
   ifail=3
A column of x has a constant value.
   ifail=4
Value of u or w returned by ucv<0.
   ifail=5
The function has failed to converge in maxit iterations.
W  ifail=6
Either the sum D1 or the sum D2 is zero. This may be caused by the functions u or w being too strict for the current estimate of A (or C). You should either try a larger initial estimate of A or make the u and w functions less strict.
   ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
   ifail=-399
Your licence key may have expired or may not have been installed correctly.
   ifail=-999
Dynamic memory allocation failed.

Accuracy

On successful exit the accuracy of the results is related to the value of tol; see Arguments. At an iteration let
(i) d1= the maximum value of sjl
(ii) d2= the maximum absolute change in wti
(iii) d3= the maximum absolute relative change in θj
and let δ=maxd1,d2,d3. Then the iterative procedure is assumed to have converged when δ<tol

Further Comments

The existence of A will depend upon the function u (see Marazzi (1987)); also if X is not of full rank a value of A will not be found. If the columns of X are almost linearly related, then convergence will be slow.
If derivatives of the u and w functions are available then the method used in nag_correg_robustm_corr_user_deriv (g02hl) will usually give much faster convergence.

Example

A sample of 10 observations on three variables is read in along with initial values for A and θ and argument values for the u and w functions, cu and cw. The covariance matrix computed by nag_correg_robustm_corr_user (g02hm) is printed along with the robust estimate of θ.
ucv computes the Huber's weight functions:
ut=1, if  tcu2 ut= cut2, if  t>cu2  
and
wt= 1, if   tcw wt= cwt, if   t>cw.  
function g02hm_example


fprintf('g02hm example results\n\n');

indm = int64(1);
x = [3.4, 6.9, 12.2;
     6.4, 2.5, 15.1;
     4.9, 5.5, 14.2;
     7.3, 1.9, 18.2;
     8.8, 3.6, 11.7;
     8.4, 1.3, 17.9;
     5.3, 3.1, 15.0;
     2.7, 8.1,  7.7;
     6.1, 3.0, 21.9;
     5.3, 2.2, 13.9];

[n,m] = size(x);
indm = int64(1);

% Initial values
a = [1;
     0;     1;
     0;     0;     1];
theta = zeros(m,1);

user = [4, 2];
[user, covar, a, wt, theta, nit, ifail] = ...
    g02hm( ...
           @ucv, indm, x, a, theta, 'user', user);

fprintf(' iterations to convergence = %4d\n\n', nit);
mtitle = 'Robust covariance matrix';
n = int64(size(x,2));
uplo   = 'Upper';
diag   = 'Non-unit';
[ifail] = x04cc( ...
                 uplo, diag, n, covar, mtitle);
fprintf('\n');
disp('Robust estimates of theta');
disp(theta);



function [userp, u, w] = ucv(t, userp)
  cu = userp(1);
  u = 1.0;
  if (t ~= 0)
    t2 = t*t;
    if (t2 > cu)
      u = cu/t2;
    end
  end
  % w function and derivative
  cw = userp(2);
  if (t > cw)
    w = cw/t;
  else
    w = 1.0;
  end
g02hm example results

 iterations to convergence =   34

 Robust covariance matrix
             1          2          3
 1      3.2779    -3.6918     4.7391
 2                 5.2841    -6.4087
 3                           11.8373

Robust estimates of theta
    5.6998
    3.8636
   14.7036


PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015