PDF version (NAG web site
, 64bit version, 64bit version)
NAG Toolbox: nag_tsa_cp_binary_user (g13ne)
Purpose
nag_tsa_cp_binary_user (g13ne) detects change points in a univariate time series, that is, the time points at which some feature of the data, for example the mean, changes. Change points are detected using binary segmentation for a usersupplied cost function.
Syntax
[
tau,
user,
ifail] = g13ne(
n,
beta,
chgpfn, 'minss',
minss, 'mdepth',
mdepth, 'user',
user)
[
tau,
user,
ifail] = nag_tsa_cp_binary_user(
n,
beta,
chgpfn, 'minss',
minss, 'mdepth',
mdepth, 'user',
user)
Description
Let ${y}_{1:n}=\left\{{y}_{j}:j=1,2,\dots ,n\right\}$ denote a series of data and $\tau =\left\{{\tau}_{i}:i=1,2,\dots ,m\right\}$ denote a set of $m$ ordered (strictly monotonic increasing) indices known as change points with $1\le {\tau}_{i}\le n$ and ${\tau}_{m}=n$. For ease of notation we also define ${\tau}_{0}=0$. The $m$ change points, $\tau $, split the data into $m$ segments, with the $i$th segment being of length ${n}_{i}$ and containing ${y}_{{\tau}_{i1}+1:{\tau}_{i}}$.
Given a cost function,
$C\left({y}_{{\tau}_{i1}+1:{\tau}_{i}}\right)$,
nag_tsa_cp_binary_user (g13ne) gives an approximate solution to
where
$\beta $ is a penalty term used to control the number of change points. The solution is obtained in an iterative manner as follows:
1. 
Set $u=1$, $w=n$ and $k=0$ 
2. 
Set $k=k+1$. If $k>K$, where $K$ is a usersupplied control parameter, then terminate the process for this segment. 
3. 
Find $v$ that minimizes

4. 
Test

5. 
If inequality (1) is false then the process is terminated for this segment. 
6. 
If inequality (1) is true, then $v$ is added to the set of change points, and the segment is split into two subsegments, ${y}_{u:v}$ and ${y}_{v+1:w}$. The whole process is repeated from step 2 independently on each subsegment, with the relevant changes to the definition of $u$ and $w$ (i.e., $w$ is set to $v$ when processing the left hand subsegment and $u$ is set to $v+1$ when processing the right hand subsegment. 
The change points are ordered to give $\tau $.
References
Chen J and Gupta A K (2010) Parametric Statistical Change Point Analysis With Applications to Genetics Medicine and Finance Second Edition Birkhäuser
Parameters
Compulsory Input Parameters
 1:
$\mathrm{n}$ – int64int32nag_int scalar

$n$, the length of the time series.
Constraint:
${\mathbf{n}}\ge 2$.
 2:
$\mathrm{beta}$ – double scalar

$\beta $, the penalty term.
There are a number of standard ways of setting
$\beta $, including:
 SIC or BIC
 $\beta =p\times \mathrm{log}\left(n\right)$.
 AIC
 $\beta =2p$.
 HannanQuinn
 $\beta =2p\times \mathrm{log}\left(\mathrm{log}\left(n\right)\right)$.
where
$p$ is the number of parameters being treated as estimated in each segment. The value of
$p$ will depend on the cost function being used.
If no penalty is required then set $\beta =0$. Generally, the smaller the value of $\beta $ the larger the number of suggested change points.
 3:
$\mathrm{chgpfn}$ – function handle or string containing name of mfile

chgpfn must calculate a proposed change point, and the associated costs, within a specified segment.
[v, cost, user, info] = chgpfn(side, u, w, minss, user, info)
Input Parameters
 1:
$\mathrm{side}$ – int64int32nag_int scalar

Flag indicating what
chgpfn must calculate and at which point of the Binary Segmentation it has been called.
 ${\mathbf{side}}=1$
 only $C\left({y}_{u:w}\right)$ need be calculated and returned in ${\mathbf{cost}}\left(1\right)$, neither v nor the other elements of cost need be set. In this case, $u=1$ and $w=\mathrm{n}$.
 ${\mathbf{side}}=0$
 all elements of cost and v must be set. In this case, $u=1$ and $w=\mathrm{n}$.
 ${\mathbf{side}}=1$
 the segment, ${y}_{u:w}$, is a left hand side subsegment from a previous iteration of the Binary Segmentation algorithm. All elements of cost and v must be set.
 ${\mathbf{side}}=2$
 the segment, ${y}_{u:w}$, is a right hand side subsegment from a previous iteration of the Binary Segmentation algorithm. All elements of cost and v must be set.
The distinction between
${\mathbf{side}}=1$ and
$2$ may allow for
chgpfn to be implemented in a more efficient manner. See section
Example for one such example.
The first call to
chgpfn will always have
${\mathbf{side}}=1$ and the second call will always have
${\mathbf{side}}=0$. All subsequent calls will be made with
${\mathbf{side}}=1$ or
$2$.
 2:
$\mathrm{u}$ – int64int32nag_int scalar

$u$, the start of the segment of interest.
 3:
$\mathrm{w}$ – int64int32nag_int scalar

$w$, the end of the segment of interest.
 4:
$\mathrm{minss}$ – int64int32nag_int scalar

The minimum distance between two change points, as passed to nag_tsa_cp_binary_user (g13ne).
 5:
$\mathrm{user}$ – Any MATLAB object
chgpfn is called from
nag_tsa_cp_binary_user (g13ne) with the object supplied to
nag_tsa_cp_binary_user (g13ne).
 6:
$\mathrm{info}$ – int64int32nag_int scalar

${\mathbf{info}}=0$.
Output Parameters
 1:
$\mathrm{v}$ – int64int32nag_int scalar

If
${\mathbf{side}}=1$ then
v need not be set.
if
${\mathbf{side}}\ne 1$ then
$v$, the proposed change point. That is, the value which minimizes
for
$v=u+{\mathbf{minss}}1$ to
$w{\mathbf{minss}}$.
 2:
$\mathrm{cost}\left(3\right)$ – double array

Costs associated with the proposed change point,
$v$.
If
${\mathbf{side}}=1$ then
${\mathbf{cost}}\left(1\right)=C\left({y}_{u:w}\right)$ and the remaining two elements of
cost need not be set.
If
${\mathbf{side}}\ne 1$ then
 ${\mathbf{cost}}\left(1\right)=C\left({y}_{u:v}\right)+C\left({y}_{v+1:w}\right)$.
 ${\mathbf{cost}}\left(2\right)=C\left({y}_{u:v}\right)$.
 ${\mathbf{cost}}\left(3\right)=C\left({y}_{v+1:w}\right)$.
 3:
$\mathrm{user}$ – Any MATLAB object
 4:
$\mathrm{info}$ – int64int32nag_int scalar

In most circumstances
info should remain unchanged.
If
info is set to a strictly positive value then
nag_tsa_cp_binary_user (g13ne) terminates with
${\mathbf{ifail}}={\mathbf{51}}$.
If
info is set to a strictly negative value the current segment is skipped (i.e., no change points are considered in this segment) and
nag_tsa_cp_binary_user (g13ne) continues as normal. If
info was set to a strictly negative value at any point and no other errors occur then
nag_tsa_cp_binary_user (g13ne) will terminate with
${\mathbf{ifail}}={\mathbf{52}}$.
Optional Input Parameters
 1:
$\mathrm{minss}$ – int64int32nag_int scalar
Default:
$2$
The minimum distance between two change points, that is ${\tau}_{i}{\tau}_{i1}\ge {\mathbf{minss}}$.
Constraint:
${\mathbf{minss}}\ge 2$.
 2:
$\mathrm{mdepth}$ – int64int32nag_int scalar
Default:
$0$
$K$, the maximum depth for the iterative process, which in turn puts an upper limit on the number of change points with
$m\le {2}^{K}$.
If $K\le 0$ then no limit is put on the depth of the iterative process and no upper limit is put on the number of change points.
 3:
$\mathrm{user}$ – Any MATLAB object
user is not used by
nag_tsa_cp_binary_user (g13ne), but is passed to
chgpfn. Note that for large objects it may be more efficient to use a global variable which is accessible from the mfiles than to use
user.
Output Parameters
 1:
$\mathrm{tau}\left(\mathbf{ntau}\right)$ – int64int32nag_int array

The dimension of the array
tau will be
$\mathbf{ntau}$
The location of the change points. The $i$th segment is defined by ${y}_{\left({\tau}_{i1}+1\right)}$ to ${y}_{{\tau}_{i}}$, where ${\tau}_{0}=0$ and ${\tau}_{i}={\mathbf{tau}}\left(i\right),1\le i\le m$.
 2:
$\mathrm{user}$ – Any MATLAB object
 3:
$\mathrm{ifail}$ – int64int32nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see
Error Indicators and Warnings).
Error Indicators and Warnings
Errors or warnings detected by the function:
Cases prefixed with W are classified as warnings and
do not generate an error of type NAG:error_n. See nag_issue_warnings.
 ${\mathbf{ifail}}=11$

Constraint: ${\mathbf{n}}\ge 2$.
 ${\mathbf{ifail}}=31$

Constraint: ${\mathbf{minss}}\ge 2$.
 ${\mathbf{ifail}}=51$

User requested termination by setting .
 W ${\mathbf{ifail}}=52$

User requested a segment to be skipped by setting .
 ${\mathbf{ifail}}=99$
An unexpected error has been triggered by this routine. Please
contact
NAG.
 ${\mathbf{ifail}}=399$
Your licence key may have expired or may not have been installed correctly.
 ${\mathbf{ifail}}=999$
Dynamic memory allocation failed.
Accuracy
Not applicable.
Further Comments
nag_tsa_cp_binary (g13nd) performs the same calculations for a cost function selected from a provided set of cost functions. If the required cost function belongs to this provided set then
nag_tsa_cp_binary (g13nd) can be used without the need to provide a cost function routine.
Example
This example identifies changes in the scale parameter, under the assumption that the data has a gamma distribution, for a simulated dataset with $100$ observations. A penalty, $\beta $ of $3.6$ is used and the minimum segment size is set to $3$. The shape parameter is fixed at $2.1$ across the whole input series.
The cost function used is
where
$a$ is a shape parameter that is fixed for all segments and
${n}_{i}={\tau}_{i}{\tau}_{i1}+1$.
Open in the MATLAB editor:
g13ne_example
function g13ne_example
fprintf('g13ne example results\n\n');
y = [ 0.00; 0.78; 0.02; 0.17; 0.04; 1.23; 0.24; 1.70; 0.77; 0.06;
0.67; 0.94; 1.99; 2.64; 2.26; 3.72; 3.14; 2.28; 3.78; 0.83;
2.80; 1.66; 1.93; 2.71; 2.97; 3.04; 2.29; 3.71; 1.69; 2.76;
1.96; 3.17; 1.04; 1.50; 1.12; 1.11; 1.00; 1.84; 1.78; 2.39;
1.85; 0.62; 2.16; 0.78; 1.70; 0.63; 1.79; 1.21; 2.20; 1.34;
0.04; 0.14; 2.78; 1.83; 0.98; 0.19; 0.57; 1.41; 2.05; 1.17;
0.44; 2.32; 0.67; 0.73; 1.17; 0.34; 2.95; 1.08; 2.16; 2.27;
0.14; 0.24; 0.27; 1.71; 0.04; 1.03; 0.12; 0.67; 1.15; 1.10;
1.37; 0.59; 0.44; 0.63; 0.06; 0.62; 0.39; 2.63; 1.63; 0.42;
0.73; 0.85; 0.26; 0.48; 0.26; 1.77; 1.53; 1.39; 1.68; 0.43];
a = 2.1;
n = int64(numel(y));
work = zeros(2*n,1);
user = {y; a; work};
beta = 3.4;
minss = int64(3);
[tau] = g13ne(n,beta,@chgpfn,'minss',minss,'user',user);
fprintf('  Change Points \n');
fprintf(' Number Position\n');
fprintf(' =====================\n');
for i = 1:numel(tau)
fprintf(' %4d %6d\n', i, tau(i));
end
fig1 = figure;
plot(y,'Color','red');
xpos = transpose(double(tau(1:end1))*ones(1,2));
ypos = diag(ylim)*ones(2,numel(tau)1);
line(xpos,ypos,'Color','black');
title({'{\bf g13ne Example Plot}',
'Simulated time series and the corresponding changes in scale b',
'assuming y ~ Ga(2.1,b)'});
xlabel('{\bf Time}');
ylabel('{\bf Value}');
function [v,cost,user,info] = chgpfn(side,u,w,minss,user,info)
y = user{1};
a = user{2};
work = user{3};
floc = u + minss  1;
lloc = w  minss;
ys = 0;
dn = 0;
if (side==0  side==1)
for i = w:1:floc + 1
dn = dn + 1;
tmp = dn*a;
ys = ys + y(i);
work(2*i2) = 2.0*tmp*(log(ys)log(tmp));
end
user = {y; a; work};
else
if (side==1)
li = w;
else
li = lloc;
end
for i = u:li
dn = dn + 1;
tmp = dn*a;
ys = ys + y(i);
work(2*i1) = 2.0*tmp*(log(ys)log(tmp));
end
user = {y; a; work};
end
v = int64(0);
cost = zeros(3,1);
if (side>=0)
v = int64(0);
cost(1) = 0;
for i = floc:lloc
this_cost = work(2*i1) + work(2*i);
if (this_cost<cost(1)  v==0)
v = int64(i);
cost(1) = this_cost;
cost(2) = work(2*i1);
cost(3) = work(2*i);
end
end
else
cost(1) = work(2*w1);
end
info = int64(0);
g13ne example results
 Change Points 
Number Position
=====================
1 5
2 12
3 32
4 70
5 73
6 100
This example plot shows the original data series and the estimated change points.
PDF version (NAG web site
, 64bit version, 64bit version)
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015