A quantile is a value which divides a frequency distribution such that there is a given proportion of data values below the quantile. For example, the median of a dataset is the $0.5$ quantile because half the values are less than or equal to it.
g01apc uses a slightly modified version of an algorithm described in a paper by
Zhang and Wang (2007) to determine
$\epsilon $-approximate quantiles of a large arbitrary-sized data stream of real values, where
$\epsilon $ is a user-defined approximation factor. Let
$m$ denote the number of data elements processed so far. Then, given any quantile
$q\in [0.0,1.0]$, an
$\epsilon $-approximate quantile is defined as an element in the data stream whose rank falls within
$[(q-\epsilon )m,(q+\epsilon )m]$. If more than one
$\epsilon $-approximate quantile is available, the one closest to
$qm$ is used.
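The definition of an $\epsilon $-approximate quantile can be illustrated with a minimal C sketch that computes the rank interval $[(q-\epsilon )m,(q+\epsilon )m]$ and tests whether a given rank lies inside it. The names rank_interval, eps_rank_interval and is_eps_approximate are hypothetical helpers introduced here for illustration; they are not part of the NAG Library, and g01apc performs this selection internally.

```c
#include <assert.h>

/* Hypothetical helper type: not part of the NAG Library. */
typedef struct {
    double lo; /* (q - eps) * m */
    double hi; /* (q + eps) * m */
} rank_interval;

/* Rank interval within which an eps-approximate q-quantile must fall
   after m data elements have been processed. */
rank_interval eps_rank_interval(double q, double eps, long m)
{
    rank_interval r;
    assert(q >= 0.0 && q <= 1.0);
    assert(eps > 0.0 && eps <= 1.0);
    assert(m > 0);
    r.lo = (q - eps) * (double)m;
    r.hi = (q + eps) * (double)m;
    return r;
}

/* Returns 1 if an element of the given rank qualifies as an
   eps-approximate q-quantile, and 0 otherwise. */
int is_eps_approximate(long rank, double q, double eps, long m)
{
    rank_interval r = eps_rank_interval(q, eps, m);
    return (double)rank >= r.lo && (double)rank <= r.hi;
}
```

For example, with $q=0.5$, $\epsilon =0.125$ and $m=1000$ the interval is $[375,625]$, so any element whose rank lies in that range is an acceptable approximation to the median.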
Zhang Q and Wang W (2007) A fast algorithm for approximate quantiles in high speed data streams Proceedings of the 19th International Conference on Scientific and Statistical Database Management IEEE Computer Society 29

1:
$\mathbf{ind}$ – Integer *
Input/Output

On initial entry: must be set to $0$.
On entry: indicates the action required in the current call to
g01apc.
 ${\mathbf{ind}}=0$
 Initialize the communication arrays and attempt to process the first nb values from the data stream. eps, rv and nb must be set and licomm must be at least $10$.
 ${\mathbf{ind}}=1$
 Attempt to process the next block of nb values from the data stream. The calling program must update rv and (if required) nb, and re-enter g01apc with all other parameters unchanged.
 ${\mathbf{ind}}=2$
 Continue calculation following the reallocation of either or both of the communication arrays rcomm and icomm.
 ${\mathbf{ind}}=3$
 Calculate the nq $\epsilon $-approximate quantiles specified in q. The calling program must set q and nq and re-enter g01apc with all other parameters unchanged. This option can be chosen only when ${\mathbf{np}}\ge \lceil \mathrm{exp}\left(1.0\right)/{\mathbf{eps}}\rceil $.
On exit: indicates output from the call.
 ${\mathbf{ind}}=1$
 g01apc has processed np data points and expects to be called again with additional data.
 ${\mathbf{ind}}=2$
 One or both of the communication arrays rcomm and icomm are too small. The new minimum lengths of rcomm and icomm have been returned in ${\mathbf{icomm}}\left[0\right]$ and ${\mathbf{icomm}}\left[1\right]$ respectively. If a new minimum length is greater than the current length then the corresponding communication array must be reallocated with its contents preserved, and g01apc called again with all other parameters unchanged.
If there is more data to be processed, it is recommended that
lrcomm and
licomm are made significantly bigger than the minimum to limit the number of reallocations.
 ${\mathbf{ind}}=3$
 g01apc has returned the requested $\epsilon $-approximate quantiles in qv. These quantiles are based on np data points.
Constraint:
${\mathbf{ind}}=0$, $1$, $2$ or $3$.
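The entry and exit values of ind described above amount to a small state machine driven by the calling program. The following is a minimal sketch of the caller-side decision, assuming the caller tracks whether more data remains; it does not call g01apc itself. The names min_points_for_query and next_ind, and the return value of -1 meaning "no further call", are hypothetical and not part of the NAG Library. The constant for exp(1.0) is written out so that the sketch does not depend on the maths library, and the result is assumed to fit in a long.

```c
#include <assert.h>

/* exp(1.0), written as a literal; hypothetical name. */
static const double EXP_ONE = 2.718281828459045;

/* Smallest np for which ind = 3 may be requested:
   ceil(exp(1.0) / eps). */
long min_points_for_query(double eps)
{
    double bound;
    long n;
    assert(eps > 0.0 && eps <= 1.0);
    bound = EXP_ONE / eps;
    n = (long)bound;         /* truncates toward zero */
    if ((double)n < bound)
        n++;                 /* round up to obtain the ceiling */
    return n;
}

/* Given the value of ind on exit, choose ind for the next call.
   Returns -1 when this sketch would make no further call. */
int next_ind(int ind_exit, int more_data, long np, double eps)
{
    switch (ind_exit) {
    case 1: /* block processed; more data may follow */
        if (more_data)
            return 1;
        /* no data left: query only if enough points were streamed */
        return np >= min_points_for_query(eps) ? 3 : -1;
    case 2: /* arrays too small: reallocate, then continue */
        return 2;
    case 3: /* quantiles have been returned in qv */
        return -1;
    default:
        return -1;
    }
}
```

With ${\mathbf{eps}}=0.1$ the bound evaluates to $28$, so a quantile query would be attempted only once at least $28$ data points have been processed. This sketch stops after a query; whether to continue streaming afterwards is left to the calling program.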

2:
$\mathbf{rv}\left[\mathit{dim}\right]$ – const double
Input

Note: the dimension,
dim, of the array
rv
must be at least
 ${\mathbf{nb}}$, when ${\mathbf{ind}}=0$, $1$ or $2$.
On entry: if
${\mathbf{ind}}=0$,
$1$ or
$2$, the vector containing the current block of data, otherwise
rv is not referenced.

3:
$\mathbf{nb}$ – Integer
Input

On entry: if
${\mathbf{ind}}=0$,
$1$ or
$2$, the size of the current block of data. The size of blocks of data in array
rv can vary; therefore,
nb can change between calls to
g01apc.
Constraint:
if ${\mathbf{ind}}=0$, $1$ or $2$, ${\mathbf{nb}}>0$.

4:
$\mathbf{eps}$ – double
Input

On entry: approximation factor $\epsilon $.
Constraint:
$0.0<{\mathbf{eps}}\le 1.0$.

5:
$\mathbf{np}$ – Integer *
Output

On exit: $m$, the number of elements processed so far.

6:
$\mathbf{q}\left[\mathit{dim}\right]$ – const double
Input

Note: the dimension,
dim, of the array
q
must be at least
 ${\mathbf{nq}}$, when ${\mathbf{ind}}=3$.
On entry: if
${\mathbf{ind}}=3$, the quantiles to be calculated, otherwise
q is not referenced. Note that
${\mathbf{q}}\left[i\right]=0.0$ corresponds to the minimum value and
${\mathbf{q}}\left[i\right]=1.0$ to the maximum value.
Constraint:
if ${\mathbf{ind}}=3$,
$0.0\le {\mathbf{q}}\left[\mathit{i}-1\right]\le 1.0$, for $\mathit{i}=1,2,\dots ,{\mathbf{nq}}$.

7:
$\mathbf{qv}\left[\mathit{dim}\right]$ – double
Output

Note: the dimension,
dim, of the array
qv
must be at least
 ${\mathbf{nq}}$, when ${\mathbf{ind}}=3$.
On exit: if ${\mathbf{ind}}=3$, ${\mathbf{qv}}\left[i\right]$ contains the $\epsilon $-approximate quantile specified by the value provided in ${\mathbf{q}}\left[i\right]$.

8:
$\mathbf{nq}$ – Integer
Input

On entry: if
${\mathbf{ind}}=3$, the number of quantiles requested, otherwise
nq is not referenced.
Constraint:
if ${\mathbf{ind}}=3$, ${\mathbf{nq}}>0$.

9:
$\mathbf{rcomm}\left[{\mathbf{lrcomm}}\right]$ – double
Communication Array

On entry: if
${\mathbf{ind}}=1$ or
$2$ then the first
$l$ elements of
rcomm as supplied to
g01apc must be identical to the first
$l$ elements of
rcomm returned from the last call to
g01apc, where
$l$ is the value of
lrcomm used in the last call. In other words, the contents of
rcomm must not be altered between calls to this function. If
rcomm needs to be reallocated then its contents must be preserved. If
${\mathbf{ind}}=0$ then
rcomm need not be set.
On exit:
rcomm holds information required by subsequent calls to
g01apc.

10:
$\mathbf{lrcomm}$ – Integer
Input

On entry: the dimension of the array
rcomm.
Constraints:
 if ${\mathbf{ind}}=0$, ${\mathbf{lrcomm}}\ge 1$;
 otherwise ${\mathbf{lrcomm}}\ge {\mathbf{icomm}}\left[0\right]$.

11:
$\mathbf{icomm}\left[{\mathbf{licomm}}\right]$ – Integer
Communication Array

On entry: if
${\mathbf{ind}}=1$ or
$2$ then the first
$l$ elements of
icomm as supplied to
g01apc must be identical to the first
$l$ elements of
icomm returned from the last call to
g01apc, where
$l$ is the value of
licomm used in the last call. In other words, the contents of
icomm must not be altered between calls to this function. If
icomm needs to be reallocated then its contents must be preserved. If
${\mathbf{ind}}=0$ then
icomm need not be set.
On exit:
${\mathbf{icomm}}\left[0\right]$ holds the minimum required length for
rcomm and
${\mathbf{icomm}}\left[1\right]$ holds the minimum required length for
icomm. The remaining elements of
icomm are used for communication between subsequent calls to
g01apc.

12:
$\mathbf{licomm}$ – Integer
Input

On entry: the dimension of the array
icomm.
Constraints:
 if ${\mathbf{ind}}=0$, ${\mathbf{licomm}}\ge 10$;
 otherwise ${\mathbf{licomm}}\ge {\mathbf{icomm}}\left[1\right]$.
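When g01apc exits with ${\mathbf{ind}}=2$, the communication arrays must be enlarged without losing their contents. The following is a minimal sketch of one way to do this with the standard realloc function. The helper grow_comm_array is hypothetical and not part of the NAG Library, lengths are held in a long rather than the library's Integer type, and the choice of allocating twice the minimum length is an arbitrary assumption made here to limit the number of reallocations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the NAG Library.
   Ensures that arr, which currently holds *len elements of elem_size
   bytes each, can hold at least min_len elements. Existing contents
   are preserved. Returns the (possibly moved) array, or NULL if the
   allocation fails, in which case the original block is untouched. */
void *grow_comm_array(void *arr, long *len, long min_len, size_t elem_size)
{
    long new_len;
    void *p;
    assert(len != NULL && *len >= 0);
    assert(min_len > 0 && elem_size > 0);
    if (min_len <= *len)
        return arr;            /* already large enough */
    new_len = 2 * min_len;     /* headroom: an assumption of this sketch */
    p = realloc(arr, (size_t)new_len * elem_size);
    if (p == NULL)
        return NULL;
    *len = new_len;
    return p;
}
```

In a calling program this would be applied once to rcomm with a minimum length of ${\mathbf{icomm}}\left[0\right]$ and once to icomm with a minimum length of ${\mathbf{icomm}}\left[1\right]$, after which the updated lengths are passed as lrcomm and licomm on the next call.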

13:
$\mathbf{fail}$ – NagError *
Input/Output

The NAG error argument (see
Section 7 in the Introduction to the NAG Library CL Interface).
 NE_ALLOC_FAIL

Dynamic memory allocation failed.
See
Section 3.1.2 in the Introduction to the NAG Library CL Interface for further information.
 NE_ARRAY_SIZE

On entry, ${\mathbf{licomm}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{licomm}}\ge 10$.
On entry, ${\mathbf{lrcomm}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{lrcomm}}\ge 1$.
 NE_BAD_PARAM

On entry, argument $\langle \mathit{\text{value}}\rangle $ had an illegal value.
 NE_ILLEGAL_COMM

The contents of
icomm have been altered between calls to this function.
The contents of
rcomm have been altered between calls to this function.
 NE_INT

On entry, ${\mathbf{ind}}=0$, $1$ or $2$ and ${\mathbf{nb}}=\langle \mathit{\text{value}}\rangle $.
Constraint: if ${\mathbf{ind}}=0$, $1$ or $2$ then ${\mathbf{nb}}>0$.
On entry, ${\mathbf{ind}}=3$ and ${\mathbf{nq}}=\langle \mathit{\text{value}}\rangle $.
Constraint: if ${\mathbf{ind}}=3$ then ${\mathbf{nq}}>0$.
On entry, ${\mathbf{ind}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{ind}}=0$, $1$, $2$ or $3$.
 NE_INTERNAL_ERROR

An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact
NAG for assistance.
See
Section 7.5 in the Introduction to the NAG Library CL Interface for further information.
 NE_NO_LICENCE

Your licence key may have expired or may not have been installed correctly.
See
Section 8 in the Introduction to the NAG Library CL Interface for further information.
 NE_Q_OUT_OF_RANGE

On entry, ${\mathbf{ind}}=3$ and ${\mathbf{q}}\left[\langle \mathit{\text{value}}\rangle \right]=\langle \mathit{\text{value}}\rangle $.
Constraint: if ${\mathbf{ind}}=3$ then $0.0\le {\mathbf{q}}\left[i\right]\le 1.0$ for all $i$.
 NE_REAL

On entry, ${\mathbf{eps}}=\langle \mathit{\text{value}}\rangle $.
Constraint: $0.0<{\mathbf{eps}}\le 1.0$.
 NE_TOO_SMALL

Number of data elements streamed,
$\langle \mathit{\text{value}}\rangle $, is not sufficient for a quantile query when
${\mathbf{eps}}=\langle \mathit{\text{value}}\rangle $.
Supply more data or reprocess the data with a higher
eps value.
Not applicable.
Background information to multithreading can be found in the
Multithreading documentation.
Please consult the
X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the
Users' Note for your implementation for any additional implementation-specific information.
It is not possible to determine in advance the final size of the communication arrays
rcomm and
icomm without knowing the size of the dataset. However, if a rough size
($n$) is known, the speed of the computation can be increased if the sizes of the communication arrays are not smaller than
where