A quantile is a value which divides a frequency distribution such that there is a given proportion of data values below the quantile. For example, the median of a dataset is the $0.5$ quantile because half the values are less than or equal to it.
g01anf uses a slightly modified version of an algorithm described by Zhang and Wang (2007) to determine $\epsilon$-approximate quantiles of a data stream of $n$ real values, where $n$ is known. Given any quantile $q \in [0.0, 1.0]$, an $\epsilon$-approximate quantile is defined as an element in the data stream whose rank falls within $[(q-\epsilon)n, (q+\epsilon)n]$. If more than one $\epsilon$-approximate quantile is available, the one closest to $qn$ is returned.
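The definition above can be illustrated with a small Python sketch. It is not how g01anf works internally (the routine summarises the stream rather than sorting it); it simply sorts a small in-memory list and picks out the elements that would qualify. The function name `eps_approx_quantile`, the 1-based rank convention and the rank window $[(q-\epsilon)n, (q+\epsilon)n]$ are assumptions made for this illustration.

```python
import math

def eps_approx_quantile(data, q, eps):
    """Illustrative only: exact, sort-based view of an eps-approximate quantile.

    Returns (candidates, best), where `candidates` are the elements whose
    1-based rank lies in [(q - eps) * n, (q + eps) * n] (clipped to [1, n])
    and `best` is the candidate whose rank is closest to q * n.
    Assumes eps * n >= 1 so that the window contains at least one rank.
    """
    ordered = sorted(data)
    n = len(ordered)
    lo = max(1, math.ceil((q - eps) * n))   # smallest admissible rank
    hi = min(n, math.floor((q + eps) * n))  # largest admissible rank
    candidates = ordered[lo - 1:hi]
    best_rank = min(max(round(q * n), lo), hi)  # rank nearest q * n inside the window
    return candidates, ordered[best_rank - 1]

# 16 values, q = 0.5, eps = 0.125: admissible ranks are 6..10
cands, best = eps_approx_quantile(list(range(16, 0, -1)), 0.5, 0.125)
print(cands, best)
```

Any element of `cands` would be a valid $\epsilon$-approximate median under this reading; `best` is the one whose rank is nearest to $qn$.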
Zhang Q and Wang W (2007) A fast algorithm for approximate quantiles in high speed data streams Proceedings of the 19th International Conference on Scientific and Statistical Database Management IEEE Computer Society 29
1: ind – Integer, Input/Output
On entry: indicates the action required in the current call to g01anf.
- ind = 0: Return the required lengths of rcomm and icomm in icomm(1) and icomm(2) respectively. n and eps must be set and licomm must be at least 2.
- ind = 1: Initialise the communication arrays and process the first nb values from the data stream as supplied in rv.
- ind = 2: Process the next block of nb values from the data stream. The calling program must update rv and (if required) nb, and re-enter g01anf with all other parameters unchanged.
- ind = 3: Calculate the nq $\epsilon$-approximate quantiles specified in q. The calling program must set q and nq and re-enter g01anf with all other parameters unchanged. This option can be chosen only when .
On exit: indicates output from a successful call.
- ind = 1: Lengths of rcomm and icomm have been returned in icomm(1) and icomm(2) respectively.
- ind = 2: g01anf has processed np data points and expects to be called again with additional data (i.e., np < n).
- ind = 3: g01anf has returned the requested $\epsilon$-approximate quantiles in qv. These quantiles are based on np data points.
- ind = 4: The routine has processed all n data points (i.e., np = n).
Constraint: on entry ind = 0, 1, 2 or 3.
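The calling sequence described for this argument (query sizes, first block, further blocks, then quantiles) can be sketched as a small driver loop. The Python below is a toy stand-in, not the NAG interface: `ToyStream`, `step` and `drive` are hypothetical names, the stand-in stores every value and sorts on request instead of maintaining a compact summary, and the numeric codes assigned to the named constants are assumptions chosen to mirror the entry and exit states listed above.

```python
# Toy stand-in for the reverse-communication protocol; NOT the NAG interface.
# On-entry actions (numeric values are assumptions for this sketch).
QUERY_SIZES, FIRST_BLOCK, NEXT_BLOCK, GET_QUANTILES = 0, 1, 2, 3
# On-exit states (numeric values are assumptions for this sketch).
SIZES_RETURNED, MORE_DATA, QUANTILES_READY, ALL_DATA = 1, 2, 3, 4

class ToyStream:
    def __init__(self, n):
        self.n = n        # total stream length, known in advance
        self.seen = []    # the toy keeps everything; a real summary would not

    def step(self, ind, block=(), q=()):
        """Perform one action and return (exit state, quantile values or None)."""
        if ind == QUERY_SIZES:
            return SIZES_RETURNED, None
        if ind == GET_QUANTILES:
            ordered = sorted(self.seen)
            m = len(ordered)
            qv = [ordered[min(m - 1, max(0, round(p * m) - 1))] for p in q]
            return QUANTILES_READY, qv
        if ind == FIRST_BLOCK:
            self.seen = list(block)      # initialise and process the first block
        elif ind == NEXT_BLOCK:
            self.seen.extend(block)      # process a further block
        else:
            raise ValueError("unknown action")
        state = ALL_DATA if len(self.seen) == self.n else MORE_DATA
        return state, None

def drive(blocks, n, q):
    """Feed blocks of possibly different sizes, then request the quantiles in q."""
    stream = ToyStream(n)
    stream.step(QUERY_SIZES)             # a real caller would size its workspace here
    ind = FIRST_BLOCK
    for block in blocks:
        ind, _ = stream.step(ind, block=block)
        if ind == ALL_DATA:
            break
        # otherwise ind is MORE_DATA, which has the same value as NEXT_BLOCK
    _, qv = stream.step(GET_QUANTILES, q=q)
    return qv

print(drive([[5, 1, 4], [2, 6, 3]], n=6, q=[0.0, 0.5, 1.0]))
```

In this sketch the exit state after a data block can be passed straight back in as the next action, because "more data expected" and "process the next block" share a code; whether that is the intent of the real interface is an assumption here.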
2: n – Integer, Input
On entry: $n$, the total number of values in the data stream.
Constraint:
.
3: rv(*) – Real (Kind=nag_wp) array, Input
Note: the dimension of the array rv must be at least nb if ind = 1 or 2.
On entry: if ind = 1 or 2, the vector containing the current block of data, otherwise rv is not referenced.
4: nb – Integer, Input
On entry: if ind = 1 or 2, the size of the current block of data. The size of blocks of data in array rv can vary; therefore, nb can change between calls to g01anf.
Constraint:
if or , .
5: eps – Real (Kind=nag_wp), Input
On entry: approximation factor $\epsilon$.
Constraint:
.
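As a worked illustration of what the approximation factor controls, assume the rank window $[(q-\epsilon)n, (q+\epsilon)n]$ used in the definition of an $\epsilon$-approximate quantile, and take the hypothetical values $n = 1000$, $\epsilon = 0.01$ and $q = 0.5$ (these numbers are chosen only for the example):

```latex
\[
  [(q-\epsilon)\,n,\ (q+\epsilon)\,n]
  = [(0.5-0.01)\times 1000,\ (0.5+0.01)\times 1000]
  = [490,\ 510].
\]
```

Under these assumptions the value returned for the median may have any rank from 490 to 510, that is, it is within $\epsilon n = 10$ positions of the exact median's rank. Halving $\epsilon$ halves this tolerance.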
6: np – Integer, Output
On exit: the number of elements processed so far.
7: q(*) – Real (Kind=nag_wp) array, Input
Note: the dimension of the array q must be at least nq if ind = 3.
On entry: if ind = 3, the quantiles to be calculated, otherwise q is not referenced. Note that q(i) = 0.0 corresponds to the minimum value and q(i) = 1.0 to the maximum value.
Constraint: if ind = 3, 0.0 ≤ q(i) ≤ 1.0, for i = 1, 2, …, nq.
8: qv(*) – Real (Kind=nag_wp) array, Output
Note: the dimension of the array qv must be at least nq if ind = 3.
On exit: if ind = 3, qv(i) contains the $\epsilon$-approximate quantile specified by the value provided in q(i).
9: nq – Integer, Input
On entry: if ind = 3, the number of quantiles requested, otherwise nq is not referenced.
Constraint:
if , .
10: rcomm(lrcomm) – Real (Kind=nag_wp) array, Communication Array
11: lrcomm – Integer, Input
On entry: the dimension of the array rcomm as declared in the (sub)program from which g01anf is called.
Constraint: if ind ≠ 0, lrcomm must be at least equal to the value returned in icomm(1) by a call to g01anf with ind = 0. This will not be more than
, where
.
12: icomm(licomm) – Integer array, Communication Array
13: licomm – Integer, Input
On entry: the dimension of the array icomm as declared in the (sub)program from which g01anf is called.
Constraints:
- if ind = 0, licomm ≥ 2;
- otherwise licomm must be at least equal to the value returned in icomm(2) by a call to g01anf with ind = 0. This will not be more than , where and .
14: ifail – Integer, Input/Output
On entry: ifail must be set to 0, -1 or 1. If you are unfamiliar with this argument you should refer to Section 7 in the Introduction to the NAG Library CL Interface for details.
On exit: ifail = 0 unless the routine detects an error (see Section 6).
As an out-of-core routine, g01anf will only perform certain argument checks when a data checkpoint (including completion of data input) is signaled. As such, it will usually be inappropriate to halt program execution when an error is detected, since any errors may be subsequently resolved without losing any processing already carried out. Therefore, setting ifail to a value of -1 or 1 is recommended. If the output of error messages is undesirable, the value 1 is recommended.
When the value -1 or 1 is used it is essential to test the value of ifail on exit.
If on entry ifail = 0 or -1, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Not applicable.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.