# NAG Library Routine Document

## 1 Purpose

g01arf produces a stem and leaf display for a single sample of observations.

## 2 Specification

Fortran Interface
 Subroutine g01arf ( range, prt, n, y, nstepx, nstepy, unit, plot, ldplot, lines, sorty, iwork, ifail)
 Integer, Intent (In) :: n, nstepx, nstepy, ldplot
 Integer, Intent (Inout) :: ifail
 Integer, Intent (Out) :: lines, iwork(n)
 Real (Kind=nag_wp), Intent (In) :: y(n)
 Real (Kind=nag_wp), Intent (Inout) :: unit
 Real (Kind=nag_wp), Intent (Out) :: sorty(n)
 Character (1), Intent (In) :: range, prt
 Character (1), Intent (Inout) :: plot(ldplot,nstepx)
C Header Interface
 #include <nagmk26.h>
 void g01arf_ ( const char *range, const char *prt, const Integer *n, const double y[], const Integer *nstepx, const Integer *nstepy, double *unit, char plot[], const Integer *ldplot, Integer *lines, double sorty[], Integer iwork[], Integer *ifail, const Charlen length_range, const Charlen length_prt, const Charlen length_plot)

## 3 Description

g01arf produces a stem and leaf display for a single sample of $n$ observations. The stem and leaf display shows data values separated into the form of a ‘stem’ and a ‘leaf’. For example, a value of $473$ could be represented as $47$ $3$ where the stem is $47$ and the leaf is $3$. The data is scaled using a value known as the ‘leaf digit unit’. In the above example the leaf digit unit would be $1.0$.
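The split into stem and leaf can be sketched in a few lines of Python. This is an illustration only, not the routine's implementation: the function name `split_stem_leaf` is hypothetical, and rounding to the nearest leaf unit is an assumption, since the text does not say how digits beyond the leaf are treated.

```python
def split_stem_leaf(value, unit):
    """Return (sign, stem, leaf) for `value` at the given leaf digit unit.

    Assumption: the value is rounded to the nearest multiple of `unit`.
    """
    scaled = round(abs(value) / unit)      # whole number of leaf units
    sign = "-" if value < 0 and scaled != 0 else "+"
    stem, leaf = divmod(scaled, 10)        # the last digit is the leaf
    return sign, stem, leaf


print(split_stem_leaf(473, 1.0))    # ('+', 47, 3)
print(split_stem_leaf(1.8, 0.1))    # ('+', 1, 8)
print(split_stem_leaf(-19.0, 1.0))  # ('-', 1, 9)
```

The sign is returned separately because a stem of zero can be either positive or negative, which an integer stem alone cannot express.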
The following example illustrates a stem and leaf display.
For the $10$ observations:
 $1.8\quad 2.3\quad 2.1\quad 1.9\quad 2.1\quad 2.4\quad 2.0\quad 2.0\quad 1.9\quad 2.1$
the stem and leaf display is:
```
1  1  8
3  1  99
5  2  00
5  2  111
2  2
2  2  3
1  2  4
```
where the leaf digit unit is $0.1$ so that $1$ $8$ represents $1.8$ (i.e., $18×0.1$). The leaf digit unit distinguishes between the numbers $18.0$, $1.8$, $0.18$, etc. which may otherwise all be represented by $1$ $8$.
Included in the above display is an initial column specifying the cumulative count of values, up to and including that particular line, from either the top or bottom of the display, whichever is smaller. An exception to this is when the line on which the median lies is reached, in which case the actual count of values on that line is displayed, rather than a cumulative count, and this is highlighted by enclosing the count in parentheses. In this case the median is $2.05$ and thus falls between the two lines at which the cumulative count has reached $n/2$ where $n$ is the number of observations.
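The count column described above can be reproduced from the number of leaves on each line. The sketch below is an interpretation of that description rather than the routine's code: `depth_column` is a hypothetical name, and the test used for "the median lies on this line" (strictly fewer than $n/2$ values before the line and strictly more than $n/2$ once it is included) is an assumption that matches the displays in this section.

```python
def depth_column(counts):
    """Return the first-column entries for lines holding `counts` leaves."""
    n = sum(counts)
    labels, before = [], 0
    for c in counts:
        after = before + c                 # cumulative count from the top
        if before < n / 2 < after:
            labels.append(f"({c})")        # median line: show its own count
        else:
            # smaller of the counts from the top and from the bottom
            labels.append(str(min(after, n - before)))
        before = after
    return labels


# Leaves per line in the display of the 10 observations above.
print(depth_column([1, 2, 2, 3, 0, 1, 1]))
# ['1', '3', '5', '5', '2', '2', '1']
```

For the 10 observations no line is bracketed, because the fifth and sixth ordered values sit on different lines.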
Some of the other features of the stem and leaf display are illustrated by the following two examples.
For the $30$ observations:
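 $-19.0\quad -3.0\quad -1.0\quad 0.0\quad 1.0\quad 2.0\quad 2.0\quad 3.0\quad 3.0\quad 3.0\quad 4.0\quad 4.0\quad 4.0\quad 4.0\quad 4.0\quad 5.0\quad 5.0\quad 5.0\quad 5.0\quad 6.0\quad 6.0\quad 6.0\quad 7.0\quad 7.0\quad 8.0\quad 10.0\quad 11.0\quad 11.0\quad 13.0\quad 31.0$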
the stem and leaf display may be:
```
 1   1.  9
 1   1*
 1  -0.
 3  -0*  13
15  +0*  012233344444
15  +0.  5555666778
 5   1*  011
 2   1.  3
 1   2*
 1   2.
 1   3*  1
```
In the above display all the data are plotted and the leaf digit unit is $1.0$. Also in this display different leaves, that is different digits, may be plotted on a particular line. In this case we have $5$ possible digits per line, that is $2$ lines per stem, and these are represented as follows:
• * indicates that the line may contain the digits $0$ to $4$;
• . indicates that the line may contain the digits $5$ to $9$.
Alternatively the stem and leaf display may look like:
```
      LO   -19

  2   -0T  3
  3   -0*  1
  5   +0*  01
 10   +0T  22333
( 9)  +0F  444445555
 11   +0S  66677
  6   +0.  8
  5    1*  011
  2    1T  3

      HI   31
```
Again the leaf digit unit is $1.0$ but in this display just the data between the fences are plotted. The fences lie $1\frac{1}{2}\times$ the inter-hinge range below the lower hinge and above the upper hinge. Any data points that fall outside the fences are presented separately in the display under the headings LO for those points below the lower fence and HI for those points above the upper fence.
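As a worked check of the fences for the $30$ observations, the sketch below computes hinges and fences in Python. The hinge rule used (Tukey's depth rule, hinge depth $(\lfloor (n+1)/2\rfloor +1)/2$) is an assumption, since this document does not define how g01arf locates the hinges, and the name `fences` is hypothetical.

```python
import math


def fences(values):
    """Return (lower_fence, upper_fence) using Tukey-style hinges (assumed)."""
    ys = sorted(values)
    n = len(ys)
    depth = (math.floor((n + 1) / 2) + 1) / 2   # hinge depth, may end in .5
    lo_i, hi_i = math.floor(depth) - 1, math.ceil(depth) - 1
    lower_hinge = (ys[lo_i] + ys[hi_i]) / 2
    upper_hinge = (ys[n - 1 - lo_i] + ys[n - 1 - hi_i]) / 2
    step = 1.5 * (upper_hinge - lower_hinge)    # 1.5 x inter-hinge range
    return lower_hinge - step, upper_hinge + step


obs = [-19, -3, -1, 0, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4,
       5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 10, 11, 11, 13, 31]
low, high = fences(obs)
print(low, high)                        # -3.0 13.0
print([y for y in obs if y < low])      # [-19]  -> listed under LO
print([y for y in obs if y > high])     # [31]   -> listed under HI
```

Under this assumption the hinges are $3.0$ and $7.0$, so the fences are $-3.0$ and $13.0$, which agrees with the LO and HI entries in the display above.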
Again in this display different leaves, that is different digits, may be plotted on a particular line. However, in this case we have $2$ possible digits per line, that is $5$ lines per stem, and these are represented as follows:
• * indicates that the line may contain the digits $0$ or $1$;
• T indicates that the line may contain the digits $2$ or $3$;
• F indicates that the line may contain the digits $4$ or $5$;
• S indicates that the line may contain the digits $6$ or $7$;
• . indicates that the line may contain the digits $8$ or $9$.
A display may also allow $10$ different digits ($0$ to $9$) per line, that is $1$ line per stem, or just $1$ digit per line, that is $10$ lines per stem, as in the first of the three examples above.
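The two marker schemes listed above amount to a lookup on the leaf digit. A minimal Python sketch follows; `line_marker` is a hypothetical helper, and it covers only the $2$ and $5$ lines-per-stem layouts because no markers are described here for the other two.

```python
def line_marker(leaf, lines_per_stem):
    """Return the marker of the line on which digit `leaf` (0 to 9) is plotted."""
    if lines_per_stem == 2:
        return "*."[leaf // 5]       # '*' holds 0-4, '.' holds 5-9
    if lines_per_stem == 5:
        return "*TFS."[leaf // 2]    # two digits per line
    raise ValueError("markers are only described for 2 or 5 lines per stem")


print([line_marker(d, 5) for d in range(10)])
# ['*', '*', 'T', 'T', 'F', 'F', 'S', 'S', '.', '.']
```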
Note that the median here is $4.5$. This falls between two lines in the first display but is highlighted on the second display since it lies on a particular line.
Finally if there are positive and negative numbers on the display these are highlighted by a $+$ or $-$ sign where the distinction is required, that is near the zero-point.
If there are too many leaves to fit in the plot width allowed, g01arf plots as many leaves as possible and places an asterisk to the right to indicate that some leaves are not displayed. If this occurs and you wish to be able to plot all the leaves then the width of the plot may be adjusted.
Options also allow the leaf unit and the height of the display to be specified by you or calculated by g01arf. These arguments may be used to control the type of the display you wish to obtain. Fixing the unit and changing the height of the display may alter the number of lines used per stem, that is the number of different digits per line. g01arf will choose a display for the fixed unit that attempts to make as much use of the available height as possible, thus increasing the height may allow for more lines per stem whereas decreasing the height may force the display to use fewer lines per stem. Similarly you may wish to fix the height and vary the leaf digit unit used on the display. See Section 9 for further details.
The display is returned in a character array with the option of printing the display.

## 5 Arguments

1:     $\mathbf{range}$ – Character(1) Input
On entry: indicates whether you wish to scale the plot to the extremes of the data or to the fences.
${\mathbf{range}}=\text{'E'}$
The display is a plot to the extremes, that is a plot of all the data.
${\mathbf{range}}=\text{'F'}$
The display is a plot of the data between the fences.
Constraint: ${\mathbf{range}}=\text{'E'}$ or $\text{'F'}$.
2:     $\mathbf{prt}$ – Character(1) Input
On entry: indicates whether the stem and leaf display is to be output to an external file.
${\mathbf{prt}}=\text{'N'}$
The display is not output to an external file.
${\mathbf{prt}}=\text{'P'}$
The display is output to the current advisory message unit as defined by x04abf. Only the first $132$ characters of each line are actually printed.
Constraint: ${\mathbf{prt}}=\text{'P'}$ or $\text{'N'}$.
3:     $\mathbf{n}$ – Integer Input
On entry: $n$, the number of observations.
Constraint: ${\mathbf{n}}\ge 2$.
4:     $\mathbf{y}\left({\mathbf{n}}\right)$ – Real (Kind=nag_wp) array Input
On entry: the $n$ observations.
5:     $\mathbf{nstepx}$ – Integer Input
On entry: the number of character positions to be plotted horizontally.
Constraint: ${\mathbf{nstepx}}\ge 35$.
6:     $\mathbf{nstepy}$ – Integer Input
On entry: the maximum number of character positions to be plotted vertically.
If ${\mathbf{nstepy}}\le 0$ a suitable value will be used by g01arf for the number of character positions to be plotted vertically. This will clearly be less than or equal to the value of ldplot.
Constraint: ${\mathbf{nstepy}}\le 0$ or ${\mathbf{nstepy}}\ge 5$.
7:     $\mathbf{unit}$ – Real (Kind=nag_wp) Input/Output
On entry: indicates the leaf digit unit to be used.
If ${\mathbf{unit}}>0.0$ and is not a power of ten, it will be converted to the nearest power of ten below the input value for unit.
If ${\mathbf{unit}}\le 0.0$, the optimum unit will be used. This is based on the range of the data to be plotted and the number of lines available for the display.
On exit: contains the actual unit used in the stem and leaf display.
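The conversion of a positive unit to a power of ten can be illustrated as follows. This is a sketch of the rule as stated above, not the routine's code; `effective_unit` is a hypothetical name, and floating-point edge cases very close to an exact power of ten are not treated specially.

```python
import math


def effective_unit(unit):
    """Return the largest power of ten that does not exceed a positive `unit`."""
    if unit <= 0.0:
        raise ValueError("unit <= 0 asks the routine to choose the unit itself")
    return 10.0 ** math.floor(math.log10(unit))


print(effective_unit(250.0))  # 100.0
print(effective_unit(1.0))    # 1.0
```

A unit of $0.5$ would likewise be reduced to $0.1$, while a unit that is already a power of ten is left unchanged.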
8:     $\mathbf{plot}\left({\mathbf{ldplot}},{\mathbf{nstepx}}\right)$ – Character(1) array Output
On exit: the stem and leaf display.
9:     $\mathbf{ldplot}$ – Integer Input
On entry: the first dimension of the array plot as declared in the (sub)program from which g01arf is called.
Constraint: ${\mathbf{ldplot}}\ge \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(5,{\mathbf{nstepy}}\right)$.
10:   $\mathbf{lines}$ – Integer Output
On exit: the actual number of lines needed for the display.
11:   $\mathbf{sorty}\left({\mathbf{n}}\right)$ – Real (Kind=nag_wp) array Output
On exit: the observations sorted into ascending order.
12:   $\mathbf{iwork}\left({\mathbf{n}}\right)$ – Integer array Workspace
13:   $\mathbf{ifail}$ – Integer Input/Output
On entry: ifail must be set to $0$, $-1$ or $1$. If you are unfamiliar with this argument you should refer to Section 3.4 in How to Use the NAG Library and its Documentation for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value $-1$ or $1$ is recommended. If the output of error messages is undesirable, then the value $1$ is recommended. Otherwise, if you are not familiar with this argument, the recommended value is $0$. When the value $-1$ or $1$ is used it is essential to test the value of ifail on exit.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).

## 6 Error Indicators and Warnings

If on entry ${\mathbf{ifail}}=0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
${\mathbf{ifail}}=1$
On entry, ${\mathbf{ldplot}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ldplot}}\ge 5$.
On entry, ${\mathbf{ldplot}}=〈\mathit{\text{value}}〉$ and ${\mathbf{nstepy}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ldplot}}\ge {\mathbf{nstepy}}$.
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{n}}\ge 2$.
On entry, ${\mathbf{nstepx}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{nstepx}}\ge 35$.
On entry, ${\mathbf{nstepy}}>0$ and ${\mathbf{nstepy}}<5$: ${\mathbf{nstepy}}=〈\mathit{\text{value}}〉$.
${\mathbf{ifail}}=2$
On entry, ${\mathbf{prt}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{prt}}=\text{'P'}$ or $\text{'N'}$.
On entry, ${\mathbf{range}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{range}}=\text{'E'}$ or $\text{'F'}$.
${\mathbf{ifail}}=3$
Lines needed for display ($〈\mathit{\text{value}}〉$) exceed nstepy ($〈\mathit{\text{value}}〉$).
${\mathbf{ifail}}=4$
A value exceeds maximum allowed for an integer.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 3.9 in How to Use the NAG Library and its Documentation for further information.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 3.8 in How to Use the NAG Library and its Documentation for further information.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 3.7 in How to Use the NAG Library and its Documentation for further information.
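The input constraints behind ${\mathbf{ifail}}=1$ and ${\mathbf{ifail}}=2$ can be mirrored in the calling code before the routine is invoked. The Python sketch below is a hypothetical pre-check, not part of the library: the name `check_inputs`, the order of the tests and the decision to accept only upper-case option letters are all assumptions, and the conditions reported as ${\mathbf{ifail}}=3$ or $4$ are not modelled.

```python
def check_inputs(range_, prt, n, nstepx, nstepy, ldplot):
    """Return the ifail value (0, 1 or 2) implied by the documented constraints."""
    if n < 2 or nstepx < 35:
        return 1
    if 0 < nstepy < 5:                  # nstepy must be <= 0 or >= 5
        return 1
    if ldplot < max(5, nstepy):
        return 1
    if range_ not in ("E", "F") or prt not in ("P", "N"):
        return 2
    return 0


print(check_inputs("E", "N", n=30, nstepx=35, nstepy=0, ldplot=20))   # 0
print(check_inputs("E", "N", n=30, nstepx=35, nstepy=25, ldplot=20))  # 1
print(check_inputs("X", "N", n=30, nstepx=35, nstepy=0, ldplot=20))   # 2
```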

## 7 Accuracy

Accuracy is limited by the number of significant figures that may be represented on the display, which depends on the data, the number of lines available and the unit used.

## 8 Parallelism and Performance

g01arf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

## 9 Further Comments

g01arf uses integer representations of the data. If very large data values are being used they should be scaled before using this routine. The largest integer can be found by calling x02bbf.
If an asterisk is plotted at the end of a line to indicate that some leaves are not displayed, you should increase nstepx if you wish to be able to print the rest of the leaves on that line.
Note that if you request g01arf to print the plot only the first $132$ characters of each line are printed. The full plot is stored in the array plot so you do have the option of printing a plot which has more than $132$ characters on a line.
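When the display is printed under your own control, each line is one row of plot. Seen through the C interface, the array arrives as a flat `char plot[]` in Fortran (column-major) order, so the character in line $i$, column $j$ (both counted from $1$) sits at offset $(j-1)\times {\mathbf{ldplot}}+(i-1)$. The Python sketch below illustrates only that indexing; `plot_lines` is a hypothetical helper and the small buffer is made-up demonstration data, not output from g01arf.

```python
def plot_lines(flat, ldplot, nstepx, lines):
    """Rebuild the first `lines` display rows from a column-major buffer."""
    rows = []
    for i in range(lines):
        row = "".join(flat[j * ldplot + i] for j in range(nstepx))
        rows.append(row.rstrip())       # drop trailing blanks
    return rows


# Made-up 2 x 3 character array, stored column by column.
demo = ["ab ", "cd "]
flat = "".join(demo[i][j] for j in range(3) for i in range(2))
print(plot_lines(flat, ldplot=2, nstepx=3, lines=2))   # ['ab', 'cd']
```

Because the rows are rebuilt at full width, this route is not subject to the $132$-character limit that applies when the routine prints the display itself.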
When the leaf digit unit is set, the number of lines per stem is decided as follows:
Let $r$ be the range of the data to be plotted:
• $r$ = largest observation – smallest observation: if all the data to both extremes are to be plotted (that is if ${\mathbf{range}}=\text{'E'}$),
• $r$ = upper fence – lower fence: if only the data between the fences are to be plotted (that is if ${\mathbf{range}}=\text{'F'}$).
Let $l$ be the number of lines available for the plot:
• $l={\mathbf{nstepy}}-4$ if ${\mathbf{nstepy}}>0$,
• $l={\mathbf{ldplot}}-4$ if ${\mathbf{nstepy}}\le 0$.
The $4$ lines are subtracted to allow space for the display headings. If only the data between the fences are to be plotted then $l$ must be further reduced to allow space to present those values outside the fences. This will involve a minimum of another $4$ lines.
Let $e=\frac{\left(r/{\mathbf{unit}}\right)+1}{l}$; then the number of lines per stem is:
 $\begin{array}{ll}10& \text{if }e\le 1,\\ \phantom{0}5& \text{if }1<e\le 2,\\ \phantom{0}2& \text{if }2<e\le 5,\\ \phantom{0}1& \text{if }5<e.\end{array}$
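The choice of lines per stem can be sketched in Python as below. The thresholds $e\le 1$, $e\le 2$ and $e\le 5$ are an assumption, chosen so that the $(r/{\mathbf{unit}})+1$ possible leaf values fit into $l$ lines at $1$, $2$, $5$ or $10$ digits per line; the name `lines_per_stem` is hypothetical, and the extra reduction of $l$ when ${\mathbf{range}}=\text{'F'}$ is not modelled because its exact size is not given.

```python
def lines_per_stem(r, unit, nstepy, ldplot):
    """Return 10, 5, 2 or 1 lines per stem for data range `r` (assumed rule)."""
    l = (nstepy if nstepy > 0 else ldplot) - 4   # 4 lines kept for headings
    e = (r / unit + 1) / l
    if e <= 1:
        return 10    # one digit per line
    if e <= 2:
        return 5     # two digits per line
    if e <= 5:
        return 2     # five digits per line
    return 1         # all ten digits on one line


# Range of the 30 observations plotted to the extremes: 31 - (-19) = 50.
print(lines_per_stem(50.0, 1.0, nstepy=20, ldplot=20))   # 2
```

With the unit fixed at $1.0$, raising nstepy to $30$ gives $e=51/26$ and hence $5$ lines per stem, which illustrates how the available height changes the layout.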
The time taken by the routine increases with $n$.

## 10 Example

A program to produce two stem and leaf displays for a sample of $30$ observations. The first illustrates a plot produced automatically by g01arf and the second shows how to print the display under your control.

### 10.1 Program Text

Program Text (g01arfe.f90)

### 10.2 Program Data

Program Data (g01arfe.d)

### 10.3 Program Results

Program Results (g01arfe.r)

© The Numerical Algorithms Group Ltd, Oxford, UK. 2017