Creating C++ Interfaces For The NAG Library: Part 2
In the first part of this series we came up with a basic list of requirements for a suite of C++ interfaces to the NAG Library and a list of restrictions we need to work under. In this article we start to look at how we can transform those lists into a template for producing the interfaces.
Underlying the majority of the routines in the NAG Library is the NAG Library Engine. The API for the Engine contains only simple (ANSI C mappable) types, with all array-like data structures (vectors, matrices, etc.) supplied as contiguous memory. The Engine API is undocumented; the closest documented interfaces we have are our standard FL interfaces. The main difference between these and the Engine API is that the Engine interfaces tend to have more arguments.
For our C++ interfaces we would like to allow richer data structures to be supplied directly to the NAG routines. Rather than implement specific data classes we want to take a more flexible approach. Some advice on how to do this has been discussed by one of my colleagues in an earlier blog article. That article, along with the Engine API, will form the starting point for our new C++ interfaces.
There are a number of components that need to be taken into consideration when designing a new suite of interfaces. In the rest of this article we are going to concentrate mainly on one of them: how to handle array arguments. Of the four requirements we identified in the first article of this series, two relate to the handling of arrays: the interface should not impose a rigid data structure, and it should be as simple as possible.

Use of Templates

For flexibility all array arguments will be individually templated, so an interface may look like:
template <typename A, typename B>
void this_routine(A && a, B && b, double c);
The minimum amount of information that the Engine API needs from a class used as an array argument is the location of the underlying raw data. We are therefore going to assume that each class used to supply an array has a data method:
template <typename DT> DT data(void);
where DT is a pointer to an array of elements which can be statically cast into a double, std::complex or a nagcpp::types::f77_integer for real, complex and integer arrays respectively. Here, nagcpp::types::f77_integer will be either an int or a long, depending on the implementation. If a cast is required then the data will need to be copied; otherwise the returned pointer will be passed directly to the Engine.
As can be seen from the f77_integer type, we are currently planning on using nagcpp as the parent namespace. It will contain a number of child namespaces: currently one per chapter of the Library plus a couple of utility namespaces.
An alternative to using a specific method like a.data to access the raw data would be to use &(a[0]) if the [] operator had been implemented. The requirement of a specific method seems cleaner (and easier to document) than making use of the side effect of an operator, so we are not currently planning on allowing the use of &(a[0]). However, the code should be abstracted enough to allow this functionality to be easily added if required.
The above set-up means that STL vectors can be passed directly to the NAG routines out of the box; however, Boost matrices cannot. The Boost matrix API does not provide a method to cleanly access the raw data (i.e. the equivalent of the data method). This is one class where allowing the [] operator to be used rather than the data method may work - however it would be using a non-documented feature of the API, so is not ideal. An alternative would be to use the Boost copy API to copy data into contiguous memory before passing it to the NAG routine - a decision on that is something we will defer to a later date.
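As a rough illustration of what this means in practice, the sketch below passes both a std::vector and a small user-defined class to the same templated wrapper. Everything in it is hypothetical: engine_sum merely stands in for an Engine routine (the real Engine API is undocumented), and sum, Triple and example are names invented for the illustration. It also relies on a size method, which std::vector happens to provide.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an Engine routine: it only ever sees a raw
// pointer to contiguous memory plus a length.
inline double engine_sum(const double* x, std::size_t n) {
  double total = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    total += x[i];
  }
  return total;
}

// Hypothetical templated wrapper: the array argument can be any class that
// provides data() (and, for this sketch, size()).
template <typename A>
double sum(A&& a) {
  const double* raw = a.data();  // passed straight through, no copy
  return engine_sum(raw, static_cast<std::size_t>(a.size()));
}

// A minimal user-defined class that meets the same assumptions.
struct Triple {
  double values[3];
  const double* data() const { return values; }
  int size() const { return 3; }
};

// Usage: the same wrapper accepts both types.
inline double example() {
  std::vector<double> v{1.0, 2.0, 3.0};
  Triple t{{1.5, 2.5, 4.0}};
  return sum(v) + sum(t);
}
```

Neither class shares a base type with the other; the wrapper only cares that the expected methods are present.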
In addition to the data method, we are going to assume that each class used to supply a multi-dimensional array (or matrix) has an is_col_major method:
bool is_col_major(void);
which returns true if the multi-dimensional array is supplied in column major order and false if it is supplied in row major order. If no such method is supplied the data will be assumed to be in column major order. Column major order is being used as the default because, whilst the NAG Library Engine can accept arrays in either column or row major order, the way that various algorithms are implemented means it is often more efficient to supply data in column major order. It will be possible to override this default value on a routine by routine basis.
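One possible way to implement the "use is_col_major if it exists, otherwise assume column major" rule is overload resolution with an expression-SFINAE return type. The following is only a sketch of that technique, not the code used in the interfaces; storage_is_col_major, RowMajorMatrix and PlainMatrix are hypothetical names.

```cpp
// Preferred overload: only viable when a.is_col_major() is a valid expression.
template <typename A>
auto storage_is_col_major(const A& a, int)
    -> decltype(static_cast<void>(a.is_col_major()), bool()) {
  return a.is_col_major();
}

// Fallback overload: no such method, so assume column major order.
template <typename A>
bool storage_is_col_major(const A&, long) {
  return true;
}

// Entry point: the literal 0 is an int, so the first overload is chosen
// whenever it is viable and the fallback is used otherwise.
template <typename A>
bool storage_is_col_major(const A& a) {
  return storage_is_col_major(a, 0);
}

// Two hypothetical user classes: one declares its storage order, one does not.
struct RowMajorMatrix {
  bool is_col_major() const { return false; }
};
struct PlainMatrix {};
```

With these definitions storage_is_col_major(RowMajorMatrix{}) reports row major storage, while storage_is_col_major(PlainMatrix{}) falls back to the column major default.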
Currently we are not planning on using templates for scalar arguments.
How arrays of strings will be best handled is something that we still need to investigate.

Interfaces Supplied as Header Files

Because of the use of templates, the C++ interface will need to be supplied as header files as opposed to a pre-compiled library. This also has the advantage of allowing it to sit more easily on top of any library products - one of the other requirements we had for this set of interfaces.
Currently it is planned for each routine to be in a separate, stand-alone header file. Header files for each chapter will be supplied in their own directory and a combined header file will be supplied for each chapter and one more for the whole library.

Arrays Know Their Own Size

Most routines that take an array as an input argument also have one or more arguments that relate to the size of that array. As in the Python interfaces, it would be nice if these arguments could be dropped.
Because we are templating array arguments we can assume that they "know their own size". In order for this assumption to be true we need to assume that each class has some size methods:
template <typename IT> IT size(void);
template <typename IT> IT size1(void);
template <typename IT> IT size2(void);
template <typename IT> IT size3(void);
where the templated type IT is a type which can be statically cast into a nagcpp::types::f77_integer.
The methods size1, size2 and size3 return the first, second and third dimensions of an array. They need not be implemented if the array does not have that dimension, so for a vector (one-dimensional array) only size needs to be present, for a two-dimensional array both size1 and size2 would need to be implemented, and so on. The methods size and size1 can both be used to return the first dimension, with size taking precedence if both are present (but then why would you implement both?).
In addition to the size methods, we are also assuming that:
template <typename IT> IT ndims(void);
exists and returns the number of dimensions for an array. However, if ndims is not present then the value is inferred from the presence of the various size methods.
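The inference could be done with small detection traits, along the lines of the sketch below. This is an assumption about one workable approach rather than a description of the actual implementation; the trait names, infer_ndims and the three example classes are all hypothetical.

```cpp
#include <type_traits>
#include <utility>

// C++11-compatible equivalent of std::void_t.
template <typename...> struct make_void { using type = void; };
template <typename... Ts> using void_t = typename make_void<Ts...>::type;

// Detection traits: true when the named method can be called on a const A.
template <typename A, typename = void>
struct has_ndims : std::false_type {};
template <typename A>
struct has_ndims<A, void_t<decltype(std::declval<const A&>().ndims())>>
    : std::true_type {};

template <typename A, typename = void>
struct has_size2 : std::false_type {};
template <typename A>
struct has_size2<A, void_t<decltype(std::declval<const A&>().size2())>>
    : std::true_type {};

template <typename A, typename = void>
struct has_size3 : std::false_type {};
template <typename A>
struct has_size3<A, void_t<decltype(std::declval<const A&>().size3())>>
    : std::true_type {};

// ndims() is present: use it.
template <typename A>
int infer_ndims_impl(const A& a, std::true_type) {
  return static_cast<int>(a.ndims());
}
// ndims() is absent: infer from the highest size method that is present.
template <typename A>
int infer_ndims_impl(const A&, std::false_type) {
  return has_size3<A>::value ? 3 : (has_size2<A>::value ? 2 : 1);
}
template <typename A>
int infer_ndims(const A& a) {
  return infer_ndims_impl(a, has_ndims<A>());
}

// Hypothetical classes used to exercise the three cases.
struct OnlySize { int size() const { return 4; } };
struct Matrix {
  int size1() const { return 2; }
  int size2() const { return 5; }
};
struct Tensor { int ndims() const { return 3; } };
```

Here OnlySize is treated as one-dimensional, Matrix as two-dimensional because size2 is present, and Tensor reports its own value through ndims.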
As well as allowing any arguments that are array sizes (or inferable from an array size) to be dropped from the interface, assuming that arrays "know their own size" allows us to add additional runtime checks to ensure that the supplied arrays are the correct size. Because of this we currently assume that each array in an interface has the size methods implemented, even where it would be possible to infer all array size arguments from a subset of the array arguments.
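To make the idea concrete, here is a small sketch of a routine whose length argument has been dropped and replaced by a runtime consistency check. The routine dot, the choice of std::invalid_argument and the helper mismatch_is_rejected are all hypothetical; how errors will actually be reported is a separate decision.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical routine: dot product of two vectors. A traditional interface
// would also take the length n; here n is read from x and checked against y.
template <typename X, typename Y>
double dot(X&& x, Y&& y) {
  const std::size_t n = static_cast<std::size_t>(x.size());
  if (static_cast<std::size_t>(y.size()) != n) {
    throw std::invalid_argument("dot: x and y must have the same length");
  }
  const double* px = x.data();
  const double* py = y.data();
  double total = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    total += px[i] * py[i];
  }
  return total;
}

// Usage: returns true when a length mismatch is caught by the check.
inline bool mismatch_is_rejected() {
  std::vector<double> x{1.0, 2.0, 3.0};
  std::vector<double> y{4.0, 5.0};
  try {
    dot(x, y);
  } catch (const std::invalid_argument&) {
    return true;
  }
  return false;
}
```

Note that y has to provide size as well, even though n could be taken from x alone; that is what makes the check possible.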

Oversized Arrays

Some of the Engine APIs allow oversized arrays to be supplied. As an example, suppose that you have an m×n matrix A and rather than supplying A in an array with m rows and n columns you supply it in the top left-hand corner of an array with k rows and n columns, for k>m. In order for the NAG routine to access the elements of A correctly it needs to know not only the size of A but also the column stride, k (here we are assuming that the data is stored in column major order; a similar argument applies to data stored in row major order, just with the dimensions switched over). The main reason for allowing oversized arrays to be supplied is to allow for sub-arrays to be passed directly to a routine. However, there are very few cases where this is needed and the added complication of this, in terms of the interface and documentation, tends to outweigh the benefit.
We are currently assuming that no arrays are oversized, so in the above example we assume that the column stride is always the same as the number of rows.
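The role of the column stride can be seen in a few lines of code. In the sketch below (all names are hypothetical, and indices are zero-based), element (i, j) of a column major array lives at offset i + j*ld, where ld is the column stride. For an oversized array ld is k; under the assumption above it is simply m.

```cpp
#include <cstddef>
#include <vector>

// Element (i, j) of a column major matrix whose columns start ld entries
// apart in memory.
inline double element(const std::vector<double>& storage, std::size_t ld,
                      std::size_t i, std::size_t j) {
  return storage[i + j * ld];
}

// A = [1 2; 3 4] (m = 2, n = 2) held in the top of an array with k = 3 rows.
// The third entry of each column is padding that is not part of A.
inline std::vector<double> oversized_example() {
  return {1.0, 3.0, 0.0, 2.0, 4.0, 0.0};
}

// The same matrix stored without padding, so the column stride equals m.
inline std::vector<double> packed_example() {
  return {1.0, 3.0, 2.0, 4.0};
}
```

Reading the oversized array with ld = 3 and the packed array with ld = 2 gives the same elements of A; reading the oversized array with ld = 2 would not, which is why the stride would have to become an extra argument if oversized arrays were allowed.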

Allocation of Output Arrays

The standard interfaces to the NAG Library require a user to pre-allocate output arrays. Because arrays are templated and the C++ interface will be supplied as headers, we can relax this requirement if we assume that each class used as an output array argument has a resize method:
template <typename IT> void resize(IT n1);
template <typename IT> void resize(IT n1, IT n2);
template <typename IT> void resize(IT n1, IT n2, IT n3);
which resizes the memory returned by the data method to the dimensions given by the input arguments n1, n2 and n3. If no such method is implemented then it will be assumed that the output array was pre-allocated prior to calling the NAG routine.
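The "resize if you can, otherwise trust the caller" behaviour can again be sketched with overload resolution, shown here for the one-dimensional case. The names prepare_output and resize_example are hypothetical, and the size check in the fallback is an extra assumption of this sketch rather than something the text above commits to.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Preferred overload: only viable when a.resize(n) is a valid expression.
template <typename A>
auto prepare_output(A& a, std::size_t n, int)
    -> decltype(static_cast<void>(a.resize(n)), bool()) {
  a.resize(n);
  return true;  // the wrapper allocated the array
}

// Fallback: no resize method, so the array is taken to be pre-allocated.
// Verifying its size here is an assumption made for this sketch.
template <typename A>
bool prepare_output(A& a, std::size_t n, long) {
  if (static_cast<std::size_t>(a.size()) != n) {
    throw std::length_error("output array has the wrong size");
  }
  return false;  // nothing was allocated
}

// Entry point: the literal 0 is an int, so the resize overload is preferred.
template <typename A>
bool prepare_output(A& a, std::size_t n) {
  return prepare_output(a, n, 0);
}

// Usage: a std::vector is grown, a std::array is accepted as it stands.
inline bool resize_example() {
  std::vector<double> grows;      // starts empty
  std::array<double, 3> fixed{};  // cannot be resized
  const bool resized = prepare_output(grows, 3);
  const bool left_alone = !prepare_output(fixed, 3);
  return resized && left_alone && grows.size() == 3;
}
```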

Arrays in Callbacks

Whilst it is possible to allow arbitrary classes (with a handful of known methods) to be accepted as array arguments to the main routine interfaces, it is not possible to do this in callback interfaces (i.e. functions written by the user that are passed as arguments to the NAG routine). This is because the input arguments to callbacks come from the NAG Library Engine and are then passed to the user's calling program. At the point that this happens there is no information on the type of classes the user supplied when calling the main routine. We therefore need to implement a class to hold array data passed from the Engine to the user when calling a callback function. These classes have currently been given the very imaginative names nagcpp::utility::array1D_ref, nagcpp::utility::array2D_ref and nagcpp::utility::array3D_ref for one, two and three-dimensional arrays. Where appropriate these classes will implement the methods described above (with the exception of the resize method) and the () operator, which will allow the elements of the array to be accessed. For example, a(i,j,k) would access element a_ijk of a three-dimensional array a.
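To give a feel for what such a class might look like, here is a minimal two-dimensional sketch. It is not the nagcpp::utility::array2D_ref class itself: the name array2d_view, the constructor signature and the zero-based indexing are all assumptions made for the illustration, as is the callback sum_of_first_row.

```cpp
#include <cstddef>

// Hypothetical non-owning view over memory that belongs to someone else
// (in the real setting, the Engine). It never allocates or frees.
class array2d_view {
 public:
  array2d_view(double* raw, std::size_t n1, std::size_t n2,
               bool col_major = true)
      : raw_(raw), n1_(n1), n2_(n2), col_major_(col_major) {}

  // The methods described earlier in the article (no resize).
  double* data() const { return raw_; }
  std::size_t size1() const { return n1_; }
  std::size_t size2() const { return n2_; }
  std::size_t ndims() const { return 2; }
  bool is_col_major() const { return col_major_; }

  // Element access as a(i, j), using zero-based indices in this sketch.
  double& operator()(std::size_t i, std::size_t j) const {
    return col_major_ ? raw_[i + j * n1_] : raw_[i * n2_ + j];
  }

 private:
  double* raw_;
  std::size_t n1_;
  std::size_t n2_;
  bool col_major_;
};

// A user callback written against the known view type rather than against
// whatever class the user passed to the main routine.
inline double sum_of_first_row(const array2d_view& a) {
  double total = 0.0;
  for (std::size_t j = 0; j < a.size2(); ++j) {
    total += a(0, j);
  }
  return total;
}
```

Because the view holds only a pointer and the dimensions, writing through a(i, j) updates the underlying memory directly, which is what a callback that fills in output values would need.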

Summary

Our current plan for the C++ interfaces now includes:
  • Code will be supplied in header files.
  • All array arguments will be templated.
  • A class used to supply an array argument will be assumed (where appropriate) to have the following methods implemented:
    template <typename IT> IT size(void);
    template <typename IT> IT size1(void);
    template <typename IT> IT size2(void);
    template <typename IT> IT size3(void);
    template <typename IT> IT ndims(void);
    template <typename DT> DT data(void);
    bool is_col_major(void);
    template <typename IT> void resize(IT n1);
    template <typename IT> void resize(IT n1, IT n2);
    template <typename IT> void resize(IT n1, IT n2, IT n3);
    
  • Array arguments in callbacks will be type nagcpp::utility::array1D_ref, nagcpp::utility::array2D_ref and nagcpp::utility::array3D_ref.
Whilst this article has been mainly concerned with how we are going to handle array arguments, the next instalment will look at some of the other decisions we need to make: error handling, optional arguments and default values, amongst others.