NAG Fortran Library, Mark 26, Multithreaded

FSLM626DCL - License Managed

Intel Xeon Phi 7200 Series (Knights Landing), Linux, 64-bit, Intel Fortran (32-bit integers), Double Precision

Users' Note


1. Introduction

This document is essential reading for every user of the NAG Fortran Library implementation specified in the title. It provides implementation-specific detail that augments the information provided in the NAG Mark 26 Library Manual (which we will refer to as the Library Manual). Wherever that manual refers to the "Users' Note for your implementation", you should consult this note.

In addition, NAG recommends that before calling any Library routine you should read the following reference material from the Library Manual (see Section 5):

(a) How to Use the NAG Library and its Documentation
(b) Chapter Introduction
(c) Routine Document

2. Supplementary Information

Please check the following URL:

for details of any new information related to the applicability or usage of this implementation.

3. General Information

This implementation of the NAG Fortran Library provides static and shareable libraries that use Intel MKL (Intel® Math Kernel Library for Linux), a third-party vendor performance library, to provide Basic Linear Algebra Subprograms (BLAS) and Linear Algebra PACKage (LAPACK) routines (except for any routines listed in Section 4). It also provides static and shareable libraries that use the NAG versions of these routines (referred to as the self-contained libraries). This implementation has been tested with MKL version 11.3.3, and this version of MKL is supplied as a part of this product. Please see the Intel web site for further information about MKL. For best performance, we recommend that you use one of the variants of the NAG Fortran Library based on the supplied MKL (for example libnag_mkl.a) in preference to one of the self-contained NAG libraries (for example libnag_nag.a).

If you intend to use the NAG Library within a multithreaded application, please refer to Section 3.12.1 of the document How to Use the NAG Library and its Documentation for more information. Further information about using the supplied Intel MKL libraries with threaded applications is available from the Intel web site.

The libraries supplied with this implementation have been compiled with OpenMP. However, the OpenMP runtime libraries of different compilers may not be compatible, so you are recommended to use this implementation in conjunction with your own OpenMP code (including any OpenMP statements required in the user-supplied functions of the routines listed in Section 4) only when using the compiler and corresponding OpenMP runtime listed in the Installer's Note. Note that the system's default thread stack size may not be sufficient for running all NAG Fortran Library routines within multithreaded applications; you may increase this stack size using the OpenMP environment variable OMP_STACKSIZE.
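
As a minimal sketch of the stack size adjustment mentioned above, the following Bourne shell lines set OMP_STACKSIZE before a program is run. The value 64M and the executable name ./driver are illustrative assumptions, not NAG recommendations; a suitable size depends on the routines called and the problem size.

```shell
# Request a 64 MB stack for each thread created by the OpenMP runtime.
# The size 64M is an illustrative assumption; tune it for your application.
OMP_STACKSIZE=64M
export OMP_STACKSIZE

# Then run your application as usual, e.g. (hypothetical executable name):
# ./driver
```

Note that OMP_STACKSIZE applies to threads created by the OpenMP runtime; the stack of the initial thread is governed by the shell's own limit (see ulimit -s).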

Intel have introduced a conditional bitwise reproducibility (BWR) option in MKL. Provided a user's code adheres to certain conditions (see the MKL documentation for details), BWR can be forced by setting the MKL_CBWR environment variable. It should be noted, however, that many NAG routines do not adhere to these conditions. This means that for a given NAG library built on top of MKL, it may not be possible to ensure BWR for all NAG routines across different CPU architectures by setting MKL_CBWR. See Section 3.11.1 of the document How to Use the NAG Library and its Documentation for more general information on bitwise reproducibility.
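
For illustration only, MKL_CBWR could be set in the Bourne shell as sketched below. The setting COMPATIBLE is one of the values described in the MKL documentation; whether it is the right choice for your system is an assumption to check there, and, as noted above, it does not guarantee BWR for all NAG routines.

```shell
# Ask MKL for conditional bitwise reproducibility.
# COMPATIBLE is one documented MKL_CBWR setting; consult the MKL
# documentation for the settings supported by your MKL version.
MKL_CBWR=COMPATIBLE
export MKL_CBWR
```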

Please note that this implementation is not compatible with versions of MKL earlier than 10.3.

3.1. Accessing the Library

In this section we assume that the Library has been installed in the directory [INSTALL_DIR].

By default [INSTALL_DIR] (see Installer's Note (in.html)) is $HOME/NAG/fslm626dcl; however it could have been changed by the person who did the installation, in which case you should consult that person. To use the NAG Fortran Library and the supplied Intel MKL libraries, you may link in the following manner:

  ifort -qopenmp -fp-model precise -I[INSTALL_DIR]/nag_interface_blocks driver.f90 \
      [INSTALL_DIR]/lib/libnag_mkl.a -Wl,--start-group [INSTALL_DIR]/mkl_intel64_11.3.3/lib/intel64/libmkl_intel_lp64.a [INSTALL_DIR]/mkl_intel64_11.3.3/lib/intel64/libmkl_intel_thread.a [INSTALL_DIR]/mkl_intel64_11.3.3/lib/intel64/libmkl_core.a -Wl,--end-group [INSTALL_DIR]/rtl/intel64/libiomp5.a -lpthread -lm -ldl -lrt -lstdc++
where driver.f90 is your application program; or
  ifort -qopenmp -fp-model precise -I[INSTALL_DIR]/nag_interface_blocks driver.f90 \
      [INSTALL_DIR]/lib/ -L[INSTALL_DIR]/mkl_intel64_11.3.3/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -L[INSTALL_DIR]/rtl/intel64 -liomp5 -lpthread -lm -ldl -lrt
if the shareable library is required.

However, if you prefer to link to a version of the NAG Fortran Library which does not require the use of MKL you may wish to use the self-contained libraries as follows:

  ifort -qopenmp -fp-model precise -I[INSTALL_DIR]/nag_interface_blocks driver.f90 \
      [INSTALL_DIR]/lib/libnag_nag.a -lrt -lstdc++
if the static library is required, or
  ifort -qopenmp -fp-model precise -I[INSTALL_DIR]/nag_interface_blocks driver.f90 \
      [INSTALL_DIR]/lib/ -lrt
if the shareable library is required. Please note the shareable libraries are fully resolved so that, as long as the environment variable LD_LIBRARY_PATH is set correctly at link time (see below), you need not link against other run-time libraries explicitly.

If your application has been linked with the shareable NAG and MKL libraries then the environment variable LD_LIBRARY_PATH must be set or extended, as follows, to allow run-time linkage.

In the C shell, type:

  setenv LD_LIBRARY_PATH [INSTALL_DIR]/lib:[INSTALL_DIR]/mkl_intel64_11.3.3/lib/intel64
to set LD_LIBRARY_PATH, or
  setenv LD_LIBRARY_PATH [INSTALL_DIR]/lib:[INSTALL_DIR]/mkl_intel64_11.3.3/lib/intel64:${LD_LIBRARY_PATH}
to extend LD_LIBRARY_PATH if you already have it set.

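In the Bourne shell, type:

  LD_LIBRARY_PATH=[INSTALL_DIR]/lib:[INSTALL_DIR]/mkl_intel64_11.3.3/lib/intel64
  export LD_LIBRARY_PATH
to set LD_LIBRARY_PATH, or
  LD_LIBRARY_PATH=[INSTALL_DIR]/lib:[INSTALL_DIR]/mkl_intel64_11.3.3/lib/intel64:${LD_LIBRARY_PATH}
  export LD_LIBRARY_PATH
to extend LD_LIBRARY_PATH if you already have it set.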

Note that you may also need to set LD_LIBRARY_PATH to point at other items such as compiler run-time libraries, for example if you are using a newer version of the compiler.
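
The choice between setting and extending LD_LIBRARY_PATH can also be made automatically. The following Bourne shell sketch does this with a small helper; the function name prepend_ld_path is hypothetical, and INSTALL_DIR is assumed to default to the location given above.

```shell
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH, setting the
# variable if it is empty or unset and extending it otherwise.
prepend_ld_path() {
    if [ -n "${LD_LIBRARY_PATH:-}" ]; then
        LD_LIBRARY_PATH="$1:$LD_LIBRARY_PATH"
    else
        LD_LIBRARY_PATH="$1"
    fi
    export LD_LIBRARY_PATH
}

# Assumed default installation directory; adjust it if the Library was
# installed elsewhere.
INSTALL_DIR=${INSTALL_DIR:-$HOME/NAG/fslm626dcl}

# Prepend in reverse order so that the NAG lib directory is searched first.
prepend_ld_path "$INSTALL_DIR/mkl_intel64_11.3.3/lib/intel64"
prepend_ld_path "$INSTALL_DIR/lib"
```

Directories holding compiler run-time libraries can be added with further calls to the same helper.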

If you are using a different compiler, or indeed a different version of the Intel compiler, you may need to link against the Intel compiler run-time libraries provided in [INSTALL_DIR]/rtl.

3.1.1. Setting the number of threads to use

This implementation of the NAG Library, and MKL, make use of OpenMP to implement threading in some of the library routines. The number of threads that will be used at run time can be controlled by setting the environment variable OMP_NUM_THREADS to the appropriate value.

In the C shell, type:

  setenv OMP_NUM_THREADS N
In the Bourne shell, type:

  OMP_NUM_THREADS=N
  export OMP_NUM_THREADS
where N is the number of threads required. The environment variable OMP_NUM_THREADS may be re-set between each execution of the program, as desired. If you wish to change the number of threads to use for different parts of your program during execution, routines are provided in Chapter X06 of the NAG Library to assist with this process.

Multiple levels of OpenMP parallelism may be present in some NAG Library and MKL routines, and you may also call these multithreaded routines from within an OpenMP parallel region in your own application. By default, OpenMP nested parallelism is disabled, so only the outermost parallel region will actually be active, using N threads in the example above. The inner level(s) will not be active, i.e. they will run on one thread. You can check if OpenMP nested parallelism is enabled and choose to enable/disable it by either querying and setting the OMP_NESTED OpenMP environment variable or using the appropriate routines in Chapter X06. If OpenMP nested parallelism is enabled, the above example will create N threads at each parallel region for each thread at a higher level, thus N*N threads in total if there are two levels of OpenMP parallelism, etc. To provide more detailed control of nested parallelism, the environment variable OMP_NUM_THREADS can be set to be a comma separated list to specify the number of threads desired at each level.

In the C shell, type:

  setenv OMP_NUM_THREADS N,P
In the Bourne shell, type:

  OMP_NUM_THREADS=N,P
  export OMP_NUM_THREADS
This will create N threads for the first level of parallelism, and then P threads for each outer level thread when an inner level of parallelism is encountered.
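
To illustrate the thread counts implied by a per-level list, the following Bourne shell sketch multiplies the entries of a comma-separated OMP_NUM_THREADS value. The function name omp_total_threads is hypothetical and the values 4,2 are purely illustrative; nested parallelism must also be enabled for the inner level to be active.

```shell
# Hypothetical helper: multiply the per-level thread counts in a
# comma-separated list such as "4,2", giving the number of threads active
# at the innermost level when nesting is enabled. The body runs in a
# subshell so that the caller's IFS is left untouched.
omp_total_threads() (
    IFS=,
    total=1
    for n in $1; do
        total=$((total * n))
    done
    echo "$total"
)

# Illustrative settings: 4 threads at the outer level, 2 at the inner level.
OMP_NESTED=true
OMP_NUM_THREADS=4,2
export OMP_NESTED OMP_NUM_THREADS

omp_total_threads "$OMP_NUM_THREADS"   # prints 8
```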

Note: If the environment variable OMP_NUM_THREADS is not set, the default value can vary from compiler to compiler and between vendor libraries; it is usually either 1 or the maximum number of cores available on your system. The latter could be an issue if you are sharing the system with other users, or are running a higher level of parallelism within your own application. Thus it is recommended that you always set OMP_NUM_THREADS explicitly to your desired value.

In general, the maximum number of threads you are recommended to use is the number of physical cores on your shared memory system. However, most Intel processors support a facility known as Hyperthreading, which allows each physical core to support up to two threads at the same time and thus appear to the operating system as two logical cores. It may be beneficial to make use of this functionality, but this choice will depend on the particular algorithms and problem size(s) used. You are advised to benchmark performance critical applications with and without making use of the additional logical cores, to determine the best choice for you. This can normally be achieved simply by an appropriate choice for the number of threads to use, via OMP_NUM_THREADS. Completely disabling Hyperthreading normally requires setting the desired choice in the BIOS on your system at boot time.
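
One possible way to apply this advice on Linux is sketched below: count the physical cores and use that figure as the thread count when none has been chosen. The function name physical_cores is hypothetical, and the sketch assumes the lscpu utility from util-linux is installed; if it is not, the count falls back to the number of online logical processors, which may include Hyperthreading siblings.

```shell
# Hypothetical helper: print the number of physical cores.
physical_cores() {
    count=""
    if command -v lscpu >/dev/null 2>&1; then
        # One line per logical CPU as "core,socket"; unique pairs are cores.
        count=$(lscpu --parse=CORE,SOCKET 2>/dev/null | grep -v '^#' | sort -u | wc -l)
    fi
    # Strip padding that some wc implementations emit.
    count=$(echo "$count" | tr -d ' ')
    if [ -z "$count" ] || [ "$count" -lt 1 ]; then
        # Fallback: online logical processors.
        count=$(getconf _NPROCESSORS_ONLN)
    fi
    echo "$count"
}

# Set OMP_NUM_THREADS only if a value has not already been chosen.
: "${OMP_NUM_THREADS:=$(physical_cores)}"
export OMP_NUM_THREADS
```

Benchmarking with larger values of OMP_NUM_THREADS then shows whether the additional logical cores help for a given application.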

3.1.2. Calling the Library from C or C++

With care, the NAG Fortran Library may be used from within a C or C++ environment. To help users map between Fortran and C types, a C/C++ header file ([INSTALL_DIR]/c_headers/nagmk26.h) is provided. It is recommended that users wishing to use a Library routine either copy and paste the relevant section of the file into their C or C++ application (making sure that the relevant #defines etc. are also copied from the top of the file) or simply include the header file with their application.

A document, techdoc.html, giving advice on calling the NAG Fortran Library from C and C++ is also available in [INSTALL_DIR]/c_headers.

3.2. Interface Blocks

The NAG Fortran Library interface blocks define the type and arguments of each user callable NAG Fortran Library routine. These are not essential to calling the NAG Fortran Library from Fortran programs. However, they are required if the supplied examples are used. Their purpose is to allow the Fortran compiler to check that NAG Fortran Library routines are called correctly. The interface blocks enable the compiler to check that:

(a) subroutines are called as such;
(b) functions are declared with the right type;
(c) the correct number of arguments are passed; and
(d) all arguments match in type and structure.

The NAG Fortran Library interface block files are organised by Library chapter. They are aggregated into one module named

  nag_library

The modules are supplied in pre-compiled form (.mod files) for the Intel compiler. They can be accessed by specifying the -Ipathname option on each compiler invocation, where pathname ([INSTALL_DIR]/nag_interface_blocks) is the path of the directory containing the compiled interface blocks.

The .mod module files were compiled with the compiler shown in Section 2.2 of the Installer's Note. Such module files are compiler-dependent, so if you wish to use the NAG example programs, or use the interface blocks in your own programs, when using a compiler that is incompatible with these modules, you will first need to recompile the interface blocks with your own compiler version. A recompiled set of interface blocks can be created in a separate directory (e.g. nag_interface_blocks_alt) using the supplied script command

  [INSTALL_DIR]/scripts/nag_recompile_mods nag_interface_blocks_alt
from the [INSTALL_DIR] directory. This script uses the version of the Intel Fortran compiler from your PATH environment; to specify an alternative version it is safest to first run any Intel Fortran compiler environment scripts for that version prior to running [INSTALL_DIR]/scripts/nag_recompile_mods.

To make the new set of compiled modules the default set, move the directory [INSTALL_DIR]/nag_interface_blocks to [INSTALL_DIR]/nag_interface_blocks_original, and then move the directory containing the new set of modules [INSTALL_DIR]/nag_interface_blocks_alt to [INSTALL_DIR]/nag_interface_blocks.

You should now be able to use the newly compiled module files in the usual way.
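
The two directory moves described above could be scripted along the following lines. This is a sketch only: the function name swap_interface_blocks is hypothetical, and the checks it performs before moving anything are a suggested precaution rather than part of the supplied scripts.

```shell
# Hypothetical helper: make a recompiled set of interface blocks the default.
# $1 is the installation directory ([INSTALL_DIR] in the text above).
swap_interface_blocks() {
    dir=$1
    # Refuse to proceed unless both directories exist and no earlier
    # backup would be overwritten.
    [ -d "$dir/nag_interface_blocks" ] || return 1
    [ -d "$dir/nag_interface_blocks_alt" ] || return 1
    [ ! -e "$dir/nag_interface_blocks_original" ] || return 1

    # Keep the supplied modules, then promote the recompiled set.
    mv "$dir/nag_interface_blocks" "$dir/nag_interface_blocks_original" &&
    mv "$dir/nag_interface_blocks_alt" "$dir/nag_interface_blocks"
}

# Example call (commented out; substitute your installation directory):
# swap_interface_blocks "$HOME/NAG/fslm626dcl"
```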

3.3. Example Programs

The example results distributed were generated at Mark 26, using the software described in Section 2.2 of the Installer's Note. These example results may not be exactly reproducible if the example programs are run in a slightly different environment (for example, a different compiler, a different compiler runtime library, or a different set of BLAS or LAPACK routines). The results which are most sensitive to such differences are: eigenvectors (which may differ by a scalar multiple, often -1, but sometimes complex); numbers of iterations and function evaluations; and residuals and other "small" quantities of the same order as the machine precision.

The distributed example results are those obtained with the static library libnag_mkl.a (i.e. using the MKL BLAS and LAPACK routines). Running the examples with NAG BLAS or LAPACK may give slightly different results.

Note that the example material has been adapted, if necessary, from that published in the Library Manual, so that programs are suitable for execution with this implementation with no further changes. The distributed example programs should be used in preference to the versions in the Library Manual wherever possible. The example programs are most easily accessed by using one of the nag_example* scripts (for example nag_example), which are located in the directory [INSTALL_DIR]/scripts.

Each command will provide you with a copy of an example program (and its data and options file, if any), compile the program and link it with the appropriate libraries (showing you the compile command so that you can recompile your own version of the program). Finally, the executable program will be run (with appropriate arguments specifying data, options and results files as needed), with the results being sent to a file and to the command window.

The example program concerned, and the number of OpenMP threads to use, are specified by the arguments to the command, e.g.

  nag_example e04nrf 4
will copy the example program and its data and options files (e04nrfe.f90, e04nrfe.d and e04nrfe.opt) into the current directory, compile and link the program and run it using 4 OpenMP threads to produce the example program results in the file e04nrfe.r.

3.4. Fortran Types and Interpretation of Bold Italicised Terms

This implementation of the NAG Fortran Library uses 32-bit integers.

The NAG Library and documentation use parameterized types for floating-point variables. Thus, the type

  REAL (KIND=nag_wp)
appears in documentation of all NAG Fortran Library routines, where nag_wp is a Fortran KIND parameter. The value of nag_wp will vary between implementations, and its value can be obtained by use of the nag_library module. We refer to the type nag_wp as the NAG Library "working precision" type, because most floating-point arguments and internal variables used in the Library are of this type.

In addition, a small number of routines use the type

  REAL (KIND=nag_rp)
where nag_rp stands for "reduced precision type". Another type, not currently used in the Library, is reserved for "higher precision type" or "additional precision type".

For correct use of these types, see almost any of the example programs distributed with the Library.

For this implementation, these types have the following meanings:

      REAL (kind=nag_rp)      means REAL (i.e. single precision)
      REAL (kind=nag_wp)      means DOUBLE PRECISION
      COMPLEX (kind=nag_rp)   means COMPLEX (i.e. single precision complex)
      COMPLEX (kind=nag_wp)   means double precision complex (e.g. COMPLEX*16)

In addition, the Manual has adopted a convention of using bold italics to distinguish some terms. See Section 4.4 of How to Use the NAG Library and its Documentation for details.

3.5. Maintenance Level

The maintenance level of the Library can be determined by compiling and executing the example that calls A00AAF, or you could call one of the nag_example* scripts with the argument a00aaf. See Section 3.3. This example prints out details of the implementation, including title and product code, compiler and precision used, mark and maintenance level.

4. Routine-specific Information

Any further information which applies to one or more routines in this implementation is listed below.
  1. Routines that call User Functions within OpenMP Parallel Regions

    In this implementation, the following routines make calls to user functions from within OpenMP parallel regions located inside the NAG routines:

     D03RAF  D03RBF  E05SAF  E05SBF  E05UCF  E05USF  F01ELF  F01EMF
     F01FLF  F01FMF  F01JBF  F01JCF  F01KBF  F01KCF  

    Thus orphaned OpenMP directives can be used in user functions, provided you are using the same compiler as that used to build your NAG Library implementation, as listed in the Installer's Note. You must also ensure that you use the user workspace arrays IUSER and RUSER in a thread-safe manner, which is best achieved by only using them to supply read-only data to the user functions.

  2. C06

    In this implementation calls to the Intel Discrete Fourier Transforms Interface (DFTI) routines, from the supplied MKL library, are made whenever possible in the following NAG routines:
     C06PAF  C06PCF  C06PFF  C06PJF  C06PKF  C06PPF  C06PQF  C06PRF
     C06PSF  C06PUF  C06PVF  C06PWF  C06PXF  C06PYF  C06PZF  C06RAF  
     C06RBF  C06RCF  C06RDF
    The Intel DFTI routines allocate their own workspace internally, so no changes are needed to the size of workspace array WORK passed to the NAG C06 routines listed above from that specified in their respective library documents.
  3. F06, F07, F08 and F16

    In Chapters F06, F07, F08 and F16, alternate routine names are available for BLAS and LAPACK derived routines. For details of the alternate routine names please refer to the relevant Chapter Introduction. Note that applications should reference routines by their BLAS/LAPACK names, rather than their NAG-style names, for optimum performance.

    Many LAPACK routines have a "workspace query" mechanism which allows a caller to interrogate the routine to determine how much workspace to supply. Note that LAPACK routines from the MKL library may require a different amount of workspace from the equivalent NAG versions of these routines. Care should be taken when using the workspace query mechanism.

    In this implementation calls to BLAS and LAPACK routines are implemented by calls to MKL, except for the following routines:


    The following NAG named routines are wrappers to call LAPACK routines from the vendor library:
  4. G02

    The value of ACC, the machine-dependent constant mentioned in several documents in the chapter, is 1.0D-13.
  5. P01

    On hard failure, P01ABF writes the error message to the error message unit specified by X04AAF and then stops.
  6. S07 - S21

    The behaviour of functions in these Chapters may depend on implementation-specific values.

    General details are given in the Library Manual, but the specific values used in this implementation are as follows:

    S07AAF  F_1 = 1.0E+13
            F_2 = 1.0E-14
    S10AAF  E_1 = 1.8715E+1
    S10ABF  E_1 = 7.080E+2
    S10ACF  E_1 = 7.080E+2
    S13AAF  x_hi = 7.083E+2
    S13ACF  x_hi = 1.0E+16
    S13ADF  x_hi = 1.0E+17
    S14AAF  IFAIL = 1 if X > 1.70E+2
            IFAIL = 2 if X < -1.70E+2
            IFAIL = 3 if abs(X) < 2.23E-308
    S14ABF  IFAIL = 2 if X > x_big = 2.55E+305
    S15ADF  x_hi = 2.65E+1
    S15AEF  x_hi = 2.65E+1
    S15AGF  IFAIL = 1 if X >= 2.53E+307
            IFAIL = 2 if 4.74E+7 <= X < 2.53E+307
            IFAIL = 3 if X < -2.66E+1
    S17ACF  IFAIL = 1 if X > 1.0E+16
    S17ADF  IFAIL = 1 if X > 1.0E+16
            IFAIL = 3 if 0 < X <= 2.23E-308
    S17AEF  IFAIL = 1 if abs(X) > 1.0E+16
    S17AFF  IFAIL = 1 if abs(X) > 1.0E+16
    S17AGF  IFAIL = 1 if X > 1.038E+2
            IFAIL = 2 if X < -5.7E+10
    S17AHF  IFAIL = 1 if X > 1.041E+2
            IFAIL = 2 if X < -5.7E+10
    S17AJF  IFAIL = 1 if X > 1.041E+2
            IFAIL = 2 if X < -1.9E+9
    S17AKF  IFAIL = 1 if X > 1.041E+2
            IFAIL = 2 if X < -1.9E+9
    S17DCF  IFAIL = 2 if abs(Z) < 3.92223E-305
            IFAIL = 4 if abs(Z) or FNU+N-1 > 3.27679E+4
            IFAIL = 5 if abs(Z) or FNU+N-1 > 1.07374E+9
    S17DEF  IFAIL = 2 if AIMAG(Z) > 7.00921E+2
            IFAIL = 3 if abs(Z) or FNU+N-1 > 3.27679E+4
            IFAIL = 4 if abs(Z) or FNU+N-1 > 1.07374E+9
    S17DGF  IFAIL = 3 if abs(Z) > 1.02399E+3
            IFAIL = 4 if abs(Z) > 1.04857E+6
    S17DHF  IFAIL = 3 if abs(Z) > 1.02399E+3
            IFAIL = 4 if abs(Z) > 1.04857E+6
    S17DLF  IFAIL = 2 if abs(Z) < 3.92223E-305
            IFAIL = 4 if abs(Z) or FNU+N-1 > 3.27679E+4
            IFAIL = 5 if abs(Z) or FNU+N-1 > 1.07374E+9
    S18ADF  IFAIL = 2 if 0 < X <= 2.23E-308
    S18AEF  IFAIL = 1 if abs(X) > 7.116E+2
    S18AFF  IFAIL = 1 if abs(X) > 7.116E+2
    S18DCF  IFAIL = 2 if abs(Z) < 3.92223E-305
            IFAIL = 4 if abs(Z) or FNU+N-1 > 3.27679E+4
            IFAIL = 5 if abs(Z) or FNU+N-1 > 1.07374E+9
    S18DEF  IFAIL = 2 if REAL(Z) > 7.00921E+2
            IFAIL = 3 if abs(Z) or FNU+N-1 > 3.27679E+4
            IFAIL = 4 if abs(Z) or FNU+N-1 > 1.07374E+9
    S19AAF  IFAIL = 1 if abs(X) >= 5.04818E+1
    S19ABF  IFAIL = 1 if abs(X) >= 5.04818E+1
    S19ACF  IFAIL = 1 if X > 9.9726E+2
    S19ADF  IFAIL = 1 if X > 9.9726E+2
    S21BCF  IFAIL = 3 if an argument < 1.583E-205
            IFAIL = 4 if an argument >= 3.765E+202
    S21BDF  IFAIL = 3 if an argument < 2.813E-103
            IFAIL = 4 if an argument >= 1.407E+102
  7. X01

    The values of the mathematical constants are:

    X01AAF (pi) = 3.1415926535897932
    X01ABF (gamma) = 0.5772156649015328
  8. X02

    The values of the machine constants are:

    The basic parameters of the model

    X02BHF   = 2
    X02BJF   = 53
    X02BKF   = -1021
    X02BLF   = 1024

    Derived parameters of the floating-point arithmetic

    X02AJF   = 1.11022302462516E-16
    X02AKF   = 2.22507385850721E-308
    X02ALF   = 1.79769313486231E+308
    X02AMF   = 2.22507385850721E-308
    X02ANF   = 2.22507385850721E-308

    Parameters of other aspects of the computing environment

    X02AHF   = 1.42724769270596E+45
    X02BBF   = 2147483647
    X02BEF   = 15
  9. X04

    The default output units for error and advisory messages for those routines which can produce explicit output are both Fortran Unit 6.
  10. X06

    The Chapter X06 routines also change the behaviour of MKL threading in this implementation of the Library.

5. Documentation

The Library Manual is available as part of the installation or via download from the NAG website. The most up-to-date version of the documentation is accessible via the NAG website.

The Library Manual is supplied in the following formats:

The following main index files have been provided for these formats:

Use your web browser to navigate from here. For convenience, a master index file containing links to the above files has been provided at

Advice on viewing and navigating the formats available can be found in

In addition the following are provided:

Please see the Intel web site for further information about MKL.

6. Support from NAG

Please see

for information about the NAG Technical Support Service, including details of the NAG Technical Support Service contact points. We would also be delighted to receive your feedback on NAG's products and services.

7. Contact Addresses

Please see

for worldwide contact details for the Numerical Algorithms Group.