Intel® Math Kernel Library 11.3 Update 4 Developer Guide

Environment Variables for the Hybrid Offload

The table below lists the Intel MKL environment variables that control runs of the Intel Optimized MP LINPACK Benchmark in hybrid offload mode. Each of these environment variables has a default value. Use them with hybrid builds of the benchmark.

Note

These environment variables affect the performance of the hybrid offload binaries but have no effect on other binaries. Conversely, the environment variables for Automatic Offload, listed in Automatic Offload Controls, do not affect the performance or behavior of the hybrid offload binaries.

Environment Variable

Description

Value

HPL_LARGEPAGE

Defines the memory mapping to be used for both the Intel Xeon processors and the Intel Xeon Phi coprocessors.

Value: 0 or 1:

  • 0 - normal memory mapping (default).

  • 1 - memory mapping with large pages (2 MB per page). This may increase performance.

HPL_LOG

Controls the level of detail for the HPL output.

Value: An integer ranging from 0 to 2:

  • 0 - no log is printed.

  • 1 - only one root node displays a log, identical to the log that the ASYOUGO option produces.

  • 2 - the most detailed log is displayed. All P root nodes in the processor column that owns the current column block display a log.

HPL_HOST_CORE, HPL_HOST_NODE

Specifies the host cores (HPL_HOST_CORE) or Non-Uniform Memory Access (NUMA) nodes (HPL_HOST_NODE) to be used.

HPL_HOST_NODE requires NUMA mode to be enabled. You can check whether it is enabled with the numactl --hardware command.

By default, the cores or NUMA nodes are detected automatically.

Value: A list of integers, each ranging from 0 to the largest core or NUMA node number in the cluster, separated as shown in example 3 (for example, 1-3,8-10).

HPL_SWAPWIDTH

Specifies the width of each swap operation.

Value: 16 or 24. The default is 24.

HPL_MIC_DEVICE

Specifies the Intel Xeon Phi coprocessors to be used. By default, all available Intel Xeon Phi coprocessors are used.

Note

To avoid oversubscription of resources that might occur if you use multiple MPI processes per node, set this environment variable to specify which coprocessors each MPI process should use.

Value: A comma-separated list of integers, each ranging from 0 to the largest Intel Xeon Phi coprocessor number on the node.

HPL_PNUMMICS

Specifies the number of Intel Xeon Phi coprocessors to be used. The HPL_MIC_DEVICE environment variable takes precedence over HPL_PNUMMICS: if you set HPL_MIC_DEVICE, the value of HPL_PNUMMICS is ignored.

By default, the number of coprocessors is detected automatically.

Value: An integer ranging from 0 to the number of Intel Xeon Phi coprocessors on the node. If the value is 0, the benchmark ignores all Intel Xeon Phi coprocessors.

HPL_MIC_CORE, HPL_MIC_NODE

Specifies which host CPU core is reserved for an Intel Xeon Phi coprocessor. Each Intel Xeon Phi coprocessor needs a dedicated CPU core. By setting these variables for a coprocessor, you reserve:

  • HPL_MIC_CORE - a specific core.

  • HPL_MIC_NODE - one of the cores on the specified NUMA node.

By default, HPL_MIC_CORE reserves an unspecified core, while HPL_MIC_NODE reserves a core on the same NUMA node as the coprocessor.

Value: An integer ranging from 0 to the largest core or NUMA node number for the coprocessor.

You can provide a comma-separated list, with each integer corresponding to one coprocessor.

HPL_MIC_NUMCORES

Specifies the number of cores to be used on an Intel Xeon Phi coprocessor. By default, all coprocessor cores are used, which gives the best performance.

Value: An integer ranging from 1 to the number of cores of the coprocessor.

HPL_MIC_SHAREMODE

Specifies whether and how an Intel Xeon Phi coprocessor is shared between two MPI processes.

See example 5 for details.

Value: An integer ranging from 0 to 2, or a comma-separated list of such integers with one value per coprocessor, as in example 5:

  • 0 - no sharing (default).

  • 1 - the MPI process uses the lower half of the coprocessor cores.

  • 2 - the MPI process uses the upper half of the coprocessor cores.

HPL_MIC_EXQUEUES

Specifies the queue size on an Intel Xeon Phi coprocessor. A larger value is typically better but increases memory consumption on the coprocessor. If you encounter out-of-memory errors, try a lower value.

Value: An integer ranging from 0 to 512. The default is 128.

HPL_MIC_WIDTH

Specifies the computation width for DGEMM/DTRSM on the Intel Xeon Phi coprocessor. If the coprocessor memory is insufficient, change the settings as follows:

  1. Reduce HPL_MIC_WIDTH to 16.

    Note

    This might reduce performance of the node.

  2. If the Intel Xeon Phi coprocessor still reports a memory allocation error:

    • Reduce the value of the HPL_MIC_EXQUEUES environment variable.

    • If the node has more than two Intel Xeon Phi coprocessors, double P, the number of rows in the processor grid.

  3. If memory allocation errors are still reported, keep reducing the problem size N until the errors are no longer reported.

Value: 16 or 24. The default is 24.
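Before a run, it can help to check the hybrid offload settings against the values listed in the table above. The following Python sketch is one way to do that. The function name check_hpl_env and the message wording are hypothetical and are not part of Intel MKL; the accepted values are transcribed from the table, and HPL_MIC_SHAREMODE is checked per coprocessor, as in example 5.

```python
import os
from typing import Dict, List, Set

# Documented values from the table above (this helper is hypothetical, not an MKL API).
ALLOWED: Dict[str, Set[int]] = {
    "HPL_LARGEPAGE": {0, 1},
    "HPL_LOG": {0, 1, 2},
    "HPL_SWAPWIDTH": {16, 24},
    "HPL_MIC_EXQUEUES": set(range(0, 513)),  # 0..512
    "HPL_MIC_WIDTH": {16, 24},
}


def check_hpl_env(env: Dict[str, str]) -> List[str]:
    """Return messages for HPL_* settings that fall outside the documented values."""
    messages: List[str] = []
    for name, allowed in ALLOWED.items():
        if name not in env:
            continue  # unset: the benchmark uses its default value
        try:
            value = int(env[name])
        except ValueError:
            messages.append(f"{name}={env[name]!r} is not an integer")
            continue
        if value not in allowed:
            messages.append(f"{name}={value} is outside the documented values")

    # HPL_MIC_SHAREMODE may hold one value (0, 1, or 2) per coprocessor, as in example 5.
    if "HPL_MIC_SHAREMODE" in env:
        for item in env["HPL_MIC_SHAREMODE"].split(","):
            if item.strip() not in {"0", "1", "2"}:
                messages.append(f"HPL_MIC_SHAREMODE entry {item!r} is not 0, 1, or 2")

    # HPL_MIC_DEVICE takes precedence, so HPL_PNUMMICS would be ignored.
    if "HPL_MIC_DEVICE" in env and "HPL_PNUMMICS" in env:
        messages.append("HPL_PNUMMICS is ignored because HPL_MIC_DEVICE is set")
    return messages


if __name__ == "__main__":
    # Check the settings of the current shell environment.
    for message in check_hpl_env(dict(os.environ)):
        print(message)
```

For example, check_hpl_env({"HPL_MIC_WIDTH": "32"}) reports that HPL_MIC_WIDTH is outside the documented values, and setting both HPL_MIC_DEVICE and HPL_PNUMMICS produces a reminder that HPL_PNUMMICS is ignored.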

You can set the HPL environment variables for each MPI process based on the PMI_RANK and PMI_SIZE environment variables of the Intel MPI Library, and you can automate this with a shell script.
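As an illustration, the following Python sketch plays the role of such a script. It assumes two MPI processes per node, ranks placed on each node in consecutive pairs, and three coprocessors per node. With an odd count, the middle coprocessor is split between the two processes with HPL_MIC_SHAREMODE, following the pattern of example 5. The constants, the function name hybrid_env_for_rank, and the ./xhpl path are assumptions to adapt to your cluster.

```python
import os
import sys
from typing import Dict

# Assumptions for illustration: two MPI processes per node, consecutive ranks
# share a node, and each node has three Intel Xeon Phi coprocessors.
RANKS_PER_NODE = 2
MICS_PER_NODE = 3


def hybrid_env_for_rank(rank: int, mics_per_node: int = MICS_PER_NODE) -> Dict[str, str]:
    """Split a node's coprocessors between the two MPI processes on that node."""
    if mics_per_node == 0:
        return {"HPL_PNUMMICS": "0"}  # run as regular HPL (example 2)
    local = rank % RANKS_PER_NODE  # position of this process on its node
    half = mics_per_node // 2
    if mics_per_node % 2 == 0:
        # Even count: each process gets its own half of the coprocessors.
        devices = list(range(local * half, (local + 1) * half))
        modes = [0] * half
    elif local == 0:
        # Odd count: the first process takes the lower devices plus the upper
        # half of the cores of the shared middle coprocessor (as in example 5).
        devices = list(range(0, half + 1))
        modes = [0] * half + [2]
    else:
        # The second process takes the lower half of the shared coprocessor's
        # cores plus the remaining devices.
        devices = list(range(half, mics_per_node))
        modes = [1] + [0] * (mics_per_node - half - 1)
    return {
        "HPL_MIC_DEVICE": ",".join(str(d) for d in devices),
        "HPL_MIC_SHAREMODE": ",".join(str(m) for m in modes),
    }


def main() -> None:
    if "PMI_RANK" not in os.environ:
        # Not started by the Intel MPI launcher: only show the planned settings.
        for rank in range(RANKS_PER_NODE):
            print(rank, hybrid_env_for_rank(rank))
        return
    rank = int(os.environ["PMI_RANK"])
    size = int(os.environ.get("PMI_SIZE", "1"))
    if size % RANKS_PER_NODE != 0:
        sys.exit(f"PMI_SIZE={size} is not a multiple of {RANKS_PER_NODE}")
    env = {**os.environ, **hybrid_env_for_rank(rank)}
    # Replace this process with the benchmark binary (the path is an assumption).
    os.execvpe("./xhpl", ["./xhpl"] + sys.argv[1:], env)


if __name__ == "__main__":
    main()
```

You might launch it with something like mpirun -n 4 -ppn 2 python3 hybrid_wrapper.py, where the file name is hypothetical. The first process on each node then gets HPL_MIC_DEVICE=0,1 and HPL_MIC_SHAREMODE=0,2 (the settings of example 5), and the second gets HPL_MIC_DEVICE=1,2 and HPL_MIC_SHAREMODE=1,0.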

Examples of Environment Settings for Hybrid Offload

#

Settings

Behavior of the Intel Optimized MP LINPACK Benchmark

1

Nothing specified

xhpl uses all Intel Xeon processors and all Intel Xeon Phi coprocessors in the cluster.

2

HPL_PNUMMICS=0

xhpl ignores Intel Xeon Phi coprocessors and runs as regular HPL.

3

HPL_MIC_DEVICE=0,2

HPL_HOST_CORE=1-3,8-10

Only Intel Xeon Phi coprocessors 0 and 2 and Intel Xeon processor cores 1, 2, 3, 8, 9, and 10 are used.

4

HPL_HOST_NODE=1

Only cores on NUMA node 1 are used.

5

HPL_MIC_DEVICE=0,1

HPL_MIC_SHAREMODE=0,2

Only Intel Xeon Phi coprocessors 0 and 1 are used:

  • On coprocessor 0, all cores are used.

  • On coprocessor 1, the upper half of the cores is used.

    For a 61-core Intel Xeon Phi coprocessor, the upper half includes cores 31-61.

    This setting is useful for sharing an Intel Xeon Phi coprocessor between two MPI processes when there is an odd number of Intel Xeon Phi coprocessors.
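Example 3 implies that HPL_HOST_CORE accepts comma-separated items in which a-b denotes an inclusive range. The following Python sketch expands such a value into individual core numbers, which can help confirm which cores a setting selects. The function name expand_core_list is hypothetical, and this is not the parser that xhpl itself uses.

```python
from typing import List


def expand_core_list(spec: str) -> List[int]:
    """Expand a value such as "1-3,8-10" into individual core numbers.

    Items are separated by commas; an item "a-b" denotes the inclusive range a..b.
    """
    cores: List[int] = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        if "-" in item:
            start, end = (int(part) for part in item.split("-", 1))
            if start > end:
                raise ValueError(f"descending range {item!r}")
            cores.extend(range(start, end + 1))
        else:
            cores.append(int(item))
    return cores


print(expand_core_list("1-3,8-10"))  # [1, 2, 3, 8, 9, 10]
```

The output matches the cores that example 3 lists for HPL_HOST_CORE=1-3,8-10.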

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

See Also