Environment Variables for the Hybrid Offload

The table below lists Intel MKL environment variables to control runs of the Intel Optimized MP LINPACK Benchmark in the hybrid offload mode. Each of these environment variables has a default value. Use them with hybrid builds of the benchmark.

Note

While these environment variables impact performance of the hybrid offload binaries, they do not impact performance of other binaries. Environment variables for automatic offload, listed in Automatic Offload Controls, do not impact performance or behavior of the hybrid offload binaries.

Environment Variable	Description	Value
`HPL_LARGEPAGE`	Defines the memory mapping to be used for both the Intel Xeon processor and Intel Xeon Phi coprocessors.	0 or 1: 0 - normal memory mapping, default. 1 - memory mapping with large pages (2 MB per page mapping). It may increase performance.
`HPL_LOG`	Controls the level of detail for the HPL output.	An integer ranging from 0 to 2: 0 - no log is displayed (printed). 1 - only one root node displays a log, exactly the same as the ASYOUGO option provides. 2 - the most detailed log is displayed. All `P` root nodes in the processor column that owns the current column block display a log.
`HPL_HOST_CORE`, `HPL_HOST_NODE`	Specifies cores or Non-Uniform Memory Access (NUMA) nodes to be used. `HPL_HOST_NODE` requires NUMA mode to be enabled. You can check whether it is enabled by running the `numactl --hardware` command. The default behavior is auto-detection of the core or NUMA node.	A list of integers ranging from 0 to the largest number of a core or NUMA node in the cluster and separated as explained in example 3.
`HPL_SWAPWIDTH`	Specifies width for each swap operation.	16 or 24. The default is 24.
`HPL_MIC_DEVICE`	Specifies Intel Xeon Phi coprocessor(s) to be used. All available Intel Xeon Phi coprocessors are used by default. Note: To avoid oversubscription of resources that might occur if you use multiple MPI processes per node, set this environment variable to specify which coprocessors each MPI process should use.	A comma-separated list of integers, each ranging from 0 to the largest number of an Intel Xeon Phi coprocessor on the node.
`HPL_PNUMMICS`	Specifies the number of Intel Xeon Phi coprocessors to be used. The `HPL_MIC_DEVICE` environment variable takes precedence over `HPL_PNUMMICS`, and the value of `HPL_PNUMMICS` is ignored if you set `HPL_MIC_DEVICE`. The default behavior is auto-detection of the number of coprocessors.	An integer ranging from 0 to the number of Intel Xeon Phi coprocessors in the node. If the value is 0, the benchmark ignores all Intel Xeon Phi coprocessors.
`HPL_MIC_CORE`, `HPL_MIC_NODE`	Specifies which CPU core will be used for an Intel Xeon Phi coprocessor. Each Intel Xeon Phi coprocessor needs a dedicated CPU core. By setting these variables for an Intel Xeon Phi coprocessor, you reserve: `HPL_MIC_CORE` - a specific core; `HPL_MIC_NODE` - one of the cores on the specified NUMA node. By default, `HPL_MIC_CORE` reserves an arbitrary core, while `HPL_MIC_NODE` reserves a core on the NUMA node that the coprocessor belongs to.	An integer ranging from 0 to the largest number of a core or NUMA node for the coprocessor. Can be provided in a comma-separated list, each integer corresponding to one coprocessor.
`HPL_MIC_NUMCORES`	Number of cores to be used for an Intel Xeon Phi coprocessor. All the coprocessor cores are used by default, which produces best performance.	An integer ranging from 1 to the number of cores of the coprocessor.
`HPL_MIC_SHAREMODE`	Specifies whether and how an Intel Xeon Phi coprocessor is shared between two MPI processes. See example 5 for details.	An integer ranging from 0 to 2: 0 - no sharing, default. 1 - the lower half of the cores will be used for the MPI process. 2 - the upper half of the cores will be used for the MPI process.
`HPL_MIC_EXQUEUES`	Specifies the queue size on an Intel Xeon Phi coprocessor. A larger value is typically better but increases memory consumption on the Intel Xeon Phi coprocessor. If you encounter out-of-memory errors, try a lower value.	An integer ranging from 0 to 512. The default is 128.
`HPL_MIC_WIDTH`	Computation width for Intel Xeon Phi `DGEMM/DTRSM`. If the Intel Xeon Phi coprocessor memory is insufficient, change the settings as follows: (1) Reduce `HPL_MIC_WIDTH` to 16. Note: This might reduce performance of the node. (2) If the Intel Xeon Phi coprocessor still reports a memory allocation error, reduce the value of the `HPL_MIC_EXQUEUES` environment variable and, if the node has more than two Intel Xeon Phi coprocessors, double `P`, the number of rows in the processor grid. (3) If memory allocation errors are still reported, keep reducing the problem size `N` until the errors are no longer reported.	16 or 24. The default is 24.
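
As a rough sketch of the memory-related fallback described for `HPL_MIC_WIDTH`, the first two adjustments might be applied in the shell before launching the benchmark. The value 64 for `HPL_MIC_EXQUEUES` is an arbitrary illustration, not a recommended setting; any value below the default of 128 follows the same idea.

```shell
# Step 1: narrow the DGEMM/DTRSM computation width (default is 24).
export HPL_MIC_WIDTH=16

# Step 2, only if memory allocation errors persist: shrink the queue size.
# 64 is an assumed example value below the default of 128.
export HPL_MIC_EXQUEUES=64

echo "HPL_MIC_WIDTH=$HPL_MIC_WIDTH HPL_MIC_EXQUEUES=$HPL_MIC_EXQUEUES"
```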

You can set HPL environment variables using the PMI_RANK and PMI_SIZE environment variables of the Intel MPI library, and you can create a shell script to automate the process.
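
A minimal sketch of such a script is shown here. It assumes that MPI ranks are placed on nodes in consecutive blocks, so that the rank modulo the number of ranks per node gives a node-local index; `RANKS_PER_NODE`, the function name `select_mic_device`, and the script name `run_hpl.sh` are hypothetical helpers introduced for illustration and are not part of the benchmark.

```shell
#!/bin/sh
# Hypothetical wrapper (for example, saved as run_hpl.sh).
# PMI_RANK and PMI_SIZE are set per process by the Intel MPI library.
# RANKS_PER_NODE is an assumed helper variable, not an HPL setting.

select_mic_device() {
    # $1 = MPI rank, $2 = ranks per node.
    # Prints the node-local index, used here as the coprocessor number.
    echo $(( $1 % $2 ))
}

RANKS_PER_NODE=${RANKS_PER_NODE:-2}
HPL_MIC_DEVICE=$(select_mic_device "${PMI_RANK:-0}" "$RANKS_PER_NODE")
export HPL_MIC_DEVICE

echo "rank ${PMI_RANK:-0} of ${PMI_SIZE:-1}: HPL_MIC_DEVICE=$HPL_MIC_DEVICE"

# Run the command passed as arguments, if any, with the settings above.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

Under these assumptions, a launch might look like `mpirun -n 4 ./run_hpl.sh ./xhpl`, so that each MPI process uses a different coprocessor on its node and oversubscription is avoided.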

Examples of Environment Settings for Hybrid Offload

#	Settings	Behavior of the Intel Optimized MP LINPACK Benchmark
1	Nothing specified	xhpl uses all Intel Xeon processors and all Intel Xeon Phi coprocessors in the cluster.
2	`HPL_PNUMMICS`=0	xhpl ignores Intel Xeon Phi coprocessors and works as a regular HPL.
3	`HPL_MIC_DEVICE`=0,2 `HPL_HOST_CORE`=1-3,8-10	Only Intel Xeon Phi coprocessors 0 and 2 and Intel Xeon processor cores 1,2,3,8,9, and 10 are used.
4	`HPL_HOST_NODE`=1	Only cores on NUMA node 1 are used.
5	`HPL_MIC_DEVICE`=0,1 `HPL_MIC_SHAREMODE`=0,2	Only Intel Xeon Phi coprocessors 0 and 1 are used: on coprocessor 0, all cores are used; on coprocessor 1, the upper half of the cores is used. For a 61-core Intel Xeon Phi coprocessor, the upper half includes cores 31-61. This setting is useful for sharing an Intel Xeon Phi coprocessor between two MPI processes when the node has an odd number of Intel Xeon Phi coprocessors.
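
To illustrate example 5, the following sketch assumes a node with three Intel Xeon Phi coprocessors and two MPI processes, with the middle coprocessor split between them. The function name `sharemode_settings` and the rank-to-node mapping (rank modulo 2) are assumptions made for this illustration; the pairing of each `HPL_MIC_SHAREMODE` entry with the corresponding `HPL_MIC_DEVICE` entry follows the pattern of example 5.

```shell
# Hypothetical helper: prints "<devices> <share modes>" for a node-local rank.
sharemode_settings() {
    case "$1" in
        0) echo "0,1 0,1" ;;  # all of coprocessor 0, lower half of coprocessor 1
        1) echo "1,2 2,0" ;;  # upper half of coprocessor 1, all of coprocessor 2
        *) return 1 ;;        # only two processes per node are handled here
    esac
}

# Assumed mapping: two consecutive MPI ranks per node.
LOCAL_RANK=$(( ${PMI_RANK:-0} % 2 ))
settings=$(sharemode_settings "$LOCAL_RANK")

# Split the pair at the single space separating the two lists.
HPL_MIC_DEVICE=${settings% *}
HPL_MIC_SHAREMODE=${settings#* }
export HPL_MIC_DEVICE HPL_MIC_SHAREMODE

echo "local rank $LOCAL_RANK: HPL_MIC_DEVICE=$HPL_MIC_DEVICE HPL_MIC_SHAREMODE=$HPL_MIC_SHAREMODE"
```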

Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804
