Intel® Math Kernel Library 11.3 Update 4 Developer Guide
The table below lists Intel MKL environment variables to control runs of the Intel Optimized MP LINPACK Benchmark in the hybrid offload mode. Each of these environment variables has a default value. Use them with hybrid builds of the benchmark.
While these environment variables impact performance of the hybrid offload binaries, they do not impact performance of other binaries. Environment variables for automatic offload, listed in Automatic Offload Controls, do not impact performance or behavior of the hybrid offload binaries.
Environment Variable | Description | Value |
---|---|---|
HPL_LARGEPAGE | Defines the memory mapping to be used for both the Intel Xeon processor and Intel Xeon Phi coprocessors. | 0 or 1. |
HPL_LOG | Controls the level of detail for the HPL output. | An integer ranging from 0 to 2. |
HPL_HOST_CORE, HPL_HOST_NODE | Specifies cores or Non-Uniform Memory Access (NUMA) nodes to be used. HPL_HOST_NODE requires NUMA mode to be enabled. You can check whether it is enabled by running the numactl --hardware command. The default behavior is auto-detection of the core or NUMA node. | A list of integers ranging from 0 to the largest number of a core or NUMA node in the cluster and separated as explained in example 3. |
HPL_SWAPWIDTH | Specifies the width of each swap operation. | 16 or 24. The default is 24. |
HPL_MIC_DEVICE | Specifies Intel Xeon Phi coprocessor(s) to be used. All available Intel Xeon Phi coprocessors are used by default. Note: To avoid oversubscription of resources that might occur if you use multiple MPI processes per node, set this environment variable to specify which coprocessors each MPI process should use. | A comma-separated list of integers, each ranging from 0 to the largest number of an Intel Xeon Phi coprocessor on the node. |
HPL_PNUMMICS | Specifies the number of Intel Xeon Phi coprocessors to be used. The HPL_MIC_DEVICE environment variable takes precedence over HPL_PNUMMICS, so the value of HPL_PNUMMICS is ignored if you set HPL_MIC_DEVICE. The default behavior is auto-detection of the number of coprocessors. | An integer ranging from 0 to the number of Intel Xeon Phi coprocessors in the node. If the value is 0, the benchmark ignores all Intel Xeon Phi coprocessors. |
HPL_MIC_CORE, HPL_MIC_NODE | Specifies which CPU core will be used for an Intel Xeon Phi coprocessor. Each Intel Xeon Phi coprocessor needs a dedicated CPU core. By setting these variables for an Intel Xeon Phi coprocessor, you reserve a CPU core for that coprocessor. The default for HPL_MIC_CORE is some core, while the default for HPL_MIC_NODE is a core on the same NUMA node as the coprocessor. | An integer ranging from 0 to the largest number of a core or NUMA node for the coprocessor. Can be provided in a comma-separated list, each integer corresponding to one coprocessor. |
HPL_MIC_NUMCORES | Specifies the number of cores to be used for an Intel Xeon Phi coprocessor. All the coprocessor cores are used by default, which produces the best performance. | An integer ranging from 1 to the number of cores of the coprocessor. |
HPL_MIC_SHAREMODE | Specifies whether and how an Intel Xeon Phi coprocessor is shared between two MPI processes. See example 5 for details. | An integer ranging from 0 to 2. |
Specifies the queue size on an Intel Xeon Phi coprocessor. A larger number typically gives better performance but increases memory consumption on the Intel Xeon Phi coprocessor. If you encounter out-of-memory errors, try a lower number. |
An integer ranging from 0 to 512. The default is 128. |
|
HPL_MIC_WIDTH | Computation width for Intel Xeon Phi DGEMM/DTRSM. If the Intel Xeon Phi coprocessor memory is insufficient, change the settings as follows: | 16 or 24. The default is 24. |
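
As a rough illustration of the value ranges in the table, the following shell sketch checks a few of the variables before a run. The helper names `one_of` and `check_hpl_env` are hypothetical and are not part of Intel MKL. The check covers only HPL_LARGEPAGE, HPL_LOG, and HPL_SWAPWIDTH, plus the documented precedence of HPL_MIC_DEVICE over HPL_PNUMMICS.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of Intel MKL): compare a few hybrid
# offload variables with the value ranges listed in the table.

# Succeeds if $1 is empty (variable unset, so the default applies)
# or if $1 equals one of the remaining arguments.
one_of() {
    value=$1
    shift
    [ -z "$value" ] && return 0
    for allowed in "$@"; do
        [ "$value" = "$allowed" ] && return 0
    done
    return 1
}

# Returns 0 if every checked variable is unset or within its documented range.
check_hpl_env() {
    rc=0
    one_of "${HPL_LARGEPAGE:-}" 0 1   || { echo "HPL_LARGEPAGE must be 0 or 1" >&2; rc=1; }
    one_of "${HPL_LOG:-}" 0 1 2       || { echo "HPL_LOG must be 0, 1, or 2" >&2; rc=1; }
    one_of "${HPL_SWAPWIDTH:-}" 16 24 || { echo "HPL_SWAPWIDTH must be 16 or 24" >&2; rc=1; }
    # HPL_MIC_DEVICE takes precedence, so HPL_PNUMMICS would be ignored.
    if [ -n "${HPL_MIC_DEVICE:-}" ] && [ -n "${HPL_PNUMMICS:-}" ]; then
        echo "note: HPL_PNUMMICS is ignored because HPL_MIC_DEVICE is set" >&2
    fi
    return "$rc"
}

check_hpl_env && echo "HPL settings look valid"
```

A script like this could be sourced or run before starting the benchmark. It only catches out-of-range values and does not verify that a setting suits a particular cluster.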
You can set the HPL environment variables separately for each MPI process by using the PMI_RANK and PMI_SIZE environment variables of the Intel MPI Library, and you can create a shell script to automate the process.
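
A minimal sketch of such a script is shown here. It assumes two MPI processes per node, ranks placed on nodes consecutively, and two coprocessors per node numbered 0 and 1. None of these assumptions come from this guide, and the script name `run_xhpl.sh`, the `RANKS_PER_NODE` variable, and the `local_index` helper are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper (run_xhpl.sh): pick a coprocessor for each MPI rank,
# then run the command passed as arguments.
# Assumptions: 2 ranks per node, consecutive rank placement,
# coprocessors numbered 0 and 1 on every node.

RANKS_PER_NODE=2

# Map a global MPI rank to a per-node index (0 .. RANKS_PER_NODE-1).
local_index() {
    echo $(( $1 % RANKS_PER_NODE ))
}

# PMI_RANK and PMI_SIZE are expected from the Intel MPI launcher;
# fall back to a single rank so the script can be dry-run without MPI.
rank=${PMI_RANK:-0}
size=${PMI_SIZE:-1}

# Give each rank on a node its own coprocessor to avoid oversubscription.
HPL_MIC_DEVICE=$(local_index "$rank")
export HPL_MIC_DEVICE

echo "rank $rank of $size: HPL_MIC_DEVICE=$HPL_MIC_DEVICE"

# Launch the benchmark only when a command was supplied.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

Under these assumptions, the wrapper could be started with something like `mpirun -np 4 ./run_xhpl.sh ./xhpl`, so that every rank sets its own HPL_MIC_DEVICE before xhpl starts. Other per-rank variables, such as HPL_HOST_NODE, could be set the same way.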
No. | Settings | Behavior of the Intel Optimized MP LINPACK Benchmark |
---|---|---|
1 | Nothing specified | xhpl uses all Intel Xeon processors and all Intel Xeon Phi coprocessors in the cluster. |
2 | HPL_PNUMMICS=0 | xhpl ignores Intel Xeon Phi coprocessors and works as a regular HPL. |
3 | HPL_MIC_DEVICE=0,2 HPL_HOST_CORE=1-3,8-10 | Only Intel Xeon Phi coprocessors 0 and 2 and Intel Xeon processor cores 1, 2, 3, 8, 9, and 10 are used. |
4 | HPL_HOST_NODE=1 | Only cores on NUMA node 1 are used. |
5 | HPL_MIC_SHAREMODE=0,2 | Only Intel Xeon Phi coprocessors 0 and 1 are used. |
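
To make the list syntax of example 3 concrete, the sketch below expands a value such as `1-3,8-10` into the individual core numbers. It only mirrors the expansion described in the example and is not how xhpl itself parses the variable. The function name `expand_core_list` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: expand a comma-separated list of numbers and ranges
# (for example "1-3,8-10") into a space-separated list of numbers.
expand_core_list() {
    out=""
    old_ifs=$IFS
    IFS=,
    for item in $1; do
        case $item in
            *-*)
                # A range "first-last": emit every number in it.
                first=${item%-*}
                last=${item#*-}
                i=$first
                while [ "$i" -le "$last" ]; do
                    out="$out $i"
                    i=$(( i + 1 ))
                done
                ;;
            *)
                # A single number.
                out="$out $item"
                ;;
        esac
    done
    IFS=$old_ifs
    echo "${out# }"
}

expand_core_list "1-3,8-10"   # prints: 1 2 3 8 9 10
```

The output matches the cores listed in the behavior column of example 3.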
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804