Intel® Math Kernel Library 11.3 Update 4 Developer Guide

Running Hybrid Offload Binaries

Hybrid offload binaries of the Intel Optimized MP LINPACK Benchmark report how many Intel Xeon Phi coprocessors were detected during the run. The top of the output contains a line like this:

Number of Intel Xeon Phi coprocessors: 1

Note

This number counts only one Intel Xeon Phi coprocessor per MPI process.
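As a minimal sketch, the reported count can be extracted from captured benchmark output so that a wrapper script can react to a value of zero. The function name phi_count is hypothetical and is not part of the benchmark package; the only thing taken from this guide is the text of the output line.

```shell
#!/bin/sh
# Print the coprocessor count found in HPL output read from stdin.
# Prints nothing and returns 1 if the line is absent.
# phi_count is a hypothetical helper, not part of the benchmark package.
phi_count() {
    sed -n 's/^[[:space:]]*Number of Intel Xeon Phi coprocessors:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' \
        | head -n 1 \
        | grep .
}

# Example with a literal line; in practice, redirect the output file to stdin.
count=$(printf 'Number of Intel Xeon Phi coprocessors: 1\n' | phi_count)
echo "detected: $count"
```

The trailing grep makes the function fail when no matching line is found, so a caller can distinguish "line missing" from "zero coprocessors reported".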

If Intel Xeon Phi coprocessors are available on your cluster and you expect offloading to occur, but the number printed is zero, the most likely cause is that the correct compiler environment was not loaded. Specifically, check whether the LD_LIBRARY_PATH environment variable includes the directories that contain the shared libraries libcoi_host.so.0 and libscif.so.0, which are installed by the Intel® Manycore Platform Software Stack (Intel® MPSS).
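The LD_LIBRARY_PATH check described above can be sketched as a small shell helper. The function name find_in_path is hypothetical. The sketch inspects only the directories listed in LD_LIBRARY_PATH, so a library that the dynamic loader would find by other means is reported as not found here.

```shell
#!/bin/sh
# Print the first directory in a colon-separated search path that contains
# the named file; return 1 if no listed directory contains it.
# find_in_path is a hypothetical helper used only for this illustration.
find_in_path() {
    _name=$1
    _rest=$2
    while [ -n "$_rest" ]; do
        _dir=${_rest%%:*}              # first entry of the remaining path
        case $_rest in
            *:*) _rest=${_rest#*:} ;;  # drop the first entry and its colon
            *)   _rest= ;;             # that was the last entry
        esac
        if [ -n "$_dir" ] && [ -e "$_dir/$_name" ]; then
            printf '%s\n' "$_dir"
            return 0
        fi
    done
    return 1
}

# Report where each Intel MPSS library named in this section is found.
for lib in libcoi_host.so.0 libscif.so.0; do
    if where=$(find_in_path "$lib" "${LD_LIBRARY_PATH:-}"); then
        echo "$lib: found in $where"
    else
        echo "$lib: not found in LD_LIBRARY_PATH"
    fi
done
```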

You can use environment variables specific to hybrid offload binaries to adjust the behavior of your runs. For a list of supported environment variables, see Environment Variables for the Hybrid Offload. Hybrid offload binaries also react to MKL_MIC_ENABLE and OFFLOAD_DEVICES environment variables for automatic offload to Intel Xeon Phi coprocessors.
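For illustration only, the two automatic offload variables might be set in the launching shell as follows. The values are assumptions: the example supposes a node with two coprocessors numbered 0 and 1, and it takes OFFLOAD_DEVICES to be a comma-separated list of device numbers. Check the exact semantics for your installation against the environment variable reference before relying on them.

```shell
#!/bin/sh
# Illustrative values only; adjust to the devices present on your nodes.
export MKL_MIC_ENABLE=1      # request automatic offload
export OFFLOAD_DEVICES=0,1   # assumed: restrict offload to coprocessors 0 and 1

echo "MKL_MIC_ENABLE=$MKL_MIC_ENABLE OFFLOAD_DEVICES=$OFFLOAD_DEVICES"
```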

The runme_intel64 and runme_intel64_dynamic scripts can set the HPL environment variables (such as HPL_MIC_DEVICE and HPL_MIC_SHAREMODE) for a given number of MPI ranks per node and Intel Xeon Phi coprocessors per node, so that the MPI ranks share the coprocessors optimally. Set the following variables at the top of these scripts according to your cluster configuration:

MPI_PROC_NUM

The total number of MPI processes.

MPI_PER_NODE

The number of MPI processes on each cluster node.

Tip

To get the best HPL performance, enable non-uniform memory access (NUMA) and set MPI_PER_NODE equal to the number of NUMA sockets.

NUMMIC

The number of Intel Xeon Phi coprocessors on each cluster node.

The scripts launch the hybrid offload HPL binary for a given number of MPI processes and write the results to the output file.
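As a rough sketch of how the three settings relate, the fragment below uses illustrative values and checks that they are mutually consistent. The values, the helper name hpl_node_count, and the requirement that MPI_PROC_NUM be a whole multiple of MPI_PER_NODE are assumptions made for this example. They are not taken from the runme scripts themselves.

```shell
#!/bin/sh
# Illustrative values, as they might be set at the top of runme_intel64.
MPI_PROC_NUM=8    # total number of MPI processes
MPI_PER_NODE=2    # MPI processes on each node (for example, one per NUMA socket)
NUMMIC=2          # Intel Xeon Phi coprocessors on each node

# Print the number of nodes implied by the settings; return 1 if the total
# is not a whole multiple of the per-node count.
# hpl_node_count is a hypothetical helper, not part of the scripts.
hpl_node_count() {
    _total=$1
    _per_node=$2
    [ "$_per_node" -gt 0 ] || return 1
    [ $(( _total % _per_node )) -eq 0 ] || return 1
    echo $(( _total / _per_node ))
}

if nodes=$(hpl_node_count "$MPI_PROC_NUM" "$MPI_PER_NODE"); then
    echo "nodes: $nodes, ranks per node: $MPI_PER_NODE, coprocessors per node: $NUMMIC"
else
    echo "MPI_PROC_NUM is not a multiple of MPI_PER_NODE" >&2
fi
```

With these example values, each node would run two MPI ranks and host two coprocessors, which matches the tip of one rank per NUMA socket on a two-socket node.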

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804
