Intel® Math Kernel Library 11.3 Update 4 Developer Guide

Overview of the Intel Optimized MP LINPACK Benchmark

The Intel® Optimized MP LINPACK Benchmark for Clusters (Intel® Optimized MP LINPACK Benchmark) is based on modifications and additions to High-Performance LINPACK (HPL) 2.1 (http://www.netlib.org/benchmark/hpl) from the Innovative Computing Laboratory (ICL) at the University of Tennessee, Knoxville. The Intel Optimized MP LINPACK Benchmark can be used for TOP500 runs (see http://www.top500.org). To use the benchmark, you need to be familiar with HPL usage. The Intel Optimized MP LINPACK Benchmark provides additional enhancements designed to make HPL usage more convenient and to use Intel® Message-Passing Interface (MPI) settings that may enhance performance. The code in the ./benchmarks/mp_linpack directory also adds techniques to minimize the search times frequently associated with long runs.

The Intel® Optimized MP LINPACK Benchmark implements the Massively Parallel (MP) LINPACK benchmark using HPL code. It solves a random dense system of linear equations (Ax=b) in real*8 precision, measures the time it takes to factor and solve the system, converts that time into a performance rate, and tests the results for accuracy. You are not limited to a problem size of N = 1000 equations: the implementation generalizes to solve a system of any size that meets the restrictions imposed by the MPI implementation chosen. The benchmark uses a proper random number generation technique and full row pivoting to ensure the accuracy of the results.
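For reference, LINPACK-class benchmarks, including HPL, conventionally derive the performance rate from the nominal operation count of LU factorization and solve for an N-by-N system. For a measured time of t seconds:

\[ \text{rate (Gflop/s)} \;=\; \frac{\tfrac{2}{3}N^{3} + 2N^{2}}{t \times 10^{9}} \]

For example, solving a system with N = 100,000 in 1,000 seconds corresponds to roughly 667 Gflop/s.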

Do not use this benchmark to report LINPACK 100 performance. Do not confuse this benchmark with:

- The Intel® Optimized LINPACK Benchmark, which is a shared-memory (non-MPI) implementation for a single system
- LINPACK, the library, which differs from LINPACK, the benchmark

Intel provides optimized versions of the LINPACK benchmarks to help you obtain high LINPACK benchmark results on systems based on genuine Intel processors more easily than with the standard HPL benchmark. Use the Intel Optimized MP LINPACK Benchmark to benchmark your cluster. The prebuilt binaries require that the Intel® MPI library be installed on the cluster.

Note

The prebuilt binaries of the Intel Optimized MP LINPACK Benchmark cannot run with the symmetric model of the Intel MPI library. For details, see the article at https://software.intel.com/en-us/articles/using-the-intel-mpi-library-on-intel-xeon-phi-coprocessor-systems.

The run-time version of the Intel MPI library is free and can be downloaded from http://www.intel.com/software/products/.

Note

The prebuilt binaries are hybrid offload binaries. They work even when the system does not have any Intel® Xeon Phi™ coprocessors.

The Intel package includes software developed at the ICL at the University of Tennessee, Knoxville; neither the University nor ICL endorses or promotes this product. Although HPL 2.1 is redistributable under certain conditions, this particular package is subject to the Intel MKL license.

Intel MKL has introduced hybrid build functionality into the Intel Optimized MP LINPACK Benchmark while continuing to support the previous, non-hybrid build. The term hybrid refers to special optimizations added to take advantage of mixed OpenMP*/MPI parallelism.

If you want to use one MPI process per node and achieve further parallelism by means of OpenMP, use the hybrid build. In general, the hybrid build is useful when you run fewer than one MPI process per core, because OpenMP threads supply the remaining parallelism. If you want to rely exclusively on MPI for parallelism and run one MPI process per core, use the non-hybrid build.
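To make the distinction concrete, here is a minimal hybrid MPI/OpenMP sketch in C. It is illustrative only and not part of the benchmark: one MPI rank is started per node, and OpenMP threads provide the parallelism within each node.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    /* Minimal hybrid MPI/OpenMP sketch: run one MPI rank per node
     * and let OpenMP threads use the cores within each node. This
     * mirrors the parallelism model of the hybrid build; it is not
     * part of the benchmark itself. */
    int main(int argc, char **argv) {
        int provided, rank;

        /* FUNNELED is enough here: only the master thread calls MPI. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel
        {
            /* Each OpenMP thread within the rank reports itself. */
            printf("rank %d: thread %d of %d\n",
                   rank, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }

Compile with an MPI compiler wrapper with OpenMP enabled (for example, mpiicc -qopenmp), launch with one rank per node, and set OMP_NUM_THREADS to the number of cores per node.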

In addition to the hybrid prebuilt binaries, Intel MKL supplies hybrid prebuilt libraries for Intel® MPI that take advantage of the additional OpenMP optimizations.

To enable you to offload computations from recent Intel® Xeon® processors to anywhere from zero to eight Intel Xeon Phi coprocessors, Intel MKL supplies a hybrid offload binary. The hybrid offload binary contains the latest optimizations for Intel® Core™ processors, so you are encouraged to use it even when the system has no Intel Xeon Phi coprocessors. The binary uses system-specific threading APIs to exploit mixed parallelism.

If you want to use an MPI implementation other than Intel MPI, build the benchmark from the MP LINPACK source code provided. The source code builds a non-hybrid version that can still be run in a hybrid mode, but that version lacks some of the optimizations added to the hybrid version.

Non-hybrid builds are the default in the source code makefiles provided. To use the non-hybrid code in a hybrid mode, use the threaded version of Intel MKL BLAS, link with a thread-safe MPI (for example, use the -mt_mpi option with the Intel MPI library), and call the MPI_Init_thread() function to request a thread-safe level of MPI.
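The initialization step can be sketched as follows in C; requesting MPI_THREAD_MULTIPLE here is an assumption about the needed level, and you should check the level the library actually provides:

    #include <mpi.h>
    #include <stdio.h>

    /* Request a thread-safe MPI before calling threaded BLAS.
     * MPI_THREAD_MULTIPLE is the strongest level; a funneled or
     * serialized level may suffice depending on how MPI is called. */
    int main(int argc, char **argv) {
        int provided;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

        /* Verify that the library granted the level we asked for. */
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "warning: MPI provides thread level %d only\n",
                    provided);
        }

        /* ... threaded computation using Intel MKL BLAS goes here ... */

        MPI_Finalize();
        return 0;
    }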

Intel MKL provides prebuilt binaries that are linked against Intel MPI libraries either statically or dynamically.

Note

Performance of statically and dynamically linked prebuilt binaries may differ, and the performance of both depends on the version of Intel MPI you are using.
You can also build your own binaries statically linked against a particular version of Intel MPI.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804