Intel® Math Kernel Library 11.3 Update 4 Developer Guide
To run the Intel Optimized MP LINPACK Benchmark on multiple nodes, use MPI and either modify the HPL.dat file or use the Ease-of-use Command-line Parameters, as explained in this section.
While these instructions assume the Intel® 64 architecture, they are more widely applicable. The instructions directly apply to Previous Generation Intel® Core™ or higher Intel® processors. For IA-32 architecture processors and for earlier Intel® 64 architecture processors, omit the version parameter of the make command. For IA-32 architecture processors, also adjust directory names and the value of the arch parameter.
To expand runs of the Intel Optimized MP LINPACK Benchmark to more nodes, perform these steps:
Load the necessary environment variables for Intel MKL, Intel MPI, and the Intel® compiler and build the binary:
<parent directory>/bin/compilervars.sh intel64
<mpi directory>/bin64/mpivars.sh
<mkl directory>/bin/mklvars.sh intel64
make arch=intel64 version=offload
Change directory to bin/intel64:
cd <mkl directory>/benchmarks/mp_linpack/bin/intel64
This directory contains files:
xhpl - the Intel® 64 architecture binary.
HPL.dat - the HPL input data set.
In the HPL.dat file, set the problem size N to 10000. Because this is a test run, the problem size should be small.
In the HPL.dat file, set the parameters Ps and Qs so that Ps * Qs equals the number of nodes. For example, for 2 nodes, set Ps to 1 and Qs to 2. It is easier to achieve an optimal result if Ps equals Qs, so choose Ps and Qs as close to each other as possible, with Ps ≤ Qs.
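The grid-selection rule above can be sketched as a small helper: given a node count, pick the factor pair closest to square with Ps ≤ Qs. This is a minimal illustration, not part of the benchmark; the function name is hypothetical.

```python
import math

def choose_grid(nodes):
    """Return (Ps, Qs) with Ps * Qs == nodes, Ps <= Qs,
    and the two factors as close to each other as possible."""
    # Scan downward from sqrt(nodes); the first divisor found
    # yields the most nearly square process grid.
    for p in range(math.isqrt(nodes), 0, -1):
        if nodes % p == 0:
            return p, nodes // p
    return 1, nodes  # unreachable for nodes >= 1; kept for safety

print(choose_grid(2))  # (1, 2), as in the 2-node example
print(choose_grid(4))  # (2, 2), as in the 4-node example
```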
The resulting HPL.dat file for 2 nodes is as follows:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
10000        Ns
1            # of NBs
1280         NBs
1            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
2            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
0            SWAP (0=bin-exch,1=long,2=mix)
1            swapping threshold
1            L1 in (0=transposed,1=no-transposed) form
1            U in (0=transposed,1=no-transposed) form
0            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
Alternatively, launch with the -n, -p, and -q parameters and leave the HPL.dat file as is.
The resulting HPL.dat file for 4 nodes is as follows:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
10000        Ns
1            # of NBs
1280         NBs
1            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
2            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
0            SWAP (0=bin-exch,1=long,2=mix)
1            swapping threshold
1            L1 in (0=transposed,1=no-transposed) form
1            U in (0=transposed,1=no-transposed) form
0            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
Alternatively, launch with the -n, -p, and -q parameters and leave the HPL.dat file as is.
Run the xhpl binary under MPI control on two nodes:
mpirun -perhost 1 -n 2 -hosts Node1,Node2 \
-genv MIC_LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
-genv LD_LIBRARY_PATH $LD_LIBRARY_PATH ./xhpl
Rerun the HPL test, increasing the problem size until the matrix occupies about 80% of the available memory. To do this, either modify Ns in HPL.dat or use the -m command-line parameter.
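The 80% sizing rule can be estimated directly: an N x N double-precision matrix occupies 8 * N * N bytes, so N is roughly the square root of 0.8 times the aggregate memory divided by 8. The sketch below illustrates this arithmetic; the function name and the rounding of N down to a multiple of the block size NB are illustrative assumptions, not part of the benchmark.

```python
import math

def estimate_n(total_mem_bytes, fraction=0.8, nb=1280):
    """Estimate the HPL problem size N so that the 8-byte-per-element
    N x N matrix fills about `fraction` of total memory across all nodes."""
    n = math.isqrt(int(total_mem_bytes * fraction) // 8)
    # Round down to a multiple of the block size (the NBs value from
    # HPL.dat) so the matrix divides evenly into panels.
    return (n // nb) * nb

# Example (hypothetical cluster): 2 nodes with 64 GiB each.
print(estimate_n(2 * 64 * 2**30))
```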
For specifics of running hybrid offload binaries, see Running Hybrid Offload Binaries.