Intel® Math Kernel Library 11.3 Update 4 Developer Guide
To run the Intel Optimized MP LINPACK Benchmark on multiple nodes, use MPI and either modify the HPL.dat file or use the Ease-of-use Command-line Parameters, as explained in this section.
While these instructions assume the Intel® 64 architecture, they are more widely applicable. The instructions directly apply to Previous Generation Intel® Core™ or higher Intel® processors. For IA-32 architecture processors and for earlier Intel® 64 architecture processors, omit the version parameter of the make command. For IA-32 architecture processors, also adjust directory names and the value of the arch parameter.
To expand runs of the Intel Optimized MP LINPACK Benchmark to more nodes, perform these steps:
Load the necessary environment variables for Intel MKL, Intel MPI, and the Intel® compiler and build the binary:
<parent directory>/bin/compilervars.sh intel64
<mpi directory>/bin64/mpivars.sh
<mkl directory>/bin/mklvars.sh intel64
make arch=intel64 version=offload
Change directory to bin/intel64:
cd <mkl directory>/benchmarks/mp_linpack/bin/intel64
This directory contains files:
xhpl - the Intel® 64 architecture binary.
HPL.dat - the HPL input data set.
In the HPL.dat file, set the problem size N to 10000. Because this is a test run, the problem size should be small.
In the HPL.dat file, set the parameters Ps and Qs so that Ps * Qs equals the number of nodes. For example, for 2 nodes, set Ps to 1 and Qs to 2. It is easier to achieve an optimal result if Ps equals Qs, so choose Ps and Qs as close to each other as possible, with Ps ≤ Qs.
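The grid-selection rule above can be sketched as a small helper: given a node count, pick the factor pair closest to square with Ps ≤ Qs. This is a minimal illustration, not part of the benchmark; the function name is hypothetical.

```python
import math

def choose_grid(nodes):
    """Return (Ps, Qs) with Ps * Qs == nodes, Ps <= Qs,
    and the two factors as close to each other as possible."""
    # Scan downward from sqrt(nodes); the first divisor found
    # yields the most nearly square process grid.
    for p in range(math.isqrt(nodes), 0, -1):
        if nodes % p == 0:
            return p, nodes // p
    return 1, nodes  # unreachable for nodes >= 1; kept for safety

print(choose_grid(2))  # (1, 2), as in the 2-node example
print(choose_grid(4))  # (2, 2), as in the 4-node example
```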
The resulting HPL.dat file for 2 nodes is as follows:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
10000        Ns
1            # of NBs
1280         NBs
1            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
2            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
0            SWAP (0=bin-exch,1=long,2=mix)
1            swapping threshold
1            L1 in (0=transposed,1=no-transposed) form
1            U in (0=transposed,1=no-transposed) form
0            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
Alternatively, launch with the -n, -p, and -q parameters and leave the HPL.dat file as is.
The resulting HPL.dat file for 4 nodes is as follows:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
10000        Ns
1            # of NBs
1280         NBs
1            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
2            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
0            SWAP (0=bin-exch,1=long,2=mix)
1            swapping threshold
1            L1 in (0=transposed,1=no-transposed) form
1            U in (0=transposed,1=no-transposed) form
0            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
Alternatively, launch with the -n, -p, and -q parameters and leave the HPL.dat file as is.
Run the xhpl binary under MPI control on two nodes:
mpirun -perhost 1 -n 2 -hosts Node1,Node2 \
-genv MIC_LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
-genv LD_LIBRARY_PATH $LD_LIBRARY_PATH ./xhpl
Rerun the HPL test, increasing the problem size until the matrix occupies about 80% of the available memory. To do this, either modify Ns in HPL.dat or use the -m command-line parameter.
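The 80% sizing rule can be estimated directly: an N x N double-precision matrix occupies 8 * N * N bytes, so N is roughly the square root of 0.8 times the aggregate memory divided by 8. The sketch below illustrates this arithmetic; the function name and the rounding of N down to a multiple of the block size NB are illustrative assumptions, not part of the benchmark.

```python
import math

def estimate_n(total_mem_bytes, fraction=0.8, nb=1280):
    """Estimate the HPL problem size N so that the 8-byte-per-element
    N x N matrix fills about `fraction` of total memory across all nodes."""
    n = math.isqrt(int(total_mem_bytes * fraction) // 8)
    # Round down to a multiple of the block size (the NBs value from
    # HPL.dat) so the matrix divides evenly into panels.
    return (n // nb) * nb

# Example (hypothetical cluster): 2 nodes with 64 GiB each.
print(estimate_n(2 * 64 * 2**30))
```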
For specifics of running hybrid offload binaries, see Running Hybrid Offload Binaries.