Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help

MPI Application Analysis Configuration

Use the Intel® VTune™ Amplifier command line interface (amplxe-cl) to configure and run performance analysis of an MPI application.

To collect performance data for an MPI application with the VTune Amplifier, use the command line interface (amplxe-cl). The collection configuration can be completed with the help of the Analysis Target configuration options in the VTune Amplifier user interface. For more information, see Arbitrary Targets Configuration.

Usually, MPI jobs are started using an MPI launcher such as mpirun, mpiexec, srun, aprun, etc. The examples provided use mpirun. A typical MPI job uses the following syntax:

mpirun [options] <program> [<args>]

When profiling with VTune Amplifier, amplxe-cl takes the place of <program>, and your application is launched through the VTune Amplifier command arguments. As a result, launching an MPI application with VTune Amplifier uses the following syntax:

mpirun [options] amplxe-cl [options] <program> [<args>]

Several mpirun and amplxe-cl options must be specified or are highly recommended, while others can be left at their default settings. A typical command uses the following syntax:

mpirun -n <n> -l amplxe-cl -quiet -collect <analysis_type> -trace-mpi -result-dir <my_result> my_app [<my_app_options>]

The mpirun options include:

  -n <n> sets the number of MPI processes to launch.
  -l prefixes each line of output with the MPI rank that produced it.

The amplxe-cl options include:

  -quiet suppresses non-essential diagnostic output.
  -collect <analysis_type> selects the analysis type to run (for example, hpc-performance, advanced-hotspots, or memory-access).
  -trace-mpi enables per-node result directories and MPI rank detection for MPI implementations other than Intel MPI.
  -result-dir <my_result> specifies the directory where the collected data is stored.
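For instance, with concrete values substituted in (the application name my_app.a, the analysis type hpc-performance, and the result directory name my_result below are illustrative), the typical command might look like:

> mpirun -n 16 -l amplxe-cl -quiet -collect hpc-performance -trace-mpi -result-dir my_result my_app.a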

If an MPI application is launched on multiple nodes, VTune Amplifier creates one result directory per compute node in the current directory, named my_result.<hostname1>, my_result.<hostname2>, ... my_result.<hostnameN>, encapsulating the data for all the ranks running on that node. For example, this Advanced Hotspots analysis run on 4 nodes collects data on each compute node:

> mpirun -n 16 -ppn 4 -l amplxe-cl -collect advanced-hotspots -trace-mpi -result-dir my_result -- my_app.a

Data for each process is stored in the result directory for the node it ran on:

my_result.host_name1 (ranks 0-3)
my_result.host_name2 (ranks 4-7)
my_result.host_name3 (ranks 8-11)
my_result.host_name4 (ranks 12-15)
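After the collection completes, each per-node result can be examined on its own. For example, assuming the directory names above, a command-line summary report can be generated with amplxe-cl -report, or a result can be opened in the graphical interface:

> amplxe-cl -report summary -result-dir my_result.host_name1
> amplxe-gui my_result.host_name1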

If you want to profile particular ranks (for example, outlier ranks identified by MPI Performance Snapshot), use selective rank profiling: launch a multi-binary MPI run and apply VTune Amplifier profiling only to the ranks of interest. This significantly reduces the amount of data to process and analyze. The following example collects Memory Access analysis data for 2 out of 16 processes, with 1 profiled rank per node:

$ export VTUNE_CL="amplxe-cl -collect memory-access -trace-mpi -result-dir my_result"
$ mpirun -host myhost1 -n 7 my_app.a : -host myhost1 -n 1 $VTUNE_CL -- my_app.a : -host myhost2 -n 7 my_app.a : -host myhost2 -n 1 $VTUNE_CL -- my_app.a

Alternatively, you can create a configuration file with the following content:

# config.txt configuration file
-host myhost1 -n 7 ./a.out
-host myhost1 -n 1 amplxe-cl -quiet -collect memory-access -trace-mpi -result-dir my_result ./a.out
-host myhost2 -n 7 ./a.out
-host myhost2 -n 1 amplxe-cl -quiet -collect memory-access -trace-mpi -result-dir my_result ./a.out

To run the collection using the configuration file, use the following command:

> mpirun -configfile ./config.txt

If you use Intel MPI version 5.0.2 or later, you can use the -gtool option of the Intel MPI process launcher for easier selective rank profiling:

> mpirun -n <n> -gtool "amplxe-cl -collect <analysis type> -r <my_result>:<rank_set>" <my_app> [my_app_options]

where <rank_set> specifies the range of ranks to be involved in the tool execution. Separate ranks with a comma or use the "-" symbol for a set of contiguous ranks.

For example:

> mpirun -gtool "amplxe-cl -collect memory-access -result-dir my_result:7,5" my_app.a

Examples:

  1. This example runs the HPC Performance Characterization analysis type (based on the sampling driver), which is recommended as a starting point:

    > mpirun -n 4 amplxe-cl -result-dir my_result -collect hpc-performance -- my_app [my_app_options]

  2. This example collects the Advanced Hotspots data for two out of 16 processes run on myhost2 in the job distributed across the hosts:

    > mpirun -host myhost1 -n 8 ./a.out : -host myhost2 -n 6 ./a.out : -host myhost2 -n 2 amplxe-cl -result-dir foo -c advanced-hotspots ./a.out

    As a result, the VTune Amplifier creates a result directory foo.myhost2 in the current directory (given that process ranks 14 and 15 were assigned to the second node in the job).

  3. As an alternative to the previous example, you can create a configuration file with the following content:

    # config.txt configuration file
    -host myhost1 -n 8 ./a.out
    -host myhost2 -n 6 ./a.out
    -host myhost2 -n 2 amplxe-cl -quiet -collect advanced-hotspots -result-dir foo ./a.out

    and run the data collection as:

    > mpirun -configfile ./config.txt

    to achieve the same result as in the previous example: foo.myhost2 result directory is created.

  4. This example runs the Memory Access analysis with memory object profiling for all ranks on all nodes:

    > mpirun -n 16 -ppn 4 amplxe-cl -r my_result -collect memory-access -knob analyze-mem-objects=true -- my_app [my_app_options]

  5. This example runs Advanced Hotspots analysis on ranks 1, 4-6, 10:

    > mpirun -gtool "amplxe-cl -r my_result -collect advanced-hotspots:1,4-6,10" -n 16 -ppn 4 my_app [my_app_options]

Note

The examples above use the mpirun command rather than mpiexec or mpiexec.hydra, although real-world jobs might use the mpiexec* commands. mpirun is a higher-level command that dispatches to mpiexec or mpiexec.hydra depending on the current default and the options passed. All the listed examples work with the mpiexec* commands as well as with mpirun.
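For instance, the first example from the list above could also be launched through the Hydra process manager directly; this is a sketch assuming the default Intel MPI setup:

> mpiexec.hydra -n 4 amplxe-cl -result-dir my_result -collect hpc-performance -- my_app [my_app_options]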

See Also