Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help

Viewing Collected MPI Data

After the performance analysis result for an MPI application is collected, you can open it in either the graphical or the command-line interface of Intel® VTune™ Amplifier.

To view the results in the command line interface:

Use the -report option. To get the list of all available VTune Amplifier reports, enter amplxe-cl -help report.
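An MPI collection produces one result per rank, so you often want the same report for every rank. The following is a minimal sketch. It assumes the per-rank result directories follow the prefix.rank naming used in the example at the end of this section (foo.14, foo.15) and that amplxe-cl is on PATH. The helper name report_all_ranks and the AMPLXE_CL override variable are hypothetical. The override lets you do a dry run with AMPLXE_CL=echo.

```shell
# Hypothetical helper: print a text Hotspots report for every per-rank
# result directory named <prefix>.<rank> (e.g. foo.14, foo.15).
# AMPLXE_CL is a hypothetical override; AMPLXE_CL=echo prints the commands only.
report_all_ranks() {
    prefix=$1
    for dir in "$prefix".*; do
        # Skip non-directories, including the unexpanded glob when nothing matches.
        [ -d "$dir" ] || continue
        echo "== $dir =="
        "${AMPLXE_CL:-amplxe-cl}" -R hotspots -q -format text -r "$dir"
    done
}
```

For example, report_all_ranks foo > all_ranks.txt collects the reports for foo.0, foo.1, and so on in one file. The glob expands in lexical order, so foo.10 sorts before foo.2.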

To view the results in the graphical interface:

Click the menu button, select Open > Result... and browse to the required result file (*.amplxe).

Tip

You may copy a result to another system and view it there (for example, to open a result collected on a Linux* cluster on a Windows* workstation).
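A VTune Amplifier result is normally a directory that holds the *.amplxe file together with the collected data. For that reason, copy the whole directory rather than the *.amplxe file alone. The following is a minimal sketch, assuming tar with gzip support is available. pack_result is a hypothetical helper name, and foo.14 stands for any result directory.

```shell
# Hypothetical helper: pack a whole result directory (for example foo.14,
# assumed to contain foo.14.amplxe plus the collected data) into one archive.
pack_result() {
    dir=${1%/}                      # drop a trailing slash, if any
    tar -czf "$dir.tar.gz" "$dir"   # e.g. foo.14 -> foo.14.tar.gz
}
```

After running pack_result foo.14, copy foo.14.tar.gz to the other system. Unpack it with any tool that reads .tar.gz archives, then open foo.14/foo.14.amplxe with Open > Result....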

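VTune Amplifier classifies MPI functions as system functions, in the same way as Intel Threading Building Blocks (Intel TBB) and OpenMP* functions. This approach helps you focus on your code rather than on MPI internals. To display system functions and analyze the internals of the MPI implementation, use the Call Stack Mode combo box on the filter bar in the GUI or the -call-stack-mode option in the CLI. The User functions+1 call stack mode is especially useful for finding the MPI functions that consumed the most CPU Time (Hotspots analysis) or waited the most (Locks and Waits analysis).

For example, in the call chain main() -> foo() -> MPI_Bar() -> MPI_Bar_Impl() -> ..., MPI_Bar() is the actual MPI API function you call, and the deeper functions are MPI implementation details. The call stack modes behave as follows:

- Only user functions: the time spent in MPI calls is attributed to the user function foo(), so you can see which of your functions to change to improve performance.
- User functions+1: the time spent in the MPI implementation is attributed to the top-level system function MPI_Bar(), so you can easily spot unusually heavy MPI calls.
- User/system functions: the call tree is shown without re-attribution, so you can see exactly where in the MPI library the time was spent.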
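On the command line, you can select the mode with a small wrapper around the Hotspots report. The following is a minimal sketch. It assumes that the CLI values user-only, user-plus-one, and all correspond to the GUI modes Only user functions, User functions+1, and User/system functions. Check amplxe-cl -help report for the values your version accepts. report_with_stack_mode and the AMPLXE_CL override (AMPLXE_CL=echo for a dry run) are hypothetical.

```shell
# Hypothetical helper: print a text Hotspots report for one result using a
# given call stack mode. Accepts a GUI mode name or the assumed CLI value.
# AMPLXE_CL is a hypothetical override; AMPLXE_CL=echo prints the command only.
report_with_stack_mode() {
    result=$1
    case $2 in
        "Only user functions"|user-only)    mode=user-only ;;
        "User functions+1"|user-plus-one)   mode=user-plus-one ;;
        "User/system functions"|all)        mode=all ;;
        *) echo "unknown call stack mode: $2" >&2; return 1 ;;
    esac
    "${AMPLXE_CL:-amplxe-cl}" -R hotspots -q -format text \
        -call-stack-mode="$mode" -r "$result"
}
```

For example, report_with_stack_mode foo.14 user-plus-one shows which MPI API calls in rank 14 are the heaviest. Switching to all lets you look inside the MPI library for the same rank.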

VTune Amplifier supports Intel TBB and OpenMP. Consider using these thread-level parallel solutions in addition to MPI-style parallelism to maximize CPU resource usage across the cluster, and use VTune Amplifier to analyze the performance of that level of parallelism. The MPI, OpenMP, and Intel TBB features in VTune Amplifier are functionally independent, so all the usual OpenMP and Intel TBB support features apply when you look into a result collected for an MPI process.

For hybrid OpenMP and MPI applications, VTune Amplifier displays a summary table that lists the top MPI ranks with OpenMP metrics, sorted by MPI Busy Wait from low to high. The lower a rank's MPI Busy Wait (communication) time, the longer that process was on the critical path of MPI application execution. For deeper analysis, explore the OpenMP data (see Interpreting OpenMP* Analysis Data) for the MPI processes lying on the critical path.
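The low-to-high ordering can also be applied to your own per-rank numbers. The following is a minimal sketch. It assumes you already have one line per rank in the form "rank busy_wait_seconds". How you export these values from VTune Amplifier is not covered here and is an assumption. critical_path_ranks is a hypothetical helper that prints the ranks with the lowest MPI Busy Wait, which are the ranks most likely to lie on the critical path.

```shell
# Hypothetical helper: print the N ranks (default 3) with the lowest MPI Busy
# Wait time. Input on stdin, one "rank busy_wait_seconds" pair per line
# (an assumed format). Low busy wait means the rank rarely waited for others,
# so it is a candidate for the critical path.
critical_path_ranks() {
    n=${1:-3}
    # Numeric ascending sort on column 2; LC_ALL=C keeps '.' as the decimal point.
    LC_ALL=C sort -k2,2n | head -n "$n"
}
```

For example, critical_path_ranks 4 < busy_wait.txt (a hypothetical input file) lists four candidate ranks whose OpenMP data is worth examining first.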

Example

This example displays Hotspots performance reports grouped by function and by module. Note that this example opens an individual analysis result. Each such result (for example, foo.14 or foo.15) was collected for one specific rank of the MPI process. Both reports below are for foo.14:

$ amplxe-cl -R hotspots -q -format text -r foo.14
Function Module CPU Time
-------- ------ --------
f        a.out  6.070
main     a.out  2.990

$ amplxe-cl -R hotspots -q -format text -group-by module -r foo.14
Module CPU Time
------ --------
a.out  9.060
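The module report groups the same CPU Time by module instead of by function. Summing the function-level CPU Time per module should therefore reproduce it: 6.070 + 2.990 = 9.060 for a.out. The following is a minimal sketch of that cross-check. It assumes the text layout shown above, with a header line, a dashed line, and then rows whose last two columns are Module and CPU Time. It also assumes module names contain no spaces. cpu_time_by_module is a hypothetical helper.

```shell
# Hypothetical helper: read a "-format text" function-level report on stdin
# and print the total CPU Time per module, sorted by module name.
# Assumes the last two columns of each data row are Module and CPU Time.
cpu_time_by_module() {
    LC_ALL=C awk 'NR > 2 && NF >= 3 { total[$(NF-1)] += $NF }
                  END { for (m in total) printf "%s %.3f\n", m, total[m] }' | sort
}
```

For example, amplxe-cl -R hotspots -q -format text -r foo.14 | cpu_time_by_module applied to the report above prints a.out 9.060, matching the -group-by module output.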
