Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help
Once the MPI application performance analysis result is collected, you can open it in the graphical or command line interface of the Intel® VTune™ Amplifier.
To view the results in the command line interface:
Use the -report option. To get the list of all available VTune Amplifier reports, enter amplxe-cl -help report.
To view the results in the graphical interface:
Click the menu button, select Open > Result... and browse to the required result file (*.amplxe).
You may copy a result to another system and view it there (for example, to open a result collected on a Linux* cluster on a Windows* workstation).
VTune Amplifier classifies MPI functions as system functions, as it does Intel Threading Building Blocks (Intel TBB) and OpenMP* functions. This approach helps you focus on your code rather than on MPI internals. To display the system functions and analyze the internals of the MPI implementation, use the Call Stack Mode combo box on the filter bar in the VTune Amplifier GUI or the call-stack-mode option in the CLI. The User functions+1 call stack mode is especially useful for finding the MPI functions that consumed the most CPU Time (Hotspots analysis) or waited the most (Locks and Waits analysis). For example, in the call chain main() -> foo() -> MPI_Bar() -> MPI_Bar_Impl() -> ..., MPI_Bar() is the actual MPI API function you use and the deeper functions are MPI implementation details. The call stack modes behave as follows:
The Only user functions call stack mode attributes the time spent in the MPI calls to the user function foo() so that you can see which of your functions you can change to actually improve the performance.
The default User functions+1 mode attributes the time spent in the MPI implementation to the top-level system function, MPI_Bar(), so that you can easily spot unusually expensive MPI calls.
The User/system functions mode shows the call tree without any re-attribution so that you can see where exactly in the MPI library the time was spent.
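To compare the three modes from the command line, you can run the same report once per mode with the call-stack-mode option. The following is a minimal sketch that only prints the commands rather than running them. The helper name print_mode_commands is hypothetical, and the mode values user-only, user-plus-one, and all are assumptions about how the CLI names these modes; confirm them with amplxe-cl -help report before use.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one Hotspots report command
# per call stack mode for the given result directory.
# ASSUMPTION: the mode values below match the CLI spelling of
# "Only user functions", "User functions+1", and "User/system functions".
print_mode_commands() {
    result_dir=$1
    for mode in user-only user-plus-one all; do
        echo "amplxe-cl -R hotspots -q -format text -call-stack-mode $mode -r $result_dir"
    done
}

# Dry run for the result collected for MPI rank 14.
print_mode_commands foo.14
```

Printing the commands first makes it easy to review them, or to redirect each report to its own file, before running them against a real result.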
VTune Amplifier provides Intel TBB and OpenMP support. Consider using these thread-level parallel solutions in addition to MPI-style parallelism to maximize CPU resource usage across the cluster, and use the VTune Amplifier to analyze the performance of that level of parallelism. The MPI, OpenMP, and Intel TBB features in the VTune Amplifier are functionally independent, so all the usual OpenMP and Intel TBB support features apply when you look into a result collected for an MPI process. For hybrid OpenMP and MPI applications, the VTune Amplifier displays a summary table listing top MPI ranks with OpenMP metrics sorted by MPI Busy Wait from low to high values. The lower the Communication time is, the longer a process was on a critical path of MPI application execution. For deeper analysis, explore the OpenMP analysis data (see Interpreting OpenMP* Analysis Data) for the MPI processes lying on the critical path.
This example displays Hotspots performance reports grouped by function and by module. Note that each analysis result is opened individually, because each was collected for a specific MPI process rank (for example, foo.14 and foo.15):
$ amplxe-cl -R hotspots -q -format text -r foo.14
Function  Module  CPU Time
--------  ------  --------
f         a.out      6.070
main      a.out      2.990
$ amplxe-cl -R hotspots -q -format text -group-by module -r foo.14
Module  CPU Time
------  --------
a.out      9.060
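The two reports above agree: the CPU Time reported for the a.out module equals the sum of the CPU Time of its functions f and main. The following is a minimal sketch of that check using standard awk. It assumes the text report layout shown above (a header line, a dashed separator line, then rows whose last field is CPU Time). The helper name sum_cpu_time is hypothetical and is not part of the VTune Amplifier, and the sample rows are copied from the function-level report for foo.14.

```shell
#!/bin/sh
# Hypothetical helper: sum the last column (CPU Time) of a text report
# read from stdin, skipping the header line and the dashed separator.
sum_cpu_time() {
    awk 'NR > 2 && NF > 0 { total += $NF } END { printf "%.3f\n", total }'
}

# Sample rows copied from the function-level report for foo.14.
report='Function Module CPU Time
-------- ------ --------
f a.out 6.070
main a.out 2.990'

# Prints 9.060, matching the CPU Time of a.out in the module-level report.
printf '%s\n' "$report" | sum_cpu_time
```

With a real result, the output of the function-level report command could be piped into the helper instead of the sample text, provided the report keeps CPU Time as its last column.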