Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help

Intel® Xeon Phi™ Processor (Code Name: Knights Landing) Analysis Workflow

The following figure shows the basic steps required to analyze an application running on the Intel® Xeon Phi™ processor (formerly code named Knights Landing), based on Intel Many Integrated Core Architecture (Intel® MIC Architecture), or to perform a system-wide analysis using Intel® VTune™ Amplifier. Analysis is supported on a Linux* target with the self-boot version of the Intel Xeon Phi processor. You can run one of the predefined analysis types (HPC Performance Characterization, Memory Access, General Exploration, or Advanced Hotspots) or create a custom analysis type.

Note

Instrumentation-based collections such as Basic Hotspots, Concurrency, or Locks and Waits can incur significant overhead when the application runs a large number of worker threads. To explore application scalability, use Advanced Hotspots instead of Basic Hotspots, and HPC Performance Characterization instead of Concurrency or Locks and Waits.
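For scripted runs, the substitution recommended in the note above can be captured in a small shell helper. This is a sketch only. The amplxe-cl analysis names it uses (hotspots, advanced-hotspots, concurrency, locks-and-waits, hpc-performance) are assumptions based on common VTune Amplifier releases, and `knl_preferred_analysis` is a hypothetical function name. Confirm the names available in your installation with `amplxe-cl -help collect`.

```shell
#!/bin/sh
# Sketch: replace instrumentation-based analysis types with the hardware
# event-based alternatives recommended for Intel Xeon Phi targets.
# The -collect names below are assumed; verify with `amplxe-cl -help collect`.
knl_preferred_analysis() {
  case "$1" in
    hotspots)
      # Basic Hotspots -> Advanced Hotspots
      echo advanced-hotspots ;;
    concurrency|locks-and-waits)
      # Concurrency / Locks and Waits -> HPC Performance Characterization
      echo hpc-performance ;;
    *)
      # Any other analysis type is passed through unchanged.
      echo "$1" ;;
  esac
}
```

For example, `knl_preferred_analysis locks-and-waits` prints `hpc-performance`. Hardware event-based types such as `memory-access` pass through unchanged.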

Prerequisites:

Installing the SEP driver is recommended for hardware event-based sampling analysis types such as HPC Performance Characterization, Memory Access, General Exploration, or Advanced Hotspots. If the SEP driver is not installed, Intel VTune Amplifier can collect data using Linux perf instead. Be aware of the following system configuration settings:

Note

The workflow shown in the diagram is recommended because it speeds up the analysis process. You can also follow the regular analysis flow and run the full Intel VTune Amplifier collection directly on the target Intel Xeon Phi processor, but finalization and visualization might be slow there. For more information, see Standalone GUI: Basic Workflow.

1. Configure and run analysis on the target system with an Intel Xeon Phi processor

There are two ways to configure and run the analysis on the target system:

  • Finalization on host system (recommended): Run the analysis from the command line on the system with the Intel Xeon Phi processor, without finalizing it there. This option gives the best performance.

    From a command prompt, run the collection with the deferred finalization option. This option calculates the binary checksums so that symbols can be resolved correctly when the result is finalized on the host system. For example, to run a Memory Access analysis:

      amplxe-cl -collect memory-access -finalization-mode=deferred -r <my_result_dir> ./my_app

    For more information, see amplxe-cl Command Syntax and finalization-mode.

    Tip

    You can also generate a command using the VTune Amplifier GUI as described below. After generating the command, add the -finalization-mode=deferred option to the command to delay finalization.

  • Finalization on target system: Use the VTune Amplifier GUI on the host system to generate a command for the target system with the Intel Xeon Phi processor, then run and finalize the analysis on the target system. This method may be slower than finalizing on the host.

    1. On the Analysis Target window, select Arbitrary Targets > local.

    2. Set the processor architecture to Intel® Processor code named Knights Landing and specify the operating system type.

    3. Enter the application name and parameters.

    4. Select the Use MPI Launcher checkbox and provide the launcher name, number of ranks, ranks to profile, and result location.

    5. Click Choose Analysis to switch to the Analysis Type window.

    6. Select and configure an analysis type.

    7. Click the Command Line button at the bottom of the window to generate the command.

    8. Copy the generated command to a command prompt on the target system and run the analysis. Finalization starts automatically when data collection completes and may take several minutes.
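If you collect repeatedly on the target, you can wrap the deferred-finalization run from the first option in a small script. This is a minimal sketch, not a VTune-provided tool. `vtune_collect_deferred`, `DRY_RUN`, and `NRANKS` are hypothetical names. The `mpirun -n` prefix is an assumed launcher form for profiling all MPI ranks, and the command the GUI generates for your launcher may look different.

```shell
#!/bin/sh
# Sketch: start a VTune collection on the Intel Xeon Phi target and defer
# finalization to the host.
#   DRY_RUN=1  (hypothetical) print the command instead of running it
#   NRANKS=N   (hypothetical) launch through "mpirun -n N" (assumed form)
vtune_collect_deferred() {
  analysis=$1
  result_dir=$2
  shift 2  # the remaining arguments are the application and its parameters
  # Build the full command line in the positional parameters.
  set -- amplxe-cl -collect "$analysis" -finalization-mode=deferred -r "$result_dir" "$@"
  if [ -n "${NRANKS:-}" ]; then
    set -- mpirun -n "$NRANKS" "$@"
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 vtune_collect_deferred memory-access r001ma ./my_app` prints `amplxe-cl -collect memory-access -finalization-mode=deferred -r r001ma ./my_app`. To start the collection, run the same call without `DRY_RUN=1`.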

2. Open the result on the host system

If the results collected on the target system are not available on the host through a shared directory, copy them to the host system. If your command specified deferred finalization, finalize the result on the host.

  1. Copy the result to the host system using SSH or a similar method.

  2. [Optional] If your command specified deferred finalization, finalize the result on the host. If the module paths on the host differ from those on the target system, also specify search directories for the binaries of interest. For example:

       amplxe-cl -finalize -r <my_result_dir> -search-dir <my_binary_dir>
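Steps 1 and 2 can be combined into one host-side script. This sketch makes three assumptions. The target is reachable with `scp`. `vtune_fetch_and_finalize` and `run` are hypothetical helper names. The host directory passed as the third argument holds the same binaries that ran on the target.

```shell
#!/bin/sh
# Sketch: copy a VTune result from the target and finalize it on the host.
# Set DRY_RUN=1 (hypothetical) to print the commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Arguments (hypothetical): SSH destination of the target, result directory
# on the target, host directory containing the matching binaries.
vtune_fetch_and_finalize() {
  target=$1
  remote_result=$2
  search_dir=$3
  result_name=$(basename "$remote_result")
  # Copy the whole result directory into the current directory over SSH.
  run scp -r "$target:$remote_result" . || return 1
  # Finalize on the host, resolving symbols from the host-side binaries.
  run amplxe-cl -finalize -r "$result_name" -search-dir "$search_dir"
}
```

For example, `DRY_RUN=1 vtune_fetch_and_finalize user@knl-node /home/user/r001ma /opt/my_app/bin` prints the `scp` and `amplxe-cl -finalize` commands without running them. If the result was already finalized on the target, you only need the copy step.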

3. Open and interpret analysis results

There are two ways to view the results:
