Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help
The following figure shows the basic steps required to analyze an application running on an Intel® Xeon Phi™ processor (formerly code named Knights Landing), based on Intel® Many Integrated Core Architecture (Intel® MIC Architecture), or to perform a system-wide analysis using Intel® VTune™ Amplifier. Analysis is supported on a Linux* target with the self-boot version of the Intel Xeon Phi processor. You can run one of the predefined analysis types (HPC Performance Characterization, Memory Access, General Exploration, or Advanced Hotspots) or create a custom analysis type.
Instrumentation-based collections such as Basic Hotspots, Concurrency, or Locks and Waits can incur significant overhead in applications with many worker threads. To explore application scalability, use Advanced Hotspots instead of Basic Hotspots, and HPC Performance Characterization instead of Concurrency or Locks and Waits.
Prerequisites:
It is recommended that you install the SEP driver for hardware event-based sampling analysis types such as HPC Performance Characterization, Memory Access, General Exploration, or Advanced Hotspots. If the SEP driver is not installed, Intel VTune Amplifier can collect data using Linux* perf instead. When using Linux perf, be aware of the following system configuration settings:
To enable system-wide and uncore event collection, which provides the DRAM and MCDRAM memory bandwidth measurements in the Memory Access and HPC Performance Characterization analysis types, set /proc/sys/kernel/perf_event_paranoid to 0 as root or with sudo.
As root:
echo 0 > /proc/sys/kernel/perf_event_paranoid
With sudo (the redirection must run in the privileged shell):
sudo sh -c 'echo 0 > /proc/sys/kernel/perf_event_paranoid'
To enable collection with the General Exploration analysis type, raise the default limit on open file descriptors. As root or with sudo, add the following lines to /etc/security/limits.conf. Replace <user> with the user name, and set the value to 100 * <number_of_logical_CPU_cores>:
<user> hard nofile <100 * number_of_logical_CPU_cores>
<user> soft nofile <100 * number_of_logical_CPU_cores>
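Both settings above can be checked or prepared with a small shell sketch. The function names `paranoid_ok` and `nofile_lines` are hypothetical helpers, not part of VTune Amplifier. They only read values and print text. Writing to /proc or /etc remains an explicit, privileged step that you run yourself. The logical CPU count comes from `getconf _NPROCESSORS_ONLN`, which is assumed to match what the 100 * <number_of_logical_CPU_cores> rule intends.

```shell
#!/bin/sh
# Hypothetical preflight helpers for perf-based collection (no SEP driver).
# They read and print only; they never modify system files.

# Succeeds (exit 0) when perf_event_paranoid is 0 or lower, which is the
# level needed for system-wide and uncore event collection.
# Optional $1: file to read (defaults to the real procfs entry).
paranoid_ok() {
    pf_file=${1:-/proc/sys/kernel/perf_event_paranoid}
    pf_value=$(cat "$pf_file") || return 2
    [ "$pf_value" -le 0 ]
}

# Prints the two limits.conf lines for user $1, using 100 * logical CPUs.
# Optional $2: CPU count override (defaults to online logical CPUs).
nofile_lines() {
    nf_user=$1
    nf_cpus=${2:-$(getconf _NPROCESSORS_ONLN)}
    nf_limit=$((100 * nf_cpus))
    printf '%s hard nofile %d\n' "$nf_user" "$nf_limit"
    printf '%s soft nofile %d\n' "$nf_user" "$nf_limit"
}
```

For example, `paranoid_ok || echo "set perf_event_paranoid to 0"` reports whether the first setting is needed. `nofile_lines "$USER" | sudo tee -a /etc/security/limits.conf` appends the second setting. New limits from limits.conf take effect only in login sessions started after the change.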
The workflow shown in the diagram is the recommended flow because it speeds up the analysis: you collect data on the target and finalize and view the result on the host. You can also run the full Intel VTune Amplifier workflow, including finalization and visualization, directly on the target Intel Xeon Phi processor, but these steps might be slow there. For more information, see Standalone GUI: Basic Workflow.
1. Configure and run analysis on the target system with an Intel Xeon Phi processor.
   There are two ways to configure and run the analysis on the target system:
2. Open the result on the host system.
   Copy the result to the host system if it is not already available on the host through a shared file system. If your collection command specified deferred finalization, finalize the result on the host.
3. Open and interpret analysis results.
   There are two ways to view the results:
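The three steps can be sketched with the `amplxe-cl` command-line tool. This assumes the tool is on `PATH` on both systems, for example after sourcing `amplxe-vars.sh` from the installation. The host name, result directory, and application are hypothetical placeholders, and so are the function names. The `-finalization-mode=deferred` option is an assumption about your VTune Amplifier version, so confirm it with `amplxe-cl -help collect` first. Setting `DRY_RUN=1` prints each command instead of running it.

```shell
#!/bin/sh
# Sketch: collect on the Intel Xeon Phi target, finalize and report on the host.
# Function names, host names, and paths are hypothetical placeholders.

# Runs a command, or only prints it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1 (target): $1 = analysis type, $2 = result dir, rest = application.
# Analysis type examples: advanced-hotspots, hpc-performance, memory-access,
# general-exploration. ASSUMPTION: -finalization-mode=deferred is supported.
collect_on_target() {
    ct_analysis=$1 ct_result=$2
    shift 2
    run amplxe-cl -collect "$ct_analysis" -finalization-mode=deferred \
        -r "$ct_result" -- "$@"
}

# Step 2 (host): $1 = target host, $2 = result dir path on the target.
# Copies the result into the current directory, then finalizes it.
finalize_on_host() {
    fh_target=$1 fh_result=$2
    run scp -r "$fh_target:$fh_result" .
    run amplxe-cl -finalize -r "$(basename "$fh_result")"
}

# Step 3 (host): prints a summary report for result dir $1.
report_on_host() {
    run amplxe-cl -report summary -r "$1"
}
```

For example, run `collect_on_target advanced-hotspots r000ah ./myapp` on the target. Then, on the host, run `finalize_on_host xeonphi /home/user/r000ah && report_on_host r000ah`. If the result directory is already on a shared file system, drop the `scp` line. The finalized result can also be opened in the GUI instead of printing the summary report.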