Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help
Use the Summary window as your starting point of the performance analysis with the Intel® VTune™ Amplifier. To access this window, select the Memory Usageviewpoint and click the Summary sub-tab in the result tab.
Depending on the analysis type, the Summary window provides the following application-level statistics in the Memory Usage viewpoint:
You may click the Copy to Clipboard button to copy the content of the selected summary section to the clipboard.
The Summary window displays a list of memory-related CPU metrics that help you estimate an overall memory usage during application execution. For a metric description, hover over the corresponding question mark icon to read the pop-up help:
Memory Bound metrics are measured either as Clockticks or as Pipeline Slots. Metrics measured in Clockticks are less precise compared to the metrics measured in Pipeline Slots since they may overlap and their sum at some level does not necessarily match the parent metric value. But such metrics are still useful for identifying the dominant performance bottleneck in the code.
Mouse over a flagged value with the performance issue and read the recommendation for further analysis. For example, a high Memory Bound value typically indicates that a significant fraction of the execution pipeline slots could be stalled due to a demand memory load and stores. For further details, you may switch to the Bottom-up window and explore metric data per memory object.
A high DRAM Bandwidth Bound metric value indicates that your system spent much time heavily utilizing the DRAM bandwidth. The calculation of this metric relies on the accurate maximum system DRAM bandwidth measurement provided in the System Bandwidth section below.
This section provides various system bandwidth-related properties detected by the product. Depending on the number of sockets on your system, the following types of system bandwidth are measured:
Max DRAM System Bandwidth |
Maximum DRAM bandwidth measured for the whole system (across all packages) by running a micro-benchmark before the collection starts. If the system has already been actively loaded at the moment of collection start (for example, with the attach mode), the value may be less accurate. |
Max DRAM Single-Package Bandwidth |
Maximum DRAM bandwidth for single package measured by running a micro-benchmark before the collection starts. If the system has already been actively loaded at the moment of collection start (for example, with the attach mode), the value may be less accurate. |
These values are used to define default High, Medium and Low bandwidth utilization thresholds for the Bandwidth Utilization Histogram and to scale overtime bandwidth graphs in the Bottom-up view. By default, for Memory Analysis results the system bandwidth is measured automatically. To enable this functionality for custom analysis results, make sure to select the Evaluate max DRAM bandwidth option.
This histogram shows how much time the system bandwidth was utilized by a certain value (Bandwidth Domain) and provides thresholds to categorize bandwidth utilization as High, Medium and Low. You can set the threshold by moving sliders at the bottom.
If you switch to the Bottom-up window and group the grid data by ../Bandwidth Utilization Type/.., you can identify functions or memory objects with high bandwidth utilization in the specific bandwidth domain.
If you select the QPI domain, you will be able to check whether the performance of your application is limited by the bandwidth of Intel QuickPath Interconnect (Intel QPI) links (inter-socket connections). Then, you may switch to the Bottom-up window and identify code and memory objects with NUMA issues.
Single-Package domains are displayed for the systems with two or more CPU packages and the histogram for them shows the distribution of the elapsed time per maximum bandwidth utilization among all packages. Use this data to identify situations where your application utilizes bandwidth only on a subset of CPU packages. In this case, the whole system bandwidth utilization represented by domains like DRAM may be low whereas the performance is in fact limited by bandwidth utilization.
Intel QPI bandwidth analysis is supported by the VTune Amplifier for Intel microarchitecture code name Ivy Bridge EP and later.
To learn bandwidth capabilities, refer to your system specifications or run appropriate benchmarks to measure them; for example, Intel Memory Latency Checker can provide maximum achievable DRAM and QPI bandwidth.
If you enabled the Analyze memory object configuration option for the Memory Access analysis, the Summary window in the Memory Usage viewpoint displays memory objects (variables, data structures, arrays) that introduced the highest latency to the execution of your application.
Memory objects identification is supported only for Linux targets and only for processors based on Intel microarchitecture code name Sandy Bridge and later.
Only metrics based on DLA-capable hardware events are applicable to the memory objects analysis. For example, the CPU Time metric is based on a non DLA-capable Clockticks event, so cannot be applied to memory objects. Examples of applicable metrics are Loads, Stores, LLC Miss Count, and Average Latency.
Clicking an object in the table opens the Bottom-up window with the grid data grouped by Memory Object/Function/Allocation Stack. The selected hotspot object is highlighted.
This section provides a list of tasks that took most of the time to execute, where tasks are either code regions marked with Task API, or system tasks enabled to monitor Ftrace* events, Atrace* events, Intel Media SDK programs, OpenCL™ kernels, and so on.
Clicking a task type in the table opens the grid view (for example, Bottom-up or Event Count) grouped by the Task Type granularity. See Interpreting Task Analysis Data for more information.
This histogram shows a distribution of loads per latency (in cycles).
This section provides the following data:
Application Command Line |
Path to the target application. |
Operating System |
Operating system used for the collection. |
Computer Name |
Name of the computer used for the collection. |
Result Size |
Size of the result collected by the VTune Amplifier. |
Collection start time |
Start time (in UTC format) of the external collection. Explore the Timeline pane to track the performance statistics provided by the custom collector over time. |
Collection stop time |
Stop time (in UTC format) of the external collection. Explore the Timeline pane to track the performance statistics provided by the custom collector over time. |
CPU Information |
|
Name |
Name of the processor used for the collection. |
Frequency |
Frequency of the processor used for the collection. |
Logical CPU Count |
Logical CPU count for the machine used for the collection. |
Physical Core Count |
Number of physical cores on the system. |
User Name |
User launching the data collection. This field is available if you enabled the per-user event-based sampling collection mode during the product installation. |
GPU Information |
|
Name |
Name of the Graphics installed on the system. |
Vendor |
GPU vendor. |
Driver |
Version of the graphics driver installed on the system. |
Stepping |
Microprocessor version. |
EU Count |
Number of execution units (EUs) in the Render and GPGPU engine. This data is Intel® HD Graphics and Intel® Iris™ Graphics (further: Intel Graphics) specific. |
Max EU Thread Count |
Maximum number of threads per execution unit. This data is Intel Graphics specific. |
Max Core Frequency |
Maximum frequency of the Graphics processor. This data is Intel Graphics specific. |
Graphics Performance Analysis |
GPU metrics collection is enabled on the hardware level. This data is Intel Graphics specific. NoteSome systems disable collection of extended metrics such as L3 misses, memory accesses, sampler busyness, SLM accesses, and others in the BIOS. On some systems you can set a BIOS option to enable this collection. The presence or absence of the option and its name are BIOS vendor specific. Look for the Intel® Graphics Performance Analyzers option (or similar) in your BIOS and set it to Enabled. |