Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help
Advanced Hotspots analysis is a fast and easy way to identify performance-critical code sections (hotspots). The periodic instruction pointer sampling performed by Intel® VTune™ Amplifier identifies code locations where an application spends more time than in others. A function may consume a lot of time either because its code is slow or because it is called frequently. In either case, speeding up such functions should have a bigger impact on overall application performance than optimizing code elsewhere.
VTune Amplifier creates a list of functions in your application ordered by the amount of time spent in each function. By default, Advanced Hotspots analysis does not capture function call stacks while collecting hotspots, but it can sample all processes on the system. This type of analysis uses event-based sampling collection and analyzes all the processes running on your system during collection, providing CPU time data on whole-system performance.
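The idea of turning periodic samples into an ordered hotspot list can be sketched in a few lines of Python. This is a simplified illustration, not VTune Amplifier's implementation: it assumes each sample has already been resolved to a function name and that every sample stands for one sampling interval of CPU time. The function `rank_hotspots` and the sample data are hypothetical.

```python
from collections import Counter

def rank_hotspots(samples, interval_ms=1.0):
    """Turn a stream of sampled function names into a hotspot list.

    Each sample names the function that owned the instruction pointer
    when the sample was taken, so samples * interval approximates the
    CPU time attributed to that function.
    """
    counts = Counter(samples)
    # Largest sample count first; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, n * interval_ms) for name, n in ranked]

# Hypothetical sample stream: "multiply" owns 5 of the 8 samples.
samples = ["multiply", "multiply", "init", "multiply", "print_result",
           "multiply", "init", "multiply"]
hotspots = rank_hotspots(samples, interval_ms=1.0)
for name, cpu_ms in hotspots:
    print(f"{name:<14}{cpu_ms:6.1f} ms")
```

A function can reach the top of this list because each call is slow or because it is called often; the sample count alone does not distinguish the two cases.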
You can still analyze stacks for your application modules by selecting a collection level that includes stack analysis in the Advanced Hotspots pane. For example, selecting the Hotspots, call counts and stacks collection level extends the Advanced Hotspots analysis with performance, parallelism and power consumption data attributed to execution paths.
To use the Advanced Hotspots analysis, explore:
Configuration options (knobs)
To configure options for the Advanced Hotspots analysis:
Click the New Analysis button on the Intel® VTune™ Amplifier toolbar.
The New Amplifier Result tab opens with the Analysis Type window active.
From the analysis tree on the left pane, select Algorithm Analysis > Advanced Hotspots.
The analysis configuration pane opens on the right.
Configure the following options:
Option | Description
---|---
CPU sampling interval, ms field | Specify an interval (in milliseconds) between CPU samples. Possible values: 0.01-1000. The default value is 1.
Collection Level options | Select a level of details provided with event-based sampling collection. Detailed collection levels cause higher overhead. The default value is Hotspots.
Event mode drop-down menu | Limit event-based sampling collection to OS or USER mode. The default value is All.
Analyze user tasks, events, and counters check box | Analyze the tasks, events, and counters specified in your code via the ITT API. This option causes a higher overhead and increases the result size. The default value is false.
Analyze OpenMP regions check box | Instrument and analyze OpenMP regions to detect inefficiencies such as imbalance, lock contention, or overhead on performing scheduling, reduction and atomic operations. The default value is false.
Details button | Expand/collapse a section listing the default non-editable settings used for this analysis type. If you want to modify these settings for the analysis, you need to create a custom configuration by right-clicking the analysis entry in the analysis tree and selecting Copy from Current from the context menu. VTune Amplifier creates an editable copy of this analysis type configuration and locates it under the Custom Analysis branch in the analysis tree.

You may generate the command line for this configuration using the Command Line... button at the bottom.
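As a rough illustration of how the options above might map onto a command line, the following Python sketch validates the documented ranges and assembles an argument list. The class `AdvancedHotspotsConfig`, its method names, and the target `./my_app` are hypothetical. The knob names (`sampling-interval`, `event-mode`, `enable-user-tasks`, `analyze-openmp`) are assumptions based on the usual `amplxe-cl` conventions and may differ between product versions; the output of the Command Line... button is the authoritative source for your installation.

```python
from dataclasses import dataclass

@dataclass
class AdvancedHotspotsConfig:
    sampling_interval_ms: float = 1.0   # documented default: 1 ms
    event_mode: str = "all"             # GUI choices: All, OS, USER
    analyze_user_tasks: bool = False    # documented default: false
    analyze_openmp: bool = False        # documented default: false

    def validate(self):
        # Range taken from the option description: 0.01-1000 ms.
        if not 0.01 <= self.sampling_interval_ms <= 1000:
            raise ValueError("sampling interval must be within 0.01-1000 ms")
        if self.event_mode not in ("all", "os", "user"):
            raise ValueError("event mode must be all, os or user")

    def to_argv(self, target):
        """Assemble an argument list. Knob names are assumptions."""
        self.validate()
        argv = ["amplxe-cl", "-collect", "advanced-hotspots",
                "-knob", f"sampling-interval={self.sampling_interval_ms:g}",
                "-knob", f"event-mode={self.event_mode}"]
        if self.analyze_user_tasks:
            argv += ["-knob", "enable-user-tasks=true"]
        if self.analyze_openmp:
            argv += ["-knob", "analyze-openmp=true"]
        # Everything after "--" is the application to profile.
        return argv + ["--"] + list(target)

config = AdvancedHotspotsConfig(sampling_interval_ms=0.5, analyze_openmp=True)
print(" ".join(config.to_argv(["./my_app"])))
```

Validating before building the command catches out-of-range values early, instead of leaving them to surface as a collector error at launch time.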
You can choose to view Advanced Hotspots analysis results in any of the following viewpoints:
Viewpoint | Description
---|---
Hardware Events | Displays statistics of monitored hardware events: estimated count and/or the number of samples collected. Use this view to identify code regions (modules, functions, code lines, and so on) with the highest activity for an event of interest.
Hardware Issues | Helps identify where the application is not making the best use of available hardware resources. This viewpoint displays metrics derived from hardware performance counters. Hover over the highlighted metrics values in the grid to read why the extreme value might represent a performance problem.
Hotspots | Helps identify hotspots - code regions in the application that consume a lot of CPU time.
Memory Usage | Helps understand how effectively your application uses memory resources and identify potential memory access related issues like excessive access to remote memory on NUMA platforms, hitting DRAM or Intel® QuickPath Interconnect (Intel QPI) bandwidth limit, and others. It provides various performance metrics for both the application code and memory objects arrays.

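The difference between an estimated event count and a sample count in the Hardware Events viewpoint can be illustrated with a short Python sketch. It assumes the common event-based sampling model in which one sample is recorded each time an event's counter reaches a fixed sample-after value. The function name and the numbers are hypothetical.

```python
def estimate_event_count(sample_count, sample_after_value):
    """Estimate how often an event occurred from the samples collected.

    One sample is recorded each time the event counter reaches the
    sample-after value, so the product approximates the true count.
    """
    if sample_count < 0 or sample_after_value <= 0:
        raise ValueError("sample count must be >= 0 and sample-after value > 0")
    return sample_count * sample_after_value

# Hypothetical numbers: 1,500 samples, one sample per 2,000,003 occurrences.
estimated = estimate_event_count(1_500, 2_000_003)
print(f"{estimated:,}")  # prints 3,000,004,500
```

The sample count is the measured quantity; the event count is derived from it, which is why it is described as an estimate.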
These viewpoints may include the following windows:
Summary window displays statistics on the overall application execution.
Events Count window displays the event count for all processor events selected for the analysis. This view provides an estimated number of times an event occurred during the collection.
Sample Count window displays the sample count for all collected processor events. This view provides the actual number of samples collected for an event.
Uncore Event Count window displays counts of uncore events selected for the analysis. If there are no uncore events, the upper pane of the window is empty.
Caller/Callee window displays parent and child functions of the selected focus function. This window is available only if stack collection was enabled during analysis configuration.
Top-down Tree window displays hotspot functions in the call tree, along with performance metrics for a function only (Self value) and for a function and its children together (Total value).
Platform window provides details on CPU and GPU utilization, frame rate, memory bandwidth, and user tasks (if corresponding metrics are collected).
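The Self and Total values shown in the Top-down Tree window can be illustrated with a small Python sketch. The `Node` class, the `dump` helper, and the call tree are hypothetical; the sketch only demonstrates the arithmetic, assuming Total is a function's Self value plus the Total of every function it calls.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    self_ms: int                      # time spent in the function's own code
    children: list = field(default_factory=list)

    def total_ms(self):
        """Self time plus the Total time of every callee below this node."""
        return self.self_ms + sum(child.total_ms() for child in self.children)

# Hypothetical call tree: main -> compute -> {multiply, add}
tree = Node("main", 100, [
    Node("compute", 400, [Node("multiply", 3000), Node("add", 500)]),
])

def dump(node, depth=0):
    indent = "  " * depth
    print(f"{indent}{node.name}: self={node.self_ms} ms total={node.total_ms()} ms")
    for child in node.children:
        dump(child, depth + 1)

dump(tree)
```

Here `main` has a small Self value but the largest Total value, which points to a hotspot further down its call tree, in this case `multiply`.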
You can go from the hotspots to the source code. View the source code containing the hotspots and modify your code to remove bottlenecks and improve the performance of your application.
Information provided by Advanced Hotspots analysis is important for tuning serial applications and remains useful for tuning the serial sections of parallel applications. For algorithm tuning, you may also choose to run the Basic Hotspots analysis and analyze the call flow of the application or run the Concurrency analysis to estimate the effectiveness of the parallel algorithms you use. For Intel® Xeon Phi™ coprocessor analysis, you may run the General Exploration analysis with additional metrics that help triage hardware issues in programs running on the coprocessor.