Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help
Concurrency analysis based on the user-mode sampling and tracing collection helps identify hotspot functions where processor utilization is poor. When cores are idle at a hotspot, you have an opportunity to improve performance by getting those cores working for you.
Concurrency analysis shows how many threads were running at each moment during application execution. The count includes threads that are currently running or ready to run, that is, threads that are not waiting at a defined waiting or blocking API. VTune Amplifier also shows the CPU time spent while each hotspot was executing. A red bar marks time when the processors were poorly utilized, which is a possible lead when deciding where to tune.
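The idea of counting running threads at each moment can be illustrated with a minimal sketch. It is not how VTune Amplifier computes the metric internally; the function name `concurrency_histogram` and the interval data are hypothetical. Each tuple is assumed to be a (start, end) time span, in seconds, during which one thread was running or ready to run.

```python
from collections import defaultdict


def concurrency_histogram(intervals):
    """Return {number_of_running_threads: total_seconds} for the given spans."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a thread starts running
        events.append((end, -1))    # a thread stops running
    # Tuples sort by time first; at equal times, stops (-1) sort before starts (+1).
    events.sort()

    histogram = defaultdict(float)
    running = 0
    previous_time = None
    for time, delta in events:
        if previous_time is not None and time > previous_time:
            # 'running' threads were active for the whole elapsed span.
            histogram[running] += time - previous_time
        running += delta
        previous_time = time
    return dict(histogram)


# Hypothetical data: three threads with overlapping running spans.
intervals = [(0.0, 4.0), (1.0, 3.0), (2.0, 6.0)]
hist = concurrency_histogram(intervals)
print(hist)  # {1: 3.0, 2: 2.0, 3: 1.0}
```

In this made-up run, only one thread is active for 3 of the 6 seconds. On a machine with more than one core, that is the kind of span a concurrency view would flag as poorly utilized.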
To use the Concurrency analysis, explore:
Configuration options (knobs)
To configure options for the Concurrency analysis:
Click the New Analysis button on the Intel® VTune™ Amplifier toolbar.
The New Amplifier Result tab opens with the Analysis Type window active.
Select the Algorithm Analysis > Concurrency analysis type from the analysis tree on the left pane.
The Concurrency pane opens on the right.
Configure the following options:
Option | Description |
---|---|
CPU sampling interval, ms spin box | Specify an interval (in milliseconds) between CPU samples. Possible values: 1-1000. The default value is 10. |
Analyze user tasks, events, and counters check box | Analyze the tasks, events, and counters specified in your code via the ITT API. This option causes higher overhead and increases the result size. The default value is false. |
Analyze Intel runtimes and user synchronization check box | Analyze thread synchronization by profiling the user synchronization API used by Intel runtimes such as OpenMP and Intel TBB, or by user code. This option causes higher overhead and increases the result size. The default value is false. |
Analyze OpenMP regions check box | Instrument and analyze OpenMP regions to detect inefficiencies such as imbalance, lock contention, or overhead from scheduling, reduction, and atomic operations. The default value is false. |
Details button | Expand or collapse a section listing the default non-editable settings used for this analysis type. To modify these settings, create a custom configuration by right-clicking the analysis entry in the analysis tree and selecting Copy from Current from the context menu. VTune Amplifier creates an editable copy of this analysis type configuration and places it under the Custom Analysis branch in the analysis tree. |
You may generate the command line for this configuration using the Command Line... button at the bottom.
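For scripted runs, the same options can be assembled into a command line programmatically. The sketch below is illustrative only: the tool name `amplxe-cl`, the `concurrency` analysis type name, and the knob names (`sampling-interval`, `enable-user-tasks`, `enable-user-sync`, `analyze-openmp`) are assumptions that should be checked against the output of the Command Line... button for your VTune Amplifier version. The helper `build_concurrency_command` and the target `./my_app` are hypothetical.

```python
import shlex


def build_concurrency_command(app_args, sampling_interval_ms=10,
                              user_tasks=False, user_sync=False,
                              openmp_regions=False, tool="amplxe-cl"):
    """Build an argument list for a Concurrency collection (assumed knob names)."""
    # The GUI limits the sampling interval to 1-1000 ms; mirror that check here.
    if not 1 <= sampling_interval_ms <= 1000:
        raise ValueError("sampling interval must be between 1 and 1000 ms")

    def flag(value):
        return "true" if value else "false"

    return [
        tool, "-collect", "concurrency",
        # Knob names below are assumptions; verify them for your installation.
        "-knob", f"sampling-interval={sampling_interval_ms}",
        "-knob", f"enable-user-tasks={flag(user_tasks)}",
        "-knob", f"enable-user-sync={flag(user_sync)}",
        "-knob", f"analyze-openmp={flag(openmp_regions)}",
        "--", *app_args,
    ]


# Hypothetical target application and argument.
command = build_concurrency_command(["./my_app", "--size", "1024"],
                                    sampling_interval_ms=5,
                                    openmp_regions=True)
# Quote each argument so the line can be pasted into a shell.
command_line = " ".join(shlex.quote(arg) for arg in command)
print(command_line)
```

Keeping the command as a list of arguments, rather than a single string, means it can be passed directly to `subprocess.run` without shell quoting issues.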
You can choose to view Concurrency analysis results in any of the following viewpoints:
Viewpoint | Description |
---|---|
Hotspots | Helps identify hotspots - code regions in the application that consume a lot of CPU time. |
Hotspots by CPU Usage | Helps identify hotspots - code regions in the application that consume a lot of CPU time. CPU time is broken down into CPU usage states: idle, poor, fair, and good. |
Hotspots by Thread Concurrency | Helps identify hotspots - code regions in the application that consume a lot of CPU time. CPU time is broken down into thread concurrency states: idle, poor, fair, good, and over. |
Locks and Waits | Shows how your application is utilizing available CPU cores and helps identify the cause of ineffective utilization, for example: threads waiting too long on synchronization objects (locks), I/O, or timers while CPU cores are underutilized. CPU time is represented by bars colored according to the CPU utilization level during the wait. |
By default, VTune Amplifier displays Concurrency analysis results in the Hotspots by Thread Concurrency viewpoint, where:
Summary window displays statistics on the overall application execution, identifying CPU time and processor utilization.
Bottom-up window displays hotspot functions in the bottom-up tree, CPU time and CPU utilization per function.
Top-down Tree window displays hotspot functions in the call tree, performance metrics for a function only (Self value) and for a function and its children together (Total value).
Caller/Callee window displays parent and child functions of the selected focus function.
Platform window provides details on CPU utilization, frame rate, memory bandwidth, and user tasks (if corresponding metrics are collected).
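The relationship between the Self and Total values shown in the Top-down Tree window can be sketched as follows. This is a simplified model rather than VTune Amplifier's implementation; the call tree, the function names, and the timings are hypothetical.

```python
def total_time(node):
    """Total value: the function's Self time plus the Total time of its callees."""
    return node["self"] + sum(total_time(child) for child in node["children"])


# Hypothetical call tree with Self CPU time in seconds.
call_tree = {
    "name": "main", "self": 0.5, "children": [
        {"name": "compute", "self": 3.0, "children": [
            {"name": "kernel", "self": 6.0, "children": []},
        ]},
        {"name": "write_output", "self": 0.5, "children": []},
    ],
}

compute_node = call_tree["children"][0]
print(total_time(compute_node))  # 9.0
print(total_time(call_tree))     # 10.0
```

Here `main` has a small Self value but a large Total value, which indicates that the time is spent in its callees (mostly `kernel`) rather than in `main` itself.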
If you have a hotspot where not all cores are used, consider adding parallelism, re-balancing the work across threads, or reducing contention.
To understand possible reasons for ineffective processor utilization, run the Locks and Waits analysis.
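As an illustration of why re-balancing can help, the sketch below compares two ways of distributing tasks of uneven cost across worker threads. It is a toy model under stated assumptions: the task costs are made up, and the greedy longest-task-first assignment is one common heuristic, not a technique prescribed by VTune Amplifier.

```python
import heapq


def static_split(task_costs, workers):
    """Per-worker load when tasks are split into contiguous, equal-count chunks."""
    chunk = -(-len(task_costs) // workers)  # ceiling division
    return [sum(task_costs[i * chunk:(i + 1) * chunk]) for i in range(workers)]


def greedy_balance(task_costs, workers):
    """Per-worker load when each task, largest first, goes to the least loaded worker."""
    heap = [(0, worker) for worker in range(workers)]
    loads = [0] * workers
    for cost in sorted(task_costs, reverse=True):
        load, worker = heapq.heappop(heap)  # least loaded worker so far
        loads[worker] = load + cost
        heapq.heappush(heap, (loads[worker], worker))
    return loads


# Hypothetical task costs (arbitrary time units) and two worker threads.
task_costs = [8, 1, 1, 1, 1, 4]
static_loads = static_split(task_costs, 2)
balanced_loads = greedy_balance(task_costs, 2)
print(static_loads)    # [10, 6]
print(balanced_loads)  # [8, 8]
```

With the static split, one worker sits idle for 4 time units while the other finishes. With the balanced assignment, both workers finish together and the elapsed time drops from 10 to 8 units.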