Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help
To access this pane: Create a new analysis type.
To access this pane for an existing custom analysis type:
1. Click the New Analysis button on the Intel® VTune™ Amplifier toolbar. The New Amplifier Result tab opens with the Analysis Type window active.
2. In the Analysis Type window, select the Custom Analysis > <Hardware Event-based Sampling Analysis Type> entry in the analysis tree pane. The Custom Analysis pane opens in static mode.
3. To edit the configuration options, click the Edit button.
Analysis options displayed in this window depend on the target selected in the Analysis Target window.
Use this pane to configure a new custom analysis type based on hardware event-based sampling data collection.
Use This | To Do This
---|---
Analysis name field | Enter or edit the name of this custom analysis type.
Description | Provide a short, meaningful description of the analysis type you create. This information may help you identify the specifics of the analysis type later.
Events table | Specify the events to collect information about.
Collect stacks check box | Enable advanced collection of call stacks and thread context switches to analyze performance, parallelism, and power consumption per execution path. Note: For Intel® Xeon Phi™ coprocessor analysis, call stack collection is supported only for the Intel Xeon Phi coprocessor (native) target type.
Stack size, in bytes field | Specify the size of a raw stack (in bytes) to process. A value of zero means unlimited size. Possible values are numbers between 0 and 2147483647.
Stack type | Choose between software stack and hardware LBR-based stack types. Software stacks have no depth limitations and provide more data, while hardware stacks introduce less overhead. Typically, the software stack type is recommended unless the collection overhead becomes significant. Note that the hardware LBR stack type may not be available on all platforms.
Estimate call counts check box | Obtain a statistical estimate of call counts based on the hardware events.
Estimate trip counts check box | Obtain a statistical estimate of loop trip counts based on the hardware events.
Chipset events field | Specify a comma-separated list of chipset events (up to 5 events) to monitor with the hardware event-based sampling collector.
Analyze memory bandwidth check box | Collect the events required to compute memory bandwidth.
Analyze PCIe bandwidth check box | Collect the events required to compute PCIe bandwidth. As a result, you can analyze the distribution of read/write operations on the timeline and identify where your application could be stalled as it approaches the bandwidth limits of the PCIe bus. Note: This analysis is possible only on the Intel microarchitecture code name Sandy Bridge EP and later.
Analyze memory objects check box (for Linux* targets only) | Enable the instrumentation of memory allocation/de-allocation and map hardware events to memory objects.
Minimal memory object size to track, in bytes spin box (for Linux targets only) | Specify the minimal size of memory allocations to analyze. This option helps reduce the runtime overhead of the instrumentation.
Analyze user tasks, events, and counters check box | Analyze tasks, events, and counters specified in your code via the ITT API. This option causes higher overhead and increases the result size.
Analyze OpenMP regions check box | Instrument the OpenMP* regions in your application to group performance data by regions/work-sharing constructs and detect inefficiencies such as imbalance, lock contention, or overhead on performing scheduling, reduction, and atomic operations. Using this option may cause higher overhead and increase the result size.
Analyze I/O waits check box | Analyze the percentage of time each thread and CPU spends in the I/O wait state.
Collect I/O API data menu | Choose whether to collect information about I/O calls and related call stacks. This analysis option helps identify where threads are waiting or enables you to compute thread concurrency. The collector instruments APIs, which causes higher overhead and increases the result size.
Analyze system-wide context switches check box | Analyze the detailed scheduling layout for all threads on the system and identify the nature of context switches for a thread (preemption or synchronization).
Analyze GPU Usage check box (for Linux targets available with Intel® HD Graphics and Intel® Iris™ Graphics only) | Analyze GPU usage and frame rate to identify whether your application is GPU or CPU bound. Note: Select the Collect stacks option to detect context switches and correlate CPU and GPU usage data.
Analyze Processor Graphics events drop-down menu | Analyze performance data from Intel® HD Graphics and Intel® Iris™ Graphics based on the predefined groups of GPU metrics.
GPU sampling interval, us field | Specify an interval (in microseconds) between GPU samples.
Trace OpenCL and Intel Media SDK programs (Intel Graphics Driver only) check box | Capture the execution time of OpenCL™ kernels and Intel Media SDK programs on a GPU, identify performance-critical GPU tasks, and analyze the performance per GPU hardware metrics. Note: Intel Media SDK programs analysis is supported for Linux targets only.
Capture transactional cycles check box | Collect the events required to analyze transactional success on Intel® processors that support Intel Transactional Synchronization Extensions (Intel TSX).
Collect precise clockticks check box | Collect the event that emulates precise clockticks. This can be useful, for example, to analyze hotspots in transactions.
Evaluate max DRAM bandwidth check box | Evaluate the maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the timeline and calculate thresholds.
Analyze loops check box | Extend loop analysis to collect advanced loop information, such as instruction set usage, and display analysis results by loops and functions. If this option is enabled, the VTune Amplifier automatically applies the Loops and functions filtering mode to the data view in the grid and enables the Vector Instruction Set column, which shows the vectorization instruction set used for a particular function, loop, and so on.
Managed runtime type to analyze menu | Choose the type of managed runtime to analyze. Available options are:
Event mode drop-down list | Limit event-based sampling collection to USER (user events) or OS (system events) mode. By default, all event types are collected.
Collect context switches check box | Analyze the detailed scheduling layout for all threads in your application, explore the time spent on a context switch, and identify the nature of context switches for a thread (preemption or synchronization).
Use precise multiplexing check box | Enable a fine-grain event multiplexing mode that switches event groups on each sample. This mode provides more reliable statistics for applications with a short execution time. Also consider applying the precise multiplexing algorithm if the MUX Reliability metric value for your results is low.
Command line name field | Enter or edit the name of the custom analysis type to use as an identifier when analyzing the project from the command line. Keep it short for convenience.
Analysis identifier field | Specify a shorthand identifier to be appended to the name of each result produced by this analysis type. For example, adding the ge identifier for the General Exploration analysis result produces the following result name: r000ge, where 000 is the result number.
VTune Amplifier for Systems only option: |
Select events for analysis field | Use the Events Library to select Linux Ftrace* and Android* framework events to monitor with the collector. The collected data appears as tasks in the Timeline pane. You can also apply the task grouping level to view performance statistics in the grid.
You may generate the command line for this configuration using the Command Line... button at the bottom.
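
Several fields in the table above carry explicit value constraints: the stack size must lie between 0 and 2147483647 (zero meaning unlimited), the chipset events list accepts up to 5 comma-separated events, and the analysis identifier is appended to a result name such as r000ge. The following is a minimal Python sketch of those rules, useful for checking values before entering them. All function names are hypothetical helpers and are not part of VTune Amplifier; the three-digit zero padding of the result number is an assumption inferred from the r000ge example.

```python
# Hypothetical helpers illustrating the documented field constraints.
# None of these names belong to VTune Amplifier itself.

MAX_STACK_SIZE = 2147483647  # upper bound given for "Stack size, in bytes"
MAX_CHIPSET_EVENTS = 5       # limit given for the "Chipset events" field


def check_stack_size(size_bytes):
    """Return size_bytes if it is within the documented range (0 = unlimited)."""
    if not 0 <= size_bytes <= MAX_STACK_SIZE:
        raise ValueError(
            f"stack size must be between 0 and {MAX_STACK_SIZE}, got {size_bytes}"
        )
    return size_bytes


def parse_chipset_events(text):
    """Split a comma-separated event list and enforce the 5-event limit."""
    events = [name.strip() for name in text.split(",") if name.strip()]
    if len(events) > MAX_CHIPSET_EVENTS:
        raise ValueError(
            f"at most {MAX_CHIPSET_EVENTS} chipset events allowed, got {len(events)}"
        )
    return events


def result_name(result_number, identifier):
    """Build a result name such as r000ge.

    Assumption: the result number is zero-padded to three digits,
    as suggested by the r000ge example.
    """
    return f"r{result_number:03d}{identifier}"
```

For instance, `result_name(0, "ge")` yields `r000ge`, matching the General Exploration example, and `parse_chipset_events` raises `ValueError` when given six or more event names. The event names passed to it are whatever your platform supports; this sketch does not validate them.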