Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help

Custom Analysis - New Hardware Event-based Sampling Analysis

To access this pane: Create a new analysis type.

To access this pane for an existing custom analysis type:

  1. Click the New Analysis button on the Intel® VTune™ Amplifier toolbar.

    The New Amplifier Result tab opens with the Analysis Type window active.

  2. In the Analysis Type window, select the Custom Analysis > <Hardware Event-based Sampling Analysis Type> entry in the analysis tree pane.

    The Custom Analysis pane opens in the static mode. To edit the configuration options, click the Edit button.

Note

Analysis options displayed in this window depend on the target selected in the Analysis Target window.

Use this pane to configure a new custom analysis type based on hardware event-based sampling data collection.

Use This

To Do This

Analysis name field

Enter or edit the name of this custom analysis type.

Description

Provide a short, meaningful description of the analysis type you create. This information helps you identify the analysis type later.

Events table

Specify the events to collect.

Use This

To Do This

Event Name

View the name of the event to monitor.

Sample After

View or modify the number of event occurrences after which the VTune Amplifier interrupts execution to record a sample. The Sample After value depends on the target duration: based on the duration value, the VTune Amplifier scales the Sample After value by a multiplier.

Note

To change the Sample After value, edit the corresponding cell directly.
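
As a rough illustration of how the Sample After value trades detail for overhead, the following sketch estimates how many samples one event produces. The function name and the numbers are hypothetical, and the duration-based multiplier is modeled only as an assumed parameter; the actual values the VTune Amplifier applies are not described on this page.

```python
def estimated_samples(event_count: int, sample_after: int, multiplier: float = 1.0) -> int:
    """Estimate how many samples are recorded for one event.

    A sample is taken each time the event counter reaches the effective
    Sample After value. `multiplier` is an assumed stand-in for the
    duration-based adjustment mentioned above.
    """
    if sample_after <= 0:
        raise ValueError("sample_after must be a positive integer")
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    effective_sample_after = sample_after * multiplier
    return int(event_count // effective_sample_after)


# Hypothetical workload: 4 billion events, sampled every 2,000,000 events.
print(estimated_samples(4_000_000_000, 2_000_000))        # 2000
print(estimated_samples(4_000_000_000, 2_000_000, 10.0))  # 200
```

A larger Sample After value yields fewer samples and therefore lower overhead, at the cost of coarser statistics.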

Event Description

View brief information on the event. For more details, see the Reference for Processor Events.

Add Event button

Add a new event to collect.

Remove Event button

Remove the selected event from the table of events to collect.

LBR Filter

Click a row to select a filter for an event and enable the collection of filtered Last Branch Records (LBRs).

Collect stacks check box

Enable advanced collection of call stacks and thread context switches to analyze performance, parallelism, and power consumption per execution path.

Note

For Intel® Xeon Phi™ coprocessor analysis, the call stack collection is supported only for the Intel Xeon Phi coprocessor (native) target type.

Stack size, in bytes field

Specify the size of a raw stack (in bytes) to process. A value of 0 means unlimited size. Valid values are integers from 0 to 2147483647.
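
The range rule for this field can be sketched as a small check. This is only an illustration of the documented limits; the function and constant names are hypothetical and are not part of any VTune Amplifier interface.

```python
MAX_STACK_SIZE = 2_147_483_647  # documented upper bound (2**31 - 1)


def describe_stack_size(value: int) -> str:
    """Validate a stack size in bytes and describe how it is interpreted."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("stack size must be an integer number of bytes")
    if not 0 <= value <= MAX_STACK_SIZE:
        raise ValueError(f"stack size must be between 0 and {MAX_STACK_SIZE}")
    # Zero is a special value meaning "no limit".
    return "unlimited" if value == 0 else f"{value} bytes"


print(describe_stack_size(0))     # unlimited
print(describe_stack_size(4096))  # 4096 bytes
```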

Stack type

Choose between the software stack and hardware LBR-based stack types. Software stacks have no depth limitations and provide more data, while hardware stacks introduce less overhead. Typically, the software stack type is recommended unless the collection overhead becomes significant. Note that the hardware LBR stack type may not be available on all platforms.

Estimate call counts check box

Obtain a statistical estimate of call counts based on the hardware events.

Estimate trip counts check box

Obtain a statistical estimate of loop trip counts based on the hardware events.

Chipset events field

Specify a comma-separated list of chipset events (up to 5 events) to monitor with the hardware event-based sampling collector.
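
A minimal sketch of the format this field expects: a comma-separated list with at most five entries. The parser and the event names below are placeholders chosen for illustration, not real chipset event names.

```python
from typing import List

MAX_CHIPSET_EVENTS = 5  # documented limit for this field


def parse_chipset_events(text: str) -> List[str]:
    """Split a comma-separated event list and enforce the five-event limit."""
    events = [name.strip() for name in text.split(",")]
    if any(not name for name in events):
        raise ValueError("empty event name in list")
    if len(events) > MAX_CHIPSET_EVENTS:
        raise ValueError(f"at most {MAX_CHIPSET_EVENTS} chipset events are allowed")
    return events


# Placeholder names; substitute events supported by your chipset.
print(parse_chipset_events("EVENT_A, EVENT_B,EVENT_C"))
# ['EVENT_A', 'EVENT_B', 'EVENT_C']
```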

Analyze memory bandwidth check box

Collect the events required to compute memory bandwidth.

Analyze PCIe bandwidth check box

Collect the events required to compute PCIe bandwidth. As a result, you will be able to analyze the distribution of the read/write operations on the timeline and identify where your application could be stalled due to approaching the bandwidth limits of the PCIe bus.

Note

This analysis is available only on Intel microarchitecture code name Sandy Bridge EP and later.

Analyze memory objects check box (for Linux* targets only)

Enable the instrumentation of memory allocation and deallocation, and map hardware events to memory objects.

Minimal memory object size to track, in bytes spin box (for Linux targets only)

Specify the minimum size of memory allocations to analyze. This option helps reduce the runtime overhead of the instrumentation.

Analyze user tasks, events, and counters check box

Analyze tasks, events, and counters specified in your code via the ITT API. This option causes a higher overhead and increases the result size.

Analyze OpenMP regions check box

Instrument the OpenMP* regions in your application to group performance data by regions or work-sharing constructs and detect inefficiencies such as imbalance, lock contention, or overhead from scheduling, reduction, and atomic operations. Using this option may cause higher overhead and increase the result size.

Analyze I/O waits check box

Analyze the percentage of time each thread and CPU spends in I/O wait state.

Collect I/O API data menu

Choose whether to collect information about I/O calls and related call stacks. This analysis option helps identify where threads are waiting or enables you to compute thread concurrency. The collector instruments APIs, which causes higher overhead and increases result size.

Analyze system-wide context switches check box

Analyze the detailed scheduling layout for all threads on the system and identify the nature of context switches for a thread (preemption or synchronization).

Analyze GPU Usage check box (for Linux targets available with Intel® HD Graphics and Intel® Iris™ Graphics only)

Analyze GPU usage and frame rate to identify whether your application is GPU or CPU bound.

Note

Select the Collect stacks option to detect context switches and correlate CPU and GPU usage data.

Analyze Processor Graphics events drop-down menu

Analyze performance data from Intel® HD Graphics and Intel® Iris™ Graphics based on the predefined groups of GPU metrics.

GPU sampling interval, us field

Specify an interval (in microseconds) between GPU samples.

Trace OpenCL and Intel Media SDK programs (Intel Graphics Driver only) check box

Capture the execution time of OpenCL™ kernels and Intel Media SDK programs on a GPU, identify performance-critical GPU tasks, and analyze the performance per GPU hardware metrics.

Note

Intel Media SDK programs analysis is supported for Linux targets only.

Capture transactional cycles check box

Collect the events required to analyze transactional success on Intel® processors that support Intel Transactional Synchronization Extensions (Intel TSX).

Collect precise clockticks check box

Collect the event that emulates precise clockticks. This can be useful, for example, to analyze hotspots in transactions.

Evaluate max DRAM bandwidth check box

Evaluate maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the timeline and calculate thresholds.

Analyze loops check box

Extend loop analysis to collect advanced loop information, such as instruction set usage, and display analysis results by loops and functions. If this option is enabled, the VTune Amplifier automatically applies the Loops and functions filtering mode to the data view in the grid and enables the Vector Instruction Set column, which shows the vector instruction set used for a particular function, loop, and so on.

Managed runtime type to analyze menu

Choose a type of the managed runtime to analyze. Available options are:

  • for Windows targets: combined Java* and .NET* analysis

  • for Linux targets: Java-only analysis

Event mode drop-down list

Limit event-based sampling collection to USER (user events) or OS (system events) mode. By default, all event types are collected.

Collect context switches check box

Analyze the detailed scheduling layout for all threads in your application, explore the time spent on context switches, and identify the nature of context switches for a thread (preemption or synchronization).

Use precise multiplexing check box

Enable a fine-grained event multiplexing mode that switches event groups on each sample. This mode provides more reliable statistics for applications with a short execution time. Also consider applying the precise multiplexing algorithm if the MUX Reliability metric value for your results is low.
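
To make the idea of switching event groups on each sample concrete, here is a small conceptual sketch. It assumes a simple round-robin rotation, which is an illustrative assumption and not a description of the VTune Amplifier's internal scheduler; the group names are placeholders.

```python
from itertools import cycle, islice
from typing import List, Sequence


def per_sample_schedule(event_groups: Sequence[str], sample_count: int) -> List[str]:
    """Return which event group is active for each successive sample.

    Assumes round-robin rotation: the active group changes after every
    sample, so each group gets an even share even in a very short run.
    """
    if not event_groups:
        raise ValueError("at least one event group is required")
    return list(islice(cycle(event_groups), sample_count))


# Three placeholder groups observed over seven samples.
print(per_sample_schedule(["group_A", "group_B", "group_C"], 7))
# ['group_A', 'group_B', 'group_C', 'group_A', 'group_B', 'group_C', 'group_A']
```

With rotation on every sample, each group is observed within the first few samples, which is why this mode suits applications with a short execution time.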

Command line name field

Enter or edit the name that identifies this custom analysis type when you run the analysis from the command line. Keep it short for convenience.

Analysis identifier field

Specify a shorthand identifier to be appended to the name of each result produced by this analysis type. For example, adding the ge identifier for the General Exploration analysis result produces the following result name: r000ge, where 000 is the result number.
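
The naming pattern in the example above can be sketched as follows. The helper name is hypothetical, and the three-digit zero padding is inferred from the r000ge example rather than from a documented rule.

```python
def result_name(result_number: int, identifier: str) -> str:
    """Build a result name such as 'r000ge' from a number and an identifier."""
    if result_number < 0:
        raise ValueError("result number must be non-negative")
    # 'r' prefix, zero-padded result number, then the analysis identifier.
    return f"r{result_number:03d}{identifier}"


print(result_name(0, "ge"))   # r000ge
print(result_name(12, "ge"))  # r012ge
```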

VTune Amplifier for Systems only option:

Select events for analysis field

Use the Events Library to select Linux Ftrace* and Android* framework events to monitor with the collector. The collected data show up as tasks in the Timeline pane. You can also apply the task grouping level to view performance statistics in the grid.

Note

To generate the command line for this configuration, click the Command Line... button at the bottom.
