Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help

GPU Hotspots Analysis

GPU Hotspots analysis is intended for applications that use a Graphics Processing Unit (GPU) for rendering, video processing, and computations with explicit support of Intel® Media SDK and OpenCL™ software technology.

Use this analysis type to identify GPU tasks with high GPU utilization and estimate the effectiveness of this utilization. The tool infrastructure automatically aligns clocks across all cores in the entire system so that you can analyze some CPU-based workloads together with GPU-based workloads within a unified time domain.

Prerequisites: For Linux* targets, to analyze Intel HD Graphics and Intel Iris Graphics (hereafter, Intel Graphics) hardware events on a GPU, install the Intel Media Server Studio (version 2015 R5 or later) and build the kernel driver as described in the Getting Started Guide.

Use the GPU Hotspots analysis to:

To run the GPU Hotspots analysis, explore:

Configuration Options

To view configuration options for the GPU Hotspots analysis:

  1. Click the New Analysis toolbar button.

    The Analysis Type window opens.

  2. From the left pane, select Platform Analysis > GPU Hotspots.

    The GPU Hotspots configuration pane opens on the right displaying editable and predefined collection options for this analysis.

    The GPU Hotspots analysis is pre-configured to collect GPU usage data, analyze GPU task scheduling, and identify whether your application is CPU-bound or GPU-bound.

  3. Configure the following GPU analysis options:

    Option: GPU sampling interval, ms field

    Description: Specify an interval between GPU samples.

    Supported Target System: All

    Supported Graphics: All

    Option: Analyze Processor Graphics hardware events

    Description: Monitor the Render and GPGPU engine usage, identify which parts of the engine are loaded, and correlate GPU and CPU data.

    VTune Amplifier provides platform-specific presets of the hardware metrics. All presets collect data about execution unit (EU) activity: EU Array Active, EU Array Stalled, EU Array Idle, Computing Threads Started, and Core Frequency.

    • The Overview event set also includes metrics that track general GPU memory accesses, such as Memory Read/Write Bandwidth, GPU L3 Misses, Sampler Busy, Sampler Is Bottleneck, and GPU Memory Texture Read Bandwidth. These metrics can be useful for both graphics and compute-intensive applications.

    • The Compute Basic (with global/local memory accesses) event group also includes metrics that distinguish accesses to different types of data on a GPU: Untyped Memory Read/Write Bandwidth, Typed Memory Read/Write Transactions, SLM Read/Write Bandwidth, Render/GPGPU Command Streamer Loaded, and GPU EU Array Usage. These metrics are useful for compute-intensive workloads on the GPU.

    • The Compute Extended event group includes metrics intended only for GPU analysis on Intel processors code-named Broadwell and later. For other systems, this preset is not available.

    • The Full Compute (preview) event group is a combination of the Overview and Compute Basic event sets.

      Note

      This is a PREVIEW FEATURE. A preview feature may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com for VTune Amplifier XE and to intelsystemstudio@intel.com for VTune Amplifier for Systems.

    Supported Target System: Windows*, Linux* (see the prerequisites above), and Android*

    Supported Graphics: Intel® HD Graphics and Intel® Iris™ Graphics only (further: Intel Graphics)

    Option: Trace OpenCL and Intel Media SDK programs

    Description: Explore execution time for runtimes, monitor performance of each program per GPU metrics, and identify hotspots.

    Supported Target System: OpenCL kernels analysis: Windows and Linux. Intel Media SDK program analysis: Linux.

    Supported Graphics: Intel Graphics only

  4. Select the Collect stacks option to analyze performance and parallelism per execution path.

To run the GPU Hotspots analysis from the command line, enter:

$ amplxe-cl -collect gpu-hotspots [-knob <knob_name=knob_option>] -- <target> [target_options]
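
The syntax above can be illustrated with a small dry-run sketch that only assembles and prints a command line, so you can review it before running a collection. The helper name build_gpu_hotspots_cmd, the target ./my_app, and its --frames option are hypothetical. The knob gpu-sampling-interval=1 is an assumption about the command-line name of the GPU sampling interval option and may differ in your version.

```shell
#!/bin/sh
# Sketch only: prints a gpu-hotspots command line instead of executing it.
# build_gpu_hotspots_cmd, ./my_app and --frames are hypothetical names.

build_gpu_hotspots_cmd() {
    # $1: optional knob as "name=value" (pass "" to omit the -knob option)
    # remaining arguments: the target and its options
    knob="$1"
    shift
    if [ -n "$knob" ]; then
        printf 'amplxe-cl -collect gpu-hotspots -knob %s --' "$knob"
    else
        printf 'amplxe-cl -collect gpu-hotspots --'
    fi
    # Append the target and each of its options after the "--" separator.
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Without a knob: collection uses the default options.
build_gpu_hotspots_cmd "" ./my_app --frames 100

# With a knob. The knob name here is an assumption; verify it before use.
build_gpu_hotspots_cmd "gpu-sampling-interval=1" ./my_app --frames 100
```

To see which knobs your installed version actually accepts, run amplxe-cl -help collect gpu-hotspots and substitute a knob from that list.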

Note

Viewpoints

VTune Amplifier runs the analysis and opens the data in the GPU Hotspots viewpoint providing various platform data in the following windows:

See Also