Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help

Window: Platform

To access this window: Click the Platform sub-tab in the result tab.

The Platform window represents a distribution of the performance data over time. Depending on the metrics collected during the analysis, use the Platform window to explore the following data:

Frame Rate. Identify bounds for GPU frames. GPU Frame X is the time range between the moment frame X-1 is rendered on the screen and the moment frame X is rendered on the screen.

Hover over a frame object to view a summary that includes the frame duration, frame rate, and other data.
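The frame definition above can be sketched in a few lines of Python. This is only an illustration, not how the VTune Amplifier computes its tooltip values: the function name `frame_stats` and the sample timestamps are hypothetical, and the frame rate is assumed here to be the reciprocal of the frame duration.

```python
def frame_stats(present_times_s):
    """Return (frame_index, duration_s, fps) for each frame X >= 1.

    present_times_s[i] is a hypothetical time, in seconds, at which
    frame i was rendered on the screen.
    """
    stats = []
    for x in range(1, len(present_times_s)):
        # Frame X spans the interval between frame X-1 and frame X on screen.
        duration = present_times_s[x] - present_times_s[x - 1]
        # Assumption: frame rate is taken as 1 / duration for a single frame.
        fps = 1.0 / duration if duration > 0 else float("inf")
        stats.append((x, duration, fps))
    return stats


# Hypothetical data: frame 2 takes twice as long as frame 1.
stats = frame_stats([0.0, 0.25, 0.75])
```

A frame whose duration stands out in such a list corresponds to a wider frame object on the timeline.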

GPU Engine. Explore overall GPU usage per GPU engine at each moment in time. By default, the Platform window displays GPU Usage and software queues per GPU engine. Hover over an object executed on the GPU (in yellow) to view a short summary of GPU usage, where GPU Usage is the time when a GPU engine was executing a workload. Explore the top GPU Usage band in the chart to estimate the percentage of GPU engine utilization (yellow areas vs. white spaces) and to identify opportunities to submit additional work to the hardware.
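The visual estimate of yellow areas vs. white spaces corresponds to a simple ratio of busy time to the length of the observed window. The sketch below assumes that busy intervals for one engine are available as (start, end) pairs. The function name and the sample data are hypothetical and are not part of any VTune Amplifier interface.

```python
def engine_utilization(busy_intervals, window_start, window_end):
    """Fraction of [window_start, window_end] in which the engine was busy."""
    # Clip intervals to the window and sort them by start time.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in busy_intervals
        if e > window_start and s < window_end
    )
    busy = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this interval: close the previous merged interval.
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlap: extend the merged interval so no time is counted twice.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        busy += cur_end - cur_start
    return busy / (window_end - window_start)


# Hypothetical busy intervals (seconds) within a 10-second window.
utilization = engine_utilization([(0.0, 2.0), (1.0, 3.0), (6.0, 8.0)], 0.0, 10.0)
```

The remaining fraction of the window is idle time, that is, room to submit additional work to the engine.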

To view and analyze GPU software queues, select an object (packet) in the queue. The VTune Amplifier highlights the corresponding software queue bounds.

A full software queue prevents packet submissions and causes waits on the CPU side in the user-mode driver until there is space in the queue. To check whether such a stall decreases your performance, you may decrease the workload on the hardware and switch to the Graphics window to see whether there are fewer waits on the CPU in the threads that spawn packets. Another option is to load the queue with additional tasks and see whether the queue length increases.
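The effect of a full queue on the submitting thread can be illustrated with a toy discrete-time model. The sketch below is not a model of the Intel Graphics Driver: the queue capacity, tick counts, and the `simulate` function are all hypothetical, chosen only to show that a lighter hardware workload leaves the producer with fewer waits.

```python
from collections import deque


def simulate(capacity, submit_every, service_ticks, total_ticks):
    """Return the number of ticks the producer spent blocked on a full queue."""
    queue = deque()
    pending = False   # the producer holds a packet it could not submit yet
    remaining = 0     # ticks left for the packet currently executing
    waits = 0
    for tick in range(total_ticks):
        # Consumer (GPU engine): start the next packet when idle.
        if remaining == 0 and queue:
            queue.popleft()
            remaining = service_ticks
        if remaining > 0:
            remaining -= 1
        # Producer (CPU thread): prepare a packet on schedule, then try to submit.
        if tick % submit_every == 0:
            pending = True
        if pending:
            if len(queue) < capacity:
                queue.append(tick)
                pending = False
            else:
                waits += 1  # queue is full: the producer stalls for this tick
    return waits


# Same submission rate, two hypothetical hardware workloads per packet.
heavy_waits = simulate(capacity=2, submit_every=2, service_ticks=4, total_ticks=40)
light_waits = simulate(capacity=2, submit_every=2, service_ticks=1, total_ticks=40)
```

In this model, the heavy workload fills the queue and stalls the producer, while the light workload never does. This mirrors the experiment of decreasing the workload and checking for fewer waits on the CPU.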

Each packet in the Platform window has its own ID that helps track its life cycle in a software queue. The ID does not correspond to the rendered frames. You may identify where a packet came from by the thread name (corresponding to the name of the module where a thread entry point resides) specified in the tooltip.

Horizontal hatching is used for data that may not be accurate due to collection issues (for example, a missing event from the Intel® Graphics Driver). This type of data is identified as Reconstructed packets in the Legend.

Computing Queue. Analyze details of OpenCL™ kernel submission. To distinguish the order of submission from the order of execution and to identify the time spent in the queue, zoom in and explore the Computing Queue data. VTune Amplifier displays kernels with the same name and global/local size in the same color.

Click a kernel task to highlight its whole path through the queue up to its execution, which is displayed in the top layer. Hover over an object in the queue to see kernel execution parameters.
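The quantities read from the Computing Queue can be expressed as a small sketch. It assumes each kernel instance is described by its name, global/local size, and submission, start, and finish times. The `Kernel` record, its field names, and the sample values are hypothetical and do not reflect a VTune Amplifier data format.

```python
from collections import namedtuple

# Hypothetical record for one kernel instance; times are in seconds.
Kernel = namedtuple(
    "Kernel", "name global_size local_size submitted started finished"
)


def time_in_queue(kernel):
    """Time between submission and the start of execution."""
    return kernel.started - kernel.submitted


def execution_order_differs(kernels):
    """True if kernels started executing in a different order than submitted."""
    by_submit = sorted(kernels, key=lambda k: k.submitted)
    by_start = sorted(kernels, key=lambda k: k.started)
    return by_submit != by_start


def color_groups(kernels):
    """Group kernels the way the timeline colors them: name + global/local size."""
    groups = {}
    for k in kernels:
        groups.setdefault((k.name, k.global_size, k.local_size), []).append(k)
    return groups


kernels = [
    Kernel("blur", (1024,), (64,), 0.0, 1.0, 3.0),
    Kernel("blur", (1024,), (64,), 0.5, 3.0, 5.0),
    Kernel("sum", (256,), (32,), 1.0, 5.0, 6.0),
]
groups = color_groups(kernels)
```

Here the second `blur` instance waits 2.5 seconds in the queue, and the two `blur` instances fall into one color group.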

Thread. Explore CPU utilization by thread. The Platform window displays the thread name as the name of the module where the thread function resides. For example, if you have a myFoo function that belongs to the MyMegaFoo module, the thread name is displayed as MyMegaFoo. This approach helps you easily identify the location of the thread code producing the work displayed on the timeline.

If your code used the Task API to mark task regions or you enabled any system tasks for monitoring specific events, the task objects show up on the timeline and you can hover over such an object for details.

GPU Metrics. Correlate the data on GPU activity per GPU metric with the CPU usage data. The GPU Usage bars are colored according to the type of GPU engine used.

To analyze CPU and GPU usage per thread, switch to the Graphics window.

Note

For Linux* targets, to analyze Intel HD Graphics and Intel Iris™ Graphics hardware events on a GPU, make sure to install the Intel Media Server Studio (starting with version 2015 R5) and build the kernel driver as described in the Getting Started Guide.

Core Frequency. Explore the ratio between the actual and the nominal CPU frequencies. Values above 1.0 indicate that the CPU is operating in Turbo Boost mode.

Note

This data is available only for the hardware event-based sampling analysis results.
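The Core Frequency ratio described above is a plain division of the actual frequency by the nominal one. The sketch below uses hypothetical frequency samples and a hypothetical nominal frequency of 2.4 GHz. It does not show how the VTune Amplifier obtains these values.

```python
def frequency_ratio(actual_ghz, nominal_ghz):
    """Ratio between the actual and the nominal CPU frequency."""
    return actual_ghz / nominal_ghz


nominal_ghz = 2.4                      # hypothetical nominal frequency
samples_ghz = [1.2, 2.4, 3.0]          # hypothetical actual frequencies over time
ratios = [frequency_ratio(f, nominal_ghz) for f in samples_ghz]
# A ratio above 1.0 indicates that the core runs above its nominal frequency.
turbo = [r > 1.0 for r in ratios]
```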

DRAM Bandwidth. Explore the application performance per Uncore to DRAM Bandwidth metrics over time.

Note

This data is available only for the hardware event-based sampling analysis results with the bandwidth events collection enabled.

Interrupt. Identify the intervals where system interrupts occurred. Hover over an interrupt object to view full details in the tooltip.

Note

This type of data shows up for the VTune Amplifier for Systems custom data collection results if you enabled the corresponding Ftrace events collection during the analysis type configuration.

Explore the Context Summary provided to the right of the Timeline pane in the GPU Hotspots viewpoint. It displays the summary statistics for the context selected on the timeline. By default, the Context Summary shows data for the whole run. To narrow down the analysis, select an area of interest on the timeline, right-click and select Filter In by Selection.

The GPU Usage section shows the GPU Time per GPU engine and the percentage of the Elapsed Time spent executing on the GPU. A flagged GPU Usage value signals ineffective utilization of the GPU resources, which is usually caused by imbalance or thread scheduling problems.
