Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help

HPC Performance Characterization Analysis

HPC Performance Characterization analysis helps you identify how effectively your compute-intensive application uses CPU, memory, and floating-point operation hardware resources. Use this analysis type as a starting point for understanding the performance aspects of your application. Additional scalability metrics are available for applications that use the Intel OpenMP* or Intel MPI runtime libraries. You can run the analysis from the VTune Amplifier GUI or from the command line.

During HPC Performance Characterization analysis, the Intel® VTune™ Amplifier data collector profiles your application using event-based sampling collection. OpenMP analysis metrics for the Intel OpenMP runtime library are based on User API instrumentation enabled in the runtime library.

Typically, the collector gathers data for a specified application, but it can also collect system-wide performance data with limited detail if required.
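For command-line runs, the two collection modes described above can be sketched as a small helper that assembles the collector invocation. This is a minimal sketch, not output generated by VTune Amplifier. It assumes the `amplxe-cl` command-line tool is on your PATH and relies on its `-collect hpc-performance`, `-result-dir`, `-analyze-system`, and `-duration` options. The helper name `build_collect_command`, the result directories, and the application `./my_app` are hypothetical. Check `amplxe-cl -help collect` for the options supported by your version.

```python
import shlex


def build_collect_command(result_dir, app_argv=None, system_wide=False, duration_s=None):
    """Assemble an amplxe-cl command line for HPC Performance Characterization.

    Hypothetical helper: it only builds the argument list and never runs it.
    """
    if not app_argv and not system_wide:
        raise ValueError("specify an application or request system-wide collection")

    cmd = ["amplxe-cl", "-collect", "hpc-performance", "-result-dir", result_dir]
    if system_wide:
        # System-wide collection covers all processes, with limited detail.
        cmd.append("-analyze-system")
    if duration_s is not None:
        # Stop the collection after a fixed number of seconds.
        cmd += ["-duration", str(duration_s)]
    if app_argv:
        # Everything after "--" is the target application and its arguments.
        cmd += ["--"] + list(app_argv)
    return cmd


if __name__ == "__main__":
    # Application mode (hypothetical target and arguments).
    print(shlex.join(build_collect_command("r000hpc", ["./my_app", "--size", "1024"])))
    # System-wide mode, limited to 30 seconds.
    print(shlex.join(build_collect_command("r001hpc", system_wide=True, duration_s=30)))
```

The helper returns a list rather than a string so that it can be passed directly to `subprocess.run` without shell quoting issues.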

Note

FPU and GFLOPS metrics are supported on 3rd Generation Intel Core™ processors, 5th Generation Intel processors, and 6th Generation Intel processors. Limited support is available for Intel® Xeon Phi™ processors formerly code named Knights Landing. The metrics are not currently available on 4th Generation Intel processors. Expand the Details section on the analysis configuration pane to view the processor family available on your system.

To use the HPC Performance Characterization analysis, explore the Configuration Options, Viewpoints, and What's Next sections below.

Configuration Options

To configure options for the HPC Performance Characterization analysis:

  1. Click the New Analysis button on the Intel® VTune™ Amplifier toolbar.

    The New Amplifier Result tab opens with the Analysis Type tab active.

  2. Select the Compute-Intensive Application Analysis > HPC Performance Characterization analysis type from the analysis tree on the left pane.

    The HPC Performance Characterization pane opens on the right.

  3. Configure the following options:

    CPU sampling interval, ms field
        Specify an interval (in milliseconds) between CPU samples. Possible values: 0.01-1000. The default value is 1.

    Collect stacks check box
        Enable advanced collection of call stacks and thread context switches. The default value is false.

    Analyze memory bandwidth check box
        Collect the data required to compute memory bandwidth. The default value is true.

    Evaluate max DRAM bandwidth check box
        Evaluate the maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the timeline and to calculate thresholds. The default value is true.

    Analyze OpenMP regions check box
        Instrument and analyze OpenMP regions to detect inefficiencies such as imbalance, lock contention, or overhead from scheduling, reduction, and atomic operations. The default value is true.

    Details button
        Expand or collapse a section listing the default non-editable settings used for this analysis type. To modify these settings, create a custom configuration by right-clicking the analysis entry in the analysis tree and selecting Copy from Current from the context menu. VTune Amplifier creates an editable copy of this analysis type configuration and places it under the Custom Analysis branch in the analysis tree.

Note

You may generate the command line for this configuration using the Command Line... button at the bottom.
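As a rough illustration of what such a command line might contain, the sketch below maps the options in this pane to `-knob` arguments for `amplxe-cl`. The knob names (`sampling-interval`, `enable-stack-collection`, `collect-memory-bandwidth`, `dram-bandwidth-limits`, `analyze-openmp`) are assumptions and may differ between versions. Treat the output of the Command Line... button or of `amplxe-cl -help collect hpc-performance` as authoritative. The helper `hpc_knob_args` and the target `./my_app` are hypothetical. The defaults and the sampling interval range follow the values documented for this pane.

```python
def hpc_knob_args(sampling_interval_ms=1, collect_stacks=False,
                  analyze_memory_bandwidth=True, evaluate_max_dram_bandwidth=True,
                  analyze_openmp=True):
    """Translate the pane's option values into -knob arguments.

    Hypothetical helper; the knob names below are assumptions, not verified
    against a specific VTune Amplifier version.
    """
    # The pane documents 0.01-1000 ms as the valid sampling interval range.
    if not 0.01 <= sampling_interval_ms <= 1000:
        raise ValueError("CPU sampling interval must be within 0.01-1000 ms")

    def flag(value):
        return "true" if value else "false"

    knobs = {
        "sampling-interval": f"{sampling_interval_ms:g}",
        "enable-stack-collection": flag(collect_stacks),
        "collect-memory-bandwidth": flag(analyze_memory_bandwidth),
        "dram-bandwidth-limits": flag(evaluate_max_dram_bandwidth),
        "analyze-openmp": flag(analyze_openmp),
    }

    args = []
    for name, value in knobs.items():
        args += ["-knob", f"{name}={value}"]
    return args


if __name__ == "__main__":
    # Example: a 0.5 ms interval with stack collection enabled.
    knob_args = hpc_knob_args(sampling_interval_ms=0.5, collect_stacks=True)
    command = ["amplxe-cl", "-collect", "hpc-performance", *knob_args, "--", "./my_app"]
    print(" ".join(command))
```

Validating the interval before building the command surfaces an out-of-range value early instead of leaving it to the collector to reject.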

Viewpoints

You can choose to view HPC Performance Characterization analysis results in any of the following viewpoints:

Viewpoint
    Description

HPC Performance Characterization
    Helps understand how effectively your application uses CPU, memory, and floating-point operation resources. Use this view to identify scalability issues for Intel OpenMP and MPI runtimes as well as next steps to increase memory and FPU efficiency.

Hardware Events
    Displays statistics of monitored hardware events: estimated count and/or the number of samples collected. Use this view to identify code regions (modules, functions, code lines, and so on) with the highest activity for an event of interest.

Hardware Issues
    Helps identify where the application is not making the best use of available hardware resources. This viewpoint displays metrics derived from hardware performance counters. Hover over the highlighted metric values in the grid to read why the extreme value might represent a performance problem.

Hotspots
    Helps identify hotspots (code regions in the application that consume a lot of CPU time).

Memory Usage
    Helps understand how effectively your application uses memory resources and identify potential memory access issues such as excessive access to remote memory on NUMA platforms or hitting the DRAM or Intel® QuickPath Interconnect (Intel QPI) bandwidth limit. It provides various performance metrics for both the application code and memory objects such as arrays.

General Exploration
    Helps identify where the application is not making the best use of available hardware resources. This viewpoint displays metrics derived from hardware events. The Summary window reports the overall metrics for the entire execution along with explanations of the metrics. From the Bottom-up and Top-down Tree windows you can locate the hardware issues in your application. Cells are highlighted when potential opportunities to improve performance are detected. Hover over the highlighted metrics in the grid to see explanations of the issues.

Each viewpoint consists of the following windows/panes:

What's Next

Use the HPC Performance Characterization viewpoint to review the following:

Use the Analyzing an OpenMP* and MPI Application tutorial to review basic steps for tuning a hybrid application. The tutorial is available from the Intel Developer Zone at https://software.intel.com/en-us/itac-vtune-mpi-openmp-tutorial-lin.
