What's New in Intel® VTune™ Amplifier XE

VTune Amplifier XE 2017 Update 2

Support for cross-OS analysis extended to all license types. Download installation packages for additional operating systems from registrationcenter.intel.com.
Support for Intel® Xeon Phi™ coprocessor targets codenamed Knights Landing
Support for the Intel® Atom™ processors codenamed Apollo Lake and Denverton, and the Intel processors codenamed Kaby Lake
Support for mixed Python* and native code in the Locks and Waits analysis, including call stack collection
HPC Performance Characterization analysis improvements:
- Increased detail and structure for vector efficiency metrics based on FLOP counters in the FPU Utilization section
- Improved MPI imbalance and parallel efficiency for a most awaited rank in the CPU Utilization section
- New section presenting the data on the hottest loops and functions with arithmetic operations, which enables you to identify which loops/functions with FPU Usage took the most CPU Time
- New metrics for the MPI analysis such as MPI Communication Cost, MPI Imbalance, Serial time on MPI critical rank, and OpenMP Potential Gain on MPI critical path rank
DRAM Bandwidth Bound metric based on uncore events used in the Memory Usage viewpoint for the Memory Access and HPC Performance Characterization analyses
GPU Hotspots Summary view extended to provide the Packet Queue Depth and Packet Duration histograms for the analysis of DMA packet execution
Support for performance analysis of a guest Linux* operating system via Kernel-based Virtual Machine (KVM) from a Linux host system with the KVM Guest OS option
Support for Ubuntu* 16.10 and Fedora* 25

VTune Amplifier XE 2017 Update 1

Support for locator hardware event metrics for the General Exploration analysis results in the Source/Assembly view that enable you to filter the data by a metric of interest and identify performance-critical code lines/instructions
Support for hotspot navigation and filtering of stack sampling analysis data by the Total type of values in the Source/Assembly view
Summary view of the General Exploration analysis extended to explicitly display the measure used for the hardware metrics: Clockticks vs. Pipeline Slots
Command line summary report for the HPC Performance Characterization analysis extended to show metrics for CPU, Memory and FPU performance aspects including performance issue descriptions for metrics that exceed the predefined threshold. To hide issue descriptions in the summary report, use a new report-knob show-issues option.
Support for the Average Latency metric in the Memory Access analysis based on the driverless collection
PREVIEW: New Full Compute event group added to the list of predefined GPU hardware event groups collected for Intel® HD Graphics and Intel Iris™ Graphics. This group combines metrics from the Overview and Compute Basic presets and allows you to see all detected GPU stalled/idle issues in the same view.
GPU Hotspots analysis extended to detect hottest computing tasks bound by GPU L3 bandwidth
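
The report-knob show-issues option mentioned above for the HPC Performance Characterization summary report can be illustrated with a small sketch. It only builds the command line and does not run it. The knob name comes from the notes above; the `show-issues=false` value syntax, the `amplxe-cl -report summary -r` layout, the helper name `summary_report_cmd`, and the result directory name `r000hpc` are assumptions for illustration and should be checked against the command line help of your installed version.

```python
import shlex


def summary_report_cmd(result_dir, show_issues=True):
    """Build (but do not run) a summary report command line.

    Hypothetical helper: the overall argument layout is an assumption.
    """
    cmd = ["amplxe-cl", "-report", "summary", "-r", result_dir]
    if not show_issues:
        # Assumed knob syntax: "-report-knob name=value".
        cmd += ["-report-knob", "show-issues=false"]
    return cmd


if __name__ == "__main__":
    # "r000hpc" is a placeholder result directory name.
    print(shlex.join(summary_report_cmd("r000hpc", show_issues=False)))
```

Returning the arguments as a list keeps the sketch usable with subprocess.run without shell quoting concerns.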

VTune Amplifier XE 2017

Support for Intel® Xeon Phi™ processor codenamed Knights Landing and Intel® Xeon® Processor E5 v4 Family (formerly codenamed Broadwell EP), including General Exploration, Memory Access (including high bandwidth analysis), and HPC Performance Characterization analysis
Disk Input and Output analysis that monitors utilization of the disk subsystem, CPU and PCIe buses, and helps identify long latency of I/O requests and imbalance between I/O and compute operations. Use the Analyzing Input/Output Waits tutorial for a hands-on exercise with the sample code.
Memory Access analysis improvements:
- Automatic detection of maximum system DRAM bandwidth characteristics. This option helps understand how you utilize the available DRAM bandwidth.
- Support for custom memory allocators via Memory Allocation API that help correctly determine memory objects
- Identifying False Sharing tutorial providing a hands-on exercise for running Memory Access analysis for a sample application to identify and remove false sharing issues
HPC workloads profiling improvements:
- HPC Performance Characterization analysis that explores the following performance aspects of the application scalability: CPU utilization with parallel efficiency for MPI and OpenMP*, memory access efficiency and FPU utilization with basic vectorization metrics
- MPI analysis extended with the event-based sampling collection supported for multiple ranks per node with an arbitrary MPI launcher and natural syntax. Arbitrary targets command line configuration extended with MPI launcher options. You can now use the Copy Command Line to Clipboard dialog box to automatically generate a command line for MPI analysis from the GUI.
- An option enabling/disabling the OpenMP* regions analysis added to selected analysis configurations
- An option controlling result finalization, -finalization-mode, that enables you to perform a full finalization on the target, defer or skip the finalization. The deferred finalization mode is especially useful on target platforms with a single-thread performance lower than on the host. In this mode, the VTune Amplifier calculates a binary checksum to match the binaries for finalization on the host machine.
- Analyzing an OpenMP and MPI Application web-based tutorial providing a hands-on exercise to identify memory utilization inefficiencies and load imbalance for a sample hybrid application
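
As a rough illustration of the -finalization-mode option and the MPI launcher usage described in the list above, the sketch below assembles a collection command line without running it. The option name comes from the notes above; the mode values `full`, `deferred`, and `none`, the `hpc-performance` analysis type name, the helper name `collect_cmd`, and the launcher, result directory, and application names are assumptions for illustration only.

```python
import shlex

# Assumed spellings for the three behaviors named in the notes:
# full finalization on the target, deferred finalization, or none.
FINALIZATION_MODES = ("full", "deferred", "none")


def collect_cmd(app_args, result_dir, mode="full", mpi_launcher=None):
    """Build (but do not run) a collection command line.

    Hypothetical helper: mpi_launcher is an optional argument list such
    as ["mpirun", "-n", "4"] placed in front of the collector command.
    """
    if mode not in FINALIZATION_MODES:
        raise ValueError(f"unknown finalization mode: {mode!r}")
    cmd = list(mpi_launcher or [])
    cmd += [
        "amplxe-cl",
        "-collect", "hpc-performance",   # assumed analysis type name
        f"-finalization-mode={mode}",
        "-r", result_dir,
        "--",
    ]
    cmd += list(app_args)
    return cmd


if __name__ == "__main__":
    # Placeholder launcher, result directory, and application.
    cmd = collect_cmd(["./my_app"], "r001hpc", mode="deferred",
                      mpi_launcher=["mpirun", "-n", "4"])
    print(shlex.join(cmd))
```

With the deferred mode, finalization would then be performed later on the host, where the binary checksum described above is used to match the binaries.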
Support for more languages:
- Python* applications profiling with Basic Hotspots analysis running via the Launch Application or Attach to Process modes
- Go* applications profiling with hardware event-based analysis types
GPU analysis improvements:
- GPU Hotspots analysis targeted for GPU-bound applications and providing options to analyze execution of OpenCL™ kernels and Intel Media SDK tasks
- GPU analysis Summary introducing a set of metrics to estimate GPU utilization per engine, identify stalled or idle Execution Units and explore the most typical problems with low occupancy or frequent sampler accesses
- Navigation from the Hottest GPU computing tasks summary to the details provided in the Graphics tab
- Support for the Attach to Process target analysis for Intel Media SDK and OpenCL™ programs
- Detection of the OpenCL™ 2.0 Shared Virtual Memory (SVM) usage types per kernel instance
Usability improvements:
- Support for the Attach to Process target analysis with the event-based sampling for low privileged Java* daemons on Linux*
- Event selection mechanism for custom hardware event-based sampling analysis extended with filtering options
- Arbitrary target GUI configuration to generate a command line for performance analysis on a system that is not accessible from the current host
- UI improvements for the grid views and identification of performance issues
Intel Performance Snapshot (Preview) introducing the following tools as part of the VTune Amplifier:
- Application Performance Snapshot tool provides a quick look at your application performance and helps you understand whether your application will benefit from tuning. It identifies how effectively your application uses the hardware platform and displays basic performance enhancement opportunities.
- Storage Performance Snapshot tool analyzes your system's storage, CPU, memory, and network usage and displays basic performance enhancement opportunities for systems using Intel hardware.
Note

A PREVIEW FEATURE may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com.
Support for Fedora* 23 and 24, Ubuntu* 15.10 and 16.04
Support for Linux* kernel up to 4.4
