Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help
The Microarchitecture Analysis branch contains analysis types that are based on hardware event-based sampling data collection:
General Exploration: Event-based analysis that helps identify the most significant hardware issues affecting the performance of your application. Consider this analysis type as a starting point when you do hardware-level analysis.
Memory Access: Event-based analysis that measures a set of metrics to identify memory access issues (for example, issues specific to NUMA architectures).
SGX Hotspots: Event-based analysis that helps identify performance-critical program units inside security enclaves on systems with the Intel Software Guard Extensions (Intel SGX) feature enabled.
TSX Exploration: Event-based sampling analysis that targets Intel processors supporting Intel Transactional Synchronization Extensions (Intel TSX). It helps analyze Intel TSX usage and the causes of transactional aborts.
TSX Hotspots: Event-based sampling analysis that targets Intel processors supporting Intel TSX. It helps analyze hotspots inside transactions.
Prerequisites:
Installing the sampling driver is recommended for hardware event-based sampling collection types. On Linux* and Android* targets, if the sampling driver is not installed, VTune Amplifier can use Perf* instead (driverless collection). Be aware of the following configuration settings for Linux target systems:
To enable system-wide and uncore event collection, which is required to measure the DRAM and MCDRAM memory bandwidth reported by the Memory Access analysis type, use root or sudo to set /proc/sys/kernel/perf_event_paranoid to 0.
> echo 0 > /proc/sys/kernel/perf_event_paranoid
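Before changing the setting, it can help to check the current value. The following is a minimal sketch, not part of VTune Amplifier: check_paranoid is a hypothetical helper name, and treating any level of 0 or lower as sufficient is an assumption based on lower levels being more permissive. The optional path argument exists only so the helper can be exercised against a test file.

```shell
# Hypothetical helper: report whether the current perf_event_paranoid
# level is 0 or lower (assumed sufficient for system-wide collection).
# $1 optionally overrides the file to read; it defaults to the real setting.
check_paranoid() {
    paranoid_file="${1:-/proc/sys/kernel/perf_event_paranoid}"
    level="$(cat "$paranoid_file")" || return 2
    if [ "$level" -le 0 ]; then
        echo "ok: perf_event_paranoid=$level"
    else
        echo "restricted: perf_event_paranoid=$level"
        return 1
    fi
}

# From a non-root account, a plain "sudo echo 0 > file" fails because the
# redirect is performed by the calling shell. Two common alternatives:
#   echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid
#   sudo sysctl -w kernel.perf_event_paranoid=0
```

A value written this way typically does not persist across reboots, so the check is worth repeating after a restart.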
To enable collection with the General Exploration analysis type, increase the default limit on open file descriptors. Use root or sudo to raise the value in /etc/security/limits.conf to 100 * <number_of_logical_CPU_cores>.
<user> hard nofile <100 * number_of_logical_CPU_cores>
<user> soft nofile <100 * number_of_logical_CPU_cores>
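The limit can be computed rather than worked out by hand. The sketch below prints the two entries for a given user. nofile_entries is a hypothetical helper name, and using getconf _NPROCESSORS_ONLN to count online logical CPUs is an assumption about the target system (nproc is a common alternative). The helper only prints the lines; adding them to /etc/security/limits.conf still requires root or sudo.

```shell
# Hypothetical helper: print limits.conf entries that set the nofile
# limit to 100 * <number of logical CPU cores> for the given user.
# $1 = user name; $2 = optional CPU count (defaults to online logical CPUs).
nofile_entries() {
    user="$1"
    cpus="${2:-$(getconf _NPROCESSORS_ONLN)}"
    limit=$((100 * cpus))
    printf '%s hard nofile %s\n' "$user" "$limit"
    printf '%s soft nofile %s\n' "$user" "$limit"
}

# Example: entries for the current user on this machine.
#   nofile_entries "$(id -un)"
# After logging in again, "ulimit -n" shows the soft limit in effect.
```

On most distributions, changes to limits.conf apply only to sessions started after the edit, so the limit should be checked from a new login.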