Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help
Like the Summary window in the GUI, the summary report provides overall performance data for your target. Intel® VTune™ Amplifier generates the summary report automatically when data collection completes. To disable this report, add the no-summary option to your command when performing a collect or collect-with action.
Use the following syntax to generate the Summary report from a preexisting result:
$ amplxe-cl -report summary -result-dir <result_path>
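When report generation is scripted, the command line can be assembled programmatically. The following is a minimal Python sketch, not part of the product: the helper name summary_command is hypothetical, amplxe-cl is assumed to be on the PATH, and only options that appear on this page are used.

```python
def summary_command(result_dir, show_issues=True):
    """Build the amplxe-cl argument list for a summary report.

    summary_command is a hypothetical helper, not a VTune Amplifier API.
    """
    cmd = ["amplxe-cl", "-report", "summary", "-result-dir", result_dir]
    if not show_issues:
        # Option taken from the HPC Performance Characterization example
        # on this page; it skips issue descriptions in the report.
        cmd += ["-report-knob", "show-issues=false"]
    return cmd


# Print the command instead of running it, so the sketch works
# on a machine without the tool installed.
print(" ".join(summary_command("r000hs")))
```

To run the command for real, the returned list can be passed to subprocess.run(cmd, check=True).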
The summary report output depends on the collection type:
For User-Mode Sampling and Tracing Collection results, the summary report includes the following sections:
Collection and Platform Information
CPU Information
Summary per basic analysis metrics
Examples
The following example generates the summary report for the r000hs Basic Hotspots analysis result.
$ amplxe-cl -report summary -r r000hs
Collection and Platform Info
----------------------------
Parameter                 r000hs
------------------------  -------------------------------------------------------------
Application Command Line  /home/tachyon/find_hotspots
Operating System          Ubuntu 11.04
Computer Name             My Computer
Result Size               2926817
Collection start time     11:17:06 13/10/2015 UTC
Collection stop time      11:17:20 13/10/2015 UTC
CPU
---
Parameter               r000hs
----------------------  -------------------------------------------------
Name                    4th generation Intel® Core™ Processor family
Frequency               2494226458
Logical CPU Core Count  4
Summary
-------
Elapsed Time: 14.094
CPU Time: 11.319
Average CPU Usage: 0.749
amplxe: Executing actions 100 % done
This example generates a summary report for the Locks and Waits analysis result r003lw. The summary portion of the report shows that the multithreaded target spent 64 seconds waiting, with an average concurrency of only 1.073. To identify the cause of the wait, view the result in the GUI performance pane, or generate a performance report.
$ amplxe-cl -report summary -r r003lw
Summary
-------
Average Concurrency: 1.073
Elapsed Time: 13.911
CPU Time: 11.031
Wait Time: 64.468
Average CPU Usage: 0.768
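The Summary section is a list of "Name: value" lines, so it is straightforward to post-process. The following Python sketch parses the Locks and Waits output shown above and compares Wait Time with Elapsed Time. The function name parse_summary is hypothetical, and the layout is assumed to match the sample output on this page.

```python
def parse_summary(text):
    """Return {metric name: float} for 'Name: value' lines of a Summary section."""
    metrics = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # headings and separator lines carry no colon
        try:
            metrics[name.strip()] = float(value)
        except ValueError:
            pass  # skip values that are not plain numbers
    return metrics


# Sample output copied from the r003lw example above.
report = """Summary
-------
Average Concurrency: 1.073
Elapsed Time: 13.911
CPU Time: 11.031
Wait Time: 64.468
Average CPU Usage: 0.768
"""

m = parse_summary(report)
wait_ratio = m["Wait Time"] / m["Elapsed Time"]
print(f"Wait Time is {wait_ratio:.1f}x Elapsed Time")
```

A Wait Time several times larger than Elapsed Time is consistent with the wait being accumulated over all threads of the multithreaded target.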
For Hardware Event-based Sampling Collection results, the summary report includes the following information (if available):
For HPC Performance Characterization analysis, the command-line summary report provides an issue description for metrics that exceed the predefined threshold. If you want to skip issues in the summary report, do one of the following:
Examples
This example generates the summary report for the r001ah Advanced Hotspots analysis result.
$ amplxe-cl -report summary -r r001ah
Collection and Platform Info
----------------------------
Parameter                 r001ah
------------------------  --------------------------------------------------------------------------------------------
Application Command Line  /home/tachyon/find_hotspots
Operating System          Ubuntu 11.04
Computer Name             My Computer
Result Size               37188680
Collection start time     09:59:01 11/09/2015 UTC
Collection stop time      09:59:28 11/09/2015 UTC
CPU
---
Parameter               r001ah
----------------------  -------------------------------------------------
Name                    4th generation Intel® Core™ Processor family
Frequency               2494232562
Logical CPU Core Count  4
Summary
-------
Elapsed Time: 26.785
CPU Time: 16.394
Average CPU Usage: 0.610
CPI Rate: 0.413
Event summary
-------------
Hardware Event Type       Hardware Event Count:Self  Hardware Event Sample Count:Self  Events Per Sample
------------------------  -------------------------  --------------------------------  -----------------
INST_RETIRED.ANY          110633200000               58228                             1900000
CPU_CLK_UNHALTED.THREAD   45653200000                24028                             1900000
CPU_CLK_UNHALTED.REF_TSC  40889900000                21521                             1900000
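The columns of the Event summary are related arithmetically. In the sample above, each event count equals the sample count multiplied by Events Per Sample, and the reported CPI Rate matches unhalted thread clockticks divided by retired instructions. The Python sketch below checks both relationships using the numbers from this report only; treating them as general rules is an assumption.

```python
# event name: (event count, sample count, events per sample),
# copied from the r001ah Event summary above.
events = {
    "INST_RETIRED.ANY": (110633200000, 58228, 1900000),
    "CPU_CLK_UNHALTED.THREAD": (45653200000, 24028, 1900000),
    "CPU_CLK_UNHALTED.REF_TSC": (40889900000, 21521, 1900000),
}

# Each event count should equal sample count * events per sample.
for name, (count, samples, per_sample) in events.items():
    assert count == samples * per_sample, name

# Cycles per instruction: clockticks divided by instructions retired.
clockticks = events["CPU_CLK_UNHALTED.THREAD"][0]
instructions = events["INST_RETIRED.ANY"][0]
cpi = clockticks / instructions
print(f"CPI Rate: {cpi:.3f}")
```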
Use the Elapsed Time metric as your performance baseline when evaluating optimizations. The Average CPU Usage metric is the total CPU time divided by the elapsed time, that is, the average number of CPUs in use during the run. For example, 'Average CPU Usage: 5.907' means that, on average, nearly 6 cores were busy while your program ran. On an 8-core system, for example, this leaves some room to parallelize the code further.
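To put Average CPU Usage in context, divide it by the number of logical CPUs. The Python sketch below does this for the 8-core example in the text and for the HPC Performance Characterization sample on this page (13.920 out of 24 logical CPUs). The function name cpu_utilization is hypothetical.

```python
def cpu_utilization(avg_cpu_usage, logical_cpus):
    """Fraction of the available logical CPUs that were busy on average."""
    return avg_cpu_usage / logical_cpus


# 'Average CPU Usage: 5.907' on an 8-core system.
print(f"{cpu_utilization(5.907, 8):.1%} of 8 cores in use")

# 'Average CPU Usage: 13.920 Out of 24 logical CPUs' from the HPC sample.
print(f"{cpu_utilization(13.920, 24):.1%} of 24 logical CPUs in use")
```

For the HPC sample, the result agrees with the CPU Utilization value of 58.0% printed in that report.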
This command generates the summary report for the HPC Performance Characterization analysis result and skips issue descriptions:
$ amplxe-cl -report summary -r r001hpc -report-knob show-issues=false
Elapsed Time: 23.182s
GFLOPS: 14.748
CPU Utilization: 58.0%
Average CPU Usage: 13.920 Out of 24 logical CPUs
Serial Time: 0.069s (0.3%)
Parallel Region Time: 23.113s (99.7%)
Estimated Ideal Time: 14.010s (60.4%)
OpenMP Potential Gain: 9.103s (39.3%)
Memory Bound: 0.446
Cache Bound: 0.175
DRAM Bound: 0.216
NUMA: % of Remote Accesses: 38.3%
FPU Utilization: 2.7%
GFLOPS: 14.748
Scalar GFLOPS: 4.801
Packed GFLOPS: 9.947
Collection and Platform Info
Application Command Line: ./sp.B.x
User Name: vtune
Operating System: 3.10.0-327.el7.x86_64 NAME="Red Hat Enterprise Linux Server" VERSION="7.2 (Maipo)" ID="rhel" ID_LIKE="fedora" VERSION_ID="7.2" PRETTY_NAME="Red Hat Enterprise Linux Server 7.2 (Maipo)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:redhat:enterprise_linux:7.2:GA:server" HOME_URL="https://www.redhat.com/" BUG_REPORT_URL="https://bugzilla.redhat.com/" REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7" REDHAT_BUGZILLA_PRODUCT_VERSION=7.2 REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux" REDHAT_SUPPORT_PRODUCT_VERSION="7.2"
Computer Name: nntvtune235
Result Size: 1 GB
Collection start time: 19:04:30 13/07/2016 UTC
Collection stop time: 19:04:53 13/07/2016 UTC
CPU
Name: Intel® Xeon® E5/E7 v2 Processor code named Ivytown
Frequency: 2.694 GHz
Logical CPU Count: 24
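The time metrics in the HPC Performance Characterization sample above add up in a consistent way: Serial Time plus Parallel Region Time gives Elapsed Time, and Estimated Ideal Time plus OpenMP Potential Gain gives Parallel Region Time. These relationships are inferred from this one sample output, not from a documented definition. The Python sketch below checks them; the helper name seconds is hypothetical.

```python
import re


def seconds(field):
    """Extract the leading seconds value from a field such as '23.113s (99.7%)'."""
    return float(re.match(r"\s*([0-9.]+)s", field).group(1))


# Values copied from the r001hpc sample output above.
hpc = {
    "Elapsed Time": "23.182s",
    "Serial Time": "0.069s (0.3%)",
    "Parallel Region Time": "23.113s (99.7%)",
    "Estimated Ideal Time": "14.010s (60.4%)",
    "OpenMP Potential Gain": "9.103s (39.3%)",
}
t = {name: seconds(value) for name, value in hpc.items()}

# Serial + parallel time accounts for the whole run in this sample.
assert abs(t["Serial Time"] + t["Parallel Region Time"] - t["Elapsed Time"]) < 1e-6

# Ideal time + potential gain accounts for the parallel region in this sample.
assert abs(t["Estimated Ideal Time"] + t["OpenMP Potential Gain"]
           - t["Parallel Region Time"]) < 1e-6

# The percentage printed next to a time is its share of Elapsed Time.
gain_share = t["OpenMP Potential Gain"] / t["Elapsed Time"]
print(f"OpenMP Potential Gain: {gain_share:.1%} of Elapsed Time")
```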