Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help
Intel® VTune™ Amplifier provides several ways to view GPU Analysis data collected from the command line.
Example 1: Report per OpenCL Kernels
This example uses the gpu-computing-tasks report type to show the collected data for each OpenCL kernel submitted and executed on Intel® HD Graphics and Intel® Iris™ Graphics:
$ amplxe-cl -report gpu-computing-tasks -result-dir r010ah
Computing Task (GPU)    Global Size Local Size SIMD Width Total Time Average Time Instance Count
----------------------- ----------- ---------- ---------- ---------- ------------ --------------
BitonicSort                 4194304  [Unknown]         16     3.435s       0.012s            276
BitonicSort                 8388608  [Unknown]         16     0.330s       0.014s             24
clEnqueueMapBuffer        [Unknown]  [Unknown]  [Unknown]     0.000s       0.000s              1
clEnqueueUnmapMemObject   [Unknown]  [Unknown]  [Unknown]     0.000s       0.000s              1
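The report is plain text, so it can be post-processed with standard tools. The following is a minimal sketch, not part of the product, that sums the Total Time column per computing task. The function name sum_total_time is hypothetical, and the script assumes whitespace-separated columns and task names that contain no spaces, as in the sample rows above.

```shell
#!/bin/sh
# Hypothetical helper: sum the Total Time column per computing task.
# Assumption: the report text arrives on stdin, columns are separated
# by whitespace, and task names contain no spaces.
sum_total_time() {
    awk '
        # Keep only data rows: field 5 must look like a time such as "3.435s".
        # This skips the header row and the dashed separator line.
        $5 !~ /^[0-9.]+s$/ { next }
        {
            t = $5
            sub(/s$/, "", t)      # strip the trailing unit
            total[$1] += t
        }
        END { for (name in total) printf "%s %.3f\n", name, total[name] }
    ' | sort
}

# Sample rows copied from the report above. In practice, pipe the
# amplxe-cl report output into the function instead.
sum_total_time <<'EOF'
Computing Task (GPU) Global Size Local Size SIMD Width Total Time Average Time Instance Count
----------------------- ----------- ---------- ---------- ---------- ------------ --------------
BitonicSort 4194304 [Unknown] 16 3.435s 0.012s 276
BitonicSort 8388608 [Unknown] 16 0.330s 0.014s 24
clEnqueueMapBuffer [Unknown] [Unknown] [Unknown] 0.000s 0.000s 1
clEnqueueUnmapMemObject [Unknown] [Unknown] [Unknown] 0.000s 0.000s 1
EOF
```

For the sample rows, the two BitonicSort lines are merged into one entry whose time is the sum of 3.435s and 0.330s.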
Example 2: Report Grouped per OpenCL Kernels
This example uses the group-by option to group the collected data by OpenCL kernel instance, so that each row describes a single kernel invocation:
$ amplxe-cl -report gpu-computing-tasks -result-dir r010ah -group-by=computing-instance
Computing Task (GPU)    Instance Global Size Local Size SIMD Width Total Time Average Time Instance Count
----------------------- -------- ----------- ---------- ---------- ---------- ------------ --------------
BitonicSort                    1     8388608  [Unknown]         16     0.045s       0.045s              1
BitonicSort                    3     8388608  [Unknown]         16     0.017s       0.017s              1
BitonicSort                    2     4194304  [Unknown]         16     0.016s       0.016s              1
BitonicSort                   54     4194304  [Unknown]         16     0.015s       0.015s              1
BitonicSort                  152     4194304  [Unknown]         16     0.014s       0.014s              1
BitonicSort                  104     4194304  [Unknown]         16     0.014s       0.014s              1
BitonicSort                  103     4194304  [Unknown]         16     0.014s       0.014s              1
BitonicSort                  296     4194304  [Unknown]         16     0.014s       0.014s              1
BitonicSort                    5     4194304  [Unknown]         16     0.014s       0.014s              1
BitonicSort                  248     4194304  [Unknown]         16     0.014s       0.014s              1
BitonicSort                  201     4194304  [Unknown]         16     0.014s       0.014s              1
BitonicSort                  202     4194304  [Unknown]         16     0.014s       0.014s              1
BitonicSort                  249     4194304  [Unknown]         16     0.014s       0.014s              1
...
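In the per-instance report, a few instances run noticeably longer than the rest (for example, instance 1 at 0.045s against a typical 0.014s). The following is a minimal sketch for listing such instances from the report text. The function name slow_instances and its threshold argument are hypothetical, and the script assumes whitespace-separated columns and task names without spaces.

```shell
#!/bin/sh
# Hypothetical helper: print instance number, global size, and time (in
# seconds) for every instance whose Total Time exceeds a threshold.
# $1: threshold in seconds. The report text is read from stdin.
slow_instances() {
    awk -v limit="$1" '
        # In the per-instance report, Total Time is field 6. Rows where it
        # does not look like "0.045s" (header, dashes, "...") are skipped.
        $6 !~ /^[0-9.]+s$/ { next }
        {
            t = $6
            sub(/s$/, "", t)                  # strip the trailing unit
            if (t + 0 > limit + 0) print $2, $3, t
        }
    '
}

# Sample rows copied from the report above.
slow_instances 0.015 <<'EOF'
Computing Task (GPU) Instance Global Size Local Size SIMD Width Total Time Average Time Instance Count
----------------------- -------- ----------- ---------- ---------- ---------- ------------ --------------
BitonicSort 1 8388608 [Unknown] 16 0.045s 0.045s 1
BitonicSort 3 8388608 [Unknown] 16 0.017s 0.017s 1
BitonicSort 2 4194304 [Unknown] 16 0.016s 0.016s 1
BitonicSort 54 4194304 [Unknown] 16 0.015s 0.015s 1
BitonicSort 152 4194304 [Unknown] 16 0.014s 0.014s 1
EOF
```

With a threshold of 0.015, the sample input yields instances 1, 3, and 2; instance 54 is excluded because the comparison is strict.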