Intel® C++ Compiler 16.0 User and Reference Guide

Offload Timers for Intel® Graphics Technology

This topic applies only to Intel® 64 and IA-32 architectures targeting Intel® Graphics Technology.

The timers maintained by the offload runtime break down the time spent in an offload session, which helps you focus your tuning efforts. For example, the timers may show that further tuning of the compiled code will not help because offload overhead dominates the total time.

To have the runtime print timing information at the end of execution, set GFX_SHOW_TIME to 1. The runtime prints output similar to the following:

GFX performance timers with non-zero value (milllsecond,activation counter):
                   Offload Total = 3.94, 3
                 Device Creation = 42.10, 1
                 Kernel Creation = 0.09, 1
                Kernel Execution = 2.53, 1
      Kernel Execution on Device = 0.03, 1
                 Buffer Creation = 0.25, 4
              Buffer Destruction = 0.03, 3
                  Buffer Reading = 0.08, 1
                  Buffer Writing = 0.39, 4
       Iteration Space Splitting = 0.02, 1
                  Argument Setup = 0.62, 2
                     ELF Parsing = 0.12, 1
                 Program Loading = 14.31, 1
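
If you want to post-process this report, the lines can be parsed mechanically. The following is a minimal sketch, assuming every timer line has the form `name = milliseconds, count` as in the sample above. The function name `parse_gfx_timers` and the abbreviated `SAMPLE` string are illustrative only and are not part of the offload runtime.

```python
import re

# One timer line looks like "   Kernel Execution = 2.53, 1".
# Assumption: the layout matches the sample report shown above.
TIMER_LINE = re.compile(
    r"^\s*(?P<name>[^=]+?)\s*=\s*(?P<ms>[0-9.]+)\s*,\s*(?P<count>\d+)\s*$"
)


def parse_gfx_timers(text):
    """Return {timer name: (milliseconds, activation count)}.

    Lines that do not look like a timer entry (for example the
    report header) are skipped.
    """
    timers = {}
    for line in text.splitlines():
        match = TIMER_LINE.match(line)
        if match:
            timers[match.group("name")] = (
                float(match.group("ms")),
                int(match.group("count")),
            )
    return timers


# Abbreviated copy of the sample report, used only for illustration.
SAMPLE = """\
                   Offload Total = 3.94, 3
                 Device Creation = 42.10, 1
                Kernel Execution = 2.53, 1
      Kernel Execution on Device = 0.03, 1
"""

if __name__ == "__main__":
    for name, (ms, count) in parse_gfx_timers(SAMPLE).items():
        print(f"{name}: {ms} ms over {count} activation(s)")
```

Keeping the activation count next to the time makes it easy to compute a per-activation average later, for example for Buffer Writing.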

To disable timer printing, set GFX_SHOW_TIME to either an empty string or 0.
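
When you launch the offloading program from a script, the setting can be toggled per run. The sketch below assumes that GFX_SHOW_TIME is read from the process environment. The helper names and the executable `./my_offload_app` are hypothetical. Both output streams are captured because this topic does not state where the report is written.

```python
import os
import subprocess


def gfx_timer_env(show_time, base_env=None):
    """Return a copy of the environment with GFX_SHOW_TIME set.

    "1" enables the timer report; "0" disables it, as described above.
    The base environment (os.environ by default) is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GFX_SHOW_TIME"] = "1" if show_time else "0"
    return env


def run_with_timers(command):
    """Run `command` (a list of arguments) with timer printing enabled."""
    return subprocess.run(
        command,
        env=gfx_timer_env(True),
        capture_output=True,  # keep both stdout and stderr
        text=True,
    )


if __name__ == "__main__":
    # "./my_offload_app" is a placeholder for your own executable.
    result = run_with_timers(["./my_offload_app"])
    print(result.stdout)
    print(result.stderr)
```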

The following table describes the meaning of each timer:

Timer Name                            Description
Device Creation                       The runtime and device initialization time.
Offload Total                         The total time spent in all offload sessions.
Program Loading                       The total time spent loading all the kernels in the program, including JIT compilation time.
Kernel Creation                       The total time spent creating kernels. Does not include JIT compilation time.
Kernel Execution                      The total time for all kernel executions, measured from placing a kernel into a queue until receiving the completion signal.
Kernel Execution on Device            The total time for kernel executions as measured by the Intel® Graphics Technology driver stack. This time is usually less than the Kernel Execution time because it excludes event waiting and other overhead.
Buffer Creation, Buffer Destruction   Time spent creating and destroying the target's memory areas.
Buffer Reading, Buffer Writing        Time spent copying data to and from the target's memory areas.
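
The descriptions above suggest a few derived figures that can show where offload time goes. The following is a rough sketch rather than an official metric: it takes the gap between Kernel Execution and Kernel Execution on Device as queueing and waiting overhead, and sums the buffer timers. The function `offload_breakdown` and its result keys are illustrative names. The input values are copied from the sample report in this topic.

```python
def offload_breakdown(timers_ms):
    """Derive overhead figures (in milliseconds) from a dict of timer values.

    Timers missing from the report (zero-valued timers are not printed)
    are treated as 0.0.
    """
    kernel = timers_ms.get("Kernel Execution", 0.0)
    on_device = timers_ms.get("Kernel Execution on Device", 0.0)
    transfer = (timers_ms.get("Buffer Reading", 0.0)
                + timers_ms.get("Buffer Writing", 0.0))
    buffer_mgmt = (timers_ms.get("Buffer Creation", 0.0)
                   + timers_ms.get("Buffer Destruction", 0.0))
    return {
        # Time between queueing and completion not spent running on the device.
        "launch_overhead_ms": kernel - on_device,
        # Fraction of Kernel Execution spent on the device itself.
        "device_share": (on_device / kernel) if kernel else None,
        "data_transfer_ms": transfer,
        "buffer_management_ms": buffer_mgmt,
    }


# Values taken from the sample report shown earlier in this topic.
SAMPLE_TIMERS_MS = {
    "Kernel Execution": 2.53,
    "Kernel Execution on Device": 0.03,
    "Buffer Creation": 0.25,
    "Buffer Destruction": 0.03,
    "Buffer Reading": 0.08,
    "Buffer Writing": 0.39,
}

if __name__ == "__main__":
    for key, value in offload_breakdown(SAMPLE_TIMERS_MS).items():
        print(f"{key}: {value}")
```

With the sample values, the device-side share is about 1.2% of Kernel Execution. This is the kind of result that indicates offload overhead, not the compiled kernel code, is the place to look.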