Intel® C++ Compiler 16.0 User and Reference Guide
This topic only applies to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).
You can measure the amount of time it takes to execute an offload region of code, as well as the amount of data transferred during the execution of the offload region.
You can print an offload report, which contains information about an offload as the execution proceeds on the host and on the target. An offload report includes the following information:
the amount of time it takes to execute an offload region of code
the amount of data transferred between the host and the target
additional details, including device initialization, and individual variable transfers
The following mechanisms enable and disable offload reporting:
the OFFLOAD_REPORT environment variable.
the _Offload_report API.
A compiler offload report line starts with [Offload] to clearly mark prints from compiler offloads, as opposed to other offloads, such as those from the Intel® Math Kernel Library.
Activities on the host are marked with [HOST], while activities on the target are marked with [MIC n], where n is the logical number of the coprocessor to which the offload is sent. The top of the report shows the mapping of logical devices to physical devices. Note that an offloaded program can use a subset of the physical devices when you specify the OFFLOAD_DEVICES environment variable.
Because multiple offloads may be in progress concurrently, either when multiple host threads initiate offloads or when asynchronous offloads are used, all the output associated with a specific offload pragma must be tagged. Otherwise the reports from several concurrent offloads would be interleaved, making it impossible to determine to which offload a particular line of output belongs. A tag of the form [Tag n] uniquely identifies the lines in the offload report that belong to a particular offload.
For each offload, the first two report lines are the source file name and the line number of the offload pragma. After that, a line that assigns a Tag to that offload is printed. Subsequent report lines printed for that offload each use the tag [Tag n] to associate that line with that offload.
The rest of the report contains a line for each major activity. These lines contain an annotation of the activity after the tag that identifies which offload the activity belongs to. The annotations are as follows:
Line Marker | Description |
---|---|
[State] | Activity being performed as part of the offload. |
[Var] | The name of a variable transferred and the direction(s) of transfer. |
[CPU Time] | The total time measured for that offload pragma on the host. |
[MIC Time] | The total time measured for executing the offload on the target. This excludes the data transfer time between the host and the target, and counts only the execution time on the target. |
[CPU->MIC Data] | The number of bytes of data transferred from the host to the target. |
[MIC->CPU Data] | The number of bytes of data transferred from the target to the host. |
The various activities printed after [State] describe the internal operation of the Offload Library and are helpful in diagnosing the point at which a runtime failure may occur. In most cases the description is self-explanatory.
For this example, example.c, the offload report output is explained below.
 1  int Hysum(int * abc, int * efg, int siz)
 2  {
 3      int sumT = 0;
 4      int k;
 5
 6      #pragma offload target(mic:0) \
 7          in(abc : length(siz)) \
 8          out(efg : length(siz/2)) \
 9          nocopy(k)
10      {
11          if (_Offload_get_device_number() > -1) {
12              printf("On device : %d\n",_Offload_get_device_number());
13              fflush(0);
14          }
15
16          sumT = 0;
17          for (k=0; k < (siz/2) ;k++) {
18              efg[k] = abc[k] + abc[k + (siz/2)];
19              sumT += efg[k];
20          }
21      }
22      return sumT;
23  }
24
25  int main()
26  {
27      int j = 10;
28      int i = 0;
29      int n;
30      int *tuv, *xyz;
31
32      tuv = _mm_malloc(j * sizeof(int), 64);
33      xyz = _mm_malloc( (j/2) * sizeof(int), 64);
34
35      for (n=0; n < j; n++) {
36          tuv[n] = n;
37      }
38      xyz[0:(j/2)] = 0;
39
40      i = Hysum(tuv, xyz, j);
41
42      for (n=0; n < (j/2); n++) {
43          printf(" xyz[%d]=%d ",n,xyz[n]);
44      }
45      printf("\n sum total=%d\n", i);
46
47      return 0;
48  }
The compiler option [Q]opt-report-phase with the offload keyword provides summary information about data transfers between the host and the target. There are two reports for each offload code section defined in the source code. The first report, beginning with Offload to target MIC, is from the host compilation. The second report, beginning with Outlined offload region, is from the target compilation. This compile-time report contains information similar to the run-time output produced when the OFFLOAD_REPORT environment variable is set to 3.
$ icc example.c -o exampleC_exe -opt-report-phase=offload
example.c(6-6):OFFLOAD:Hysum: Offload to target MIC <expr>
  sumT, default of INOUT changed to OUT
  siz_2_V$2, default of INOUT changed to IN
  sumT, default of INOUT changed to OUT
  siz_2_V$2, default of INOUT changed to IN
  Data sent from host to target
    abc_2_V$0, pointer to (<expr>) elements
    siz_2_V$2, scalar size 4 bytes
  Data received by host from target
    efg_2_V$1, pointer to (<expr>) elements
    sumT, scalar size 4 bytes
example.c(6-6):OFFLOAD:Hysum: Outlined offload region
  sumT, default of INOUT changed to OUT
  siz_2_V$2, default of INOUT changed to IN
  sumT, default of INOUT changed to OUT
  siz_2_V$2, default of INOUT changed to IN
  Data received by target from host
    abc_2_V$0, pointer to (<expr>) elements
    siz_2_V$2, scalar size 4 bytes
  Data sent from target to host
    efg_2_V$1, pointer to (<expr>) elements
    sumT, scalar size 4 bytes
The rest of this example explains the offload report for the source program shown above.
The host and the target execute independently, so once an offload has been initiated, the order in which host lines and target lines appear relative to each other is unpredictable and can vary from run to run. However, the host lines always appear in the same order relative to one another, as do the target lines.
Here, the target device is initialized, and the report shows the mapping between logical and physical devices:
[Offload] [HOST] [State] Initialize logical card 0 = physical card 0
[Offload] [HOST] [State] Initialize logical card 1 = physical card 1
Offload code in example.c at line number 6 has started executing on the target.
[Offload] [MIC 0] [File] example.c
[Offload] [MIC 0] [Line] 6
Tag value Tag0 is assigned to this offload to enable identifying reports printed for this offload.
[Offload] [MIC 0] [Tag] Tag0
The offload is initiated on the host.
[Offload] [HOST] [Tag 0] [State] Start Offload
The target function corresponding to this offload is initialized.
[Offload] [HOST] [Tag 0] [State] Initialize function __offload_entry_example_c_6Hysum
Create data transfer buffers for pointer data.
[Offload] [HOST] [Tag 0] [State] Create buffer from Host memory
[Offload] [HOST] [Tag 0] [State] Create buffer from MIC memory
[Offload] [HOST] [Tag 0] [State] Create buffer from Host memory
[Offload] [HOST] [Tag 0] [State] Create buffer from MIC memory
Pointer data sent from the host to the target using DMA, for the variable abc.
[Offload] [HOST] [Tag 0] [State] Send pointer data
[Offload] [HOST] [Tag 0] [State] CPU->MIC pointer data 40
Non-pointer data from the variable siz is collected together and sent from host to target.
[Offload] [HOST] [Tag 0] [State] Gather copyin data
[Offload] [HOST] [Tag 0] [State] CPU->MIC copyin data 4
The offloaded code (line 6) on the target is started.
[Offload] [HOST] [Tag 0] [State] Compute task on MIC
The offload has completed. Start receiving data from target.
Pointer data from the variable efg received from the target using DMA.
[Offload] [HOST] [Tag 0] [State] Receive pointer data
[Offload] [HOST] [Tag 0] [State] MIC->CPU pointer data 20
This is output from the target. The offload code is invoked on the target.
[Offload] [MIC 0] [Tag 0] [State] Start target function __offload_entry_example_c_6Hysum
The variable abc is an IN for this offload.
[Offload] [MIC 0] [Tag 0] [Var] abc_2_V$0 IN
The variable efg is an OUT for this offload.
[Offload] [MIC 0] [Tag 0] [Var] efg_2_V$1 OUT
The variable sumT is an OUT for this offload.
[Offload] [MIC 0] [Tag 0] [Var] sumT OUT
The variable siz is an IN for this offload.
[Offload] [MIC 0] [Tag 0] [Var] siz_2_V$2 IN
On the target, non-pointer data is copied into target memory.
[Offload] [MIC 0] [Tag 0] [State] Scatter copyin data
User program output:
On device : 0
On the target, non-pointer data, the variable sumT, is gathered and sent from the target to the host.
[Offload] [MIC 0] [Tag 0] [State] Gather copyout data
[Offload] [MIC 0] [Tag 0] [State] MIC->CPU copyout data 4
On the host, the non-pointer data received from the target is copied into host memory.
[Offload] [HOST] [Tag 0] [State] Scatter copyout data
No host time on the target.
[Offload] [MIC 0] [CPU Time] 0.000000 (seconds)
The total amount of pointer and non-pointer data, the variables abc and siz, transferred from host to target.
[Offload] [MIC 0] [CPU->MIC Data] 44 (bytes)
Computation time on the target.
[Offload] [MIC 0] [MIC Time] 0.000298 (seconds)
The total amount of pointer and non-pointer data, the variables efg and sumT, transferred from target to host.
[Offload] [MIC 0] [MIC->CPU Data] 24 (bytes)
User program output:
xyz[0]=5 xyz[1]=7 xyz[2]=9 xyz[3]=11 xyz[4]=13
sum total=45
Cleanup at end of program:
[Offload] [MIC 1] [State] Unregister data tables
[Offload] [MIC 0] [State] Unregister data tables
[Offload] [HOST] [State] Unregister data tables