Intel® C++ Compiler 16.0 User and Reference Guide
Initiates and completes a synchronous data transfer. If used with the signal clause, initiates an asynchronous data transfer. This pragma only applies to Intel® MIC Architecture.
#pragma offload_transfer clause[ clause...]
Required Clauses
offload-parameter
target ( target-name [ :target-number ] )
Optional Clauses
if-clause
mandatory
optional
signal ( tag )
status ( statusvarname )
stream ( handle )
wait ( tag [, tag, ...] )
Required Clauses
offload-parameter |
Controls which program variables, and how much data, are copied between the host and the target. This clause can be one of the following:
in |
The variable is strictly an input to the target region. The value is not copied back after the region completes. Syntax: in ( variable-ref [, variable-ref …] [ modifier[ , modifier… ] ] ) |
out |
The variable is strictly an output of the target region. The host does not copy the variable to the target. Syntax: out ( variable-ref [, variable-ref …] [ modifier[ , modifier… ] ] ) |
nocopy |
A variable whose value is reused from a previous target execution or one that is used entirely within the offloaded code section may be named in a nocopy clause to avoid any copying. Syntax: nocopy ( variable-ref [, variable-ref …] [ modifier[ , modifier… ] ] ) |
An in or out expression (see description below within modifier) is evaluated at a point in the program before the statement or clause in which it is used.
An array variable whose size is known from the declaration is copied in its entirety. If a subset of an array is to be processed, use a pointer to the starting element of the subset and the element-count-expr to transfer the array subset.
The following are the variables to use with this argument:
variable-ref |
Is one of the following:
modifier |
Is one of the following:
target ( target-name [ :target-number ] ) |
target-name represents the target. Use mic for Intel® Xeon Phi™ products.
target-number is required for the signal and wait clauses. It is an integer expression whose value is interpreted as follows:
>= 0 |
Executes the statements on a specified target according to the following formula:
target = target-number % number_of_targets
For example, in a system with four targets:
Specifying 2 or 6 tells the runtime system to execute the code on target 2, because 2 % 4 and 6 % 4 both equal 2.
Specifying 1000 tells the runtime system to execute the code on target 0, because 1000 % 4 is 0.
-1 |
Executes the statements on a target selected by the runtime system.
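The modulo formula above can be checked with a few lines of C. This is a minimal sketch: resolve_target is a hypothetical helper written for illustration, not a function of the offload runtime, and it only models the case where a specific non-negative target-number is given. The case where the runtime system selects the target is not modelled.

```c
#include <assert.h>

/* Hypothetical helper (not part of the Intel offload runtime).
   Models: target = target-number % number_of_targets
   for a non-negative target-number. */
int resolve_target(int target_number, int number_of_targets)
{
    assert(target_number >= 0);     /* runtime-selected target is not modelled */
    assert(number_of_targets > 0);  /* at least one target must exist */
    return target_number % number_of_targets;
}
```

With four targets, resolve_target(2, 4) and resolve_target(6, 4) both give 2, and resolve_target(1000, 4) gives 0, matching the examples in the text.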
If the target is not available, the program fails with an error message unless you also specify the optional clause. The optional clause allows the statements to execute on the host if the target is not available.
Optional Clauses
if-clause |
A Boolean expression.
If the expression evaluates to ... | ... then the following occurs. |
---|---|
true | The statements are executed on the target. |
false | The statements are executed on the host. The behavior is undefined if the if-clause is used with either the signal or wait clauses. |
Use this clause to control whether offload is enabled. A set of related pragmas should use this clause in a coordinated fashion, so that either all or none of the related offload statements are enabled.
mandatory |
Specifies that execution on the target is required. Execution on the host is not allowed.
To continue the program if the correct target hardware is not available, initialize a variable statusvarname and use the status ( statusvarname ) clause in this pragma. The Description section below explains how to initialize a status variable and the possible values for the status variable.
This clause is implied if the optional clause is not specified. You can explicitly specify this clause to reinforce the implied default.
optional |
Specifies that execution on the target is requested but not required. Execution on the host is allowed if the target is not available.
To determine why the statements were executed on the host instead of the target, initialize a variable statusvarname and use the status ( statusvarname ) clause in this pragma. The Description section below explains how to initialize a status variable and the possible values for the status variable.
Do not use this clause and the mandatory clause in the same pragma as these clauses are opposites of each other.
signal ( tag ) |
A handle on an asynchronous data transfer or computational activity. The computation performed by the offload clause, and any results returned from the offload using out clauses, occur concurrently with host execution of the code after the pragma. If this clause is not used, the entire offload and the associated data transfer are executed synchronously: the host does not continue past the pragma until it has completed.
tag is an expression that is a pointer-size value in the baseline language which serves as a handle on an asynchronous activity, either data transfer or computation.
You must specify a target clause with a target-number that is greater than or equal to zero with this clause.
status ( statusvarname ) |
Determines the status of the execution of an offloading construct. The statusvarname variable contains the value that explains the status of the execution. The Description section below explains how to initialize a status variable and the possible values for this variable.
When used with the optional clause and the target is unavailable, the statements to be executed on the target are instead executed on the host.
When used with the mandatory clause and the target is unavailable, the statements are ignored and the program continues. To determine why the statements were ignored or executed on the host, examine the value of this variable.
stream ( handle ) |
Offloads to the stream specified by handle. The handle is obtained from the function _Offload_create_stream, which specifies the Intel® MIC Architecture device on which to create the stream. The offload goes to the device on which the stream was created. For more information, see Offload Using Streams.
wait ( tag [, tag, ...] ) |
Specifies a wait until a previously initiated asynchronous data transfer or asynchronous computation is completed.
tag is an expression that is a pointer-size value in the baseline language. This expression serves as a handle on a previously initiated asynchronous activity which used the same expression value in a signal clause. The activity could be an asynchronous computation or asynchronous data transfer.
You must specify a target clause with a target-number that is greater than or equal to zero with this clause.
Querying a signal before the signal has been initiated results in undefined behavior and a runtime abort of the application. For example, querying a signal on target:0 that was initiated for target:1 results in a runtime abort of the application because the signal was initiated for target:1, so there is no signal associated with target:0.
Description
This pragma initiates and completes a synchronous data transfer. When used with the signal clause, it initiates an asynchronous data transfer instead.
The statements following this pragma are executed on the target if the target is available. If the target is not available, the optional, mandatory, and status ( statusvarname ) clauses determine how the statements are executed.
If you specify these clauses and the target is unavailable ... | ... then the following occurs. |
---|---|
optional | The statements are executed on the host. |
optional and status ( statusvarname ) | The statements are executed on the host and statusvarname contains the reason why the target was unavailable. |
mandatory | The statements are ignored and the program ends. |
mandatory and status ( statusvarname ) | The statements are ignored and the program continues. statusvarname contains the reason why the target was unavailable. |
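The table above can be read as a small decision function. The following sketch encodes it in plain C so the four rows can be checked mechanically. All names here (offload_outcome, model_outcome, check_table_rows) are hypothetical and exist only for illustration. The code models the documented behavior and does not call the offload runtime. It treats mandatory as implied whenever optional is not specified, as the clause descriptions state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the outcomes described in the table. */
typedef enum {
    RUN_ON_TARGET,         /* target is available */
    RUN_ON_HOST,           /* optional: fall back to the host */
    IGNORED_AND_CONTINUE,  /* mandatory + status: skip, keep running */
    PROGRAM_ENDS           /* mandatory without status */
} offload_outcome;

/* Models the table only. mandatory is implied when optional_clause is false. */
offload_outcome model_outcome(bool target_available,
                              bool optional_clause,
                              bool status_clause)
{
    if (target_available)
        return RUN_ON_TARGET;
    if (optional_clause)
        return RUN_ON_HOST;  /* status only adds the reason, not a new outcome */
    return status_clause ? IGNORED_AND_CONTINUE : PROGRAM_ENDS;
}

/* Usage: one assertion per table row, all with the target unavailable. */
void check_table_rows(void)
{
    assert(model_outcome(false, true,  false) == RUN_ON_HOST);
    assert(model_outcome(false, true,  true)  == RUN_ON_HOST);
    assert(model_outcome(false, false, false) == PROGRAM_ENDS);
    assert(model_outcome(false, false, true)  == IGNORED_AND_CONTINUE);
}
```

The sketch makes one point of the table explicit: adding status to optional does not change where the statements run, while adding status to mandatory changes the outcome from program termination to continuation.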
To initialize a status variable statusvarname, use the OFFLOAD_STATUS_INIT( statusvarname ) macro. The possible values of the status variable are defined in offload.h:
Value | Description |
---|---|
OFFLOAD_SUCCESS = 0 | The statements were successfully executed on the target. |
OFFLOAD_DISABLED | The statements were not executed on the target. If you specified if-clause and the value of this clause is false, the statements were successfully executed on the host. |
OFFLOAD_UNAVAILABLE | The statements were not executed on the target because the target was unavailable. |
OFFLOAD_OUT_OF_MEMORY | The statements were not executed on the target because there was not enough memory available for offload-parameter. |
OFFLOAD_PROCESS_DIED | The statements were not executed on the target because a runtime error occurred on the target that caused the target process to terminate. |
OFFLOAD_ERROR | The statements were not executed on the target because of an error. |
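A status variable might be used with this pragma roughly as follows. This is a sketch under stated assumptions, not a definitive usage: it assumes that offload.h declares the status variable type as _Offload_status with a result field holding one of the values listed above, and that the compiler defines the __INTEL_OFFLOAD macro when offload is enabled. Neither detail is stated in this section. try_send_to_target is a hypothetical helper name. When built without offload support, the function simply reports that no transfer took place.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __INTEL_OFFLOAD              /* assumption: set when offload is enabled */
#include <offload.h>                /* OFFLOAD_STATUS_INIT, OFFLOAD_SUCCESS */
#endif

/* Hypothetical helper: tries to copy data to target 0 and keep it there.
   Returns 1 if the transfer succeeded on the target, 0 otherwise. */
int try_send_to_target(float *data, int n)
{
    assert(data != NULL && n > 0);
#ifdef __INTEL_OFFLOAD
    _Offload_status st;             /* assumption: type name from offload.h */
    OFFLOAD_STATUS_INIT(st);        /* initialize before use, as described above */

    /* optional + status: the program continues if the target is unavailable,
       and st records why. free_if(0) leaves the buffer allocated on the
       target so a later offload can reuse it. */
    #pragma offload_transfer target(mic:0) optional status(st) \
            in(data : length(n) alloc_if(1) free_if(0))

    return st.result == OFFLOAD_SUCCESS;   /* assumption: field is named result */
#else
    (void)data;
    (void)n;
    return 0;                       /* built without offload support */
#endif
}
```

A caller can branch on the return value to choose between a later offload that reuses the buffer and a host-only code path.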
Using two different pragmas to receive data asynchronously from the target and the host

    #include <malloc.h>   // memalign

    const int N = 4086;
    float *f1, *f2;
    f1 = (float *)memalign(64, N*sizeof(float));
    f2 = (float *)memalign(64, N*sizeof(float));
    ...
    // Host sends f1 as input synchronously
    // The output is in f2, but is not needed immediately
    #pragma offload target (mic:0) \
            in( f1 : length(N) ) \
            nocopy( f2 : length(N) ) signal(f2)
    {
        foo(N, f1, f2);
    }
    ..
    // Wait for the computation tagged f2, then copy f2 back and free it
    #pragma offload_transfer target (mic:0) wait(f2) \
            out( f2 : length(N) alloc_if(0) free_if(1) )

    // Host can now use the result in f2
The offload pragma sends f1 and starts the computation, but does not copy f2 back. The offload_transfer pragma waits for the activity tagged f2 to complete and then copies f2 to the host.
Example of pointer array with special alignment on target

    #include <stdio.h>
    #include <stdlib.h>

    #define ARRAY_SIZE 4
    #define DATA_ELEMS 100

    __declspec(target(mic)) int start[ARRAY_SIZE];
    __declspec(target(mic)) int len[ARRAY_SIZE];
    __declspec(target(mic)) int align[ARRAY_SIZE];
    __declspec(target(mic)) float *p[ARRAY_SIZE];
    float *q[ARRAY_SIZE];

    int main()
    {
        int i, j;
        bool failed = false;
        bool align_failed = false;

        for (i=0; i<ARRAY_SIZE; i++) {
            // Alloc ptr array elements; assume memory is available
            p[i] = (float *)malloc(sizeof(float)*DATA_ELEMS);
            q[i] = (float *)malloc(sizeof(float)*DATA_ELEMS);
            p[i][0:DATA_ELEMS] = i;
            q[i][0:DATA_ELEMS] = p[i][0:DATA_ELEMS];
        }

        start[0] = 0;
        start[1] = 1;
        start[2] = 1;
        start[3] = 0;
        len[0] = DATA_ELEMS;
        len[1] = DATA_ELEMS - 2;
        len[2] = DATA_ELEMS - 2;
        len[3] = DATA_ELEMS;
        align[0] = 2048;
        align[1] = 4096;
        align[2] = 8192;
        align[3] = 8;

        // Start is a section and length is also a section
        // Default values of alloc_if, free_if
        // Special alignment
        // Array allocations will start at element 0, but transfers will not
        // Update some elements and get them from MIC
        // UUUUUUUUUUUUUU
        // .UUUUUUUUUUUU.
        // .UUUUUUUUUUUU.
        // UUUUUUUUUUUUUU
        #pragma offload target(mic) \
                inout( p[0:ARRAY_SIZE] : \
                       extent(start[0:ARRAY_SIZE]:len[0:ARRAY_SIZE]) \
                       align(align[0:ARRAY_SIZE]) )
        {
            for (i=0; i<ARRAY_SIZE; i++) {
                if (((long long)&p[i][0] & (align[i]-1)) != 0) {
                    align_failed = true;
                    printf("p[%d] failed alignment\n", i);
                    fflush(0);
                }
                p[i][start[i]:len[i]] += 1.0;
            }
        }
        ...
        return 0;
    }
Example of pointer array and alloc_if and free_if using array sections

    #include <stdlib.h>

    #define ARRAY_SIZE 4
    #define DATA_ELEMS 100

    __declspec(target(mic)) int start[ARRAY_SIZE];
    __declspec(target(mic)) int len[ARRAY_SIZE];
    __declspec(target(mic)) int allocif[ARRAY_SIZE];
    __declspec(target(mic)) int freeif[ARRAY_SIZE];
    __declspec(target(mic)) int *p[ARRAY_SIZE];

    int main()
    {
        int i, j;
        bool failed = false;

        for (i=0; i<ARRAY_SIZE; i++) {
            // Alloc ptr array elements; assume memory is available
            p[i] = (int *)malloc(sizeof(int)*DATA_ELEMS);
            p[i][0:DATA_ELEMS] = i;
        }

        start[:] = 1;
        len[:] = 98;

        // Start is a section and length is also a section
        // Default values of free_if and align
        // alloc_if uses a vector and free_if a scalar that is expanded
        // Allocate only
        allocif[:] = 1;
        #pragma offload_transfer target(mic) \
                nocopy( p[0:ARRAY_SIZE] : \
                        extent(start[0:ARRAY_SIZE]:len[0:ARRAY_SIZE]) \
                        alloc_if(allocif[0:ARRAY_SIZE]) \
                        free_if(0) )

        // Do the offload reusing memory
        #pragma offload target(mic) \
                inout( p[0:ARRAY_SIZE] : \
                       extent(start[0:ARRAY_SIZE]:len[0:ARRAY_SIZE]) \
                       alloc_if(0) \
                       free_if(0) )
        {
            for (i=0; i<ARRAY_SIZE; i++) {
                p[i][start[i]:len[i]] += 1;
            }
        }

        // Free the memory
        // alloc_if uses a scalar, free_if a vector
        freeif[:] = 1;
        #pragma offload_transfer target(mic) \
                nocopy( p[0:ARRAY_SIZE] : \
                        extent(start[0:ARRAY_SIZE]:len[0:ARRAY_SIZE]) \
                        alloc_if(0) \
                        free_if(freeif[0:ARRAY_SIZE]) )
        ...
        return 0;
    }
Example of pointer array with into, into_extent and alloc_extent

    #include <stdlib.h>

    #define ARRAY_SIZE 4
    #define DATA_ELEMS 100
    #define SEND_ROWS 2
    #define SEND_COLS 50
    #define P_OFFSET_IN 2
    #define P_OFFSET_OUT 0
    #define Q_OFFSET 0
    #define COL_OFFSET 50

    __declspec(target(mic)) int len[ARRAY_SIZE];
    __declspec(target(mic)) short int *p[ARRAY_SIZE];
    __declspec(target(mic)) short int *q[ARRAY_SIZE*2];

    #ifdef RUN_ON_CPU
    bool run_on_cpu = true;
    #else
    __declspec(target(mic)) bool run_on_cpu = false;
    #endif

    int main()
    {
        int i, j;
        bool failed = false;

        for (i=0; i<ARRAY_SIZE; i++) {
            // Alloc ptr array elements; assume memory is available
            p[i] = (short int *)malloc(sizeof(short int)*DATA_ELEMS);
            q[i] = (short int *)malloc(sizeof(short int)*DATA_ELEMS);
            q[i+ARRAY_SIZE] = (short int *)malloc(sizeof(short int)*DATA_ELEMS);
            p[i][0:DATA_ELEMS] = i;
        }

        len[:] = SEND_COLS;

        // Scalars used for extent start and extent length
        // Default values of alloc_if, free_if and align
        // Data sent to MIC and fetched from MIC
        // p[2][50:50] -> q[0][50:50] allocate only those 50 elements
        // p[3][50:50] -> q[1][50:50] allocate only those 50 elements
        // compute
        // p[0][50:50] <- q[0][50:50]
        // p[1][50:50] <- q[1][50:50]
        #pragma offload target(mic) \
                in (p[P_OFFSET_IN:SEND_ROWS] : extent(COL_OFFSET:SEND_COLS) \
                    into(q[Q_OFFSET:SEND_ROWS]) into_extent(COL_OFFSET:SEND_COLS) \
                    alloc_extent(COL_OFFSET:len[0:SEND_ROWS]) ) \
                out(q[Q_OFFSET:SEND_ROWS] : extent(COL_OFFSET:SEND_COLS) \
                    into(p[P_OFFSET_OUT:SEND_ROWS]) into_extent(COL_OFFSET:SEND_COLS) )
        {
            for (i=0; i<SEND_ROWS; i++) {
                // If running on CPU, mimic the "in into"
                if (run_on_cpu) {
                    q[Q_OFFSET+i][COL_OFFSET:SEND_COLS] =
                        p[P_OFFSET_IN+i][COL_OFFSET:SEND_COLS];
                }
                q[Q_OFFSET+i][COL_OFFSET:SEND_COLS] += i*2;
                // If running on CPU, mimic the "out into"
                if (run_on_cpu) {
                    p[P_OFFSET_OUT+i][COL_OFFSET:SEND_COLS] =
                        q[Q_OFFSET+i][COL_OFFSET:SEND_COLS];
                }
            }
        }
        ...
        return 0;
    }