Intel® C++ Compiler 16.0 User and Reference Guide
Executes the statements on the target. This pragma only applies to Intel® MIC Architecture and Intel® Graphics Technology.
#pragma offload clause[, clause...]
<expression-stmt>
Where clause can be the following required and optional clauses:
Required Clauses
offload-parameter
target ( target-name [ :target-number ] )
Optional Clauses
if ( if-clause )
mandatory
optional
signal ( tag )
status ( statusvarname )
stream ( handle )
wait ( tag [, tag, ...] )
Required Clauses
offload-parameter
Controls which program variables, and how much data, are copied between the host and the target. This clause can be one of the following:
| Value | Description |
|---|---|
| in | The variable is strictly an input to the target region. The value is not copied back after the region completes. Syntax: in ( variable-ref [, variable-ref …] [ modifier [ modifier … ] ] ) |
| out | The variable is strictly an output of the target region. The host does not copy the variable to the target. Syntax: out ( variable-ref [, variable-ref …] [ modifier [ modifier … ] ] ) |
| inout | The variable is both copied from the host to the target and back from the target to the host. Syntax: inout ( variable-ref [, variable-ref …] [ modifier [ modifier … ] ] ) |
| nocopy | A variable whose value is reused from a previous target execution or one that is used entirely within the offloaded code section may be named in a nocopy clause to avoid any copying. Syntax: nocopy ( variable-ref [, variable-ref …] [ modifier [ modifier … ] ] ) |
| pin | A variable whose value is shared between the host and the target. Syntax: pin ( variable-ref [, variable-ref …] [ modifier [ modifier … ] ] ) |
The data selected for transfer is a combination of variables implicitly transferred because the variables are lexically referenced within offload constructs, and variables explicitly listed in offload-parameter.
An in or out element-count-expr expression (see description below within modifier) is evaluated at a point in the program before the statement or clause in which it is used.
An array variable whose size is known from the declaration is copied in its entirety. If a subset of an array is to be processed, use a pointer to the starting element of the subset and the element-count-expr to transfer the array subset.
Because a data pointer variable not listed in an in clause is uninitialized within the construct, it must be assigned a value before it can be de-referenced.
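The array-subset advice above can be illustrated with a small sketch. The function name scale_subset and its parameters are hypothetical, the length modifier is used as in the variable-length array example in this topic, and the optional clause is added on the assumption that falling back to the host is acceptable. A compiler without offload support is expected to ignore the pragma and run the block on the host, which produces the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: multiplies `count` elements of `a`, starting at
   index `first`, by `factor`. Only that subset is named for transfer. */
void scale_subset(int *a, int first, int count, int factor)
{
    assert(a != NULL && first >= 0 && count >= 0);

    /* Pointer to the starting element of the subset. */
    int *p = &a[first];

    /* p is listed in the inout clause, so it is initialized on the target.
       length(count) is the element count; optional allows host fallback. */
    #pragma offload target(mic) optional inout(p : length(count))
    {
        for (int i = 0; i < count; i++) {
            p[i] *= factor;
        }
    }
}
```

Calling scale_subset(a, 2, 3, 10) on an eight-element array changes only a[2], a[3], and a[4]; the rest of the array is neither transferred nor modified.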
The following are the variables to use with this argument:
variable-ref |
Is one of the following:
modifier |
Is one of the following:
target-name represents the target and can be one of the following values:
| Value | Description |
|---|---|
| gfx | Intel® Graphics Technology |
| mic | Intel® Xeon Phi™ products |
target-number is an integer expression whose value is interpreted as follows:
| Value | Description |
|---|---|
| >= 0 | Executes the statements on a specified target according to the following formula: target = target-number % number_of_targets. For example, in a system with four targets, specifying 2 or 6 tells the runtime system to execute the code on target 2, because the result of both 2 % 4 and 6 % 4 is 2. Specifying 1000 tells the runtime system to execute the code on target 0, because the result of 1000 % 4 is 0. |
| -1 | Executes the statements on a target selected by the runtime system. |
target-number is required for the signal and wait clauses.
If the target is not available, the program fails with an error message unless you also specify the optional clause. The optional clause allows the statements to execute on the host if the target is not available.
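The target-number formula described above can be sketched in plain C. This is only an illustration of the arithmetic; the function name resolve_target is hypothetical and is not part of the offload runtime, and the sketch assumes a non-negative target-number.

```c
#include <assert.h>

/* Hypothetical illustration of the documented mapping:
   target = target-number % number_of_targets
   Assumes target_number >= 0 and at least one target. */
int resolve_target(int target_number, int number_of_targets)
{
    assert(target_number >= 0);
    assert(number_of_targets > 0);
    return target_number % number_of_targets;
}
```

With four targets, resolve_target(2, 4) and resolve_target(6, 4) both select target 2, and resolve_target(1000, 4) selects target 0, matching the examples in the text.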
Optional Clauses
if ( if-clause )
A Boolean expression.
| If the expression evaluates to ... | ... then the following occurs. |
|---|---|
| true | The statements are executed on the target. |
| false | The statements are executed on the host. The behavior is undefined if the if-clause is used with either the signal or wait clauses. |
Use this clause to control whether offload is enabled. A set of related pragmas should use this clause in a coordinated fashion, so that either all or none of the related offload statements are enabled.
mandatory
Specifies that execution on the target is required. Execution on the host is not allowed.
To continue the program if the correct target hardware is not available, initialize a variable statusvarname and use the status ( statusvarname ) clause in this pragma. The Description section below explains how to initialize a status variable and the possible values for the status variable.
This clause is implied if the optional clause is not specified. You can explicitly specify this clause to reinforce the implied default.
optional
Specifies that execution on the target is requested but not required. Execution on the host is allowed if the target is not available.
To determine why the statements were executed on the host instead of the target, initialize a variable statusvarname and use the status ( statusvarname ) clause in this pragma. The Description section below explains how to initialize a status variable and the possible values for the status variable.
Do not use this clause and the mandatory clause in the same pragma as these clauses are opposites of each other.
signal ( tag )
A handle on an asynchronous data transfer or computational activity. The computation performed by the offload, and any results returned from the offload using out clauses, occur concurrently with host execution of the code after the pragma. If this clause is not used, the entire offload and its associated data transfer are executed synchronously: the host does not continue past the pragma until the offload has completed.
tag is an expression that is a pointer-size value in the baseline language which serves as a handle on an asynchronous activity, either data transfer or computation.
You must specify a target clause with a target-number that is greater than or equal to zero with this clause.
status ( statusvarname )
Determines the status of the execution of an offloading construct. The statusvarname variable contains the value that explains the status of the execution. The Description section below explains how to initialize a status variable and the possible values for this variable.
When used with the optional clause and the target is unavailable, the statements to be executed on the target are instead executed on the host.
When used with the mandatory clause and the target is unavailable, the statements are ignored and the program continues. To determine why the statements were ignored or executed on the host, examine the value of this variable.
stream ( handle )
Offloads to the stream specified by handle. The handle is obtained from the function _Offload_create_stream, which specifies the Intel® MIC Architecture device on which to create the stream. The offload goes to the device on which the stream was created. For more information, see Offload Using Streams.
wait ( tag [, tag, ...] )
Specifies a wait until a previously initiated asynchronous data transfer or asynchronous computation is completed.
tag is an expression that is a pointer-size value in the baseline language. This expression serves as a handle on a previously initiated asynchronous activity which used the same expression value in a signal clause. The activity could be an asynchronous computation or asynchronous data transfer.
You must specify a target clause with a target-number that is greater than or equal to zero with this clause.
Querying a signal before the signal has been initiated results in undefined behavior and a runtime abort of the application. For example, querying a signal on target:0 that was initiated for target:1 results in a runtime abort of the application because the signal was initiated for target:1, so there is no signal associated with target:0.
This pragma both transfers data and offloads computation to the target.
You can use this pragma before any statement, including a compound statement, or an OpenMP* parallel pragma, to specify remote execution of that compound statement or top-level OpenMP* construct, or a single call statement.
Do not use the __MIC__ macro inside this pragma. You can, however, use the __MIC__ macro in a subprogram called from the pragma.
For Intel® Graphics Technology:
Shared Virtual Memory (SVM) pointers cannot appear in an in clause to a GFX offload region. An error is reported at compile time if a pointer is specified in an in clause when compiled for SVM mode. You do not need to specify pointers in any memory sharing or pinning clauses when compiling for SVM mode. If you assign a pointer an out, inout or pin clause to a GFX offload region, then a warning is reported at compile time.
Physical memory is shared between the CPU and the processor graphics. The offload-parameter values out and inout map to the nocopy implementation, except for direct access to global data objects. On entry to the device region, the physical memory for the list items is saved in memory until exit from the device region. This reduces the offload overhead by avoiding the copy while maintaining the same behavior. Any CPU access to the same memory must be synchronized to avoid race conditions in both the copy and nocopy cases.
Conceptually, this is the sequence of events when this pragma is encountered:
1. If there is no if clause, go to step 3.
2. On the host, evaluate the if-clause. If it evaluates to true, go to step 3. Otherwise, execute the statements on the host and be done.
3. Attempt to acquire the target. If successful, go to step 4. Otherwise, execute the statements on the host and be done.
4. On the host, compute all alloc_if, free_if, and element-count-expr expressions used in the in and out clauses, and element-count-expr expressions used in the out clause.
5. On the host, gather all variable values that are inputs to the offload.
6. Send the input values from the host to the target.
7. On the target, allocate memory for variable-length out variables.
8. On the target, copy input values into corresponding target variables.
9. On the target, execute the statements.
10. On the target, compute all element-count-expr expressions used in out clauses.
11. On the target, gather all variable values that are outputs of the offload.
12. Send output values back from the target to the host.
13. On the host, copy values received into corresponding host variables.
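The first three steps of the sequence above decide where the statements run. As a rough sketch only, that decision can be written as a small C function. The type run_location and the function choose_location are hypothetical names introduced for illustration; they model the conceptual sequence and are not part of the compiler or its runtime.

```c
#include <stdbool.h>

/* Hypothetical model of where the offloaded statements execute. */
typedef enum { RUN_ON_HOST, RUN_ON_TARGET } run_location;

/* Mirrors the first three conceptual steps:
   - an if clause that evaluates to false keeps execution on the host;
   - otherwise the target is acquired, and failure falls back to the host;
   - otherwise the remaining steps (data transfer and execution)
     proceed on the target. */
run_location choose_location(bool has_if_clause, bool if_value,
                             bool target_acquired)
{
    if (has_if_clause && !if_value) {
        return RUN_ON_HOST;
    }
    if (!target_acquired) {
        return RUN_ON_HOST;
    }
    return RUN_ON_TARGET;
}
```

Note that this models only the conceptual sequence; the effect of the mandatory, optional, and status clauses on an unavailable target is described separately in this topic.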
The statements following this pragma are executed on the target if the target is available. If the target is not available, the optional, mandatory, and status ( statusvarname ) clauses determine how the statements are executed.
| If you specify these clauses and the target is unavailable ... | ... then the following occurs. |
|---|---|
| optional | The statements are executed on the host. |
| optional and status ( statusvarname ) | The statements are executed on the host and the statusvarname contains the reason why the target was unavailable. |
| mandatory | The statements are ignored and the program ends. |
| mandatory and status ( statusvarname ) | The statements are ignored and the program continues. The statusvarname contains the reason why the target was unavailable. |
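The four clause combinations described above can be summarized as a small decision function. This is a sketch for illustration only: the names fallback_outcome, fallback_result, and on_target_unavailable are hypothetical and do not exist in the offload runtime.

```c
#include <stdbool.h>

/* Hypothetical outcomes when the target is unavailable. */
typedef enum {
    EXECUTED_ON_HOST,          /* optional, with or without status */
    PROGRAM_ENDS,              /* mandatory without status */
    IGNORED_PROGRAM_CONTINUES  /* mandatory with status */
} fallback_outcome;

typedef struct {
    fallback_outcome outcome;
    bool status_has_reason;    /* true if statusvarname holds the reason */
} fallback_result;

/* Models the documented behavior for an unavailable target.
   optional_clause == false means mandatory, which is also the
   implied default when optional is not specified. */
fallback_result on_target_unavailable(bool optional_clause, bool status_clause)
{
    fallback_result r;
    r.status_has_reason = status_clause;
    if (optional_clause) {
        r.outcome = EXECUTED_ON_HOST;
    } else if (status_clause) {
        r.outcome = IGNORED_PROGRAM_CONTINUES;
    } else {
        r.outcome = PROGRAM_ENDS;
    }
    return r;
}
```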
To initialize a status variable statusvarname, use the OFFLOAD_STATUS_INIT( statusvarname ) macro. The values of the status variables are defined in offload.h and can be the following values:
| Value | Description |
|---|---|
| OFFLOAD_SUCCESS = 0 | The statements were successfully executed on the target. |
| OFFLOAD_DISABLED | The statements were not executed on the target. If you specified if-clause and the value of this clause is false, the statements were successfully executed on the host. |
| OFFLOAD_UNAVAILABLE | The statements were not executed on the target because the target was unavailable. |
| OFFLOAD_OUT_OF_MEMORY | The statements were not executed on the target because there was not enough memory available for offload-parameter. |
| OFFLOAD_PROCESS_DIED | The statements were not executed on the target because a runtime error occurred on the target that caused the target process to terminate. |
| OFFLOAD_ERROR | The statements were not executed on the target because of an error. |
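A program that checks a status variable typically branches on these values. The sketch below shows one way such a check might look. It deliberately uses a stand-in enumeration with a DEMO_ prefix rather than the real definitions from offload.h: only OFFLOAD_SUCCESS = 0 is documented here, so every other numeric value in the sketch is an assumption, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Stand-in for the status values defined in offload.h.
   Only SUCCESS = 0 is documented; the other numeric values
   are assumptions made for this sketch. */
typedef enum {
    DEMO_OFFLOAD_SUCCESS = 0,
    DEMO_OFFLOAD_DISABLED,
    DEMO_OFFLOAD_UNAVAILABLE,
    DEMO_OFFLOAD_OUT_OF_MEMORY,
    DEMO_OFFLOAD_PROCESS_DIED,
    DEMO_OFFLOAD_ERROR
} demo_offload_status;

/* Only the success value means the statements ran on the target. */
bool demo_ran_on_target(demo_offload_status s)
{
    return s == DEMO_OFFLOAD_SUCCESS;
}

/* Returns a short description paraphrasing the table above. */
const char *demo_describe(demo_offload_status s)
{
    switch (s) {
    case DEMO_OFFLOAD_SUCCESS:       return "executed on the target";
    case DEMO_OFFLOAD_DISABLED:      return "not executed on the target";
    case DEMO_OFFLOAD_UNAVAILABLE:   return "target was unavailable";
    case DEMO_OFFLOAD_OUT_OF_MEMORY: return "not enough memory on the target";
    case DEMO_OFFLOAD_PROCESS_DIED:  return "target process terminated";
    case DEMO_OFFLOAD_ERROR:         return "offload error";
    }
    return "unknown status";
}
```

In real code, the comparison would be made against the constants from offload.h after initializing the variable with OFFLOAD_STATUS_INIT, as described above.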
Using a variable-length array to specify a number of elements copied between the host and target

    void sample(const int nx)
    {
        float temp[nx];
        #pragma offload target(mic) in(temp : length(nx))
        {
            ...
        }
    }
Using variable-ref in the in/out clauses

    typedef int ARRAY[10][10];
    int a[1000][500];
    int *p;
    ARRAY *q;
    int *r[10][10];
    int i, j;
    struct { int y; } x;

    #pragma offload … in( a )
    #pragma offload … out( a[i:j][:] )
    #pragma offload … in( p[0:100] )
    #pragma offload … in( (*q)[5][:] )
    #pragma offload … in( r[5][5][0:2] )
    #pragma offload … out( x.y )
Example of pointer array with special alignment on target

    #include <stdio.h>   // printf, fflush
    #include <stdlib.h>  // malloc

    #define ARRAY_SIZE 4
    #define DATA_ELEMS 100

    __declspec(target(mic)) int start[ARRAY_SIZE];
    __declspec(target(mic)) int len[ARRAY_SIZE];
    __declspec(target(mic)) int align[ARRAY_SIZE];
    __declspec(target(mic)) float *p[ARRAY_SIZE];
    float *q[ARRAY_SIZE];

    int main()
    {
        int i, j;
        bool failed = false;
        bool align_failed = false;

        for (i=0; i<ARRAY_SIZE; i++) {
            // Alloc ptr array elements; assume memory is available
            p[i] = (float *)malloc(sizeof(float)*DATA_ELEMS);
            q[i] = (float *)malloc(sizeof(float)*DATA_ELEMS);
            p[i][0:DATA_ELEMS] = i;
            q[i][0:DATA_ELEMS] = p[i][0:DATA_ELEMS];
        }

        start[0] = 0;
        start[1] = 1;
        start[2] = 1;
        start[3] = 0;
        len[0] = DATA_ELEMS;
        len[1] = DATA_ELEMS - 2;
        len[2] = DATA_ELEMS - 2;
        len[3] = DATA_ELEMS;
        align[0] = 2048;
        align[1] = 4096;
        align[2] = 8192;
        align[3] = 8;

        // Start is a section and length is also a section
        // Default values of alloc_if, free_if
        // Special alignment
        // Array allocations will start at element 0, but transfers will not
        // Update some elements and get them from MIC
        // UUUUUUUUUUUUUU
        // .UUUUUUUUUUUU.
        // .UUUUUUUUUUUU.
        // UUUUUUUUUUUUUU
        #pragma offload target(mic) \
            inout( p[0:ARRAY_SIZE] : \
                   extent(start[0:ARRAY_SIZE]:len[0:ARRAY_SIZE]) \
                   align(align[0:ARRAY_SIZE]) )
        {
            for (i=0; i<ARRAY_SIZE; i++) {
                if (((long long)&p[i][0] & (align[i]-1)) != 0) {
                    align_failed = true;
                    printf("p[%d] failed alignment\n", i);
                    fflush(0);
                }
                p[i][start[i]:len[i]] += 1.0;
            }
        }
        ...
        return 0;
    }
Example of pointer array and alloc_if and free_if using array sections

    #include <stdlib.h>  // malloc

    #define ARRAY_SIZE 4
    #define DATA_ELEMS 100

    __declspec(target(mic)) int start[ARRAY_SIZE];
    __declspec(target(mic)) int len[ARRAY_SIZE];
    __declspec(target(mic)) int allocif[ARRAY_SIZE];
    __declspec(target(mic)) int freeif[ARRAY_SIZE];
    __declspec(target(mic)) int *p[ARRAY_SIZE];

    int main()
    {
        int i, j;
        bool failed = false;

        for (i=0; i<ARRAY_SIZE; i++) {
            // Alloc ptr array elements; assume memory is available
            p[i] = (int *)malloc(sizeof(int)*DATA_ELEMS);
            p[i][0:DATA_ELEMS] = i;
        }

        start[:] = 1;
        len[:] = 98;

        // Start is a section and length is also a section
        // Default values of free_if and align
        // alloc_if uses a vector and free_if a scalar that is expanded
        // Allocate only
        allocif[:] = 1;
        #pragma offload_transfer target(mic) \
            nocopy( p[0:ARRAY_SIZE] : \
                    extent(start[0:ARRAY_SIZE]:len[0:ARRAY_SIZE]) \
                    alloc_if(allocif[0:ARRAY_SIZE]) \
                    free_if(0) )

        // Do the offload reusing memory
        #pragma offload target(mic) \
            inout( p[0:ARRAY_SIZE] : \
                   extent(start[0:ARRAY_SIZE]:len[0:ARRAY_SIZE]) \
                   alloc_if(0) \
                   free_if(0) )
        {
            for (i=0; i<ARRAY_SIZE; i++) {
                p[i][start[i]:len[i]] += 1;
            }
        }

        // Free the memory
        // alloc_if uses a scalar, free_if a vector
        freeif[:] = 1;
        #pragma offload_transfer target(mic) \
            nocopy( p[0:ARRAY_SIZE] : \
                    extent(start[0:ARRAY_SIZE]:len[0:ARRAY_SIZE]) \
                    alloc_if(0) \
                    free_if(freeif[0:ARRAY_SIZE]) )
        ...
        return 0;
    }
Example of pointer array with into, into_extent and alloc_extent

    #include <stdlib.h>  // malloc

    #define ARRAY_SIZE 4
    #define DATA_ELEMS 100
    #define SEND_ROWS 2
    #define SEND_COLS 50
    #define P_OFFSET_IN 2
    #define P_OFFSET_OUT 0
    #define Q_OFFSET 0
    #define COL_OFFSET 50

    __declspec(target(mic)) int len[ARRAY_SIZE];
    __declspec(target(mic)) short int *p[ARRAY_SIZE];
    __declspec(target(mic)) short int *q[ARRAY_SIZE*2];
    #ifdef RUN_ON_CPU
    bool run_on_cpu = true;
    #else
    __declspec(target(mic)) bool run_on_cpu = false;
    #endif

    int main()
    {
        int i, j;
        bool failed = false;

        for (i=0; i<ARRAY_SIZE; i++) {
            // Alloc ptr array elements; assume memory is available
            p[i] = (short int *)malloc(sizeof(short int)*DATA_ELEMS);
            q[i] = (short int *)malloc(sizeof(short int)*DATA_ELEMS);
            q[i+ARRAY_SIZE] = (short int *)malloc(sizeof(short int)*DATA_ELEMS);
            p[i][0:DATA_ELEMS] = i;
        }

        len[:] = SEND_COLS;

        // Scalars used for extent start and extent length
        // Default values of alloc_if, free_if and align
        // Data sent to MIC and fetched from MIC
        // p[2][50:50] -> q[0][50:50] allocate only those 50 elements
        // p[3][50:50] -> q[1][50:50] allocate only those 50 elements
        // compute
        // p[0][50:50] <- q[0][50:50]
        // p[1][50:50] <- q[1][50:50]
        #pragma offload target(mic) \
            in (p[P_OFFSET_IN:SEND_ROWS] : extent(COL_OFFSET:SEND_COLS) \
                into(q[Q_OFFSET:SEND_ROWS]) into_extent(COL_OFFSET:SEND_COLS) \
                alloc_extent(COL_OFFSET:len[0:SEND_ROWS]) ) \
            out(q[Q_OFFSET:SEND_ROWS] : extent(COL_OFFSET:SEND_COLS) \
                into(p[P_OFFSET_OUT:SEND_ROWS]) into_extent(COL_OFFSET:SEND_COLS) )
        {
            for (i=0; i<SEND_ROWS; i++) {
                // If running on CPU, mimic the "in into"
                if (run_on_cpu) {
                    q[Q_OFFSET+i][COL_OFFSET:SEND_COLS] =
                        p[P_OFFSET_IN+i][COL_OFFSET:SEND_COLS];
                }
                q[Q_OFFSET+i][COL_OFFSET:SEND_COLS] += i*2;
                // If running on CPU, mimic the "out into"
                if (run_on_cpu) {
                    p[P_OFFSET_OUT+i][COL_OFFSET:SEND_COLS] =
                        q[Q_OFFSET+i][COL_OFFSET:SEND_COLS];
                }
            }
        }
        ...
        return 0;
    }