Intel® C++ Compiler 16.0 User and Reference Guide

offload

Executes the statements on the target. This pragma only applies to Intel® MIC Architecture and Intel® Graphics Technology.

Syntax

#pragma offload clause[, clause...]

<expression-stmt>

Where clause can be the following required and optional clauses:

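Required Clauses: offload-parameter; target ( target-name [ :target-number ] )

Optional Clauses: if-clause; mandatory; optional; signal ( tag ); status ( statusvarname ); stream ( handle ); wait ( tag [, tag, ...] )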

Arguments

Required Clauses

offload-parameter

Controls which program variables are copied between the host and the target, and how much data is copied. This clause can be one of the following:

in

The variable is strictly an input to the target region. The value is not copied back after the region completes.

Syntax: in ( variable-ref [, variable-ref …] [ modifier [ modifier … ] ] )

out

The variable is strictly an output of the target region. The host does not copy the variable to the target.

Syntax: out ( variable-ref [, variable-ref …] [ modifier [ modifier … ] ] )

inout

The variable is both copied from the host to the target and back from the target to the host.

Syntax: inout ( variable-ref [, variable-ref …] [ modifier [ modifier … ] ] )

nocopy

A variable whose value is reused from a previous target execution or one that is used entirely within the offloaded code section may be named in a nocopy clause to avoid any copying.

Syntax: nocopy ( variable-ref [, variable-ref …] [ modifier [ modifier … ] ] )

pin

A variable whose value is shared between the host and the target.

Syntax: pin ( variable-ref [, variable-ref …] [ modifier [ modifier … ] ] )

The data selected for transfer is a combination of variables implicitly transferred because the variables are lexically referenced within offload constructs, and variables explicitly listed in offload-parameter.

An in or out element-count-expr expression (see description below within modifier) is evaluated at a point in the program before the statement or clause in which it is used.

An array variable whose size is known from the declaration is copied in its entirety. If a subset of an array is to be processed, use a pointer to the starting element of the subset and the element-count-expr to transfer the array subset.

Because a data pointer variable not listed in an in clause is uninitialized within the construct, it must be assigned a value before it can be de-referenced.
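As a minimal sketch of the in and out clauses combined with the length modifier, the hypothetical function below copies n elements to the target, scales them there, and copies the results back. The names scale_on_target, src, dst, and k are illustrative only. The __INTEL_OFFLOAD guard is an assumption made so that the same source also builds without offload support, in which case the block simply runs on the host.

```cpp
// Hypothetical helper: multiplies n floats by k, offloading when possible.
// src is strictly an input and dst strictly an output. The scalars n and k
// are referenced lexically inside the region, so they are transferred
// implicitly. n must be positive, because a zero or negative length is a
// runtime error.
void scale_on_target(float* src, float* dst, int n, float k) {
#ifdef __INTEL_OFFLOAD  // assumption: defined only when offload is enabled
  #pragma offload target(mic) in(src : length(n)) out(dst : length(n))
#endif
  {
    for (int i = 0; i < n; ++i) {
      dst[i] = src[i] * k;
    }
  }
}
```

To process only part of a larger array, pass a pointer to the first element of the subset together with the subset's element count, for example scale_on_target(&a[10], &b[10], 100, 2.0f).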

The following are the variables to use with this argument:

variable-ref

Is one of the following:

  • A C/C++ identifier.

  • variable-ref.identifier

    Use the following syntax for the variables in this argument:

    • var : length ( l )

    • var [ 0 : length ]

  • array-slice

    An expression that denotes one contiguous set of array elements.

modifier

Is one of the following:

  • align (align-expression)

    where align-expression is a scalar integral expression or an array slice expression. Use it with:

    • Pointer variables.

      align-expression must be a scalar integral expression. It specifies the minimum alignment for pointer data allocated on the target. The value must be a power of two.

    • Pointer arrays.

      align-expression may be either a scalar integral expression or an array slice expression. The specified value(s) requests the minimum alignment for pointer data allocated on the target. The value(s) must be a power of two.

      align-expression that is an array slice expression specifies a set of alignment values that apply one-to-one with the pointers in the pointer array. The first alignment value applies to the first pointer, the second alignment to the second pointer, and so on.
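The two constraints on align-expression (each value is a power of two, and the allocated address honors it) can be checked with a few lines of host code. This is only a sketch with hypothetical helper names; it does not show how the runtime allocates memory.

```cpp
#include <cstdint>

// True when value is a power of two, the form an alignment value must take.
bool is_power_of_two(std::uintptr_t value) {
  return value != 0 && (value & (value - 1)) == 0;
}

// True when ptr is aligned to at least `alignment` bytes.
// Assumes alignment is a power of two.
bool is_aligned(const void* ptr, std::uintptr_t alignment) {
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}
```

The same mask test, applied to the address of the first element of each pointer, is a simple way to confirm inside an offload region that a requested alignment took effect.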

  • alloc (array-slice)

    where array-slice specifies a set of elements of the array that need allocation. Data specified by the in/out expression is transferred into the corresponding section of the array allocated on the target.

    Use the following syntax for the argument in this modifier:

    • var [ start : length ]

    For more information, see Allocating Memory for Parts of Arrays.

  • alloc_if ( condition )

    where condition is a scalar Boolean expression or a Boolean array slice expression. Use it with:

    • Pointer variables.

      condition must be a scalar Boolean expression. It controls whether memory is allocated on the target for the variables in the in/out/inout/nocopy clause. If the expression evaluates to true, a new memory allocation is performed for each variable listed in the clause. If the expression evaluates to false, the memory already allocated on the target is reused (data persistence). In that case, you must ensure that a block of memory of sufficient size was previously allocated for the variables on the target and retained by using a free_if (0) clause on an earlier offload.

    • Pointer arrays.

      condition may be either a Boolean expression or a Boolean array slice expression. The specified value(s) control whether to allocate the memory for the variables in the in/out/inout/nocopy clause on the target.

      condition that is an array slice expression specifies a set of Boolean values that apply one-to-one with the pointers in the pointer array. The first Boolean value applies to the first pointer, the second Boolean value to the second pointer, and so on.

    The following are the default settings for this modifier:

    Modifier    Default Setting
    in          true
    inout       true
    out         true
    nocopy      false

  • alloc_extent ( start : length )

    where start and length are either integral expressions or array slice expressions that yield sets of integral values. Both forms of expressions are computed at runtime. Use this modifier with:

    • Pointer arrays.

      start and length may be integral expressions or array slice expressions. If an integral expression is specified then it is assumed to apply to each of the pointers. If an array slice, then there is a one-to-one correspondence between elements of the pointer array and elements of the array slice.

      start specifies the first element to be allocated on the target. length specifies the number of elements to allocate. If start is negative a runtime error occurs.

  • extent ( start : length )

    where start and length are either integral expressions or array slice expressions that yield sets of integral values. Both forms of expressions are computed at runtime. Use this modifier with:

    • Pointer arrays.

      The values of the pointers in the array are never copied across the host/target interface because there is no correspondence between the memory addresses of the host and the target. Instead, objects that the pointers point to are copied to or from the target, and the pointer array is recreated.

      start and length may be integral expressions or array slice expressions. If an integral expression is specified, it is assumed to apply to each of the pointers. If an array slice, then there is a one-to-one correspondence between elements of the pointer array and elements of the array slice.

      start specifies the first element to be transferred. length specifies the number of elements to transfer. If start is negative a runtime error occurs.
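The per-pointer meaning of extent ( start : length ) can be modeled on the host. The sketch below is not the runtime's implementation; copy_extents and broadcast are hypothetical names, and std::vector stands in for the pointer array and the data it points to.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Host-side model of extent(start : length) applied to a pointer array:
// for row i, only elements [start[i], start[i] + length[i]) are copied.
void copy_extents(const std::vector<std::vector<int> >& src,
                  std::vector<std::vector<int> >& dst,
                  const std::vector<int>& start,
                  const std::vector<int>& length) {
  for (std::size_t i = 0; i < src.size(); ++i) {
    if (start[i] < 0) {
      // The reference text states that a negative start is a runtime error.
      throw std::runtime_error("negative extent start");
    }
    for (int k = 0; k < length[i]; ++k) {
      dst[i][start[i] + k] = src[i][start[i] + k];
    }
  }
}

// A scalar start or length applies to every pointer. This expands it into
// the equivalent one-value-per-pointer form.
std::vector<int> broadcast(int value, std::size_t pointer_count) {
  return std::vector<int>(pointer_count, value);
}
```

Calling copy_extents with broadcast(1, rows) as start and broadcast(98, rows) as length corresponds to writing extent(1:98) with scalar arguments.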

  • free_if (condition)

    where condition is a scalar Boolean expression or a Boolean array slice expression. Use it with:

    • Pointer variables.

      condition must be a scalar Boolean expression. It controls whether the memory allocated on the target for the variables in the in/out/inout/nocopy clause is deallocated. If the expression evaluates to true, the memory is deallocated. If the expression evaluates to false, no action is taken on the memory pointed to by the variables in the list, and a subsequent clause can reuse the allocated memory (data persistence).

    • Pointer arrays.

      condition may be either a Boolean expression or a Boolean array slice expression. The specified value(s) control whether to deallocate the memory for the variables in the in/out/inout/nocopy clause on the target.

      condition that is an array slice expression specifies a set of Boolean values that apply one-to-one with the pointers in the pointer array. The first Boolean value applies to the first pointer, the second Boolean value to the second pointer, and so on.

    The following are the default settings for this modifier:

    Modifier    Default Setting
    in          true
    inout       true
    out         true
    nocopy      false

    For more information, see Managing Memory Allocation for Pointer Variables.
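The following sketch shows one way alloc_if and free_if might be combined to keep a target buffer alive across several offloads: allocate and retain, reuse, then reuse and free. The function and struct names are hypothetical, and the __INTEL_OFFLOAD guard is an assumption that lets the code run on the host when offload support is absent. All three pragmas name the same target number so that the retained memory is on the device that later reuses it.

```cpp
struct Sums {
  float sum;
  float sum_sq;
};

// Hypothetical example: transfers data once, then uses it in two offloads.
Sums sums_with_persistent_buffer(float* data, int n) {
  float sum = 0.0f;
  float sum_sq = 0.0f;

#ifdef __INTEL_OFFLOAD  // assumption: defined only when offload is enabled
  // 1. Allocate on the target, copy the data, and keep the allocation.
  #pragma offload_transfer target(mic:0) \
      in(data : length(n) alloc_if(1) free_if(0))
#endif

#ifdef __INTEL_OFFLOAD
  // 2. Reuse the existing target buffer: no allocation, no copy, no free.
  #pragma offload target(mic:0) \
      nocopy(data : length(n) alloc_if(0) free_if(0)) inout(sum)
#endif
  {
    for (int i = 0; i < n; ++i) sum += data[i];
  }

#ifdef __INTEL_OFFLOAD
  // 3. Reuse the buffer once more, then release the target memory.
  #pragma offload target(mic:0) \
      nocopy(data : length(n) alloc_if(0) free_if(1)) inout(sum_sq)
#endif
  {
    for (int i = 0; i < n; ++i) sum_sq += data[i] * data[i];
  }

  Sums result = { sum, sum_sq };
  return result;
}
```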

  • into (var-exp)

    where var-exp is a variable expression. This modifier allows data to be transferred from one variable on the host to another variable on the target, and vice versa. Only one item is allowed in variable-ref when using this modifier.

    Use the following syntax for the argument in this clause:

    • var (when a length modifier is also used).

    • var [ start : length ] (when a separate length modifier is not used).

    For more information, see Moving Data from One Variable to Another.

  • into_extent ( start : length )

    where start and length are either integral expressions or array slice expressions that yield sets of integral values. Both forms of expressions are computed at runtime. Use this modifier with:

    • Pointer arrays used with the into modifier.

      start and length may be integral expressions or array slice expressions. If an integral expression is specified, it is assumed to apply to each of the pointers. If an array slice, then there is a one-to-one correspondence between elements of the pointer array and elements of the array slice.

      start specifies the first element to be allocated on the target. length specifies the number of elements to allocate. If start is negative a runtime error occurs.

  • length (element-count-expr)

    where element-count-expr is a scalar integral expression or an array slice expression, computed at runtime. Use it with:

    • Pointer variables.

      Pointer variable values themselves are never copied across the host/target interface because there is no correspondence between the memory addresses of the host and the target. Instead, objects that a pointer points to are copied to or from the target, and the value of the pointer variable is recreated.

      You can use a scalar integral expression element-count-expr to specify how many elements of the pointer type should be considered as the data the pointer points to. If the expression value is zero or negative, a runtime error occurs. For scalar pointer variables, an array slice expression may not be specified.

    • Variable-length arrays.

      element-count-expr specifies the number of elements copied between the host and target.

    • Pointer arrays.

      You can specify a scalar integral expression as element-count-expr to specify how many elements of each of the pointers in the array should be transferred, beginning with the 0th element. If the expression value is zero or negative, a runtime error occurs.

      element-count-expr that is an array slice expression specifies a set of length values that apply one-to-one with the pointers in the pointer array. The first length value applies to the first pointer, the second length to the second pointer and so on. The starting element to transfer is 0 in each case.

  • [preallocated] targetptr

    Allocates memory on Intel® MIC Architecture only; no CPU memory is allocated. If you want to allocate the memory on Intel® MIC Architecture yourself, specify preallocated, which indicates that the memory is already allocated by the user and must be made available for data transfer. For more information, see Device-Only Memory Allocation.

target ( target-name [ :target-number ] )

target-name represents the target and can be one of the following values:

gfx    Intel® Graphics Technology
mic    Intel® Xeon Phi™ products

target-number is an integer expression whose value is interpreted as follows:

>=0

Executes the statement on a specified target according to the following formula:

target = target-number % number_of_targets

For example, in a system with four targets:

  • Specifying 2 or 6 tells the runtime system to execute the code on target 2, because the result of both 2 % 4 and 6 % 4 is 2.

  • Specifying 1000 tells the runtime system to execute the code on target 0, because the result of 1000 % 4 is 0.

-1 or no value

Executes the statements on a target selected by the runtime system.

<-1

Reserved.
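The target-number rules above can be written as a small function. This is a model of the documented mapping, not runtime code; resolve_target is a hypothetical name, and returning -1 to mean "the runtime system chooses" is a convention of this sketch.

```cpp
#include <stdexcept>

// Returns the index of the target that a target-number selects, or -1 when
// the choice is left to the runtime system. Values below -1 are reserved,
// so this sketch rejects them.
int resolve_target(int target_number, int number_of_targets) {
  if (number_of_targets <= 0) {
    throw std::invalid_argument("number_of_targets must be positive");
  }
  if (target_number < -1) {
    throw std::invalid_argument("target-number values below -1 are reserved");
  }
  if (target_number == -1) {
    return -1;  // runtime system selects the target
  }
  return target_number % number_of_targets;
}
```

With four targets, resolve_target(2, 4) and resolve_target(6, 4) both give 2, and resolve_target(1000, 4) gives 0, matching the examples above.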

target-number is required for the signal and wait clauses.

If the target is not available, the program fails with an error message unless you also specify the optional clause. The optional clause allows the statements to execute on the host if the target is not available.

Optional Clauses

if-clause

A Boolean expression.

If the expression evaluates to true, the statements are executed on the target.

If the expression evaluates to false, the statements are executed on the host. The behavior is undefined if the if-clause is used with either the signal or wait clauses.

Note

Use this clause to control whether offload is enabled. A set of related pragmas should use this clause in a coordinated fashion, so that either all or none of the related offload statements are enabled.
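As an illustration of the if-clause, the sketch below offloads only when the input is large enough to be worth transferring. The function name and the threshold parameter are hypothetical, and the __INTEL_OFFLOAD guard is an assumption that keeps the code buildable without offload support.

```cpp
// Hypothetical example: sums the squares of n values, offloading only when
// n reaches the caller-supplied threshold. The if expression is evaluated
// on the host before any attempt to use the target.
long long sum_of_squares(int* values, int n, int threshold) {
  long long total = 0;
#ifdef __INTEL_OFFLOAD  // assumption: defined only when offload is enabled
  #pragma offload target(mic) if (n >= threshold) \
      in(values : length(n)) inout(total)
#endif
  {
    for (int i = 0; i < n; ++i) {
      total += static_cast<long long>(values[i]) * values[i];
    }
  }
  return total;
}
```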

mandatory

Specifies that execution on the target is required. Execution on the host is not allowed.

To continue the program if the correct target hardware is not available, initialize a variable statusvarname and use the status ( statusvarname ) clause in this pragma. The Description section below explains how to initialize a status variable and the possible values for the status variable.

This clause is implied if the optional clause is not specified. You can explicitly specify this clause to reinforce the implied default.

optional

Specifies that execution on the target is requested but not required. Execution on the host is allowed if the target is not available.

To determine why the statements were executed on the host instead of the target, initialize a variable statusvarname and use the status ( statusvarname ) clause in this pragma. The Description section below explains how to initialize a status variable and the possible values for the status variable.

Note

Do not use this clause and the mandatory clause in the same pragma as these clauses are opposites of each other.

signal ( tag )

A handle on an asynchronous data transfer or computational activity. When this clause is used, the offloaded computation and the return of any results through out clauses occur concurrently with host execution of the code after the pragma. If this clause is not used, the entire offload and the associated data transfer are executed synchronously: the host does not continue past the pragma until the offload has completed.

tag is an expression that is a pointer-size value in the baseline language which serves as a handle on an asynchronous activity, either data transfer or computation.

Note

You must specify a target clause with a target-number that is greater than or equal to zero with this clause.

status ( statusvarname )

Determines the status of the execution of an offloading construct. The statusvarname variable contains the value that explains the status of the execution. The Description section below explains how to initialize a status variable and the possible values for this variable.

When used with the optional clause and the target is unavailable, the statements to be executed on the target are instead executed on the host.

When used with the mandatory clause and the target is unavailable, the statements are ignored and the program continues. To determine why the statements were ignored or executed on the host, examine the value of this variable.

stream ( handle )

Offloads to the stream specified by handle. The handle is obtained from the function _Offload_create_stream, which specifies on which Intel® MIC Architecture device to create the stream. The offload goes to the device on which the stream was created. For more information, see Offload Using Streams.

wait ( tag [, tag, ...] )

Specifies a wait until a previously initiated asynchronous data transfer or asynchronous computation is completed.

tag is an expression that is a pointer-size value in the baseline language. This expression serves as a handle on a previously initiated asynchronous activity which used the same expression value in a signal clause. The activity could be an asynchronous computation or asynchronous data transfer.

Note

You must specify a target clause with a target-number that is greater than or equal to zero with this clause.

Querying a signal before the signal has been initiated results in undefined behavior and a runtime abort of the application. For example, querying a signal on target:0 that was initiated for target:1 results in a runtime abort of the application because the signal was initiated for target:1, so there is no signal associated with target:0.
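A sketch of pairing signal with wait follows. It uses the address held in dst as the tag, and names an explicit target number in both pragmas as the notes above require. The function name is hypothetical. The standalone offload_wait pragma is not described in this section and is assumed here, as is the __INTEL_OFFLOAD guard, which makes the code fall back to ordinary synchronous host execution when offload support is absent.

```cpp
// Hypothetical example: starts an asynchronous offload, then waits for it.
void scale_async(float* src, float* dst, int n, float k) {
#ifdef __INTEL_OFFLOAD  // assumption: defined only when offload is enabled
  // Start the offload. The host continues past the block immediately.
  #pragma offload target(mic:0) \
      in(src : length(n)) out(dst : length(n)) signal(dst)
#endif
  {
    for (int i = 0; i < n; ++i) {
      dst[i] = src[i] * k;
    }
  }

  // Unrelated host work could run here. It must not read dst yet, because
  // the results may not have been copied back.

#ifdef __INTEL_OFFLOAD
  // Block until the activity tagged with dst on target 0 has completed.
  #pragma offload_wait target(mic:0) wait(dst)
#endif
}
```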

Description

This pragma both transfers data and offloads computation to the target.

You can use this pragma before any statement, including a compound statement, a single call statement, or an OpenMP* parallel pragma, to specify remote execution of that statement or top-level OpenMP* construct.

Note

Do not use the __MIC__ macro inside this pragma. You can, however, use the __MIC__ macro in a subprogram called from the pragma.

Note

For Intel® Graphics Technology:

  • Shared Virtual Memory (SVM) pointers cannot appear in an in clause to a GFX offload region. An error is reported at compile time if a pointer is specified in an in clause when compiled for SVM mode. You do not need to specify pointers in any memory sharing or pinning clauses when compiling for SVM mode. If you specify a pointer in an out, inout, or pin clause to a GFX offload region, a warning is reported at compile time.

  • Physical memory is shared between the CPU and the processor graphics. The offload-parameter values out and inout map to the nocopy implementation, except for direct access to global data objects. On entry to the device region the physical memory for the list items is saved in memory until exit from the device region. This reduces the offload overhead by avoiding the copy and maintains the behavior. Any CPU access to the same memory must be synchronized to avoid race conditions for both the copy and nocopy cases.

Conceptually, this is the sequence of events when this pragma is encountered:

  1. If there is no if clause, go to step 3.

  2. On the host, evaluate the if-clause. If the clause evaluates to true, go to step 3. Otherwise, execute the statements on the host and be done.

  3. Attempt to acquire the target. If successful, go to step 4. Otherwise, execute the statements on the host and be done.

  4. On the host, compute all alloc_if, free_if, and element-count-expr expressions used in the in and out clauses.

  5. On the host, gather all variable values that are inputs to the offload.

  6. Send the input values from the host to the target.

  7. On the target, allocate memory for variable-length out variables.

  8. On the target, copy input values into corresponding target variables.

  9. On the target, execute the statements.

  10. On the target, compute all element-count-expr expressions used in out clauses.

  11. On the target, gather all variable values that are outputs of the offload.

  12. Send output values back from the target to the host.

  13. On the host, copy values received into corresponding host variables.
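The data movement in steps 5 through 13 can be modeled on the host with separate host and target copies of each variable. This is a deliberately simplified sketch, not the runtime's implementation: all type and function names are hypothetical, allocation is reduced to resizing a vector, and the if-clause and target acquisition of steps 1 through 3 are left out.

```cpp
#include <map>
#include <string>
#include <vector>

enum Clause { kIn, kOut, kInOut, kNoCopy };

struct Variable {
  Clause clause;
  std::vector<int> host;    // the host's copy of the data
  std::vector<int> target;  // the target's copy of the data
};

typedef std::map<std::string, Variable> Variables;

// Runs `statements` against the target copies, moving data as each clause
// dictates. The statements are expected to touch only the target members.
void simulate_offload(Variables& vars, void (*statements)(Variables&)) {
  // Steps 5 to 8: inputs travel from host to target. Out variables only
  // get space on the target. Nocopy variables are left untouched.
  for (auto& entry : vars) {
    Variable& v = entry.second;
    if (v.clause == kIn || v.clause == kInOut) {
      v.target = v.host;
    } else if (v.clause == kOut) {
      v.target.assign(v.host.size(), 0);
    }
  }

  // Step 9: execute the statements on the target.
  statements(vars);

  // Steps 10 to 13: outputs travel from target back to host.
  for (auto& entry : vars) {
    Variable& v = entry.second;
    if (v.clause == kOut || v.clause == kInOut) {
      v.host = v.target;
    }
  }
}
```

In this model, a change made by the statements to an in variable never reaches the host, while a nocopy variable keeps whatever its target copy held before the offload.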

The statements following this pragma are executed on the target if the target is available. If the target is not available, the optional, mandatory, and status ( statusvarname ) clauses determine how the statements are executed.

If the target is unavailable, the following occurs depending on the clauses you specify:

  • optional: The statements are executed on the host.

  • optional and status ( statusvarname ): The statements are executed on the host and the statusvarname contains the reason why the target was unavailable.

  • mandatory: The statements are ignored and the program ends.

  • mandatory and status ( statusvarname ): The statements are ignored and the program continues. The statusvarname contains the reason why the target was unavailable.
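The combinations above reduce to a small decision function. The sketch below only restates the documented outcomes in code; the enumerator and function names are hypothetical, and passing false for optional_clause stands for mandatory, whether explicit or implied.

```cpp
enum Outcome {
  kRunsOnTarget,
  kRunsOnHost,
  kSkippedProgramContinues,
  kSkippedProgramEnds
};

// Models what happens to the offloaded statements for a given combination
// of target availability and clauses.
Outcome offload_outcome(bool target_available,
                        bool optional_clause,
                        bool status_clause) {
  if (target_available) {
    return kRunsOnTarget;
  }
  if (optional_clause) {
    return kRunsOnHost;  // same outcome with or without a status clause
  }
  // mandatory: only a status clause lets the program continue
  return status_clause ? kSkippedProgramContinues : kSkippedProgramEnds;
}
```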

To initialize a status variable statusvarname, use the OFFLOAD_STATUS_INIT( statusvarname ) macro. The values of the status variables are defined in offload.h and can be the following values:

  • OFFLOAD_SUCCESS = 0: The statements were successfully executed on the target.

  • OFFLOAD_DISABLED: The statements were not executed on the target. If you specified if-clause and the value of this clause is false, the statements were successfully executed on the host.

  • OFFLOAD_UNAVAILABLE: The statements were not executed on the target because the target was unavailable.

  • OFFLOAD_OUT_OF_MEMORY: The statements were not executed on the target because there was not enough memory available for offload-parameter.

  • OFFLOAD_PROCESS_DIED: The statements were not executed on the target because a runtime error occurred on the target that caused the target process to terminate.

  • OFFLOAD_ERROR: The statements were not executed on the target because of an error.
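A sketch of using a status variable with the optional clause follows. The macro OFFLOAD_STATUS_INIT and the OFFLOAD_SUCCESS value come from the text above. The type name _Offload_status and its result field are assumptions about what offload.h declares and should be checked against that header. The function name is hypothetical, and the __INTEL_OFFLOAD guard is likewise an assumption that lets the code run on the host without offload support.

```cpp
#include <cstdio>
#ifdef __INTEL_OFFLOAD  // assumption: defined only when offload is enabled
#include <offload.h>    // OFFLOAD_STATUS_INIT and the OFFLOAD_* values
#endif

// Hypothetical example: adds 1 to each element and reports whether the
// work ran on the target (true) or on the host (false).
bool increment_all(int* data, int n) {
  bool ran_on_target = false;

#ifdef __INTEL_OFFLOAD
  _Offload_status status;       // assumed type name from offload.h
  OFFLOAD_STATUS_INIT(status);  // must be initialized before use
  #pragma offload target(mic) optional status(status) \
      inout(data : length(n))
#endif
  {
    for (int i = 0; i < n; ++i) {
      data[i] += 1;
    }
  }

#ifdef __INTEL_OFFLOAD
  // Assumed field name: result holds one of the OFFLOAD_* values.
  ran_on_target = (status.result == OFFLOAD_SUCCESS);
  if (!ran_on_target) {
    std::printf("offload fell back to host, status %d\n",
                static_cast<int>(status.result));
  }
#endif

  return ran_on_target;
}
```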

Examples

Using a variable-length array to specify a number of elements copied between the host and target

void sample(const int nx) {
  float temp[nx];
  #pragma offload target(mic) in(temp : length(nx))
  { ... }
}

Using variable-ref in the in/out clauses

typedef int ARRAY[10][10]; 
int a[1000][500];
int *p;
ARRAY *q;
int *r[10][10];
int i, j;
struct { int y; } x;
#pragma offload …  in( a )
#pragma offload … out( a[i:j][:] )
#pragma offload …  in( p[0:100] )
#pragma offload …  in( (*q)[5][:] )
#pragma offload …  in( r[5][5][0:2] )
#pragma offload … out( x.y )

Example of pointer array with special alignment on target

#include <stdio.h>   // printf, fflush
#include <stdlib.h>  // malloc

#define ARRAY_SIZE 4
#define DATA_ELEMS 100

__declspec(target(mic)) int start[ARRAY_SIZE];
__declspec(target(mic)) int len[ARRAY_SIZE];
__declspec(target(mic)) int align[ARRAY_SIZE];
__declspec(target(mic)) float *p[ARRAY_SIZE];
float *q[ARRAY_SIZE];

int main() {
  int i, j;
  bool failed = false;
  bool align_failed = false;

  for (i=0; i<ARRAY_SIZE; i++) {
    // Alloc ptr array elements; assume memory is available
    p[i] = (float *)malloc(sizeof(float)*DATA_ELEMS);
    q[i] = (float *)malloc(sizeof(float)*DATA_ELEMS);
    p[i][0:DATA_ELEMS] = i;
    q[i][0:DATA_ELEMS] = p[i][0:DATA_ELEMS];
  }
 
  start[0] = 0;
  start[1] = 1;
  start[2] = 1;
  start[3] = 0;
  len[0] = DATA_ELEMS;
  len[1] = DATA_ELEMS - 2;
  len[2] = DATA_ELEMS - 2;
  len[3] = DATA_ELEMS;
  align[0] = 2048;
  align[1] = 4096;
  align[2] = 8192;
  align[3] = 8;

  // Start is a section and length is also a section
  // Default values of alloc_if, free_if
  // Special alignment
  // Array allocations will start at element 0, but transfers will not
  // Update some elements and get them from MIC
  // UUUUUUUUUUUUUU
  // .UUUUUUUUUUUU.
  // .UUUUUUUUUUUU.
  // UUUUUUUUUUUUUU

  #pragma offload target(mic) \
  inout( p[0:ARRAY_SIZE] : \
  extent(start[0:ARRAY_SIZE]:len[0:ARRAY_SIZE]) \
  align(align[0:ARRAY_SIZE]) ) { 
    for (i=0; i<ARRAY_SIZE; i++) {
      if (((long long)&p[i][0] & (align[i]-1)) != 0) {
        align_failed = true;
        printf("p[%d] failed alignment\n", i);
        fflush(0);
      }
      p[i][start[i]:len[i]] += 1.0;
    }
  }
  ...
  return 0;
}

Example of pointer array and alloc_if and free_if using array sections

#include <stdlib.h>  // malloc

#define ARRAY_SIZE 4
#define DATA_ELEMS 100

__declspec(target(mic)) int start[ARRAY_SIZE];
__declspec(target(mic)) int len[ARRAY_SIZE];
__declspec(target(mic)) int allocif[ARRAY_SIZE];
__declspec(target(mic)) int freeif[ARRAY_SIZE];
__declspec(target(mic)) int *p[ARRAY_SIZE];

int main() {
  int i, j;
  bool failed = false;

  for (i=0; i<ARRAY_SIZE; i++) {
    // Alloc ptr array elements; assume memory is available
    p[i] = (int *)malloc(sizeof(int)*DATA_ELEMS);
    p[i][0:DATA_ELEMS] = i;
  }
 
  start[:] = 1;
  len[:] = 98;

  // Start is a section and length is also a section
  // Default values of free_if and align
  // alloc_if uses a vector and free_if a scalar that is expanded
  // Allocate only
  allocif[:] = 1;
  #pragma offload_transfer target(mic) \
  nocopy( p[0:ARRAY_SIZE] : \
  extent(start[0:ARRAY_SIZE]:len[0:ARRAY_SIZE]) \
  alloc_if(allocif[0:ARRAY_SIZE]) \
  free_if(0) )

  // Do the offload reusing memory
  #pragma offload target(mic) \
  inout( p[0:ARRAY_SIZE] : \
  extent(start[0:ARRAY_SIZE]:len[0:ARRAY_SIZE]) \
  alloc_if(0) \
  free_if(0) ) { 
    for (i=0; i<ARRAY_SIZE; i++) { p[i][start[i]:len[i]] += 1; }
  }

  // Free the memory
  // alloc_if uses a scalar, free_if a vector
  freeif[:] = 1;
  #pragma offload_transfer target(mic) \
  nocopy( p[0:ARRAY_SIZE] : \
  extent(start[0:ARRAY_SIZE]:len[0:ARRAY_SIZE]) \
  alloc_if(0) \
  free_if(freeif[0:ARRAY_SIZE]) )

  ...
  return 0;
}

Example of pointer array with into, into_extent and alloc_extent

#include <stdlib.h>  // malloc

#define ARRAY_SIZE 4
#define DATA_ELEMS 100
#define SEND_ROWS 2
#define SEND_COLS 50
#define P_OFFSET_IN 2
#define P_OFFSET_OUT 0
#define Q_OFFSET 0
#define COL_OFFSET 50

__declspec(target(mic)) int len[ARRAY_SIZE];
__declspec(target(mic)) short int *p[ARRAY_SIZE];
__declspec(target(mic)) short int *q[ARRAY_SIZE*2];

#ifdef RUN_ON_CPU
bool run_on_cpu = true;
#else
__declspec(target(mic)) bool run_on_cpu = false;
#endif

int main() {
  int i, j;
  bool failed = false;

  for (i=0; i<ARRAY_SIZE; i++) {
    // Alloc ptr array elements; assume memory is available
    p[i] = (short int *)malloc(sizeof(short int)*DATA_ELEMS);
    q[i] = (short int *)malloc(sizeof(short int)*DATA_ELEMS);
    q[i+ARRAY_SIZE] = (short int *)malloc(sizeof(short int)*DATA_ELEMS);
    p[i][0:DATA_ELEMS] = i;
  }
 
  len[:] = SEND_COLS;

  // Scalars used for extent start and extent length
  // Default values of alloc_if, free_if and align
  // Data sent to MIC and fetched from MIC
  // p[2][50:50] -> q[0][50:50] allocate only those 50 elements
  // p[3][50:50] -> q[1][50:50] allocate only those 50 elements
  //  compute
  // p[0][50:50] <- q[0][50:50]
  // p[1][50:50] <- q[1][50:50]

  #pragma offload target(mic) \
  in (p[P_OFFSET_IN:SEND_ROWS] : extent(COL_OFFSET:SEND_COLS) \
  into(q[Q_OFFSET:SEND_ROWS]) into_extent(COL_OFFSET:SEND_COLS) \
  alloc_extent(COL_OFFSET:len[0:SEND_ROWS]) ) \
  out(q[Q_OFFSET:SEND_ROWS] : extent(COL_OFFSET:SEND_COLS) \
  into(p[P_OFFSET_OUT:SEND_ROWS]) into_extent(COL_OFFSET:SEND_COLS) ) {
    for (i=0; i<SEND_ROWS; i++) {
      // If running on CPU, mimic the "in into"
      if (run_on_cpu) {
        q[Q_OFFSET+i][COL_OFFSET:SEND_COLS] = p[P_OFFSET_IN+i][COL_OFFSET:SEND_COLS];
      }

      q[Q_OFFSET+i][COL_OFFSET:SEND_COLS] += i*2;

      // If running on CPU, mimic the "out into"
      if (run_on_cpu) {
        p[P_OFFSET_OUT+i][COL_OFFSET:SEND_COLS] = q[Q_OFFSET+i][COL_OFFSET:SEND_COLS];
      }
    }
  }
  ...
  return 0;
}

See Also