Intel® C++ Compiler 16.0 User and Reference Guide

offload_transfer

Initiates and completes a synchronous data transfer. If used with the signal clause, initiates an asynchronous data transfer. This pragma only applies to Intel® MIC Architecture.

Syntax

#pragma offload_transfer clause[ clause...]


Arguments

Required Clauses

offload-parameter

Controls how the program variables and the amount of data are copied between the host and the target. This clause can be one of the following:

in

The variable is strictly an input to the target region. The value is not copied back after the region completes.

Syntax: in ( variable-ref [, variable-ref …] [ modifier[ , modifier… ] ] )

out

The variable is strictly an output of the target region. The host does not copy the variable to the target.

Syntax: out ( variable-ref [, variable-ref …] [ modifier[ , modifier… ] ] )

nocopy

A variable whose value is reused from a previous target execution or one that is used entirely within the offloaded code section may be named in a nocopy clause to avoid any copying.

Syntax: nocopy ( variable-ref [, variable-ref …] [ modifier[ , modifier… ] ] )
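The nocopy clause is usually combined with the alloc_if and free_if modifiers (described under modifier below) to keep data resident on the target between pragmas. The following is a minimal sketch, assuming a single target (mic:0) is available. scale_resident is a hypothetical function and is not part of any library. Compilers without offload support generally ignore these pragmas, possibly with a warning. In that case the same code simply runs on the host.

```cpp
#include <cassert>

// Sketch only: scale_resident is a hypothetical example function.
// buf stays resident on target 0 across the three pragmas below.
void scale_resident(float *buf, int n, float factor)
{
    assert(n > 0);  // length(n) must be positive

    // 1. Allocate target memory for buf, copy buf in, and keep it (free_if(0)).
    #pragma offload_transfer target(mic:0) in(buf : length(n) alloc_if(1) free_if(0))

    // 2. Reuse the resident copy: no allocation, no copy, no deallocation.
    #pragma offload target(mic:0) nocopy(buf : length(n) alloc_if(0) free_if(0))
    {
        for (int i = 0; i < n; ++i)
            buf[i] *= factor;
    }

    // 3. Copy the result back to the host and free the target memory.
    #pragma offload_transfer target(mic:0) out(buf : length(n) alloc_if(0) free_if(1))
}
```

Steps 1 and 3 are separate offload_transfer pragmas rather than clauses on the offload pragma. This keeps each transfer independent, so a signal clause could later be added to make one of them asynchronous.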

An in or out expression (see description below within modifier) is evaluated at a point in the program before the statement or clause in which it is used.

An array variable whose size is known from the declaration is copied in its entirety. If a subset of an array is to be processed, use a pointer to the starting element of the subset and the element-count-expr to transfer the array subset.
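The following sketch illustrates the subset technique. The hypothetical function sum_subset transfers only count elements, starting at a[first], assuming target 0 is available. The scalar count is referenced inside the offload region and is copied in implicitly. total is listed in an out clause, so its value is returned to the host.

```cpp
#include <cassert>

// Sketch only: sum_subset is a hypothetical example function.
// Only a[first] .. a[first + count - 1] are copied to target 0, because the
// in clause names a pointer to the first element of the subset plus length().
float sum_subset(float *a, int first, int count)
{
    assert(first >= 0 && count > 0);  // length(count) must be positive
    float *sub = a + first;           // host pointer to the start of the subset
    float total;
    #pragma offload target(mic:0) in(sub : length(count)) out(total)
    {
        total = 0.0f;
        for (int i = 0; i < count; ++i)
            total += sub[i];
    }
    return total;
}
```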

The following are the variables to use with this argument:

variable-ref

Is one of the following:

  • A C/C++ identifier.

  • variable-ref . identifier

    Use the following syntax for the variables in this argument:

    • var : length ( l)

    • var [ 0 : length ]

  • array-slice

    An array expression that denotes one contiguous set of array elements.

modifier

Is one of the following:

  • align ( align-expression )

    where align-expression is a scalar integral expression or an array slice expression. Use it with:

    • Pointer variables.

      align-expression must be a scalar integral expression. It specifies the minimum alignment for pointer data allocated on the target. The value must be a power of two.

    • Pointer arrays.

      align-expression may be either a scalar integral expression or an array slice expression. The specified value(s) request the minimum alignment for pointer data allocated on the target. The value(s) must be a power of two.

      align-expression that is an array slice expression specifies a set of alignment values that apply one-to-one with the pointers in the pointer array. The first alignment value applies to the first pointer, the second alignment to the second pointer, and so on.

  • alloc (array-slice)

    where array-slice specifies a set of elements of the array that need allocation. Data specified by the in/out expression is transferred into the corresponding section of the array allocated on the target.

    Use the following syntax for the argument in this modifier:

    • var [ start : length ]

    For more information, see Allocating Memory for Parts of Arrays.

  • alloc_if ( condition )

    where condition is a scalar Boolean expression or a Boolean array slice expression. Use it with:

    • Pointer variables.

      condition must be a scalar Boolean expression. It controls whether memory is allocated on the target for the variables in the in/out/inout/nocopy clause. If the expression evaluates to true, a new memory allocation is performed for each variable listed in the clause. If it evaluates to false, the existing allocation on the target is reused (data persistence). In that case, you must ensure that a block of memory of sufficient size was previously allocated for the variables on the target by using a free_if(0) modifier on an earlier offload.

    • Pointer arrays.

      condition may be either a Boolean expression or a Boolean array slice expression. The specified value(s) control whether to allocate the memory for the variables in the in/out/inout/nocopy clause on the target.

      condition that is an array slice expression specifies a set of Boolean values that apply one-to-one with the pointers in the pointer array. The first Boolean value applies to the first pointer, the second Boolean value to the second pointer, and so on.

    The following are the default settings for this modifier:

    Clause    Default Setting
    in        true
    inout     true
    out       true
    nocopy    false

  • alloc_extent ( start : length )

    where start and length are either integral expressions or array slice expressions that yield sets of integral values. Both forms of expressions are computed at runtime. Use this modifier with:

    • Pointer arrays.

      start and length may be integral expressions or array slice expressions. If an integral expression is specified, it is assumed to apply to each of the pointers. If an array slice is specified, there is a one-to-one correspondence between elements of the pointer array and elements of the array slice.

      start specifies the first element to be allocated on the target. length specifies the number of elements to allocate. If start is negative, a runtime error occurs.

  • extent ( start : length )

    where start and length are either integral expressions or array slice expressions that yield sets of integral values. Both forms of expressions are computed at runtime. Use this modifier with:

    • Pointer arrays.

      The values of the pointers in the array are never copied across the host/target interface because there is no correspondence between the memory addresses of the host and the target. Instead, objects that the pointers point to are copied to or from the target, and the pointer array is recreated.

      start and length may be integral expressions or array slice expressions. If an integral expression is specified, it is assumed to apply to each of the pointers. If an array slice is specified, there is a one-to-one correspondence between elements of the pointer array and elements of the array slice.

      start specifies the first element to be transferred. length specifies the number of elements to transfer. If start is negative, a runtime error occurs.

  • free_if (condition)

    where condition is a scalar Boolean expression or a Boolean array slice expression. Use it with:

    • Pointer variables.

      condition must be a scalar Boolean expression. It controls whether to deallocate the memory allocated on the target for the variables in the in/out/inout/nocopy clause. If the expression evaluates to true, the memory is deallocated at the end of the offload or transfer. If it evaluates to false, no action is taken on the memory pointed to by the variables in the list, and a subsequent offload can reuse the allocated memory (data persistence).

    • Pointer arrays.

      condition may be either a Boolean expression or a Boolean array slice expression. The specified value(s) control whether to deallocate the memory for the variables in the in/out/inout/nocopy clause on the target.

      condition that is an array slice expression specifies a set of Boolean values that apply one-to-one with the pointers in the pointer array. The first Boolean value applies to the first pointer, the second Boolean value to the second pointer, and so on.

    The following are the default settings for this modifier:

    Clause    Default Setting
    in        true
    inout     true
    out       true
    nocopy    false

    For more information, see Managing Memory Allocation for Pointer Variables.

  • into (var-exp)

    where var-exp is a variable expression.

    This modifier allows data to be transferred from one variable on the host to another variable on the target, and vice versa. Only one item is allowed in variable-ref when using this modifier.

    Use the following syntax for the argument in this modifier:

    • var (when a length modifier is also used).

    • var [ start : length ] (when a separate length modifier is not used).

    For more information, see Moving Data from One Variable to Another.

  • into_extent ( start : length )

    where start and length are either integral expressions or array slice expressions that yield sets of integral values. Both forms of expressions are computed at runtime. Use this modifier with:

    • Pointer arrays used with the into modifier.

      start and length may be integral expressions or array slice expressions. If an integral expression is specified, it is assumed to apply to each of the pointers. If an array slice is specified, there is a one-to-one correspondence between elements of the pointer array and elements of the array slice.

      start specifies the first element to be allocated on the target. length specifies the number of elements to allocate. If start is negative, a runtime error occurs.

  • length (element-count-expr)

    where element-count-expr is a scalar integral expression or an array slice expression, computed at runtime. Use it with:

    • Pointer variables.

      Pointer variable values themselves are never copied across the host/target interface because there is no correspondence between the memory addresses of the host and the target. Instead, objects that a pointer points to are copied to or from the target, and the value of the pointer variable is recreated.

      You can use a scalar integral expression element-count-expr to specify how many elements of the pointer type should be considered as the data the pointer points to. If the expression value is zero or negative, a runtime error occurs. For scalar pointer variables, an array slice expression may not be specified.

    • Variable-length arrays.

      element-count-expr specifies the number of elements copied between the host and target.

    • Pointer arrays.

      You can specify a scalar integral expression as element-count-expr to specify how many elements of each of the pointers in the array should be transferred, beginning with the 0th element. If the expression value is zero or negative, a runtime error occurs.

      element-count-expr that is an array slice expression specifies a set of length values that apply one-to-one with the pointers in the pointer array. The first length value applies to the first pointer, the second length to the second pointer, and so on. The starting element to transfer is 0 in each case.

  • [preallocated] targetptr

    Allocates memory only on the Intel® MIC Architecture target; no host (CPU) memory is allocated. If you want to perform the allocation on the target yourself, specify preallocated. It indicates that the memory on the target has already been allocated by the user and only needs to be made available for data transfer. For more information, see Device-Only Memory Allocation.
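Several of the modifiers above (align, alloc_if, free_if, alloc_extent, extent, into_extent, length) share one rule for pointer arrays. A scalar expression applies to every pointer, while an array slice supplies one value per pointer. The following host-only C++ model of that rule is a sketch under stated assumptions. Spec, Resolved, expand, and resolve are hypothetical names and are not part of the offload runtime. The model only mirrors the checks the text describes: a negative start, a zero or negative length, and an alignment that is not a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical model (not an Intel API): a modifier argument that is either a
// scalar (applies to every pointer) or an array slice (one value per pointer).
struct Spec {
    std::vector<long> values;  // size 1 = scalar, otherwise one per pointer
};

// Expand a Spec to exactly n per-pointer values.
std::vector<long> expand(const Spec &s, std::size_t n)
{
    assert(!s.values.empty());
    if (s.values.size() == 1)
        return std::vector<long>(n, s.values[0]);   // scalar: applies to each pointer
    if (s.values.size() != n)
        throw std::invalid_argument("slice length must match pointer array");
    return s.values;                                 // slice: one-to-one
}

struct Resolved {
    long start, length, align;
};

// Resolve extent(start:length) and align(...) for an n-element pointer array,
// applying the runtime checks described in the text.
std::vector<Resolved> resolve(std::size_t n, const Spec &start,
                              const Spec &length, const Spec &align)
{
    std::vector<long> s = expand(start, n), l = expand(length, n),
                      a = expand(align, n);
    std::vector<Resolved> out;
    for (std::size_t i = 0; i < n; ++i) {
        if (s[i] < 0)
            throw std::runtime_error("negative start");
        if (l[i] <= 0)
            throw std::runtime_error("zero or negative length");
        if (a[i] <= 0 || (a[i] & (a[i] - 1)) != 0)
            throw std::runtime_error("alignment is not a power of two");
        out.push_back({s[i], l[i], a[i]});
    }
    return out;
}
```

The first test below feeds in the start, len, and align values used by the pointer-array alignment example later in this topic.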

target ( target-name [ :target-number ])

target-name represents the target. Use mic for Intel® Xeon Phi™ products.

target-number is required for the signal and wait clauses. target-number is an integer expression whose value is interpreted as follows:

>=0

Executes the statements on a specified target according to the following formula:

target = target-number % number_of_targets

For example, in a system with four targets:

  • Specifying 2 or 6 tells the runtime system to execute the code on target 2, because the result of both 2 % 4 and 6 % 4 is 2.

  • Specifying 1000 tells the runtime system to execute the code on target 0, because the result of 1000 % 4 is 0.

-1 or no value

Executes the statements on a target selected by the runtime system.

< -1

Reserved.
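The target-number rules above can be checked with a small host-only function. physical_target is a hypothetical helper, not part of the offload runtime. It returns -1 to stand for "the runtime system selects the target" and rejects the reserved values below -1.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper (not an Intel API) that mirrors the documented rule
//   target = target-number % number_of_targets
// Returns -1 when the runtime system is free to choose the target.
int physical_target(int target_number, int number_of_targets)
{
    assert(number_of_targets > 0);
    if (target_number == -1)
        return -1;  // runtime selects the target
    if (target_number < -1)
        throw std::invalid_argument("target-number < -1 is reserved");
    return target_number % number_of_targets;
}
```

With four targets, physical_target(6, 4) and physical_target(2, 4) both yield 2, and physical_target(1000, 4) yields 0, matching the examples above.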


If the target is not available, the program fails with an error message unless you also specify the optional clause. The optional clause allows the statements to execute on the host if the target is not available.

Optional Clauses

if-clause

A Boolean expression.

If the expression evaluates to ...    ... then the following occurs.
true                                  The statements are executed on the target.
false                                 The statements are executed on the host. The behavior is undefined if the if-clause is used with either the signal or wait clauses.

Note

Use this clause to control whether offload is enabled. A set of related pragmas should use this clause in a coordinated fashion, so that either all or none of the related offload statements are enabled.
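Following the note above, this sketch computes a single flag and passes it to every related pragma. As a result, the transfers and the computation are either all offloaded or all kept on the host. add_one_offloaded and its threshold parameter are hypothetical. The idea that only large arrays are worth offloading is an assumption for illustration, not a documented rule.

```cpp
#include <cassert>

// Sketch only: add_one_offloaded and threshold are hypothetical.
// One flag drives all three related pragmas, so they are enabled together.
void add_one_offloaded(float *a, int n, int threshold)
{
    assert(n > 0);
    bool use_target = (n >= threshold);

    #pragma offload_transfer target(mic:0) if(use_target) \
        in(a : length(n) alloc_if(1) free_if(0))
    #pragma offload target(mic:0) if(use_target) \
        nocopy(a : length(n) alloc_if(0) free_if(0))
    {
        for (int i = 0; i < n; ++i)
            a[i] += 1.0f;
    }
    #pragma offload_transfer target(mic:0) if(use_target) \
        out(a : length(n) alloc_if(0) free_if(1))
}
```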

mandatory

Specifies that execution on the target is required. Execution on the host is not allowed.

To continue the program if the correct target hardware is not available, initialize a variable statusvarname and use the status ( statusvarname ) clause in this pragma. The Description section below explains how to initialize a status variable and the possible values for the status variable.

This clause is implied if the optional clause is not specified. You can explicitly specify this clause to reinforce the implied default.

optional

Specifies that execution on the target is requested but not required. Execution on the host is allowed if the target is not available.

To determine why the statements were executed on the host instead of the target, initialize a variable statusvarname and use the status ( statusvarname ) clause in this pragma. The Description section below explains how to initialize a status variable and the possible values for the status variable.

Note

Do not use this clause and the mandatory clause in the same pragma as these clauses are opposites of each other.

signal ( tag )

A handle on an asynchronous data transfer or computational activity. With this clause, the computation performed by an offload pragma, and any results returned from the offload through out clauses, proceed concurrently with host execution of the code after the pragma. If this clause is not used, the entire offload and its associated data transfers are executed synchronously: the host does not continue past the pragma until they have completed.

tag is an expression that is a pointer-size value in the baseline language which serves as a handle on an asynchronous activity, either data transfer or computation.

Note

You must specify a target clause with a target-number that is greater than or equal to zero with this clause.

status ( statusvarname )

Determines the status of the execution of an offload construct. The statusvarname variable contains a value that explains the status of the execution. The Description section below explains how to initialize a status variable and the possible values for this variable.

When used with the optional clause and the target is unavailable, the statements to be executed on the target are instead executed on the host.

When used with the mandatory clause and the target is unavailable, the statements are ignored and the program continues. To determine why the statements were ignored or executed on the host, examine the value of this variable.

stream ( handle )

Offloads to the stream specified by handle. The handle is obtained from the function _Offload_create_stream, which specifies on which Intel® MIC Architecture device to create the stream. The offload is to the device on which the stream had been created. For more information, see Offload Using Streams.

wait ( tag [, tag, ...] )

Specifies a wait until a previously initiated asynchronous data transfer or asynchronous computation is completed.

tag is an expression that is a pointer-size value in the baseline language. This expression serves as a handle on a previously initiated asynchronous activity which used the same expression value in a signal clause. The activity could be an asynchronous computation or asynchronous data transfer.

Note

You must specify a target clause with a target-number that is greater than or equal to zero with this clause.

Querying a signal before the signal has been initiated results in undefined behavior and a runtime abort of the application. For example, querying a signal on target:0 that was initiated for target:1 results in a runtime abort of the application because the signal was initiated for target:1, so there is no signal associated with target:0.
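The tag semantics of signal and wait can be modeled on the host with std::future. The following sketch is an analogy, not the offload runtime. TagTable and its signal and wait members are hypothetical names. A map from the tag's pointer value to a pending future stands in for the runtime's bookkeeping. Waiting on a tag that was never signaled aborts, as the note above describes. Unlike the real runtime, this model has no notion of separate targets.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <future>
#include <map>
#include <utility>

// Hypothetical host-side model of signal/wait tags (not the offload runtime).
class TagTable {
public:
    // Like signal(tag): start the activity asynchronously and return at once.
    void signal(const void *tag, std::function<void()> work)
    {
        pending_[tag] = std::async(std::launch::async, std::move(work));
    }

    // Like wait(tag): block until the activity tied to tag has completed.
    void wait(const void *tag)
    {
        auto it = pending_.find(tag);
        if (it == pending_.end())
            std::abort();   // waiting on a tag that was never signaled
        it->second.get();   // blocks until the activity is done
        pending_.erase(it);
    }

private:
    std::map<const void *, std::future<void>> pending_;
};
```

As with the pragmas, the tag is just a pointer-size value. Using the address of the buffer being produced is a convenient convention, not a requirement.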

Description

This pragma initiates asynchronous data transfer and also initiates and completes synchronous data transfer.

The statements following this pragma are executed on the target if the target is available. If the target is not available, the optional, mandatory, and status ( statusvarname ) clauses determine how the statements are executed.

If you specify these clauses and the target is unavailable ...    ... then the following occurs.
optional                                  The statements are executed on the host.
optional and status ( statusvarname )     The statements are executed on the host and the statusvarname contains the reason why the target was unavailable.
mandatory                                 The statements are ignored and the program ends.
mandatory and status ( statusvarname )    The statements are ignored and the program continues. The statusvarname contains the reason why the target was unavailable.
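The table above can be read as a small decision function. The following host-only model is illustrative. Outcome and resolve_unavailable are hypothetical names. mandatory is assumed whenever optional is absent, as the mandatory clause description states. The model does not track which status value is stored.

```cpp
#include <cassert>

// Hypothetical model (not an Intel API) of what happens depending on target
// availability and the optional/mandatory/status clauses on the pragma.
enum class Outcome { RunOnTarget, RunOnHost, IgnoredContinue, IgnoredProgramEnds };

Outcome resolve_unavailable(bool target_available, bool optional, bool has_status)
{
    if (target_available)
        return Outcome::RunOnTarget;
    if (optional)
        return Outcome::RunOnHost;  // with status, statusvarname records why
    // mandatory, whether explicit or implied
    return has_status ? Outcome::IgnoredContinue : Outcome::IgnoredProgramEnds;
}
```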

To initialize a status variable statusvarname, use the OFFLOAD_STATUS_INIT( statusvarname ) macro. The values of the status variables are defined in offload.h and can be the following values:

Value                    Description
OFFLOAD_SUCCESS = 0      The statements were successfully executed on the target.
OFFLOAD_DISABLED         The statements were not executed on the target. If you specified the if-clause and its value is false, the statements were successfully executed on the host.
OFFLOAD_UNAVAILABLE      The statements were not executed on the target because the target was unavailable.
OFFLOAD_OUT_OF_MEMORY    The statements were not executed on the target because there was not enough memory available for offload-parameter.
OFFLOAD_PROCESS_DIED     The statements were not executed on the target because a runtime error occurred on the target that caused the target process to terminate.
OFFLOAD_ERROR            The statements were not executed on the target because of an error.

Examples

Using two different pragmas to receive data asynchronously from the target

#include <malloc.h>   // memalign

const int N = 4086;
float *f1, *f2;
f1 = (float *)memalign(64, N*sizeof(float));
f2 = (float *)memalign(64, N*sizeof(float));
...

// Host sends f1 as input synchronously
// The output is in f2, but is not needed immediately, so f2 is only
// allocated (and kept) on the target instead of being copied back
#pragma offload target(mic:0) signal(f2) \
                       in( f1 : length(N) ) \
                       nocopy( f2 : length(N) alloc_if(1) free_if(0) )
{
     foo(N, f1, f2);
}
...
// Wait for the offload tagged f2, then copy f2 back and free it on the target
#pragma offload_transfer target(mic:0) wait(f2) \
                         out( f2 : length(N) alloc_if(0) free_if(1) )

// Host can now use the result in f2

The offload pragma starts the computation on the target and, because of signal(f2), the host continues without waiting for it; f2 is not copied back at this point. The offload_transfer pragma waits for the activity tagged f2 to complete and then transfers f2 back to the host.

Using double buffers for inputs to an offload

#pragma offload_attribute(push, target(mic))
int count = 25000000;
int iter = 10;
float *in1, *out1;
float *in2, *out2;
#pragma offload_attribute(pop)

// Assumes in1, in2, out1, and out2 were allocated on the target earlier
// (for example, with alloc_if(1) free_if(0)); every clause below uses alloc_if(0)
void do_async_in() {
      int i;
      // Start transferring the first input buffer before the loop
      #pragma offload_transfer target(mic:0) \
            in(in1 : length(count) alloc_if(0) free_if(0)) signal(in1)
      for (i=0; i<iter; i++) {
            if (i%2 == 0) {
                  // Prefetch in2 (skipped on the last iteration), then compute on in1
                  #pragma offload_transfer target(mic:0) if(i!=iter-1) \
                        in(in2 : length(count) alloc_if(0) free_if(0)) signal(in2)
                  #pragma offload target(mic:0) nocopy(in1) wait(in1) \
                        out(out1 : length(count) alloc_if(0) free_if(0))
                  compute(in1, out1);
            } else {
                  // Prefetch in1 (skipped on the last iteration), then compute on in2
                  #pragma offload_transfer target(mic:0) if(i!=iter-1) \
                        in(in1 : length(count) alloc_if(0) free_if(0)) signal(in1)
                  #pragma offload target(mic:0) nocopy(in2) wait(in2) \
                        out(out2 : length(count) alloc_if(0) free_if(0))
                  compute(in2, out2);
            }
      }
}
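The alternation in do_async_in can be hard to follow. The following host-only C++ sketch replays the loop's control flow without any offload. Step and double_buffer_schedule are hypothetical names. For each iteration, it records which buffer is prefetched (0 when if(i!=iter-1) suppresses the transfer) and which buffer the computation uses. Before the loop starts, in1 is already in flight.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-only trace of the double-buffer loop (no offload involved).
// transferred: buffer prefetched in this iteration (1 or 2), 0 if skipped.
// computed:    buffer the offload waits for and computes on (1 or 2).
struct Step { int transferred; int computed; };

std::vector<Step> double_buffer_schedule(int iter)
{
    assert(iter > 0);
    std::vector<Step> steps;
    for (int i = 0; i < iter; ++i) {
        bool last = (i == iter - 1);            // if(i!=iter-1) is false here
        if (i % 2 == 0)
            steps.push_back({last ? 0 : 2, 1}); // prefetch in2, compute on in1
        else
            steps.push_back({last ? 0 : 1, 2}); // prefetch in1, compute on in2
    }
    return steps;
}
```

Each computation waits on the buffer that was prefetched in the previous step, which is exactly the pairing the signal and wait tags enforce.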

Example of pointer array with special alignment on target

#include <stdio.h>
#include <stdlib.h>

#define ARRAY_SIZE 4
#define DATA_ELEMS 100

__declspec(target(mic)) int start[ARRAY_SIZE];
__declspec(target(mic)) int len[ARRAY_SIZE];
__declspec(target(mic)) int align[ARRAY_SIZE];
__declspec(target(mic)) float *p[ARRAY_SIZE];
float *q[ARRAY_SIZE];

int main() {
  int i, j;
  bool failed = false;
  bool align_failed = false;

  for (i=0; i<ARRAY_SIZE; i++) {
    // Alloc ptr array elements; assume memory is available
    p[i] = (float *)malloc(sizeof(float)*DATA_ELEMS);
    q[i] = (float *)malloc(sizeof(float)*DATA_ELEMS);
    p[i][0:DATA_ELEMS] = i;
    q[i][0:DATA_ELEMS] = p[i][0:DATA_ELEMS];
  }
 
  start[0] = 0;
  start[1] = 1;
  start[2] = 1;
  start[3] = 0;
  len[0] = DATA_ELEMS;
  len[1] = DATA_ELEMS - 2;
  len[2] = DATA_ELEMS - 2;
  len[3] = DATA_ELEMS;
  align[0] = 2048;
  align[1] = 4096;
  align[2] = 8192;
  align[3] = 8;

  // Start is a section and length is also a section
  // Default values of alloc_if, free_if
  // Special alignment
  // Array allocations will start at element 0, but transfers will not
  // Update some elements and get them from MIC
  // UUUUUUUUUUUUUU
  // .UUUUUUUUUUUU.
  // .UUUUUUUUUUUU.
  // UUUUUUUUUUUUUU

  #pragma offload target(mic) \
  inout( p[0:ARRAY_SIZE] : \
  extent(start[0:ARRAY_SIZE]:len[0:ARRAY_SIZE]) \
  align(align[0:ARRAY_SIZE]) )
  {
    for (i=0; i<ARRAY_SIZE; i++) {
      if (((long long)&p[i][0] & (align[i]-1)) != 0) {
        align_failed = true;
        printf("p[%d] failed alignment\n", i);
        fflush(0);
      }
      p[i][start[i]:len[i]] += 1.0;
    }
  }
  ...
  return 0;
}

Example of pointer array and alloc_if and free_if using array sections

#include <stdlib.h>

#define ARRAY_SIZE 4
#define DATA_ELEMS 100

__declspec(target(mic)) int start[ARRAY_SIZE];
__declspec(target(mic)) int len[ARRAY_SIZE];
__declspec(target(mic)) int allocif[ARRAY_SIZE];
__declspec(target(mic)) int freeif[ARRAY_SIZE];
__declspec(target(mic)) int *p[ARRAY_SIZE];

int main() {
  int i, j;
  bool failed = false;

  for (i=0; i<ARRAY_SIZE; i++) {
    // Alloc ptr array elements; assume memory is available
    p[i] = (int *)malloc(sizeof(int)*DATA_ELEMS);
    p[i][0:DATA_ELEMS] = i;
  }
 
  start[:] = 1;
  len[:] = 98;

  // Start is a section and length is also a section
  // Default value of align
  // alloc_if uses a vector and free_if a scalar that is expanded
  // Allocate only
  allocif[:] = 1;
  #pragma offload_transfer target(mic) \
  nocopy( p[0:ARRAY_SIZE] : \
  extent(start[0:ARRAY_SIZE]:len[0:ARRAY_SIZE]) \
  alloc_if(allocif[0:ARRAY_SIZE]) \
  free_if(0) )

  // Do the offload reusing memory
  #pragma offload target(mic) \
  inout( p[0:ARRAY_SIZE] : \
  extent(start[0:ARRAY_SIZE]:len[0:ARRAY_SIZE]) \
  alloc_if(0) \
  free_if(0) )
  {
    for (i=0; i<ARRAY_SIZE; i++) { p[i][start[i]:len[i]] += 1; }
  }

  // Free the memory
  // alloc_if uses a scalar, free_if a vector
  freeif[:] = 1;
  #pragma offload_transfer target(mic) \
  nocopy( p[0:ARRAY_SIZE] : \
  extent(start[0:ARRAY_SIZE]:len[0:ARRAY_SIZE]) \
  alloc_if(0) \
  free_if(freeif[0:ARRAY_SIZE]) )

 ...
 return 0;
}

Example of pointer array with into, into_extent and alloc_extent

#include <stdlib.h>

#define ARRAY_SIZE 4
#define DATA_ELEMS 100
#define SEND_ROWS 2
#define SEND_COLS 50
#define P_OFFSET_IN 2
#define P_OFFSET_OUT 0
#define Q_OFFSET 0
#define COL_OFFSET 50

__declspec(target(mic)) int len[ARRAY_SIZE];
__declspec(target(mic)) short int *p[ARRAY_SIZE];
__declspec(target(mic)) short int *q[ARRAY_SIZE*2];

#ifdef RUN_ON_CPU
bool run_on_cpu = true;
#else
__declspec(target(mic)) bool run_on_cpu = false;
#endif

int main() {
  int i, j;
  bool failed = false;

  for (i=0; i<ARRAY_SIZE; i++) {
    // Alloc ptr array elements; assume memory is available
    p[i] = (short int *)malloc(sizeof(short int)*DATA_ELEMS);
    q[i] = (short int *)malloc(sizeof(short int)*DATA_ELEMS);
    q[i+ARRAY_SIZE] = (short int *)malloc(sizeof(short int)*DATA_ELEMS);
    p[i][0:DATA_ELEMS] = i;
  }
 
  len[:] = SEND_COLS;

  // Scalars used for extent start and extent length
  // Default values of alloc_if, free_if and align
  // Data sent to MIC and fetched from MIC
  // p[2][50:50] -> q[0][50:50] allocate only those 50 elements
  // p[3][50:50] -> q[1][50:50] allocate only those 50 elements
  //  compute
  // p[0][50:50] <- q[0][50:50]
  // p[1][50:50] <- q[1][50:50]

  #pragma offload target(mic) \
  in (p[P_OFFSET_IN:SEND_ROWS] : extent(COL_OFFSET:SEND_COLS) \
  into(q[Q_OFFSET:SEND_ROWS]) into_extent(COL_OFFSET:SEND_COLS) \
  alloc_extent(COL_OFFSET:len[0:SEND_ROWS]) ) \
  out(q[Q_OFFSET:SEND_ROWS] : extent(COL_OFFSET:SEND_COLS) \
  into(p[P_OFFSET_OUT:SEND_ROWS]) into_extent(COL_OFFSET:SEND_COLS) )
  {
    for (i=0; i<SEND_ROWS; i++) {
      // If running on CPU, mimic the "in into"
      if (run_on_cpu) {
        q[Q_OFFSET+i][COL_OFFSET:SEND_COLS] = p[P_OFFSET_IN+i][COL_OFFSET:SEND_COLS];
      }

      q[Q_OFFSET+i][COL_OFFSET:SEND_COLS] += i*2;

      // If running on CPU, mimic the "out into"
      if (run_on_cpu) {
        p[P_OFFSET_OUT+i][COL_OFFSET:SEND_COLS] = q[Q_OFFSET+i][COL_OFFSET:SEND_COLS];
      }
    }
  }
  ...
  return 0;
}

See Also