Intel® Fortran Compiler 16.0 User and Reference Guide

OFFLOAD_TRANSFER

OFFLOAD Compiler Directive: Initiates asynchronous data transfer, or initiates and completes synchronous data transfer. The action performed depends on whether SIGNAL is specified. This directive only applies to Intel® MIC Architecture.

!DIR$ OFFLOAD_TRANSFER clause[[,] clause...]

clause

Can be any of the following:

  • TARGET (target-name [:target-number])

  • IF (if-specifier)

    An optional clause. Include it to test at execution time whether the executable should try to offload the statement.

    Use the IF clause to control whether the offload is enabled. All OFFLOAD directives that have data dependencies should use the IF clause in a coordinated fashion, so that either all or none of the related offloads are enabled.

  • SIGNAL (tag)

    An optional clause. Include it to enable asynchronous data transfer, meaning that data transfers are initiated, and the CPU can continue executing after initiating the OFFLOAD_TRANSFER. When you don't include it, data transfer is synchronous.

    SIGNAL is device specific; if you use this clause, you must specify a target-number >=0 in the TARGET clause.

  • WAIT (tag [, tag, ...])

    An optional clause. Include it to specify a wait for completion of a previously initiated asynchronous data transfer or asynchronous computation.

    WAIT is device specific; if you use this clause, you must specify a target-number >=0 in the TARGET clause.

  • MANDATORY

    An optional clause. Include it to specify that execution on the coprocessor is required. Execution on the CPU is not allowed. If the correct target hardware needed to run the offloaded code is not available on the system, the program fails with an error message.

    You cannot specify both MANDATORY and OPTIONAL since they are opposites.

    If OPTIONAL is not specified for this directive and it is not specified in option [q or Q]offload, then MANDATORY is implied. You can explicitly specify MANDATORY to reinforce this implied default.

  • OPTIONAL

    An optional clause. Include it to specify that execution on the coprocessor is requested but not required. Execution on the CPU is allowed. If the correct target hardware needed to run the offloaded code is not available on the system, the program is executed on the CPU, not the TARGET.

    You cannot specify both MANDATORY and OPTIONAL since they are opposites.

    If OPTIONAL is not specified for this directive and it is not specified in option [q or Q]offload, then MANDATORY is implied. You can explicitly specify MANDATORY to reinforce this implied default.

  • STATUS (var)

    An optional clause. Include it to determine the status of the execution of an offloading construct. When used with the OPTIONAL or MANDATORY clause, the STATUS clause also lets you modify the action to take upon failure.

    var is of derived type offload_status defined in mic_lib.mod. This status variable can be queried for details about the offload. The module mic_lib.mod is provided and contains the following definitions:

    use, intrinsic :: iso_c_binding
     
    enum , bind (C)
     enumerator :: OFFLOAD_SUCCESS         = 0
     enumerator :: OFFLOAD_DISABLED        = 1  ! offload is disabled
     enumerator :: OFFLOAD_UNAVAILABLE     = 2  ! card is not available
     enumerator :: OFFLOAD_OUT_OF_MEMORY   = 3  ! not enough memory on device
     enumerator :: OFFLOAD_PROCESS_DIED    = 4  ! target process has died
     enumerator :: OFFLOAD_ERROR           = 5  ! unspecified error
    end enum
     
    type, bind (C) :: offload_status
     integer(kind=c_int) ::  result        = OFFLOAD_DISABLED   ! result, see enum above
     integer(kind=c_int) ::  device_number = -1  ! device number
     integer(kind=c_int) ::  data_sent     =  0  ! number of bytes sent to the target
     integer(kind=c_int) ::  data_received =  0  ! number of bytes received by host
    end type offload_status

    When you specify STATUS with the MANDATORY clause, and the offload cannot be performed, the construct is not automatically executed on the host. The program continues execution. You should examine var for the cause of the offload failure and take appropriate action. Note that if no STATUS clause is specified, then the program is terminated.

    When you specify STATUS with the OPTIONAL clause, and the offload cannot be performed, the construct is executed on the host processor. You can examine the var in the STATUS clause to determine the cause of the offload failure.

  • offload-parameter [[,] offload-parameter]

    One or more data movement clauses (see below).

The following arguments are used in the above clause items:

target-name

Is an identifier that represents the target. The only allowable target name is MIC.

target-number

(Required for SIGNAL and WAIT) Is an integer expression whose value is interpreted as shown in the following table.

When target-number is specified, the implicit MANDATORY offload is overridden and execution on the CPU is allowed when either the OPTIONAL clause is also specified or optional is also specified in option [q or Q]offload.

-1

This value specifies execution on the coprocessor. The runtime system chooses the specific coprocessor. Execution on the CPU is not allowed.

If the correct target hardware needed to run the offloaded code is not available on the system, the program fails with an error message. Execution on the CPU is allowed when the OPTIONAL clause is also specified or optional is also specified in option [q or Q]offload.

This value is not allowed if you specify the SIGNAL or WAIT clause.

>= 0

A value greater than or equal to zero specifies execution on a specific coprocessor. The number of the specific coprocessor is determined as follows:

coprocessor = MOD (target-number, number_of_coprocs)

If the correct target hardware needed to run the offloaded code is not available on the system, the program fails with an error message. Execution on the CPU is allowed when the OPTIONAL clause is also specified or optional is also specified in option [q or Q]offload.

< -1

These values are reserved.

If you don't specify the target-number argument, the runtime system executes the code on a coprocessor and, if multiple coprocessors are available, chooses which one to use. If no coprocessor is available, the program fails with an error message.

For example, in a system with 4 coprocessors:

  • Specifying 2 or 6 tells the runtime system to use coprocessor 2 for the transfer, because both MOD(2,4) and MOD(6,4) equal 2.

  • Specifying 1000 tells the runtime system to use coprocessor 0 for the transfer, because MOD(1000,4) = 0.
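The mapping in the examples above can be checked with a few lines of standard Fortran. This is a minimal sketch: the array name requested and the fixed count of 4 coprocessors are assumptions chosen to match the example, not values reported by the offload runtime.

```fortran
program target_number_mapping
  implicit none
  ! Assumption: a system with 4 coprocessors, as in the example above.
  integer, parameter :: number_of_coprocs = 4
  ! Hypothetical target-number values taken from the two bullets above.
  integer, parameter :: requested(3) = [2, 6, 1000]
  integer :: i

  do i = 1, size(requested)
    ! Same formula as in the text:
    ! coprocessor = MOD (target-number, number_of_coprocs)
    print '(a,i0,a,i0)', 'target-number ', requested(i), &
          ' -> coprocessor ', mod(requested(i), number_of_coprocs)
  end do
end program target_number_mapping
```

The three lines printed should report coprocessors 2, 2, and 0, in that order.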

if-specifier

Is a Boolean expression.

If the expression evaluates to true, then the data transfer specified by the directive occurs. If the specified target coprocessor is absent from the system or not available at that time because it is fully loaded, then no action is taken.

If the expression evaluates to false, then no action is taken and none of the other offload clauses have any effect.

tag

Is a scalar integer expression. Its value is used to coordinate an asynchronous computation or an asynchronous data transfer.

When used with SIGNAL, tag is an integer value associated with an asynchronous computation or an asynchronous data transfer. tag can be used in subsequent WAIT clauses in other OFFLOAD, OFFLOAD_TRANSFER, or OFFLOAD_WAIT directives.

When used with WAIT, tag is an integer value associated with a previously initiated asynchronous computation or asynchronous data transfer. Use the same tag that you specified in the SIGNAL clause that started the asynchronous computation or data transfer with the OFFLOAD or OFFLOAD_TRANSFER directive.
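The pairing of SIGNAL and WAIT through a shared tag can be sketched as follows. This is an illustrative sketch, not a definitive usage: the names a and tag1 are hypothetical, coprocessor 0 is assumed to exist, and the OFFLOAD_WAIT form shown is an assumption about that directive's syntax. A compiler that does not recognize the directives treats them as comments and runs the program on the host only.

```fortran
program async_transfer_sketch
  implicit none
  integer, parameter :: n = 1000
  real    :: a(n)
  integer :: tag1      ! hypothetical tag; its value identifies the transfer

  a = 1.0
  tag1 = 1

  ! Start an asynchronous copy of a to coprocessor 0. SIGNAL requires an
  ! explicit target-number >= 0 in the TARGET clause.
  !DIR$ OFFLOAD_TRANSFER TARGET(MIC:0) IN(a) SIGNAL(tag1)

  ! The CPU continues here while the transfer is in progress.
  print *, 'host work proceeds, sum(a) =', sum(a)

  ! Wait on the same device, with the same tag, for the transfer to finish.
  !DIR$ OFFLOAD_WAIT TARGET(MIC:0) WAIT(tag1)
end program async_transfer_sketch
```

Both directives name the same target device, because a tag signaled on one device is not associated with any other device.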

offload-parameter

Can be any of the following data movement clauses:

  • IN ( identifier[, identifier ] [: modifier[[,] modifier ] ] )

  • OUT ( identifier[, identifier ] [: modifier[[,] modifier ] ] )

  • NOCOPY ( identifier[, identifier ] [: modifier[[,] modifier ] ] )

When a program runs in a heterogeneous environment, program variables are copied back and forth between the CPU and the target. The offload-parameter is a specification for controlling the direction in which variables are copied, and for pointers, the amount of data that is copied.

IN

This indicates that the variable is strictly an input to the target region and it is copied from the CPU to the coprocessor. Its value is not copied back from the coprocessor to the CPU after the region completes.

OUT

This indicates that the variable is strictly an output of the target region. The host CPU does not copy the variable to the target coprocessor. It is copied from the coprocessor to the CPU.

NOCOPY

This indicates that the variable should not be copied. A variable whose value is reused from the last target execution or one that is used entirely within the offloaded code section may be named in a NOCOPY clause to avoid any copying.

Memory is allocated on the coprocessor for the variable.

This clause is required. It must not contain both an IN clause and an OUT clause.

An IN or OUT element-count-expr expression (see description below within modifier) is evaluated at a point in the program before the statement or clause in which it is used.

An array variable whose size is known from its declaration is copied in its entirety. If a subset of an array is to be processed, use the name of the starting element of the subset and the element-count-expr to transfer the array subset.
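The difference between copying a whole array and copying a subset might look like the sketch below. It is written under assumptions: the array a, the count nelem, and coprocessor 0 are hypothetical, and the LENGTH modifier used here is the one described under modifier.

```fortran
program subset_transfer_sketch
  implicit none
  real    :: a(1000)   ! size is known from the declaration
  integer :: nelem     ! hypothetical element count, computed at run time

  a = 0.0
  nelem = 50

  ! Whole array: the size comes from the declaration, so no LENGTH is needed.
  !DIR$ OFFLOAD_TRANSFER TARGET(MIC:0) IN(a)

  ! Subset: name the starting element and give the number of elements,
  ! here the 50 elements that begin at a(101).
  !DIR$ OFFLOAD_TRANSFER TARGET(MIC:0) IN(a(101) : LENGTH(nelem))

  print *, 'elements requested in the subset transfer:', nelem
end program subset_transfer_sketch
```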

identifier

Is a variable, a subscripted variable, an array slice, or a component reference. The variable or the component reference may have the ALLOCATABLE or POINTER attribute. An array slice may be contiguous or non-contiguous.

modifier

Is one of the following:

  • LENGTH ( element-count-expr )

    where element-count-expr is an integer expression, computed at runtime. Use it with:

    • Integer pointer variables (not Fortran 90 POINTERs)

      Pointer variable values themselves are never copied across the host/target interface because there is no correspondence between the memory addresses of the host CPU and the target. Instead, objects that an integer pointer points to are copied to or from the target, and the value of the pointer variable is recreated. By default, a single element is copied.

      Use element-count-expr to specify how many elements pointed to should be considered as data. If the expression value is zero or negative, the program fails with an error message.

    • Arrays (including assumed-size)

      element-count-expr specifies a number of elements copied between the CPU and target.

      A Fortran array variable can be one of four major types: explicit-shape, assumed-shape, deferred-shape, and assumed-size. For the first three types, the declaration or the runtime descriptor makes the array's size known at compile time or at runtime. The extent of the last dimension of an assumed-size array is not known, so its total size cannot be determined.

      By default, the compiler copies explicit-shape, assumed-shape and deferred-shape arrays in their entirety. They do not need an element-count-expr. However, you can use an optional element-count-expr to specify the total number of elements to copy, which limits the number of elements copied back and forth in the last dimension of the array.

      You must specify an assumed-size array with an element-count-expr specification, because the compiler does not know the total size of the array. The value of the element-count-expr is the total number of elements of the array to be copied.

  • ALLOC_IF ( condition ) | FREE_IF (condition )

    where condition is a Boolean expression.

    The ALLOC_IF modifier specifies a Boolean condition that controls whether the allocatable variables in the IN clause will be allocated a new block of memory on the target when the offload is executed on the target. If the expression evaluates to true, a new memory allocation is performed for each variable listed in the clause. If the condition evaluates to false, the existing allocated values on the target are reused. You must ensure that a block of memory of sufficient size has been previously allocated for the variables on the target by using a FREE_IF(.FALSE.) clause on an earlier offload.

    The FREE_IF modifier specifies a Boolean condition that controls whether to deallocate the memory allocated for the allocatable variables in an IN clause. If the expression evaluates to true, the memory pointed to by each variable listed in the clause is deallocated. If the condition evaluates to false, no action is taken on the memory pointed to by the variables in the list. A subsequent clause will be able to reuse the allocated memory (data persistence).

    The following are the default settings for ALLOC_IF and FREE_IF:

    Clause    ALLOC_IF    FREE_IF

    IN        True        True

    INOUT     True        True

    OUT       True        True

    NOCOPY    False       False

    For more information, see Managing Memory Allocation for Pointer Variables.

  • ALIGN (expression)

    where the value of expression should be a power of two.

    This modifier applies to pointer variables and requests the specified minimum alignment for pointer data allocated on the target.

  • ALLOC (array-subscript-list)

    where array-subscript-list is a list of array section triplets that specifies a set of elements of an array that need allocation. The array-subscript-list takes the following form:

    start-element : end-element [ : stride] [ , start-element : end-element [ : stride] ]…

    This modifier can only be used in IN and OUT clauses, and exactly one identifier must be listed in the clause. A one-to-one correspondence is established between the identifier in the IN or OUT clause and the base variable that will be allocated with array-subscript-list dimensions.

    When ALLOC is specified, the allocation on the target is the same shape as array-subscript-list. The variable being transferred or allocated must be the same variable used in the ALLOC modifier (identifier or into-identifier). Only unit strides are allowed in array-subscript-list. When array-subscript-list has rank greater than one, the second and subsequent subscript triplet must specify all elements at that dimension. Therefore, array-subscript-list must describe an array that is simply contiguous. (See CONTIGUOUS.)

    Data is transferred into that portion of the array specified by the IN or OUT clauses. Therefore, memory allocation and the data transfer can use separate array slice references.

    When the lower bound of the first dimension of array-subscript-list in the ALLOC is greater than the lower bound of the first dimension of identifier, then the memory allocation begins with that element. The memory below the lower bound is unallocated and should not be referenced by the program. This allows a smaller section of the array to be transferred to the target without requiring that the entire array be allocated.

    For more information, see Allocating Memory for Parts of Arrays.

  • INTO (into-identifier)

    where into-identifier is a variable, a subscripted variable, an array slice, or a component reference with the same form, rank, dimensions, and kind type parameters as the identifier in the clause.

    This modifier can only be used in IN and OUT clauses, and exactly one identifier must be listed in the clause. When INTO is specified, data can be transferred from one variable on the CPU to another on the target, and vice versa. This establishes a one-to-one correspondence between a single source variable and a single destination variable.

    You can specify ALLOC, ALLOC_IF, and FREE_IF modifiers along with the INTO modifier.

    When INTO is used in an IN clause, data is copied from the CPU object identifier to the target object into-identifier. If the ALLOC_IF, FREE_IF, or ALLOC modifier is specified, it applies only to the into-identifier in the INTO clause.

    When INTO is used in an OUT clause, data is copied from the target object identifier to the CPU object into-identifier. If the ALLOC_IF, FREE_IF, or ALLOC modifier is specified, it applies only to the identifier in the OUT expression.

    When this modifier is used, the source expression generates a stream of elements to be copied into the memory range specified by the INTO expression.

    If overlap occurs between the source and destination variables, it causes undefined behavior (although with disjoint memories it will work as expected). No ordering can be assumed between transfers from different IN or OUT clauses.

    For more information, see Moving Data from One Variable to Another.
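The ALLOC_IF, FREE_IF, and INTO modifiers described above can be combined as in the following sketch. It is illustrative only: the arrays a and b are hypothetical, coprocessor 0 is assumed to exist, and the modifiers are written space-separated because the comma between modifiers is optional in the syntax shown earlier.

```fortran
program modifiers_sketch
  implicit none
  integer, parameter :: n = 1000
  real, allocatable  :: a(:), b(:)   ! hypothetical host arrays

  allocate(a(n), b(n))
  a = 1.0
  b = 0.0

  ! 1) First transfer: allocate target memory for a and keep it afterwards
  !    (FREE_IF is false), so that a later transfer can reuse it.
  !DIR$ OFFLOAD_TRANSFER TARGET(MIC:0) IN(a : ALLOC_IF(.TRUE.) FREE_IF(.FALSE.))

  a = 2.0

  ! 2) Later transfer: reuse the existing target allocation (ALLOC_IF is
  !    false) and release it once the transfer has completed.
  !DIR$ OFFLOAD_TRANSFER TARGET(MIC:0) IN(a : ALLOC_IF(.FALSE.) FREE_IF(.TRUE.))

  ! 3) INTO: copy the host array a into the target copy of b. In an IN
  !    clause, the allocation modifiers apply to the into-identifier b.
  !DIR$ OFFLOAD_TRANSFER TARGET(MIC:0) IN(a : INTO(b) ALLOC_IF(.TRUE.) FREE_IF(.FALSE.))

  print *, 'host values: a(1) =', a(1), ' b(1) =', b(1)
end program modifiers_sketch
```

Step 3 leaves the target copy of b allocated. A later directive that names b with FREE_IF(.TRUE.) would be needed to release it.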

This directive initiates asynchronous data transfer if SIGNAL is specified. If SIGNAL is not specified, it initiates and completes synchronous data transfer.

You can choose whether to offload a statement based on runtime conditions, such as the size of a data set. The IF (if-specifier) clause lets you specify the condition.
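A size-based condition of this kind might be written as in the sketch below. The cutoff threshold, the array a, and the use of coprocessor 0 are assumptions made for illustration.

```fortran
program conditional_transfer_sketch
  implicit none
  integer, parameter :: threshold = 10000   ! hypothetical cutoff
  integer :: n
  real, allocatable :: a(:)

  n = 50000
  allocate(a(n))
  a = 1.0

  ! The transfer is attempted only when the data set is large enough to be
  ! worth moving. When the condition is false, no action is taken.
  !DIR$ OFFLOAD_TRANSFER TARGET(MIC:0) IF (n > threshold) IN(a : FREE_IF(.FALSE.))

  print *, 'transfer requested: ', n > threshold
end program conditional_transfer_sketch
```

Any later directive that depends on this data should use the same condition in its own IF clause, so that either all or none of the related offloads are enabled.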

Use SIGNAL (tag) to start the asynchronous data transfer.

The SIGNAL and WAIT clauses refer to a specific target device, so you must specify target-number in the TARGET clause. If you query a signal before the signal has been initiated, it results in undefined behavior and a runtime abort of the application. For example, if you query a signal (SIG1) on target device 0 that was initiated for target device 1, it results in a runtime abort of the application. This is because the signal (SIG1) was initiated for target device 1, so there is no signal (SIG1) associated with target device 0.

If the if-specifier evaluates to false and a SIGNAL (tag) clause is used in the directive, then the SIGNAL is undefined and any WAIT on this SIGNAL has undefined behavior.

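When you specify the STATUS clause, it affects the behavior of optional and mandatory offloads differently when the offload request is not successful:

  • MANDATORY: the construct is not executed on the host, and the program continues execution. Examine var to find the cause of the failure. Without a STATUS clause, the program is terminated.

  • OPTIONAL: the construct is executed on the host processor. You can examine var to find the cause of the failure.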

See Example 2 in the OFFLOAD Examples section for an example showing how to use offload_status to identify the target-number value when using -1, or when target-number is not specified.

For both optional and mandatory offloads, when offload is successful, the status variable has the value OFFLOAD_SUCCESS.
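Querying the status variable after a transfer could look like the following sketch. It assumes a compiler that provides the mic_lib module described under STATUS, so it will not build with other compilers. The names stat and a are hypothetical.

```fortran
program status_sketch
  use mic_lib     ! provides offload_status and the OFFLOAD_* enumerators
  implicit none
  type(offload_status) :: stat
  real :: a(1000)

  a = 1.0

  ! Optional offload: if the transfer cannot be performed, execution
  ! continues on the host and stat records the reason.
  !DIR$ OFFLOAD_TRANSFER TARGET(MIC) OPTIONAL STATUS(stat) IN(a)

  select case (stat%result)
  case (OFFLOAD_SUCCESS)
    print *, 'device used:', stat%device_number, ' bytes sent:', stat%data_sent
  case (OFFLOAD_DISABLED)
    print *, 'offload is disabled'
  case (OFFLOAD_UNAVAILABLE)
    print *, 'no coprocessor is available'
  case (OFFLOAD_OUT_OF_MEMORY)
    print *, 'not enough memory on the device'
  case default
    print *, 'offload failed, result code:', stat%result
  end select
end program status_sketch
```

Because no target-number is given, the device_number component is the way to learn which coprocessor the runtime system chose.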

In the data movement clauses (IN, OUT, INOUT, and NOCOPY) and the modifiers ALLOC and INTO, you can specify an array slice of any rank. For an assumed-size dummy array, you can specify the following syntax, interchangeably:

Example

See the examples in OFFLOAD.
