Intel® Fortran Compiler 16.0 User and Reference Guide

OFFLOAD

OFFLOAD Compiler Directive: Enables statements to execute on the target. This directive only applies to Intel® MIC Architecture.

!DIR$ [OMP] OFFLOAD clause[[,] clause...]

clause

Can be any of the following:

  • TARGET (target-name [:target-number])

  • IF (if-specifier)

    An optional clause. Include it to allow a test at execution time for whether or not the executable should try to offload the statement.

    Use the IF clause to control whether the offload is enabled. All OFFLOAD directives that have data dependencies should use the IF clause in a coordinated fashion, so that either all or none of the related offloads are enabled.

  • SIGNAL (tag)

    An optional clause. Include it to enable the CPU to continue execution after initiating the OFFLOAD.

    If this clause is used, the computation performed by the offload and any data transfers occur concurrently with the CPU execution of the code after the directive.

    If this clause is not used, then the entire offload and associated data transfers are executed synchronously: the CPU will not continue execution past the directive until the computation performed by the offload has completed.

    SIGNAL is device specific; if you use this clause, you must specify a target-number >=0 in the TARGET clause.

  • WAIT (tag [, tag, ...])

    An optional clause. Include it to specify a wait for completion of a previously initiated asynchronous data transfer or asynchronous computation.

    WAIT is device specific; if you use this clause, you must specify a target-number >=0 in the TARGET clause.

  • MANDATORY

    An optional clause. Include it to specify that execution on the coprocessor is required. Execution on the CPU is not allowed. If the correct target hardware needed to run the offloaded code is not available on the system, the program fails with an error message.

    You cannot specify both MANDATORY and OPTIONAL since they are opposites.

    If OPTIONAL is not specified for this directive and it is not specified in option [q or Q]offload, then MANDATORY is implied. You can explicitly specify MANDATORY to reinforce this implied default.

  • OPTIONAL

    An optional clause. Include it to specify that execution on the coprocessor is requested but not required. Execution on the CPU is allowed. If the correct target hardware needed to run the offloaded code is not available on the system, the program is executed on the CPU, not the TARGET.

    You cannot specify both MANDATORY and OPTIONAL since they are opposites.

    If OPTIONAL is not specified for this directive and it is not specified in option [q or Q]offload, then MANDATORY is implied. You can explicitly specify MANDATORY to reinforce this implied default.

  • STATUS (var)

    An optional clause. Include it to determine the status of the execution of an offloading construct. When used with the OPTIONAL or MANDATORY clause, the STATUS clause also lets you modify the action to take upon failure.

    var is of derived type offload_status defined in mic_lib.mod. This status variable can be queried for details about the offload. The module mic_lib.mod is provided and contains the following definitions:

    use, intrinsic :: iso_c_binding
     
    enum , bind (C)
     enumerator :: OFFLOAD_SUCCESS         = 0
     enumerator :: OFFLOAD_DISABLED        = 1  ! offload is disabled
     enumerator :: OFFLOAD_UNAVAILABLE     = 2  ! card is not available
     enumerator :: OFFLOAD_OUT_OF_MEMORY   = 3  ! not enough memory on device
     enumerator :: OFFLOAD_PROCESS_DIED    = 4  ! target process has died
     enumerator :: OFFLOAD_ERROR           = 5  ! unspecified error
    end enum
     
    type, bind (C) :: offload_status
     integer(kind=c_int) ::  result        = OFFLOAD_DISABLED   ! result, see enum above
     integer(kind=c_int) ::  device_number = -1  ! device number
     integer(kind=c_int) ::  data_sent     =  0  ! number of bytes sent to the target
     integer(kind=c_int) ::  data_received =  0  ! number of bytes received by host
    end type offload_status

    When you specify STATUS with the MANDATORY clause, and the offload cannot be performed, the construct is not automatically executed on the host. The program continues execution. You should examine var for the cause of the offload failure and take appropriate action. Note that if no STATUS clause is specified, then the program is terminated.

    When you specify STATUS with the OPTIONAL clause, and the offload cannot be performed, the construct is executed on the host processor. You can examine the var in the STATUS clause to determine the cause of the offload failure.

  • offload-parameter [[,] offload-parameter]

    One or more data movement clauses (see below).

The following arguments are used in the above clause items:

target-name

Is an identifier that represents the target. The only allowable target name is MIC.

target-number

(Required for SIGNAL and WAIT) Is an integer expression whose value is interpreted as shown in the following table.

When target-number is specified, the implicit MANDATORY offload is overridden and execution on the CPU is allowed when either the OPTIONAL clause is also specified or optional is also specified in the [Q]offload option.

-1

This value specifies execution on the coprocessor. The runtime system chooses the specific coprocessor. Execution on the CPU is not allowed.

If the correct target hardware needed to run the offloaded code is not available on the system, the program fails with an error message. Execution on the CPU is allowed when the OPTIONAL clause is also specified or optional is also specified in option [q or Q]offload.

This value is not allowed if you specify the SIGNAL or WAIT clause.

>= 0

A value greater than or equal to zero specifies execution on a specific coprocessor. The number of the specific coprocessor is determined as follows:

coprocessor = MOD (target-number, number_of_coprocs)

If the correct target hardware needed to run the offloaded code is not available on the system, the program fails with an error message. Execution on the CPU is allowed when the OPTIONAL clause is also specified or optional is also specified in option [q or Q]offload.

< -1

These values are reserved.

If you don't specify the target-number argument, the runtime system executes the code on the coprocessor, and if multiple coprocessors are available, on which coprocessor. If no coprocessor is available, the program fails with an error message.

For example, in a system with 4 coprocessors:

  • Specifying 2 or 6 tells the runtime systems to use coprocessor 2 for the transfer, because both MOD(2,4) and MOD(6,4) equal 2.

  • Specifying 1000 tells the runtime systems to use coprocessor 0 for the transfer, because MOD(1000,4) = 0.

Note

Leaving data values on the coprocessor from one execution of offloaded code to another is called "data persistence". In a system with multiple coprocessors, you need to specify a target-number to reliably use data persistence. When you use ALLOC_IF or FREE_IF to implement data persistence on the coprocessor, but do not specify a target-number, the runtime system randomly chooses a coprocessor, so the chosen coprocessor could be one on which the data is not available.

if-specifier

Is a Boolean expression.

If the expression evaluates to true, then the program attempts to offload the statement. If the specified target coprocessor is absent from the system or not available at that time because it is fully loaded, then the offloaded code executes on the CPU.

If the expression evaluates to false, then the offloaded code executes on the CPU and none of the other OFFLOAD clauses have any effect.

tag

Is a scalar integer expression. Its value is used to coordinate an asynchronous computation or an asynchronous data transfer.

When used with SIGNAL, tag is an integer value associated with an asynchronous computation or an asynchronous data transfer. tag can be used in subsequent WAIT clauses in other OFFLOAD, OFFLOAD_TRANSFER, or OFFLOAD_WAIT directives.

When used with WAIT, tag is an integer value associated with a previously initiated asynchronous computation or asynchronous data transfer. Use the same tag that you specified in the SIGNAL clause that started the asynchronous computation or data transfer with the OFFLOAD or OFFLOAD_TRANSFER directive.

offload-parameter

Can be any of the following data movement clauses:

  • IN ( identifier[, identifier ] [: modifier[[,] modifier ] ] )

  • OUT ( identifier[, identifier ] [: modifier[[,] modifier ] ] )

  • INOUT ( identifier[, identifier ] [: modifier[[,] modifier ] ] )

  • NOCOPY ( identifier[, identifier ] [: modifier[[,] modifier ] ] )

When a program runs in a heterogeneous environment, program variables are copied back and forth between the CPU and the target. The offload-parameter is a specification for controlling the direction in which variables are copied, and for pointers, the amount of data that is copied.

The data selected for transfer is a combination of variables implicitly transferred because they are lexically referenced within offload constructs, and variables explicitly listed in an offload-parameter.

IN

This indicates that the variable is strictly an input to the target region and it is copied from the CPU to the coprocessor. Its value is not copied back from the coprocessor to the CPU after the region completes.

OUT

This indicates that the variable is strictly an output of the target region. The host CPU does not copy the variable to the target coprocessor. It is copied from the coprocessor to the CPU.

INOUT

This indicates that the variable is both copied from the CPU to the target coprocessor and back from the coprocessor to the CPU.

NOCOPY

This indicates that the variable should not be copied.

A variable whose value is reused from the last target execution or a variable that is used entirely within the offloaded code section can be named in a NOCOPY clause to avoid any copying.

If the NOCOPY clause is being used to retain values across offload instances (data persistence), then currently values of non-scalar variables with automatic storage allocation (stack variables) are not preserved across offload instances. However, all statically allocated variables are preserved, and non-scalar variables with automatic allocation are also preserved across offload instances.

If the NOCOPY clause is being used merely to suppress data transfer between the CPU and the target, then variables of all memory allocation types are supported.

An IN or OUT element-count-expr expression (see description below within modifier) is evaluated at a point in the program before the statement or clause in which it is used.

An array variable whose size is known from its declaration is copied in its entirety. If a subset of an array is to be processed, use the name of the starting element of the subset and the element-count-expr to transfer the array subset.

Because a data pointer variable not listed in an IN clause is uninitialized within the offload region, it must be assigned a value on the target before it can be referenced.

identifier

Is a variable, a subscripted variable, an array slice, or a component reference. The variable or the component reference may have the ALLOCATABLE or POINTER attribute. An array slice may be contiguous or non-contiguous.

modifier

Is one of the following:

  • LENGTH ( element-count-expr )

    where element-count-expr is an integer expression, computed at runtime. Use it with:

    • Integer pointer variables (not Fortran 90 POINTERs)

      Pointer variable values themselves are never copied across the host/target interface because there is no correspondence between the memory addresses of the host CPU and the target. Instead, objects that an integer pointer points to are copied to or from the target, and the value of the pointer variable is recreated. By default, a single element is copied.

      Use element-count-expr to specify how many elements pointed to should be considered as data. If the expression value is zero or negative, the program fails with an error message.

    • Arrays (including assumed-size)

      element-count-expr specifies a number of elements copied between the CPU and target.

      A Fortran array variable can be one of four major types: explicit-shape, assumed-shape, deferred-shape and assumed-size. The runtime descriptor for the first three types makes that array variable's size known at compile time or at runtime. The last dimension of an assumed-size array is the length of its variable.

      By default, the compiler copies explicit-shape, assumed-shape and deferred-shape arrays in their entirety. They do not need an element-count-expr. However, you can use an optional element-count-expr to specify the total number of elements to copy, which limits the number of elements copied back and forth in the last dimension of the array.

      You must specify an assumed-size array with an element-count-expr specification, because the compiler does not know the total size of the array. The value of the element-count-expr is the total number of elements of the array to be copied.

  • ALLOC_IF ( condition ) | FREE_IF (condition )

    where condition is a Boolean expression.

    The ALLOC_IF modifier specifies a Boolean condition that controls whether the allocatable variables in the IN clause will be allocated a new block of memory on the target when the offload is executed on the target. If the expression evaluates to true, a new memory allocation is performed for each variable listed in the clause. If the condition evaluates to false, the existing allocated values on the target are reused (data persistence). You must ensure that a block of memory of sufficient size has been previously allocated for the variables on the target by using a FREE_IF(.FALSE.) clause on an earlier offload.

    The FREE_IF modifier specifies a Boolean condition that controls whether to deallocate the memory allocated for the allocatable variables in an IN clause. If the expression evaluates to true, the memory pointed to by each variable listed in the clause is deallocated. If the condition evaluates to false, no action is taken on the memory pointed to by the variables in the list. A subsequent clause will be able to reuse the allocated memory (data persistence).

    The following are the default settings for ALLOC_IF and FREE_IF:

    ALLOC_IF

    FREE_IF

    IN

    True

    True

    INOUT

    True

    True

    OUT

    True

    True

    NOCOPY

    False

    False

    For more information, see Managing Memory Allocation for Pointer Variables.

  • ALIGN (expression)

    where the value of expression should be a power of two.

    This modifier applies to pointer variables and requests the specified minimum alignment for pointer data allocated on the target.

  • ALLOC (array-subscript-list)

    where array-subscript-list is a list of array section triplets that specifies a set of elements of an array that need allocation. The array-subscript-list takes the following form:

    start-element : end-element [ : stride] [ , start-element : end-element [ : stride] ]…

    This modifier can only be used in IN and OUT clauses and only one identifier must be listed in the clause. A one to one correspondence is established between the identifier in the IN or OUT clause and the base variable that will be allocated with array-subscript-list dimensions.

    When ALLOC is specified, the allocation on the target is the same shape as array-subscript-list. The variable being transferred or allocated must be the same variable used in the ALLOC modifier (identifier or into-identifier). Only unit strides are allowed in array-subscript-list. When array-subscript-list has rank greater than one, the second and subsequent subscript triplet must specify all elements at that dimension. Therefore, array-subscript-list must describe an array that is simply contiguous. (See CONTIGUOUS.)

    Data is transferred into that portion of the array specified by the IN or OUT clauses. Therefore, memory allocation and the data transfer can use separate array slice references.

    When the lower bound of the first dimension of array-subscript-list in the ALLOC is greater than the lower bound of the first dimension of identifier, then the memory allocation begins with that element. The memory below the lower bound is unallocated and should not be referenced by the program. This allows a smaller section of the array to be transferred to the target without requiring that the entire array be allocated.

    For more information, see Allocating Memory for Parts of Arrays.

  • INTO (into-identifier)

    where into-identifier is a variable, a subscripted variable, an array slice, or a component reference with the same form, rank, dimensions, and kind type parameters as the identifier in the clause.

    This modifier can only be used in IN and OUT clauses and only one identifier must be listed in the clause. When INTO is specified, data can be transferred from one variable on the CPU to another on the target, and vice versa. This establishes a one to one correspondence between a single source variable and a single destination variable.

    You can specify ALLOC, ALLOC_IF, and FREE_IF modifiers along with the INTO modifier.

    When INTO is used in an IN clause, data is copied from the CPU object identifier to the target object into-identifier. If the ALLOC_IF, FREE_IF, or ALLOC modifier is specified, it applies only to the into-identifier in the INTO clause.

    When INTO is used in an OUT clause, data is copied from the target object into-identifier to the CPU object identifier. If the ALLOC_IF, FREE_IF, or ALLOC modifier is specified, it applies only to the identifier in the OUT expression.

    When this modifier is used, the source expression generates a stream of elements to be copied into the memory range specified by the INTO expression.

    If overlap occurs between the source and destination variables, it causes undefined behavior (although with disjoint memories it will work as expected). No ordering can be assumed between transfers from different IN or OUT clauses.

    For more information, see Moving Data from One Variable to Another.

The OFFLOAD directive both transfers data and offloads computation.

The OMP is optional in the syntax. When it is present, the next line, other than a comment, must be an OpenMP* PARALLEL, PARALLEL SECTIONS, or PARALLEL DO directive. Otherwise the compiler issues an error.

When OMP is not present in the syntax, the OFFLOAD directive must be followed by one of the following or the compiler issues an error:

You can choose whether to offload a statement based on runtime conditions, such as the size of a data set. The IF (if-specifier) clause lets you specify the condition.

The SIGNAL and WAIT clauses refer to a specific target device, so you must specify target-number in the TARGET clause. If you query a signal before the signal has been initiated, it results in undefined behavior and a runtime abort of the application. For example, if you query a signal (SIG1) on target device 0 that was initiated for target device 1, it results in a runtime abort of the application. This is because the signal (SIG1) was initiated for target device 1, so there is no signal (SIG1) associated with target device 0.

If the if-specifier evaluates to false and a SIGNAL (tag) clause is used in the directive, then the SIGNAL is undefined and any WAIT on this SIGNAL has undefined behavior.

When you specify the STATUS clause, it affects the behavior of optional and mandatory offloads differently when the offload request is not successful:

For both optional and mandatory offloads, when offload is successful, the status variable has the value OFFLOAD_SUCCESS.

In the data movement clauses (IN, OUT, INOUT, and NOCOPY) and the modifiers ALLOC and INTO, you can specify an array slice of any rank. For an assumed-size dummy array, you can specify the following syntax, interchangeably:

Note

Do not use the __MIC__ preprocessor symbol inside a statement following an OMP OFFLOAD directive. However, you can use it in a subprogram called from the directive.

Conceptually, this is the sequence of events when a statement marked for offload is encountered:

  1. If there is no IF clause, go to step 3.

  2. On the host, evaluate the IF expression. If it evaluates to true, go to step 3. Otherwise, execute the region on the host and go to step 17.

  3. Attempt to acquire the target. If successful, go to step 4. Otherwise:

    • If there is no MANDATORY clause, execute the region on the host and go to step 17.

    • If there is a STATUS clause, set the var to indicate the error and go to step 17.

    • Otherwise, terminate the program with an appropriate error message.

  4. On the host, compute all ALLOC_IF, FREE_IF, and element-count-expr expressions used in IN and OUT clauses.

  5. On the host, gather all variable values that are inputs to the offload.

  6. Send the input values from the host to the target.

  7. On the target, allocate memory for variable-length OUT variables.

  8. On the target, copy input values into corresponding target variables.

  9. On the target, execute the offloaded region.

  10. On the target, compute all element-count-expr expressions used in OUT clauses.

  11. On the target, gather all variable values that are outputs of the offload.

  12. Send output values back from the target to the host.

  13. On the host, copy values received into corresponding host variables.

  14. If no error occurred on the target, go to step 17.

  15. If there is a STATUS clause, set the var to indicate the error and go to step 17.

  16. Otherwise, terminate the program with an appropriate error message.

  17. Continue processing the program on the host.

Example

The following example demonstrates offloading a CALL statement or assignment statement. Note that !DIR$ OFFLOAD TARGET (MIC) prefixes the statement designated for offload.

! Offload call of routine calc
!DIR$ OFFLOAD TARGET(MIC)
CALL calc(...)
 
! Offload call of function recalc
!DIR$ OFFLOAD TARGET(MIC)
X = recalc(...)

The following example demonstrates using the OFFLOAD directive in conjunction with the OpenMP* PARALLEL directive to specify remote execution of the OpenMP construct.

! Offload OpenMP parallel construct
!DIR$ OMP OFFLOAD TARGET(MIC)
!$omp parallel
...
!$omp end parallel

The following example demonstrates how to use a variable-length array to specify a number of elements copied between the CPU and target.

subroutine sample (Z,N,M)
integer, intent(in)  :: N,M
real, dimension (N,*) :: Z
  ...
  !dir$ omp offload target(mic) in (Z:length(N*M))
  ...
end subroutine sample

The following example shows various forms of identifier and use of the ALLOC and INTO modifiers in IN clauses:

subroutine foo
real a(1000,500), b(1000,500), c(2000, 20)
real, pointer :: p(:)
p => c(1:20:2)
!dir$ offload target(mic) in( a : into (b) )
  ...
!dir$ offload target(mic) in( c(i:j:k,l:m:n) )   ! k and n must be strides of 1  
  ...
!dir$ offload target(mic) in( p(1:20) : alloc (p(1:100)) )
  ...
end

The following example demonstrates using the OFFLOAD directive, as well as directives OFFLOAD_TRANSFER and OFFLOAD_WAIT.

! Sample use of OFFLOAD, OFFLOAD_TRANSFER, and OFFLOAD_WAIT
 
module M
 
  integer, parameter :: iter = 10
  integer, parameter :: count = 25000
 
  !dir$ options /offload_attribute_target=mic
 
  real,    allocatable :: in1(:), in2(:), out1(:), out2(:)
 
  !dir$ end options
 
  integer              :: sin1, sin2, sout1, sout2
 
contains
 
   !dir$ attributes offload:mic ::compute 
   subroutine compute(x, y)
 
   real,    allocatable :: x(:), y(:) 
   integer              :: i 
 
   !dir$ omp parallel do num_threads(96) private(i)
   do i = 1, count
     y(i) = x(i) * x(i)
   end do
 
   end subroutine compute
 
   subroutine do_async_in()
 
   integer              :: i
 
   ! prime loop with initial in1 transfer to target 
 
   !dir$ offload_transfer target(mic:0) signal(sin1)                   &
                          in( in1 : alloc_if(.false.) free_if(.false.) )
 
   do i = 1, iter
      if (mod(i,2) == 0) then
 
         ! initiate another in1 data transfer to target, skip if last iteration 
 
         !dir$ offload_transfer target(mic:0) if(i /= iter) signal(sin1)     &
                                in( in1 : alloc_if(.false.) free_if(.false.) )
 
         ! wait for in2 transfer to complete, then offload computation 
 
         !dir$ offload target(mic:0) wait(sin2)                       &
                       nocopy(in2)                                    &
                       out( out2 : alloc_if(.false.) free_if(.false.) )
         call compute(in2, out2); 
 
         ! use out2 results on host 
         call use_result(out2); 
 
      else
 
         ! initiate another in2 data transfer to target, skip if last iteration 
 
         !dir$ offload_transfer target(mic:0) if(i /= iter) signal(sin2)     &
                                in( in2 : alloc_if(.false.) free_if(.false.) )
 
         ! wait for in1 transfer to complete, then offload computation 
 
         !dir$ offload target(mic:0) wait(sin1)                       &
                       nocopy( in1 )                                  &
                       out( out1 : alloc_if(.false.) free_if(.false.) ) 
         call compute(in1, out1); 
 
         ! use out1 results on host 
         call use_result(out1) 
 
      endif
   enddo
  
   end subroutine do_async_in
 
   subroutine do_async_out()
 
   integer              :: i
 
   do i = 1, (iter + 1)
      if ( mod(i,2) == 0 ) then
         if ( i < (iter + 1)) then
 
            ! offload computation, leave results on target 
 
            !dir$ offload target(mic:0)                                  &
                          in( in2 : alloc_if(.false.) free_if(.false.) ) &
                          nocopy( out2 )
            call compute(in2, out2) 
 
            ! transfer out2 results (asynchronously) back to host 
 
            !dir$ offload_transfer target(mic:0) signal(sout2)           & 
                          out( out2 : alloc_if(.false.) free_if(.false.) ) 
         endif
 
         ! wait for out1 results on host 
 
         !dir$ offload_wait target(mic:0) wait(sout1) 
 
         ! use out1 results on host 
         call use_result(out1); 
 
      else 
 
         if (i < (iter + 1)) then 
 
            ! offload computation, leave results on target 
 
            !dir$ offload target(mic:0)                                  & 
                          in( in1 : alloc_if(.false.) free_if(.false.) ) & 
                          nocopy( out1 ) 
            call compute(in1, out1) 
 
            ! transfer out1 results (asynchronously) back to host 
 
            !dir$ offload_transfer target(mic:0) signal(sout1)          & 
                         out( out1 : alloc_if(.false.) free_if(.false.) ) 
         endif
 
         if (i > 1) then
 
            ! wait for out2 results on host 
 
            !dir$ offload_wait target(mic:0) wait(sout2)
 
            ! use out2 results on host 
            call use_result(out2) 
 
         endif
      endif
   enddo 
 
   end subroutine do_async_out 
 
   subroutine do_sync() 
  
   integer              :: i 
 
   do i = 1, iter 
 
      ! transfer data to host, compute, and return results synchronously  
 
      !dir$ offload target(mic:0)                                  & 
                    in( in1 : alloc_if(.false.) free_if(.false.) ) & 
                    out( out1 : alloc_if(.false.) free_if(.false.) ) 
       call compute(in1, out1) 
 
       ! use out1 results on host 
       call use_result(out1) 
 
   enddo 
  
   end subroutine do_sync 
 
   subroutine use_result(x) 
  
   ! use results from offload computations on host  
  
   real, allocatable :: x(:) 
 
   print*, "USE_RESULT *****************" 
 
   end subroutine use_result 
 
end module M 
 
program main 
  
   use M 
   integer              :: i 
 
   allocate ( in1(count), in2(count), out1(count), out2(count) ) 
 
   !dir$ omp parallel do num_threads(96) private(i)   
   do i = 1, count  
      in1(i) = REAL(i) 
      in2(i) = REAL(i) 
   enddo 
   
   ! Initialize signal variables to unique values
   sin1 = 1
   sin2 = 2
   sout1 = 3
   sout2 = 4
  
   ! allocate memory on target only   
 
   !dir$ offload_transfer target(mic:0)        & 
         nocopy(in1, out1, in2, out2 : alloc_if(.true.) free_if(.false.) ) 
 
   !dir$ omp parallel do num_threads(96) private(i) 
   do i = 1, count 
       out1(i) = 0.  
       out2(i) = 0. 
   enddo  
   
   ! synchronous transfer, compute  
   call do_sync()  
 
   ! asynchronous IN transfer, compute 
   call do_async_in() 
     
   ! compute, asynchronous OUT transfer 
   call do_async_out() 
  
   ! free memory on target only   
  
   !dir$ offload_transfer target(mic:0)        &  
         nocopy(in1, out1, in2, out2 : alloc_if(.false.) free_if(.true.) )     
  
   deallocate( in1, in2, out1, out2 ) 
  
end program main 

The following example demonstrates the tag argument:

module mmod
!dir$ attributes offload : mic :: x, y
real :: x (1000) = 1.1
real :: y(1000) = 2.2
integer :: gtag = 1
end module mmod
 
program mmain
!			compile with symbol name "doGLOB" defined to use a global tag
!			compile with symbol name "doGLOB" undefined to use a local tag
 
use mmod
integer :: ktag = 2
call f (ktag)
call g (ktag)
print *, (y (j),j=1,1000,100)		! print every 100th element == 5.1
end program mmain
 
subroutine f (kktag)
use mmod
integer :: kktag
 
#ifdef doGLOB
  !dir$ offload_transfer target (mic:0) signal (gtag) IN(x)      ! copy X to MIC, signal when done
#else
  !dir$ offload_transfer target (mic:0) signal (kktag) IN(x)      ! copy X to MIC, signal when done
#endif
end subroutine f
 
subroutine g (kktag)
use mmod
integer :: kktag
 
#ifdef doGLOB
  !dir$ offload begin target (mic:0) wait (gtag) OUT (y)   ! wait for offload_transfer in f to copy x
#else
  !dir$ offload begin target (mic:0) wait (kktag) OUT (y)   ! wait for offload_transfer in f to copy x
#endif
 
y = x + 4                   ! y is all 5.1s
 
!dir$ end offload
end subroutine g

The following examples demonstrate using the STATUS clause.

Example 1
 
  use mic_lib
 
  type (offload_status) :: stat
 
  !dir$ offload target (mic:1) status (stat)
  …                   ! code to be offloaded to the coprocessor
  !dir$ end offload
 
  if (stat%result .ne. OFFLOAD_SUCCESS) then
     …                ! handle the error condition
  end if
 
Example 2
 
   use mic_lib
 
   type (offload_status) :: stat
   real :: p (20), q (20)
   integer :: my_mic
 
   !dir$ offload_transfer target (mic), status (stat), in (p, q)
   if (stat%result == OFFLOAD_OUT_OF_MEMORY) then
                      ! memory could not be allocated
     …                ! abandon offload target – use CPU host
   else
                      ! data has been transferred, can continue using coprocessor
     my_mic = stat%device_number
     !dir$ offload target (mic:my_mic), status (stat)
     …                ! do offload computation on device that was obtained
   end if

See Also