Intel® Fortran Compiler 16.0 User and Reference Guide
This topic only applies to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).
Creating Fortran pointer targets and allocatable variables is referred to here as dynamic memory variables.
Memory management on the CPU for dynamic memory variables used in offloaded programs is the same as non-offload programs. That is, the offload directives do not affect memory allocation and freeing on the CPU. As usual, you, the programmer, must do this.
Memory management on the coprocessor for dynamic memory variables named in in and out clauses of !DIR$ OFFLOAD is done automatically by the compiler and runtime system.
For in variables of a !DIR$ OFFLOAD the default behavior is to do a fresh memory allocation for each dynamic memory variable. On return from the construct the memory is de-allocated. In order to retain data between offloads, you can use the alloc_if and free_if qualifiers to modify the memory allocation defaults on the coprocessor.
The alloc_if qualifier specifies a logical condition that controls whether the dynamic memory variables in the in clause are allocated a fresh block of memory on the target when the construct is executed on the target. If the expression evaluates to .true., a fresh memory allocation is performed for each variable listed in the clause. If the condition evaluates to .false., the existing pointer values on the target are reused. You must ensure that a block of memory of sufficient size has been previously allocated for the variables on the target by using a free_if(.false.) clause on an earlier offload.
The free_if qualifier specifies a logical condition which controls whether to free the memory allocated for the dynamic memory variables in an in clause. If the expression evaluates to .true., the memory pointed to by each variable listed in the clause is freed. If the condition evaluates to .false., no action is taken on the memory pointed to by the variables in the list. A subsequent clause will be able to reuse the allocated memory.
The alloc_if and free_if logical expressions are evaluated on the CPU at the point the construct is offloaded to the target.
By default an out variable is allocated fresh memory on the target at the start of an offload and the memory is freed at the end of the offload. The alloc_if and free_if modifiers change the defaults. The expressions are evaluated on the host and used to control coprocessor memory allocation.
When the output value is received on the host, no memory allocation is done. The variables listed in out clauses must point to allocated memory of sufficient size to receive the results on the host.
As described in the previous section a dynamic memory variable in an in, out, inout, or nocopy clause can retain the target memory allocation when you set the free_if modifier to false. You can reuse that memory in subsequent offloads by using in, out, inout, or nocopy and specifying alloc_if(.false.). When target memory is allocated it is associated with the value of the CPU dynamic memory variable used as the destination in the in, out, inout, or nocopy clause. When target memory is to be reused, it is located using the value of the CPU dynamic memory variable that is the destination of that transfer. The associations between the CPU address used when target memory is allocated and the target memory are automatically maintained by the offload runtime library. The associations are created or dropped along with target memory allocation or deallocation. Create the association at allocation time using alloc_if(.true.) free_if(.false.) and delete the association at de-allocation time using free_if(.true.).
Pointers to static data on the CPU are special-cased. The alloc_if and free_if modifiers are ignored when the following are both true:
The CPU address used during creation of a target memory association points to statically allocated data.
The variable is also available in the target binary because it has ATTRIBUTES OFFLOAD:mic.
The target's statically allocated memory is used as the destination of the transfer. This target memory is not dynamically allocated and never freed.
There is only one block of target memory associated with a CPU address. It is an error to call alloc_if(.true.) to create a second association for a CPU address before freeing the existing one. The new association overwrites the earlier one, which has the potential for causing memory leaks on the target.
It is an error to call free_if(.true.) for transferred dynamic memory if a matching association is not found. The attempted removal of an association is silently ignored. An association can be made with a CPU address, and a certain length, and another association made with a different CPU address within that range. Since origin addresses are different, you can use alloc_if and free_if to create distinct target allocations.
When memory is allocated for a dynamic memory variable on the target, it is aligned at the natural boundary for the declared type. Sometimes it may be necessary to request that the data be aligned on larger boundaries, for example, when the program expects to use assembly code or intrinsic functions or vectorization, that operate on data with more stringent alignment requirements. In these cases, the align modifier may be used to specify an alignment. The operand of the align modifier must be an integral expression which evaluates to a power of two. The expression is evaluated on the host and the region of memory allocated for the dynamic memory on the target is aligned at a boundary that is greater than or equal to the value of the expression. When the output value is received on the host, no memory allocation is done. The variables listed in out clauses must point to allocated memory of sufficient size to receive the results.
For optimal data transfer performance, by default, the target memory address for a transfer through dynamic memory is made to match the offset within 64 bytes of the CPU data. That is, if the CPU source address is 16 bytes past a 64 byte boundary, the target data address will also be 16 bytes past a 64 byte boundary.
The align modifier overrides this default and aligns the target memory at the requested alignment. To get the benefits of fast data transfer and the necessary alignment on the target, ensure that the CPU data is aligned on the same boundary as the alignment needed on the target. Doing so meets the requirements for fast data transfer and the requirements for target data alignment.
The following example illustrates the default behavior, which is no data persistence on the coprocessor.
The compiler allocates and frees data around the offload. No alloc or free modifiers are necessary.
real, dimension(:), pointer :: p !DIR$ ATTRIBUTES OFFLOAD:mic :: p real, dimension(10), target :: targ !DIR$ ATTRIBUTES OFFLOAD:mic :: targ p->targ !DIR$ OFFLOAD TARGET (mic) in (p)
The following examples illustrate keeping data on the coprocessor between offloads.
The following code allocates memory for p as part of this offload, and keeps the memory allocated for p after the offload.
Notice that ALLOC is the default, and you do not need to explicitly specify it.
!DIR$ OFFLOAD TARGET (mic) in (p : alloc_if(.true.) free_if(.false.))
The following code reuses the memory allocated for p previously. It only transfers fresh data into that memory, and after the offload completes, it continues to retain the memory.
!DIR$ OFFLOAD TARGET (mic) in (p : alloc_if(.false.) & !DIR$ free_if(.false.))
The following code reuses the memory allocated for p previously. However, it frees the memory for p after this offload.
Notice that FREE is the default, and you do not need to explicitly specify it.
!DIR$ OFFLOAD TARGET (mic) in (p : alloc_if(.false.) free_if(.true.))
The following code uses a pointer to create a memory allocation on the target. Then the pointer value is passed to another function. Through the pointer value, the target memory can be reused. Notice that the offload_transfer directive uses subscript notation. The length modifier is not required when a variable is specified in array notation.
! Transfer through a routine argument Module m integer, allocatable :: p(:) !dir$ attributes offload:mic :: p contains subroutine foo (arg_p, count) !dir$ attributes offload:mic :: foo integer, allocatable :: arg_p(:) integer count … end subroutine foo end module m program test use m allocate(p(100000)) !dir$ offload_transfer target (mic:0) in(p (100000) : & !dir$ alloc_if(.true)) call foo (p, 100000) end
The following transfers static data to the target. The target static data allocation for the matching CPU variable is automatically used.
! When bar is called with array_cpu_only, dynamic memory is used on target ! when bar is called with array_both, the target array_both is used module m integer array_both(1000) !dir$ attributes offload : mic :: array_both integer array_cpu_only(1000) end module m subroutine foo() use m call bar(array_cpu_only, 1000) call bar(array_both, 1000) end subroutine foo subroutine bar(iarray, count) integer iarray(count) !dir$ offload begin target (mic:0) in(iarray : & !dir$ alloc_if(.false.)) iarray=4 !dir$ end offload end subroutine bar
The following code shows the use of dummy arguments for allocatable arrays mixed between the CPU and the target.
module mod integer, target, allocatable :: arr(:) integer, pointer, dimension(:) :: ptr !dir$ attributes offload:mic :: ptr, arr end module mod program main use mic_lib use mod implicit none integer i, bar1, bar2 allocate(arr(1000)) ptr => arr(1:100) ! copies array to target !dir$ offload begin target(mic:0) in(arr : free_if(.false.)) arr = (/ (i, i=1,1000) /) ! 1, 2, 3, ... !dir$ end offload ! bar2 will use dynamically allocated memory on the target print *, bar2() ! 0 ! bar1 will use allocated "arr" from above print *, bar1(ptr, 100) ! /= 0 end program main integer function bar2() implicit none integer i, bar1 integer, allocatable :: my_p(:) allocate (my_p(100)) ! first 100 elements of my_p are allocated & transferrred to target ("in" implies alloc_if(.true.)) !dir$ offload begin target(mic:0) in(my_p:length(100) & !dir$ free_if(.false.)) my_p = (/ (0, i=1,100) /) ! 0, 0, 0, ... !dir$ end offload bar2 = bar1(my_p, 100) return end function bar2 integer function bar1(iarray, len) implicit none integer :: iarray(len) integer len, sum, i !dir$ offload begin target(mic:0) in(iarray:length(0) & !dir$ alloc_if(.false.) free_if(.false.)) sum = 0 do i = 1, len sum = sum + iarray(i) end do !dir$ end offload bar1 = sum return end function bar1