Managing Memory Allocation for Pointer Variables

This topic only applies to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).

Memory management on the CPU for pointer variables used in offloaded programs is the same as in non-offload programs. That is, the offload pragmas do not affect memory allocation and freeing on the CPU. As in any program, you are responsible for allocating and freeing this memory.

Memory management on the coprocessor for pointer variables named in in and out clauses of #pragma offload is done automatically by the compiler and runtime system.

Target Memory Management for Input Pointer Variables

For in variables of a #pragma offload, the default behavior is to perform a fresh memory allocation for each pointer variable. On return from the construct, the memory is deallocated. To retain data between offloads, use the alloc_if and free_if qualifiers to modify the default memory allocation behavior on the coprocessor.

The alloc_if qualifier specifies a boolean condition that controls whether the pointer variables in the in clause are allocated a fresh block of memory on the target when the construct is executed on the target. If the condition evaluates to TRUE, a fresh memory allocation is performed for each variable listed in the clause. If the condition evaluates to FALSE, the existing pointer values on the target are reused. In that case, you must ensure that a block of memory of sufficient size was previously allocated for the variables on the target and retained by a free_if(FALSE) clause on an earlier offload.

The free_if qualifier specifies a boolean condition that controls whether the memory allocated for the pointer variables in an in clause is freed. If the condition evaluates to TRUE, the memory pointed to by each variable listed in the clause is freed. If the condition evaluates to FALSE, no action is taken on the memory pointed to by the variables in the list, and a subsequent offload can reuse the allocated memory.

The alloc_if and free_if boolean expressions are evaluated on the CPU at the point the construct is offloaded to the target.
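The following is a minimal sketch of conditions that change from one offload to the next: the target memory is allocated on the first iteration, reused on the following ones, and freed on the last. It assumes a compiler with offload support; a compiler without it ignores the pragma and runs the loop on the CPU. The names N, STEPS, and run_steps are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define N     1000   /* hypothetical buffer length */
#define STEPS 4      /* hypothetical number of offloads */

/* Offloads STEPS times while keeping one target allocation alive. */
double run_steps(void)
{
    double *p = malloc(N * sizeof *p);
    double total = 0.0;
    assert(p != NULL);

    for (int i = 0; i < STEPS; i++) {
        /* Refresh the host data before each transfer. */
        for (int j = 0; j < N; j++)
            p[j] = (double)i;

        double sum = 0.0;
        /* Both conditions are evaluated on the CPU when the offload starts:
           allocate only on the first step, free only on the last one. */
        #pragma offload target(mic) \
            in(p : length(N) alloc_if(i == 0) free_if(i == STEPS - 1)) \
            inout(sum)
        {
            for (int j = 0; j < N; j++)
                sum += p[j];
        }
        total += sum;
    }

    free(p);   /* CPU memory is still managed by the program */
    return total;
}
```

Because the conditions are ordinary expressions, the same pragma serves for allocation, reuse, and release without duplicating the offloaded block.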

Target Memory Management for Output Pointer Variables

By default, an out variable is allocated fresh memory on the target at the start of an offload, and that memory is freed at the end of the offload. The alloc_if and free_if modifiers change these defaults. Their expressions are evaluated on the host and control memory allocation on the coprocessor.

When the output value is received on the host, no memory allocation is done. The variables listed in out clauses must point to allocated memory of sufficient size to receive the results on the host.
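As an illustration of this requirement, the sketch below allocates the host buffer before the offload so that the results have somewhere to land. It assumes a compiler with offload support (otherwise the pragma is ignored and the block runs on the CPU); N and fill_squares are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

#define N 256   /* hypothetical element count */

/* Computes squares on the target and returns the last one. */
int fill_squares(void)
{
    /* The host buffer must exist before the offload: the runtime
       copies results into it but never allocates it. */
    int *result = malloc(N * sizeof *result);
    assert(result != NULL);

    /* Target memory for result is allocated and freed automatically. */
    #pragma offload target(mic) out(result : length(N))
    {
        for (int i = 0; i < N; i++)
            result[i] = i * i;
    }

    int last = result[N - 1];
    free(result);
    return last;
}
```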

Transferring Data into Pre-allocated Memory on the Target

As described in the previous section, a pointer variable in an in, out, inout, or nocopy clause retains its target memory allocation when you set the free_if modifier to false. You can reuse that memory in subsequent offloads by naming the pointer in an in, out, inout, or nocopy clause and specifying alloc_if(0).

When target memory is allocated, it is associated with the value of the CPU pointer variable used as the destination in the in, out, inout, or nocopy clause. When target memory is to be reused, it is located using the value of the CPU pointer variable that is the destination of that transfer. The offload runtime library automatically maintains the associations between the CPU address used when target memory is allocated and the target memory itself. Associations are created and dropped along with target memory allocation and deallocation: create the association at allocation time using alloc_if(1) free_if(0), and delete it at deallocation time using free_if(1).
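The life cycle of an association can be sketched in three steps: allocate without transferring, reuse, and free. This example assumes a compiler with offload support and assumes that nocopy is accepted on #pragma offload_transfer; without offload support the pragmas are ignored and the code runs on the CPU. The names N and staged_sum are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define N 512   /* hypothetical element count */

/* Allocates target memory once, reuses it, then releases it. */
long staged_sum(void)
{
    int *p = malloc(N * sizeof *p);
    long sum = 0;
    assert(p != NULL);
    for (int i = 0; i < N; i++)
        p[i] = i;

    /* Step 1: allocate target memory and create the association,
       keyed on the current value of p. No data is transferred. */
    #pragma offload_transfer target(mic) nocopy(p : length(N) alloc_if(1) free_if(0))

    /* Step 2: locate the existing block through the value of p,
       copy fresh data into it, and keep it allocated. */
    #pragma offload target(mic) in(p : length(N) alloc_if(0) free_if(0)) inout(sum)
    {
        for (int i = 0; i < N; i++)
            sum += p[i];
    }

    /* Step 3: free the target block, which drops the association. */
    #pragma offload_transfer target(mic) nocopy(p : length(N) alloc_if(0) free_if(1))

    free(p);
    return sum;
}
```

The value of p must not change between the three steps, because the runtime uses that CPU address to find the target block.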

Pointers to static data on the CPU are special-cased. The alloc_if and free_if modifiers are ignored when the following are both true:

The CPU address used during creation of a target memory association points to statically allocated data.
The variable is also available in the target binary because it has __declspec(target(mic)).

The target's statically allocated memory is used as the destination of the transfer. This target memory is not dynamically allocated and never freed.

There is only one block of target memory associated with a CPU address. It is an error to specify alloc_if(1) to create a second association for a CPU address before freeing the existing one. If you do, the new association overwrites the earlier one, and the earlier target block can no longer be located, which can cause memory leaks on the target.

It is also an error to specify free_if(1) for a transferred pointer when no matching association is found. In that case, the attempted removal of an association is silently ignored.

An association can be made with a CPU address and a certain length, and another association can be made with a different CPU address within that range. Because the two origin addresses are different, you can use alloc_if and free_if to create distinct target allocations.

Alignment of Pointer Variables

When memory is allocated for a pointer variable on the target, it is aligned at the natural boundary for the type of the data pointed to by the pointer. Sometimes you need the data aligned on a larger boundary, for example, when the program uses assembly code, intrinsic functions, or array notation that operates on data with more stringent alignment requirements. In these cases, use the align modifier to specify an alignment. The operand of the align modifier must be an integral expression that evaluates to a power of two. The expression is evaluated on the host, and the region of memory allocated for the pointer on the target is aligned at a boundary that is greater than or equal to the value of the expression.

Note

For optimal data transfer performance, by default, the target memory address for a transfer through a pointer is made to match the offset of the CPU data within 64 bytes. That is, if the CPU source address is 16 bytes past a 64-byte boundary, the target data address will also be 16 bytes past a 64-byte boundary.

The align modifier overrides this default and aligns the target memory at the requested alignment. To get the benefits of fast data transfer and the necessary alignment on the target, ensure that the CPU data is aligned on the same boundary as the alignment needed on the target. Doing so meets the requirements for fast data transfer and the requirements for target data alignment.
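A sketch of this recommendation follows: the CPU buffer is allocated on the same 64-byte boundary that is requested for the target. It assumes a compiler with offload support (otherwise the pragma is ignored and the block runs on the CPU) and a C11 library that provides aligned_alloc. The names N, ALIGNMENT, and aligned_offload are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define N         1024   /* hypothetical element count */
#define ALIGNMENT 64     /* must be a power of two */

/* Returns 1 if the host buffer is aligned and the sum is correct. */
int aligned_offload(void)
{
    /* aligned_alloc requires the size to be a multiple of the
       alignment: 1024 * sizeof(float) = 4096 bytes satisfies this. */
    float *a = aligned_alloc(ALIGNMENT, N * sizeof *a);
    assert(a != NULL);
    for (int i = 0; i < N; i++)
        a[i] = 1.0f;

    float sum = 0.0f;
    /* Request the same alignment on the target as on the CPU. */
    #pragma offload target(mic) in(a : length(N) align(ALIGNMENT)) inout(sum)
    {
        for (int i = 0; i < N; i++)
            sum += a[i];
    }

    int host_aligned = ((uintptr_t)a % ALIGNMENT) == 0;
    free(a);
    return host_aligned && sum == 1024.0f;
}
```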

Examples

Consider the following macros, presented here to make the modifiers in the offload clauses more understandable:

#define ALLOC   alloc_if(1)
#define FREE    free_if(1)
#define RETAIN  free_if(0)
#define REUSE   alloc_if(0)

The following example illustrates the default behavior, which is no data persistence on the coprocessor.

The compiler allocates and frees data around the offload. No alloc or free modifiers are necessary.

#pragma offload target(mic) in( p:length(l) )   
...

The following examples illustrate keeping data on the coprocessor between offloads.

The following code allocates memory for p as part of this offload, and keeps the memory allocated for p after the offload.

Notice that ALLOC is the default, and you do not need to explicitly specify it.

#pragma offload target(mic) in (p:length(l) ALLOC RETAIN) 
...

The following code reuses the memory allocated for p previously. It only transfers fresh data into that memory, and after the offload completes, it continues to retain the memory.

#pragma offload target(mic) in (p:length(l) REUSE RETAIN) 
...

The following code reuses the memory allocated for p previously. However, it frees the memory for p after this offload.

Notice that FREE is the default, and you do not need to explicitly specify it.

#pragma offload target(mic) in (p:length(l) REUSE FREE)

The following code uses a pointer to create a memory allocation on the target. Then the pointer value is passed to another function. Through the pointer value, the target memory can be reused. Notice that the #pragma offload_transfer statement uses array notation. The length modifier is not required when a variable is specified in array notation.

// Transfer through a function parameter
void foo(int *arg_p, int count);

int *p = malloc(…);
int count;
void bar()
{
	…
	// Allocate memory on the coprocessor, transfer data, and retain the allocation
	#pragma offload_transfer … in( p[0:count] : RETAIN )
	foo(p, count);
}
void foo(int *arg_p, int count)
{
	// Transfer will succeed: arg_p holds the same CPU address as p
	#pragma offload … in( arg_p[0:count] : REUSE )
	…
}

The following transfers static data to the target. The target static data allocation for the matching CPU variable is automatically used.

// When bar is called with array_cpu_only, dynamic memory is used on target
// When bar is called with array_both, the target array_both is used 

__declspec(target(mic)) int array_both[1000];
int array_cpu_only[1000];
void bar(int *p, int count);
void foo()
{
	bar(&array_cpu_only[0], 1000);
	bar(&array_both[0], 1000);
}
void bar(int *p, int count)
{
	#pragma offload … in(p[0:count] : REUSE)
	…
}

The following code shows mixing of pointers to statics and pointers to dynamically allocated memory between the CPU and the target.

// Associations created by offloading named variables and pointers
// to dynamically allocated variables are treated the same way

#include <stdio.h>
#include <stdlib.h>

__declspec(target(mic)) float array[1000];
float bar1(float *p, int n);
float bar2();

int main()
{
	// copies array to target
	#pragma offload target(mic) in(array)
	{   ...  }

	// bar2 calls bar1, which will use dynamically allocated memory on the target
	printf("%e\n", bar2());

	// bar1 will use statically allocated "array"
	printf("%e\n", bar1(&array[0], 100));
}
float bar2()
{
	float * my_p = malloc(100 * sizeof(float));
	#pragma offload target(mic) in(my_p[0:100] : RETAIN )
	{  ...  }
	return bar1(my_p, 100);
}
float bar1(float *p, int n)
{
	float sum;
	// length(0) transfers no data; REUSE locates the existing target memory
	#pragma offload target(mic) in(p : length(0) REUSE RETAIN)
	{  sum = … <sum of elements in p>  }
	return sum;
}