Intel® C++ Compiler 16.0 User and Reference Guide
This topic only applies to Intel® 64 and IA-32 architectures targeting Intel® Graphics Technology.
Offloaded code has the following restrictions:
You can place #pragma offload target(gfx) only immediately before a perfect loop nest written explicitly with _Cilk_for loops, or before an Intel® Cilk™ Plus array notation statement.
Do not use the __GFX__ macro inside the statement that follows a #pragma offload. You can, however, use this macro in a function called from that statement.
All macros inside a #pragma offload region must expand to code that uses the same set of variables in both the target and the host versions. For example, the following code leads to undefined behavior, because N, inputArray, and outputArray are not used in the host (CPU) version of the code:
```cpp
#pragma offload target(gfx) pin(inputArray, outputArray: length(N))
_Cilk_for (int i = 0; i < N; i++) {
#ifdef __GFX__
    outputArray[i] = inputArray[i] / N;
#else
    // nothing
#endif
}
```
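For contrast, here is a minimal sketch of the pattern the rules above permit. The loop body calls a function, and __GFX__ is tested only inside that function. Both branches of the test read the same variables.

The sketch rests on two assumptions: that __declspec(target(gfx)) marks a function as callable from offloaded code, and that __cilk is defined when Cilk Plus keywords are available. PAR_FOR, GFX_FUNC, divide_by_count, and normalize are hypothetical names. On compilers without offload support, the pragma is ignored and the loop runs serially on the host.

```cpp
#include <cassert>

// Hypothetical helper macros (not part of the Intel compiler) so the sketch
// also builds, serially, with compilers that lack Intel Cilk Plus.
// Assumptions: __cilk signals Cilk Plus support, and __declspec(target(gfx))
// makes a function callable from offloaded code.
#if defined(__INTEL_COMPILER) && defined(__cilk)
#define PAR_FOR _Cilk_for
#define GFX_FUNC __declspec(target(gfx))
#else
#define PAR_FOR for
#define GFX_FUNC
#endif

// __GFX__ is tested here, inside a called function, instead of inside the
// statement that follows #pragma offload. Both branches read the same
// variables (value, n), so the host and target versions stay consistent.
GFX_FUNC float divide_by_count(float value, int n)
{
#ifdef __GFX__
    return value * (1.0f / n);  // target variant (illustrative only)
#else
    return value / n;           // host variant
#endif
}

void normalize(const float* inputArray, float* outputArray, int N)
{
    assert(N > 0);
    // Compilers without offload support ignore this pragma (possibly with a warning).
#pragma offload target(gfx) pin(inputArray, outputArray: length(N))
    PAR_FOR (int i = 0; i < N; i++) {
        outputArray[i] = divide_by_count(inputArray[i], N);
    }
}
```

With inputs {4, 8, 12, 16} and N = 4, the host build writes {1, 2, 3, 4} to outputArray.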
Offload to the target is possible only from a session that has access to the graphics driver. This restriction applies only to versions of Windows* earlier than Windows 8 and Windows Server* 2012. In practice:
Offload is possible:
from a local desktop session, when the desktop is not locked and no screen saver is running.
from a Remote Desktop connection, when the remote desktop client window is open.
Offload is not possible from Session 0, in which all services run on Windows Vista* OS, Windows* 7 OS, Windows Server* 2008, and later versions.
By default, the system does not allow an offload task to run longer than the system recovery timeout period, which is usually two seconds. In the offload runtime, the task then appears to hang and is terminated abnormally after another timeout of 32 seconds. To let your offload tasks run longer than 32 seconds, refer to Microsoft's documentation on the Timeout Detection and Recovery (TDR) registry keys and disable recovery on timeout in the system registry.
On a machine with a discrete graphics card installed, you must make the target the primary graphics device before you can offload to it. This restriction applies only to versions of Windows* earlier than Windows 8 and Windows Server* 2012.
The parallel loops associated with #pragma offload must be perfectly nested and must follow the requirements for _Cilk_for. In addition, the loop counter variable of each of these loops must be of type int or unsigned, and the stride must be known at compile time.
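The sketch below shows a loop nest that meets these requirements:

- two parallel loops with nothing between their headers
- int counters
- a constant unit stride

A commented-out variant that breaks two of the rules follows it. The sketch reuses a hypothetical PAR_FOR macro. It expands to _Cilk_for only when Cilk Plus is assumed to be available (signaled by __cilk), and to a serial for loop otherwise. add_matrices is likewise an illustrative name.

```cpp
#include <cassert>

// Hypothetical helper macro (not part of the Intel compiler): _Cilk_for when
// Cilk Plus is assumed available (__cilk defined), a serial for loop otherwise.
#if defined(__INTEL_COMPILER) && defined(__cilk)
#define PAR_FOR _Cilk_for
#else
#define PAR_FOR for
#endif

// c = a + b for row-major rows x cols matrices.
// Perfect nest: no statement between the two loop headers, int counters,
// and a stride (1) that is a compile-time constant.
void add_matrices(const float* a, const float* b, float* c, int rows, int cols)
{
    assert(rows >= 0 && cols >= 0);
    const int total = rows * cols;
#pragma offload target(gfx) pin(a, b, c: length(total))
    PAR_FOR (int i = 0; i < rows; i++)
        PAR_FOR (int j = 0; j < cols; j++)
            c[i * cols + j] = a[i * cols + j] + b[i * cols + j];
}

// Not offloadable as written:
//   PAR_FOR (int i = 0; i < rows; i++) {
//       int base = i * cols;                       // statement between headers: imperfect nest
//       PAR_FOR (int j = 0; j < cols; j += step)   // stride not known at compile time
//           c[base + j] = a[base + j] + b[base + j];
//   }
```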
If your application runs code on the target and the host in parallel, ensure that the host and the target do not modify the same cache lines, to avoid false sharing. For example, you can pad variables passed in the pin clause to a multiple of the cache line size. If the host and the target operate on the same array, you can instead organize the write accesses in your parallel offload code so that the host and the target write to different cache lines. Applying __declspec(avoid_false_share) to a variable aligns and pads it so that it is not subject to false sharing with any other variable.
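The code below is a portable sketch of the padding approach. It uses standard alignas instead of __declspec(avoid_false_share) to give each host-written and target-written variable its own cache line. It also rounds buffer sizes up to a whole number of lines. It assumes 64-byte cache lines; PaddedCounter, SharedCounters, and pad_to_cache_line are hypothetical names.

```cpp
#include <cstddef>

// Assumption: 64-byte cache lines, typical of current Intel processors;
// check the value for your hardware. __declspec(avoid_false_share) is the
// Intel-specific alternative; standard alignas is used here instead.
constexpr std::size_t kCacheLine = 64;

// Each counter fills a whole cache line, so host writes to host_count and
// target writes to gfx_count never land on the same line.
struct alignas(kCacheLine) PaddedCounter {
    int value;
};

struct SharedCounters {
    PaddedCounter host_count;  // written only by host code
    PaddedCounter gfx_count;   // written only by offloaded code
};

static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");
static_assert(offsetof(SharedCounters, gfx_count) == kCacheLine,
              "counters start on different cache lines");

// Rounds a buffer size in bytes up to a multiple of the cache line size,
// for example before allocating an array that is passed in the pin clause.
constexpr std::size_t pad_to_cache_line(std::size_t bytes)
{
    return (bytes + kCacheLine - 1) / kCacheLine * kCacheLine;
}
```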
The header file math.h expands single-precision functions, such as sinf, to double precision, and the compiler cannot always convert them back. This can cause performance problems or even compiler failures. To work around this problem, include mathimf.h instead.
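Here is a minimal sketch of the workaround. It includes mathimf.h when building with the Intel compiler and keeps the arithmetic in single precision, using f-suffixed literals and functions from the supported list below (sqrtf, expf). The fallback to math.h for other compilers exists only so the sketch builds elsewhere. gaussian_weight is an illustrative name.

```cpp
#include <cassert>

// mathimf.h ships with the Intel compiler; the math.h fallback is only so
// that this sketch builds with other compilers.
#if defined(__INTEL_COMPILER)
#include <mathimf.h>
#else
#include <math.h>
#endif

// Gaussian weight of an offset (dx, dy) for a given sigma, kept entirely in
// single precision: f-suffixed literals, and sqrtf/expf from the supported list.
float gaussian_weight(float dx, float dy, float sigma)
{
    assert(sigma > 0.0f);
    const float r = sqrtf(dx * dx + dy * dy) / sigma;
    return expf(-0.5f * r * r);
}
```

Writing -0.5 instead of -0.5f would promote the expression to double, which is exactly the conversion this restriction warns about.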
The compiler supports only a subset of math functions. When possible, these functions map directly to the Intel® Graphics Technology instruction set architecture; otherwise, they are implemented in the SVML library supplied with the compiler. Only the following functions are supported:
acos/acosf
acosh/acoshf
asin/asinf
asinh/asinhf
atan/atanf
atanh/atanhf
cbrt/cbrtf
sqrt/sqrtf
ceil/ceilf
cos/cosf
erf/erff
erfc/erfcf
exp/expf
exp10/exp10f
exp2/exp2f
expm1/expm1f
fabs/fabsf
floor/floorf
invsqrt/invsqrtf
log/logf
log10/log10f
log1p/log1pf
log2/log2f
nearbyint/nearbyintf
rint/rintf
round/roundf
sin/sinf
sinh/sinhf
tan/tanf
tanh/tanhf
trunc/truncf
copysign/copysignf
atan2/atan2f
fmax/fmaxf
fmin/fminf
hypot/hypotf
pow/powf
Double-precision division is also supported and is translated to a call to an SVML function.
The compiler does not support variable-length array (VLA) allocation in heterogeneous code. For example, instead of float myArray[variableSize];, where variableSize is a variable, use float myArray[CONSTANT_SIZE];, where CONSTANT_SIZE is a compile-time constant.
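The following sketch shows the replacement in context. A local scratch buffer is sized by a compile-time bound, and the runtime element count is kept in a separate variable. The bound kMaxCount and the function reverse_prefix are hypothetical; choose a bound that covers your largest input.

```cpp
#include <cassert>

// Hypothetical compile-time upper bound that replaces the variable array
// size; pick a limit that fits your data.
constexpr int kMaxCount = 64;

// Reverses the first `count` elements of `data` through a local buffer.
// Not supported in heterogeneous code (and not standard C++ either):
//     float tmp[count];
// Supported: a fixed-size buffer plus a runtime element count.
void reverse_prefix(float* data, int count)
{
    assert(count >= 0 && count <= kMaxCount);
    float tmp[kMaxCount];  // size is a compile-time constant
    for (int k = 0; k < count; k++)
        tmp[k] = data[count - 1 - k];
    for (int k = 0; k < count; k++)
        data[k] = tmp[k];
}
```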
long double operations are not supported in target code.
setjmp/longjmp are not supported in target code.
Indirect control flow is not supported in target code, including:
Function pointers, including taking the address of a function and calling a function through a pointer
Calls to virtual functions
Switch statements are supported in target code.
Exceptions are not allowed in target code.
RTTI is not supported in target code.
Functions with a variable number of arguments (...) are not supported in target code.
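Switch statements are supported, but function pointers and virtual calls are not. One way to keep runtime-selectable behavior is therefore to pass an operation tag and dispatch with a switch, as in this sketch. BlendOp and apply_blend are hypothetical names. The function is constexpr only so that its results can be checked at compile time; a switch inside a constexpr function requires C++14 or later.

```cpp
// Hypothetical operation tag used instead of a function pointer or a
// virtual call: the caller passes a value, and a switch selects the code.
enum class BlendOp { Add, Multiply, Max };

// Instead of:  float (*op)(float, float) = ...; result = op(a, b);
// or a virtual Blend::apply(a, b), dispatch directly with a switch.
constexpr float apply_blend(BlendOp op, float a, float b)
{
    switch (op) {
    case BlendOp::Add:
        return a + b;
    case BlendOp::Multiply:
        return a * b;
    case BlendOp::Max:
        return a > b ? a : b;
    }
    return 0.0f;  // not reached for valid BlendOp values
}
```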
None of the following restrictions apply when you use Shared Virtual Memory (SVM) mode.
Sharing or copying pointers between the host and the target is not supported. Pointers have different meanings on the target and the host, so a pointer value that is valid on the host is meaningless on the target. Pointer values are not translated automatically.
Offloaded code cannot use arrays of pointer-typed elements, pointers to pointers, or pointer-typed members of structures or classes.
Pointer- or reference-typed arguments to a target(gfx) vector function must be either linear or uniform, and such vector functions cannot return pointer- or reference-typed values.
Global or static variables cannot be of pointer or reference types.
Conversion between pointer types and non-pointer types is not allowed.
Taking the address of a pointer or reference is not allowed.
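Without SVM, a structure such as an array of row pointers (float**) cannot be used in offloaded code. A common workaround, sketched below, is to flatten the rows into one contiguous array and describe the row boundaries with integer offsets. Only plain arrays are then passed in pin clauses.

The following are illustrative assumptions, not a confirmed API usage:
- the function row_sums and its data layout
- using a separate pin clause per array
- the serial inner loop in the offloaded body
- the PAR_FOR macro, which is a serial for unless Cilk Plus is assumed available via __cilk

```cpp
#include <cassert>

// Hypothetical helper macro (not part of the Intel compiler): _Cilk_for when
// Cilk Plus is assumed available (__cilk defined), a serial for loop otherwise.
#if defined(__INTEL_COMPILER) && defined(__cilk)
#define PAR_FOR _Cilk_for
#else
#define PAR_FOR for
#endif

// Rows are stored back to back in `values` instead of as float** rows.
// Row r occupies values[offsets[r]] .. values[offsets[r + 1] - 1], so
// `offsets` has num_rows + 1 entries and offsets[num_rows] is the total count.
void row_sums(const float* values, const int* offsets, float* sums, int num_rows)
{
    assert(num_rows >= 0);
    const int num_values = offsets[num_rows];
    const int num_offsets = num_rows + 1;
#pragma offload target(gfx) pin(values: length(num_values)) \
    pin(offsets: length(num_offsets)) pin(sums: length(num_rows))
    PAR_FOR (int r = 0; r < num_rows; r++) {
        float s = 0.0f;  // serial inner loop over one row
        for (int k = offsets[r]; k < offsets[r + 1]; k++)
            s += values[k];
        sums[r] = s;
    }
}
```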
The following pragmas are not supported:
offload_transfer
offload_wait
The following specifiers of pragma offload are not supported:
signal
wait
mandatory
The following modifiers of pragma offload parameters are not supported:
alloc_if
free_if
alloc
into
The target-number in #pragma offload target(target-name[:target-number]) is ignored.
Local scalar variables can only be passed in the in clause of #pragma offload.
Listing local scalar variables in the in clause is redundant and can be omitted, because the compiler automatically adds variables used in the lexical scope of a #pragma offload statement to the in clause. Local scalar variables are passed by value, so any updates to them inside the target code are not visible on the host after the offload completes.
For example, the following code prints var = 55, i = 0.
```cpp
int var = 55;
int i = 0;
#pragma offload target(gfx)
_Cilk_for (i = 0; i < 1; i++) {
    ++var;
}
printf("var = %d, i = %d\n", var, i);
```
The following code results in a compile-time error, because the local variable var can be listed only in the in clause:
```cpp
int var = 55;
int i = 0;
#pragma offload target(gfx) inout(var)
…
```
Global or static variables cannot be listed in the pin clause of #pragma offload.
OpenMP* run-time library routines are not available on the processor graphics; parallelization happens on the host side. Therefore, you cannot call the OpenMP run-time APIs to change target-side behavior, such as task scheduling. None of the OpenMP environment variables, including those beginning with OMP_, KMP_, and GOMP_, are supported.
To use driver-managed Shared Virtual Memory (SVM):
Your target hardware platform should have a 5th generation Intel® Core™ processor with Intel® HD Graphics.
For Windows*, the target OS must be Windows 10 or later. For Linux*, see the Release Notes for target OS requirements.