Intel® C++ Compiler 16.0 User and Reference Guide
Resolves dependencies to facilitate auto-parallelization of the immediately following loop (parallel) or prevents auto-parallelization of the immediately following loop (noparallel).
#pragma parallel [clause[ [,]clause]...]
#pragma noparallel
clause
Can be any of the clauses described below, including private, firstprivate, and lastprivate.
Like the private clause, the firstprivate and lastprivate clauses specify a list of scalar and array variables (var) to privatize. An array or pointer variable can take an optional argument (expr), which is an int32 or int64 expression denoting the number of array elements to privatize.
The same var must not appear in both the private and the lastprivate clauses for the same loop, nor in both the private and the firstprivate clauses for the same loop.
When expr is absent, the rules on var are the same as in OpenMP 4.0. When expr is present, the same rules apply, but var must be an array or a pointer variable.
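As a minimal sketch of the private clause, the hypothetical function below privatizes a scratch scalar `t` so that each thread works on its own copy. The function name `smooth3` and the parenthesized list form `private(t)` (borrowed from OpenMP) are assumptions rather than something this page spells out. Compilers that do not recognize the pragma ignore it and run the loop serially.

```c
/* Hypothetical example: three-point moving average.
 * Assumes that `in` and `out` do not overlap. */
void smooth3(const double *in, double *out, int n) {
    double t;   /* scratch value, assigned before it is read in every iteration */
    int i;

    /* private(t): assumed OpenMP-style list syntax; each thread gets its own t. */
#pragma parallel private(t)
    for (i = 1; i < n - 1; i++) {
        t = in[i - 1] + in[i] + in[i + 1];
        out[i] = t / 3.0;
    }
}
```

Because `t` is written before it is read in every iteration, its value does not need to be copied in (firstprivate) or copied out (lastprivate).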
The parallel pragma instructs the compiler to ignore dependencies that it can only assume, but cannot prove, and that would otherwise prevent it from parallelizing the immediately following loop. Dependencies that the compiler can prove are not ignored.
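To illustrate an assumed dependency, consider the sketch below. The compiler cannot know the value of `k` at compile time, so it has to assume that a read of `a[i + k]` may touch an element that another iteration writes. The function name `add_shifted` and the precondition `k >= n` are illustrative assumptions. Under that precondition no element of `a` is both read and written by the loop, which is what the pragma lets the programmer assert.

```c
/* Hypothetical example. Preconditions (the caller's responsibility):
 * k >= n, a has at least n + k elements, and b does not overlap a. */
void add_shifted(float *a, const float *b, int n, int k) {
    int i;

    /* Safe only under the preconditions above: reads come from a[k..k+n-1],
     * writes go to a[0..n-1], and the two ranges are disjoint when k >= n. */
#pragma parallel
    for (i = 0; i < n; i++) {
        a[i] = a[i + k] + b[i];
    }
}
```

If a caller passed a smaller positive `k`, the read and write ranges would overlap and the annotation would no longer be justified.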
The noparallel pragma prevents autoparallelization of the immediately following loop.
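A sketch of noparallel, assuming a loop whose trip count is too small for threading to pay off. The function name `sum4` is made up for illustration.

```c
/* Hypothetical example: the loop runs only four iterations, so threading
 * overhead would likely outweigh any gain from running it in parallel. */
double sum4(const double *v) {
    double s = 0.0;
    int i;

    /* Keep the auto-parallelizer away from this loop only. */
#pragma noparallel
    for (i = 0; i < 4; i++) {
        s += v[i];
    }
    return s;
}
```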
These pragmas take effect only if auto-parallelization is enabled by the [Q]parallel compiler option. This option enables parallelization for both Intel® microprocessors and non-Intel microprocessors. The resulting executable may gain more performance on Intel® microprocessors than on non-Intel microprocessors. Parallelization can also be affected by certain other options, such as the arch, m, or [Q]x compiler options.
Use the parallel pragma with care. If a loop has cross-iteration dependencies, annotating it with this pragma can lead to incorrect program behavior.
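For contrast, the sketch below shows a loop that carries a true cross-iteration dependency: each element depends on the value written by the previous iteration. The function name `running_sum` is hypothetical. Such a loop is best left without the parallel pragma, because running its iterations concurrently could read `a[i - 1]` before it has been updated.

```c
/* Hypothetical example: in-place running (prefix) sum.
 * Iteration i reads a[i - 1], which iteration i - 1 has just written,
 * so the iterations must run in order. Do not annotate this loop
 * with #pragma parallel. */
void running_sum(double *a, int n) {
    int i;
    for (i = 1; i < n; i++) {
        a[i] += a[i - 1];
    }
}
```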
Only use the parallel pragma if it is known that parallelizing the annotated loop will improve its performance.
Example: Using the parallel pragma

    void example(double *A, double *B, double *C, double *D) {
        int i;
        /* Ignore assumed dependencies, such as possible overlap among A, B, C, and D. */
    #pragma parallel
        for (i=0; i<10000; i++) {
            A[i] += B[i] + C[i];
            C[i] += A[i] + D[i];
        }
    }