Intel® C++ Compiler 16.0 User and Reference Guide
Insert a "%s parallel private(%s)" statement right before the loop at line %d to parallelize the loop.
Add "#pragma parallel private" before the specified loop. This pragma enables the parallelization of the loop at the specified line.
Consider the following:
float A[10][10000];
float B[10][10000];
float C[10][10000];

void foo( int n, int m1, int m2 )
{
    int i, j;
    float W[10000];

    for (i = 0; i < n; i++) {
        for (j = 0; j < m1; j++)
            W[j] = A[i][j] * B[i][j];
        for (j = 0; j < m2; j++)
            C[i][j] += W[j] + 1.0;
    }
}
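In this case, the compiler does not parallelize the outer loop because it cannot determine whether m1 >= m2. If m1 were less than m2, the second inner loop would read elements of W that were not written during the same iteration of the outer loop, so W could not safely be made private.
If you know that m1 >= m2 always holds, and that after the loop no element of W is read before it is written again, then you can use the recommended pragma.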
If you determine it is safe to do so, you can add the pragma as follows:
float A[10][10000];
float B[10][10000];
float C[10][10000];

void foo( int n, int m1, int m2 )
{
    int i, j;
    float W[10000];

#pragma parallel private (W)
    for (i = 0; i < n; i++) {
        for (j = 0; j < m1; j++)
            W[j] = A[i][j] * B[i][j];
        for (j = 0; j < m2; j++)
            C[i][j] += W[j] + 1.0;
    }
}
For each array named in the private clause, every element that is read in the loop must have been written earlier in the same loop iteration. In addition, every element that is read after the loop must first have been written after the loop, before that read.
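The two conditions can be made explicit in plain C. The following is a minimal sketch, not part of the compiler or its runtime: the names private_w_is_safe and foo_checked are hypothetical, and the sketch runs serially. It checks the m1 >= m2 property at run time and declares W inside the outer loop, so each iteration starts with its own copy and W cannot be read after the loop. This approximates the effect that privatizing W is assumed to have.

```c
#define ROWS 10
#define COLS 10000

static float A[ROWS][COLS];
static float B[ROWS][COLS];
static float C[ROWS][COLS];

/* Hypothetical helper: returns 1 when every W[j] read by the second inner
   loop (j < m2) is written by the first inner loop (j < m1) during the
   same outer iteration, and 0 otherwise. */
static int private_w_is_safe(int m1, int m2)
{
    return m2 <= 0 || m1 >= m2;
}

/* Hypothetical wrapper: runs the loop nest only when the bounds fit the
   arrays and the privatization condition holds.
   Returns 0 on success and -1 if the loop nest was not run. */
static int foo_checked(int n, int m1, int m2)
{
    if (n > ROWS || m1 > COLS || m2 > COLS || !private_w_is_safe(m1, m2))
        return -1;

    for (int i = 0; i < n; i++) {
        /* A fresh W per outer iteration: no value is carried from one
           iteration to the next, and W is out of scope after the loop. */
        float W[COLS];

        for (int j = 0; j < m1; j++)
            W[j] = A[i][j] * B[i][j];
        for (int j = 0; j < m2; j++)
            C[i][j] += W[j] + 1.0f;
    }
    return 0;
}
```

For example, foo_checked(1, 4, 3) runs the loop nest, while foo_checked(1, 2, 3) returns -1 because W[2] would be read without having been written in that iteration.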