Vectorization is a key component of Intel Cilk Plus technology. SIMD vectorization is discussed in some depth in the User-mandated or SIMD Vectorization section of the Auto-vectorization key feature chapter, and the simd pragma is documented in the Pragmas section of the Compiler Reference chapter.
In this section we introduce the _Simd keyword, which provides an alternative to the simd pragma. Just like the simd pragma, the _Simd keyword modifies a serial for loop for vectorization. The syntax is as follows:
_Simd [_Safelen(constant-expression)][_Reduction (reduction-identifier : list)]
The _Simd keyword and any clauses must come after the for keyword, as in this example:
for _Simd (int i=0; i<10; i++){
    // loop body
}
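The _Simd keyword is specific to the Intel compiler, so the following sketch uses the OpenMP 4.0 `#pragma omp simd` form instead, on the assumption that its `safelen` clause carries the same meaning as `_Safelen`. The function name `shift_add` and its arguments are hypothetical and chosen only for illustration. Each element depends on the element four positions earlier, which is why a limit of 4 is stated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the Intel documentation).
 * a[i] depends on a[i - 4], so the loop carries a dependence of distance 4. */
void shift_add(float *a, const float *b, int n)
{
    assert(a != NULL && b != NULL);

    /* With the Intel compiler, the loop header would presumably read:
     *   for _Simd _Safelen(4) (int i = 4; i < n; i++)
     * The OpenMP pragma below is a stand-in, assuming equivalent semantics. */
    #pragma omp simd safelen(4)
    for (int i = 4; i < n; i++) {
        a[i] = a[i - 4] + b[i];
    }
}
```

With GCC or Clang the pragma takes effect only when the file is compiled with -fopenmp-simd or -fopenmp; otherwise it is ignored and the loop runs serially with the same result.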
The _Simd keyword can also be used with cilk_for. See the cilk_for keyword documentation for a discussion of the cases where they are most effectively used together.
Differences between the simd pragma and the _Simd keyword:
- The private and lastprivate clauses of the simd pragma are omitted because C and C++ already have variable-scoping rules that let a programmer cleanly declare a private variable within the scope of a loop iteration.
- The linear clause is omitted because the ability to increment multiple variables makes it unnecessary. See the following example:
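float add_floats(float *a, float *b, int n){
    int i=0;
    int j=0;   // second induction variable, advanced by 2 each iteration
    float sum=0;
    // j+=2 in the increment expression takes the place of a linear clause
    for _Simd _Reduction(+:sum) (i=0; i<n; i++, j+=2){
        a[i] = a[i] + b[j];
        sum += a[i];
    }
    return sum;
}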
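The first difference, relying on scoping instead of a private clause, can be sketched in the same way. Because _Simd needs the Intel compiler, this sketch substitutes the OpenMP 4.0 `#pragma omp simd` form and assumes that its `reduction` clause behaves like `_Reduction`. The function `scale_and_sum_squares` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the Intel documentation).
 * Scales each element of a in place and returns the sum of the squared results. */
float scale_and_sum_squares(float *a, int n, float factor)
{
    assert(a != NULL);
    float sum = 0.0f;

    /* With the Intel compiler, the loop header would presumably read:
     *   for _Simd _Reduction(+:sum) (int i = 0; i < n; i++)
     * The OpenMP pragma below is a stand-in, assuming equivalent semantics. */
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; i++) {
        /* t is declared inside the loop body, so every iteration gets its
         * own copy and no private clause is needed. */
        float t = a[i] * factor;
        a[i] = t;
        sum += t * t;
    }
    return sum;
}
```

Declaring `t` outside the loop would make it shared across iterations, which is the situation a private clause exists to handle. Moving the declaration into the loop body avoids the problem at the language level.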
To ensure that your loop is vectorized, keep the following in mind:
- The countable loop used with the _Simd keyword must conform to the for-loop style of the OpenMP* canonical loop form, except that multiple variables may be incremented in the incr-expr (see http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf).
- The loop control variable must be a signed integer type.
- The vector values should be signed 8-, 16-, 32-, or 64-bit integers, single- or double-precision floating-point numbers, or single- or double-precision complex numbers.
- You cannot use any control constructs to jump into or out of a SIMD loop. That includes the break, return, goto, and throw constructs.
- A SIMD loop may contain another loop (for, while, do-while), but using goto to jump out of such an inner loop is not supported. You may use break and continue within the inner loop.
- A SIMD loop performs memory references unconditionally. Therefore, all address computations must result in valid memory addresses, even though such locations may not be accessed if the loop is executed sequentially.
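Two of the rules above lend themselves to a short sketch: keeping every address computation valid, and using break and continue only inside an inner loop. As _Simd requires the Intel compiler, the sketch uses the OpenMP 4.0 `#pragma omp simd` form as a stand-in, which is an assumption rather than the documented construct. The helpers `gather_or_zero` and `row_sums` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the Intel documentation).
 * out[i] = table[idx[i]], or 0 when idx[i] is negative.
 * Assumes table_len > 0 and every non-negative idx[i] is below table_len. */
void gather_or_zero(float *out, const float *table, int table_len,
                    const int *idx, int n)
{
    assert(out != NULL && table != NULL && idx != NULL && table_len > 0);

    /* Intel form, presumably: for _Simd (int i = 0; i < n; i++) */
    #pragma omp simd
    for (int i = 0; i < n; i++) {      /* signed loop control variable */
        int valid = idx[i] >= 0;
        int k = valid ? idx[i] : 0;    /* clamp so table[k] is always a valid address */
        float v = table[k];            /* safe even if the load is performed unconditionally */
        out[i] = valid ? v : 0.0f;
    }
}

/* Hypothetical helper (not from the Intel documentation).
 * For each row of a rows-by-cols matrix, sums the positive entries that
 * appear before the first zero entry. */
void row_sums(float *sums, const float *m, int rows, int cols)
{
    assert(sums != NULL && m != NULL);

    /* Intel form, presumably: for _Simd (int r = 0; r < rows; r++) */
    #pragma omp simd
    for (int r = 0; r < rows; r++) {
        float s = 0.0f;
        for (int c = 0; c < cols; c++) {
            float x = m[r * cols + c];
            if (x == 0.0f) break;      /* leaves only the inner loop */
            if (x < 0.0f) continue;    /* skips to the next inner iteration */
            s += x;
        }
        sums[r] = s;
    }
}
```

In `gather_or_zero`, writing the load as `if (idx[i] >= 0) out[i] = table[idx[i]];` would be correct serially, but a vectorized version may compute and dereference `table[idx[i]]` for every lane, including those with a negative index. Clamping the index first keeps all address computations valid.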