Intel® C++ Compiler 16.0 User and Reference Guide
The Intel® C++ Class Libraries for SIMD Operations provide a convenient interface to the underlying instructions of the processors specified in Processor Requirements for Use of Class Libraries. These processor-instruction extensions enable parallel processing using the single instruction, multiple data (SIMD) technique, as illustrated in the following figure.
Because a single SIMD instruction performs four operations at once, that instruction does four times the work of its scalar equivalent.
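The difference between four scalar operations and one SIMD operation can be sketched as follows. This is a minimal illustration, not code from this guide: the function names `add4_scalar` and `add4_simd` are hypothetical, and the SIMD version assumes an x86 target with SSE support and uses the intrinsics declared in `<xmmintrin.h>`.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

// Scalar version (hypothetical helper): written as four separate additions.
// Note that an optimizing compiler may vectorize this loop on its own.
void add4_scalar(const float* b, const float* c, float* a) {
    for (int i = 0; i < 4; ++i) {
        a[i] = b[i] + c[i];
    }
}

// SIMD version (hypothetical helper): one packed addition covers all four values.
void add4_simd(const float* b, const float* c, float* a) {
    __m128 vb = _mm_loadu_ps(b);     // load four floats, no alignment requirement
    __m128 vc = _mm_loadu_ps(c);
    __m128 va = _mm_add_ps(vb, vc);  // four additions in a single instruction
    _mm_storeu_ps(a, va);            // store the four results
}
```

Both functions write the same four sums to `a`; only the number of add instructions requested by the source differs.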
These processor instructions can be accessed through assembly inlining, intrinsics, or the C++ SIMD classes. Compare the coding required to add four 32-bit floating-point values using each of the available interfaces:
Assembly Inlining | Intrinsics | SIMD Class Libraries |
---|---|---|
`...`<br>`__m128 a,b,c;`<br>`__asm{`<br>`movaps xmm0,b`<br>`movaps xmm1,c`<br>`addps xmm0,xmm1`<br>`movaps a, xmm0`<br>`}`<br>`...` | `#include <xmmintrin.h>`<br>`...`<br>`__m128 a,b,c;`<br>`a = _mm_add_ps(b,c);`<br>`...` | `#include <fvec.h>`<br>`...`<br>`F32vec4 a,b,c;`<br>`a = b + c;`<br>`...` |
This table shows the addition of four single-precision floating-point values using assembly inlining, intrinsics, and the class libraries. The Intel C++ SIMD Class Libraries version requires the fewest lines of code, and its notation matches standard C++ operator syntax, which makes it easier to write and read than the other two methods.
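To show how a class interface can reduce the intrinsic call to ordinary operator notation, here is a minimal sketch of a wrapper type. The `Vec4f` struct and its `lane` accessor are hypothetical names invented for this illustration. They are not the `F32vec4` class from `<fvec.h>`, whose actual definition and constructor conventions may differ. The sketch assumes an x86 target with SSE support.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_set_ps, _mm_add_ps, _mm_storeu_ps

// Hypothetical wrapper around __m128 (illustration only, not Intel's F32vec4).
struct Vec4f {
    __m128 v;

    Vec4f() : v(_mm_setzero_ps()) {}
    explicit Vec4f(__m128 x) : v(x) {}

    // e0 is stored in the lowest lane. _mm_set_ps takes its arguments from
    // the highest lane to the lowest, so they are passed in reverse order.
    Vec4f(float e0, float e1, float e2, float e3)
        : v(_mm_set_ps(e3, e2, e1, e0)) {}

    // Read one lane by copying the register to memory (simple, not fast).
    float lane(int i) const {
        float tmp[4];
        _mm_storeu_ps(tmp, v);
        return tmp[i];
    }
};

// The overloaded operator forwards to the packed-add intrinsic, so
// "a = b + c" at the call site performs four additions at once.
inline Vec4f operator+(const Vec4f& lhs, const Vec4f& rhs) {
    return Vec4f(_mm_add_ps(lhs.v, rhs.v));
}
```

With this wrapper, `Vec4f a = b + c;` reads like scalar C++ while still mapping to a single packed addition, which is the convenience the class libraries column in the table illustrates.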