Intel® C++ Compiler 16.0 User and Reference Guide
Following the guidelines below will help auto-vectorization of the loop.
The special __m64, __m128, and __m256 datatypes are not vectorizable. The loop body cannot contain any function calls. Use of the Intel® Streaming SIMD Extensions (Intel® SSE) and Intel® Advanced Vector Extensions (Intel® AVX) intrinsics (for example, mm_add_ps) is not allowed.