Intel® C++ Compiler 16.0 User and Reference Guide
These Supplemental Streaming SIMD Extensions 3 (SSSE3) intrinsics perform horizontal addition: each one adds adjacent pairs of elements within its operands. The prototypes for these intrinsics are in the tmmintrin.h header file. You can also use the ia32intrin.h header file for these intrinsics.
extern __m128i _mm_hadd_epi16(__m128i a, __m128i b);
Adds horizontally packed signed words. Interpreting a, b, and r as arrays of 16-bit signed integers:
for (i = 0; i < 4; i++) { r[i] = a[2*i] + a[2*i+1]; r[i+4] = b[2*i] + b[2*i+1]; }
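The element ordering can be illustrated with a minimal sketch, which is not part of the reference itself. The names hadd_words and hadd_words_demo and the scalar fallback are assumptions made for illustration; _mm_hadd_epi16, _mm_loadu_si128, and _mm_storeu_si128 are the actual intrinsics. The intrinsic path is compiled only when the compiler defines __SSSE3__ (for example, with -mssse3 on GCC-compatible compilers); otherwise a scalar loop mirroring the pseudocode is used.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSSE3__)
#include <tmmintrin.h>
#endif

/* Hypothetical wrapper: r[0..3] receive the pair sums of a,
   r[4..7] receive the pair sums of b. */
static void hadd_words(const int16_t a[8], const int16_t b[8], int16_t r[8])
{
#if defined(__SSSE3__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)r, _mm_hadd_epi16(va, vb));
#else
    /* Scalar fallback: sums wrap modulo 2^16, with no saturation.
       The final conversion to int16_t assumes two's-complement wrapping,
       which mainstream compilers provide. */
    for (int i = 0; i < 4; i++) {
        r[i]     = (int16_t)(uint16_t)(a[2*i] + a[2*i+1]);
        r[i + 4] = (int16_t)(uint16_t)(b[2*i] + b[2*i+1]);
    }
#endif
}

/* Usage example with small values that cannot overflow. */
static void hadd_words_demo(void)
{
    const int16_t a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    const int16_t b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    const int16_t expected[8] = {3, 7, 11, 15, 30, 70, 110, 150};
    int16_t r[8];

    hadd_words(a, b, r);
    for (int i = 0; i < 8; i++)
        assert(r[i] == expected[i]);
}
```

Note that the low half of the result comes entirely from a and the high half entirely from b; the two operands are never added to each other.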
extern __m128i _mm_hadd_epi32(__m128i a, __m128i b);
Adds horizontally packed signed doublewords. Interpreting a, b, and r as arrays of 32-bit signed integers:
for (i = 0; i < 2; i++) { r[i] = a[2*i] + a[2*i+1]; r[i+2] = b[2*i] + b[2*i+1]; }
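One common use of this intrinsic is reducing a vector to a single sum by applying it twice with the same operand in both positions. The following is a sketch under stated assumptions: the helper name sum4_epi32, its demo function, and the scalar fallback are hypothetical, and the intrinsic path is compiled only when the compiler defines __SSSE3__.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSSE3__)
#include <tmmintrin.h>
#endif

/* Hypothetical helper: returns v[0] + v[1] + v[2] + v[3],
   wrapping modulo 2^32 on overflow. */
static int32_t sum4_epi32(const int32_t v[4])
{
#if defined(__SSSE3__)
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    x = _mm_hadd_epi32(x, x);    /* {v0+v1, v2+v3, v0+v1, v2+v3} */
    x = _mm_hadd_epi32(x, x);    /* every element now holds the total */
    return _mm_cvtsi128_si32(x); /* extract element 0 */
#else
    /* Scalar fallback: the same two pairwise steps, done in unsigned
       arithmetic so that overflow wraps instead of being undefined. */
    uint32_t lo = (uint32_t)v[0] + (uint32_t)v[1];
    uint32_t hi = (uint32_t)v[2] + (uint32_t)v[3];
    return (int32_t)(lo + hi);
#endif
}

/* Usage example. */
static void sum4_epi32_demo(void)
{
    const int32_t v[4] = {1, 2, 3, 4};
    assert(sum4_epi32(v) == 10);
}
```

Passing the same vector as both a and b duplicates the pair sums in the upper half of the result, which is what lets the second call finish the reduction.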
extern __m128i _mm_hadds_epi16(__m128i a, __m128i b);
Adds horizontally packed signed words with signed saturation. Interpreting a, b, and r as arrays of 16-bit signed integers:
for (i = 0; i < 4; i++) { r[i] = signed_saturate_to_word(a[2*i] + a[2*i+1]); r[i+4] = signed_saturate_to_word(b[2*i] + b[2*i+1]); }
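The pseudocode uses signed_saturate_to_word without defining it. A reasonable reading, consistent with signed saturation, is a clamp of the full-width sum to the range of a signed 16-bit integer. The scalar model below is a sketch on that assumption; hadds_words_model and hadds_words_demo are hypothetical names, and the code does not call the intrinsic itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed meaning of the pseudocode helper: clamp a wider
   intermediate sum to the signed 16-bit range. */
static int16_t signed_saturate_to_word(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Scalar model of _mm_hadds_epi16, following the pseudocode. */
static void hadds_words_model(const int16_t a[8], const int16_t b[8],
                              int16_t r[8])
{
    for (int i = 0; i < 4; i++) {
        r[i]     = signed_saturate_to_word((int32_t)a[2*i] + a[2*i+1]);
        r[i + 4] = signed_saturate_to_word((int32_t)b[2*i] + b[2*i+1]);
    }
}

/* Usage example showing both clamping directions. */
static void hadds_words_demo(void)
{
    const int16_t a[8] = {32767, 1, -32768, -1, 100, 200, -3, 3};
    const int16_t b[8] = {0, 0, 0, 0, 0, 0, 0, 0};
    int16_t r[8];

    hadds_words_model(a, b, r);
    assert(r[0] == 32767);   /* 32768 clamps to the maximum */
    assert(r[1] == -32768);  /* -32769 clamps to the minimum */
    assert(r[2] == 300);     /* in-range sums are unchanged */
    assert(r[3] == 0);
}
```

With the same inputs, the non-saturating _mm_hadd_epi16 would wrap the first sum to -32768 rather than clamping it to 32767.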
extern __m64 _mm_hadd_pi16(__m64 a, __m64 b);
Adds horizontally packed signed words. Interpreting a, b, and r as arrays of 16-bit signed integers:
for (i = 0; i < 2; i++) { r[i] = a[2*i] + a[2*i+1]; r[i+2] = b[2*i] + b[2*i+1]; }
extern __m64 _mm_hadd_pi32(__m64 a, __m64 b);
Adds horizontally packed signed doublewords. Interpreting a, b, and r as arrays of 32-bit signed integers:
r[0] = a[1] + a[0]; r[1] = b[1] + b[0];
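To make the element numbering concrete, the sketch below models this operation on raw 64-bit values, taking element 0 as the low 32 bits and element 1 as the high 32 bits. The names hadd_pi32_model and hadd_pi32_demo are hypothetical, and the code is a portable scalar illustration that does not use the __m64 type or the intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of _mm_hadd_pi32 on raw 64-bit values.
   Element 0 is the low 32 bits, element 1 the high 32 bits. */
static uint64_t hadd_pi32_model(uint64_t a, uint64_t b)
{
    uint32_t r0 = (uint32_t)a + (uint32_t)(a >> 32);  /* a[0] + a[1] */
    uint32_t r1 = (uint32_t)b + (uint32_t)(b >> 32);  /* b[0] + b[1] */
    return ((uint64_t)r1 << 32) | r0;
}

/* Usage example: a = {1, 2} and b = {7, 5} give r = {3, 12}. */
static void hadd_pi32_demo(void)
{
    uint64_t a = 0x0000000200000001ULL;
    uint64_t b = 0x0000000500000007ULL;
    assert(hadd_pi32_model(a, b) == 0x0000000C00000003ULL);
}
```

Unsigned arithmetic is used so that overflow wraps modulo 2^32, which matches the non-saturating behavior of this intrinsic.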
extern __m64 _mm_hadds_pi16(__m64 a, __m64 b);
Adds horizontally packed signed words with signed saturation. Interpreting a, b, and r as arrays of 16-bit signed integers:
for (i = 0; i < 2; i++) { r[i] = signed_saturate_to_word(a[2*i] + a[2*i+1]); r[i+2] = signed_saturate_to_word(b[2*i] + b[2*i+1]); }