Intel® C++ Compiler 16.0 User and Reference Guide
Permutes 128-bit blocks of an int32 vector. The corresponding instruction is VPERMF32X4. This intrinsic only applies to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).
Without Mask | extern __m512i __cdecl _mm512_permute4f128_epi32(__m512i v2, _MM_PERM_ENUM permute);
With Mask | extern __m512i __cdecl _mm512_mask_permute4f128_epi32(__m512i v1_old, __mmask16 k1, __m512i v2, _MM_PERM_ENUM permute);
v1_old | Source vector that retains the old values of the destination vector. The resulting vector gets the corresponding elements from v1_old for zero mask bits.
v2 | A source int32 vector whose 128-bit blocks are permuted.
k1 | A writemask. Only result elements whose corresponding bit in k1 is set to one are computed and stored. Result elements whose k1 bit is zero are copied from the corresponding elements of v1_old.
permute | A constant that selects, for each 128-bit block of the result, which 128-bit block of v2 is copied into it.
Shuffles the 128-bit blocks of the int32 vector v2. The permute parameter selects which source block is placed in each block of the result; the four 32-bit elements within a block keep their order.
In the masked variant, result elements whose corresponding bit in the writemask k1 is set take the permuted values. The remaining result elements are copied from the corresponding elements of v1_old.
The non-masked variant of the intrinsic is equivalent to the masked variant with a full mask (k1 = 0xffff).
Returns the result of the permutation.
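The behavior described above can be sketched as a portable scalar model in C. This is an illustration only, not Intel's implementation: the function name permute4f128_epi32_model is hypothetical, plain arrays of 16 int32 values stand in for __m512i, and the code assumes that the permute constant is encoded as four 2-bit block selectors with the selector for result block 0 in the lowest two bits (so that the identity permutation _MM_PERM_DCBA would correspond to 0xE4).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical scalar model of _mm512_mask_permute4f128_epi32.
 * Assumption: imm8 holds four 2-bit selectors; bits [2*b+1 : 2*b]
 * name the 128-bit source block of v2 that is copied to result block b.
 */
static void permute4f128_epi32_model(int32_t dst[16],
                                     const int32_t v1_old[16],
                                     uint16_t k1,
                                     const int32_t v2[16],
                                     unsigned imm8)
{
    int32_t tmp[16];

    assert(imm8 <= 0xFFu); /* only eight selector bits are meaningful */

    /* Step 1: permute whole 128-bit blocks (four int32 elements each). */
    for (int blk = 0; blk < 4; ++blk) {
        unsigned src = (imm8 >> (2 * blk)) & 3u;
        for (int j = 0; j < 4; ++j)
            tmp[4 * blk + j] = v2[4 * src + j];
    }

    /* Step 2: apply the writemask per 32-bit element. */
    for (int i = 0; i < 16; ++i)
        dst[i] = ((k1 >> i) & 1u) ? tmp[i] : v1_old[i];
}
```

Under these assumptions, passing k1 = 0xffff reproduces the non-masked variant, because every result element then takes its permuted value and v1_old is never read into the result.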