Intel® C++ Compiler 16.0 User and Reference Guide
Loads a float32 vector. The corresponding instruction is VMOVAPS. This intrinsic applies only to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).
Without mask: extern __m512 __cdecl _mm512_load_ps(void const* mt);
With mask: extern __m512 __cdecl _mm512_mask_load_ps(__m512 v1_old, __mmask16 k1, void const* mt);
v1_old: Source vector that retains the old values of the destination vector; for each zero bit in the mask, the result gets the corresponding element from v1_old.
k1: Writemask; only elements whose corresponding bit in k1 is set to 1 are loaded from memory and stored in the result; elements whose bit in k1 is 0 are copied from the corresponding elements of v1_old.
mt: Memory address to load from.
Loads 16 single-precision floating-point values from memory address mt into a float32 vector. The address mt must be 64-byte aligned.
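The following is a minimal portable sketch of what the unmasked load does, assuming a 512-bit vector can be modeled as a plain array of 16 floats. It does not call the real intrinsic (that requires the Intel compiler targeting Intel MIC Architecture); the names vec16f, is_aligned64, and model_load_ps are hypothetical and used only for illustration. The assert stands in for the fault that an aligned-load instruction such as VMOVAPS typically raises when given a misaligned address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __m512: 16 single-precision elements. */
typedef struct { float f[16]; } vec16f;

/* Returns nonzero if p meets the 64-byte alignment the load requires. */
static int is_aligned64(const void *p)
{
    return ((uintptr_t)p % 64u) == 0;
}

/* Hypothetical scalar model of _mm512_load_ps(mt): copies 16 consecutive
   floats (64 bytes) starting at mt. The assert models the requirement
   that mt be 64-byte aligned. */
static vec16f model_load_ps(const void *mt)
{
    vec16f r;
    assert(is_aligned64(mt));
    memcpy(r.f, mt, sizeof r.f);
    return r;
}
```

In C11, a suitably aligned source buffer can be declared with `_Alignas(64) float buf[16];`, which is also the usual way to satisfy the alignment requirement when calling the real intrinsic.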
In the masked variant, only those elements whose corresponding bit is set in the mask register k1 are loaded. Elements of the result whose bit in k1 is clear take their values from the v1_old vector.
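The merge behavior of the masked variant can be sketched with a scalar loop, again assuming a 16-float array stands in for __m512 and a uint16_t stands in for __mmask16. The names vec16f and model_mask_load_ps are hypothetical; this models the documented semantics rather than reproducing Intel's implementation. Bit i of k1 controls element i.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __m512: 16 single-precision elements. */
typedef struct { float f[16]; } vec16f;

/* Hypothetical scalar model of _mm512_mask_load_ps(v1_old, k1, mt).
   For each element i: if bit i of k1 is 1, the result takes mt[i];
   if it is 0, the result keeps v1_old.f[i]. The ternary reads mt[i]
   only for set bits. The assert models the 64-byte alignment
   requirement on mt. */
static vec16f model_mask_load_ps(vec16f v1_old, uint16_t k1, const float *mt)
{
    vec16f r;
    assert(((uintptr_t)mt % 64u) == 0);
    for (int i = 0; i < 16; ++i)
        r.f[i] = ((k1 >> i) & 1u) ? mt[i] : v1_old.f[i];
    return r;
}
```

In this model, a mask of 0xFFFF gives the same result as the unmasked load, and a mask of 0 returns v1_old unchanged.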
Returns the result of the load operation.