Intel® C++ Compiler 16.0 User and Reference Guide
Loads elements from memory, optionally broadcasts them, and converts them to a float32 vector. The corresponding instruction is VMOVAPS. This intrinsic applies only to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).
Without Mask: extern __m512 __cdecl _mm512_extload_ps(void const* mt, _MM_UPCONV_PS_ENUM conv, _MM_BROADCAST32_ENUM bc, int hint);
With Mask: extern __m512 __cdecl _mm512_mask_extload_ps(__m512 v1_old, __mmask16 k1, void const* mt, _MM_UPCONV_PS_ENUM conv, _MM_BROADCAST32_ENUM bc, int hint);
Depending on the bc parameter, loads one (bc=_MM_BROADCAST_1X16), four (bc=_MM_BROADCAST_4X16), or 16 (bc=_MM_BROADCAST32_NONE) elements from memory address mt, converts them to float32 values, and returns the result in a float32 vector. The type and size of the elements read from memory depend on the conv parameter.
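The three broadcast modes can be pictured with a scalar model in portable C. This is only a sketch of the lane layout, not the intrinsic itself: the names model_broadcast and model_extload_ps are hypothetical and do not appear in the Intel headers, and the model assumes the source elements are already float32 (that is, no up-conversion is requested through conv). The hint parameter is not modeled.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the three bc values described above. */
enum model_broadcast {
    MODEL_BCAST_NONE,  /* 16 elements read, one per lane        */
    MODEL_BCAST_1X16,  /* 1 element read, copied to all lanes   */
    MODEL_BCAST_4X16   /* 4 elements read, repeated four times  */
};

/* Scalar model: fill 16 float32 lanes from memory at mt according to bc.
   Assumption: the memory elements are float32, so no conversion is done. */
void model_extload_ps(float dst[16], const float *mt, enum model_broadcast bc)
{
    for (size_t i = 0; i < 16; ++i) {
        switch (bc) {
        case MODEL_BCAST_1X16:
            dst[i] = mt[0];      /* every lane gets the single element */
            break;
        case MODEL_BCAST_4X16:
            dst[i] = mt[i % 4];  /* lanes cycle through four elements  */
            break;
        default:
            dst[i] = mt[i];      /* plain 16-element load              */
            break;
        }
    }
}
```

In this model, only mt[0] is read in the 1X16 case and only mt[0] through mt[3] in the 4X16 case, which mirrors the element counts given in the description.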
The masked variant has two additional arguments: v1_old and k1. Only those elements whose corresponding bit is set to one in the vector mask k1 are computed. Elements in the resulting vector whose corresponding bit is clear in k1 take their values from the v1_old vector.
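The merge behavior of the write mask can likewise be sketched in scalar C. The function name model_mask_merge_ps is hypothetical, and the sketch assumes that bit i of k1 controls element i of the vector; the loaded array stands in for the values the unmasked load would have produced.

```c
#include <stdint.h>

/* Scalar model of the write-mask merge (hypothetical helper, not an Intel API).
   Assumption: bit i of k1 corresponds to element i of the result. */
void model_mask_merge_ps(float dst[16], const float v1_old[16],
                         uint16_t k1, const float loaded[16])
{
    for (int i = 0; i < 16; ++i) {
        if ((k1 >> i) & 1u) {
            dst[i] = loaded[i];  /* bit set: take the newly loaded value */
        } else {
            dst[i] = v1_old[i];  /* bit clear: keep the value from v1_old */
        }
    }
}
```

For example, with k1 equal to 0x0005, only elements 0 and 2 take loaded values in this model; the other 14 elements are copied from v1_old.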
Returns the result of the load/broadcast/convert operation.