Intel® C++ Compiler 16.0 User and Reference Guide

_mm512_load_ps / _mm512_mask_load_ps

Loads a float32 vector from memory. The corresponding instruction is VMOVAPS. This intrinsic applies only to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).

Syntax

Without Mask

extern __m512 __cdecl _mm512_load_ps(void const* mt);

With Mask

extern __m512 __cdecl _mm512_mask_load_ps(__m512 v1_old, __mmask16 k1, void const* mt);

Arguments

v1_old

Source vector that supplies the old values of the destination vector; result elements whose mask bit in k1 is zero are copied from the corresponding elements of v1_old

k1

Writemask; only those elements whose corresponding bit in k1 is set to '1' are loaded from memory and stored in the result; result elements whose corresponding bit in k1 is zero are copied from the corresponding elements of vector v1_old

mt

Memory address to load from; must be 64-byte-aligned

Description

Loads 16 single-precision floating-point values from memory address mt into a float32 vector. The address mt must be 64-byte-aligned.
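The width and alignment requirement can be illustrated with a portable scalar model. This is a sketch only, not the intrinsic itself: the names vec16f, is_aligned_64, and model_load_ps are hypothetical and introduced here for illustration, and the model simply treats a misaligned address as a programming error.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __m512: 16 packed float32 elements. */
typedef struct { float e[16]; } vec16f;

/* Returns 1 if p is a multiple of 64 bytes, otherwise 0. */
static int is_aligned_64(const void *p) {
    return ((uintptr_t)p % 64u) == 0;
}

/* Scalar model of the unmasked load: copies 16 floats starting at mt. */
static vec16f model_load_ps(const void *mt) {
    vec16f r;
    assert(is_aligned_64(mt));      /* assumption: misalignment is a bug */
    memcpy(r.e, mt, sizeof r.e);    /* 16 * 4 bytes = 64 bytes */
    return r;
}
```

A buffer declared as `_Alignas(64) float buf[16];` (C11) satisfies the alignment requirement checked by this model.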

In the masked variant, only those elements with the corresponding bit set in the vector mask register k1 are loaded. Elements of the resulting vector with the corresponding bit clear in k1 take their values from the v1_old vector.
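The per-element selection performed by the masked variant can be sketched as a scalar loop. This is a portable model rather than the intrinsic: vec16f and model_mask_load_ps are hypothetical names, a uint16_t stands in for __mmask16, and the model assumes that bit i of k1 controls element i.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __m512: 16 packed float32 elements. */
typedef struct { float e[16]; } vec16f;

/* Scalar model of the masked load. */
static vec16f model_mask_load_ps(vec16f v1_old, uint16_t k1, const void *mt) {
    const float *src = (const float *)mt;
    vec16f r;
    assert(((uintptr_t)mt % 64u) == 0);   /* mt must be 64-byte-aligned */
    for (int i = 0; i < 16; i++) {
        /* Bit set: take element i from memory. Bit clear: keep v1_old. */
        r.e[i] = ((k1 >> i) & 1u) ? src[i] : v1_old.e[i];
    }
    return r;
}
```

With k1 = 0x0005, for example, only elements 0 and 2 are taken from memory; the other 14 elements keep their v1_old values.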

Returns

Returns a float32 vector containing the result of the load operation.