Intel® C++ Compiler 16.0 User and Reference Guide
Packs the mask-enabled elements of a float32 vector into an unaligned float32 stream and stores the portion of that stream that falls in the low 64-byte-aligned block of the memory destination. The corresponding instruction is VPACKSTORELPS. This intrinsic applies only to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).
Without mask | extern void __cdecl _mm512_packstorelo_ps(void* mt, __m512 v1);
With mask | extern void __cdecl _mm512_mask_packstorelo_ps(void* mt, __mmask16 k1, __m512 v1);
v1 | source vector to store elements from
k1 | vector mask to select elements to add to the stream
mt | memory location to store vector elements
Packs the mask-enabled elements of float32 vector v1 into a float32 stream that is logically mapped starting at the element-aligned address mt, and stores the low-64-byte elements of that stream. These are the stream elements that map before the first 64-byte-aligned address following mt, which in the current implementation is the cache line containing mt. The length of the stream depends on the number of set bits in k1, because elements disabled by the mask are not added to the stream. In the unmasked form, all 16 elements are added.
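The split at the first 64-byte boundary is easiest to see in a scalar model. The sketch below is a hypothetical plain-C emulation, not the intrinsic itself, because the intrinsic only builds for Intel MIC targets. `packstorelo_model` is an illustrative name, and its `int` return value is a counting aid that the real intrinsic does not provide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the masked packstorelo behavior (not the
 * intrinsic). Enabled lanes of v are packed into a stream starting at mt;
 * only the part of the stream below the first 64-byte-aligned address after
 * mt is written. Returns the number of floats written, purely for
 * illustration; the real intrinsic returns nothing. */
static int packstorelo_model(float *mt, uint16_t k, const float v[16])
{
    assert(((uintptr_t)mt & 3u) == 0);      /* mt must be element (4-byte) aligned */
    uintptr_t boundary = ((uintptr_t)mt & ~(uintptr_t)63) + 64; /* end of mt's 64-byte block */
    int n = 0;                              /* next free stream slot */
    for (int i = 0; i < 16; i++) {
        if (!(k & (1u << i)))
            continue;                       /* disabled lanes never enter the stream */
        if ((uintptr_t)(mt + n) >= boundary)
            break;                          /* the rest of the stream belongs to the high part */
        mt[n++] = v[i];
    }
    return n;
}
```

For example, if mt is 48 bytes into a 64-byte block and all 16 lanes are enabled, only the first four lanes fit before the boundary. The remaining twelve are left for _mm512_packstorehi_ps.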
The mask parameter k1 is not used as a writemask for this function. Instead, the mask is used as an element selector, choosing which elements are added to the stream.
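The difference between a writemask and an element selector can be shown with two small scalar helpers. Both helper names are hypothetical and used only for illustration. The selector version ignores the 64-byte low/high split so that it shows only the compaction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: writemask semantics. Lane i goes to dst[i] only if
 * bit i of k is set; slots for disabled lanes are left untouched. */
static void writemask_store(float *dst, uint16_t k, const float v[16])
{
    assert(dst != 0);
    for (int i = 0; i < 16; i++)
        if (k & (1u << i))
            dst[i] = v[i];
}

/* Hypothetical helper: selector semantics, as k1 is used by packstorelo/hi.
 * Enabled lanes are compacted into consecutive slots. The 64-byte split is
 * ignored here. Returns how many lanes were packed. */
static int select_pack(float *dst, uint16_t k, const float v[16])
{
    assert(dst != 0);
    int n = 0;
    for (int i = 0; i < 16; i++)
        if (k & (1u << i))
            dst[n++] = v[i];
    return n;
}
```

With k = 0x0005 (lanes 0 and 2), the writemask places v[2] in slot 2 and leaves a gap in slot 1. The selector places v[2] directly after v[0], in slot 1.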
Together with _mm512_packstorehi_ps, this function is useful for packing data into a queue. The pair also enables unaligned vector stores, which are stores to addresses that are element-aligned but not 64-byte vector-aligned. The typical intrinsic sequence for an unaligned vector store is:
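_mm512_packstorelo_ps(mt, v1);               /* elements before the first 64-byte boundary after mt */
_mm512_packstorehi_ps((char *)mt + 64, v1);  /* remaining elements; the address is mt plus 64 bytes */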
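The following sketch models the lo/hi pair in plain C and uses it to append masked lanes to a queue. It assumes that the high half writes exactly the stream elements at or after the boundary that the low half stops at, and that it takes mt plus 64 bytes as its address, mirroring the sequence above. All function names are hypothetical, and none of this is the intrinsics themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the lo/hi pair (not the intrinsics). Enabled
 * lanes of v form a stream starting at mt. high == 0 writes the part below
 * the first 64-byte-aligned address after mt; high == 1 writes the rest. */
static void pack_split_model(float *mt, uint16_t k, const float v[16], int high)
{
    assert(((uintptr_t)mt & 3u) == 0);             /* mt must be element-aligned */
    uintptr_t boundary = ((uintptr_t)mt & ~(uintptr_t)63) + 64;
    int n = 0;
    for (int i = 0; i < 16; i++) {
        if (!(k & (1u << i)))
            continue;                              /* lane not selected */
        float *dst = mt + n++;
        int in_high = (uintptr_t)dst >= boundary;
        if (in_high == high)
            *dst = v[i];                           /* each half writes only its own part */
    }
}

static void packstorelo_model(float *mt, uint16_t k, const float v[16])
{
    pack_split_model(mt, k, v, 0);
}

/* Assumption: like the hi intrinsic in the sequence above, this model takes
 * the address 64 bytes past the start of the stream. */
static void packstorehi_model(void *mt_plus_64, uint16_t k, const float v[16])
{
    pack_split_model((float *)((char *)mt_plus_64 - 64), k, v, 1);
}

/* Hypothetical queue append: stores the enabled lanes at tail with the lo/hi
 * pair and returns the new tail, advanced by the number of set bits in k. */
static float *queue_append(float *tail, uint16_t k, const float v[16])
{
    packstorelo_model(tail, k, v);
    packstorehi_model((char *)tail + 64, k, v);
    int count = 0;
    for (unsigned m = k; m != 0; m &= m - 1)
        count++;                                   /* clear the lowest set bit each step */
    return tail + count;
}
```

With k = 0xFFFF, the call reduces to the plain unaligned 16-element store. A partial mask appends only the selected lanes and advances the tail by the same count, which is the queue-packing use.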
Returns nothing.