Intel® C++ Compiler 16.0 User and Reference Guide
Packs the mask-enabled elements of an int64 vector to form an unaligned int64 stream and stores the portion of the stream that maps to the high 64-byte-aligned portion of the memory destination. The corresponding instruction is VPACKSTOREHQ. This intrinsic applies only to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).
Without Mask: extern void __cdecl _mm512_packstorehi_epi64(void* mt, __m512i v1);
With Mask: extern void __cdecl _mm512_mask_packstorehi_epi64(void* mt, __mmask8 k1, __m512i v1);
v1: source vector to store elements from
k1: vector mask to select elements to add to the stream
mt: memory location to store vector elements
Packs the mask-enabled elements of int64 vector v1 into an int64 stream logically mapped starting at element-aligned address (mt − 64), and stores the high-64-byte elements of that stream: those elements that map at or after the first 64-byte-aligned address following (mt − 64), which is the high cache line in the current implementation. The length of the stream depends on the number of enabled mask bits, because elements disabled by the mask are not added to the stream.
The mask parameter k1 is not used as a writemask for this function. Instead, the mask is used as an element selector, choosing which elements are added to the stream.
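The element-selector behavior described above can be illustrated with a scalar model. This is only a sketch, not Intel code: the function name model_mask_packstorehi_epi64 is hypothetical, memory is modeled as an array of int64 elements whose index 0 is assumed to be 64-byte aligned (so one cache line is 8 elements), and uint8_t stands in for __mmask8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the masked high pack-store (not an Intel API).
 * mem    : int64 array; index 0 is assumed to sit on a 64-byte boundary
 * mt_idx : element index corresponding to the mt argument (must be >= 8)
 * k1     : bit n set means lane n of v1 is appended to the stream
 * v1     : the eight int64 lanes of the source vector */
static void model_mask_packstorehi_epi64(int64_t *mem, size_t mt_idx,
                                         uint8_t k1, const int64_t v1[8])
{
    assert(mt_idx >= 8);

    /* The stream logically starts 64 bytes (8 elements) before mt. */
    size_t base = mt_idx - 8;

    /* First 64-byte-aligned element index strictly after the stream start. */
    size_t boundary = (base / 8 + 1) * 8;

    size_t pos = base;
    for (int n = 0; n < 8; n++) {
        if (k1 & (1u << n)) {
            /* Only stream elements at or after the boundary are written. */
            if (pos >= boundary) {
                mem[pos] = v1[n];
            }
            pos++; /* Disabled lanes never advance the stream position. */
        }
    }
}
```

In this model, calling the function with mt_idx = 14, k1 = 0x55, and lanes {10, 11, ..., 17} builds the stream 10, 12, 14, 16 at indices 6 through 9. Only indices 8 and 9 lie in the high cache line, so only 14 and 16 are written.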
In conjunction with _mm512_packstorelo_epi64, this function is useful for packing data into a queue. Also in conjunction with _mm512_packstorelo_epi64, it allows unaligned vector stores (that is, vector stores that are only element-wise, not vector-wise, aligned). The typical intrinsic sequence to perform an unaligned vector store would be:
    _mm512_packstorelo_epi64(mt, v1);
    _mm512_packstorehi_epi64((char*)mt + 64, v1);
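Why the low/high pair amounts to a full unaligned store can be checked with a scalar model of the unmasked variants. The sketch below makes the same assumptions as the description above and adds its own: memory is an int64 array whose index 0 is taken to be 64-byte aligned, and the names model_packstorelo_epi64, model_packstorehi_epi64, and model_storeu_epi64 are hypothetical, not Intel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the low half: the stream starts at mt, and only
 * elements before the next 64-byte boundary (8 elements) are written. */
static void model_packstorelo_epi64(int64_t *mem, size_t mt,
                                    const int64_t v1[8])
{
    size_t boundary = (mt / 8 + 1) * 8;
    for (size_t n = 0; n < 8; n++) {
        if (mt + n < boundary) {
            mem[mt + n] = v1[n];
        }
    }
}

/* Hypothetical model of the high half: the stream starts at (mt - 64 bytes),
 * and only elements at or after the next 64-byte boundary are written. */
static void model_packstorehi_epi64(int64_t *mem, size_t mt,
                                    const int64_t v1[8])
{
    assert(mt >= 8);
    size_t start = mt - 8;
    size_t boundary = (start / 8 + 1) * 8;
    for (size_t n = 0; n < 8; n++) {
        if (start + n >= boundary) {
            mem[start + n] = v1[n];
        }
    }
}

/* The two-call sequence: low part at mt, high part at mt + 64 bytes. */
static void model_storeu_epi64(int64_t *mem, size_t mt, const int64_t v1[8])
{
    model_packstorelo_epi64(mem, mt, v1);
    model_packstorehi_epi64(mem, mt + 8, v1);
}
```

With mt at element index 5, the low call writes indices 5 through 7 and the high call writes indices 8 through 12, so all eight lanes land contiguously. With an aligned mt, the low call writes all eight lanes and the high call writes nothing.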
Returns nothing.