Intel® C++ Compiler 16.0 User and Reference Guide
The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file. To use these intrinsics in your code, include the immintrin.h header file.
Intrinsic Name | Operation | Corresponding Instruction |
---|---|---|
_mm512_i32gather_pd, _mm512_mask_i32gather_pd | Gathers float64 vector elements from memory with int32 indices. | VGATHERDPD |
_mm512_i32gather_ps, _mm512_mask_i32gather_ps | Gathers float32 vector elements from memory with int32 indices. | VGATHERDPS |
_mm512_i64gather_pd, _mm512_mask_i64gather_pd | Gathers float64 vector elements from memory with int64 indices. | VGATHERQPD |
_mm512_i64gather_ps, _mm512_mask_i64gather_ps | Gathers float32 vector elements from memory with int64 indices. | VGATHERQPS |
_mm512_prefetch_i32gather_pd, _mm512_mask_prefetch_i32gather_pd | Prefetches float64 vector elements for a gather with int32 indices. | VGATHERPF0DPD, VGATHERPF1DPD |
_mm512_prefetch_i64gather_pd, _mm512_mask_prefetch_i64gather_pd | Prefetches float64 vector elements for a gather with int64 indices. | VGATHERPF0QPD, VGATHERPF1QPD |
_mm512_prefetch_i64gather_ps, _mm512_mask_prefetch_i64gather_ps | Prefetches float32 vector elements for a gather with int64 indices. | VGATHERPF0QPS, VGATHERPF1QPS |
_mm512_i32scatter_pd, _mm512_mask_i32scatter_pd | Scatters float64 vector elements to memory with int32 indices. | VSCATTERDPD |
_mm512_i32scatter_ps, _mm512_mask_i32scatter_ps | Scatters float32 vector elements to memory with int32 indices. | VSCATTERDPS |
_mm512_i64scatter_pd, _mm512_mask_i64scatter_pd | Scatters float64 vector elements to memory with int64 indices. | VSCATTERQPD |
_mm512_i64scatter_ps, _mm512_mask_i64scatter_ps | Scatters float32 vector elements to memory with int64 indices. | VSCATTERQPS |
_mm512_prefetch_i32scatter_pd, _mm512_mask_prefetch_i32scatter_pd | Prefetches float64 vector elements for a scatter with int32 indices. | VSCATTERPF0DPD, VSCATTERPF1DPD |
_mm512_prefetch_i64scatter_pd, _mm512_mask_prefetch_i64scatter_pd | Prefetches float64 vector elements for a scatter with int64 indices. | VSCATTERPF0QPD, VSCATTERPF1QPD |
_mm512_prefetch_i64scatter_ps, _mm512_mask_prefetch_i64scatter_ps | Prefetches float32 vector elements for a scatter with int64 indices. | VSCATTERPF0QPS, VSCATTERPF1QPS |
_mm512_i32gather_pd
extern __m512d __cdecl _mm512_i32gather_pd(__m512i vindex, void const* base_addr, int scale, int hint);
Gathers float64 elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination.
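The address calculation described above can be illustrated with a scalar model. The following is a minimal sketch, not the intrinsic itself: the function name gather_i32_pd_model is hypothetical, it covers eight float64 lanes, and it assumes that scale is a byte multiplier limited to 1, 2, 4, or 8.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model (not part of any Intel header): for each of the
   eight lanes, read one double from base_addr + vindex[i] * scale bytes. */
void gather_i32_pd_model(double dst[8], const int32_t vindex[8],
                         const void *base_addr, int scale)
{
    /* Assumption: scale is a byte multiplier restricted to 1, 2, 4, or 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    const unsigned char *base = (const unsigned char *)base_addr;
    for (int i = 0; i < 8; ++i) {
        /* Widen the 32-bit index before scaling to avoid overflow. */
        int64_t offset = (int64_t)vindex[i] * scale;
        /* memcpy keeps the read valid for any alignment of the address. */
        memcpy(&dst[i], base + offset, sizeof(double));
    }
}
```

With a table of doubles and a scale of 8 (the size of one double), an index value of 3 selects the fourth element of the table. The same index may appear in more than one lane.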
_mm512_mask_i32gather_pd
extern __m512d __cdecl _mm512_mask_i32gather_pd(__m512d src, __mmask8 k, __m512i vindex, void const* base_addr, int scale, int hint);
Gathers float64 elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination using writemask k (elements are copied from src when the corresponding mask bit is not set).
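The writemask behavior can be sketched the same way. This scalar model is an illustration only: the name mask_gather_i32_pd_model is hypothetical, the mask is modeled as an 8-bit integer with bit i controlling lane i, and scale is assumed to be 1, 2, 4, or 8.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a masked gather of eight doubles.
   Bit i of k decides whether lane i is loaded from memory or copied from src. */
void mask_gather_i32_pd_model(double dst[8], const double src[8], uint8_t k,
                              const int32_t vindex[8], const void *base_addr,
                              int scale)
{
    /* Assumption: scale is a byte multiplier restricted to 1, 2, 4, or 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    const unsigned char *base = (const unsigned char *)base_addr;
    for (int i = 0; i < 8; ++i) {
        if ((k >> i) & 1) {
            /* Mask bit set: load from base_addr + vindex[i] * scale. */
            int64_t offset = (int64_t)vindex[i] * scale;
            memcpy(&dst[i], base + offset, sizeof(double));
        } else {
            /* Mask bit clear: keep the element from src; the model does not
               touch memory for this lane. */
            dst[i] = src[i];
        }
    }
}
```

With k equal to 0x0F, lanes 0 through 3 are loaded from memory and lanes 4 through 7 receive the corresponding elements of src.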
_mm512_i32gather_ps
extern __m512 __cdecl _mm512_i32gather_ps(__m512i vindex, void const* base_addr, int scale, int hint);
Gathers float32 elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination.
_mm512_mask_i32gather_ps
extern __m512 __cdecl _mm512_mask_i32gather_ps(__m512 src, __mmask16 k, __m512i vindex, void const* base_addr, int scale, int hint);
Gathers float32 elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_i64gather_pd
extern __m512d __cdecl _mm512_i64gather_pd(__m512i vindex, void const* base_addr, int scale, int hint);
Gathers float64 elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination.
_mm512_mask_i64gather_pd
extern __m512d __cdecl _mm512_mask_i64gather_pd(__m512d src, __mmask8 k, __m512i vindex, void const* base_addr, int scale, int hint);
Gathers float64 elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_i64gather_ps
extern __m512 __cdecl _mm512_i64gather_ps(__m512i vindex, void const* base_addr, int scale, int hint);
Gathers float32 elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination.
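Because each index is multiplied by scale, a scale of 1 turns the indices into plain byte offsets, which is one way to gather a single field from an array of records. The sketch below models this in scalar C; the record type point3 and both function names are hypothetical, and scale is assumed to be 1, 2, 4, or 8.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 12-byte record used only for this illustration. */
struct point3 { float x, y, z; };

/* Hypothetical scalar model: eight float32 lanes, 64-bit indices. */
void gather_i64_ps_model(float dst[8], const int64_t vindex[8],
                         const void *base_addr, int scale)
{
    /* Assumption: scale is a byte multiplier restricted to 1, 2, 4, or 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    const unsigned char *base = (const unsigned char *)base_addr;
    for (int i = 0; i < 8; ++i) {
        memcpy(&dst[i], base + vindex[i] * scale, sizeof(float));
    }
}

/* Gather the y field of eight selected records. The base address points at
   the y field of the first record, and each index is a byte offset. */
void gather_y_fields(float dst[8], const struct point3 *pts,
                     const int64_t which[8])
{
    int64_t byte_offsets[8];
    for (int i = 0; i < 8; ++i) {
        byte_offsets[i] = which[i] * (int64_t)sizeof(struct point3);
    }
    gather_i64_ps_model(dst, byte_offsets, &pts[0].y, 1);
}
```

Using byte offsets with a scale of 1 works for any record size, including sizes such as 12 bytes that are not among the permitted scale factors.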
_mm512_mask_i64gather_ps
extern __m512 __cdecl _mm512_mask_i64gather_ps(__m512 src, __mmask8 k, __m512i vindex, void const* base_addr, int scale, int hint);
Gathers float32 elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_prefetch_i64gather_ps
extern void __cdecl _mm512_prefetch_i64gather_ps(__m512i vindex, void const* base_addr, int scale, int hint);
Prefetches float32 elements from memory using 64-bit indices. 32-bit elements are prefetched from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Prefetched elements are brought into cache. The hint parameter should be 0 or 1 and indicates which cache level to read values into.
_mm512_mask_prefetch_i64gather_ps
extern void __cdecl _mm512_mask_prefetch_i64gather_ps(__m512i vindex, __mmask8 k, void const* base_addr, int scale, int hint);
Prefetches float32 elements from memory using 64-bit indices. 32-bit elements are prefetched from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Prefetched elements are brought into cache subject to writemask k (elements are brought into cache only when their corresponding mask bit is set). The hint parameter should be 0 or 1 and indicates which cache level to read values into.
_mm512_prefetch_i32gather_pd
extern void __cdecl _mm512_prefetch_i32gather_pd(__m256i vindex, void const* base_addr, int scale, int hint);
Prefetches float64 elements from memory using 32-bit indices. 64-bit elements are prefetched from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Prefetched elements are brought into cache. The hint parameter should be 0 or 1 and indicates which cache level to bring values into.
_mm512_mask_prefetch_i32gather_pd
extern void __cdecl _mm512_mask_prefetch_i32gather_pd(__m256i vindex, __mmask8 k, void const* base_addr, int scale, int hint);
Prefetches float64 elements from memory using 32-bit indices. 64-bit elements are prefetched from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Prefetched elements are brought into cache subject to writemask k (elements are brought into cache only when their corresponding mask bits are set). The hint parameter should be 0 or 1 and indicates which cache level to bring values into.
_mm512_prefetch_i64gather_pd
extern void __cdecl _mm512_prefetch_i64gather_pd(__m512i vindex, void const* base_addr, int scale, int hint);
Prefetches float64 elements from memory into the cache level specified by hint using 64-bit indices. 64-bit elements are prefetched from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). The hint parameter should be 0 or 1.
_mm512_mask_prefetch_i64gather_pd
extern void __cdecl _mm512_mask_prefetch_i64gather_pd(__m512i vindex, __mmask8 k, void const* base_addr, int scale, int hint);
Prefetches float64 elements from memory into the cache level specified by hint using 64-bit indices. 64-bit elements are prefetched from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Prefetched elements are brought into cache subject to writemask k (elements are brought into cache only when the corresponding mask bit is set). The hint parameter should be 0 or 1.
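A masked gather prefetch can be approximated in scalar code by issuing one software prefetch per enabled lane. The sketch below is an illustration under stated assumptions, not the intrinsic: the function name is hypothetical, __builtin_prefetch is a GCC and Clang builtin (other compilers fall back to only counting the lanes), and the mapping of hint 0 to the nearest cache level and hint 1 to the next level is an assumption of this model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a masked gather prefetch over eight lanes.
   Returns the number of lanes for which a prefetch was requested. */
int mask_prefetch_i64gather_pd_model(const int64_t vindex[8], uint8_t k,
                                     const void *base_addr, int scale,
                                     int hint)
{
    /* Assumptions: scale is 1, 2, 4, or 8; hint is 0 or 1. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    assert(hint == 0 || hint == 1);

    const unsigned char *base = (const unsigned char *)base_addr;
    int issued = 0;
    for (int i = 0; i < 8; ++i) {
        if (!((k >> i) & 1)) {
            continue;               /* mask bit clear: lane is skipped */
        }
        const void *p = base + vindex[i] * scale;
#if defined(__GNUC__) || defined(__clang__)
        /* The locality argument must be a compile-time constant, so the
           hint is selected with a branch. Read prefetch in both cases. */
        if (hint == 0) {
            __builtin_prefetch(p, 0, 3);
        } else {
            __builtin_prefetch(p, 0, 2);
        }
#else
        (void)p;                    /* no portable prefetch: count only */
#endif
        ++issued;
    }
    return issued;
}
```

A prefetch does not change program-visible state, so the model returns only the count of enabled lanes. With k equal to 0x05, two of the eight addresses are requested.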
_mm512_i32scatter_pd
extern void __cdecl _mm512_i32scatter_pd(void* base_addr, __m512i vindex, __m512d a, int scale, int hint);
Scatters float64 elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_i32scatter_pd
extern void __cdecl _mm512_mask_i32scatter_pd(void* base_addr, __mmask8 k, __m512i vindex, __m512d a, int scale, int hint);
Scatters float64 elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set).
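The masked store behavior can be sketched as a scalar loop. This is a model for illustration only: the name mask_scatter_i32_pd_model is hypothetical, scale is assumed to be 1, 2, 4, or 8, and the model processes lanes from lowest to highest, so a later lane overwrites an earlier one when two enabled lanes share an index (the description above does not specify this ordering).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a masked scatter of eight doubles.
   Bit i of k decides whether a[i] is stored. */
void mask_scatter_i32_pd_model(void *base_addr, uint8_t k,
                               const int32_t vindex[8], const double a[8],
                               int scale)
{
    /* Assumption: scale is a byte multiplier restricted to 1, 2, 4, or 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    unsigned char *base = (unsigned char *)base_addr;
    /* Lanes are visited in ascending order (an assumption of this model). */
    for (int i = 0; i < 8; ++i) {
        if ((k >> i) & 1) {
            /* Mask bit set: store a[i] at base_addr + vindex[i] * scale. */
            int64_t offset = (int64_t)vindex[i] * scale;
            memcpy(base + offset, &a[i], sizeof(double));
        }
        /* Mask bit clear: memory at this lane's address is left unchanged. */
    }
}
```

Passing a mask with all eight bits set gives the behavior of the unmasked form in this model.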
_mm512_i32scatter_ps
extern void __cdecl _mm512_i32scatter_ps(void* base_addr, __m512i vindex, __m512 a, int scale, int hint);
Scatters float32 elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_i32scatter_ps
extern void __cdecl _mm512_mask_i32scatter_ps(void* base_addr, __mmask16 k, __m512i vindex, __m512 a, int scale, int hint);
Scatters float32 elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set).
_mm512_i64scatter_pd
extern void __cdecl _mm512_i64scatter_pd(void* base_addr, __m512i vindex, __m512d a, int scale, int hint);
Scatters float64 elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_i64scatter_pd
extern void __cdecl _mm512_mask_i64scatter_pd(void* base_addr, __mmask8 k, __m512i vindex, __m512d a, int scale, int hint);
Scatters float64 elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set).
_mm512_i64scatter_ps
extern void __cdecl _mm512_i64scatter_ps(void* base_addr, __m512i vindex, __m512 a, int scale, int hint);
Scatters float32 elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_i64scatter_ps
extern void __cdecl _mm512_mask_i64scatter_ps(void* base_addr, __mmask8 k, __m512i vindex, __m512 a, int scale, int hint);
Scatters float32 elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set).
_mm512_prefetch_i64scatter_ps
extern void __cdecl _mm512_prefetch_i64scatter_ps(void* base_addr, __m512i vindex, int scale, int hint);
Prefetches float32 elements with intent to write into memory using 64-bit indices. Elements are prefetched into cache level hint, where hint is 0 or 1. 32-bit elements are brought into cache from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_prefetch_i64scatter_ps
extern void __cdecl _mm512_mask_prefetch_i64scatter_ps(void* base_addr, __mmask8 k, __m512i vindex, int scale, int hint);
Prefetches float32 elements with intent to write into memory using 64-bit indices. Elements are prefetched into cache level hint, where hint is 0 or 1. 32-bit elements are brought into cache from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not brought into cache when the corresponding mask bit is not set).
_mm512_prefetch_i32scatter_pd
extern void __cdecl _mm512_prefetch_i32scatter_pd(void* base_addr, __m256i vindex, int scale, int hint);
Prefetches float64 elements with intent to write using 32-bit indices. Elements are prefetched into cache level hint, where hint is 0 or 1. 64-bit elements are brought into cache from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_prefetch_i32scatter_pd
extern void __cdecl _mm512_mask_prefetch_i32scatter_pd(void* base_addr, __mmask8 k, __m256i vindex, int scale, int hint);
Prefetches float64 elements with intent to write using 32-bit indices. Elements are prefetched into cache level hint, where hint is 0 or 1. 64-bit elements are brought into cache from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not brought into cache when the corresponding mask bit is not set).
_mm512_prefetch_i64scatter_pd
extern void __cdecl _mm512_prefetch_i64scatter_pd(void* base_addr, __m512i vindex, int scale, int hint);
Prefetches float64 elements with intent to write into memory using 64-bit indices. Elements are prefetched into cache level hint, where hint is 0 or 1. 64-bit elements are brought into cache from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_prefetch_i64scatter_pd
extern void __cdecl _mm512_mask_prefetch_i64scatter_pd(void* base_addr, __mmask8 k, __m512i vindex, int scale, int hint);
Prefetches float64 elements with intent to write into memory using 64-bit indices. Elements are prefetched into cache level hint, where hint is 0 or 1. 64-bit elements are brought into cache from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not brought into cache when the corresponding mask bit is not set).