Intel® C++ Compiler 16.0 User and Reference Guide

Intrinsics for FP Gather and Scatter Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file. To use these intrinsics, include the immintrin.h header file in your code.


Intrinsic Name

Operation

Corresponding Intel® AVX-512 Instruction

_mm512_i32gather_pd, _mm512_mask_i32gather_pd

Gathers float64 vector elements from memory with int32 indices.

VGATHERDPD

_mm512_i32gather_ps, _mm512_mask_i32gather_ps

Gathers float32 vector elements from memory with int32 indices.

VGATHERDPS

_mm512_i64gather_pd, _mm512_mask_i64gather_pd

Gathers float64 vector elements from memory with int64 indices.

VGATHERQPD

_mm512_i64gather_ps, _mm512_mask_i64gather_ps

Gathers float32 vector elements from memory with int64 indices.

VGATHERQPS

_mm512_prefetch_i32gather_pd, _mm512_mask_prefetch_i32gather_pd

Prefetches float64 vector elements for a gather with int32 indices.

VGATHERPF0DPD, VGATHERPF1DPD

_mm512_prefetch_i64gather_pd, _mm512_mask_prefetch_i64gather_pd

Prefetches float64 vector elements for a gather with int64 indices.

VGATHERPF0QPD, VGATHERPF1QPD

_mm512_prefetch_i64gather_ps, _mm512_mask_prefetch_i64gather_ps

Prefetches float32 vector elements for a gather with int64 indices.

VGATHERPF0QPS, VGATHERPF1QPS

_mm512_i32scatter_pd, _mm512_mask_i32scatter_pd

Scatters float64 vector elements to memory with int32 indices.

VSCATTERDPD

_mm512_i32scatter_ps, _mm512_mask_i32scatter_ps

Scatters float32 vector elements to memory with int32 indices.

VSCATTERDPS

_mm512_i64scatter_pd, _mm512_mask_i64scatter_pd

Scatters float64 vector elements to memory with int64 indices.

VSCATTERQPD

_mm512_i64scatter_ps, _mm512_mask_i64scatter_ps

Scatters float32 vector elements to memory with int64 indices.

VSCATTERQPS

_mm512_prefetch_i32scatter_pd, _mm512_mask_prefetch_i32scatter_pd

Prefetches float64 vector elements for a scatter with int32 indices.

VSCATTERPF0DPD, VSCATTERPF1DPD

_mm512_prefetch_i64scatter_pd, _mm512_mask_prefetch_i64scatter_pd

Prefetches float64 vector elements for a scatter with int64 indices.

VSCATTERPF0QPD, VSCATTERPF1QPD

_mm512_prefetch_i64scatter_ps, _mm512_mask_prefetch_i64scatter_ps

Prefetches float32 vector elements for a scatter with int64 indices.

VSCATTERPF0QPS, VSCATTERPF1QPS


variable

definition

k

writemask used as a selector

a

first source vector element

src

source element to use based on writemask result

hint

Indicates which cache level to bring values into, where _MM_HINT_ENUM can be one of the following:

  • _MM_HINT_NONE 0x0 - Off.
  • _MM_HINT_NT 0x1 - On: Load or store is non-temporal.

scale

Factor by which each index is multiplied to form a byte offset from base_addr, where _MM_INDEX_SCALE_ENUM can be one of the following:

  • _MM_SCALE_1 - 1
  • _MM_SCALE_2 - 2
  • _MM_SCALE_4 - 4
  • _MM_SCALE_8 - 8
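The role of scale can be illustrated with a small scalar sketch. The helper name lane_address is hypothetical and is not part of immintrin.h; it only models how the address touched by one vector lane is formed from base_addr, one index, and one of the scale factors listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an intrinsic): returns the byte address that a
   single gather or scatter lane would access. */
static const void *lane_address(const void *base_addr, int64_t index, int scale)
{
    /* Only the four scale factors listed above are accepted. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    /* The index is multiplied by scale and added to base_addr as a byte offset. */
    return (const char *)base_addr + index * (int64_t)scale;
}
```

With a scale of 8, an index of 3 selects the fourth float64 element of an array that starts at base_addr; with a scale of 1, the index is interpreted directly as a byte offset.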

round

Rounding control values; these can be one of the following (along with the sae suppress all exceptions flag):

  • _MM_FROUND_TO_NEAREST_INT - rounds to nearest even
  • _MM_FROUND_TO_NEG_INF - rounds to negative infinity
  • _MM_FROUND_TO_POS_INF - rounds to positive infinity
  • _MM_FROUND_TO_ZERO - rounds to zero
  • _MM_FROUND_CUR_DIRECTION - rounds using default from MXCSR register


_mm512_i32gather_pd

extern __m512d __cdecl _mm512_i32gather_pd(__m512i vindex, void const* base_addr, int scale, int hint);

Gathers float64 elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination.


_mm512_mask_i32gather_pd

extern __m512d __cdecl _mm512_mask_i32gather_pd(__m512d src, __mmask8 k, __m512i vindex, void const* base_addr, int scale, int hint);

Gathers float64 elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination using writemask k (elements are copied from src when the corresponding mask bit is not set).
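The merge behavior described above can be sketched in portable scalar C. The function below is a hypothetical reference model, not the intrinsic itself: it uses plain arrays in place of vector registers, assumes the eight float64 lanes of a 512-bit vector, and omits the hint parameter.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { PD_LANES = 8 };  /* a 512-bit vector holds eight float64 elements */

/* Hypothetical scalar model of a masked float64 gather with 32-bit indices. */
static void model_mask_i32gather_pd(double dst[PD_LANES],
                                    const double src[PD_LANES],
                                    uint8_t k,
                                    const int32_t vindex[PD_LANES],
                                    const void *base_addr,
                                    int scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    for (int i = 0; i < PD_LANES; ++i) {
        if ((k >> i) & 1) {
            /* Mask bit set: load from base_addr + vindex[i] * scale bytes. */
            const char *p = (const char *)base_addr + (int64_t)vindex[i] * scale;
            memcpy(&dst[i], p, sizeof(double));
        } else {
            /* Mask bit clear: keep the corresponding element of src. */
            dst[i] = src[i];
        }
    }
}
```

For example, with k set to 0x0F only lanes 0 through 3 are read from memory, and lanes 4 through 7 of the result are copies of src.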


_mm512_i32gather_ps

extern __m512 __cdecl _mm512_i32gather_ps(__m512i vindex, void const* base_addr,  int scale, int hint);

Gathers float32 elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination.


_mm512_mask_i32gather_ps

extern __m512 __cdecl _mm512_mask_i32gather_ps(__m512 src, __mmask16 k, __m512i vindex, void const* base_addr,  int scale, int hint);

Gathers float32 elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination using writemask k (elements are copied from src when the corresponding mask bit is not set).


_mm512_i64gather_pd

extern __m512d __cdecl _mm512_i64gather_pd(__m512i vindex, void const* base_addr,  int scale, int hint);

Gathers float64 elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination.


_mm512_mask_i64gather_pd

extern __m512d __cdecl _mm512_mask_i64gather_pd(__m512d src, __mmask8 k, __m512i vindex, void const* base_addr, int scale, int hint);

Gathers float64 elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination using writemask k (elements are copied from src when the corresponding mask bit is not set).


_mm512_i64gather_ps

extern __m512 __cdecl _mm512_i64gather_ps(__m512i vindex, void const* base_addr,  int scale, int hint);

Gathers float32 elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination.
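Because each index is scaled and then applied as a byte offset, a scale of 1 lets the indices address fields inside larger records. The following scalar sketch illustrates this under stated assumptions: the particle type and both function names are hypothetical, plain arrays stand in for vector registers, eight 64-bit indices are assumed per 512-bit index vector, and the hint parameter is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record type used only for this illustration. */
typedef struct { float x; float y; int32_t id; } particle;

/* Hypothetical scalar model of a float32 gather with 64-bit indices. */
static void model_i64gather_ps(float dst[8], const int64_t vindex[8],
                               const void *base_addr, int scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    for (int i = 0; i < 8; ++i) {
        const char *p = (const char *)base_addr + vindex[i] * scale;
        memcpy(&dst[i], p, sizeof(float));
    }
}

/* Gathers the y field of the eight particles selected by which[]. */
static void gather_y(float dst[8], const particle *ps, const int64_t which[8])
{
    int64_t byte_offsets[8];
    for (int i = 0; i < 8; ++i)
        byte_offsets[i] = which[i] * (int64_t)sizeof(particle);

    /* base_addr points at the y field of element 0; scale 1 means the
       indices are already byte offsets. */
    model_i64gather_ps(dst, byte_offsets,
                       (const char *)ps + offsetof(particle, y), 1);
}
```

The same result could be obtained with element indices and a larger scale only when the record size happens to be 2, 4, or 8 bytes, which is why byte offsets with a scale of 1 are used in this sketch.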


_mm512_mask_i64gather_ps

extern __m512 __cdecl _mm512_mask_i64gather_ps(__m512 src, __mmask8 k, __m512i vindex, void const* base_addr, int scale, int hint);

Gathers float32 elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination using writemask k (elements are copied from src when the corresponding mask bit is not set).


_mm512_prefetch_i64gather_ps

extern void __cdecl _mm512_prefetch_i64gather_ps(__m512i vindex, void const* base_addr, int scale, int hint);

Prefetches float32 elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged in cache. hint should be 0 or 1 and indicates which cache level to read values into.


_mm512_mask_prefetch_i64gather_ps

extern void __cdecl _mm512_mask_prefetch_i64gather_ps(__m512i vindex, __mmask8 k, void const* base_addr, int scale, int hint);

Prefetches float32 elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged in cache using writemask k (elements are only brought into cache when their corresponding mask bit is set). hint should be 0 or 1 and indicates which cache level to read values into.
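A masked gather prefetch can be approximated in scalar code as one prefetch request per selected lane. The sketch below is a hypothetical model, not the intrinsic: it relies on __builtin_prefetch, which is a GCC and Clang builtin, and the mapping from hint values 0 and 1 to the builtin's locality argument is an assumption made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a masked float32 gather prefetch with 64-bit
   indices. Returns the number of prefetch requests issued. */
static int model_mask_prefetch_i64gather_ps(const int64_t vindex[8], uint8_t k,
                                            const void *base_addr, int scale,
                                            int hint)
{
    int issued = 0;

    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    assert(hint == 0 || hint == 1);

    for (int i = 0; i < 8; ++i) {
        if (!((k >> i) & 1))
            continue;  /* mask bit clear: this lane is not prefetched */

        const char *p = (const char *)base_addr + vindex[i] * scale;

        /* Assumed mapping: hint 0 requests the closest cache level,
           hint 1 the next level out. The builtin needs constant arguments. */
        if (hint == 0)
            __builtin_prefetch(p, 0, 3);
        else
            __builtin_prefetch(p, 0, 2);
        ++issued;
    }
    return issued;
}
```

A prefetch does not return data, so the model only reports how many lanes were selected by k.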


_mm512_prefetch_i32gather_pd

extern void __cdecl _mm512_prefetch_i32gather_pd(__m256i vindex, void const* base_addr, int scale, int hint);

Prefetches float64 elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).

hint should be 0 or 1 and indicates which cache level to bring values into. Gathered elements are merged in cache.


_mm512_mask_prefetch_i32gather_pd

extern void __cdecl _mm512_mask_prefetch_i32gather_pd(__m256i vindex, __mmask8 k, void const* base_addr, int scale, int hint);

Prefetches float64 elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged in cache using writemask k (elements are brought into cache only when their corresponding mask bits are set). hint should be 0 or 1 and indicates which cache level to bring values into.


_mm512_prefetch_i64gather_pd

extern void __cdecl _mm512_prefetch_i64gather_pd(__m512i vindex, void const* base_addr, int scale, int hint);

Prefetches float64 elements from memory into cache level specified by hint using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). hint should be 0 or 1.


_mm512_mask_prefetch_i64gather_pd

extern void __cdecl _mm512_mask_prefetch_i64gather_pd(__m512i vindex, __mmask8 k, void const* base_addr, int scale, int hint);

Prefetches float64 elements from memory into cache level specified by hint using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Prefetched elements are merged in cache using writemask k (elements are brought into cache only when the corresponding mask bit is set). hint should be 0 or 1.


_mm512_i32scatter_pd

extern void __cdecl _mm512_i32scatter_pd(void* base_addr, __m512i vindex, __m512d a,  int scale, int hint);

Scatters float64 elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_i32scatter_pd

extern void __cdecl _mm512_mask_i32scatter_pd(void* base_addr, __mmask8 k, __m512i vindex, __m512d a,  int scale, int hint);

Scatters float64 elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set).
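The masked store behavior described above can be sketched in portable scalar C. The function is a hypothetical reference model rather than the intrinsic: plain arrays stand in for vector registers, eight float64 lanes are assumed, and the hint parameter is omitted. Processing lanes in ascending order, so that the highest selected lane wins when two indices coincide, is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a masked float64 scatter with 32-bit indices. */
static void model_mask_i32scatter_pd(void *base_addr, uint8_t k,
                                     const int32_t vindex[8],
                                     const double a[8], int scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    for (int i = 0; i < 8; ++i) {
        if (!((k >> i) & 1))
            continue;  /* mask bit clear: memory is left unchanged */

        /* Store a[i] at base_addr + vindex[i] * scale bytes. */
        char *p = (char *)base_addr + (int64_t)vindex[i] * scale;
        memcpy(p, &a[i], sizeof(double));
    }
}
```

With k set to 0x0F, only lanes 0 through 3 of a are written; memory locations addressed by lanes 4 through 7 keep their previous contents.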


_mm512_i32scatter_ps

extern void __cdecl _mm512_i32scatter_ps(void* base_addr, __m512i vindex, __m512 a,  int scale, int hint);

Scatters float32 elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_i32scatter_ps

extern void __cdecl _mm512_mask_i32scatter_ps(void* base_addr, __mmask16 k, __m512i vindex, __m512 a,  int scale, int hint);

Scatters float32 elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set).


_mm512_i64scatter_pd

extern void __cdecl _mm512_i64scatter_pd(void* base_addr, __m512i vindex, __m512d a,  int scale, int hint);

Scatters float64 elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_i64scatter_pd

extern void __cdecl _mm512_mask_i64scatter_pd(void* base_addr, __mmask8 k, __m512i vindex, __m512d a,  int scale, int hint);

Scatters float64 elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set).


_mm512_i64scatter_ps

extern void __cdecl _mm512_i64scatter_ps(void* base_addr, __m512i vindex, __m512 a,  int scale, int hint);

Scatters float32 elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_i64scatter_ps

extern void __cdecl _mm512_mask_i64scatter_ps(void* base_addr, __mmask8 k, __m512i vindex, __m512 a,  int scale, int hint);

Scatters float32 elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set).


_mm512_prefetch_i64scatter_ps

extern void __cdecl _mm512_prefetch_i64scatter_ps(void* base_addr, __m512i vindex, int scale, int hint);

Prefetches float32 elements with intent to write into memory using 64-bit indices. Elements are prefetched into cache level hint, where hint is 0 or 1. 32-bit elements are brought into cache from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_prefetch_i64scatter_ps

extern void __cdecl _mm512_mask_prefetch_i64scatter_ps(void* base_addr, __mmask8 k, __m512i vindex, int scale, int hint);

Prefetches float32 elements with intent to write into memory using 64-bit indices. Elements are prefetched into cache level hint, where hint is 0 or 1. 32-bit elements are brought into cache from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not brought into cache when the corresponding mask bit is not set).


_mm512_prefetch_i32scatter_pd

extern void __cdecl _mm512_prefetch_i32scatter_pd(void* base_addr, __m256i vindex, int scale, int hint);

Prefetches float64 elements with intent to write using 32-bit indices. Elements are prefetched into cache level hint, where hint is 0 or 1. 64-bit elements are brought into cache from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_prefetch_i32scatter_pd

extern void __cdecl _mm512_mask_prefetch_i32scatter_pd(void* base_addr, __mmask8 k, __m256i vindex, int scale, int hint);

Prefetches float64 elements with intent to write using 32-bit indices. Elements are prefetched into cache level hint, where hint is 0 or 1. 64-bit elements are brought into cache from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not brought into cache when the corresponding mask bit is not set).


_mm512_prefetch_i64scatter_pd

extern void __cdecl _mm512_prefetch_i64scatter_pd(void* base_addr, __m512i vindex, int scale, int hint);

Prefetches float64 elements with intent to write into memory using 64-bit indices. Elements are prefetched into cache level hint, where hint is 0 or 1. 64-bit elements are brought into cache from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_prefetch_i64scatter_pd

extern void __cdecl _mm512_mask_prefetch_i64scatter_pd(void* base_addr, __mmask8 k, __m512i vindex, int scale, int hint);

Prefetches float64 elements with intent to write into memory using 64-bit indices. Elements are prefetched into cache level hint, where hint is 0 or 1. 64-bit elements are brought into cache from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not brought into cache when the corresponding mask bit is not set).