Intel® C++ Compiler 16.0 User and Reference Guide

Intrinsics for Integer Gather and Scatter Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file. To use these intrinsics, include immintrin.h in your code.


Intrinsic Name | Operation | Corresponding Intel® AVX-512 Instruction
--- | --- | ---
_mm512_i32gather_epi32, _mm512_mask_i32gather_epi32 | Gathers int32 from memory using 32-bit indices. | VPGATHERDD
_mm512_i32gather_epi64, _mm512_mask_i32gather_epi64 | Gathers int64 from memory using 32-bit indices. | VPGATHERDQ
_mm512_i64gather_epi32, _mm512_mask_i64gather_epi32 | Gathers int32 from memory using 64-bit indices. | VPGATHERQD
_mm512_i64gather_epi64, _mm512_mask_i64gather_epi64 | Gathers int64 from memory using 64-bit indices. | VPGATHERQQ
_mm512_i32scatter_epi32, _mm512_mask_i32scatter_epi32 | Scatters int32 into memory using 32-bit indices. | VPSCATTERDD
_mm512_i32scatter_epi64, _mm512_mask_i32scatter_epi64 | Scatters int64 into memory using 32-bit indices. | VPSCATTERDQ
_mm512_i64scatter_epi32, _mm512_mask_i64scatter_epi32 | Scatters int32 into memory using 64-bit indices. | VPSCATTERQD
_mm512_i64scatter_epi64, _mm512_mask_i64scatter_epi64 | Scatters int64 into memory using 64-bit indices. | VPSCATTERQQ


Variable

Definition

k

writemask used as a selector

a

first source vector element

src

source element to use based on writemask result

downconv

Where _MM_DOWNCONV_EPI32_ENUM can be one of the following:

  • _MM_DOWNCONV_EPI32_NONE - no conversion
  • _MM_DOWNCONV_EPI32_UINT8 - uint32 => uint8
  • _MM_DOWNCONV_EPI32_SINT8 - sint32 => sint8
  • _MM_DOWNCONV_EPI32_UINT16 - uint32 => uint16
  • _MM_DOWNCONV_EPI32_SINT16 - sint32 => sint16

downconv

Where _MM_DOWNCONV_EPI64_ENUM can be one of the following:

  • _MM_DOWNCONV_EPI64_NONE - no conversion

upconv

Where _MM_UPCONV_EPI32_ENUM can be one of the following:

  • _MM_UPCONV_EPI32_NONE - no conversion
  • _MM_UPCONV_EPI32_FLOAT16 - float16 => float32
  • _MM_UPCONV_EPI32_UINT8 - uint8 => uint32
  • _MM_UPCONV_EPI32_SINT8 - sint8 => sint32
  • _MM_UPCONV_EPI32_UINT16 - uint16 => uint32
  • _MM_UPCONV_EPI32_SINT16 - sint16 => sint32
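
As a rough illustration of what the integer entries above mean for one element, the following scalar sketch reads a value of the stated memory type and widens it to 32 bits. The enum upconv_model_t and the function upconv_epi32_model are hypothetical names made up for this sketch (they are not the _MM_UPCONV_EPI32_ENUM values), the float16 case is omitted, and the sketch assumes that unsigned types are zero-extended and signed types are sign-extended.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the integer up-conversion choices;
   these are NOT the _MM_UPCONV_EPI32_ENUM values. */
typedef enum {
    UPCONV_MODEL_NONE,
    UPCONV_MODEL_UINT8,
    UPCONV_MODEL_SINT8,
    UPCONV_MODEL_UINT16,
    UPCONV_MODEL_SINT16
} upconv_model_t;

/* Reads one element of the given memory type at p and widens it to 32 bits.
   Assumption: unsigned sources zero-extend, signed sources sign-extend. */
int32_t upconv_epi32_model(const void *p, upconv_model_t conv)
{
    switch (conv) {
    case UPCONV_MODEL_UINT8:  { uint8_t v;  memcpy(&v, p, sizeof v); return (int32_t)v; }
    case UPCONV_MODEL_SINT8:  { int8_t v;   memcpy(&v, p, sizeof v); return (int32_t)v; }
    case UPCONV_MODEL_UINT16: { uint16_t v; memcpy(&v, p, sizeof v); return (int32_t)v; }
    case UPCONV_MODEL_SINT16: { int16_t v;  memcpy(&v, p, sizeof v); return (int32_t)v; }
    case UPCONV_MODEL_NONE:   { int32_t v;  memcpy(&v, p, sizeof v); return v; }
    default:
        assert(0 && "unknown conversion");
        return 0;
    }
}
```

With a source byte of 0xFF, the unsigned 8-bit case in this model yields 255 while the signed 8-bit case yields -1.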

upconv

Where _MM_UPCONV_EPI64_ENUM can be:

  • _MM_UPCONV_EPI64_NONE - no conversion

scale

Where _MM_INDEX_SCALE_ENUM can be one of the following:

  • _MM_SCALE_1 - 1
  • _MM_SCALE_2 - 2
  • _MM_SCALE_4 - 4
  • _MM_SCALE_8 - 8

hint

Indicates which cache level to bring values into, where _MM_HINT_ENUM can be one of the following:

  • _MM_HINT_NONE 0x0 - Off.
  • _MM_HINT_NT 0x1 - On: Load or store is non-temporal.


_mm512_i32gather_epi32

extern __m512i __cdecl _mm512_i32gather_epi32(__m512i vindex, void const* base_addr, _MM_UPCONV_EPI32_ENUM upconv, int scale, int hint);

Gathers int32 from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination.
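
The addressing rule described above can be sketched in plain C. This is a scalar model for illustration only, not the intrinsic: the function name gather_i32_epi32_model is hypothetical, the vectors are represented as 16-element arrays, and the upconv and hint parameters are left out (the sketch assumes no conversion).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 16-lane gather of 32-bit elements with
   32-bit indices. Each lane reads from base_addr + vindex[lane] * scale. */
void gather_i32_epi32_model(int32_t dst[16], const int32_t vindex[16],
                            const void *base_addr, int scale)
{
    /* scale is a byte multiplier; only 1, 2, 4, and 8 are listed. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    const unsigned char *base = (const unsigned char *)base_addr;
    for (int lane = 0; lane < 16; ++lane) {
        /* Widen before multiplying so the byte offset cannot overflow. */
        int64_t offset = (int64_t)vindex[lane] * scale;
        memcpy(&dst[lane], base + offset, sizeof dst[lane]);
    }
}
```

In this model, passing a scale of 4 with an int32_t array as base_addr makes each index behave like an ordinary array subscript, while a scale of 1 treats each index as a raw byte offset.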



_mm512_mask_i32gather_epi32

extern __m512i __cdecl _mm512_mask_i32gather_epi32(__m512i src, __mmask16 k, __m512i vindex, void const* base_addr, _MM_UPCONV_EPI32_ENUM upconv, int scale, int hint);

Gathers int32 from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination using writemask k (elements are copied from src when the corresponding mask bit is not set).
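
The writemask merge described above can be illustrated with a scalar sketch. The function name mask_gather_i32_epi32_model is hypothetical, the mask is modeled as a uint16_t with bit i controlling lane i, and upconv and hint are omitted; treat it as a model of the description, not as the intrinsic's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a masked 16-lane gather of 32-bit elements.
   Lanes whose mask bit is set load from memory; the rest copy from src. */
void mask_gather_i32_epi32_model(int32_t dst[16], const int32_t src[16],
                                 uint16_t k, const int32_t vindex[16],
                                 const void *base_addr, int scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    const unsigned char *base = (const unsigned char *)base_addr;
    for (int lane = 0; lane < 16; ++lane) {
        if ((k >> lane) & 1u) {
            /* Mask bit set: gather from base_addr + index * scale. */
            int64_t offset = (int64_t)vindex[lane] * scale;
            memcpy(&dst[lane], base + offset, sizeof dst[lane]);
        } else {
            /* Mask bit clear: keep the element from src, no memory read. */
            dst[lane] = src[lane];
        }
    }
}
```

Because the model never dereferences the index of a masked-off lane, those lanes may hold arbitrary index values without affecting the result.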



_mm512_i32gather_epi64

extern __m512i __cdecl _mm512_i32gather_epi64(__m512i vindex, void const* base_addr, _MM_UPCONV_EPI64_ENUM upconv, int scale, int hint);

Gathers int64 from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination.


_mm512_mask_i32gather_epi64

extern __m512i __cdecl _mm512_mask_i32gather_epi64(__m512i src, __mmask8 k, __m512i vindex, void const* base_addr, _MM_UPCONV_EPI64_ENUM upconv, int scale, int hint);

Gathers int64 from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_i64gather_epi32

extern __m512i __cdecl _mm512_i64gather_epi32(__m512i vindex, void const* base_addr, _MM_UPCONV_EPI32_ENUM upconv, int scale, int hint);

Gathers int32 from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination.


_mm512_mask_i64gather_epi32

extern __m512i __cdecl _mm512_mask_i64gather_epi32(__m512i src, __mmask8 k, __m512i vindex, void const* base_addr, _MM_UPCONV_EPI32_ENUM upconv, int scale, int hint);

Gathers int32 from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_i64gather_epi64

extern __m512i __cdecl _mm512_i64gather_epi64(__m512i vindex, void const* base_addr, _MM_UPCONV_EPI64_ENUM upconv, int scale, int hint);

Gathers int64 from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination.


_mm512_mask_i64gather_epi64

extern __m512i __cdecl _mm512_mask_i64gather_epi64(__m512i src, __mmask8 k, __m512i vindex, void const* base_addr, _MM_UPCONV_EPI64_ENUM upconv, int scale, int hint);

Gathers int64 from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into destination using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_i32scatter_epi32

extern void __cdecl _mm512_i32scatter_epi32(void* base_addr, __m512i a, __m512i vindex, _MM_DOWNCONV_EPI32_ENUM downconv, int scale, int hint);

Scatters int32 from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).



_mm512_mask_i32scatter_epi32

extern void __cdecl _mm512_mask_i32scatter_epi32(void* base_addr, __mmask16 k, __m512i a, __m512i vindex, _MM_DOWNCONV_EPI32_ENUM downconv, int scale, int hint);

Scatters int32 from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set).
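
The masked store described above can be modeled in scalar C as follows. The function name mask_scatter_i32_epi32_model is hypothetical, the mask is modeled as a uint16_t with bit i controlling lane i, and downconv and hint are omitted (the sketch assumes no conversion). Lanes are processed from lowest to highest here; that ordering is a choice of this sketch and matters only when two indices refer to the same address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a masked 16-lane scatter of 32-bit elements.
   Lane i stores a[i] to base_addr + vindex[i] * scale when mask bit i is set. */
void mask_scatter_i32_epi32_model(void *base_addr, uint16_t k,
                                  const int32_t a[16],
                                  const int32_t vindex[16], int scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    unsigned char *base = (unsigned char *)base_addr;
    for (int lane = 0; lane < 16; ++lane) {
        if (((k >> lane) & 1u) == 0)
            continue;  /* Mask bit clear: memory is left untouched. */

        int64_t offset = (int64_t)vindex[lane] * scale;
        memcpy(base + offset, &a[lane], sizeof a[lane]);
    }
}
```

With a mask of 0x5555, for example, only the even-numbered lanes of a are written in this model, and the destinations of the odd-numbered lanes keep their previous contents.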



_mm512_i32scatter_epi64

extern void __cdecl _mm512_i32scatter_epi64(void* base_addr, __m512i a, __m512i vindex, _MM_DOWNCONV_EPI64_ENUM downconv, int scale, int hint);

Scatters int64 from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).



_mm512_mask_i32scatter_epi64

extern void __cdecl _mm512_mask_i32scatter_epi64(void* base_addr, __mmask8 k, __m512i a, __m512i vindex, _MM_DOWNCONV_EPI64_ENUM downconv, int scale, int hint);

Scatters int64 from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set).



_mm512_i64scatter_epi32

extern void __cdecl _mm512_i64scatter_epi32(void* base_addr, __m512i a, __m512i vindex, _MM_DOWNCONV_EPI32_ENUM downconv, int scale, int hint);

Scatters int32 from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).



_mm512_mask_i64scatter_epi32

extern void __cdecl _mm512_mask_i64scatter_epi32(void* base_addr, __mmask8 k, __m512i a, __m512i vindex, _MM_DOWNCONV_EPI32_ENUM downconv, int scale, int hint);

Scatters int32 from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set).



_mm512_i64scatter_epi64

extern void __cdecl _mm512_i64scatter_epi64(void* base_addr, __m512i a, __m512i vindex, _MM_DOWNCONV_EPI64_ENUM downconv, int scale, int hint);

Scatters int64 from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_i64scatter_epi64

extern void __cdecl _mm512_mask_i64scatter_epi64(void* base_addr, __mmask8 k, __m512i a, __m512i vindex, _MM_DOWNCONV_EPI64_ENUM downconv, int scale, int hint);

Scatters int64 from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set).