Intel® C++ Compiler 16.0 User and Reference Guide

_mm512_prefetch_i32[ext]gather_ps/ _mm512_mask_prefetch_i32[ext]gather_ps

Prefetches float32 elements addressed through a vector of int32 indices. The corresponding instructions are VGATHERPF0DPS and VGATHERPF1DPS. These intrinsics apply only to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).

Syntax

Without Mask

extern void __cdecl _mm512_prefetch_i32extgather_ps(__m512i index, void const* mv, _MM_UPCONV_PS_ENUM conv, int scale, int pf_hint);

extern void __cdecl _mm512_prefetch_i32gather_ps(__m512i index, void const* mv, int scale, int pf_hint);

With Mask

extern void __cdecl _mm512_mask_prefetch_i32extgather_ps(__m512i index, __mmask16 k1, void const* mv, _MM_UPCONV_PS_ENUM conv, int scale, int pf_hint);

extern void __cdecl _mm512_mask_prefetch_i32gather_ps(__m512i index, __mmask16 k1, void const* mv, int scale, int pf_hint);

Parameters

k1

Writemask. Only those elements of the source memory whose corresponding bit in k1 is set to '1' are prefetched.

index

Vector of int32 indices that select elements relative to the base address mv.

mv

Pointer to the base address in memory.

conv

Type of upconversion, which can be one of the following:

  • _MM_UPCONV_PS_NONE - no conversion
  • _MM_UPCONV_PS_FLOAT16 - float16 => float32
  • _MM_UPCONV_PS_UINT8 - uint8 => float32
  • _MM_UPCONV_PS_SINT8 - sint8 => float32
  • _MM_UPCONV_PS_UINT16 - uint16 => float32
  • _MM_UPCONV_PS_SINT16 - sint16 => float32

scale

Scaling factor for calculating the address of each element. Takes one of the following values: 1, 2, 4, or 8. The address of the i-th element in memory is calculated as: mv + index[i] * scale
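The address formula above can be illustrated with a minimal scalar sketch in plain C. The helper name element_address is hypothetical and is not part of the intrinsics API; it only models the arithmetic, treating scale as a byte multiplier applied to one index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the intrinsics API): returns the byte
   address mv + index_i * scale for a single element. */
static const void *element_address(const void *mv, int32_t index_i, int scale)
{
    /* The documented scale values are 1, 2, 4, and 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    /* Byte-wise pointer arithmetic, so scale multiplies the index in bytes. */
    return (const char *)mv + (ptrdiff_t)index_i * scale;
}
```

For an array of 4-byte float32 elements, a scale of 4 makes index[i] behave like an ordinary array subscript.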

pf_hint

Prefetch hint. Takes one of the following values:

  • _MM_HINT_T0 – prefetch into L1 with T0 hint
  • _MM_HINT_T1 – prefetch into L2 with T1 hint
  • _MM_HINT_T2 – prefetch into L2 with T1 and non-temporal hints
  • _MM_HINT_NTA – prefetch into L1 with T0 and non-temporal hints

Description

The 16 memory locations addressed by the base address mv, the int32 index vector index, and the scaling factor scale are prefetched from memory into the L1 or L2 cache, depending on the pf_hint parameter.

The non-masked variant of the intrinsic is equivalent to the masked variant with a full mask (k1=0xffff).
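The effect of the writemask can be sketched with a scalar model in plain C. The function model_prefetch_targets is a hypothetical illustration, not the intrinsic itself: it only collects the addresses that a masked gather prefetch would touch, assuming bit i of k1 controls element i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model, not the intrinsic: stores the addresses selected
   by k1 into targets and returns how many elements were selected. */
static int model_prefetch_targets(const int32_t index[16], uint16_t k1,
                                  const void *mv, int scale,
                                  const void *targets[16])
{
    int count = 0;

    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    for (int i = 0; i < 16; ++i) {
        if ((k1 >> i) & 1u) {   /* bit i of k1 selects element i */
            targets[count++] = (const char *)mv + (ptrdiff_t)index[i] * scale;
        }
    }
    return count;
}
```

In this model, a mask of 0xffff selects all 16 locations, which matches the behavior described for the non-masked variant.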

You can use the simplified versions of this intrinsic, without ext in the name, if no upconversion is required.
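A possible usage pattern for the simplified variant is sketched below. The helper gather_and_prefetch_next and its parameters are hypothetical, and the sketch makes several assumptions: the intrinsic path is compiled only when __MIC__ is defined, the index and output buffers are 64-byte aligned on that path, and the table holds float32 elements so that scale is 4. On other targets a plain scalar loop stands in for the gather, and no prefetch is issued.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__MIC__)
#include <immintrin.h>
#endif

/* Hypothetical helper: gathers 16 floats selected by idx_now and, on Intel MIC
   builds, prefetches the 16 elements selected by idx_next for a later call.
   Assumes idx_now, idx_next, and out are 64-byte aligned on the MIC path. */
static void gather_and_prefetch_next(const float *table,
                                     const int32_t *idx_now,
                                     const int32_t *idx_next,
                                     float *out)
{
    assert(table && idx_now && idx_next && out);
#if defined(__MIC__)
    /* Request the next batch into L1; scale is 4 for 4-byte float32 data. */
    __m512i next = _mm512_load_epi32(idx_next);
    _mm512_prefetch_i32gather_ps(next, table, 4, _MM_HINT_T0);

    /* Gather the current batch and store it to out. */
    __m512i now = _mm512_load_epi32(idx_now);
    _mm512_store_ps(out, _mm512_i32gather_ps(now, table, 4));
#else
    /* Portable fallback: a prefetch does not change program results, so only
       the gather is reproduced here. */
    (void)idx_next;
    for (int i = 0; i < 16; ++i) {
        out[i] = table[idx_now[i]];
    }
#endif
}
```

Issuing the prefetch for the next batch before gathering the current one is a design choice of this sketch, intended to overlap memory latency with useful work; whether it helps depends on the access pattern.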

Returns

None.