Intel® C++ Compiler 16.0 User and Reference Guide

_mm512_extload_epi32/ _mm512_mask_extload_epi32

Loads elements from memory, optionally broadcasts them, and converts them into an int32 vector. The corresponding instruction is VMOVDQA32. This intrinsic only applies to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).

Syntax

Without Mask

extern __m512i __cdecl _mm512_extload_epi32(void const* mt, _MM_UPCONV_EPI32_ENUM conv, _MM_BROADCAST32_ENUM bc, int hint);

With Mask

extern __m512i __cdecl _mm512_mask_extload_epi32(__m512i v1_old, __mmask16 k1, void const* mt, _MM_UPCONV_EPI32_ENUM conv, _MM_BROADCAST32_ENUM bc, int hint);

Arguments

v1_old

Source vector that retains old values of the destination vector; the resulting vector gets corresponding elements from v1_old for zero mask bits

k1

Writemask; only those elements whose corresponding bit in k1 is set to '1' are loaded, converted, and stored in the result; elements in the result vector whose corresponding bit in k1 is zero are copied from the corresponding elements of vector v1_old

mt

Memory address from which the load occurs

conv

Type of upconversion, which can be one of the following:

  • _MM_UPCONV_EPI32_NONE - no conversion
  • _MM_UPCONV_EPI32_UINT8 - uint8 => uint32
  • _MM_UPCONV_EPI32_SINT8 - sint8 => sint32
  • _MM_UPCONV_EPI32_UINT16 - uint16 => uint32
  • _MM_UPCONV_EPI32_SINT16 - sint16 => sint32

bc

Type of broadcast, which can be one of the following:

  • _MM_BROADCAST32_NONE - identity swizzle/convert
  • _MM_BROADCAST_1X16 - broadcast x 16 ( aaaa aaaa aaaa aaaa )
  • _MM_BROADCAST_4X16 - broadcast x 4 ( dcba dcba dcba dcba )

hint

Hint that indicates to the processor that the data is non-temporal. Takes the value 0 or 1, where:

  • _MM_HINT_NONE = 0
  • _MM_HINT_NT = 1 (Load is non-temporal)

Description

Depending on the bc parameter, loads one (bc=_MM_BROADCAST_1X16), four (bc=_MM_BROADCAST_4X16), or 16 (bc=_MM_BROADCAST32_NONE) elements from memory address mt, converts them to int32 values, and returns the result in an int32 vector. The type and size of the elements read from memory depend on the conv parameter.
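The load/broadcast/convert behavior described above can be sketched as a scalar model in plain C. This is an illustration only, not the intrinsic itself: the names model_conv, model_bc, model_upconvert, and model_extload_epi32 are hypothetical stand-ins, the hint argument is omitted, and alignment requirements are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the conv and bc enumerations (not the real types). */
typedef enum { CONV_NONE, CONV_UINT8, CONV_SINT8, CONV_UINT16, CONV_SINT16 } model_conv;
typedef enum { BC_NONE, BC_1X16, BC_4X16 } model_bc;

/* Reads source element i from mt and widens it to 32 bits according to conv. */
static int32_t model_upconvert(const void *mt, int i, model_conv conv)
{
    const unsigned char *p = (const unsigned char *)mt;
    switch (conv) {
    case CONV_UINT8:  { uint8_t v;  memcpy(&v, p + i, 1);     return (int32_t)v; }
    case CONV_SINT8:  { int8_t v;   memcpy(&v, p + i, 1);     return (int32_t)v; }
    case CONV_UINT16: { uint16_t v; memcpy(&v, p + 2 * i, 2); return (int32_t)v; }
    case CONV_SINT16: { int16_t v;  memcpy(&v, p + 2 * i, 2); return (int32_t)v; }
    default:          { int32_t v;  memcpy(&v, p + 4 * i, 4); return v; }
    }
}

/* Scalar model of the unmasked form: fills all 16 destination elements. */
static void model_extload_epi32(int32_t dst[16], const void *mt,
                                model_conv conv, model_bc bc)
{
    assert(mt != NULL);
    /* Number of distinct elements read from memory: 1, 4, or 16. */
    int n = (bc == BC_1X16) ? 1 : (bc == BC_4X16) ? 4 : 16;
    for (int i = 0; i < 16; i++)
        dst[i] = model_upconvert(mt, i % n, conv);
}
```

In this model, a 4X16 broadcast of four uint8 source values repeats those four converted values across the 16 destination elements, matching the ( dcba dcba dcba dcba ) pattern listed for bc.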

The masked variant has two additional arguments: v1_old and k1. Only those elements with the corresponding bit set to one in vector mask k1 are computed. Elements in resulting vector with the corresponding bit clear in k1 obtain values from the v1_old vector.
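The merge performed by the masked variant can be sketched in scalar C as follows. This is a model under stated assumptions, not the intrinsic: model_mask_merge is a hypothetical name, loaded[] stands for the result the unmasked form would produce, and bit i of k1 is assumed to control element i.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the writemask merge (hypothetical helper, not an Intel API). */
static void model_mask_merge(int32_t dst[16], const int32_t v1_old[16],
                             uint16_t k1, const int32_t loaded[16])
{
    assert(dst && v1_old && loaded);
    for (int i = 0; i < 16; i++) {
        /* A set bit takes the loaded element; a clear bit keeps v1_old[i]. */
        dst[i] = ((k1 >> i) & 1u) ? loaded[i] : v1_old[i];
    }
}
```

With k1 = 0x0005, for example, only elements 0 and 2 of the result come from the load; the other 14 elements are taken from v1_old.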

Note

This intrinsic requires the memory address mt to be aligned to the data size granularity dictated by the bc and conv parameters. If the conversion is from an 8-bit type (uint8, sint8), the required alignment is 1, 4, or 16 bytes, depending on the broadcast (1x16, 4x16, none). For a conversion from a 16-bit type, the alignment must be 2, 8, or 32 bytes, depending on the broadcast. If no conversion is used, the alignment must be 4, 16, or 64 bytes.
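The alignment values in this note all equal the source element size multiplied by the number of elements read from memory. A minimal sketch of that rule, using the hypothetical helpers model_required_alignment and model_is_aligned (neither is part of the compiler's API), might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Required alignment in bytes = source element size x elements read.
   elem_bytes: 1 (uint8/sint8), 2 (uint16/sint16), or 4 (no conversion).
   elems_read: 1 (1x16 broadcast), 4 (4x16 broadcast), or 16 (no broadcast). */
static size_t model_required_alignment(size_t elem_bytes, size_t elems_read)
{
    assert(elem_bytes == 1 || elem_bytes == 2 || elem_bytes == 4);
    assert(elems_read == 1 || elems_read == 4 || elems_read == 16);
    return elem_bytes * elems_read;
}

/* Returns nonzero if mt is a multiple of the given alignment. */
static int model_is_aligned(const void *mt, size_t alignment)
{
    return ((uintptr_t)mt % alignment) == 0;
}
```

A check such as model_is_aligned(mt, model_required_alignment(2, 4)) could be placed in a debug assertion before a call that uses a 16-bit conversion with a 4x16 broadcast, where the note requires 8-byte alignment.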

Returns

Returns the result of the load/broadcast/convert operation.