Intel® C++ Compiler 16.0 User and Reference Guide

Overview: Vector Operations

This topic only applies to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).

Almost all vector intrinsic functions supporting Intel® Initial Many Core Instructions (Intel® IMCI) have the form:

vop v1 {k1}, v2, S(v3/m)

where v1 is a destination operand. The instructions are writemasked, so only those elements with the corresponding bit set in vector mask register k1 are computed and stored into v1. Elements in v1 with the corresponding bit clear in k1 retain their previous values.

This means that the destination vector v1 also acts as a source vector, so it must be passed to the intrinsic function as an additional parameter.
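The writemask behavior described above can be sketched in portable C. This is only an emulation for illustration: masked_add_u32 and LANES are hypothetical names, not Intel intrinsics, and the sketch assumes a 512-bit vector viewed as 16 unsigned 32-bit elements with k1 held in a 16-bit integer.

```c
#include <stdint.h>

#define LANES 16  /* assumption: 512 bits viewed as 16 x 32-bit elements */

/* Hypothetical helper, not an Intel intrinsic: emulates a writemasked add.
 * v1 is both the destination and the source of the pass-through values,
 * which is why it has to be supplied by the caller. */
static void masked_add_u32(uint32_t v1[LANES], uint16_t k1,
                           const uint32_t v2[LANES],
                           const uint32_t v3[LANES])
{
    for (int i = 0; i < LANES; i++) {
        if ((k1 >> i) & 1u) {
            v1[i] = v2[i] + v3[i];  /* bit set: compute and store */
        }
        /* bit clear: v1[i] retains its previous value */
    }
}
```

Calling masked_add_u32 with k1 equal to 0x0005 updates only elements 0 and 2 of v1; every other element is left as the caller passed it in.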

Intrinsics for Vector Operations

The 512-bit vector intrinsics work in an element-wise manner: the first element of the first source vector is operated on together with the first element of the second source vector, and the result is stored in the first element of a destination vector, and so on for the remaining seven or 15 elements.
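As a rough illustration of element-wise operation, the sketch below models a 512-bit vector as a C union so that the same 64 bytes can be read as 16 32-bit elements or as eight 64-bit elements. The type vec512 and the functions add_u32 and add_u64 are hypothetical names chosen for this sketch; they only emulate the behavior and are not Intel intrinsics.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 512-bit vector: the same 64 bytes viewed
 * either as 16 x 32-bit elements or as 8 x 64-bit elements. */
typedef union {
    uint32_t u32[16];
    uint64_t u64[8];
} vec512;

/* Element-wise add treating the vector as 16 elements. */
static vec512 add_u32(vec512 a, vec512 b)
{
    vec512 r;
    for (int i = 0; i < 16; i++) {
        r.u32[i] = a.u32[i] + b.u32[i];  /* wraps within each 32-bit element */
    }
    return r;
}

/* Element-wise add treating the vector as eight elements. */
static vec512 add_u64(vec512 a, vec512 b)
{
    vec512 r;
    for (int i = 0; i < 8; i++) {
        r.u64[i] = a.u64[i] + b.u64[i];  /* wraps within each 64-bit element */
    }
    return r;
}
```

The two functions give different results for the same bits: an overflow in add_u32 stays inside its 32-bit element, whereas in add_u64 it carries into the upper half of the 64-bit element.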

Note

The contents of a 512-bit vector may be treated as either eight or 16 elements, depending on the intrinsic. For example, in the intrinsic functions:

The vector mask register that serves as the writemask for a vector intrinsic determines which element locations are actually operated upon; the mask can disable the operation and update for any combination of element locations.

Most vector intrinsics have three different vector operands (typically, two sources and one destination) except those instructions that have a single source and thus use only two operands.

In addition, any of the source vectors can be a result of permutation operations on memory registers or vectors.

Unmasked and Masked Variants of Intrinsic Functions

To simplify usage and to enable compiler optimizations, we provide a pair of intrinsics for each vector instruction: an unmasked variant and a masked variant.

It is important to understand the following points about the variants:

_mm512_<vop>(v2, v3)

Example of Masked Vector Usage

To make the workings of the vector mask k1 clear, here is an example.

Consider an intrinsic that performs an element-by-element addition with carry, where the two source vectors are v1 and v3. The vector carry holds the carry-out values. The vector k2_old supplies the elements of carry at positions where the operation is masked off.

For the masked variant of the intrinsic, k1 is a mask of 16 bits. If bit 3 of k1 is set to '1', then element 3 of the resulting vector is the sum of element 3 of v1 and element 3 of v3, and element 3 of carry is the carry out of that sum.

In addition, if bit 2 of k1 is '0', then element 2 of the resulting vector equals element 2 of v1, and element 2 of carry equals element 2 of k2_old.

The code below demonstrates how it works:

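for (i = 0; i < 16; i++) {
    res[i] = v1[i];                /* default: keep the element of v1 */
    carry[i] = k2_old[i];          /* default: keep the element of k2_old */
    if (k1[i] == 1) {              /* element enabled by the mask k1 */
        res[i] = v1[i] + v3[i];
        carry[i] = Carry(v1[i] + v3[i]);
    }
}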

Note

The v1_old vector (v1 in the example above) is used similarly to the k2_old vector. It supplies elements to the resulting vector wherever the corresponding bit in the vector mask k1 is set to '0'.
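The add-with-carry example above can also be written as a small runnable C function. This is a sketch under stated assumptions: masked_add_carry is a hypothetical name, not an Intel intrinsic; elements are taken to be unsigned 32-bit values; k1, k2_old, and the returned carry are each held as 16-bit integers with one bit per element; and, like the example, the sketch does not model a carry-in.

```c
#include <stdint.h>

#define LANES 16  /* assumption: 16 x 32-bit elements per 512-bit vector */

/* Hypothetical emulation of the masked add-with-carry example; it does
 * not model a carry-in. Returns the resulting carry mask. */
static uint16_t masked_add_carry(uint32_t res[LANES],
                                 const uint32_t v1[LANES],
                                 uint16_t k1, uint16_t k2_old,
                                 const uint32_t v3[LANES])
{
    uint16_t carry = k2_old;          /* default: carry bits from k2_old */
    for (int i = 0; i < LANES; i++) {
        res[i] = v1[i];               /* default: element from v1 */
        if ((k1 >> i) & 1u) {
            uint32_t sum = v1[i] + v3[i];
            res[i] = sum;
            /* Unsigned wraparound: the sum is smaller than an operand
             * exactly when the addition overflowed 32 bits. */
            if (sum < v1[i]) {
                carry = (uint16_t)(carry | (1u << i));
            } else {
                carry = (uint16_t)(carry & ~(1u << i));
            }
        }
    }
    return carry;
}
```

For elements whose bit in k1 is clear, both res and the carry bit are copied through unchanged, which matches the roles of v1 and k2_old described in the example.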
