Intel® C++ Compiler 16.0 User and Reference Guide
Multiply and add float64 vectors. This intrinsic only applies to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).
Without Mask
extern __m512d __cdecl _mm512_fmadd_pd(__m512d v1, __m512d v2, __m512d v3);
extern __m512d __cdecl _mm512_fmadd_round_pd(__m512d v1, __m512d v2, __m512d v3, int rc);

With Mask
extern __m512d __cdecl _mm512_mask_fmadd_pd(__m512d v1, __mmask8 k1, __m512d v2, __m512d v3);
extern __m512d __cdecl _mm512_mask_fmadd_round_pd(__m512d v1, __mmask8 k1, __m512d v2, __m512d v3, int rc);
extern __m512d __cdecl _mm512_mask3_fmadd_pd(__m512d v1, __m512d v2, __m512d v3, __mmask8 k1);
extern __m512d __cdecl _mm512_mask3_fmadd_round_pd(__m512d v1, __m512d v2, __m512d v3, __mmask8 k1, int rc);
Performs an element-by-element multiplication of float64 vector v1 by float64 vector v2, then adds float64 vector v3 to the product. The intermediate product is calculated to infinite precision and is not truncated or rounded; only the final sum is rounded. In the _round_ variants, the rc parameter selects the rounding mode applied to that final result.
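The element-by-element behavior described above can be sketched in scalar C. This is an illustrative model only, not the intrinsic itself: the names model_fmadd_pd and LANES are hypothetical and are not part of the compiler's API. The sketch also uses ordinary C arithmetic, which may round the product before the addition, whereas the intrinsic rounds only the final sum.

```c
#include <stddef.h>

#define LANES 8  /* a 512-bit vector holds eight float64 elements */

/* Hypothetical scalar model of _mm512_fmadd_pd:
 * dst[i] = v1[i] * v2[i] + v3[i] for each of the eight elements.
 * Assumption: plain C multiply-add stands in for the fused operation,
 * so results match the hardware only when the product is exactly
 * representable as a double. */
static void model_fmadd_pd(double dst[LANES], const double v1[LANES],
                           const double v2[LANES], const double v3[LANES])
{
    for (size_t i = 0; i < LANES; ++i) {
        dst[i] = v1[i] * v2[i] + v3[i];
    }
}
```

For example, with v1 = {1, 2, ..., 8}, every element of v2 equal to 2, and every element of v3 equal to 10, element 0 of the result is 1 * 2 + 10 = 12 and element 7 is 8 * 2 + 10 = 26.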
The masked variants take one additional argument, k1. Only those elements whose corresponding bit is set in vector mask k1 are computed; the remaining elements of the result receive pass-through values. The pass-through values come from the vector parameter immediately preceding the mask parameter. For example, for _mm512_mask_fmadd_pd(v1, k1, v2, v3) the pass-through values come from v1, while for _mm512_mask3_fmadd_pd(v1, v2, v3, k1) the pass-through values come from v3. To get the pass-through values from v2, swap v1 and v2 in the _mask_ form; because the multiplication is commutative, the computed elements are unchanged.
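The pass-through rule can be sketched in scalar C as follows. The helpers model_mask_fmadd_pd and model_mask3_fmadd_pd are hypothetical names used for illustration, not compiler functions. The sketch assumes that bit i of the mask controls element i, with the least significant bit controlling element 0, and it uses ordinary C arithmetic rather than a fused operation.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* eight float64 elements, one mask bit per element */

/* Hypothetical model of _mm512_mask_fmadd_pd(v1, k1, v2, v3):
 * elements whose mask bit is clear keep the value from v1. */
static void model_mask_fmadd_pd(double dst[LANES], const double v1[LANES],
                                uint8_t k1, const double v2[LANES],
                                const double v3[LANES])
{
    for (size_t i = 0; i < LANES; ++i) {
        int active = (k1 >> i) & 1;  /* assumption: bit i maps to element i */
        dst[i] = active ? v1[i] * v2[i] + v3[i] : v1[i];
    }
}

/* Hypothetical model of _mm512_mask3_fmadd_pd(v1, v2, v3, k1):
 * elements whose mask bit is clear keep the value from v3. */
static void model_mask3_fmadd_pd(double dst[LANES], const double v1[LANES],
                                 const double v2[LANES],
                                 const double v3[LANES], uint8_t k1)
{
    for (size_t i = 0; i < LANES; ++i) {
        int active = (k1 >> i) & 1;
        dst[i] = active ? v1[i] * v2[i] + v3[i] : v3[i];
    }
}
```

With k1 = 0x0F, both models compute the lower four elements. The upper four elements are copied from v1 in the _mask_ model and from v3 in the _mask3_ model.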
Returns the result of the multiplication-addition operation.