Intel® Advisor Help

Issue: Potential underutilization of FMA instructions

Your current hardware supports the AVX2 instruction set architecture (ISA), which enables the use of fused multiply-add (FMA) instructions. Improve performance by utilizing FMA instructions.

Recommendation: Target the AVX2 ISA

Although static analysis presumes the loop may benefit from FMA instructions available with the AVX2 ISA, no AVX2-specific code was executed for this loop. To fix: Use the xCORE-AVX2 compiler option to generate AVX2-specific code, or use the axCORE-AVX2 compiler option to generate multiple feature-specific code paths, including one for AVX2, that are selected automatically at run time (auto-dispatch).

Windows* OS: /QxCORE-AVX2 or /QaxCORE-AVX2
Linux* OS: -xCORE-AVX2 or -axCORE-AVX2
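
As an illustration of the kind of loop this recommendation applies to, the sketch below shows a multiply-add kernel that a compiler can typically map to FMA instructions once AVX2 code generation is enabled. The function name scaled_add and the compiler driver placeholder in the comment are assumptions made for this example; only the option names come from the text above.

```c
#include <stddef.h>

/*
 * Hypothetical example kernel (not taken from the Advisor documentation).
 * Possible build lines, with <compiler> standing in for your compiler driver:
 *   Linux* OS:   <compiler> -O2 -xCORE-AVX2 kernel.c
 *   Windows* OS: <compiler> /O2 /QxCORE-AVX2 kernel.c
 */

/* Each iteration computes y[i] = a * x[i] + y[i], a multiply followed by
   an add on the same data, which is the pattern an FMA instruction fuses. */
void scaled_add(size_t n, double a, const double *restrict x,
                double *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The restrict qualifiers tell the compiler that x and y do not overlap, which removes one common obstacle to vectorizing the loop.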

Recommendation: Target a specific ISA instead of using the xHost option

Although static analysis presumes the loop may benefit from FMA instructions available with the AVX2 ISA, no AVX2-specific code was executed for this loop. To fix: Instead of using the xHost compiler option, which restricts code generation to the ISA of the machine that compiles the code, use the axCORE-AVX2 compiler option to compile for machines with and without AVX2 support, or the xCORE-AVX2 compiler option to compile for machines with AVX2 support only.

Windows* OS: /QxCORE-AVX2 or /QaxCORE-AVX2
Linux* OS: -xCORE-AVX2 or -axCORE-AVX2
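
To make the idea of compiling for machines with and without AVX2 support more concrete, here is a hand-written sketch of run-time dispatch. It is not how the axCORE-AVX2 option is implemented; it only mimics the concept using the GCC/Clang builtin __builtin_cpu_supports and the target function attribute, both of which are assumptions about your toolchain. All function names are hypothetical.

```c
#include <stddef.h>

/* The dispatch path below relies on GCC/Clang extensions on x86 only. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Baseline version: compiled for whatever ISA the build targets by default. */
static void scaled_add_generic(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

#if HAVE_X86_DISPATCH
/* Same loop, but the compiler may use AVX2 and FMA instructions here. */
__attribute__((target("avx2,fma")))
static void scaled_add_avx2(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
#endif

/* Pick the AVX2/FMA version only if the running CPU reports both features. */
void scaled_add_dispatch(size_t n, double a, const double *x, double *y)
{
#if HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) {
        scaled_add_avx2(n, a, x, y);
        return;
    }
#endif
    scaled_add_generic(n, a, x, y);
}
```

With an auto-dispatch compiler option, the compiler generates the alternative code paths and the selection logic for you, so source changes like these are not needed.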

Recommendation: Explicitly enable FMA generation when using the strict floating-point model

Static analysis presumes the loop may benefit from FMA instructions available with the AVX2 ISA, but the strict floating-point model disables FMA instruction generation by default. To fix: Override this behavior using the fma compiler option.

Windows* OS: /Qfma
Linux* OS: -fma

Recommendation: Force vectorization if possible

The loop contains FMA instructions (so vectorization could be beneficial) but is not vectorized. To fix: Review corresponding compiler diagnostics to check if vectorization enforcement is possible and profitable.
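
One possible way to request vectorization, once the compiler diagnostics confirm that the loop iterations are independent, is the OpenMP simd directive. The sketch below is an assumption about how you might apply it, not a method prescribed by this recommendation; the function name fma_loop_simd is hypothetical.

```c
#include <stddef.h>

/*
 * The pragma takes effect only when OpenMP SIMD support is enabled, for
 * example with -qopenmp-simd (Intel compilers) or -fopenmp-simd (GCC, Clang).
 * Without such an option the pragma is ignored and the loop still runs
 * correctly as ordinary scalar or auto-vectorized code.
 */
void fma_loop_simd(size_t n, const float *restrict a,
                   const float *restrict b, float *restrict c)
{
    /* Asserts to the compiler that iterations can safely run in SIMD lanes. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] * b[i] + c[i];
}
```

Forcing vectorization bypasses the compiler's own safety and profitability checks, so apply the directive only to loops without cross-iteration dependencies, and measure the result to confirm that it is actually faster.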
