Intel® Advisor Help

Issue: Ineffective peeled/remainder loop(s) present

All or some source loop iterations are not executing in the loop body. Improve performance by moving source loop iterations from peeled/remainder loops to the loop body.

Recommendation: Specify the expected loop trip count

The compiler cannot statically detect the trip count. To fix: Identify the expected number of iterations using a directive: !DIR$ LOOP COUNT.

Example: Tell the compiler the loop is expected to iterate about 10,000 times:

!DIR$ LOOP COUNT (10000)
  do i = 1, m
    b(i) = a(i) + 1
    d(i) = c(i) + 1
  enddo
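
When the trip count varies within a known range, the directive can also carry MIN, MAX, and AVG qualifiers. The following is a minimal sketch, assuming the Intel Fortran qualifier syntax; the program name, array bound, and run-time value of m are hypothetical and chosen only for illustration. Compilers that do not recognize !DIR$ treat the line as an ordinary comment.

```fortran
program loop_count_sketch
  implicit none
  integer, parameter :: nmax = 10      ! hypothetical upper bound on the trip count
  real :: a(nmax), b(nmax), c(nmax), d(nmax)
  integer :: i, m

  m = 5                                ! hypothetical run-time trip count
  a = 1.0
  c = 2.0
  b = 0.0
  d = 0.0

  ! Hint: the loop runs at least 3, at most 10, and on average 5 times.
  !DIR$ LOOP COUNT MIN(3), MAX(10), AVG(5)
  do i = 1, m
    b(i) = a(i) + 1
    d(i) = c(i) + 1
  enddo

  ! Self-check so the sketch fails loudly if the loop body is changed incorrectly.
  if (any(b(1:m) /= 2.0) .or. any(d(1:m) /= 3.0)) error stop "unexpected result"
  print *, b(1:m)
  print *, d(1:m)
end program loop_count_sketch
```

The directive is only a hint: the loop still executes m iterations whatever values are given, so inaccurate values affect performance, not results.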

Recommendation: Disable unrolling

The trip count after loop unrolling is too small compared to the vector length. To fix: Prevent loop unrolling or decrease the unroll factor using a directive: !DIR$ NOUNROLL or !DIR$ UNROLL.

Example: Disable automatic loop unrolling using !DIR$ NOUNROLL

!DIR$ NOUNROLL
  do i = 1, m
    b(i) = a(i) + 1
    d(i) = c(i) + 1
  enddo
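
If some unrolling is still wanted, the unroll factor can be reduced instead of disabled. The following is a minimal sketch, assuming the Intel Fortran !DIR$ UNROLL(n) form; the factor 2, the program name, and the array size are hypothetical values for illustration, and a suitable factor depends on the actual trip count and vector length.

```fortran
program unroll_sketch
  implicit none
  integer, parameter :: m = 16         ! hypothetical trip count
  real :: a(m), b(m), c(m), d(m)
  integer :: i

  a = 1.0
  c = 2.0

  ! Hint: unroll by a factor of 2 rather than the compiler's default choice.
  !DIR$ UNROLL(2)
  do i = 1, m
    b(i) = a(i) + 1
    d(i) = c(i) + 1
  enddo

  ! Self-check: every element of b is 2.0 and every element of d is 3.0.
  if (any(b /= 2.0) .or. any(d /= 3.0)) error stop "unexpected result"
  print *, sum(b), sum(d)
end program unroll_sketch
```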

Recommendation: Use a smaller vector length

The compiler chose a vector length, but the trip count might be smaller than that vector length. To fix: Specify a smaller vector length using a directive: !DIR$ SIMD VECTORLENGTH.

Example: Specify vector length using !DIR$ SIMD VECTORLENGTH(4)

!DIR$ SIMD VECTORLENGTH(4)
  do i = 1, m
    b(i) = a(i) + 1
    d(i) = c(i) + 1
  enddo

Recommendation: Align data

One of the memory accesses in the source loop does not start at an optimally aligned address boundary. To fix: Align the data and tell the compiler the data is aligned. To align data, use !DIR$ ATTRIBUTES ALIGN. To tell the compiler the data is aligned, use !DIR$ ASSUME_ALIGNED before the source loop.
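
A minimal Fortran sketch of both steps follows, assuming the Intel Fortran directives !DIR$ ATTRIBUTES ALIGN and !DIR$ ASSUME_ALIGNED and a 64-byte boundary. The module, subroutine, and array names are hypothetical, and the appropriate boundary depends on the target instruction set.

```fortran
module aligned_sketch
  implicit none
  integer, parameter :: n = 1024       ! hypothetical array size
  real :: a(n), b(n)
  ! Step 1: place both arrays on a 64-byte boundary.
  !DIR$ ATTRIBUTES ALIGN: 64 :: a, b
contains
  subroutine add_one(x, y, m)
    integer, intent(in) :: m
    real, intent(in)    :: x(m)
    real, intent(out)   :: y(m)
    integer :: i
    ! Step 2: tell the compiler that x and y start on a 64-byte boundary.
    ! This is a promise, not a check: it must hold for every caller.
    !DIR$ ASSUME_ALIGNED x: 64, y: 64
    do i = 1, m
      y(i) = x(i) + 1
    enddo
  end subroutine add_one
end module aligned_sketch

program align_demo
  use aligned_sketch
  implicit none
  a = 1.0
  call add_one(a, b, n)                ! a and b satisfy the alignment promise
  if (any(b /= 2.0)) error stop "unexpected result"
  print *, b(1), b(n)
end program align_demo
```

The assertion sits inside the subroutine because that is where the compiler otherwise has no information about the alignment of the dummy arguments.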

Recommendation: Add data padding

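The trip count is not a multiple of vector length. To fix: Increase the size of static and automatic objects, and use the following compiler option so the compiler can assume safe padding:

Windows* OS: /Qopt-assume-safe-padding
Linux* OS: -qopt-assume-safe-padding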

Note: These compiler options apply only to Intel® Many Integrated Core Architecture (Intel® MIC Architecture). Option -qopt-assume-safe-padding replaces the deprecated compiler option -opt-assume-safe-padding.

When you use one of these compiler options, the compiler does not add any padding for static and automatic objects. Instead, it assumes that code can access up to 64 bytes beyond the end of the object, wherever the object appears in your application. To satisfy this assumption, you must increase the size of static and automatic objects in your application.

Optional: Specify the trip count, if it is not constant, using a directive: !DIR$ LOOP COUNT
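
The manual padding described above can be sketched as follows. This is a minimal illustration, assuming 4-byte reals, so that 64 bytes correspond to 16 extra elements; the program name and the sizes n and pad are hypothetical. The loop itself still touches only the first n elements, and the extra storage exists only so that accesses up to 64 bytes past the logical end stay inside the object.

```fortran
program padding_sketch
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  integer, parameter :: n = 100        ! hypothetical logical array size
  integer, parameter :: pad = 64 / 4   ! 64 bytes of padding in 4-byte elements
  real(real32) :: a(n + pad), b(n + pad)
  integer :: i

  ! Intended to be compiled with -qopt-assume-safe-padding (Linux* OS)
  ! or /Qopt-assume-safe-padding (Windows* OS).
  a = 0.0
  b = 0.0
  a(1:n) = 1.0

  do i = 1, n
    b(i) = a(i) + 1
  enddo

  ! Self-check: the logical part was updated, the padding was left untouched.
  if (any(b(1:n) /= 2.0) .or. any(b(n+1:) /= 0.0)) error stop "unexpected result"
  print *, "padding elements per array:", size(a) - n
end program padding_sketch
```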

Recommendation: Collect trip counts data

The Survey Report lacks trip counts data that might generate more precise recommendations. To fix: Run a Trip Counts analysis.

Recommendation: Force vectorized remainder

The compiler did not vectorize the remainder loop, even though doing so could improve performance. To fix: Force vectorization using a directive: !DIR$ SIMD VECREMAINDER or !DIR$ VECTOR VECREMAINDER.

Example: Force the compiler to vectorize the remainder loop using !DIR$ SIMD VECREMAINDER

subroutine add(A, N, X)
   integer N, X
   real    A(N)
   ! Request vectorization of the remainder loop as well as the main loop
!DIR$ SIMD VECREMAINDER
   do i=x+1, n
      a(i) = a(i) + a(i-x)
   enddo
end
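
The !DIR$ VECTOR VECREMAINDER form can be used in the same position. The following is a minimal sketch, under the assumption that the VECTOR directive acts as a hint and leaves the dependence analysis to the compiler; the program name and trip count are hypothetical.

```fortran
program vecremainder_sketch
  implicit none
  integer, parameter :: n = 103        ! hypothetical trip count that leaves a remainder
  real :: a(n), b(n)
  integer :: i

  a = 1.0

  ! Hint: if the main loop is vectorized, vectorize the remainder loop too.
  !DIR$ VECTOR VECREMAINDER
  do i = 1, n
    b(i) = a(i) + 1
  enddo

  ! Self-check so the sketch fails loudly on a wrong result.
  if (any(b /= 2.0)) error stop "unexpected result"
  print *, "ok"
end program vecremainder_sketch
```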