Unsupported data type

Causes:

C++ Example:
struct char4 {
    char c1;
    char c2;
    char c3;
    char c4;
};

extern struct char4 *a;
void vecmsg_testcore003 ()
{
    int i;
    const struct char4 n = {0, 0, 0, 0};
    #pragma omp simd
    for(i = 0; i < 1024; i++) {
        a[i] = n;
    }
}

Recommendations

  • Provide struct assignment operators in terms of scalars. For example:
    struct char4 {
        char c1, c2, c3, c4;

        // Copy each field separately, as scalar assignments.
        char4 &operator=(const char4 &x) {
            c1 = x.c1;
            c2 = x.c2;
            c3 = x.c3;
            c4 = x.c4;
            return *this;
        }
    };
  • Use standard data types.
  • Use instruction sets that support wider vectors.
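A minimal sketch of the "use standard data types" option, assuming it is acceptable to drop the char4 struct and keep its four fields in one flat array of plain char (the names a_flat, kCount, clear_all and field are hypothetical). The zeroing loop then performs only char stores:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout (not from the original): the four char4 fields are
// kept in one flat array of plain char, four consecutive bytes per element.
constexpr std::size_t kCount = 1024;
char a_flat[4 * kCount];

// Zero every field. Each iteration is a plain char store, so the loop body
// contains no struct copy for the vectorizer to reject.
void clear_all() {
#pragma omp simd
    for (std::size_t i = 0; i < 4 * kCount; i++)
        a_flat[i] = 0;
}

// Field f (0..3) of former element i lives at a_flat[4 * i + f].
char &field(std::size_t i, std::size_t f) {
    assert(i < kCount && f < 4);
    return a_flat[4 * i + f];
}
```

The accessor keeps call sites readable while the hot loop works on a standard type.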

Not inner loop

Cause: In nested loop structures, the compiler targets the innermost loop for vectorization. The outer loop, by default, is not a target for vectorization; however, it may be a target for parallelization.
C++ Example:

#include <iostream>
#define N 25
int main()
{
    int a[N][N], b[N], i;
    for(int j = 0; j < N; j++)
    {
        for(int i = 0; i < N; i++)
            a[j][i] = 0;
        b[j] = 1;
    }
    int sum = __sec_reduce_add(a[:][:]) + __sec_reduce_add(b[:]);
    return 0;
}

Recommendation

In some cases it is possible to collapse a nested loop structure into a single loop structure using a directive before the outer loop. The n argument is an integer that specifies how many loops to collapse into one loop for vectorization.
Target: Outer loop
  ICL/ICC/ICPC directive: #pragma omp simd collapse(n), #pragma omp simd, or #pragma simd
  IFORT directive: !$OMP SIMD COLLAPSE(n), !$OMP SIMD, or !DIR$ SIMD
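A sketch of how the nested loop from the example might be restructured for collapse(2). It assumes that collapse needs the two loops to be perfectly nested, so the b[j] = 1 statement moves into its own loop, and plain loops with a reduction stand in for the __sec_reduce_add() array-notation calls. The function name init_and_sum is hypothetical:

```cpp
#include <cassert>

constexpr int N = 25;

// The a[][] initialization is now perfectly nested, so the two loops can be
// collapsed into one N*N iteration space for SIMD execution.
int init_and_sum(int a[N][N], int b[N]) {
#pragma omp simd collapse(2)
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            a[j][i] = 0;

#pragma omp simd
    for (int j = 0; j < N; j++)
        b[j] = 1;

    // Plain reductions replace the __sec_reduce_add() calls.
    int sum = 0;
#pragma omp simd collapse(2) reduction(+ : sum)
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[j][i];
#pragma omp simd reduction(+ : sum)
    for (int j = 0; j < N; j++)
        sum += b[j];
    return sum;
}
```

Compilers built without OpenMP support simply ignore the pragmas, so the code keeps its scalar meaning.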

Remainder loop vectorization possible but seems inefficient

Cause: The compiler vectorizer determined the remainder loop will not benefit from vectorization.
C++ Example:

#include <iostream>
#define N 70
int main() {
    static short tab1[N], tab2[N];
    int i, j;
    static short const data[] = {-32768, -256, -255, -128, -127, -1, 0, 1, 127, 128, 255, 256, 32767};
    for (j = i = 0; i < N; i++)
    {
        tab1[i] = i;
        tab2[i] = data[j++];
        if (j > 12) j = 0;
    }
    int sum = __sec_reduce_add(tab1[:]) + __sec_reduce_add(tab2[:]);
    return 0;
}

Recommendations

  • Force remainder vectorization using a directive before the loop:
    Target: Source loop
      ICL/ICC/ICPC directive: #pragma vector vecremainder
      IFORT directive: !DIR$ SIMD VECREMAINDER
  • Disable remainder vectorization using a directive before the loop:
    Target: Source loop
      ICL/ICC/ICPC directive: #pragma vector novecremainder
      IFORT directive: !DIR$ SIMD NOVECREMAINDER
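To make "remainder loop" concrete, here is a sketch assuming 128-bit vectors, which hold 8 shorts. With N = 70, the main vector loop covers 8 × 8 = 64 elements and 6 iterations are left over. The helper remainder_iterations and the function add_arrays are hypothetical; the pragma is the Intel directive listed above, and other compilers ignore it:

```cpp
#include <cassert>

constexpr int N = 70;

// Iterations left for the remainder loop when each main vector iteration
// processes 'lanes' elements.
constexpr int remainder_iterations(int trip_count, int lanes) {
    return trip_count % lanes;
}

// Assumed 8 shorts per vector: 70 = 8 * 8 + 6.
static_assert(remainder_iterations(N, 8) == 6, "70 = 8 * 8 + 6");

// Intel-specific request to vectorize the remainder too; compilers that do
// not recognize '#pragma vector' ignore it.
void add_arrays(const short *a, const short *b, short *c) {
#pragma vector vecremainder
    for (int i = 0; i < N; i++)
        c[i] = static_cast<short>(a[i] + b[i]);
}
```

Whether vectorizing a 6-element remainder pays off is exactly what the compiler's cost model judges; time both variants.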

Loop vectorization possible but seems inefficient

Cause: The compiler vectorizer determined the loop will not benefit from vectorization. Common reasons include:

C++ Example: The compiler vectorizer determines the cost of creating a vector operand (non-unit stride access in the vector operand creation) is significant when compared to the number/type of computations in which those vector operands are used.
#include <iostream>
#define N 100
struct s1 {
    int a, b, c;
};
int main() {
    s1 arr[N] = {}, sum = {};
    for(int i = 0; i < N; i++) {
        sum.a += arr[i].a;
        sum.b += arr[i].b;
        sum.c += arr[i].c;
    }
    std::cout << sum.a << "\t" << sum.b << "\t" << sum.c << "\n";
    return 0;
}

Recommendations

  • If you still believe vectorization might result in a speedup, override the compiler cost model using a directive before the loop:
    Target: Source loop
      ICL/ICC/ICPC directive: #pragma vector or #pragma vector always
      IFORT directive: !DIR$ VECTOR or !DIR$ VECTOR ALWAYS
    Alternatively, use a compiler option to always vectorize loops. The compiler still tests for dependencies and does not vectorize the loop unless it is safe.
      Windows* OS (ICL and IFORT): /Qvec-threshold0
      Linux* OS (ICC/ICPC and IFORT): -vec-threshold0
  • Require vectorization using a directive before the loop. The compiler does not perform a dependency analysis; it is your responsibility to ensure vectorization is safe:
    Target: Source loop
      ICL/ICC/ICPC directive: #pragma simd or #pragma omp simd
      IFORT directive: !DIR$ SIMD or !$OMP SIMD
  • Rewrite the data structure/loop to have more regular memory accesses.
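One common way to make the accesses in the example regular is a structure-of-arrays layout. In this sketch (S1Arrays, Sums and sum_fields are hypothetical names), each field is contiguous, so the loop reads a, b and c with unit stride instead of stride-3 accesses into an array of s1:

```cpp
#include <cassert>

constexpr int N = 100;

// Structure-of-arrays counterpart of an array of s1.
struct S1Arrays {
    int a[N];
    int b[N];
    int c[N];
};

struct Sums {
    int a, b, c;
};

// Three independent sum reductions over contiguous data.
Sums sum_fields(const S1Arrays &arr) {
    int sa = 0, sb = 0, sc = 0;
#pragma omp simd reduction(+ : sa, sb, sc)
    for (int i = 0; i < N; i++) {
        sa += arr.a[i];
        sb += arr.b[i];
        sc += arr.c[i];
    }
    return Sums{sa, sb, sc};
}
```

The trade-off is that code which handles one whole record at a time now touches three separate arrays.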

Conditional assignment to a scalar

Causes:

C++ Example:
void foo(int *A, int *__restrict B, int n, int* x) {
    int i;

    #pragma omp simd
    for (i = 0; i < n; i++)
    {
        if (A[i] > i)
            *x = i;
        else
            B[i] = *x;
    }

    B[i] = *x++;
}

Recommendations

Simplify or remove conditions in the loop by:
  • Dividing the loop into a group of sequential loops
  • Or using multiple temporary variables instead of one scalar variable
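A sketch of the temporary-variable idea on a simpler, hypothetical loop. It does not reproduce foo() above, whose else branch reads the scalar inside the loop. Instead it covers the frequent case where a conditionally assigned scalar (here, the last index i with A[i] > i) is only needed after the loop. Each iteration computes a temporary candidate, and a max reduction combines them, so no lane overwrites a shared scalar:

```cpp
#include <cassert>

// Returns the last index i with A[i] > i, or -1 if there is none.
// 'candidate' is a per-iteration temporary; reduction(max:last) merges the
// per-lane results instead of conditionally assigning one scalar.
int last_index_above_diagonal(const int *A, int n) {
    assert(n >= 0);
    int last = -1;
#pragma omp simd reduction(max : last)
    for (int i = 0; i < n; i++) {
        int candidate = (A[i] > i) ? i : -1;
        last = candidate > last ? candidate : last;
    }
    return last;
}
```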

Assumed dependence between lines

C++ Example: When the compiler tries to vectorize for the SSE2 architecture, it chooses a vector length of 4 (because the loop operates on int data). But when it considers vector operands instead of scalar operands for this loop, the input vector and the output vector overlap. An anti-dependency occurs when k is positive; a true dependency occurs when k is negative.
#include <stdlib.h>
#define N 70
int main(int argc, char *argv[])
{
    int k = (argc > 1) ? atoi(argv[1]) : 1;
    int a[N] = {0}, i;
    for(i = abs(k); i < N - abs(k); i++)
        a[i] = a[i+k] + 1;
    return 0;
}

Recommendations

  • Rewrite code to remove dependencies.
  • Run a Dependencies analysis to check if the loop has real dependencies.
  • If no dependencies exist, use one of the following to tell the compiler it is safe to vectorize:
    • Directive to ignore all dependencies in the loop
      Target: Source loop
        ICL/ICC/ICPC directive: #pragma simd or #pragma omp simd
        IFORT directive: !DIR$ SIMD or !$OMP SIMD
    • Directive to ignore only assumed vector dependencies (which is safer)
      Target: Source loop
        ICL/ICC/ICPC directive: #pragma ivdep
        IFORT directive: !DIR$ IVDEP
    • restrict keyword
  • If an anti-dependency exists, use a directive with a vector length k smaller than the distance between the dependent items. This enables vectorization because dependent items are placed in different vectors:
    Target: Source loop
      ICL/ICC/ICPC directive: #pragma simd vectorlength(k)
      IFORT directive: !DIR$ SIMD VECTORLENGTH(k)
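OpenMP's safelen clause expresses a similar limit on how many consecutive iterations may run together. A sketch of the negative-k (true dependency) case, with the distance fixed at 4 for illustration (shift_add is a hypothetical name). Because a[i] reads a[i - 4], chunks of at most 4 consecutive iterations only read values produced by earlier chunks:

```cpp
#include <cassert>

constexpr int N = 70;

// Equivalent to the example loop with k = -4: a true (read-after-write)
// dependence of distance 4. safelen(4) limits SIMD execution to chunks of
// at most 4 consecutive iterations, so the serial result is preserved.
void shift_add(int *a) {
#pragma omp simd safelen(4)
    for (int i = 4; i < N; i++)
        a[i] = a[i - 4] + 1;
}
```

Starting from an all-zero array, this produces a[i] == i / 4, the same as the scalar loop.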

Non-standard loop is not a vectorization candidate (C++)

Causes:

Below are examples for all three scenarios.
C++ Example 1: There is more than one loop exit point.
void no_vec(float a[], float b[], float c[])
{
    int i = 0;
    while (i < 100) {
        a[i] = b[i] * c[i];
        // this is a data-dependent exit condition:
        if (a[i] < 0.0)
            break;
        ++i;
    }
}
Exception: Loops searching for an array element, as in the example below, can be automatically vectorized when array a[i] is aligned.
for (i = 0; i < n; ++i) {
    if (a[i] == to_find) {
        index = i;
        break;
    }
}
C++ Example 2: A SIMD loop uses C++ exception handling or an OpenMP critical construct.
#include <cstdio>
#include <cstdlib>
#define N 1000
int foo() {
#pragma omp simd
    for (int i = 0; i < N; i++) {
        try {
            printf ("throw exception 11\n");
            throw 11;
        }
        catch (int t) {
            printf ("caught exception %d\n", t);
            if (t != 11) {
#pragma omp critical
                {
                    printf ("TEST FAILED\n");
                    exit (0);
                }
            }
        }
    }
    printf ("TEST PASSED\n");
    exit (0);
}
C++ Example 3: The compiler cannot determine which function is passed as a function parameter.
#include <iostream>
int a[100];
int b[100];

int g(int i, int y) {
    return b[i]+y;
}

__declspec(noinline) void doit1(int x(int,int), int y) {
    int i;
#pragma parallel
    for(i = 0; i < 100; i++)
        a[i] = x(i,y);
}

Recommendations

  • For Example 1, where there is more than one loop exit point: Ensure loops have a single entry and a single exit point.
  • For Example 2, where a SIMD loop uses C++ exception handling or an OpenMP critical construct: Remove C++ exception handling and OpenMP critical constructs from SIMD loops.
  • For Example 3, where the compiler cannot determine which function is passed as a function parameter: There is no resolution unless you can tell the compiler at compile time which function will be called within the loop body.
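One standard C++ way to tell the compiler at compile time which function is called is to make the callee a template argument instead of a function-pointer parameter. A sketch based on Example 3 (doit_static is a hypothetical name). Each instantiation calls a known function, which the compiler can typically inline before considering the loop for vectorization:

```cpp
#include <cassert>

int a[100];
int b[100];

int g(int i, int y) {
    return b[i] + y;
}

// F is fixed per instantiation, so the call inside the loop is a direct,
// inlinable call rather than an indirect call through a pointer.
template <int (*F)(int, int)>
void doit_static(int y) {
    for (int i = 0; i < 100; i++)
        a[i] = F(i, y);
}
```

Usage: doit_static<g>(5); fills a with b[i] + 5. This only helps when the set of callees is known where the loop is compiled.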

Non-standard loop is not a vectorization candidate (Fortran)

Causes:

Below are examples for the first three scenarios.
Fortran Example 1: There is more than one loop exit point.
subroutine d_15043(a,b,c,n)
    implicit none
    real, intent(in ), dimension(n) :: a, b
    real, intent(out), dimension(n) :: c
    integer, intent(in)             :: n
    integer                         :: i

    do i=1,n
        if(a(i) < 0.) exit
        c(i) = sqrt(a(i)) * b(i)
    enddo
end subroutine d_15043
Fortran Example 2: The iteration count is data dependent.
subroutine d_15043_2(a,b,c,n)
    implicit none
    real, intent(in ), dimension(n) :: a, b
    real, intent(out), dimension(n) :: c
    integer, intent(in)             :: n
    integer                         :: i

    i = 1
    do while (a(i) > 0.)
        c(i) = sqrt(a(i)) * b(i)
        i = i + 1
    enddo
end subroutine d_15043_2
Fortran Example 3: The loop contains a subroutine or function that prevents vectorization.
subroutine d_15043_3(a,b,c,n)
    implicit none
    real, intent(in ), dimension(n) :: a, b
    real, intent(out), dimension(n) :: c
    integer, intent(in)             :: n
    integer                         :: i

    do i=1,n
        call my_sub(a(i),b(i),c(i))
    enddo
end subroutine d_15043_3

Recommendations

  • For Example 1, where there is more than one loop exit point: Ensure:
    • The loop has a single entry and a single exit point.
    • The iteration count is constant and known to the loop on entry.
    This loop can be vectorized if you replace exit with cycle, although the behavior is different.
  • For Example 2, where the iteration count is data dependent: Replace the do while construct with a counted do loop. For example:
    do i=1,n
        if(a(i) > 0.) c(i) = sqrt(a(i)) * b(i)
    enddo
    If necessary, the iteration count can be pre-computed.
  • For Example 3, where the loop contains a subroutine or function call that prevents vectorization: Do one of the following:
    • Inline the subroutine, for example by using interprocedural optimization.
    • Convert it to a SIMD-enabled subroutine, for example by using the !$OMP DECLARE SIMD directive.


Vector dependence prevents vectorization

Cause: The compiler detected or assumed a vector dependence in the loop.
C++ Example:

int foo(float *A, int n) {
    int inx = 0;
    float max = A[0];
    int i;
    for (i=0;i < n;i++) {
        if (max < A[i]) {
            max = A[i];
            inx = i*i;
        }
    }
    return inx;
} 
Fortran Example:
integer function foo(a, n)
    implicit none
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real :: max
    integer :: inx, i

    inx = 0
    max = a(1)
    do i=1,n
        if (max < a(i)) then
            max = a(i)
            inx = i*i
        endif
    end do

    foo = inx
end function

Recommendations

  • Rewrite code to remove dependencies.
  • Run a Dependencies analysis to check if the loop has real dependencies. There are two types of dependencies:
    • True dependency - Read after write (RAW)
    • Anti-dependency - Write after read (WAR)
  • If no dependencies exist, use one of the following to tell the compiler it is safe to vectorize:
    • Directive to ignore all dependencies in the loop
      Target: Source loop
        ICL/ICC/ICPC directive: #pragma simd or #pragma omp simd
        IFORT directive: !DIR$ SIMD or !$OMP SIMD
    • Directive to ignore only assumed vector dependencies (which is safer)
      Target: Source loop
        ICL/ICC/ICPC directive: #pragma ivdep
        IFORT directive: !DIR$ IVDEP
    • restrict keyword
  • If an anti-dependency exists, use a directive with a vector length k smaller than the distance between the dependent items. This enables vectorization because dependent items are placed in different vectors:
    Target: Source loop
      ICL/ICC/ICPC directive: #pragma simd vectorlength(k)
      IFORT directive: !DIR$ SIMD VECTORLENGTH(k)
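A sketch of "rewrite code to remove dependencies" for the C++ foo() example, assuming n >= 1 and that A contains no NaN values. The loop-carried coupling between max and inx is split into a max reduction and a separate search loop. The original loop updates inx only on a strictly greater value, so it keeps the first index of the maximum, and so does the search (foo_split is a hypothetical name):

```cpp
#include <cassert>

int foo_split(const float *A, int n) {
    assert(n >= 1);
    float best = A[0];
    // Pass 1: a plain max reduction, a pattern vectorizers handle well.
#pragma omp simd reduction(max : best)
    for (int i = 0; i < n; i++)
        best = A[i] > best ? A[i] : best;

    // Pass 2: first index holding the maximum.
    int first = 0;
    for (int i = 0; i < n; i++) {
        if (A[i] == best) {
            first = i;
            break;
        }
    }
    return first * first;  // same value as inx = i*i in the original
}
```

The cost is a second pass over A, which is usually cheap compared with a scalar first pass.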

Call to function cannot be vectorized (C++)

Causes:

C++ Example:
#include <iostream>
#include <complex>
using namespace std;
int main() {
    float c[10];
    c[:] = 0.f;
    for(int i = 0; i < 10; i++)
        cout << c[i] << "\n";
    return 0;
}

Recommendations

If possible, define a vector version of the function by applying one of the following constructs to the source function (ICL/ICC/ICPC):
  • #pragma omp declare simd
  • __declspec(vector) (Windows* OS) or __attribute__((vector)) (Linux* OS)
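A minimal sketch of the OpenMP construct on a hypothetical helper, scale_root. declare simd asks an OpenMP-enabled compiler to emit a vector variant alongside the scalar one, and uniform(s) states that s has the same value in every lane of a vectorized caller, so only x becomes a vector argument:

```cpp
#include <cassert>
#include <cmath>

#pragma omp declare simd uniform(s)
float scale_root(float x, float s) {
    return std::sqrt(x) * s;
}

// The caller's loop can now call the vector variant of scale_root().
void apply(const float *in, float *out, int n, float s) {
    assert(n >= 0);
#pragma omp simd
    for (int i = 0; i < n; i++)
        out[i] = scale_root(in[i], s);
}
```

Without OpenMP support the pragmas are ignored and the scalar function is called as usual.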

Call to function cannot be vectorized (Fortran)

Cause: A function call inside the loop is preventing auto-vectorization.
Fortran Example:

Program foo
    implicit none
    integer, parameter  :: nx = 100000000
    real(8)             :: x, xp, sumx
    integer             :: i
    interface
        real(8) function bar(x, xp)
            real(8), intent(in) :: x, xp
        end
    end interface

    sumx = 0.
    xp   = 1.
    do i = 1,nx
        x = 1.D-8*real(i,8)
        sumx = sumx + bar(x,xp)
    enddo
    print *, 'Sum =',sumx
end

real(8) function bar(x, xp)
    implicit none
    real(8), intent(in) :: x, xp

    bar = 1. - 2.*(x-xp) + 3.*(x-xp)**2 - 1.5*(x-xp)**3  + 0.2*(x-xp)**4
    bar = bar / sqrt(x**2 + xp**2)
end

Recommendations

If possible, define a vector version of the function by applying one of the following constructs to the source function (IFORT):
  • !$OMP DECLARE SIMD
  • ELEMENTAL keyword or !DIR$ ATTRIBUTES VECTOR
In this example you can vectorize the loop and function call using OpenMP* 4.0 or Intel® Cilk™ Plus explicit vector programming capabilities.

Add a !$OMP DECLARE SIMD directive to the function bar() and compile with the /Qopenmp-simd option to generate a vectorized version of bar(). Add the same directive to the interface block for bar() inside program foo. The UNIFORM clause specifies that xp is a non-varying argument that has the same value in every loop iteration of the caller being vectorized, so x is the only vector argument. Without UNIFORM, the compiler must treat xp as a possible vector argument as well.
real(8) function bar(x, xp)
!$OMP DECLARE SIMD (bar) UNIFORM(xp)
    implicit none
    real(8), intent(in) :: x, xp

    bar = 1. - 2.*(x-xp) + 3.*(x-xp)**2 - 1.5*(x-xp)**3  + 0.2*(x-xp)**4
    bar = bar / sqrt(x**2 + xp**2)
end
The code now generates a vectorized version of function bar(); however, the loop inside foo is still not vectorized because the compiler sees dependencies between loop iterations carried by both x and sumx. Unaided, the compiler could determine how to auto-vectorize a loop with just these dependencies, or vectorize a loop with just the function call, but not both. We can tell the compiler to vectorize the loop with a !$OMP SIMD directive that specifies the properties of x and sumx:
Program foo
    implicit none
    integer, parameter  :: nx = 100000000
    real(8)             :: x, xp, sumx
    integer             :: i

    interface
        real(8) function bar(x, xp)
        !$OMP DECLARE SIMD (bar) UNIFORM(xp)
            real(8), intent(in) :: x, xp
        end
    end interface

    sumx = 0.
    xp   = 1.

    !$OMP SIMD  private(x)  reduction(+:sumx)
    do i = 1,nx
        x = 1.D-8*real(i,8)
        sumx = sumx + bar(x,xp)
    enddo
    print *, 'Sum =',sumx
end
The loop now vectorizes successfully, and running the application shows a performance speedup.

For small functions such as bar(), inlining may be a simpler and more efficient way to achieve vectorization of loops containing function calls. When the caller and callee are in separate source files, as above, build the application with interprocedural optimization (-ipo or /Qipo). When the caller and callee are in the same source file, inlining of small functions is enabled by default at optimization level O2 and above.


Cannot compute loop iteration count before executing the loop (C++)

Causes:

C++ Example 1: The upper bound of the loop iteration count is controlled by bar(), whose implementation is available in this compilation unit. Because the loop iteration count is not available before the loop executes, the compiler cannot determine:
int bar(int);

void foo(float *A) {
    int i;
    int OuterCount = 90;
    while (OuterCount > 0) {
        for (i = 1; i < bar(int(A[0])); i++) {
            A[i] = i + 4;
        }
        OuterCount--;
    }
}
C++ Example 2: The compiler cannot determine if there is aliasing between all the pointers used inside the loop and loop boundaries.
struct Dim { int x, y, z; };
Dim dim;
double* B;

void foo (double* A) {
    for (int i = 0; i < dim.x; i++) {
        A[i] = B[i];
    }
}

Recommendations

  • For Example 1, where the loop iteration count is not available before the loop executes: If the loop iteration count and iterations lower bound can be calculated for the whole loop:
    • Move the calculation outside the loop using an additional variable.
    • Rewrite the loop to avoid goto statements or other early exits from the loop that prevent vectorization.
    • Identify the loop iterations lower bound using a constant.
    For example, introduce the new limit variable:
    void foo(float *A) {
        int i;
        int OuterCount = 90;
        int limit = bar(int(A[0]));
        while (OuterCount > 0) {
            for (i=1; i < limit; i++) {
                A[i] = i + 4;
            }
            OuterCount--;
        }
    }
  • For Example 2, where the compiler cannot determine if there is aliasing between all the pointers used inside the loop and loop boundaries: Assign the loop boundary value to a local variable. In most cases, this is enough for the compiler to determine aliasing may not occur.

    You can use a directive to accomplish the same thing automatically:
    Target: Source loop
      ICL/ICC/ICPC directive: #pragma simd or #pragma omp simd
    Do not use global variables or indirect accesses as loop boundaries unless you also use one of the following:
    • Directive to ignore vector dependencies
      Target: Source loop
        ICL/ICC/ICPC directive: #pragma ivdep
    • restrict keyword
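A sketch of the local-variable fix for Example 2 (foo_local is a hypothetical name). The global dim.x is read once into a local before the loop. A store through A can then no longer change the bound, so the trip count is fixed when the loop starts:

```cpp
#include <cassert>

struct Dim { int x, y, z; };
Dim dim;
double *B;

void foo_local(double *A) {
    const int nx = dim.x;  // bound captured once, before the loop
    for (int i = 0; i < nx; i++)
        A[i] = B[i];
}
```

A and B may still point to overlapping memory. That is a separate question, which the restrict keyword or ivdep addresses.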

Cannot compute loop iteration count before executing the loop (Fortran)

Cause: The loop iteration count is not available before the loop executes.
Fortran Example:

subroutine foo(a, n)
    implicit none
    integer, intent(in) :: n
    double precision, intent(inout) :: a(n)
    integer :: bar
    integer :: i

    i=1
100 CONTINUE
    a(i)=0
    i=i+1
    if (i < bar()) goto 100

end subroutine foo

Recommendations

If the loop iteration count and iterations lower bound can be calculated for the whole loop:
  • Move the calculation outside the loop using an additional variable.
  • Rewrite the loop to avoid goto statements or other early exits from the loop that prevent vectorization.
  • Identify the loop iterations lower bound using a constant.

Volatile assignment was not vectorized

Cause: Any usage of volatile variables in the loop causes this diagnostic.
C++ Example:

#include <cmath>
#include <cstdint>
const int32_t c_size = 1024;
volatile int32_t x;
int32_t a[c_size];

void foo() {
    for (int32_t i = 0; i < c_size; ++i) {
        a[i] = std::exp(x + i);
        x = a[i];
    }
}

Recommendations

Avoid using volatile variables in the loop. For example, copy their values into regular local variables before the loop.
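A sketch of copying a volatile into a regular local. The volatile setting scale and the function scale_all are hypothetical, and the rewrite is only valid if the loop does not need to observe changes to scale while it runs:

```cpp
#include <cassert>

// Hypothetical setting that another agent (thread, device, handler) may update.
volatile int scale = 3;

void scale_all(int *a, int n) {
    assert(n >= 0);
    const int s = scale;  // single volatile read, outside the loop
    for (int i = 0; i < n; i++)
        a[i] *= s;        // no volatile access in the loop body
}
```

The loop from the example above would still not vectorize after this change, because x carries a value from one iteration to the next.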

Compile time constraints prevent loop optimization

Cause: Internal time limits for the optimization level prevented the compiler from determining a vectorization approach for this loop.

Recommendations

When choosing an optimization level, use the following compiler option to enable the full compiler vectorization engine and obtain detailed diagnostics about vectorization possibilities for this loop.
  Windows* OS (ICL and IFORT): /O3
  Linux* OS (ICC/ICPC and IFORT): -O3

Inner loop throttling prevents vectorization of this outer loop

Cause: The inner loop has an irregular structure. For example, it may have non-constant lower and higher bounds, a non-constant step for iterations, more than one entry, some assembly parts, volatile variables, long jumps, or complex switch clauses.

Recommendations

See the inner loop message for more details and simplify the inner loop structure.

Outer loop was not auto-vectorized

Cause: The compiler vectorizer determined outer loop vectorization is not possible using auto-vectorization.
C++ Example:

void foo(float **a, float **b, int N) {
    int i, j;
#pragma ivdep
    for (i = 0; i < N; i++) {
        float *ap = a[i];
        float *bp = b[i];
        for (j = 0; j < N; j++) {
            ap[j] = bp[j];
        }
    }
}
Fortran Example:
subroutine foo(a, n1, n)
    implicit none
    integer, intent(in) :: n, n1
    real, intent(inout) :: a(n,n1)
    integer :: i, j
    do i=1,n
        do j=2,n
            a(j,i) = a(j-1,i)+1
        end do
    end do
end subroutine foo

Recommendations

  • Run a Dependencies analysis to check if the loop has real dependencies. There are two types of dependencies:
    • True dependency - Read after write (RAW)
    • Anti-dependency - Write after read (WAR)
  • If no dependencies exist, use one of the following to tell the compiler it is safe to vectorize:
    • Directive to ignore all dependencies in the loop
      Target: Source loop
        ICL/ICC/ICPC directive: #pragma simd or #pragma omp simd
        IFORT directive: !DIR$ SIMD or !$OMP SIMD
    • Directive to ignore only assumed vector dependencies (which is safer)
      Target: Source loop
        ICL/ICC/ICPC directive: #pragma ivdep
        IFORT directive: !DIR$ IVDEP
    • restrict keyword
  • If an anti-dependency exists, use a directive with a vector length k smaller than the distance between the dependent items. This enables vectorization because dependent items are placed in different vectors:
    Target: Source loop
      ICL/ICC/ICPC directive: #pragma simd vectorlength(k)
      IFORT directive: !DIR$ SIMD VECTORLENGTH(k)
  • If using the O3 compiler option, use directives before the inner and outer loops to request vectorization of the outer loop:
    Target: Inner loop
      ICL/ICC/ICPC directive: #pragma novector
      IFORT directive: !DIR$ NOVECTOR
    Target: Outer loop
      ICL/ICC/ICPC directive: #pragma vector always
      IFORT directive: !DIR$ VECTOR ALWAYS
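A sketch of a nest where explicitly vectorizing the outer loop is the natural choice (column_sums is a hypothetical example, not taken from the diagnostic). out[i] is the sum of column i of an n x n row-major matrix. With #pragma omp simd on the outer loop, each SIMD lane handles one column, and for each j the lanes read consecutive elements of row j. The inner loop on its own would be a reduction with stride n:

```cpp
#include <cassert>

void column_sums(const float *m, float *out, int n) {
    assert(n >= 0);
#pragma omp simd
    for (int i = 0; i < n; i++) {
        float s = 0.0f;              // one accumulator per lane
        for (int j = 0; j < n; j++)
            s += m[j * n + i];       // unit stride across lanes for fixed j
        out[i] = s;
    }
}
```

As with any forced vectorization, the directive asserts that the outer iterations are independent, which holds here because each iteration writes a different out[i].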

Inner loop was already vectorized

Cause: The inner loop in a nested loop is vectorized.
C++ Example:

#define N 1000
float A[N][N];
void foo(int n) {
    int i,j;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            A[i][j]++;
        }
    }
}
Fortran Example:
subroutine foo(a, n1, n)
    implicit none
    integer, intent(in) :: n, n1
    real, intent(inout) :: a(n1,n1)
    integer :: i, j

    do i=1,n
        do j=1,n
            a(j,i) = a(j,i) + 1
        end do
    end do
end subroutine foo 

Recommendations

Force vectorization of the outer loop:
  • In some cases it is possible to collapse a nested loop structure into a single loop structure using a directive before the outer loop. The n argument is an integer that specifies how many loops to collapse into one loop for vectorization:
    Target: Outer loop
      ICL/ICC/ICPC directive: #pragma omp simd collapse(n), #pragma omp simd, or #pragma simd
      IFORT directive: !$OMP SIMD COLLAPSE(n), !$OMP SIMD, or !DIR$ SIMD
  • If using the O3 compiler option, use directives before the inner and outer loops to request vectorization of the outer loop:
    Target: Inner loop
      ICL/ICC/ICPC directive: #pragma novector
      IFORT directive: !DIR$ NOVECTOR
    Target: Outer loop
      ICL/ICC/ICPC directive: #pragma vector always
      IFORT directive: !DIR$ VECTOR ALWAYS
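A sketch of the directive pair applied to the C++ example (foo_outer is a hypothetical name). The pragmas use Intel compiler syntax, and other compilers ignore them. Note that in this nest the outer index walks rows, so each lane touches a different row (stride N). Forcing outer-loop vectorization here may well be slower than the already vectorized inner loop, so measure before keeping it:

```cpp
#include <cassert>

constexpr int N = 1000;
float A[N][N];

void foo_outer(int n) {
    assert(n >= 0 && n <= N);
#pragma vector always
    for (int i = 0; i < n; i++) {
#pragma novector
        for (int j = 0; j < n; j++)
            A[i][j]++;
    }
}
```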

Low trip count

Cause: The loop lacks sufficient iterations to benefit from vectorization.
C++ Example:

#define TTT char
TTT A[15];
TTT foo(int n) {
    TTT sum=0;
    int i;
    for (i = 0; i < n; i++) {
        sum+=A[i];
    }
    return sum;
}
Fortran Example:
integer (kind=1) :: A(15), sum, i
sum=0
do i=1,15
    sum=sum+A(i)
end do

Recommendations

  • Rewrite your code to increase the number of loop iterations to fill at least one full vector.
  • Run a Trip Counts analysis to check the number of iterations and loop efficiency. A loop whose iteration count is a power of 2 can vectorize even if the trip count is low.
  • Consider not vectorizing a loop with so few iterations, because the vectorization overhead may outweigh the benefit.
  • Tell the compiler to enforce vectorization using a directive, and compare performance before and after vectorization:
    Target: Source loop
      ICL/ICC/ICPC construct: #pragma omp simd or #pragma simd
      IFORT construct: !$OMP SIMD or !DIR$ SIMD
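Some quick arithmetic shows why A[15] is a low trip count. This assumes 128-bit (SSE) vectors, so 16 one-byte char lanes. 15 elements do not fill even one vector. The helper full_vectors and the function sum_chars are hypothetical; the latter is the example loop with SIMD forced, to be timed against the scalar version:

```cpp
#include <cassert>

// Assumption for illustration: 16 char lanes per 128-bit vector.
constexpr int kLanes = 16;

// Number of complete vector iterations available for a trip count.
constexpr int full_vectors(int trip_count) { return trip_count / kLanes; }

static_assert(full_vectors(15) == 0, "15 chars do not fill one 16-lane vector");
static_assert(full_vectors(32) == 2, "32 chars fill exactly two vectors");

char A[15];

// Forced SIMD version of the example loop.
char sum_chars(int n) {
    assert(n >= 0 && n <= 15);
    char sum = 0;
#pragma omp simd reduction(+ : sum)
    for (int i = 0; i < n; i++)
        sum += A[i];
    return sum;
}
```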

Loop with early exits cannot be vectorized unless it meets search loop idiom criteria

Cause: The compiler did not recognize a search idiom in a loop that may exit early. For example: The loop body contains:

C++ Example:
Early exit
#include <cmath>
void c15520(float a[], float b[], float c[], int n)
{
    int i;
    for(i=0; i<n; i++)
    {
        if(a[i] < 0.) break;
        c[i] = sqrt(a[i]) * b[i];
    }
}

Exception
// For Compiler 16.1 and higher this example generates Diagnostic 15333 instead
__attribute__((vector)) void f1(double);
int main()
{
    int n = 10000;
    double a[n];
    #pragma simd
    for(int i = 0 ; i < n ; i++)
        f1(a[i]);
}

Fortran Example:
subroutine f15520(a,b,c,n)
  implicit none
  real, intent(in ), dimension(n) :: a, b
  real, intent(out), dimension(n) :: c
  integer, intent(in)             :: n
  integer                         :: i

  do i=1,n
     if(a(i).lt.0.) exit
     c(i) = sqrt(a(i)) * b(i)
  enddo

end subroutine f15520

Recommendations

  • Split the loop into two loops:
    • A search loop that has an early exit but still meets the search idiom criteria
    • A computational loop without early exits
  • Ensure the loop has a single entry and a single exit point.
  • Avoid exceptions within the loop body by marking functions as nothrow.

C++ Example:
Split the loop into a search loop and computational loop.
void c15520(float a[], float b[], float c[])
{
    int i, j;
    for(i=0; i<1000; i++)
    {
        if(a[i] < 0.) break;
    }

    for(j=0; j<i; j++)
    {
        c[j] = sqrt(a[j]) * b[j];
    }
}

Mark the function in the loop as nothrow.
__attribute__((vector, nothrow)) void f1(double);
int main()
{
    int n = 10000;
    double a[n];
    #pragma simd
    for(int i = 0 ; i < n ; i++)
        f1(a[i]);
}

Fortran Example:
Split the loop into a search loop and computational loop.
subroutine f15520(a,b,c,n)
    implicit none
    real, intent(in ), dimension(n) :: a, b
    real, intent(out), dimension(n) :: c
    integer, intent(in)             :: n
    integer                         :: i, j

    do i=1,n
        if(a(i).lt.0.) exit
    enddo
         
    do j=1,i-1
        c(j) = sqrt(a(j)) * b(j)
    enddo

end subroutine f15520

Exception handling for a call prevents vectorization

Cause: The compiler automatically generates a try block for a program block (that is, code inside {}) when it allocates a large, local object or array on the heap (because the object is too big to allocate on the stack) and a function within the block could throw an exception.
C++ Example:

__attribute__((vector)) void f1(double);
int main()
{
    int n = 10000;
    double a[n];
    #pragma simd
    for(int i = 0 ; i < n ; i++)
        f1(a[i]);
}

Recommendations

Avoid exceptions within a vectorizable loop body by marking functions as nothrow.
__attribute__((vector, nothrow)) void f1(double);
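In standard C++11 and later, noexcept is the portable way to declare that a function does not throw. The sketch below assumes that this gives a compiler the same information as the nothrow attribute; whether a particular compiler then drops the implicit try block is something to confirm with its vectorization report. The body of f1 and the function sum_f1 are hypothetical, and std::vector stands in for the large heap-allocated array:

```cpp
#include <cassert>
#include <vector>

// Hypothetical callee; noexcept promises it never throws.
double f1(double x) noexcept { return 2.0 * x; }

double sum_f1(int n) {
    assert(n >= 0);
    std::vector<double> a(n, 1.0);  // large array allocated on the heap
    double total = 0.0;
#pragma omp simd reduction(+ : total)
    for (int i = 0; i < n; i++)
        total += f1(a[i]);          // call cannot throw, so no unwinding path
    return total;
}
```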


Non-vectorizable loop instance from multiversioning (C++)

Cause: The compiler does not have enough information from the code to generate a single version of the loop. In the example below, the compiler takes a defensive stance and generates both vectorized and non-vectorized versions of the loop because it assumes memory aliasing (the pointers could point to overlapping memory locations).
C++ Example:

void foo(float *a, float *b, float *c){
    for(int i = 0 ; i < 256; i++)
        c[i] = a[i] * b[i];
    return;
}

Recommendations

If you are sure there is no memory aliasing, use the __restrict__ keyword to qualify the pointers passed as arguments as non-overlapping in memory.
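The example with __restrict__ applied looks like the sketch below. __restrict__ is a compiler extension (GCC and Clang accept it; C99 spells it restrict). It promises that a, b and c do not overlap, so the compiler no longer needs a fallback version for the aliasing case. Calling the function with overlapping arrays would break that promise and give undefined behavior:

```cpp
#include <cassert>

void foo(const float *__restrict__ a, const float *__restrict__ b,
         float *__restrict__ c) {
    for (int i = 0; i < 256; i++)
        c[i] = a[i] * b[i];
}
```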

Non-vectorizable loop instance from multiversioning (Fortran)

Cause: The compiler does not have enough information from the code to generate a single version of the loop. In the example below, the compiler takes a defensive stance and generates three versions of the loop, for k=0, k>0, and k<0. The version for k<0 cannot be safely vectorized because each later iteration may depend on the results of earlier iterations.
Fortran Example:

subroutine add(k, a)
    integer :: k
    real :: a(20)

    do i = 1, 20
        a(i) = a(i+k) * 2.0
    end do
end subroutine add

Recommendations

To override the compiler default behavior, insert the !DIR$ IVDEP directive. The IVDEP directive tells the compiler it can safely ignore potential dependencies, so it does not need to generate special code for the case of k<0.