User-mandated or SIMD Vectorization

User-mandated or SIMD vectorization supplements automatic vectorization just as OpenMP* parallelization supplements automatic parallelization. The following figure illustrates this relationship. User-mandated vectorization is implemented as a single-instruction-multiple-data (SIMD) feature and is referred to as SIMD vectorization.

Note

The SIMD vectorization feature is available for both Intel® microprocessors and non-Intel microprocessors. Vectorization may call library routines that can result in a greater performance gain on Intel® microprocessors than on non-Intel microprocessors. Vectorization can also be affected by certain options, such as /arch (Windows*), -m (Linux* and OS X*), or [Q]x.

The following figure illustrates how SIMD vectorization is positioned among the various approaches that you can take to generate vector code that exploits vector hardware capabilities. Programs written with SIMD vectorization are very similar to those written using auto-vectorization hints. You can use SIMD vectorization to minimize the code changes needed to obtain vectorized code.

SIMD vectorization uses the !DIR$ SIMD directive to effect loop vectorization. You must add this directive to a loop and recompile for the loop to get vectorized (the option [Q]simd is enabled by default).

Consider a Fortran example in which the compiler does not automatically vectorize the loop because the data dependence distance X is unknown. You can either assert the absence of a dependence with the auto-vectorization hint !DIR$ IVDEP and let the compiler decide whether to vectorize the loop, or enforce vectorization of the loop with !DIR$ SIMD.

Example: without !DIR$ SIMD

[D:/simd] cat example1.f
      subroutine add(A, N, X)
      integer N, X
      real A(N)
      DO I=X+1, N
        A(I) = A(I) + A(I-X)
      ENDDO
      end

[D:/simd] ifort example1.f -nologo -Qvec-report2
  D:\simd\example1.f(6): (col. 9) remark: loop was not vectorized: existence of vector dependence.

Example: with !DIR$ SIMD

[D:/simd] cat example1.f
      subroutine add(A, N, X)
      integer N, X
      real A(N)
!DIR$ SIMD
      DO I=X+1, N
        A(I) = A(I) + A(I-X)
      ENDDO
      end

[D:/simd] ifort example1.f -nologo -Qvec-report2 -Qsimd
  D:\simd\example1.f(7): (col. 9) remark: LOOP WAS VECTORIZED.
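
For comparison, the auto-vectorization hint can be applied to the same loop. The following is a minimal sketch in free-form source; the file name example1_ivdep.f90 is hypothetical. !DIR$ IVDEP only tells the compiler to ignore the assumed dependence and leaves the final decision to the compiler, so the vectorization report may still state that the loop was not vectorized. With either directive, the result is correct only if the caller guarantees that X is large enough that iterations executed together do not depend on one another; that is an assumption about the caller, not something the compiler checks.

```fortran
! file: example1_ivdep.f90 (hypothetical name, free-form source)
subroutine add(A, N, X)
  implicit none
  integer, intent(in) :: N, X
  real, intent(inout) :: A(N)
  integer :: I

  ! Ask the compiler to ignore the assumed dependence on A(I-X).
  ! Unlike !DIR$ SIMD, this is a hint: the compiler may still
  ! choose not to vectorize the loop.
  !DIR$ IVDEP
  do I = X + 1, N
    A(I) = A(I) + A(I - X)
  end do
end subroutine add
```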

The one big difference between using the SIMD directive and auto-vectorization hints is that with the SIMD directive, the compiler generates a warning when it is unable to vectorize the loop. With auto-vectorization hints, actual vectorization is still at the discretion of the compiler, even when you use the !DIR$ VECTOR ALWAYS hint.

The SIMD directive has optional clauses to guide the compiler on how vectorization must proceed. Use these clauses appropriately so that the compiler obtains enough information to generate correct vector code. For more information on the clauses, see the !DIR$ SIMD description.
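
As an illustration of such a clause, the sketch below sums an array under !DIR$ SIMD with a REDUCTION clause, which tells the compiler that total is accumulated across iterations and may be built from per-lane partial sums. The program name, array size, and variable names are assumptions made for this example; the !DIR$ SIMD description remains the authoritative source for clause syntax.

```fortran
program simd_reduction
  implicit none
  integer, parameter :: n = 1024   ! illustrative size
  real :: a(n), total
  integer :: i

  a = 1.0
  total = 0.0

  ! total is a reduction variable: partial sums may be kept per
  ! vector lane and combined after the loop.
  !dir$ simd reduction(+:total)
  do i = 1, n
    total = total + a(i)
  end do

  print *, 'sum =', total
end program simd_reduction
```

Because a vectorized reduction reorders the floating-point additions, the result can differ slightly from a strictly serial sum for general input data.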

Additional Semantics

Note the following points when using the !DIR$ SIMD directive.

A variable may belong to at most one of private, linear, or reduction (or none of them).
Within the vector loop, an expression is evaluated as a vector value if it is private, linear, reduction, or it has a sub-expression that is evaluated to a vector value. Otherwise, it is evaluated as a scalar value (that is, broadcast the same value to all iterations). Scalar value does not necessarily mean loop invariant, although that is the most frequently seen usage pattern of scalar value.
Assigning a vector value to a scalar L-value is an error.
Assigning to a scalar L-value under a vector condition is an error.
The computed GOTO statement is not supported.
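
The first two points can be sketched as follows. In this hypothetical program, t is listed in a PRIVATE clause, so each iteration has its own copy and expressions involving t are evaluated as vector values, while factor appears in no clause and is treated as a scalar value that is broadcast to all iterations. All names here are assumptions made for illustration.

```fortran
program simd_private
  implicit none
  integer, parameter :: n = 16
  real :: a(n), b(n), t, factor
  integer :: i

  do i = 1, n
    a(i) = real(i) - 8.0
  end do
  factor = 2.0

  !dir$ simd private(t)
  do i = 1, n
    t = a(i) * factor      ! t is private, so this is a vector value
    if (t > 0.0) then      ! vector condition
      b(i) = t             ! b(i) is indexed by the loop variable
    else
      b(i) = 0.0
    end if
  end do

  print *, b
end program simd_private
```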

Using vector Declaration

Consider the following Intel® Visual Fortran example code for a program to compare serial and vector computations using a user-defined function, foo().

Note

All code examples in this section are applicable for Fortran on Windows* only.

Example: Where user-defined function is not vectorized

!! file simdmain.f90 
program simdtest 
! Test vector function in external file.
 implicit none
 interface
   integer function foo(a, b)
   integer a, b
   end function foo
 end interface
 
 integer, parameter :: M = 48, N = 64
 
  integer  i, j
  integer, dimension(M,N) :: a1  
  integer, dimension(M,N) :: a2
  integer, dimension(M,N) :: s_a3
  integer, dimension(M,N) :: v_a3 
logical :: err_flag = .false.
 
! compute random numbers for arrays
 do j = 1, N
  do i = 1, M
   a1(i,j) = rand() * M
   a2(i,j) = rand() * M
  end do
 end do
 
 ! compute serial results
 do j = 1, N 
!dir$ novector 
  do i = 1, M
   s_a3(i,j) = foo(a1(i,j), a2(i,j))
  end do
 end do
 
 ! compute vector results
  do j = 1, N 
   do i = 1, M
    v_a3(i,j) = foo(a1(i,j), a2(i,j))
   end do
  end do
 
 ! compare serial and vector results
 do j = 1, N 
  do i = 1, M
   if (s_a3(i,j) .ne. v_a3(i,j)) then
    err_flag = .true. 
    print *, s_a3(i, j), v_a3(i,j)
   end if
  end do 
 end do
 if (err_flag) then
  write(*,*) "FAILED"
 else
  write(*,*) "PASSED"
 end if
end program
 
!! file: vecfoo.f90 
integer function foo(a, b)
 implicit none
 integer, intent(in) :: a, b
  foo = a - b 
end function

[49 C:/temp] ifort -Qvec-report simdmain.f90 vecfoo.f90 simdmain.f90 vecfoo.f90 
  C:\temp\simdmain.f90(3): (col. 3) remark: loop was not vectorized: existence of vector dependence. 
  C:\temp\vecfoo.f90(3): (col. 3) remark: function was not vectorized.

When you compile the above code, the loop that calls foo() is not auto-vectorized because the auto-vectorizer does not know what foo() does unless the function is inlined at the call site.

In cases where the function call is not inlined, you can use the !DIR$ attributes vector::function-name-list declaration to vectorize both the loop and the function foo(). Add the vector declaration to the function definition and to its interface block, as in the following example, and recompile the code. The loop and the function are then vectorized.

Example: Where a loop calling a user-defined function with the vector declaration is auto-vectorized

!! file simdmain.f90 
program simdtest 
! Test vector function in external file.
 implicit none
 interface
   integer function foo(a, b) 
!dir$ attributes vector :: foo
   integer a, b
   end function foo
 end interface
 
 integer, parameter :: M = 48, N = 64
 
  integer  i, j
  integer, dimension(M,N) :: a1  
  integer, dimension(M,N) :: a2
  integer, dimension(M,N) :: s_a3
  integer, dimension(M,N) :: v_a3 
logical :: err_flag = .false.
 
! compute random numbers for arrays
 do j = 1, N
  do i = 1, M
   a1(i,j) = rand() * M
   a2(i,j) = rand() * M
  end do
 end do
 
 ! compute serial results
 do j = 1, N 
!dir$ novector 
  do i = 1, M
   s_a3(i,j) = foo(a1(i,j), a2(i,j))
  end do
 end do
 
 ! compute vector results
  do j = 1, N 
   do i = 1, M
    v_a3(i,j) = foo(a1(i,j), a2(i,j))
   end do
  end do
 
 ! compare serial and vector results
 do j = 1, N 
  do i = 1, M
   if (s_a3(i,j) .ne. v_a3(i,j)) then
    err_flag = .true. 
    print *, s_a3(i, j), v_a3(i,j)
   end if
  end do 
 end do
 if (err_flag) then
  write(*,*) "FAILED"
 else
  write(*,*) "PASSED"
 end if
end program
 
!! file: vecfoo.f90 
integer function foo(a, b) 
!dir$ attributes vector :: foo
 implicit none
 integer, intent(in) :: a, b
  foo = a - b 
end function

[49 C:/temp] ifort -Qvec-report simdmain.f90 vecfoo.f90 simdmain.f90 vecfoo.f90 
  C:\temp\simdmain.f90(3): (col. 3) remark: LOOP WAS VECTORIZED. 
  C:\temp\vecfoo.f90(3): (col. 3) remark: FUNCTION WAS VECTORIZED.

Restrictions on Using `vector` declaration

Vectorization depends on two major factors: hardware and the style of source code. When using the vector declaration, the following features are not allowed:

Thread creation and joining through _Cilk_spawn, _Cilk_for, OpenMP* parallel/for/sections/task, and explicit threading API calls
Using setjmp, longjmp, EH, SEH
Inline ASM code and VML
Calling non-vector functions (note that all SVML functions are considered vector functions)
Locks, barriers, atomic construct, critical sections (presumably this is a special case of the previous one).
The GOTO statement
Intrinsics (for example, SVML intrinsics)
Function call through function pointer and virtual function
Any loop/array notation constructs
Struct access
The computed GOTO statement

Formal parameters must be of the following data types:

(un)signed 8, 16, 32, or 64-bit integer
32- or 64-bit floating point
64- or 128-bit complex
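
A function that stays within these restrictions might look like the following sketch. The file name vecaxpy.f90 and the function name axpy_elem are hypothetical; the function takes only 64-bit floating-point scalar arguments, performs plain arithmetic, and makes no calls, so none of the disallowed features listed above appear in its body.

```fortran
! file: vecaxpy.f90 (hypothetical)
function axpy_elem(alpha, x, y) result(r)
!dir$ attributes vector :: axpy_elem
  implicit none
  real(8), intent(in) :: alpha, x, y   ! 64-bit floating-point formals
  real(8) :: r
  ! Plain arithmetic only: no calls, no GOTO, no locks or threading.
  r = alpha * x + y
end function axpy_elem
```

A caller would repeat the same !dir$ attributes vector declaration in its interface block for axpy_elem, in the same way that simdmain.f90 does for foo().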
