Intel® Fortran Compiler 16.0 User and Reference Guide
User-mandated vectorization supplements automatic vectorization, just as OpenMP* parallelization supplements automatic parallelization. The following figure illustrates this relationship. User-mandated vectorization is implemented as a single-instruction-multiple-data (SIMD) feature and is referred to as SIMD vectorization.
The SIMD vectorization feature is available for both Intel® microprocessors and non-Intel microprocessors. Vectorization may call library routines that can result in a greater performance gain on Intel® microprocessors than on non-Intel microprocessors. Vectorization can also be affected by certain options, such as /arch (Windows*), -m (Linux* and OS X*), or [Q]x.
The following figure illustrates how SIMD vectorization is positioned among the various approaches that you can take to generate vector code that exploits vector hardware capabilities. Programs written with SIMD vectorization are very similar to those written using auto-vectorization hints. You can use SIMD vectorization to minimize the number of code changes needed to obtain vectorized code.
SIMD vectorization uses the !DIR$ SIMD directive to vectorize loops. For a loop to be vectorized this way, you must add the directive to the loop and recompile (the [Q]simd option is enabled by default).
Consider a Fortran example where the compiler does not automatically vectorize a loop because the data dependence distance X is unknown. You can supply a data dependence assertion through the auto-vectorization hint !DIR$ IVDEP and let the compiler decide whether to vectorize the loop, or you can enforce vectorization of the loop with !DIR$ SIMD.
Example: without !DIR$ SIMD

    [D:/simd] cat example1.f
          subroutine add(A, N, X)
          integer N, X
          real A(N)
          DO I=X+1, N
            A(I) = A(I) + A(I-X)
          ENDDO
          end

    [D:/simd] ifort example1.f -nologo -Qvec-report2
    D:\simd\example1.f(6): (col. 9) remark: loop was not vectorized: existence of vector dependence.

Example: with !DIR$ SIMD

    [D:/simd] cat example1.f
          subroutine add(A, N, X)
          integer N, X
          real A(N)
    !DIR$ SIMD
          DO I=X+1, N
            A(I) = A(I) + A(I-X)
          ENDDO
          end

    [D:/simd] ifort example1.f -nologo -Qvec-report2 -Qsimd
    D:\simd\example1.f(7): (col. 9) remark: LOOP WAS VECTORIZED.
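The examples above show only the !DIR$ SIMD route. The !DIR$ IVDEP alternative might look like the following minimal sketch. It is not taken from the guide: the free-form layout, the names ivdep_sketch, add_ivdep, LEN_A, DIST, and VALS, and the sizes are all assumptions. The sizes are chosen so that the elements read, VALS(1:8), never overlap the elements written, VALS(9:16), which keeps the dependence assertion true for any vector length.

```fortran
! Sketch only: names and sizes are illustrative assumptions.
program ivdep_sketch
  implicit none
  integer, parameter :: LEN_A = 16, DIST = 8
  real :: VALS(LEN_A)

  VALS = 1.0
  call add_ivdep(VALS, LEN_A, DIST)

  ! The first half is untouched; each element of the second half
  ! had one element of the first half added to it.
  if (any(VALS(1:DIST) /= 1.0) .or. any(VALS(DIST+1:LEN_A) /= 2.0)) then
    print *, "FAILED"
  else
    print *, "PASSED"
  end if

contains

  subroutine add_ivdep(A, N, X)
    integer, intent(in) :: N, X
    real, intent(inout) :: A(N)
    integer :: I

    ! IVDEP is a hint: it tells the compiler to ignore the assumed
    ! dependence on the unknown distance X. The compiler still decides
    ! whether to vectorize the loop.
    !DIR$ IVDEP
    do I = X + 1, N
      A(I) = A(I) + A(I - X)
    end do
  end subroutine add_ivdep

end program ivdep_sketch
```

With either directive, the programmer is responsible for the assertion. If the dependence distance were smaller than the vector length actually used, the vectorized loop could produce results that differ from the serial loop.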
The main difference between the SIMD directive and auto-vectorization hints is that with the SIMD directive, the compiler generates a warning when it is unable to vectorize the loop. With auto-vectorization hints, actual vectorization remains at the discretion of the compiler, even when you use the !DIR$ VECTOR ALWAYS hint.
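To make the contrast concrete, the sketch below places the hint and the directive on two otherwise identical loops. The program and variable names are hypothetical, and the stride-2 read is only an assumed example of a loop that a compiler's cost model might consider unprofitable; actual decisions depend on the compiler version and target. Other compilers treat both directive lines as comments, so the program runs either way.

```fortran
! Sketch only: names are illustrative, not from the guide.
program hint_vs_directive
  implicit none
  integer, parameter :: n = 64
  real :: src(2*n), from_hint(n), from_simd(n)
  integer :: i

  do i = 1, 2*n
    src(i) = real(i)
  end do

  ! Hint: asks the compiler to vectorize even if it judges the loop
  ! unprofitable. The compiler may still decline without a warning.
  !DIR$ VECTOR ALWAYS
  do i = 1, n
    from_hint(i) = src(2*i)
  end do

  ! Directive: vectorization is mandated. The compiler warns if it
  ! cannot vectorize the loop.
  !DIR$ SIMD
  do i = 1, n
    from_simd(i) = src(2*i)
  end do

  ! Both loops compute the same values, however they were compiled.
  if (any(from_hint /= from_simd)) then
    print *, "FAILED"
  else
    print *, "PASSED"
  end if
end program hint_vs_directive
```

Compiling with a vectorization report option such as -Qvec-report2, as in the earlier examples, is one way to see which of the two loops was actually vectorized.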
The SIMD directive has optional clauses to guide the compiler on how vectorization must proceed. Use these clauses appropriately so that the compiler obtains enough information to generate correct vector code. For more information on the clauses, see the !DIR$ SIMD description.
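As one illustration of a clause, the sketch below uses REDUCTION on a summation loop. Without the clause, the update of the accumulator is a loop-carried dependence; the clause tells the compiler that partial sums may be kept per vector lane and combined at the end. The program and variable names are assumptions, and the clause spelling should be checked against the !DIR$ SIMD description for your compiler version.

```fortran
! Sketch only: names are illustrative, not from the guide.
program simd_reduction_sketch
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), total
  integer :: i

  do i = 1, n
    a(i) = real(i)
  end do

  total = 0.0
  ! REDUCTION(+:total) declares total as a sum reduction variable.
  !DIR$ SIMD REDUCTION(+:total)
  do i = 1, n
    total = total + a(i)
  end do

  ! 1 + 2 + ... + 1000 = 500500. Every partial sum is a small integer,
  ! so the result is exact in single precision in any summation order.
  if (total == 500500.0) then
    print *, "PASSED"
  else
    print *, "FAILED", total
  end if
end program simd_reduction_sketch
```

For general floating-point data, a vectorized reduction adds the elements in a different order than the serial loop, so small rounding differences are possible.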
Note the following points when using the !DIR$ SIMD directive.
Consider the following Intel® Visual Fortran example code for a program to compare serial and vector computations using a user-defined function, foo().
All code examples in this section are applicable for Fortran on Windows* only.
Example: Where the user-defined function is not vectorized

    !! file simdmain.f90
    program simdtest
    ! Test vector function in external file.
    implicit none
    interface
      integer function foo(a, b)
        integer a, b
      end function foo
    end interface

    integer, parameter :: M = 48, N = 64
    integer i, j
    integer, dimension(M,N) :: a1
    integer, dimension(M,N) :: a2
    integer, dimension(M,N) :: s_a3
    integer, dimension(M,N) :: v_a3
    logical :: err_flag = .false.

    ! compute random numbers for arrays
    do j = 1, N
      do i = 1, M
        a1(i,j) = rand() * M
        a2(i,j) = rand() * M
      end do
    end do

    ! compute serial results
    do j = 1, N
    !dir$ novector
      do i = 1, M
        s_a3(i,j) = foo(a1(i,j), a2(i,j))
      end do
    end do

    ! compute vector results
    do j = 1, N
      do i = 1, M
        v_a3(i,j) = foo(a1(i,j), a2(i,j))
      end do
    end do

    ! compare serial and vector results
    do j = 1, N
      do i = 1, M
        if (s_a3(i,j) .ne. v_a3(i,j)) then
          err_flag = .true.
          print *, s_a3(i, j), v_a3(i,j)
        end if
      end do
    end do

    if (err_flag) then
      write(*,*) "FAILED"
    else
      write(*,*) "PASSED"
    end if
    end program

    !! file: vecfoo.f90
    integer function foo(a, b)
    implicit none
    integer, intent(in) :: a, b
    foo = a - b
    end function

    [49 C:/temp] ifort -Qvec-report simdmain.f90 vecfoo.f90
    simdmain.f90
    vecfoo.f90
    C:\temp\simdmain.f90(3): (col. 3) remark: loop was not vectorized: existence of vector dependence.
    C:\temp\vecfoo.f90(3): (col. 3) remark: function was not vectorized.
When you compile the above code, the loop containing the foo() function is not auto-vectorized because the auto-vectorizer does not know what foo() does unless it is inlined to this call site.
In such cases, where the function call is not inlined, you can use the !DIR$ ATTRIBUTES VECTOR :: function-name-list declaration to vectorize both the loop and the function foo(). Add the vector declaration to the function declaration and recompile the code; the loop and the function are then vectorized.
Example: Where the loop with a user-defined function with the vector declaration is auto-vectorized

    !! file simdmain.f90
    program simdtest
    ! Test vector function in external file.
    implicit none
    interface
      integer function foo(a, b)
    !dir$ attributes vector :: foo
        integer a, b
      end function foo
    end interface

    integer, parameter :: M = 48, N = 64
    integer i, j
    integer, dimension(M,N) :: a1
    integer, dimension(M,N) :: a2
    integer, dimension(M,N) :: s_a3
    integer, dimension(M,N) :: v_a3
    logical :: err_flag = .false.

    ! compute random numbers for arrays
    do j = 1, N
      do i = 1, M
        a1(i,j) = rand() * M
        a2(i,j) = rand() * M
      end do
    end do

    ! compute serial results
    do j = 1, N
    !dir$ novector
      do i = 1, M
        s_a3(i,j) = foo(a1(i,j), a2(i,j))
      end do
    end do

    ! compute vector results
    do j = 1, N
      do i = 1, M
        v_a3(i,j) = foo(a1(i,j), a2(i,j))
      end do
    end do

    ! compare serial and vector results
    do j = 1, N
      do i = 1, M
        if (s_a3(i,j) .ne. v_a3(i,j)) then
          err_flag = .true.
          print *, s_a3(i, j), v_a3(i,j)
        end if
      end do
    end do

    if (err_flag) then
      write(*,*) "FAILED"
    else
      write(*,*) "PASSED"
    end if
    end program

    !! file: vecfoo.f90
    integer function foo(a, b)
    !dir$ attributes vector :: foo
    implicit none
    integer, intent(in) :: a, b
    foo = a - b
    end function

    [49 C:/temp] ifort -Qvec-report simdmain.f90 vecfoo.f90
    simdmain.f90
    vecfoo.f90
    C:\temp\simdmain.f90(3): (col. 3) remark: LOOP WAS VECTORIZED.
    C:\temp\vecfoo.f90(3): (col. 3) remark: FUNCTION WAS VECTORIZED.
Vectorization depends on two major factors: hardware and the style of source code. When using the vector declaration, the following features are not allowed:
Thread creation and joining through _Cilk_spawn, _Cilk_for, OpenMP* parallel/for/sections/task, and explicit threading API calls
Using setjmp, longjmp, EH, SEH
Inline ASM code and VML
Calling non-vector functions (note that all SVML functions are considered vector functions)
Locks, barriers, atomic constructs, and critical sections (presumably a special case of the previous item)
The GOTO statement
Intrinsics (for example, SVML intrinsics)
Function call through function pointer and virtual function
Any loop/array notation constructs
Struct access
The computed GOTO statement
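A function body that stays within the restrictions listed above might look like the following sketch. It is an illustration under stated assumptions, not an example from the guide: the module name vecfuncs_sketch and the function clamp_diff are hypothetical, and scalar integer arguments are assumed to be acceptable because the foo() example above uses them. The body uses only arithmetic and a simple IF statement, with no GOTO, no calls to other functions, and no synchronization.

```fortran
! Sketch only: module and function names are hypothetical.
module vecfuncs_sketch
  implicit none
contains
  ! Returns a - b, clamped below at zero.
  integer function clamp_diff(a, b) result(d)
    !dir$ attributes vector :: clamp_diff
    integer, intent(in) :: a, b
    d = a - b
    ! A simple conditional assignment instead of GOTO-based control flow.
    if (d < 0) d = 0
  end function clamp_diff
end module vecfuncs_sketch

program use_vecfunc
  use vecfuncs_sketch
  implicit none
  integer, parameter :: n = 8
  integer :: x(n), y(n), r(n), i

  do i = 1, n
    x(i) = i
    y(i) = 4
  end do

  ! Loop that calls the vector function once per element.
  do i = 1, n
    r(i) = clamp_diff(x(i), y(i))
  end do

  ! x - 4 is -3..4, so the clamped results are 0 0 0 0 1 2 3 4.
  if (all(r == [0, 0, 0, 0, 1, 2, 3, 4])) then
    print *, "PASSED"
  else
    print *, "FAILED"
  end if
end program use_vecfunc
```

Because the function lives in a module here, the compiler may simply inline it. The vector declaration matters most when the function is compiled in a separate file, as in the simdmain.f90 and vecfoo.f90 example.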
Formal parameters must be of the following data types: