Intel® Fortran Compiler 16.0 User and Reference Guide

Using Automatic Vectorization

Automatic vectorization is supported on IA-32 and Intel® 64 architectures. The information below will guide you in setting up the auto-vectorizer.

Vectorization Speed-up

Where does the vectorization speedup come from? Consider the following sample code fragment, where a, b and c are integer arrays:

Sample Code Fragment

do I=1,MAX
    C(I)=A(I)+B(I)
end do
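
The fragment above omits declarations. Below is a minimal compilable form. It assumes default integers, uses the hypothetical names vec_add_mod and vec_add, and uses n for the trip count in place of MAX, which is also the name of a Fortran intrinsic.

```fortran
! Hypothetical module wrapping the sample loop so it compiles on its own.
module vec_add_mod
  implicit none
contains
  ! Element-wise sum c = a + b of integer arrays; n is the trip count.
  ! Each iteration is independent of the others, which is the kind of
  ! simple loop the auto-vectorizer targets.
  subroutine vec_add(n, a, b, c)
    integer, intent(in)  :: n
    integer, intent(in)  :: a(n), b(n)
    integer, intent(out) :: c(n)
    integer :: i

    do i = 1, n
      c(i) = a(i) + b(i)
    end do
  end subroutine vec_add
end module vec_add_mod
```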

If vectorization is not enabled (that is, you compile with the O1 option, or with /Qvec- on Windows* or -no-vec on Linux* and OS X*), the compiler processes one element per iteration. Much of the capacity of the SIMD registers goes unused, even though each register could hold three additional integers. If vectorization is enabled (O2 or higher), the compiler may use that otherwise unused register space to perform four additions in a single instruction. The compiler looks for vectorization opportunities whenever you compile at default optimization (O2) or higher.
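
Conceptually, a four-wide vectorized loop handles the arrays in groups of four elements and uses a scalar remainder loop for any leftover elements. The hand-written sketch below shows only that grouping, using Fortran array sections. It is not the code the compiler actually generates. The group width of 4 is an assumption that matches a 128-bit register holding four 32-bit integers, and vec_add4 is a hypothetical name.

```fortran
! Hypothetical module: conceptual illustration only, not compiler output.
module vec_add4_mod
  implicit none
contains
  ! Computes c = a + b four elements at a time, mirroring how a
  ! 4-wide SIMD addition would group the work (n >= 0 assumed).
  subroutine vec_add4(n, a, b, c)
    integer, intent(in)  :: n
    integer, intent(in)  :: a(n), b(n)
    integer, intent(out) :: c(n)
    integer :: i, nmain

    ! Largest multiple of 4 that does not exceed n.
    nmain = n - mod(n, 4)

    ! Main loop: each iteration adds one group of four elements,
    ! the work a single 4-wide SIMD addition would perform.
    do i = 1, nmain, 4
      c(i:i+3) = a(i:i+3) + b(i:i+3)
    end do

    ! Remainder loop: the last 0 to 3 elements are added one at a time.
    do i = nmain + 1, n
      c(i) = a(i) + b(i)
    end do
  end subroutine vec_add4
end module vec_add4_mod
```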

Note

Vectorization is enabled at default optimization levels for both Intel® microprocessors and non-Intel microprocessors. Vectorization may call library routines that can result in greater performance gains on Intel® microprocessors than on non-Intel microprocessors. Vectorization can also be affected by certain options, such as /arch (Windows*), -m (Linux* and OS X*), or [Q]x.

Tip

To compare vectorized and non-vectorized code, disable vectorization with the /Qvec- (Windows*) or -no-vec (Linux* and OS X*) option, and enable it with the O2 option.

To find out whether a loop was vectorized, enable the optimization report with the options /Qopt-report:1 /Qopt-report-phase:vec (Windows*) or -qopt-report=1 -qopt-report-phase=vec (Linux* and OS X*). These options write the optimization messages to a separate report in an *.optrpt file. In Visual Studio, the program source is annotated with the report's messages; alternatively, you can read the resulting .optrpt file with a text editor. A message appears for every loop that is vectorized, such as:

Example: Vectorization Report

> ifort /Qopt-report:1 matvec.f90
> type matvec.optrpt
…
   LOOP BEGIN at C:\Projects\vec_samples\matvec.f90(38,6)
      remark #15300: LOOP WAS VECTORIZED
   LOOP END

The source line number (38 in the above example) refers to either the beginning or the end of the loop.

To get details about the type of loop transformations and optimizations that took place, use the [Q]opt-report-phase option by itself or along with the [Q]opt-report option.

How significant is the performance enhancement? To evaluate it yourself, build and run the vec_samples sample:

  1. Open an Intel® Compiler command line window.

    • On Windows*: Under the Start menu item for your Intel product, select an icon under Compiler and Performance Libraries > Command Prompt with Intel Compiler

    • On Linux* and OS X*: Source an environment script, such as compilervars.sh or compilervars.csh in the <install-dir>/bin directory, and pass the argument appropriate for your architecture.

  2. Navigate to the <install-dir>\Samples\<locale>\Fortran\ directory. On Windows, unzip the sample project vec_samples.zip to a writable directory. This small application multiplies a vector by a matrix using the following loop:

    Example: Vector Matrix Multiplication

         do i=1,size1
            c(i) = c(i) + a(i,j) * b(j)
         end do
  3. Build and run the application, first without enabling auto-vectorization. The default O2 optimization enables vectorization, so you need to disable it with a separate option. Note the time taken for the application to run.

    Example: Building and Running an Application without Auto-vectorization

    // (Linux* and OS X*)
    ifort -no-vec driver.f90 matvec.f90 -o NoVectMult
    ./NoVectMult
    // (Windows*)
    ifort /Qvec- driver.f90 matvec.f90 /exe:NoVectMult
    NoVectMult
  4. Now build and run the application, this time with auto-vectorization. Note the time taken for the application to run.

    Example: Building and Running an Application with Auto-vectorization

    // (Linux* and OS X*)
    ifort driver.f90 matvec.f90 -o VectMult
    ./VectMult
    // (Windows*)
    ifort driver.f90 matvec.f90 /exe:VectMult
    VectMult
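
The loop in step 2 is only the inner loop; j comes from an enclosing loop. The sketch below shows one plausible complete form. It assumes that a is a real array dimensioned (size1, size2) and that b has size2 elements. The actual matvec.f90 in vec_samples may differ, and the names matvec_mod and matvec are hypothetical. Keeping i as the innermost index walks down a column of a, which is contiguous in memory because Fortran stores arrays in column-major order. That unit-stride inner loop is the natural candidate for vectorization.

```fortran
! Hypothetical module: one possible enclosing form of the sample loop.
module matvec_mod
  implicit none
contains
  ! c = c + a*b for a size1 x size2 matrix a and a vector b of length size2.
  subroutine matvec(size1, size2, a, b, c)
    integer, intent(in)    :: size1, size2
    real,    intent(in)    :: a(size1, size2), b(size2)
    real,    intent(inout) :: c(size1)
    integer :: i, j

    do j = 1, size2
      ! Inner loop over i: unit-stride access to column j of a and to c,
      ! the loop the auto-vectorizer is expected to vectorize.
      do i = 1, size1
        c(i) = c(i) + a(i, j) * b(j)
      end do
    end do
  end subroutine matvec
end module matvec_mod
```

A driver could time repeated calls to such a routine with the standard system_clock intrinsic. Building that driver once with -no-vec (Linux* and OS X*) or /Qvec- (Windows*) and once at the default O2 gives the comparison described in steps 3 and 4.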

When you compare the timing of the two runs, you may see that the vectorized version runs faster. The non-vectorized version is only slightly faster than a build compiled with the O1 option.

Obstacles to Vectorization

The following conditions do not always prevent vectorization, but they frequently either prevent it or cause the compiler to decide that vectorization would not be worthwhile.

See Also