Issue: OpenMP function call(s) present

OpenMP* function call(s) in the loop body are preventing the compiler from effectively vectorizing the loop.

Recommendation: Move OpenMP call(s) outside the loop body

OpenMP calls prevent automatic vectorization when the compiler cannot move them outside the loop body, for example when the calls are not loop-invariant. To fix:

Split the OpenMP parallel loop section into two using directives.

Target	Directive
Outer section	!$OMP PARALLEL
Inner section	!$OMP DO NOWAIT

Move the OpenMP calls outside the loop when possible.

Example:

Original code:

!$OMP PARALLEL DO PRIVATE(tid, nthreads)
do k = 1, N
   tid = omp_get_thread_num() ! this call inside loop prevents vectorization
   nthreads = omp_get_num_threads() ! this call inside loop prevents vectorization
   ...
enddo

Revised code:

!$OMP PARALLEL PRIVATE(tid, nthreads)
   ! Move OpenMP calls here
   tid = omp_get_thread_num()
   nthreads = omp_get_num_threads()

   !$OMP DO NOWAIT
   do k = 1, N
      ...
   enddo
!$OMP END PARALLEL
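
The revised pattern can be sketched as a complete program. This is a minimal illustration, not a definitive implementation: the program name, the array `work`, the size `n`, and the loop arithmetic are hypothetical placeholders for the elided loop body. It writes NOWAIT on the END DO directive, an equivalent placement that older OpenMP Fortran compilers also accept. It assumes compilation with OpenMP enabled (for example `-fopenmp` with gfortran or `-qopenmp` with the Intel compilers).

```fortran
program hoist_omp_calls
   use omp_lib                      ! declares omp_get_thread_num, omp_get_num_threads
   implicit none
   integer, parameter :: n = 1000   ! hypothetical problem size
   real    :: work(n)               ! hypothetical output array
   integer :: k, tid, nthreads

   !$OMP PARALLEL PRIVATE(tid, nthreads)
      ! Query the OpenMP runtime once per thread, before the loop
      tid = omp_get_thread_num()
      nthreads = omp_get_num_threads()

      !$OMP DO
      do k = 1, n
         ! Placeholder arithmetic: the body contains no function calls,
         ! only values that are invariant within each thread's iterations
         work(k) = real(k) / real(nthreads) + real(tid)
      enddo
      !$OMP END DO NOWAIT
   !$OMP END PARALLEL

   print *, 'largest value written:', maxval(work)
end program hoist_omp_calls
```

The values stored depend on which thread executes each iteration, so the printed result varies between runs; the point is only that the loop body no longer calls into the OpenMP runtime.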

Read More:

Getting Started with Intel Compiler Pragmas and Directives
Vectorization Resources for Intel® Advisor XE Users

Recommendation: Remove OpenMP lock functions

Locking objects slows loop execution. To fix: Rewrite the code without OpenMP lock functions. For example, allocating a separate array for each thread and merging the arrays after the parallel section may improve speed, at the cost of higher memory consumption.
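
The per-thread array approach might look like the following sketch, which builds a histogram without any lock. Everything here is an assumption made for illustration: the program name, the arrays `idx`, `local_hist`, and `hist`, the sizes `n` and `nbins`, and the synthetic input. It assumes compilation with OpenMP enabled.

```fortran
program per_thread_histogram
   use omp_lib                      ! declares omp_get_max_threads, omp_get_thread_num
   implicit none
   integer, parameter :: n = 100000, nbins = 16   ! hypothetical sizes
   integer :: idx(n)                ! bin index of each input element
   integer :: hist(nbins)           ! merged result
   integer, allocatable :: local_hist(:,:)        ! one column per thread
   integer :: k, t, tid, nthreads

   ! Synthetic input: assign every element to a bin in 1..nbins
   do k = 1, n
      idx(k) = mod(k, nbins) + 1
   enddo

   ! Upper bound on the team size of the parallel region below
   nthreads = omp_get_max_threads()
   allocate(local_hist(nbins, 0:nthreads-1))
   local_hist = 0

   !$OMP PARALLEL PRIVATE(tid)
      tid = omp_get_thread_num()
      !$OMP DO
      do k = 1, n
         ! Each thread updates only its own column, so no lock is needed
         local_hist(idx(k), tid) = local_hist(idx(k), tid) + 1
      enddo
      !$OMP END DO
   !$OMP END PARALLEL

   ! Merge the per-thread arrays serially after the parallel region
   hist = 0
   do t = 0, nthreads - 1
      hist = hist + local_hist(:, t)
   enddo

   print *, 'total count (should equal n):', sum(hist)
end program per_thread_histogram
```

The extra memory is `nbins` integers per thread, and the merge adds a short serial step; whether this beats a locked shared array depends on how often threads would otherwise contend for the lock.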

Read More:

Getting Started with Intel Compiler Pragmas and Directives
Vectorization Resources for Intel® Advisor XE Users