Intel® Advisor Help
OpenMP* function call(s) in the loop body are preventing the compiler from effectively vectorizing the loop.
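OpenMP calls prevent automatic vectorization when the compiler cannot move the calls outside the loop body, such as when the OpenMP calls are not loop-invariant. To fix: Split the combined OpenMP parallel loop directive into two directives, then move the OpenMP calls between them so that each thread makes the calls once instead of on every iteration: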
Target | Directive |
---|---|
Outer section | !$OMP PARALLEL |
Inner section | !$OMP DO NOWAIT |
Example:
Original code:
    !$OMP PARALLEL DO PRIVATE(tid, nthreads)
    do k = 1, N
       tid = omp_get_thread_num()       ! this call inside loop prevents vectorization
       nthreads = omp_get_num_threads() ! this call inside loop prevents vectorization
       ...
    enddo
Revised code:
    !$OMP PARALLEL PRIVATE(tid, nthreads)
    ! Move OpenMP calls here
    tid = omp_get_thread_num()
    nthreads = omp_get_num_threads()
    !$OMP DO NOWAIT
    do k = 1, N
       ...
    enddo
    !$OMP END PARALLEL
Locking objects slows loop execution. To fix: Rewrite the code without OpenMP lock functions. For example, allocating separate arrays for each thread and then merging them after a parallel section may improve speed (but consume more memory).
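The per-thread-array approach might look like the following minimal sketch. The histogram task, the program name `per_thread_histogram`, and the names `local_hist`, `hist`, `n`, and `nbins` are assumptions made for illustration only; they do not come from Intel Advisor. The sketch assumes compilation with OpenMP enabled (for example, `-fopenmp` with gfortran or `-qopenmp` with the Intel compilers).

```fortran
program per_thread_histogram
  use omp_lib                      ! OpenMP runtime routines
  implicit none

  ! Hypothetical problem size and bin count, chosen only for illustration
  integer, parameter :: n = 100000, nbins = 8

  integer :: hist(nbins)                    ! final merged result
  integer, allocatable :: local_hist(:,:)   ! one column per thread
  integer :: k, tid, bin, nthreads

  ! Upper bound on the team size of the parallel region below
  nthreads = omp_get_max_threads()
  allocate(local_hist(nbins, 0:nthreads-1))
  local_hist = 0

  !$OMP PARALLEL PRIVATE(tid, k, bin)
  tid = omp_get_thread_num()       ! called once per thread, outside the loop
  !$OMP DO
  do k = 1, n
     bin = mod(k, nbins) + 1       ! stand-in for real work that selects a bin
     ! No lock needed: each thread updates only its own column
     local_hist(bin, tid) = local_hist(bin, tid) + 1
  enddo
  !$OMP END DO
  !$OMP END PARALLEL

  ! Merge step: runs serially after the parallel section
  hist = sum(local_hist, dim=2)

  ! Sanity check: every iteration must have been counted exactly once
  if (sum(hist) /= n) error stop 'merge lost updates'
  print *, 'total count =', sum(hist)

  deallocate(local_hist)
end program per_thread_histogram
```

The extra memory grows with the number of threads times the array size, which is the trade-off noted above. When the merge is a simple sum, a `REDUCTION(+:hist)` clause on the loop directive is an alternative that asks the OpenMP runtime to manage the per-thread copies itself.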