Intel® Fortran Compiler 16.0 User and Reference Guide

BLOCK_LOOP and NOBLOCK_LOOP

General Compiler Directives: Enables or disables loop blocking for the immediately following nested DO loops. BLOCK_LOOP enables loop blocking for the nested loops. NOBLOCK_LOOP disables loop blocking for the nested loops.

!DIR$ BLOCK_LOOP [clause[[,] clause]...]

!DIR$ NOBLOCK_LOOP

clause

Is one or more of the following:

  • FACTOR(expr)

    expr

    Is a positive scalar constant integer expression representing the blocking factor for the specified loops.

    This clause is optional. If the FACTOR clause is not present, the compiler determines the blocking factor based on processor type and memory access patterns, and applies it to the specified levels of the nested loop that follows the directive.

    At most one FACTOR clause can appear in a BLOCK_LOOP directive.

  • LEVEL(level [, level]...)

    level

    Is specified in the form:

    const1 or const1:const2

    where const1 is a positive integer constant m <= 8 representing the loop at level m. The loop immediately following the directive is level 1.

    const2 is a positive integer constant n <= 8, with n > m, representing the loop at level n. The form const1:const2 specifies the nested loops from level const1 through level const2.

    This clause is optional. If the LEVEL clause is not present, the specified blocking factor is applied to all levels of the immediately following nested loops.

    At most one LEVEL clause can appear in a BLOCK_LOOP directive.

The clauses can be specified in any order. If you do not specify any clause, the compiler chooses the best blocking factor to apply to all levels of the immediately following nested loop.

The BLOCK_LOOP directive lets you exert greater control over optimizations on a specific DO loop inside a nested DO loop.

Using a technique called loop blocking, the BLOCK_LOOP directive splits DO loops with large iteration counts into smaller groups of iterations. Executing these smaller groups can make more efficient use of the cache, because data loaded for one group is more likely to be reused before it is evicted, and can improve performance.
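As a rough illustration of the technique (not of the code the compiler actually generates), the following sketch tiles a matrix transpose by hand. In the plain version, b is accessed with stride n in the inner loop, so for large n each cache line of b it touches is likely to be evicted before its neighbors are used; the blocked version visits bs x bs tiles so that the part of each array in use stays small. The module name, the subroutine names, and the tile-size argument bs are hypothetical.

```fortran
! Hand-written loop blocking (tiling), for illustration only.
! bs is a hypothetical tile size; BLOCK_LOOP chooses its own factor
! unless FACTOR is given.
module blocking_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Straightforward transpose: b = transpose(a), where a is n x n.
  subroutine transpose_plain(a, b, n)
    integer, intent(in) :: n
    real(dp), intent(in) :: a(n, n)
    real(dp), intent(out) :: b(n, n)
    integer :: i, j
    do j = 1, n
      do i = 1, n
        b(j, i) = a(i, j)        ! b is accessed with stride n here
      end do
    end do
  end subroutine transpose_plain

  ! Same computation, with the i and j ranges split into bs x bs tiles.
  subroutine transpose_blocked(a, b, n, bs)
    integer, intent(in) :: n, bs
    real(dp), intent(in) :: a(n, n)
    real(dp), intent(out) :: b(n, n)
    integer :: i, j, ii, jj
    do jj = 1, n, bs                        ! first column of the current tile
      do ii = 1, n, bs                      ! first row of the current tile
        do j = jj, min(jj + bs - 1, n)      ! columns inside the tile
          do i = ii, min(ii + bs - 1, n)    ! rows inside the tile
            b(j, i) = a(i, j)
          end do
        end do
      end do
    end do
  end subroutine transpose_blocked
end module blocking_demo
```

Both subroutines produce the same b; only the order in which elements are visited changes. A suitable tile size depends on the cache and element size; when FACTOR is omitted, BLOCK_LOOP leaves that choice to the compiler.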

If neither the LEVEL nor the FACTOR clause is specified, the blocking factor is determined based on the processor type and memory access patterns, and it is applied to all levels of the nested loops following this directive.

You can use the NOBLOCK_LOOP directive to tune performance by disabling loop blocking for the nested loops that follow it.

Note

Loop-carried dependences are ignored during the processing of BLOCK_LOOP directives.
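One likely consequence of this note is that the compiler does not check whether blocking a particular nest is legal, so the programmer must make sure the loop-carried dependences allow it. The following hypothetical sketch blocks, by hand, a nest in which iteration (i, j) reads a(i-1, j+1), a value written by an earlier iteration of the outer loop. Blocking both loops can run (i, j) before (i-1, j+1), so the read sees the old value. For n = 4 and a tile size of 2, the unblocked nest sets a(3, 2) to 2, while the blocked one sets it to 1. The module and subroutine names are illustrative only.

```fortran
! Illustration: a nest with a loop-carried dependence of distance (1, -1).
! Iteration (i, j) reads a(i-1, j+1), which iteration (i-1, j+1) writes.
module dependence_demo
  implicit none
contains
  ! Original order: (i-1, j+1) always runs before (i, j).
  subroutine sweep_plain(a, n)
    integer, intent(in) :: n
    integer, intent(inout) :: a(n, n)
    integer :: i, j
    do i = 2, n
      do j = 1, n - 1
        a(i, j) = a(i - 1, j + 1) + 1
      end do
    end do
  end subroutine sweep_plain

  ! Both levels blocked by hand with tile size bs. When j and j+1 fall in
  ! different jj tiles, (i, j) runs before (i-1, j+1) and reads a stale value.
  subroutine sweep_blocked(a, n, bs)
    integer, intent(in) :: n, bs
    integer, intent(inout) :: a(n, n)
    integer :: i, j, ii, jj
    do ii = 2, n, bs
      do jj = 1, n - 1, bs
        do i = ii, min(ii + bs - 1, n)
          do j = jj, min(jj + bs - 1, n - 1)
            a(i, j) = a(i - 1, j + 1) + 1
          end do
        end do
      end do
    end do
  end subroutine sweep_blocked
end module dependence_demo
```

With a tile size of at least n there is a single tile, the original order is preserved, and both subroutines agree.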

Example

!dir$ block_loop factor(256) level(1)       ! applies blocking factor 256 to
!dir$ block_loop factor(512) level(2)       !  the top level loop in the following
                                            !  nested loop and blocking factor 512 to
                                            !  the 2nd level (1st nested) loop

!dir$ block_loop factor(256) level(2)
!dir$ block_loop factor(512) level(1)       ! levels can be specified in any order

!dir$ block_loop factor(256) level(1:2)     ! adjacent loops can be specified as a range

!dir$ block_loop factor(256)                ! the blocking factor applies to all levels
                                            !  of the loop nest

!dir$ block_loop                            ! the blocking factor will be determined based on
                                            !  processor type and memory access patterns and
                                            !  will be applied to all the levels in the nested
                                            !  loop following the directive

!dir$ noblock_loop                          ! none of the levels in the nested loop following
                                            !  this directive will have a blocking factor applied
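The directive lines above are shown without the loops they apply to. The following sketch, a hypothetical matrix-multiply module, attaches BLOCK_LOOP and NOBLOCK_LOOP to a three-level nest and marks which DO loop each LEVEL number refers to. The factor 64 is an arbitrary value chosen for illustration. Because each directive begins with !, a compiler that does not recognize it will typically treat it as an ordinary comment; both subroutines compute the same product either way.

```fortran
! Sketch: BLOCK_LOOP and NOBLOCK_LOOP attached to a concrete loop nest.
! The module name, subroutine names, and factor(64) are hypothetical.
module matmul_levels_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! c = c + a*b, with the two outer loops (levels 1 and 2) blocked.
  subroutine matmul_blocked(a, b, c, n)
    integer, intent(in) :: n
    real(dp), intent(in) :: a(n, n), b(n, n)
    real(dp), intent(inout) :: c(n, n)
    integer :: i, j, k
!dir$ block_loop factor(64) level(1:2)
    do j = 1, n                 ! level 1: immediately follows the directive
      do k = 1, n               ! level 2
        do i = 1, n             ! level 3: outside level(1:2), not blocked
          c(i, j) = c(i, j) + a(i, k) * b(k, j)
        end do
      end do
    end do
  end subroutine matmul_blocked

  ! The same nest with loop blocking explicitly disabled.
  subroutine matmul_noblock(a, b, c, n)
    integer, intent(in) :: n
    real(dp), intent(in) :: a(n, n), b(n, n)
    real(dp), intent(inout) :: c(n, n)
    integer :: i, j, k
!dir$ noblock_loop
    do j = 1, n
      do k = 1, n
        do i = 1, n
          c(i, j) = c(i, j) + a(i, k) * b(k, j)
        end do
      end do
    end do
  end subroutine matmul_noblock
end module matmul_levels_demo
```

Only the j and k loops are named in level(1:2); the innermost i loop, which walks down a column of c and a with unit stride, is left unblocked.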

Consider the following:

!dir$ block_loop factor(256) level(1:2)
do j = 1, n
   f = 0
   do i = 1, n
      f = f + a(i) * b(i)
   enddo
   c(j) = c(j) + f
enddo

After loop blocking, the code above is transformed into the following:

do jj = 1, n/256+1
   do ii = 1, n/256+1
      do j = (jj-1)*256+1, min(jj*256, n)
         f = 0
         do i = (ii-1)*256+1, min(ii*256, n)
            f = f + a(i) * b(i)
         enddo
         c(j) = c(j) + f
      enddo
   enddo
enddo
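To check that the blocked form computes the same c as the original nest, both can be written as subroutines and compared. The following sketch does that with the blocking factor of 256 used above; the module and subroutine names are hypothetical. When n is a multiple of 256, the last jj and ii iterations have empty inner ranges, so the extra trip in n/256+1 is harmless. The blocked form adds the sum over i to c(j) in pieces, one per ii block, so with general floating-point data the two results can differ in the last bits.

```fortran
! Sketch: the original nest and its blocked form as subroutines.
! bs = 256 is the blocking factor from the example.
module dot_blocking_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: bs = 256
contains
  ! Original nest: every c(j) gets the full dot product of a and b.
  subroutine original_nest(a, b, c, n)
    integer, intent(in) :: n
    real(dp), intent(in) :: a(n), b(n)
    real(dp), intent(inout) :: c(n)
    real(dp) :: f
    integer :: i, j
    do j = 1, n
      f = 0.0_dp
      do i = 1, n
        f = f + a(i) * b(i)
      end do
      c(j) = c(j) + f
    end do
  end subroutine original_nest

  ! Blocked form: j and i are split into groups of bs iterations, and each
  ! c(j) receives one partial dot product per ii block.
  subroutine blocked_nest(a, b, c, n)
    integer, intent(in) :: n
    real(dp), intent(in) :: a(n), b(n)
    real(dp), intent(inout) :: c(n)
    real(dp) :: f
    integer :: i, j, ii, jj
    do jj = 1, n / bs + 1
      do ii = 1, n / bs + 1
        do j = (jj - 1) * bs + 1, min(jj * bs, n)
          f = 0.0_dp
          do i = (ii - 1) * bs + 1, min(ii * bs, n)
            f = f + a(i) * b(i)
          end do
          c(j) = c(j) + f
        end do
      end do
    end do
  end subroutine blocked_nest
end module dot_blocking_demo
```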

See Also