Intel® C++ Compiler 16.0 User and Reference Guide
Lets you specify an alternative loop unroll sequence for gather and scatter loops. Option -qopt-gather-scatter-unroll is the replacement option for -opt-gather-scatter-unroll, which is deprecated.
Only available on Intel® 64 architecture targeting the Intel® Xeon Phi™ coprocessor x100 product family (formerly code name Knights Corner).
Linux: | -qopt-gather-scatter-unroll=n | -qno-opt-gather-scatter-unroll |
OS X: | None |
Windows: | /Qopt-gather-scatter-unroll:n | /Qopt-gather-scatter-unroll- |
n | Is the unroll factor for the gather and scatter loops. It must be an integer between 0 and 8. Specifying 0 for n is the same as specifying the negative form of the option. |
-qno-opt-gather-scatter-unroll or /Qopt-gather-scatter-unroll- | The compiler uses default heuristics when unrolling gather and scatter loops. |
This option lets you specify an alternative loop unroll sequence for gather and scatter loops.
This option may improve performance of gather/scatter operations.
The value of n that provides the best performance is data-dependent.
In cases where the gather/scatter operation accesses data in a small number of cache lines (such as 1 or 2), the default sequence (using a small value for n) works best. In cases where each individual data item falls in a different cache line, it may be better to use a large value for n.
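To make the cache-line guidance above concrete, the following sketch counts how many distinct cache lines one group of gather indices touches. It is illustrative only: the helper names `distinct_cache_lines` and `gather_loop` are hypothetical, and the code assumes 64-byte cache lines, 4-byte elements, and a table that starts on a cache-line boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Assumption: 64-byte cache lines; the table starts on a cache-line boundary.
constexpr std::size_t kCacheLineBytes = 64;

// Counts the distinct cache lines touched by one group of indices.
std::size_t distinct_cache_lines(const std::vector<std::int32_t>& idx,
                                 std::size_t elem_bytes = sizeof(float)) {
    std::set<std::size_t> lines;
    for (std::int32_t i : idx) {
        assert(i >= 0);  // negative indices are outside this simple model
        lines.insert(static_cast<std::size_t>(i) * elem_bytes / kCacheLineBytes);
    }
    return lines.size();
}

// An indirect read of this shape is the kind of source loop that a
// vectorizing compiler typically implements with gather operations.
void gather_loop(float* out, const float* table,
                 const std::int32_t* idx, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = table[idx[i]];
    }
}
```

Under these assumptions, sixteen consecutive indices (0 through 15) fall in one cache line, while sixteen indices with a stride of 16 elements fall in sixteen different cache lines. The second pattern is the case in which a larger value of n may be worth trying.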
None
Normally, there are no "one-shot" gather/scatter instructions, so the compiler generates a loop to perform a complete gather/scatter operation. By default, the loop looks as follows:
L1:
    gather
    jkz L2
    gather
    jknz L1
L2:
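The default loop above can be modeled in software to show why the number of gather steps depends on the data. The sketch below is a simplified model, not the compiler's generated code. It assumes a 16-lane vector of 32-bit elements, 64-byte cache lines, and that each gather step completes every pending lane that shares a cache line with the lowest pending lane. It folds the two gather/branch pairs of the listing into a single `while` loop. The name `simulate_gather` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr int kLanes = 16;              // assumed vector width (32-bit lanes)
constexpr std::size_t kLineBytes = 64;  // assumed cache-line size

// Performs a full gather into dst and returns the number of gather steps used.
int simulate_gather(const float* base,
                    const std::array<std::int32_t, kLanes>& idx,
                    std::array<float, kLanes>& dst) {
    auto line_of = [](std::int32_t i) {
        assert(i >= 0);  // negative indices are outside this simple model
        return static_cast<std::size_t>(i) * sizeof(float) / kLineBytes;
    };

    std::uint32_t mask = (1u << kLanes) - 1u;  // all lanes pending
    int steps = 0;
    while (mask != 0) {  // plays the role of the jkz/jknz mask tests
        // Find the lowest lane that is still pending.
        int first = 0;
        while (((mask >> first) & 1u) == 0u) ++first;
        const std::size_t line = line_of(idx[first]);

        // One gather step: complete every pending lane in that cache line.
        for (int lane = first; lane < kLanes; ++lane) {
            const bool pending = ((mask >> lane) & 1u) != 0u;
            if (pending && line_of(idx[lane]) == line) {
                dst[lane] = base[idx[lane]];
                mask &= ~(1u << lane);
            }
        }
        ++steps;
    }
    return steps;
}
```

In this model, indices that all fall in one cache line finish in a single step, while indices that each fall in a different cache line need one step per lane.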
For some applications, this loop would be faster if it were unrolled, and different applications may benefit from different unroll factors. Also, when the loop is unrolled, adding gather/scatter hint instructions before the loop provides additional benefits.
If you specify option [q or Q]opt-gather-scatter-unroll, the compiler will generate a similar loop unrolled by the number specified in n.
The following example shows what happens when the -qopt-gather-scatter-unroll=3 (Linux*) or /Qopt-gather-scatter-unroll:3 (Windows*) option is specified. Notice that the alternate sequence also generates two gather/scatter hint instructions preceding the loop:
    gather hint
    gather hint
    nop
L1:
    gather
    jkz L2
    gather |
    gather | -> the number of gathers specified by n
    gather |
    jknz L1
L2:
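The trade-off between small and large values of n can be estimated with a rough cost model of the unrolled listing above. This is a sketch under stated assumptions, not a description of actual hardware behavior: it assumes each gather instruction services exactly one pending cache line, that a gather issued with no pending lanes does no useful work but is still counted, and it ignores the hint instructions. The names `LoopCost` and `unrolled_cost` are hypothetical.

```cpp
#include <cassert>

struct LoopCost {
    int gathers;   // gather instructions executed, including wasted ones
    int branches;  // mask-test branches executed (jkz / jknz)
};

// lines: distinct cache lines the gather must service (>= 1)
// n:     unroll factor, as in -qopt-gather-scatter-unroll=n (>= 1)
LoopCost unrolled_cost(int lines, int n) {
    assert(lines >= 1 && n >= 1);
    LoopCost cost{0, 0};
    int pending = lines;
    for (;;) {
        // First gather of the loop body, followed by "jkz L2".
        ++cost.gathers;
        if (pending > 0) --pending;
        ++cost.branches;
        if (pending == 0) break;

        // The n unrolled gathers, followed by "jknz L1".
        for (int i = 0; i < n; ++i) {
            ++cost.gathers;
            if (pending > 0) --pending;
        }
        ++cost.branches;
        if (pending == 0) break;
    }
    return cost;
}
```

Under this model, 16 cache lines cost 16 gathers and 16 branches with n = 1, but 16 gathers and only 8 branches with n = 3. With just 2 cache lines, n = 3 executes 4 gathers where n = 1 executes 2, which is consistent with the guidance that a small n suits data confined to a few cache lines.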