Intel® C++ Compiler 16.0 User and Reference Guide

Controlling Thread Group and Thread Space Shape

Internally, the threads that execute a kernel’s loop nest iterations in parallel are organized into thread groups and threads within those groups. In the underlying software stack, thread groups are two-dimensional, and you can choose how to shape them. The shape is fixed: the runtime creates thread groups of the same shape for all kernels. However, the performance of some kernels can depend heavily on the thread group shape. For example, a kernel might run much faster with 2x8 thread groups than with 1x4 thread groups.

Each thread has a two-dimensional hardware coordinate that you can get using one of the following functions:

Thread groups, in turn, are organized into a two-dimensional array called the thread space. The function _GFX_set_thread_space_config enables you to shape the thread space as needed. Four environment variables override this function:

For example:

set GFX_THREAD_GROUP_WIDTH=2
set GFX_THREAD_GROUP_HEIGHT=8

To successfully override the default shape, specify both the width and the height. The product of the width and the height must not exceed the hardware-specific limit for the size of a thread group. If the product exceeds that limit, the runtime issues an error message and aborts execution.

On 4th generation Intel® Core™ Processors and Intel® Xeon® v3 Processors, the limit is 64.
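
The rules above can be checked before launching an application. The following is a minimal sketch, not part of the Intel runtime: the names ShapeCheck, check_group_shape, and check_group_shape_from_env are hypothetical helpers introduced for illustration, and the default limit of 64 is the value stated above for 4th generation Intel® Core™ Processors and Intel® Xeon® v3 Processors. Only the environment variable names GFX_THREAD_GROUP_WIDTH and GFX_THREAD_GROUP_HEIGHT come from this topic.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical result codes for this sketch (not an Intel API).
enum class ShapeCheck { Ok, NotSet, Incomplete, Invalid, TooLarge };

// Assumed limit: 64, as documented for 4th generation Intel Core
// and Intel Xeon v3 processors. Other hardware may differ.
constexpr long kAssumedGroupSizeLimit = 64;

// Parses a strictly positive decimal integer; returns false on bad input.
static bool parse_positive(const char* text, long& value) {
    char* end = nullptr;
    value = std::strtol(text, &end, 10);
    return end != text && *end == '\0' && value > 0;
}

// Mirrors the documented rules: both values must be given, and
// width * height must not exceed the thread group size limit.
ShapeCheck check_group_shape(const char* width_text,
                             const char* height_text,
                             long limit = kAssumedGroupSizeLimit) {
    assert(limit > 0);
    if (width_text == nullptr && height_text == nullptr) {
        return ShapeCheck::NotSet;      // default shape stays in effect
    }
    if (width_text == nullptr || height_text == nullptr) {
        return ShapeCheck::Incomplete;  // only one dimension was specified
    }
    long width = 0;
    long height = 0;
    if (!parse_positive(width_text, width) ||
        !parse_positive(height_text, height)) {
        return ShapeCheck::Invalid;
    }
    // Equivalent to width * height > limit, written to avoid overflow.
    if (width > limit / height) {
        return ShapeCheck::TooLarge;
    }
    return ShapeCheck::Ok;
}

// Reads the two environment variables named in this topic.
ShapeCheck check_group_shape_from_env() {
    return check_group_shape(std::getenv("GFX_THREAD_GROUP_WIDTH"),
                             std::getenv("GFX_THREAD_GROUP_HEIGHT"));
}
```

With the settings from the example above (width 2, height 8), the product is 16, so the check passes under the assumed limit of 64. A shape such as 16x8 has a product of 128 and would be reported as too large.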

Note

The functions and environment variables described in this topic override the environment variable GFX_MAX_THREAD_COUNT.