Intel® C++ Compiler 16.0 User and Reference Guide
Creates a device data environment and then executes the construct on that device. This pragma only applies to Intel® MIC Architecture and Intel® Graphics Technology.
#pragma omp target [clause[, clause, ...]]
structured-block
clause: One or more clauses. The clauses discussed on this page are if(scalar-expression), map, depend, and nowait.
This pragma creates a device data environment and then executes the computation in the structured block on the device, using that device data environment. By default, the encountering task waits for the computation on the device to complete before it proceeds. If the nowait clause is present, execution may continue without waiting for the offloaded task to complete. When the expression in an if(scalar-expression) clause evaluates to false, the structured block is executed on the host instead. The depend clause makes the offloaded task depend on previously generated tasks; see the task pragma documentation for more information.
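The following is a minimal sketch of the synchronous behavior described above. The function `sum_of_squares` and its `use_device` flag are hypothetical names chosen for illustration. Because there is no nowait clause, the host can read `total` immediately after the region. When `use_device` is 0, or no device is available, the same block runs on the host and produces the same result.

```c
/* Hypothetical example: sum_of_squares and use_device are illustrative names. */
long sum_of_squares(long n, int use_device)
{
    long total = 0;

    /* Offload only when use_device is nonzero; otherwise the block runs on the host. */
    #pragma omp target if(use_device) map(tofrom: total)
    {
        for (long i = 1; i <= n; ++i)
            total += i * i;
    }

    /* No nowait clause: the encountering task has waited, so total is final here. */
    return total;
}
```

For example, `sum_of_squares(10, 1)` and `sum_of_squares(10, 0)` should both return 385. The only difference is where the loop executes.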
This pragma is supported for Intel® Graphics Technology when the structured block is a parallel loop specified with omp parallel for. In that case, the pragma offloads the parallel loop to the processor graphics. When an if(scalar-expression) clause evaluates to false, or when the target device is not available, the loop executes on the host.
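A sketch of that form follows: a target pragma whose structured block is an omp parallel for loop. The function `scale_and_add` and the `use_graphics` flag are hypothetical. The array sections in the map clauses (`x[0:n]`, `y[0:n]`) assume OpenMP 4.0 array-section syntax.

```c
/* Hypothetical example: scale_and_add and use_graphics are illustrative names.
   The structured block of the target construct is an omp parallel for loop. */
void scale_and_add(float a, const float *x, float *y, int n, int use_graphics)
{
    /* When use_graphics is 0, or no device is available, the loop runs on the host. */
    #pragma omp target if(use_graphics) map(to: x[0:n]) map(tofrom: y[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Because x is only read, it is mapped with to. Because y is read and updated, it needs tofrom.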
The mapping of variables into the device data environment is defined by one or more map clauses. The map-type can be any of the following:
alloc
For each list item, a new corresponding variable with an undefined value is created on the device. No value is copied to or from the host.
to
A new variable corresponding to each list item is created on the device, initialized from the list item on the host.
from
For each list item, a new corresponding variable is created on the device. At the end of the target region, its value is copied back from the device to the list item on the host.
tofrom
For each list item, a new corresponding variable is created on the device and initialized from the list item on the host. At the end of the target region, the value of the device variable is copied back to the list item on the host.
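The sketch below uses all four map-types in a single target region. The function `map_types_demo` and its parameters are hypothetical:

- `coef` is only read, so it is mapped with to.
- `data` is read and updated, so it is mapped with tofrom.
- `result` is only written, so it is mapped with from.
- `tmp` is device-side scratch whose contents are never copied in either direction, so it is mapped with alloc.

```c
/* Hypothetical example illustrating the to, tofrom, from, and alloc map-types. */
void map_types_demo(const int *coef, int *data, int *result, int *tmp, int n)
{
    #pragma omp target map(to: coef[0:n]) map(tofrom: data[0:n]) \
                       map(from: result[0:n]) map(alloc: tmp[0:n])
    {
        for (int i = 0; i < n; ++i) {
            tmp[i] = coef[i] * data[i]; /* alloc: written before it is read */
            data[i] = data[i] + 1;      /* tofrom: updated value copied back at exit */
            result[i] = tmp[i];         /* from: device value copied back at exit */
        }
    }
}
```

After the call, the host should only rely on `data` and `result`. The host copy of `tmp` is not updated when the region actually runs on a device.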
For Intel® Graphics Technology, physical memory is shared between the CPU and the processor graphics. The map-type values from and tofrom are mapped to the nocopy implementation, except for direct access to global data objects. On entry to the device region, the physical memory for the list items is kept in memory until exit from the device region. This avoids the copy, which reduces offload overhead, while preserving the behavior of the copy. In both the copy and nocopy cases, any CPU access to the same memory must be synchronized to avoid race conditions.
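The following sketch shows one way to synchronize host access with an asynchronous target region. It uses the nowait clause and then a taskwait before the host reads the buffer. The function `fill_and_sum` is hypothetical. The sketch assumes a compiler that accepts nowait on the target pragma, which is an OpenMP 4.5 feature.

```c
/* Hypothetical example: fill_and_sum is an illustrative name. */
int fill_and_sum(int *buf, int n)
{
    /* Deferred target task: the host may continue past this construct. */
    #pragma omp target map(from: buf[0:n]) nowait
    {
        for (int i = 0; i < n; ++i)
            buf[i] = i;
    }

    /* Reading buf before this point could race with the device; wait first. */
    #pragma omp taskwait

    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += buf[i];
    return sum;
}
```

Removing the taskwait would let the host loop read `buf` while the target task may still be writing it. This applies whether the implementation copies the data or shares the memory.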
Example: Creating a device data environment and executing the structured block on the device
#include <math.h>  /* sqrt */
double dist, x1, y1, x2, y2;  /* x1, y1, x2, y2 are set on the host */
#pragma omp target map(to: x1, y1, x2, y2) map(from: dist)
{
    dist = sqrt((x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1));
}
Example: Creating a device data environment and executing the structured block on the device asynchronously after its dependence is satisfied
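#pragma omp target map(to: x1, y1, x2, y2) map(from: dist) depend(in: a) nowait
{
    dist = sqrt((x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1));
}
/* dist is not guaranteed to be valid on the host until the target task
   completes, for example after #pragma omp taskwait */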
The above example demonstrates how to use this pragma to offload a region that executes asynchronously on the device once its dependence on a, created by a previous target or task pragma, is satisfied. If there is no such previous target or task pragma, the region is offloaded for immediate execution.
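The following complete sketch pairs a producer task with a dependent target region. The function `produce_then_offload` is hypothetical, and the values 21 and 42 are arbitrary. The task writes `a` with depend(out: a). The target pragma carries depend(in: a) and nowait, so it should start only after the producer finishes. A taskwait then makes `result` safe to read on the host. The sketch assumes a compiler that supports depend and nowait on target (OpenMP 4.5).

```c
/* Hypothetical example: produce_then_offload is an illustrative name. */
int produce_then_offload(void)
{
    int a = 0;
    int result = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Producer task: establishes the dependence on a. */
        #pragma omp task depend(out: a) shared(a)
        a = 21;

        /* Target task: deferred until the producer task has completed. */
        #pragma omp target map(to: a) map(from: result) depend(in: a) nowait
        {
            result = a * 2;
        }

        /* Wait for both child tasks before result is used on the host. */
        #pragma omp taskwait
    }
    return result;
}
```

Without the depend clauses, the target region could read `a` before the producer task has written it.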