Intel® C++ Compiler 16.0 User and Reference Guide
This topic only applies to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).
By default, the offload pragma causes the CPU thread that encounters it to wait until the offload completes before continuing to the next statement. With an asynchronous offload, the CPU thread instead initiates the offload and immediately continues to the next statement.
To make an offloaded computation asynchronous, add a signal clause to the offload pragma that initiates it. Later, use the offload_wait pragma, with a wait clause naming the same signal, to wait for the offloaded computation to complete.
You can also use the non-blocking API _Offload_signaled() to check whether a section of offloaded code has finished running on a specific target device, without waiting for it.
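A minimal polling sketch follows. It assumes that `_Offload_signaled()` is declared in `offload.h` as taking the target number and the signal address and returning nonzero once the offload has completed, that the Intel compiler defines `__INTEL_OFFLOAD` when offload support is enabled, and that `#pragma offload_attribute(push, target(mic))` marks functions for coprocessor compilation. `sum_of_squares()` and `host_step()` are hypothetical stand-ins. Compilers that do not recognize these pragmas ignore them, so the same code then runs serially on the host.

```cpp
// Assumption: __INTEL_OFFLOAD is defined when offload support is enabled,
// and offload.h declares _Offload_signaled().
#ifdef __INTEL_OFFLOAD
#include <offload.h>
#endif

// Functions between push and pop are also compiled for the coprocessor.
#pragma offload_attribute(push, target(mic))
// Hypothetical stand-in for a long-running coprocessor kernel.
static long long sum_of_squares(long long n)
{
    long long s = 0;
    for (long long i = 1; i <= n; ++i)
        s += i * i;
    return s;
}
#pragma offload_attribute(pop)

// Hypothetical unit of host work performed while polling.
long host_step(long done) { return done + 1; }

// Starts an asynchronous offload on mic:0 and polls it without blocking.
long long poll_offload(long long n, long *host_steps)
{
    char sig;               // only its address matters: it is the signal tag
    long long result = 0;
    long steps = 0;

    #pragma offload target(mic:0) signal(&sig) out(result)
    {
        result = sum_of_squares(n);
    }

#ifdef __INTEL_OFFLOAD
    // Non-blocking check on device 0, the same device named in target().
    while (!_Offload_signaled(0, &sig))
        steps = host_step(steps);
#endif

    // Assumed to return at once when the offload has already signaled;
    // it also lets the runtime finish the offload before result is read.
    #pragma offload_wait target(mic:0) wait(&sig)

    *host_steps = steps;
    return result;
}
```

Calling `offload_wait` after polling is a conservative choice in this sketch: it keeps the completion point explicit instead of relying on `_Offload_signaled()` alone.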
The signal and wait clauses, the offload_wait pragma, and the _Offload_signaled() API each refer to a specific target device, so you must specify a target-number in the target() clause (for example, target(mic:0)) and pass the same target number to _Offload_signaled().
Querying a signal that has not been initiated on the specified target device results in undefined behavior and a runtime abort of the application. For example, suppose a signal (SIG1) was initiated for target device 1 but is queried on target device 0. Because no signal SIG1 is associated with target device 0, the application aborts.
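The sketch below shows signals and waits paired by device. It assumes a system with two coprocessors (mic:0 and mic:1) and uses the `out()` clause to return each partial result; `range_sum()` and `two_device_sum()` are hypothetical names. Each offload_wait names the same target number as the offload that initiated its signal, which is the pairing the abort rule above requires.

```cpp
#pragma offload_attribute(push, target(mic))
// Hypothetical kernel: sum of the integers in [lo, hi).
static long long range_sum(long long lo, long long hi)
{
    long long s = 0;
    for (long long i = lo; i < hi; ++i)
        s += i;
    return s;
}
#pragma offload_attribute(pop)

// Splits [0, n) across two coprocessors; each offload has its own signal tag.
long long two_device_sum(long long n)
{
    char sig0, sig1;
    long long mid = n / 2, r0 = 0, r1 = 0;

    #pragma offload target(mic:0) signal(&sig0) out(r0)
    {
        r0 = range_sum(0, mid);
    }
    #pragma offload target(mic:1) signal(&sig1) out(r1)
    {
        r1 = range_sum(mid, n);
    }

    // Each wait names the device that initiated its signal. Writing
    // target(mic:0) wait(&sig1) instead would query a signal that does
    // not exist on device 0, and the application would abort.
    #pragma offload_wait target(mic:0) wait(&sig0)
    #pragma offload_wait target(mic:1) wait(&sig1)

    return r0 + r1;
}
```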
The following example enables the CPU to issue offloaded computations and continue concurrent activity without using any additional CPU threads:
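/* long_running_mic_compute() must also be compiled for the coprocessor,
   for example by declaring it with __attribute__((target(mic))). */
char signal_var;
do {
    /* Start the offload and return immediately; &signal_var tags it. */
    #pragma offload target (mic:0) signal(&signal_var)
    {
        long_running_mic_compute();
    }

    /* Runs on the CPU while the coprocessor computes. */
    concurrent_cpu_activity();

    /* Block until the offload tagged &signal_var on mic:0 completes. */
    #pragma offload_wait target (mic:0) wait(&signal_var)
} while (1);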
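A bounded, self-contained variant of this loop is sketched below so the overlap can be checked. It assumes the `in()` and `out()` clauses for moving scalars, and `mic_cube()`, `cpu_square()`, and `overlapped_loop()` are hypothetical. The offloaded result is read only after offload_wait, because it is not guaranteed to be available before then.

```cpp
#pragma offload_attribute(push, target(mic))
// Hypothetical coprocessor kernel: cube of its argument.
static long long mic_cube(long long x) { return x * x * x; }
#pragma offload_attribute(pop)

// Hypothetical host-side work that overlaps the offload.
static long long cpu_square(long long x) { return x * x; }

long long overlapped_loop(int iterations)
{
    char signal_var;         // reused each iteration after its wait completes
    long long total = 0;

    for (int i = 1; i <= iterations; ++i) {
        long long x = i, mic_part = 0;

        #pragma offload target(mic:0) signal(&signal_var) in(x) out(mic_part)
        {
            mic_part = mic_cube(x);
        }

        long long cpu_part = cpu_square(x);   // overlaps the offload

        #pragma offload_wait target(mic:0) wait(&signal_var)
        total += mic_part + cpu_part;         // mic_part is valid only here
    }
    return total;
}
```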