Intel® C++ Compiler 16.0 User and Reference Guide

Offload Using Streams

This topic only applies to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).

Streams can be used to offload multiple concurrent computations to a device on Intel® MIC Architecture from a single CPU thread.

A stream is a logical queue of offloads. Offloads in any one stream complete in the order in which they were issued to the stream.
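To make the ordering guarantee concrete, the following C++ sketch models a stream on the host as a FIFO queue of pending work. It is an illustration only. `StreamModel`, `issue`, `run_next`, and `demo_order` are hypothetical names and are not part of the Intel offload API. The model also runs queued work on the host, one item at a time, instead of on an Intel® MIC Architecture device.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical host-only model of a stream: a FIFO queue of pending work.
// This is NOT the Intel offload runtime; it only mimics the ordering rule.
class StreamModel {
public:
    // Append an "offload" to the end of the stream.
    void issue(std::function<void()> work) { pending_.push_back(std::move(work)); }

    // Simulate the device finishing the oldest pending offload.
    // Returns false if nothing was pending.
    bool run_next() {
        if (pending_.empty()) return false;
        std::function<void()> work = std::move(pending_.front());
        pending_.pop_front();
        work();
        return true;
    }

    // True when every offload issued so far has completed.
    bool completed() const { return pending_.empty(); }

private:
    std::deque<std::function<void()>> pending_;
};

// Issue three offloads, let the "device" drain the stream,
// and return the order in which they completed.
std::vector<int> demo_order() {
    StreamModel s;
    std::vector<int> order;
    for (int i = 1; i <= 3; ++i)
        s.issue([&order, i] { order.push_back(i); });
    while (s.run_next()) {
    }
    assert(s.completed());
    return order;  // {1, 2, 3}: completion order matches issue order
}
```

In this model `demo_order()` returns {1, 2, 3}. However many offloads are queued, they finish in the order in which they were issued.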

To use this feature, specify the stream clause in #pragma offload or #pragma offload_transfer.

To wait for all offloads issued to a stream to complete, specify the stream clause in #pragma offload_wait.

Defining Streams

The following API creates a stream and specifies the number of threads allocated to it:

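_Offload_stream _Offload_stream_create(
        int device,                  // Intel® MIC Architecture device number
        int number_of_cpus);         // Threads allocated to the stream

// Usage: the returned handle identifies the new stream
_Offload_stream handle = _Offload_stream_create(device, number_of_cpus);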

After a stream has been created, it is tied to its target device, so you do not need to specify a device when offloading to the stream. Non-stream offloads, in contrast, always require a target device specification.

Destroying Streams

The following API destroys a stream and returns the device threads to the pool for future streams:

int _Offload_stream_destroy(
        _Offload_stream stream);           // The stream

This API returns a nonzero (true) value if the stream was successfully destroyed.
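The create/destroy life cycle can be pictured as bookkeeping over a per-device pool of threads. The sketch below is a hypothetical host-side model, not the Intel runtime. `DeviceModel`, `create_stream`, `destroy_stream`, and `device_of` are illustrative names. The pool size in the usage example is an arbitrary assumption, and refusing a request that exceeds the free threads (by returning handle 0) is a choice made by this model, not documented runtime behavior. The model shows two points from the text above: a handle remembers its device, so later offloads need only the handle, and destroying a stream returns its threads for future streams.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical model of stream creation/destruction on one device.
// Handle 0 is never issued, since this page uses 0 to mean "all streams".
class DeviceModel {
public:
    DeviceModel(int device, int total_threads)
        : device_(device), free_threads_(total_threads) {}

    // Allocate threads to a new stream; returns 0 if the request cannot be met
    // (a choice of this model, not documented runtime behavior).
    std::uint64_t create_stream(int number_of_cpus) {
        if (number_of_cpus <= 0 || number_of_cpus > free_threads_) return 0;
        free_threads_ -= number_of_cpus;
        std::uint64_t handle = next_handle_++;
        streams_[handle] = number_of_cpus;
        return handle;
    }

    // Return the stream's threads to the pool; true on success.
    bool destroy_stream(std::uint64_t handle) {
        auto it = streams_.find(handle);
        if (it == streams_.end()) return false;
        free_threads_ += it->second;
        streams_.erase(it);
        return true;
    }

    // A stream is tied to this device, so the handle alone identifies the target.
    int device_of(std::uint64_t handle) const {
        assert(streams_.count(handle) == 1);
        return device_;
    }

    int free_threads() const { return free_threads_; }

private:
    int device_;
    int free_threads_;
    std::uint64_t next_handle_ = 1;
    std::map<std::uint64_t, int> streams_;  // handle -> threads held
};
```

For example, with an assumed pool of 8 threads, creating a 6-thread stream leaves only 2 free threads until that stream is destroyed.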

Offloading to a Stream

Offloads can be issued to a stream using the following syntax:

// Issue an offload to a stream
#pragma offload … stream(handle)

You can use a signal clause to identify a particular offload issued to a stream. The signal identifier can later be used to wait for completion of that specific offload. For example:

// Issue offload to a stream and identify with a signal
#pragma offload … stream(handle) signal(s)

Waiting for Offload Completion

A wait can be specified for all offloads in a stream or for a particular offload issued to a stream.

The following example shows how to specify a wait for completion of all offloads in a stream:

// Issue offload to a stream
#pragma offload … stream(handle)
{ … }
…
// Issue another offload to that stream
#pragma offload … stream(handle)
{ … }
…
// Wait for all offloads in that stream to complete
#pragma offload_wait stream(handle)

The following example shows how to specify a wait for completion of a particular offload in a stream:

// Issue offload to a stream and identify with signal value s1
#pragma offload … stream(handle) signal(s1)
{ … }
…
// Issue offload to a stream and identify with signal value s2
#pragma offload … stream(handle) signal(s2)
{ … }
…
// Wait for offload with signal value s1 to complete
#pragma offload_wait stream(handle) wait(s1)
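Because offloads in one stream complete in issue order, a wait on s1 is satisfied once the offloads issued up to and including s1 have finished, while a wait on the stream requires all of them. The hypothetical host-only model below illustrates the difference. `SignalStreamModel`, `wait_signal`, and `wait_all` are illustrative names, not Intel APIs, and signal values are modeled as plain `int` tags. After waiting on s1 in this model, the offload tagged s2 may still be pending.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <set>
#include <utility>

// Hypothetical host-only model of one stream whose offloads carry signals.
class SignalStreamModel {
public:
    // Queue an offload tagged with a signal value; completion is simulated later.
    void issue(int signal, std::function<void()> work) {
        pending_.push_back({signal, std::move(work)});
    }

    // Like "#pragma offload_wait stream(handle) wait(s)": run offloads in FIFO
    // order until the one tagged `signal` has completed (or the queue is empty).
    void wait_signal(int signal) {
        while (!signaled(signal) && !pending_.empty()) run_next();
    }

    // Like "#pragma offload_wait stream(handle)": drain the whole stream.
    void wait_all() {
        while (!pending_.empty()) run_next();
    }

    // Non-blocking checks on the model's state.
    bool signaled(int signal) const { return done_.count(signal) != 0; }
    bool completed() const { return pending_.empty(); }

private:
    struct Offload {
        int signal;
        std::function<void()> work;
    };

    // Simulate the device finishing the oldest pending offload.
    void run_next() {
        Offload next = std::move(pending_.front());
        pending_.pop_front();
        next.work();
        done_.insert(next.signal);
    }

    std::deque<Offload> pending_;
    std::set<int> done_;
};
```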

The following example shows how to specify a wait for completion of all offloads in all streams:

// Issue offload to stream 1
#pragma offload … stream(handle1)
{ … }
…
// Issue offload to stream 2
#pragma offload … stream(handle2)
{ … }
…
// Wait for completion of all offloads in all streams using handle 0
#pragma offload_wait stream(0)
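The special handle 0 widens a wait from one stream to every stream. The following hypothetical host-only model keeps one FIFO queue per handle and treats handle 0 as "all streams", mirroring the wait above. `StreamSet`, `wait_stream`, and `completed` are illustrative names, not Intel APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <utility>

// Hypothetical host-only model: several streams, each a FIFO queue keyed by handle.
class StreamSet {
public:
    void issue(std::uint64_t handle, std::function<void()> work) {
        assert(handle != 0);  // 0 is reserved to mean "all streams"
        queues_[handle].push_back(std::move(work));
    }

    // Drain one stream, or every stream when handle == 0.
    void wait_stream(std::uint64_t handle) {
        for (auto& entry : queues_) {
            if (handle == 0 || entry.first == handle) drain(entry.second);
        }
    }

    // True if the selected stream (or all streams, for handle 0) has no pending work.
    bool completed(std::uint64_t handle) const {
        for (const auto& entry : queues_) {
            if ((handle == 0 || entry.first == handle) && !entry.second.empty())
                return false;
        }
        return true;
    }

private:
    // Run every queued offload of one stream in FIFO order.
    static void drain(std::deque<std::function<void()>>& q) {
        while (!q.empty()) {
            std::function<void()> work = std::move(q.front());
            q.pop_front();
            work();
        }
    }

    std::map<std::uint64_t, std::deque<std::function<void()>>> queues_;
};
```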

Testing for Offload Completion

This feature includes non-blocking APIs that return a Boolean value to test whether all offloads to a stream, or all stream offloads on a device, have completed.

The following function tests whether all offloads to the specified stream have completed. Specifying 0 for the stream tests whether offloads to all streams have completed.

int _Offload_stream_completed(
        _Offload_stream stream);           // The stream

The following function tests whether all stream offloads on the specified device have completed. Specifying -1 for the device tests whether all stream offloads on all devices have completed.

int _Offload_device_streams_completed(
        int device);                 // Intel® MIC Architecture device number
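The two query functions differ only in how they select streams: by handle (0 meaning every stream) or by device (-1 meaning every device). The hypothetical host-only model below records how many offloads each stream still has outstanding and answers both queries without blocking. `StreamInfo`, `Registry`, `stream_completed`, and `device_streams_completed` are illustrative names that only echo the real functions. They are not the Intel runtime.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical bookkeeping for non-blocking completion queries (host-side model).
struct StreamInfo {
    int device;   // device the stream is tied to
    int pending;  // offloads issued to the stream but not yet completed
};

class Registry {
public:
    void add_stream(std::uint64_t handle, int device) {
        assert(handle != 0);  // 0 is reserved to mean "all streams"
        streams_[handle] = StreamInfo{device, 0};
    }
    void issue(std::uint64_t handle) { ++streams_.at(handle).pending; }
    void finish_one(std::uint64_t handle) {
        assert(streams_.at(handle).pending > 0);
        --streams_.at(handle).pending;
    }

    // Analogous to _Offload_stream_completed: handle 0 selects every stream.
    bool stream_completed(std::uint64_t handle) const {
        for (const auto& entry : streams_)
            if ((handle == 0 || entry.first == handle) && entry.second.pending > 0)
                return false;
        return true;
    }

    // Analogous to _Offload_device_streams_completed: device -1 selects every device.
    bool device_streams_completed(int device) const {
        for (const auto& entry : streams_)
            if ((device == -1 || entry.second.device == device) &&
                entry.second.pending > 0)
                return false;
        return true;
    }

private:
    std::map<std::uint64_t, StreamInfo> streams_;
};
```

Neither query waits: each only inspects the current counts, which is why the examples that follow wrap the calls in an if statement.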

The following example shows how to check for completion of all offloads in a stream:

// Issue offload to a stream
#pragma offload … stream(handle)
{ … }
…
// Issue another offload to that stream
#pragma offload … stream(handle)
{ … }
…
// Check if all offloads in that stream have completed
if (_Offload_stream_completed(handle)) …

The following example shows how to check for completion of a particular offload in a stream:

// Issue offload to a stream and identify with signal value s1
#pragma offload … stream(handle) signal(s1)
{ … }
…
// Issue offload to a stream and identify with signal value s2
#pragma offload … stream(handle) signal(s2)
{ … }
…
// Check if offload with signal value s1 has completed
if (_Offload_signaled(s1)) …

The following example shows how to check for completion of all offloads in all streams:

// Issue offload to stream 1
#pragma offload … stream(handle1)
{ … }
…
// Issue offload to stream 2
#pragma offload … stream(handle2)
{ … }
…
// Check for completion of all offloads in all streams using handle 0
if (_Offload_stream_completed(0)) …

The following example shows how to check for completion of all offloads on a device:

// Issue offload to stream 1
#pragma offload … stream(handle1)
{ … }
…
// Issue offload to stream 2
#pragma offload … stream(handle2)
{ … }
…
// Check for completion of all stream offloads on device 2
if (_Offload_device_streams_completed(2)) …

The following example shows how to check for completion of all offloads on all devices:

// Issue offload to stream 1
#pragma offload … stream(handle1)
{ … }
…
// Issue offload to stream 2
#pragma offload … stream(handle2)
{ … }
…
// Check for completion of all stream offloads on all devices
if (_Offload_device_streams_completed(-1)) …
// Alternatively, check all streams using handle 0
if (_Offload_stream_completed(0)) …
