Intel® C++ Compiler 16.0 User and Reference Guide

Mapping Strands to Workers

An earlier DAG illustrated the serial/parallel structure of an Intel® Cilk™ Plus program. Recall that the DAG does not depend on the number of processors. The execution model describes how the runtime scheduler maps strands to workers.

When parallelism is introduced, multiple strands may execute in parallel. However, in an Intel® Cilk™ Plus program, strands that may execute in parallel are not required to execute in parallel. The scheduler makes this decision dynamically.

Consider the following program fragment:

do_init_stuff();         // execute strand 1
cilk_spawn func3();      // spawn strand 3 (the "child")
do_more_stuff();         // execute strand 2 (the "continuation")
cilk_sync;
do_final_stuff();        // execute strand 4

Here is the simple DAG for the code:



Recall that a worker is an operating system thread that executes an Intel® Cilk™ Plus program. If more than one worker is available, this program may execute in one of two ways: entirely on a single worker, or with strands (2) and (3) on different workers.

In order to guarantee serial semantics, the function that is spawned (the child, or strand (3) in this example) is always executed on the same worker that is executing the strand that enters the spawn. Thus, in this case, strand (1) and strand (3) are guaranteed to run on the same worker.

If another worker is available, then strand (2) (the "continuation") may execute on a different worker. This is known as a steal: the continuation is said to have been stolen by the new worker.
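The guarantee and the possible steal can be observed by recording which operating system thread runs each strand. The following is a minimal sketch rather than part of the product: the `g_worker` array, the `note` helper, and `run_fragment` are hypothetical names, the four strand functions are empty placeholders, and the fallback macros assume that the compiler predefines `__cilk` only when Intel® Cilk™ Plus support is enabled.

```cpp
#include <cassert>
#include <thread>

#ifdef __cilk
#include <cilk/cilk.h>   // real keywords when Cilk Plus support is enabled
#else
#define cilk_spawn       // serial fallback: the spawn becomes a plain call
#define cilk_sync        // serial fallback: the sync becomes an empty statement
#endif

// Hypothetical bookkeeping: the OS thread that ran each strand (slots 1..4).
// Each strand writes only its own slot, and the slots are read after the sync.
static std::thread::id g_worker[5];

static void note(int strand) { g_worker[strand] = std::this_thread::get_id(); }

static void do_init_stuff()  { note(1); }   // strand 1
static void func3()          { note(3); }   // strand 3, the child
static void do_more_stuff()  { note(2); }   // strand 2, the continuation
static void do_final_stuff() { note(4); }   // strand 4

// Runs the fragment once. Returns true if the continuation ran on a
// different worker than the child, that is, if it was stolen.
bool run_fragment() {
    do_init_stuff();
    cilk_spawn func3();
    do_more_stuff();
    cilk_sync;
    do_final_stuff();

    // The child runs on the worker that entered the spawn.
    assert(g_worker[1] == g_worker[3]);
    return g_worker[2] != g_worker[3];
}
```

Built without Cilk Plus support, the macros reduce the fragment to ordinary calls, so `run_fragment()` reports no steal. With the runtime and more than one worker, the result may vary from run to run because the scheduler decides dynamically.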

A new diagram helps to illustrate these two execution options. The following diagram shows execution on a single worker:



If a second worker is scheduled, the second worker will begin executing the continuation, strand (2). The first worker will proceed to the sync at (B). In the following diagram, the second worker is indicated by showing strand (2) as a dotted line. After the sync, strand (4) may continue on either worker. In the current implementation, strand (4) will execute on the last worker that reaches the sync.
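The rule that the last worker to reach the sync continues with strand (4) can be modeled with a countdown. The sketch below is an illustration of that rule only and makes no claim about how the runtime implements it; `SyncPoint`, `arrive`, and `worker_that_continues` are hypothetical names.

```cpp
#include <atomic>

// Hypothetical model of a sync: `pending` counts the strands that have
// not yet reached it.
struct SyncPoint {
    std::atomic<int> pending;

    explicit SyncPoint(int strands) : pending(strands) {}

    // Called by a worker when its strand reaches the sync. Returns true
    // only for the last arrival, which then runs the strand after the sync.
    bool arrive() {
        return pending.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }
};

// Replays a given arrival order of worker ids and reports which worker
// would continue past the sync. Returns -1 if n is zero.
int worker_that_continues(const int* arrival_order, int n) {
    SyncPoint sync(n);
    int winner = -1;
    for (int i = 0; i < n; ++i) {
        if (sync.arrive()) {
            winner = arrival_order[i];
        }
    }
    return winner;
}
```

For example, if worker 0 finishes strand (3) before worker 1 finishes strand (2), the arrival order is {0, 1} and the model selects worker 1 for strand (4). Reversing the order selects worker 0.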



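The details of the execution model have several implications, which are described in the section on the interaction between workers and system threads and in the section on reducers. For now, the key ideas are:

After a cilk_spawn, the child always runs on the same worker as the caller.

After a cilk_spawn, the continuation may run on a different worker, in which case the continuation is said to have been stolen.

After a cilk_sync, execution may proceed on any worker that executed a strand entering the sync.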

As a program executes on multiple workers, each worker tracks the pedigree of the strand it is currently executing. Intel® Cilk™ Plus provides an API that enables each worker to query the pedigree of its currently executing strand. The API also provides a call for creating a new maximal strand boundary. This call terminates the current maximal strand, and begins a new maximal strand with a different pedigree.
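One way to picture a pedigree is as a chain of ranks that leads from the current strand back to the root of the computation. The sketch below models that idea with ordinary data structures and does not call the runtime. `PedigreeNode`, `pedigree_path`, and `next_strand` are hypothetical names, and the assumption that a new maximal strand differs from its predecessor by an incremented rank is ours, made only to show that the two pedigrees are distinct.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one pedigree term: a rank plus a link to the
// enclosing (parent) term. The root has a null parent.
struct PedigreeNode {
    std::uint64_t rank;
    const PedigreeNode* parent;
};

// Walks from the current strand up to the root and returns the ranks in
// root-first order, which identifies the strand independently of the
// worker that happens to execute it.
std::vector<std::uint64_t> pedigree_path(const PedigreeNode* leaf) {
    std::vector<std::uint64_t> path;
    for (const PedigreeNode* n = leaf; n != nullptr; n = n->parent) {
        path.push_back(n->rank);
    }
    std::reverse(path.begin(), path.end());
    return path;
}

// Models a new maximal strand boundary: same ancestors, next rank
// (assumption), so the resulting pedigree differs from the current one.
PedigreeNode next_strand(const PedigreeNode& current) {
    return PedigreeNode{current.rank + 1, current.parent};
}
```

In this model, a strand with rank 2 under a root of rank 0 has the path {0, 2}, and the strand that begins after the boundary has the path {0, 3}.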