Intel® C++ Compiler 16.0 User and Reference Guide
An earlier DAG illustrated the serial/parallel structure of an Intel® Cilk™ Plus program. Recall that the DAG does not depend on the number of processors. The execution model describes how the runtime scheduler maps strands to workers.
When parallelism is introduced, multiple strands may execute in parallel. However, in an Intel® Cilk™ Plus program, strands that may execute in parallel are not required to execute in parallel. The scheduler makes this decision dynamically.
Consider the following program fragment:
do_init_stuff();        // execute strand 1
cilk_spawn func3();     // spawn strand 3 (the "child")
do_more_stuff();        // execute strand 2 (the "continuation")
cilk_sync;
do_final_stuff();       // execute strand 4
Here is the simple DAG for the code:
Recall that a worker is an operating system thread that executes an Intel® Cilk™ Plus program. If there is more than one worker available, there are two ways that this program may execute:
The entire program may execute on a single worker
The scheduler may choose to execute strands (2) and (3) on different workers
In order to guarantee serial semantics, the function that is spawned (the child, or strand (3) in this example) is always executed on the same worker that is executing the strand that enters the spawn. Thus, in this case, strand (1) and strand (3) are guaranteed to run on the same worker.
If another worker is available, then strand (2) (the "continuation") may execute on a different worker. This is known as a steal: the continuation is said to have been stolen by the new worker.
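The placement rules above can be observed by recording which operating system thread runs each strand. The following is a minimal sketch, not part of the Intel® Cilk™ Plus API: the StrandLog type and the run_fragment and check_guarantees functions are hypothetical names introduced for illustration, and the do_*/func3 bodies are placeholders. It assumes the compiler defines the __cilk macro when Intel® Cilk™ Plus is enabled; on any other compiler the two keywords are defined away, which yields the serial elision of the program running on a single worker.

```cpp
#include <cassert>
#include <thread>

// Use the real keywords when Intel Cilk Plus is enabled (assumption: the
// compiler then defines __cilk). Otherwise fall back to the serial elision,
// in which both keywords expand to nothing.
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#endif

// Hypothetical record of the OS thread (worker) that executed each strand.
struct StrandLog {
    std::thread::id strand1, strand2, strand3, strand4;
};

// Placeholder strand bodies: each one only notes the thread it ran on.
static void do_init_stuff(StrandLog& log)  { log.strand1 = std::this_thread::get_id(); }
static void func3(StrandLog& log)          { log.strand3 = std::this_thread::get_id(); }
static void do_more_stuff(StrandLog& log)  { log.strand2 = std::this_thread::get_id(); }
static void do_final_stuff(StrandLog& log) { log.strand4 = std::this_thread::get_id(); }

StrandLog run_fragment() {
    StrandLog log;
    do_init_stuff(log);      // strand 1
    cilk_spawn func3(log);   // strand 3 (the child)
    do_more_stuff(log);      // strand 2 (the continuation, may be stolen)
    cilk_sync;
    do_final_stuff(log);     // strand 4
    return log;
}

// Checks only what the execution model promises; whether strand 2 runs on
// a different worker is left to the scheduler.
void check_guarantees(const StrandLog& log) {
    // The child runs on the worker that entered the spawn.
    assert(log.strand1 == log.strand3);
    // After the sync, execution continues on a worker that entered the sync.
    assert(log.strand4 == log.strand2 || log.strand4 == log.strand3);
}
```

Calling check_guarantees(run_fragment()) should pass whether or not a steal occurs. Comparing log.strand2 with log.strand1 afterwards shows whether the continuation was stolen on that particular run; under the serial elision the two are always equal.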
To illustrate these two execution options, a new diagram is helpful. The following diagram illustrates execution on a single worker:
If a second worker is scheduled, the second worker will begin executing the continuation, strand (2). The first worker will proceed to the sync at (B). In the following diagram, the second worker is indicated by showing strand (2) as a dotted line. After the sync, strand (4) may continue on either worker. In the current implementation, strand (4) will execute on the last worker that reaches the sync.
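The rule that strand (4) continues on the last worker to reach the sync can be modeled with standard C++ threads. This is a sketch of the idea only, not the runtime's implementation: it uses std::thread and std::atomic instead of Intel® Cilk™ Plus workers, and the SyncPoint type and run_sync_model function are hypothetical names introduced for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical model of the sync at (B), which two strands enter.
struct SyncPoint {
    std::atomic<int> pending{2};

    // Returns true only for the last strand to arrive. fetch_sub returns
    // the value held before the decrement, so the last arriver sees 1.
    bool arrive() { return pending.fetch_sub(1) == 1; }
};

// Runs the model once and returns how many times strand 4 executed.
int run_sync_model() {
    SyncPoint sync;
    std::atomic<int> strand4_runs{0};
    auto strand4 = [&] { strand4_runs.fetch_add(1); };

    // Second worker: stands in for the thief executing strand 2.
    std::thread thief([&] {
        // ... strand 2 work would go here ...
        if (sync.arrive()) strand4();   // last to arrive continues
    });

    // First worker: executes the child, strand 3, then reaches the sync.
    // ... strand 3 work would go here ...
    if (sync.arrive()) strand4();       // last to arrive continues

    // The join is only teardown for this model. It guarantees that both
    // arrivals have happened before the count is read.
    thief.join();
    return strand4_runs.load();
}
```

Whichever thread arrives second runs strand (4), so run_sync_model() returns 1 regardless of how the two threads interleave. A worker that arrives first does not continue past the sync in this model.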
The details of the execution model have several implications that will be described in the section on the interaction between workers and system threads, and also the section on reducers. For now, the key ideas are:
After a cilk_spawn, the child will always execute on the same worker (that is, the same system thread) as the caller.
After a cilk_spawn, the continuation may execute on a different worker. If this occurs, the continuation is said to be stolen by another worker.
After a cilk_sync, execution may proceed on any worker that executed a strand that entered the sync.
As a program executes on multiple workers, each worker tracks the pedigree of the strand it is currently executing. Intel® Cilk™ Plus provides an API that enables each worker to query the pedigree of its currently executing strand. The API also provides a call for creating a new maximal strand boundary. This call terminates the current maximal strand, and begins a new maximal strand with a different pedigree.