Intel® C++ Compiler 16.0 User and Reference Guide

How Reducers Work

An explanation of the mechanisms and semantics of reducers should help the more advanced programmer understand the rules governing the use of reducers as well as provide the background needed to write custom reducers.

Reducers are designed to support the parallel execution of a "reduction" algorithm:

  1. There is an accumulator variable x with an initial value a0 .

  2. There is a reduction operation OP.

  3. OP is an associative operation, i.e., x OP y = y OP x.

  4. OP has an identity value I, i.e., x OP I = I OP x = x.

  5. The code repeatedly updates the variable, with each update having the form x = x OP ai . After N updates, x contains a0 OP a1 OP a2 OP … OP aN .

Common operations that fit this pattern include addition and multiplication, bitwise AND and OR, set union, list append, and string concatenation. (In mathematics, the combination of a type, an associative operation on that type, and an identity for that operation is called a monoid.)

A reducer object automatically manages multiple copies of an accumulator variable, called the "view," as parallel strands of execution are spawned and synced. Each strand executing in parallel gets its own view, so there cannot be data races between modifications of the accumulator variable in different strands:

  1. Initially, a reducer contains a single view, called the "leftmost" view.
  2. The reducer creates a new instance of the view for each strand that executes in parallel with the leftmost strand. The new view instance is initialized to the reduction operation's identity value by the reducer's identity() function.

  3. All accesses to the reducer's view in a strand apply to the view instance that the reducer created for that strand.

  4. When a spawned strand finishes executing and merges back into its parent, the reducer's reduce() function combines the values of the two strands' views with the reduction operation, leaving the result in the surviving strand's view instance. The reducer then destroys the terminated strand's view.

  5. When all spawned strands have finished executing, the final result of the computation remains in the leftmost view.

As a result of this process, each strand computes a subsequence of the total sequence of operations (I OP ai OP ai+1 OP … OP ai+m = ai OP ai+1 OP … OP ai+m ) in its view, and when the computation is finished, the leftmost view contains the expected result (a0 OP a1 OP a2 OP … OP aN ). The order of the values is the same as in the serial computation, and since the operation is associative, it doesn't matter what order the subsequence totals are combined in.

Lazy View Creation

The reason for creating private view instances for strands is to eliminate data races. A strand needs its own instance of a view only if it accesses the view and also executes in parallel with some other strand. Creating, managing, and merging views is not expensive, but it isn't free, so the scheduler avoids creating views unnecessarily. The rule is that:

  1. A reducer does not create a new view instance for a strand unless the strand has been stolen. If the strand is not stolen, it can safely use the same view as its child.
  2. The reducer does not create a new view instance until the first time the view is accessed in a stolen strand. If the view is never accessed in the strand, a new instance is not created.

Example: Using reducer<op_list_append<>>

This example is a simple program that uses reducer<op_list_append<>> to accumulate a list of strings in parallel. For list appends, the identity value is the empty list, and the reduction function concatenates the left and right lists.

 1 #include <cilk/cilk.h>
 2 #include <cilk/reducer_list.h>
 3
 4 cilk::reducer< cilk::op_list_append<std::string> > lr();
 5
 6 void depart()
 7 {
 8   lr->push_back("leave");
 9 }
10
11 int main()
12 {
13   lr->push_back("Don’t ");
14   cilk_spawn depart();
15   lr->push_back(" the path!");
16   cilk_sync;
17   std::list<std::string> result = list.get_value();
18 }

First, consider the serial case when the execution occurs on a single processor.

  1. The leftmost view is initialized to an empty list when the reducer is created.
  2. The string "Don't " is appended to the list at line 13.

  3. At the spawn, execution continues in the depart() strand, and "leave" is appended to the list at line 8.

  4. With only one worker, there is no steal, so a new view is not created for the continuation strand, and " the path!" is appended to the list at line 15.

  5. Since there is only one view, there is nothing to do at the sync, and the final content of the leftmost view is {"Don't ", "leave", " the path!"}.

Now consider what happens when the program is run with two workers.

  1. The leftmost view is initialized to an empty list when the reducer is created.
  2. The string "Don't " is appended to the list at line 13.

  3. At the spawn, execution continues in the depart() strand, and "leave" is appended to the list at line 8.

  4. Meanwhile, the second worker steals the continuation strand, so when the view is accessed at line 15, the reducer creates a new view and initializes it to the empty list.

  5. " the path!" is appended to the list in the private view of the continuation strand at line 15.

  6. At the sync, the content of the right view, {"the path!"} , is appended to the content of the left view, {"Don't", "leave" }, so the content of the left most view becomes {"Don't ", "leave", " the path!" }, and the right view is deleted.