Intel® C++ Compiler 16.0 User and Reference Guide
This example illustrates the use of a reducer to accumulate a sum in parallel. Consider the following serial program, which repeatedly calls a compute() function and accumulates the results in the total variable.
#include <iostream>

unsigned int compute(unsigned int i)
{
    return i; // return a value computed from i
}

int main(int argc, char* argv[])
{
    unsigned long long int n = 1000000;
    unsigned long long int total = 0;

    // Compute the sum of integers 1..n
    for(unsigned int i = 1; i <= n; ++i)
    {
        total += compute(i);
    }

    // the sum of the first n integers should be n * (n+1) / 2
    unsigned long long int correct = (n * (n+1)) / 2;

    if (total == correct)
        std::cout << "Total (" << total << ") is correct" << std::endl;
    else
        std::cout << "Total (" << total << ") is WRONG, should be " << correct << std::endl;

    return 0;
}
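The check in this program relies on the closed form n * (n+1) / 2, and the width of total matters: for n = 1000000 the sum is 500000500000, which is more than a 32-bit unsigned integer can hold. The following sketch in standard C++11 verifies both points at compile time. The helper names sum_first_n and exceeds_32_bits are illustrative and do not appear in the guide, and std::uint64_t stands in for unsigned long long int.

```cpp
#include <cstdint>
#include <limits>

// Closed-form sum of the integers 1..n, that is n * (n + 1) / 2.
// One of n and n + 1 is always even, so the division is exact.
// (Hypothetical helper; valid as long as n * (n + 1) fits in 64 bits.)
constexpr std::uint64_t sum_first_n(std::uint64_t n) {
    return (n * (n + 1)) / 2;
}

// True when the sum 1..n does not fit in a 32-bit unsigned integer.
constexpr bool exceeds_32_bits(std::uint64_t n) {
    return sum_first_n(n) > std::numeric_limits<std::uint32_t>::max();
}

// Compile-time checks of the values used in the example program.
static_assert(sum_first_n(10) == 55, "1 + 2 + ... + 10 is 55");
static_assert(sum_first_n(1000000) == 500000500000ULL,
              "expected total for n = 1000000");
static_assert(exceeds_32_bits(1000000),
              "a 32-bit total would wrap around for n = 1000000");
```

If total were declared as a 32-bit unsigned int instead, the accumulated value would wrap around and the comparison against correct would fail, even in the serial program.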
Converting this program to an Intel® Cilk™ Plus program and changing the for loop to a cilk_for loop causes the iterations to run in parallel, but it also creates a data race on the total variable, because several strands can update total at the same time. To resolve the race, make total a reducer; specifically, a reducer< op_add<TYPE> >, which is defined for types that have an associative + operator. The changes are shown below.
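To see why private accumulators remove the race, the idea can be sketched in standard C++ without any Cilk Plus support. This is only an illustration of the principle, not a description of how the Intel® Cilk™ Plus runtime implements reducers: it assumes std::thread as the source of parallelism, and the function name sum_with_private_views is hypothetical. Each thread adds into its own slot, and the slots are merged in range order after all threads have joined, so only associativity of + is required.

```cpp
#include <cstdint>
#include <thread>
#include <vector>

// Stand-in for the guide's compute(): returns a value computed from i.
unsigned int compute(unsigned int i) { return i; }

// Sum compute(1) + ... + compute(n) using `workers` threads.
// No two threads ever write the same variable, so there is no data race.
std::uint64_t sum_with_private_views(unsigned int n, unsigned int workers) {
    if (workers == 0) workers = 1;                   // always use at least one thread
    std::vector<std::uint64_t> views(workers, 0);    // one private accumulator per thread
    std::vector<std::thread> pool;
    pool.reserve(workers);

    for (unsigned int w = 0; w < workers; ++w) {
        // Thread w owns the contiguous range lo..hi (empty when lo > hi).
        const std::uint64_t lo = static_cast<std::uint64_t>(n) * w / workers + 1;
        const std::uint64_t hi = static_cast<std::uint64_t>(n) * (w + 1) / workers;
        pool.emplace_back([&views, w, lo, hi] {
            std::uint64_t local = 0;                 // private to this thread
            for (std::uint64_t i = lo; i <= hi; ++i) {
                local += compute(static_cast<unsigned int>(i));
            }
            views[w] = local;                        // the only write to this slot
        });
    }
    for (std::thread& t : pool) t.join();            // all updates are finished after this

    std::uint64_t total = 0;
    for (std::uint64_t v : views) total += v;        // merge the slots in range order
    return total;
}
```

Calling sum_with_private_views(1000000, 4) should give the same total as the serial loop. Depending on the toolchain, building code that uses std::thread may require a threading option such as -pthread.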
#include <cilk/cilk.h>
#include <cilk/reducer_opadd.h>
#include <iostream>

unsigned int compute(unsigned int i)
{
    return i; // return a value computed from i
}

int main(int argc, char* argv[])
{
    unsigned long long int n = 1000000;
    cilk::reducer< cilk::op_add<unsigned long long int> > total (0);

    // Compute 1..n
    cilk_for(unsigned int i = 1; i <= n; ++i)
    {
        *total += compute(i);
    }

    // the sum of the first N integers should be n * (n+1) / 2
    unsigned long long int correct = (n * (n+1)) / 2;

    if ( total.get_value() == correct)
        std::cout << "Total (" << total.get_value() << ") is correct" << std::endl;
    else
        std::cout << "Total (" << total.get_value() << ") is WRONG, should be " << correct << std::endl;

    return 0;
}
The following changes to the serial code show how to use a reducer:
Include the appropriate reducer header file (cilk/reducer_opadd.h).
Declare the reduction variable as a reducer< op_kind<TYPE> > rather than as a TYPE.
Introduce parallelism, in this case by changing the for loop to a cilk_for loop.
In the parallel code, change references to the original variable to dereferences of the reducer variable (*total).
Retrieve the reducer's final value after all parallel strands have synchronized; in this case, after the cilk_for loop is complete (total.get_value()).
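The roles of op_kind<TYPE>, the dereferenced view, and get_value() in the steps above can be pictured with a toy model in standard C++. Everything in this sketch (toy_op_add, toy_reducer, view(), toy_sum) is hypothetical and is not the Intel® Cilk™ Plus API. The "strands" are run one after another for simplicity, so the sketch shows only the bookkeeping: each strand updates its own view, which starts at the identity value, and the final value is obtained by folding the views from left to right.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical "op kind" for addition: an identity value plus an associative
// combine step. It mirrors the idea behind op_add, not its actual definition.
template <typename T>
struct toy_op_add {
    static T identity() { return T(); }               // 0 for arithmetic types
    static void reduce(T& left, const T& right) { left += right; }
};

// Hypothetical reducer: one view per strand, each starting at the identity.
template <typename Op, typename T>
class toy_reducer {
public:
    toy_reducer(const T& initial, std::size_t strands)
        : views_(strands == 0 ? 1 : strands, Op::identity()) {
        views_[0] = initial;                           // leftmost view holds the initial value
    }

    // Counterpart of *total: the view that strand s is allowed to update.
    T& view(std::size_t s) { return views_[s]; }

    // Counterpart of total.get_value(): fold the views from left to right.
    // Call it only after every strand has finished updating its view.
    T get_value() const {
        T result = Op::identity();
        for (const T& v : views_) Op::reduce(result, v);
        return result;
    }

private:
    std::vector<T> views_;
};

// The steps from the list, applied to the sum 1..n with `strands` views.
unsigned long long toy_sum(unsigned int n, std::size_t strands) {
    if (strands == 0) strands = 1;
    toy_reducer<toy_op_add<unsigned long long>, unsigned long long> total(0, strands);
    for (std::size_t s = 0; s < strands; ++s) {
        // Strand s owns the contiguous range lo+1..hi.
        const unsigned long long lo = static_cast<unsigned long long>(n) * s / strands;
        const unsigned long long hi = static_cast<unsigned long long>(n) * (s + 1) / strands;
        for (unsigned long long i = lo + 1; i <= hi; ++i) {
            total.view(s) += i;                        // update this strand's view only
        }
    }
    return total.get_value();                          // read after all strands are done
}
```

Because the views are folded in the same left-to-right order as the ranges they cover, the result matches the serial loop for any operation that is associative, which is the requirement the guide states for the + operator.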