Intel® C++ Compiler 16.0 User and Reference Guide
Divide and conquer is an effective parallelization strategy because it creates a good mix of large and small sub-problems. The work-stealing scheduler can allocate chunks of work efficiently to the cores, provided that there are neither too many very large chunks nor too many very small chunks. If the work is divided into just a few large chunks, there may not be enough parallelism to keep all the cores busy. If the chunks are too small, scheduling overhead may overwhelm the advantages of parallelism.
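As an illustration of this trade-off, the following is a minimal sketch of a divide-and-conquer sum that stops splitting once a sub-range is small enough to process serially. The function name sum_range and the cutoff value kCutoff are hypothetical and chosen only for this example; a suitable cutoff depends on the workload and should be found by measurement. The sketch also assumes that the __cilk macro is defined when Intel® Cilk™ Plus is enabled, and it stubs out the keywords otherwise so that the code builds as its serial equivalent.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Assumption: __cilk is defined when Cilk Plus support is enabled.
// Without it, the keywords expand to nothing and the code runs serially.
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#endif

// Hypothetical cutoff: ranges of this size or smaller are summed serially,
// so that no spawned task is too small to be worth scheduling.
const std::size_t kCutoff = 1024;

// Sums v[lo] .. v[hi - 1] by recursively halving the range.
long long sum_range(const std::vector<int>& v, std::size_t lo, std::size_t hi) {
    assert(lo <= hi && hi <= v.size());
    if (hi - lo <= kCutoff) {
        // Small chunk: do the work serially.
        return std::accumulate(v.begin() + lo, v.begin() + hi, 0LL);
    }
    const std::size_t mid = lo + (hi - lo) / 2;
    // The left half may run in parallel with the right half.
    long long left = cilk_spawn sum_range(v, lo, mid);
    long long right = sum_range(v, mid, hi);
    cilk_sync;  // Wait for the spawned half before combining the results.
    return left + right;
}
```

Calling sum_range(v, 0, v.size()) sums the whole vector. A larger cutoff produces fewer, larger chunks, and a smaller cutoff produces more, smaller chunks.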
Granularity can be an issue in parallel programs that use cilk_for or cilk_spawn. With cilk_for, you can control the granularity by setting the grain size of the loop. If you have nested loops, the nature of your computation determines whether you achieve the best performance by using cilk_for for the inner loops, the outer loops, or both. With cilk_spawn, avoid spawning very small chunks of work: although the overhead of a single cilk_spawn is relatively small, it can dominate the run time when each spawned task does very little work.
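Setting the grain size of a cilk_for loop might look like the following minimal sketch. The function name scale and the grain size of 256 are hypothetical and serve only as an illustration; whether an explicit grain size helps at all depends on how much work each iteration does. As an assumption, the sketch tests the __cilk macro to detect Intel® Cilk™ Plus support and falls back to an ordinary serial for loop otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: __cilk is defined when Cilk Plus support is enabled.
// Without it, cilk_for becomes an ordinary serial for loop.
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

// Multiplies every element of v by factor.
void scale(std::vector<double>& v, double factor) {
    const std::size_t n = v.size();
    // Ask the runtime to hand out at most 256 consecutive iterations per
    // chunk. The value 256 is illustrative only, not a recommendation.
#ifdef __cilk
#pragma cilk grainsize = 256
#endif
    cilk_for (std::size_t i = 0; i < n; ++i) {
        v[i] *= factor;
    }
}
```

A larger grain size reduces scheduling overhead but exposes less parallelism, and a smaller one does the opposite, so the value is best chosen by measuring the actual loop.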