Intel® C++ Compiler 16.0 User and Reference Guide
Parallel strands are not able to run in parallel if they concurrently attempt to access a shared lock. In some programs, locks can eliminate virtually all of the performance benefit of parallelism. In extreme cases, such programs can even run significantly slower than the corresponding single-processor serial program. Consider using a reducer if possible.
If you must use locks, consider observing these guidelines:
Hold a synchronization object (lock) for as short a time as possible. Acquire the lock, update the data, and release the lock. Do not perform extraneous operations while holding the lock. If the application must hold a synchronization object for a long time, then reconsider whether it is a good candidate for parallelization. This guideline also helps to ensure that the acquiring strand always releases the lock.
Release a lock at the same scope level as it was acquired. Separating the acquisition and release of a synchronization object obfuscates the duration that the object is being held and can lead to failure to release a synchronization object and deadlocks. This guideline also ensures that the acquiring strand also releases the lock.
Do not hold a lock across a cilk_spawn or cilk_sync boundary. This includes across a cilk_for loop. See the following section for more explanation.
Avoid deadlocks by ensuring that a lock sequence is always acquired in the same order. Releasing the locks in the opposite order is not necessary but can improve performance.