To build the OpenMP* version, you will modify the sample application to use OpenMP* parallelization and then compile the modified code. You will then start the application and compare the time with the baseline performance time.
Set the build_with_openmp project as the startup project.
For project build_with_openmp, change the compiler to the Intel® C++ Compiler (Project > Intel Compiler > Use Intel C++).
For the project build_with_openmp, make sure the /Qopenmp compiler option is set (Project > Properties > Configuration Properties > C/C++ > Language > OpenMP Support = Generate Parallel Code (/Qopenmp)). This option is required to enable the OpenMP* extension in the compiler.
Do the following in the draw_task function:
Remove the comment marks from the OpenMP* #pragma omp parallel and #pragma omp for schedule(dynamic) directives to create the parallel region and distribute the for loop iterations across the team of threads. With the schedule(dynamic) clause, each thread receives a small number of loop iterations to execute and, when it finishes them, takes another small batch. The clause improves performance by balancing the load so that each thread remains busy.
Add comment marks to the line return; inside the parallel region. OpenMP* does not allow code to branch out of a parallel region.
Remove the comment marks from the line ison=0; inside the parallel region. Setting this variable to zero stops frames from being rendered.
Remove the comment marks from the line return; at the end of the function. This line replaces the return; line inside the parallel region that you commented out.
Start the sample application.
Compare the time to render the image to the baseline performance time.
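The draw_task changes described in the steps above can be illustrated with a minimal, self-contained sketch. It is not the sample's actual source: render_row, draw_rows, and the stop_row parameter are hypothetical stand-ins, and only the two directives, the ison flag, and the placement of the return follow the steps. When compiled without OpenMP* support, the pragmas are ignored and the loop runs serially.

```cpp
#include <vector>

// Hypothetical stand-in for the per-row rendering work done by the sample.
static int render_row(int y, int width) {
    int sum = 0;
    for (int x = 0; x < width; ++x) sum += (x ^ y) & 0xFF;
    return sum;
}

// Sketch of the pattern: a parallel region whose loop iterations are handed
// out dynamically. stop_row is a hypothetical way to simulate a "stop
// rendering" request; pass -1 to render every row.
bool draw_rows(std::vector<int>& rows, int width, int stop_row) {
    int ison = 1;  // shared flag: 0 means "stop rendering further rows"
    const int height = static_cast<int>(rows.size());

    #pragma omp parallel
    {
        #pragma omp for schedule(dynamic)
        for (int y = 0; y < height; ++y) {
            int keep_going;
            #pragma omp atomic read
            keep_going = ison;
            if (!keep_going) continue;  // skipping work is allowed; leaving the region is not

            rows[y] = render_row(y, width);

            if (y == stop_row) {
                // return;  // not allowed: would branch out of the parallel region
                #pragma omp atomic write
                ison = 0;
            }
        }
    }

    // The return now sits after the parallel region has ended.
    return ison != 0;
}
```

Calling draw_rows(rows, 8, -1) fills every entry of rows and returns true. With a valid stop_row the function returns false, and which of the remaining rows were rendered depends on how the threads were scheduled.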
If you wish to explicitly set the number of threads, you can set the environment variable OMP_NUM_THREADS=N, where N is the number of threads. Alternatively, you can use the function void omp_set_num_threads(int nthreads), which is declared in omp.h for C/C++ code. Make sure to call this function before execution reaches the parallel region.
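As a rough sketch of the second approach, the function below requests a team size and then reports how many threads the next parallel region actually received. The name team_size_with is hypothetical and not part of the sample. The _OPENMP guards are an assumption made so that the code also builds when OpenMP* support is turned off, in which case it reports a single thread.

```cpp
#ifdef _OPENMP
#include <omp.h>  // C/C++ header that declares omp_set_num_threads
#endif

// Requests `requested` threads and returns the team size observed inside
// the parallel region (1 when compiled without OpenMP support).
int team_size_with(int requested) {
    int observed = 1;

#ifdef _OPENMP
    // Must be called before the parallel region it is meant to affect.
    omp_set_num_threads(requested);
#else
    (void)requested;  // unused in a serial build
#endif

    #pragma omp parallel
    {
        // Only one thread of the team records the team size.
        #pragma omp single
        {
#ifdef _OPENMP
            observed = omp_get_num_threads();
#endif
        }
    }
    return observed;
}
```

The runtime may provide fewer threads than requested, so treat the returned value as the actual team size rather than assuming it equals the request.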
Options that use OpenMP* are available for both Intel and non-Intel microprocessors, but these options may perform additional optimizations on Intel® microprocessors than they perform on non-Intel microprocessors. The list of major, user-visible OpenMP* constructs and features that may perform differently on Intel versus non-Intel microprocessors includes:
Internal and user visible locks
The SINGLE construct
Explicit and implicit barriers
Parallel loop scheduling
Reductions
Memory allocation
Thread affinity and binding