Building the Intel® Threading Building Blocks (TBB) Version

To build the Intel® TBB version, you will modify the sample application to use Intel® TBB and then compile the modified code. You will then run the application and compare its rendering time with the baseline performance time.

  1. Remove all of the files that were created when you built the serial version by running the following command:

    %make clean

  2. Open the source file src/build_with_tbb/build_with_tbb.cpp in your favorite code editor.

  3. Remove the comment marks for the TBB headers to declare the TBB functions that will be used in the sample application.

    #include "tbb/task_scheduler_init.h"
    #include "tbb/parallel_for.h"
    #include "tbb/blocked_range2d.h"

  4. Remove the comment marks from the draw_task class definition. This class replaces the draw_task() function with a class that defines a function object. Note the similarity between the code in the operator() member function of the draw_task class and the code in the draw_task() function. The operator() member function takes the argument const tbb::blocked_range2d<int> &r. This argument is the iteration range passed to the Intel® TBB thread executing this function object.

  5. Add comment marks to the draw_task() function since the tasks in this function are now defined in the draw_task class.

  6. Remove the comment marks from the following in the thread_trace() function:

    • The lines regarding the Intel® TBB scheduler and the number of threads. These lines allow you to set the number of threads manually with the environment variable TACHYON_NUM_THREADS.

    • The lines regarding the grain size, if you want to experiment by setting the grain size manually with the environment variable TACHYON_GRAINSIZE. The grain size is the lower limit on the number of loop iterations that can be divided among the threads.

    • The lines regarding the partitioner used. You can control this with the environment variable TACHYON_PARTITIONER: use simp for the simple_partitioner or aff for the affinity_partitioner.

    • The Intel® TBB parallel_for function. This function is where the parallelization call happens. The first argument defines the two-dimensional iteration space that the loop executes over, from starty to stopy and from startx to stopx. If you want to manually specify a grain size, pass grain_size to the tbb::blocked_range2d constructor as the third and sixth parameters. The second argument is the draw_task() function object. The last argument is the Intel® TBB auto_partitioner, which controls the loop iteration partitioning automatically.

  7. Add comment marks to the draw_task() function call since the parallel_for() function replaces the draw_task() function.

  8. Build the sample by running the following command:

    %make tbb

  9. Run the sample application.
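
Steps 4 through 7 replace a plain function call with a function object that a parallel loop applies to sub-ranges of the image. The following C++ sketch illustrates that pattern only; it is not the sample's code. Range2D, DrawTask, and for_each_tile are hypothetical stand-ins (for tbb::blocked_range2d<int>, the draw_task class, and tbb::parallel_for respectively) so that the example compiles without Intel® TBB. The stand-in loop also processes the tiles serially on a fixed grid, whereas Intel® TBB splits the range recursively and runs the pieces on worker threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for tbb::blocked_range2d<int>: a half-open
// rectangle of rows [row_begin, row_end) and columns [col_begin, col_end).
struct Range2D {
    int row_begin, row_end;
    int col_begin, col_end;
};

// Function object in the style of the draw_task class: operator() handles
// whatever sub-range it is given, so it can be called once per tile.
class DrawTask {
public:
    DrawTask(std::vector<int>& image, int width)
        : image_(image), width_(width) {}

    void operator()(const Range2D& r) const {
        for (int y = r.row_begin; y < r.row_end; ++y) {
            for (int x = r.col_begin; x < r.col_end; ++x) {
                // Placeholder for rendering one pixel.
                image_[static_cast<std::size_t>(y * width_ + x)] += 1;
            }
        }
    }

private:
    std::vector<int>& image_;
    int width_;
};

// Serial stand-in for a parallel loop (assumption: a simple fixed grid).
// Splits the range into tiles of at most grain x grain elements and applies
// the body to each tile in turn. Returns the number of tiles visited.
template <typename Body>
int for_each_tile(const Range2D& whole, int grain, const Body& body) {
    assert(grain > 0);
    int tiles = 0;
    for (int y = whole.row_begin; y < whole.row_end; y += grain) {
        for (int x = whole.col_begin; x < whole.col_end; x += grain) {
            body(Range2D{y, std::min(y + grain, whole.row_end),
                         x, std::min(x + grain, whole.col_end)});
            ++tiles;
        }
    }
    return tiles;
}
```

For example, calling for_each_tile(Range2D{0, 4, 0, 6}, 4, DrawTask(image, 6)) on an image that is 6 pixels wide and 4 pixels high visits two tiles and touches every pixel exactly once. The body does not need to know how the range was split, which is what lets a scheduler choose the tile sizes.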

Compare the time to render the image to the baseline performance time.
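
When comparing timings, it can help to repeat the run with different thread counts, grain sizes, and partitioners through the environment variables named in step 6. The sample's own parsing code is not reproduced here, so the following C++ sketch is only an assumption about how such settings might be read. The RunSettings struct, the helper names, the fallback values (0 threads meaning "let the scheduler decide", a grain size of 8), and the upper bound on accepted values are all hypothetical.

```cpp
#include <cstdlib>
#include <string>

enum class Partitioner { Auto, Simple, Affinity };

// Hypothetical container for the three tunable settings.
struct RunSettings {
    int num_threads;          // 0 = let the scheduler decide (assumed convention)
    int grain_size;           // assumed fallback of 8 when unset
    Partitioner partitioner;  // Auto unless "simp" or "aff" is given
};

// Parse a positive integer; return the fallback when the text is missing,
// not a number, not positive, or unreasonably large.
int parse_positive(const char* text, int fallback) {
    if (text == nullptr || *text == '\0') return fallback;
    char* end = nullptr;
    long value = std::strtol(text, &end, 10);
    if (*end != '\0' || value <= 0 || value > 1000000) return fallback;
    return static_cast<int>(value);
}

// Map the short names from step 6 to a partitioner choice.
Partitioner parse_partitioner(const char* text) {
    if (text == nullptr) return Partitioner::Auto;
    const std::string name(text);
    if (name == "simp") return Partitioner::Simple;
    if (name == "aff") return Partitioner::Affinity;
    return Partitioner::Auto;
}

// Read all three settings from the environment.
RunSettings read_settings() {
    return RunSettings{
        parse_positive(std::getenv("TACHYON_NUM_THREADS"), 0),
        parse_positive(std::getenv("TACHYON_GRAINSIZE"), 8),
        parse_partitioner(std::getenv("TACHYON_PARTITIONER"))};
}
```

Keeping the parsing separate from the getenv calls means invalid input falls back to a known value instead of silently changing the run being timed.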

Open the Building the Intel Cilk Plus Version topic