Intel® Fortran Compiler 16.0 User and Reference Guide

OpenMP* Advanced Issues

This topic describes how to use the OpenMP* library functions and environment variables, and offers some guidelines for enhancing performance with OpenMP*.

OpenMP* provides specific function calls and environment variables. See the following topics to refresh your memory about the primary functions and environment variables used in this topic:

To use the function calls, include the omp_lib.h header file, or specify use omp_lib to use the Fortran 90 module file. These files are installed in the INCLUDE directory during compiler installation. Compile the application with the [Q]openmp option.

The following example, which demonstrates how to use the OpenMP* functions to print the alphabet, also illustrates several important concepts:

  1. When you use functions instead of directives, your code must be rewritten; rewrites can mean extra debugging, testing, and maintenance effort.
  2. The code becomes difficult to compile without OpenMP* support.
  3. It is very easy to introduce simple bugs, as in the loop below, which fails to print all the letters of the alphabet when 26 is not a multiple of the number of threads.
  4. You lose the ability to adjust loop scheduling without creating your own work-queue algorithm, which is a lot of extra effort. You are limited to your own scheduling, which is most likely static scheduling, as shown in the example.

Example

include "omp_lib.h" 
integer i 
integer LettersPerThread, ThisThreadNum, StartLetter, EndLetter 

call omp_set_num_threads(4) 
 	!$OMP PARALLEL PRIVATE(i) 

		! OMP_NUM_THREADS is not a multiple of 26, 
		! which can be considered a bug in this code. 
		LettersPerThread = 26 / omp_get_num_threads() 
		ThisThreadNum = omp_get_thread_num() 
		StartLetter = 'a'+ThisThreadNum*LettersPerThread 
		EndLetter = 'a'+ThisThreadNum*LettersPerThread+LettersPerThread 

		DO i = StartLetter, EndLetter - 1 
	 		write( *,FMT='(A)',ADVANCE='NO') char(i) 
		END DO 

	 !$OMP END PARALLEL 
write(*,*) 
end
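For contrast, a minimal sketch of the same task written so that no letters are lost for any thread count. Rather than partitioning the work by hand, each iteration writes its own element of a buffer (the program name and buffer variable are illustrative, not from the original example):

```fortran
program alphabet_buf
  use omp_lib
  implicit none
  integer :: i
  character(len=26) :: buf

  call omp_set_num_threads(4)

  ! The run-time library divides the 26 iterations among the threads,
  ! so nothing is lost when 26 is not a multiple of the thread count.
  ! Each iteration writes a distinct element of buf, so there is no race.
  !$OMP PARALLEL DO
  DO i = 1, 26
    buf(i:i) = char(ichar('a') + i - 1)
  END DO
  !$OMP END PARALLEL DO

  write(*,'(A)') buf
end program alphabet_buf
```

Because the run-time performs the partitioning, changing the schedule is a one-clause edit instead of a rewrite of the index arithmetic.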

Debugging threaded applications is a complex process because debuggers change the run-time performance, which can mask race conditions. Even print statements can mask issues, because they use synchronization and operating system functions. OpenMP* itself also adds some complications, because it introduces additional structure by distinguishing private variables and shared variables, and inserts additional code. A debugger that supports OpenMP* can help you to examine variables and step through threaded code. You can use Intel® Inspector to detect many hard-to-find threading errors analytically. Sometimes, a process of elimination can help identify problems without resorting to sophisticated debugging tools.

Remember that most mistakes are race conditions, and most race conditions are caused by shared variables that really should have been declared private. Start by looking at the variables inside the parallel regions and make sure that the variables are declared private when necessary. Next, check functions called within parallel constructs.
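As an illustrative sketch of this class of bug (the variable names are hypothetical), a scalar temporary such as t below is shared by default, so one thread can overwrite it between another thread's assignment and use; declaring it PRIVATE gives each thread its own copy:

```fortran
program race_fix
  implicit none
  integer :: i
  real :: t
  real :: a(1000), b(1000)

  a = 1.0

  ! Without PRIVATE(t), t is shared among all threads: a classic race.
  ! With PRIVATE(t), each thread works on its own copy of t.
  !$OMP PARALLEL DO PRIVATE(t)
  DO i = 1, 1000
    t = 2.0 * a(i)      ! each thread needs its own t
    b(i) = t + 1.0
  END DO
  !$OMP END PARALLEL DO

  write(*,*) b(1)
end program race_fix
```

With the PRIVATE clause in place, every element of b is computed deterministically; without it, results can vary from run to run.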

The DEFAULT(NONE) clause, shown below, can be used to help find those hard-to-spot variables. If you specify DEFAULT(NONE), then every variable must be declared with a data-sharing attribute clause.

Example

!$OMP PARALLEL DO DEFAULT(NONE) PRIVATE(x,y) SHARED(a,b)

Another common mistake is using uninitialized variables. Remember that private variables have undefined initial values upon entering a parallel construct. Use the FIRSTPRIVATE clause to initialize each thread's private copy from the variable's value before the construct, and the LASTPRIVATE clause to copy the value from the sequentially last iteration back to the original variable after the construct. Use these clauses only when necessary, because they add extra overhead.
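A small sketch of both clauses (the variable names are illustrative): base is copied into every thread's private copy on entry, and last_i receives the value from the sequentially last iteration on exit:

```fortran
program fp_lp
  implicit none
  integer :: i, base, last_i
  integer :: v(10)

  base = 100
  ! FIRSTPRIVATE(base): each thread's private base starts at 100.
  ! LASTPRIVATE(last_i): after the loop, last_i holds the value from
  ! the sequentially last iteration (i = 10), regardless of which
  ! thread executed that iteration.
  !$OMP PARALLEL DO FIRSTPRIVATE(base) LASTPRIVATE(last_i)
  DO i = 1, 10
    v(i) = base + i
    last_i = i
  END DO
  !$OMP END PARALLEL DO

  write(*,*) v(10), last_i
end program fp_lp
</imports-omitted>
```

Without FIRSTPRIVATE, each thread's base would be undefined inside the loop; without LASTPRIVATE, last_i would be undefined after it.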

If you still cannot find the bug, consider reducing the scope of the search. Try a binary hunt. Another method is to force large chunks of a parallel region to be critical sections: pick a region of the code that you think contains the bug and place it within a critical section. Try to find the section of code that works when it is within a critical section and fails when it is not. Then look at the variables in that section and see whether the bug is apparent. If that still does not work, force the entire program to run serially by setting the compiler-specific environment variable KMP_LIBRARY=serial.

If the code is still not working, and you are not using any OpenMP* API function calls, compile it without the [Q]openmp option to make sure the serial version works. If you are using OpenMP* API function calls, use the [Q]openmp-stubs option.
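The debugging builds above can be sketched as command lines. These assume a Linux* shell and a hypothetical source file named app.f90; on Windows* the options are spelled /Qopenmp and /Qopenmp-stubs:

```shell
# Threaded build:
ifort -qopenmp app.f90 -o app_mt

# Serial check when NO OpenMP* API calls are used: drop the option.
ifort app.f90 -o app_serial

# Serial check when OpenMP* API calls ARE used: link the stub library.
ifort -qopenmp-stubs app.f90 -o app_stubs

# Or force the threaded build to run serially via the run-time library:
KMP_LIBRARY=serial ./app_mt
```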

Performance

OpenMP* threaded application performance depends largely on the factors discussed below.

Performance always begins with a properly constructed parallel algorithm or application. For example, parallelizing a bubble-sort, even one written in hand-optimized assembly language, is not a good place to start. Keep scalability in mind; a program that runs well only on two CPUs is not as useful as one that runs well on n CPUs. With OpenMP*, the number of threads is chosen at run time by the run-time library and environment rather than fixed in the source, so programs that work well regardless of the number of threads are highly desirable. Producer/consumer architectures are rarely efficient, because they are designed specifically for two threads.

Once the algorithm is in place, make sure that the code runs efficiently on the targeted Intel® architecture; a single-threaded version can be a big help. To generate a single-threaded version, turn off the [Q]openmp option or build with the [Q]openmp-stubs option, then run that version through the usual set of optimizations.

Once you have tuned the single-threaded performance, it is time to generate the multi-threaded version and start doing some analysis.

Optimization is really a combination of patience, experimentation, and practice. Make small test programs that mimic the way your application uses the computer's resources to get a feel for which approaches are faster than others. Be sure to try the different scheduling clauses for the parallel sections of code. If the overhead of a parallel region is large compared to its compute time, you may want to use an IF clause to execute the region serially.
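For example, a minimal sketch of the IF clause (the array size and threshold are arbitrary illustrations): the region below executes serially because n is too small to amortize the fork/join overhead, but would run in parallel for larger n:

```fortran
program if_clause
  implicit none
  integer, parameter :: n = 50
  integer :: i
  real :: a(n)

  ! The IF clause makes the parallelism conditional: with n = 50 the
  ! loop runs serially, avoiding thread start-up overhead; with a
  ! larger n the same region would run in parallel. The threshold
  ! value 1000 is purely illustrative.
  !$OMP PARALLEL DO IF(n > 1000)
  DO i = 1, n
    a(i) = sqrt(real(i))
  END DO
  !$OMP END PARALLEL DO

  write(*,*) a(n)
end program if_clause
```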

Interoperability with OpenMP* in C/C++

The Intel® Fortran Compiler does not support Fortran calling C/C++ or C/C++ calling Fortran when both caller and callee are using OpenMP* constructs.

Fortran code that uses OpenMP* constructs can call C/C++ as long as the C/C++ code does not use OpenMP* constructs.

See Also