Programming Tradeoffs in Floating-point Applications

In general, the programming objectives for floating-point applications fall into the following categories:

Accuracy: The application produces results that are close to the correct result.
Reproducibility and portability: The application produces consistent results across different runs, different sets of build options, different compilers, different platforms, and different architectures.
Performance: The application runs as fast and as efficiently as possible.

Based on the goal of an application, you will need to make tradeoffs among these objectives. For example, if you are developing a 3D graphics engine, then performance may be the most important factor to consider, and reproducibility and accuracy may be your secondary concerns.

The Intel® Fortran Compiler provides several compiler options that let you tune your application for specific objectives. Broadly speaking, there are the floating-point specific options, such as the -fp-model (Linux* and OS X*) or /fp (Windows*) option, and the fast-but-low-accuracy options, such as the -fimf-max-error (Linux* and OS X*) or /Qimf-max-error (Windows*) option. The compiler optimizes and generates code differently depending on which of these options you specify, and some of them influence the choice of math routines that are invoked. Select compiler options by carefully balancing your programming objectives and making tradeoffs among them.

Many routines in the libirc, libm, and svml libraries are more highly optimized for Intel microprocessors than for non-Intel microprocessors.

Using Floating-point Options

Take the following code as an example:

Example
REAL(4) :: t0, t1, t2
...
t0 = t1 + t2 + 4.0 + 0.1

If you specify the -fp-model extended (Linux* and OS X*) or /fp:extended (Windows*) option in favor of accuracy, the compiler generates the following assembly code:

fld       DWORD PTR _t1 
fadd      DWORD PTR _t2 
fadd      DWORD PTR _Cnst4.0 
fadd      DWORD PTR _Cnst0.1 
fstp      DWORD PTR _t0

The above code maximizes accuracy because it uses the highest mantissa precision available on the target platform. However, it might suffer in performance because of the overhead of managing the x87 stack, and it might yield results that cannot be reproduced on platforms that lack an equivalent extended precision type.

If you specify the -fp-model source (Linux* and OS X*) or /fp:source (Windows*) option in favor of reproducibility and portability, the compiler generates the following assembly code:

movss     xmm0, DWORD PTR _t1 
addss     xmm0, DWORD PTR _t2 
addss     xmm0, DWORD PTR _Cnst4.0 
addss     xmm0, DWORD PTR _Cnst0.1 
movss     DWORD PTR _t0, xmm0

The above code maximizes portability by preserving the original order of the computation and by using the well-defined IEEE single-precision type for all computations. It is not as accurate as the previous implementation, because the intermediate rounding error is greater than with extended precision. Nor is it the highest-performance implementation, because it does not take advantage of the opportunity to pre-compute 4.0 + 0.1.
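The effect of intermediate precision described above can be sketched in Fortran. This is a minimal illustration, not compiler output: it uses double precision as a stand-in for extended precision, and the program name, variable names, and input values are hypothetical. It assumes the compiler evaluates each expression as written, for example under -fp-model source or /fp:source.

```fortran
program intermediate_precision_demo
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
  real(real32) :: t1, t2, narrow, widened
  real(real64) :: wide_sum

  ! Hypothetical inputs, chosen so that intermediate rounding is visible.
  t1 = 16777216.0_real32   ! 2**24
  t2 = 1.0_real32

  ! Every intermediate sum is rounded to single precision.
  narrow = t1 + t2 + 4.0_real32 + 0.1_real32

  ! Intermediate sums are kept in double precision (assumed stand-in for
  ! extended precision); only the final conversion rounds to single.
  wide_sum = real(t1, real64) + real(t2, real64) + 4.0_real64 &
             + real(0.1_real32, real64)
  widened = real(wide_sum, real32)

  print *, 'single-precision intermediates:', narrow
  print *, 'wider intermediates           :', widened
end program intermediate_precision_demo
```

With strict IEEE arithmetic, the two printed values would be expected to differ by one unit in the last place of the single-precision result, because t1 + t2 is not representable in single precision and is rounded before the remaining terms are added.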

If you specify the -fp-model fast (Linux* and OS X*) or /fp:fast (Windows*) option in favor of performance, the compiler generates the following assembly code:

movss     xmm0, DWORD PTR _Cnst4.1 
addss     xmm0, DWORD PTR _t1 
addss     xmm0, DWORD PTR _t2 
movss     DWORD PTR _t0, xmm0

The above code maximizes performance by using Intel® SSE instructions and pre-computing 4.0 + 0.1. It is not as accurate as the first implementation, again due to greater intermediate rounding error. Unlike the second implementation, it does not provide reproducible results, because it must reorder the additions in order to pre-compute 4.0 + 0.1, and you cannot expect all compilers, on all platforms, at all optimization levels, to reorder the additions in the same way.
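Why reordering affects reproducibility can be seen by writing both evaluation orders out explicitly. The sketch below is only an illustration under stated assumptions: the program name, variable names, and input values are hypothetical, and the parentheses are honored only if the compiler is not allowed to reassociate, for example under -fp-model source or /fp:source.

```fortran
program reorder_demo
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  real(real32) :: t1, t2, in_order, regrouped

  ! Hypothetical inputs: large values of opposite sign that cancel.
  t1 =  1.0e7_real32
  t2 = -1.0e7_real32

  ! Source order, as in the second assembly listing.
  in_order = ((t1 + t2) + 4.0_real32) + 0.1_real32

  ! Order of the third listing: the constants are combined first.
  regrouped = ((4.0_real32 + 0.1_real32) + t1) + t2

  print *, 'source order:', in_order
  print *, 'regrouped   :', regrouped
  print *, 'difference  :', in_order - regrouped
end program reorder_demo
```

Under strict IEEE single-precision evaluation, the source order would be expected to give approximately 4.1, while the regrouped order gives 4.0: the fractional part of 4.1 is lost when it is added to 1.0e7, where adjacent single-precision values are 1.0 apart.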

For many other applications, the considerations may be more complicated.

Using Fast-but-low-accuracy Options

The fast-but-low-accuracy options provide an easy way to control the accuracy of mathematical functions and to exploit the performance/accuracy tradeoffs offered by the Intel® Math Libraries. From the command line, you can specify accuracy for all math functions or for a selected set of math functions, with finer control than a simple choice of low, medium, or high.

You specify the accuracy requirements as a set of function attributes that the compiler uses to select an appropriate function implementation in the math libraries. Examples using the max-error attribute are presented here. For example, use the following option:

-fimf-max-error=2

to specify a maximum relative error of two ulps (units in the last place) for all single, double, long double, and quad precision functions.
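To make the ulp unit concrete, the following sketch estimates the error of a single-precision sin result in ulps. It is a rough illustration only: the program name, variable names, and test argument are hypothetical, and the double-precision sin is assumed to be accurate enough to serve as the reference value.

```fortran
program ulp_error_demo
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
  real(real32) :: x, approx
  real(real64) :: reference, err_ulps

  x = 1.0_real32                      ! hypothetical test argument
  approx = sin(x)                     ! single-precision result under test
  reference = sin(real(x, real64))    ! double-precision value used as reference

  ! spacing(approx) is the distance between approx and the adjacent
  ! representable single-precision number, i.e. one ulp at that magnitude.
  err_ulps = abs(real(approx, real64) - reference) &
             / real(spacing(approx), real64)

  print *, 'sin(x)        =', approx
  print *, 'error in ulps =', err_ulps
end program ulp_error_demo
```

Running such a check over many arguments, with and without a max-error setting, is one way to observe the tradeoff the option controls. A single argument says little about the worst-case error of a function.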

To specify twelve bits of accuracy for a sin function, use:

-fimf-accuracy-bits=12:sin

To specify relative error of ten ulps for a sin function, and four ulps for other math functions called in the source file you are compiling, use:

-fimf-max-error=10:sin -fimf-max-error=4

The Intel® Fortran Compiler defines the default value for the max-error attribute depending on the /fp and /Qfast-transcendentals option settings. In /fp:fast mode, or if fast but less accurate math functions are explicitly enabled by /Qfast-transcendentals, the compiler sets max-error=4.0 for the call. Otherwise, it sets max-error=0.6.

