Intel® Fortran Compiler 16.0 User and Reference Guide
Interprocedural Optimization (IPO) is an automatic, multi-step process that allows the compiler to analyze your code to determine where you can benefit from specific optimizations.
The compiler may apply the following optimizations:
address-taken analysis
array dimension padding
alias analysis
automatic array transposition
automatic memory pool formation
common block variable coalescing
common block splitting
constant propagation
dead call deletion
dead formal argument elimination
dead function elimination
formal parameter alignment analysis
forward substitution
indirect call conversion
inlining
mod/ref analysis
partial dead call elimination
passing arguments in registers to optimize calls and register usage
routine key-attribute propagation
specialization
stack frame alignment
structure splitting and field reordering
symbol table data promotion
unreferenced variable removal
whole program analysis
IPO supports two compilation models: single-file compilation and multi-file compilation.
Single-file compilation uses the [Q]ip compiler option and results in one real object file for each source file being compiled. During single-file compilation, the compiler performs inline function expansion for calls to procedures defined within the current source file.
The compiler performs some single-file interprocedural optimization at the default O2 optimization level. At the O1 optimization level, the compiler may also perform some inlining, such as inlining functions marked with inlining directives.
Multi-file compilation uses the [Q]ipo option, and results in one or more mock object files rather than normal object files. (See the Compilation section below for information about mock object files.) Additionally, the compiler collects information from the individual source files that make up the program. Using this information, the compiler performs optimizations across functions and procedures in different source files.
Inlining and other optimizations are improved by profile information. For a description of how to use IPO with profile information for further optimization, see Profile an Application.
OS X*: Intel®-based systems running OS X do not support a multiple object compilation model.
Compilation
As each source file is compiled with IPO, the compiler stores an intermediate representation (IR) of the source code in a mock object file. The mock object files contain the IR instead of the normal object code. Mock object files can be ten or more times larger than normal object files.
During the IPO compilation phase, only the mock object files are visible.
When you link with the [Q]ipo compiler option, the compiler is invoked a final time and performs IPO across all mock object files. The mock objects must be linked with the Intel compiler or with the Intel linking tools. While linking with IPO, the Intel compiler and linking tools compile the mock object files and then invoke the linker for real object files provided on the user's platform.
Link-time optimization using the -ffat-lto-objects compiler option is provided for GCC compatibility. During IPO compilation, you can specify the -ffat-lto-objects option to have the compiler generate a fat link-time optimization (LTO) object that contains both a real object and a discardable intermediate language section. This enables both LTO linking and normal linking.
You can specify the -fno-fat-lto-objects option to have the compiler generate an LTO object that contains only a discardable intermediate language section; no real object is generated. These files are inserted into archives in the form in which they were created. Using this option may improve compilation time and save space for objects.
If you use ld rather than xild to link objects, or ar rather than xiar to create an archive, the real object generated with -ffat-lto-objects ensures that linking or building the archive still succeeds. However, cross-file optimizations are lost in this case. The extra real object also takes additional space and additional compile time to generate, so -fno-fat-lto-objects is an advantage provided that you link the IPO mock object files with xild and archive them with xiar.
The compiler supports a large number of IPO optimizations that can be applied, or whose effectiveness is greatly increased, when the whole program condition is satisfied.
During the analysis process, the compiler reads all intermediate representation (IR) in the mock object files, object files, and library files to determine whether all references are resolved and whether a given symbol is defined in a mock object file. Symbols for both data and functions that are included in the IR in a mock object file are candidates for manipulation based on the results of whole program analysis.
There are two types of whole program analysis: the object reader method and the table method. Most optimizations can be applied if either type of whole program analysis determines that the whole program condition exists; however, some optimizations require the results of the object reader method, and some require the results of the table method.
Object reader method
In the object reader method, the object reader emulates the behavior of the native linker and attempts to resolve the symbols in the application. If all symbols are resolved, the whole program condition is satisfied. This type of whole program analysis is more likely to detect the whole program condition.
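The resolution check described above can be pictured with a small Python sketch. It is only a conceptual model, not the compiler's actual object reader: the ObjectFile class, the function names, and the file and symbol names are all hypothetical, and a real linker applies many more rules (archive search order, weak symbols, and so on).

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Hypothetical summary of one input file: the symbols it defines and uses."""
    name: str
    defined: set = field(default_factory=set)
    referenced: set = field(default_factory=set)


def unresolved_symbols(files):
    """Return every referenced symbol that no input file defines."""
    all_defined = set()
    for f in files:
        all_defined |= f.defined
    unresolved = set()
    for f in files:
        unresolved |= f.referenced - all_defined
    return unresolved


def whole_program_condition(files):
    """In this model, the condition holds only when nothing is left unresolved."""
    return not unresolved_symbols(files)


# Hypothetical inputs: two objects and one library.
main_obj = ObjectFile("main.o", defined={"main"}, referenced={"solve", "print_result"})
solver_obj = ObjectFile("solver.o", defined={"solve"}, referenced={"dot"})
math_lib = ObjectFile("libmath.a", defined={"dot"})

# print_result is referenced but never defined, so the condition is not met.
print(unresolved_symbols([main_obj, solver_obj, math_lib]))
```

In this toy model, adding an input that defines print_result would leave no unresolved symbols, and whole_program_condition would then return True.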
Table method
In the table method, the compiler analyzes the mock object files and generates a call-graph for the application.
The compiler contains detailed tables about all of the functions in all important language-specific libraries, such as the Fortran runtime libraries. The compiler compares these function tables with the application call-graph. For each unresolved function in the call-graph, the compiler attempts to resolve the call by finding an entry for that function in the compiler tables. If the compiler can resolve all of the function calls, the whole program condition exists.
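As a rough illustration of the table method, the following Python sketch compares a call-graph against a table of known library functions. It is a simplified model under stated assumptions, not the compiler's implementation: the table_method function, the call-graph layout (caller mapped to a set of callees), and the rtl_* names are all hypothetical.

```python
def table_method(call_graph, library_table):
    """Check whether every call in the graph is defined or listed in the table.

    call_graph maps each function defined in the application to the set of
    functions it calls. library_table is the set of known library functions.
    Returns (condition_holds, functions_still_unresolved).
    """
    defined = set(call_graph)
    called = set().union(*call_graph.values())
    # Calls to functions the application itself does not define.
    unresolved = called - defined
    # Of those, the ones that have no entry in the library table.
    not_in_table = unresolved - set(library_table)
    return not not_in_table, not_in_table


# Hypothetical application call-graph and runtime library table.
app_call_graph = {
    "main": {"solve", "rtl_write"},
    "solve": {"rtl_sqrt"},
}
runtime_table = {"rtl_write", "rtl_sqrt", "rtl_open"}

ok, missing = table_method(app_call_graph, runtime_table)
print(ok, missing)
```

Here every call that the application does not define has a table entry, so the model reports that the whole program condition exists. A call to a function that is neither defined in the application nor listed in the table would be returned in the second element instead.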