Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help
When a branch mispredicts, some instructions from the mispredicted path still move through the pipeline. All work performed on these instructions is wasted, since they would not have been executed had the branch been correctly predicted. This metric represents the fraction of slots the CPU has wasted due to branch misprediction. These slots are wasted either by uOps fetched from an incorrectly speculated program path, or by stalls that occur while the out-of-order part of the machine recovers its state from a speculative path.
A significant proportion of branches are mispredicted, leading to excessive wasted work or Back-End stalls because the machine needs to recover its state from a speculative path. Start by identifying heavily mispredicted branches. Account for skid: the sampled event may be attributed to an instruction a few positions after the branch that actually caused it. Consider ways to make your algorithm more predictable or to reduce the number of branches. You can move 'if' statements as high as possible in the code flow (that is, as early as possible, and containing as much as possible). If using 'switch' or 'case' statements, put the most commonly executed cases first. Avoid using virtual function pointers for heavily executed calls. Using profile-guided optimization in the compiler may help. See the Intel 64 and IA-32 Architectures Optimization Reference Manual for general strategies to address branch misprediction issues.
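As a rough illustration of reducing the number of branches, the C++ sketch below sums the elements of an array that meet a threshold in two ways. The function names, the data layout, and the threshold parameter are hypothetical and chosen only for this example. The first version uses a data-dependent 'if', which can mispredict often when the input is unordered. The second replaces the conditional with arithmetic on the comparison result, which compilers can typically lower to straight-line code. This is a minimal sketch, not a guaranteed optimization: an optimizing compiler may already emit branch-free code for the first version, so confirm any change by profiling again.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Branchy version: whether the 'if' is taken depends on each element,
// so unordered input gives the branch predictor little to work with.
std::int64_t sum_above_branchy(const std::vector<int>& data, int threshold) {
    std::int64_t sum = 0;
    for (int x : data) {
        if (x >= threshold) {
            sum += x;
        }
    }
    return sum;
}

// Branch-reduced version: the comparison evaluates to 0 or 1, and the
// multiplication selects either x or 0 without a conditional in the source.
std::int64_t sum_above_branchless(const std::vector<int>& data, int threshold) {
    std::int64_t sum = 0;
    for (int x : data) {
        sum += static_cast<std::int64_t>(x) * (x >= threshold);
    }
    return sum;
}

// Hypothetical helper: checks that both versions agree on a given input.
inline void check_equivalence(const std::vector<int>& data, int threshold) {
    assert(sum_above_branchy(data, threshold) ==
           sum_above_branchless(data, threshold));
}
```

Both functions return the same result for any input, so either can be swapped in while comparing the Branch Mispredict metric between runs. If the input can be sorted or partitioned beforehand, the branchy version also becomes far more predictable, which is an example of making the algorithm itself more predictable rather than removing the branch.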