Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help
Cycles per Instruction Retired (CPI) is a fundamental performance metric indicating the average amount of time each instruction took to execute. It is measured in cycles and calculated as an average across hardware threads. This view displays CPI per hardware thread, but CPI per core is also a useful metric and can be calculated as the CPI per thread divided by the number of hardware threads used per core. The theoretical best CPI per hardware thread is 2.0. A CPI over 4.0 in a hot function or hot loop may warrant further investigation. High CPI values usually indicate latency in the system that could be reduced.
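The arithmetic behind these definitions can be sketched as follows. This is only an illustration, not how the tool computes the metric internally: the function names and sample counter values are hypothetical, and the 4.0 threshold is the per-thread guideline quoted above.

```python
def cpi(cycles: int, instructions_retired: int) -> float:
    """Average cycles per retired instruction for one hardware thread."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return cycles / instructions_retired


def cpi_per_core(cpi_per_thread: float, threads_per_core: int) -> float:
    """CPI per core = CPI per thread / hardware threads used per core."""
    if threads_per_core <= 0:
        raise ValueError("threads_per_core must be positive")
    return cpi_per_thread / threads_per_core


def warrants_investigation(cpi_per_thread: float, threshold: float = 4.0) -> bool:
    """True when the per-thread CPI exceeds the guideline threshold."""
    return cpi_per_thread > threshold


# Hypothetical counter values for a hot loop on one hardware thread.
thread_cpi = cpi(cycles=9_000_000, instructions_retired=2_000_000)
print(thread_cpi)                          # CPI per hardware thread
print(cpi_per_core(thread_cpi, 4))         # CPI per core with 4 threads in use
print(warrants_investigation(thread_cpi))  # compare against the 4.0 guideline
```

With these sample values the per-thread CPI is above the 4.0 guideline, so the loop would be a candidate for closer inspection with the other hardware-related metrics.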
The CPI may be too high. This could be caused by issues such as memory stalls, instruction starvation, branch misprediction, or long-latency instructions. Explore the other hardware-related metrics to identify what is causing the high CPI.