Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help
The LLC (last-level cache) is the last, and longest-latency, level in the memory hierarchy before main memory (DRAM). While LLC hits are serviced much more quickly than hits in DRAM, they can still incur a significant performance penalty. This metric also includes coherence penalties for shared data.
A significant proportion of cycles is being spent on data fetches that miss in the L2 but hit in the LLC. This metric includes coherence penalties for shared data. If contested accesses or data sharing are indicated as likely issues, address them first. Otherwise, consider the same performance tuning as you would apply for an LLC-missing workload - reduce the data working set size, improve data access locality, consider blocking or otherwise partitioning your working set so that it fits in the L2, or better exploit hardware prefetchers. Consider using software prefetchers, but note that they can interfere with normal loads, potentially increasing latency, and that they increase pressure on the memory system.