Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help
This section provides a reference for the hardware events that can be monitored for the CPU(s):
The following performance-monitoring events are supported:
Any uop executed by the Divider. (This includes all divide uops, sqrt, ...)
Note that a whole rep string only counts AVX_INST.ALL once.
Counts the total number of times the front end is resteered, mainly when the Branch Prediction Unit (BPU) cannot provide a correct prediction and this is corrected by other branch-handling mechanisms at the front end.
Speculative and retired branches
Speculative and retired macro-conditional branches
Speculative and retired macro-unconditional branches excluding calls and indirects
Speculative and retired direct near calls
Speculative and retired indirect branches excluding calls and returns
Speculative and retired indirect return branches.
Not taken macro-conditional branches
Taken speculative and retired macro-conditional branches
Taken speculative and retired macro-conditional branch instructions excluding calls and indirects
Taken speculative and retired direct near calls
Taken speculative and retired indirect branches excluding calls and returns
Taken speculative and retired indirect calls
Taken speculative and retired indirect branches with return mnemonic
All (macro) branch instructions retired.
All (macro) branch instructions retired.
Conditional branch instructions retired.
Conditional branch instructions retired.
Far branch instructions retired.
Direct and indirect near call instructions retired.
Direct and indirect near call instructions retired.
Direct and indirect macro near call instructions retired (captured in ring 3).
Direct and indirect macro near call instructions retired (captured in ring 3).
Return instructions retired.
Return instructions retired.
Taken branch instructions retired.
Taken branch instructions retired.
Not taken branch instructions retired.
Speculative and retired mispredicted macro conditional branches
Speculative and retired mispredicted macro conditional branches
Mispredicted indirect branches excluding calls and returns
Not taken speculative and retired mispredicted macro conditional branches
Taken speculative and retired mispredicted macro conditional branches
Taken speculative and retired mispredicted indirect branches excluding calls and returns
Taken speculative and retired mispredicted indirect calls
Taken speculative and retired mispredicted indirect branches with return mnemonic
All mispredicted macro branch instructions retired.
This event counts all mispredicted branch instructions retired. This is a precise event.
Mispredicted conditional branch instructions retired.
Mispredicted conditional branch instructions retired.
Number of near branch instructions retired that were mispredicted and taken.
Number of near branch instructions retired that were mispredicted and taken.
Unhalted core cycles when the thread is in ring 0
Number of intervals between processor halts while thread is in ring 0
Unhalted core cycles when thread is in rings 1, 2, or 3
Counts XClk pulses when this thread is unhalted and the other thread is halted.
Reference cycles when the thread is unhalted (counts at 100 MHz rate)
Reference cycles when at least one thread on the physical core is unhalted (counts at 100 MHz rate)
This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state.
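As an aside, such a reference-cycles count can also be read outside VTune. The following is a minimal Linux sketch using the perf_event_open interface (an assumption of this example, not part of VTune); PERF_COUNT_HW_REF_CPU_CYCLES is the kernel's generic event that typically maps to this counter.

    /* Minimal sketch: count reference (unhalted) cycles for this thread
     * via the Linux perf interface. Assumes a Linux kernel with perf
     * support; error handling is abbreviated. */
    #define _GNU_SOURCE
    #include <linux/perf_event.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void) {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_REF_CPU_CYCLES; /* reference cycles */
        attr.disabled = 1;
        attr.exclude_kernel = 1;

        /* this thread (pid 0), any CPU, no group */
        int fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        for (volatile int i = 0; i < 1000000; i++) ;  /* region of interest */
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        uint64_t count = 0;
        if (read(fd, &count, sizeof(count)) == sizeof(count))
            printf("ref cycles: %llu\n", (unsigned long long)count);
        close(fd);
        return 0;
    }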
This event counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling.
Core cycles when at least one thread on the physical core is not in halt state
Thread cycles when thread is not in halt state
Core cycles when at least one thread on the physical core is not in halt state
Cycles with pending L1 cache miss loads.
Cycles with pending L2 cache miss loads.
Cycles with pending memory loads.
This event counts cycles during which no instructions were executed in the execution stage of the pipeline.
Execution stalls due to L1 data cache misses
Execution stalls due to L2 cache misses.
This event counts cycles during which no instructions were executed in the execution stage of the pipeline and there were memory instructions pending (waiting for data).
Decode Stream Buffer (DSB)-to-MITE switch true penalty cycles.
Load misses in all DTLB levels that cause page walks
DTLB demand load misses with low part of linear-to-physical address translation missed
Load operations that miss the first DTLB level but hit the second and do not cause page walks
This event counts load operations from a 2M page that miss the first DTLB level but hit the second and do not cause page walks.
This event counts load operations from a 4K page that miss the first DTLB level but hit the second and do not cause page walks.
A demand load miss in all translation lookaside buffer (TLB) levels causes a page walk that completes (any page size).
A demand load miss in all translation lookaside buffer (TLB) levels causes a page walk that completes (2M/4M).
A demand load miss in all translation lookaside buffer (TLB) levels causes a page walk that completes (4K).
This event counts cycles when the page miss handler (PMH) is servicing page walks caused by DTLB load misses.
Store misses in all DTLB levels that cause page walks
DTLB store misses with low part of linear-to-physical address translation missed
Store operations that miss the first TLB level but hit the second and do not cause page walks
This event counts store operations from a 2M page that miss the first DTLB level but hit the second and do not cause page walks.
This event counts store operations from a 4K page that miss the first DTLB level but hit the second and do not cause page walks.
Store misses in all DTLB levels that cause completed page walks
Store misses in all DTLB levels that cause completed page walks (2M/4M)
Store misses in all DTLB levels that cause completed page walks (4K)
This event counts cycles when the page miss handler (PMH) is servicing page walks caused by DTLB store misses.
Cycle count for an Extended Page Table walk.
Cycles with any input/output SSE or FP assist
Number of SIMD FP assists due to input values
Number of SIMD FP assists due to output values
Number of X87 assists due to input value.
Number of X87 assists due to output value.
Number of times an HLE execution aborted due to any reasons (multiple categories may count as one).
Number of times an HLE execution aborted due to various memory events (e.g., read/write capacity and conflicts).
Number of times an HLE execution aborted due to uncommon conditions
Number of times an HLE execution aborted due to HLE-unfriendly instructions
Number of times an HLE execution aborted due to incompatible memory type
Number of times an HLE execution aborted due to none of the previous 4 categories (e.g. interrupts)
Number of times an HLE execution aborted due to any reasons (multiple categories may count as one).
Number of times an HLE execution successfully committed
Number of times an HLE execution started.
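To make the started/committed/aborted distinction concrete, here is a hedged sketch of an elided spinlock using GCC's HLE memory-order hints (assumes GCC with -mhle on HLE-capable hardware; the function names are illustrative). The XACQUIRE-prefixed exchange is what starts an HLE execution; the XRELEASE-prefixed store commits it.

    #include <immintrin.h>  /* _mm_pause */

    static volatile int lock;

    static void hle_lock(void) {
        /* XACQUIRE-prefixed exchange: starts an HLE elided execution. */
        while (__atomic_exchange_n(&lock, 1,
                                   __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE))
            while (lock)
                _mm_pause();  /* spin read-only until the lock looks free */
    }

    static void hle_unlock(void) {
        /* XRELEASE-prefixed store: commits the elided region on success. */
        __atomic_store_n(&lock, 0, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
    }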
Number of Instruction Cache, Streaming Buffer and Victim Cache reads, both cacheable and noncacheable, including UC fetches
Cycles where a code fetch is stalled due to L1 instruction-cache miss.
Cycles where a code fetch is stalled due to L1 instruction-cache miss.
This event counts Instruction Cache (ICACHE) misses.
Cycles Decode Stream Buffer (DSB) is delivering 4 Uops
Cycles Decode Stream Buffer (DSB) is delivering any Uop
Cycles MITE is delivering 4 Uops
Cycles MITE is delivering any Uop
Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from Decode Stream Buffer (DSB) path
Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path
Instruction Decode Queue (IDQ) empty cycles
Uops delivered to Instruction Decode Queue (IDQ) from MITE path
Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from MITE path
Uops delivered to Instruction Decode Queue (IDQ) from MITE path
This event counts cycles during which the microcode sequencer assisted the Front-end in delivering uops. Microcode assists are used for complex instructions or scenarios that can't be handled by the standard decoder. Using other instructions, if possible, will usually improve performance.
Cycles when uops initiated by Decode Stream Buffer (DSB) are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy
Deliveries to Instruction Decode Queue (IDQ) initiated by Decode Stream Buffer (DSB) while Microcode Sequencer (MS) is busy
Uops initiated by Decode Stream Buffer (DSB) that are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy
Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy
Number of switches from DSB (Decode Stream Buffer) or MITE (legacy decode pipeline) to the Microcode Sequencer
This event counts uops delivered by the Front-end with the assistance of the microcode sequencer. Microcode assists are used for complex instructions or scenarios that can't be handled by the standard decoder. Using other instructions, if possible, will usually improve performance.
This event counts the number of undelivered (unallocated) uops from the Front-end to the Resource Allocation Table (RAT) while the Back-end of the processor is not stalled. The Front-end can allocate up to 4 uops per cycle, so this event can increment 0-4 times per cycle depending on the number of unallocated uops. This event is counted on a per-core basis.
This event counts the number of cycles during which the Front-end allocated exactly zero uops to the Resource Allocation Table (RAT) while the Back-end of the processor is not stalled. This event is counted on a per-core basis.
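These two events feed the usual front-end-bound estimate. The formulation below is the common Top-down calculation (an assumption of this example, not stated in this reference): undelivered slots divided by the 4-slots-per-cycle allocation width mentioned above.

    /* Fraction of issue slots the front end failed to fill while the
     * back end could have accepted uops (0.0 .. 1.0). */
    double frontend_bound(double idq_uops_not_delivered_core,
                          double cpu_clk_unhalted_thread) {
        return idq_uops_not_delivered_core / (4.0 * cpu_clk_unhalted_thread);
    }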
Counts cycles in which the front end (FE) delivered 4 uops or the Resource Allocation Table (RAT) was stalling the FE.
Cycles per thread when 3 or more uops are not delivered to Resource Allocation Table (RAT) when backend of the machine is not stalled
Cycles with less than 2 uops delivered by the front end.
Cycles with less than 3 uops delivered by the front end.
Stall cycles because IQ is full
This event counts cycles where the decoder is stalled on an instruction with a length changing prefix (LCP).
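For illustration, a hedged inline-assembly sketch of the classic LCP case: the 66h operand-size prefix combined with a 16-bit immediate changes the instruction's length and can stall the legacy decoder (GCC/Clang on x86-64; the helper name is hypothetical).

    static inline void lcp_example(unsigned short *p) {
        /* addw with a 16-bit immediate encodes as 66 81 /0 imm16:
         * a length-changing prefix on the legacy decode path. */
        __asm__ volatile ("addw $0x1234, %0" : "+m" (*p));
    }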
This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. INST_RETIRED.ANY is counted by a designated fixed counter, leaving the programmable counters available for other events. Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.
Number of instructions retired. General Counter - architectural event
Precise instruction-retired event with hardware support to reduce the effect of the PEBS shadow in the IP distribution
This is a non-precise version (that is, it does not use PEBS) of the event that counts FP operations retired. For X87 FP operations that have no exceptions, counting also includes flows that have several X87 uops, or flows that use X87 uops in exception handling.
This event counts the number of cycles spent waiting for a recovery after an event such as a processor nuke, JEClear, assist, or HLE/RTM abort.
Core cycles the allocator was stalled due to recovery from earlier clear event for any thread running on the physical core (e.g. misprediction or memory nuke)
Flushing of Instruction TLB (ITLB) pages; includes 4K/2M/4M pages.
Misses at all ITLB levels that cause page walks
Operations that miss the first ITLB level but hit the second and do not cause any page walks
Code misses that miss the DTLB and hit the STLB (2M)
Code misses that miss the DTLB and hit the STLB (4K)
Misses in all ITLB levels that cause completed page walks
A code miss in all TLB levels causes a page walk that completes (2M/4M).
A code miss in all TLB levels causes a page walk that completes (4K).
This event counts cycles when the page miss handler (PMH) is servicing page walks caused by ITLB misses.
This event counts when new data lines are brought into the L1 Data cache, which cause other lines to be evicted from the cache.
Cycles a demand request was blocked due to Fill Buffer unavailability
L1D miss outstanding duration in cycles
Cycles with L1D load misses outstanding.
Cycles with L1D load misses outstanding from any thread on physical core
Number of times a request needed a fill buffer (FB) entry but no entry was available for it; that is, FB unavailability was the dominant reason for blocking the request. A request includes cacheable/uncacheable demand loads, stores, or SW prefetches. Hardware prefetches (HWP) are excluded.
Not rejected writebacks that hit L2 cache
This event counts the number of L2 cache lines brought into the L2 cache. Lines are filled into the L2 cache when there was an L2 miss.
L2 cache lines in E state filling L2
L2 cache lines in I state filling L2
L2 cache lines in S state filling L2
Clean L2 cache lines evicted by demand
Dirty L2 cache lines evicted by demand
L2 code requests
Demand Data Read requests
Demand requests that miss L2 cache
Demand requests to L2 cache
Requests from L2 hardware prefetchers
RFO requests to L2 cache
L2 cache hits when fetching instructions, code reads.
L2 cache misses when fetching instructions
Demand Data Read requests that hit L2 cache
Demand Data Read requests that miss L2, no rejects
L2 prefetch requests that hit L2 cache
L2 prefetch requests that miss L2 cache
All requests that miss L2 cache
All L2 requests
RFO requests that hit L2 cache
RFO requests that miss L2 cache
L2 or L3 HW prefetches that access L2 cache
Transactions accessing L2 pipe
L2 cache accesses when fetching instructions
Demand Data Read requests that access L2 cache
L1D writebacks that access L2 cache
L2 fill requests that access L2 cache
L2 writebacks that access L2 cache
RFO requests that access L2 cache
The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use
This event counts loads that followed a store to the same address, where the data could not be forwarded inside the pipeline from the store to the load. The most common reason why store forwarding would be blocked is when a load's address range overlaps with a preceding smaller uncompleted store. The penalty for blocked store forwarding is that the load must wait for the store to write its value to the cache before it can be issued.
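A minimal sketch of the blocked pattern, assuming a compiler that lowers the memcpy below to a plain 4-byte load (names are illustrative): the wide load overlaps the narrower, still-uncompleted store, so forwarding fails and the load waits.

    #include <stdint.h>
    #include <string.h>

    uint32_t store_forward_block(uint8_t *buf) {
        uint32_t v;
        buf[1] = 0x5A;              /* narrow 1-byte store...              */
        memcpy(&v, buf, sizeof v);  /* ...overlapped by a 4-byte load: the
                                       store cannot be forwarded, so the
                                       load waits for it to reach cache   */
        return v;
    }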
Aliasing occurs when a load is issued after a store and their memory addresses are offset by 4K. This event counts the number of loads that aliased with a preceding store, resulting in an extended address check in the pipeline which can have a performance impact.
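A hedged sketch of a loop prone to this aliasing (names illustrative): when the buffer offset makes a load land exactly 4096 bytes from a preceding store, their low 12 address bits match and the load is falsely flagged as dependent.

    #include <stddef.h>

    void add_one(float *dst, const float *src, size_t n) {
        /* Pathological when, e.g., (char *)dst == (char *)src + 4100:
         * the store to dst[i] and the next iteration's load of src[i+1]
         * are then exactly 4096 bytes apart, so their low 12 address
         * bits match and the load takes the extended address check. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] + 1.0f;
    }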
Non-software-prefetch load dispatches that hit a fill buffer (FB) allocated for a hardware prefetch
Non-software-prefetch load dispatches that hit a fill buffer (FB) allocated for a software prefetch
Cycles when L1D is locked
Cycles when L1 and L2 are locked due to UC or split lock
Core-originated cacheable demand requests missed L3
Core-originated cacheable demand requests that refer to L3
Cycles in which 4 uops were delivered by the LSD but did not come from the decoder
Cycles in which uops were delivered by the LSD but did not come from the decoder
Number of Uops delivered by the LSD.
Number of machine clears (nukes) of any type.
Cycles during which there was a nuke; accounts for both thread-specific and all-thread nukes.
This event counts the number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0.
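A hedged sketch of the pattern this event relates to, using the AVX masked-load intrinsic (AVX2 is assumed for the mask computation; the helper is illustrative): lanes whose mask bit is clear are not architecturally loaded, so they may point past the end of an array, and such masked-off illegal addresses are what can invoke the assist.

    #include <immintrin.h>

    /* Load up to 8 floats, masking off lanes past `remaining`. */
    __m256 load_tail(const float *p, int remaining) {
        __m256i idx  = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
        /* Lane mask: all-ones (top bit set) where idx < remaining. */
        __m256i mask = _mm256_cmpgt_epi32(_mm256_set1_epi32(remaining), idx);
        /* Masked-off lanes may overlap an unmapped page just past the
         * array; the architectural result is still 0 for those lanes. */
        return _mm256_maskload_ps(p, mask);
    }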
This event counts the number of memory ordering machine clears detected. Memory ordering machine clears can result from memory address aliasing or snoops from another hardware thread or core to data inflight in the pipeline. Machine clears can have a significant performance impact if they are happening frequently.
This event is incremented when self-modifying code (SMC) is detected, which causes a machine clear. Machine clears can have a significant performance impact if they are happening frequently.
Retired load uops whose data sources were an L3 hit and a cross-core snoop hit in an on-pkg core cache.
Retired load uops whose data sources were HitM responses from shared L3.
This event counts retired load uops that hit in the L3 cache, but required a cross-core snoop which resulted in a HITM (hit modified) in an on-pkg core cache. This does not include hardware prefetches. This is a precise event.
This event counts retired load uops that hit in the L3 cache, but required a cross-core snoop which resulted in a HIT in an on-pkg core cache. This does not include hardware prefetches. This is a precise event.
Retired load uops whose data sources were an L3 hit and a cross-core snoop miss in an on-pkg core cache.
Retired load uops whose data sources were an L3 hit and a cross-core snoop miss in an on-pkg core cache.
Retired load uops whose data sources were hits in L3 without snoops required.
Retired load uops whose data sources were hits in L3 without snoops required.
This event counts retired load uops where the data came from local DRAM. This does not include hardware prefetches.
This event counts retired load uops where the data came from local DRAM. This does not include hardware prefetches. This is a precise event.
Retired load uops that missed the L1 cache but hit the fill buffer (FB) due to a preceding miss to the same cache line with the data not ready.
Retired load uops that missed the L1 cache but hit the fill buffer (FB) due to a preceding miss to the same cache line with the data not ready.
Retired load uops with L1 cache hits as data sources.
Retired load uops with L1 cache hits as data sources.
Retired load uops with L1 cache misses as data sources.
This event counts retired load uops in which data sources missed in the L1 cache. This does not include hardware prefetches. This is a precise event.
Retired load uops with L2 cache hits as data sources.
Retired load uops with L2 cache hits as data sources.
Miss in mid-level (L2) cache. Excludes Unknown data-source.
Retired load uops with L2 cache misses as data sources.
Retired load uops whose data sources were data hits in L3 without snoops required.
This event counts retired load uops in which data sources were data hits in the L3 cache without snoops required. This does not include hardware prefetches. This is a precise event.
Miss in last-level (L3) cache. Excludes Unknown data-source.
Miss in last-level (L3) cache. Excludes Unknown data-source.
Loads with a latency value above 128
Loads with a latency value above 16
Loads with a latency value above 256
Loads with a latency value above 32
Loads with a latency value above 4
Loads with a latency value above 512
Loads with a latency value above 64
Loads with a latency value above 8
All retired load uops.
All retired load uops. (Precise event)
All retired store uops.
This event counts all store uops retired. This is a precise event.
Retired load uops with locked access.
Retired load uops with locked access. (Precise event)
Retired load uops that split across a cacheline boundary.
This event counts load uops retired that had memory addresses split across 2 cache lines. A line split is across 64B cache lines and may include a page split (4K). This is a precise event.
Retired store uops that split across a cacheline boundary.
This event counts store uops retired that had memory addresses split across 2 cache lines. A line split is across 64B cache lines and may include a page split (4K). This is a precise event.
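A minimal sketch of a line-split access, assuming the 64-byte lines described above (names illustrative): a 4-byte load starting 2 bytes before a line boundary spans two lines.

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    uint32_t split_load_demo(void) {
        uint8_t *buf = aligned_alloc(64, 128);  /* line-aligned buffer */
        memset(buf, 0xAB, 128);

        uint32_t v;
        memcpy(&v, buf + 62, sizeof v);  /* bytes 62..65 cross the 64B
                                            boundary: a line-split load */
        free(buf);
        return v;
    }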
Retired load uops that miss the STLB.
Retired load uops that miss the STLB. (Precise event)
Retired store uops that miss the STLB.
Retired store uops that miss the STLB. (Precise event)
Speculative cache line split load uops dispatched to L1 cache
Speculative cache line split STA uops dispatched to L1 cache
Number of integer Move Elimination candidate uops that were eliminated.
Number of integer Move Elimination candidate uops that were not eliminated.
Number of SIMD Move Elimination candidate uops that were eliminated.
Number of SIMD Move Elimination candidate uops that were not eliminated.
Demand and prefetch data reads
Cacheable and noncacheable code read requests
Demand Data Read requests sent to uncore
Demand RFO requests including regular RFOs, locks, ItoM
Offcore requests buffer cannot take more entries for this thread core.
Offcore outstanding cacheable Core Data Read transactions in SuperQueue (SQ), queue to uncore
Cycles when offcore outstanding cacheable Core Data Read transactions are present in SuperQueue (SQ), queue to uncore
Cycles when offcore outstanding Demand Data Read transactions are present in SuperQueue (SQ), queue to uncore
Offcore outstanding demand RFO transactions in the SuperQueue (SQ), queue to uncore, every cycle
Offcore outstanding code read transactions in the SuperQueue (SQ), queue to uncore, every cycle
Offcore outstanding Demand Data Read transactions in uncore queue.
Cycles with at least 6 offcore outstanding Demand Data Read transactions in uncore queue
Offcore outstanding RFO store transactions in SuperQueue (SQ), queue to uncore
Offcore response events can be programmed only with a specific pair of event select and counter MSR, and with specific event codes and predefined mask bit values in a dedicated MSR that specify the attributes of the offcore transaction.
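As an illustration of this dedicated-MSR model, here is a hedged user-space sketch using the Linux msr driver (assumes root and the msr kernel module; MSR_OFFCORE_RSP_0 at 0x1A6 matches recent Intel cores, and RSP_VALUE is a hypothetical request/response bit pattern; consult the SDM for the encoding on a given part):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    #define MSR_OFFCORE_RSP_0 0x1A6   /* dedicated off-core response MSR */
    #define RSP_VALUE 0x10001ULL      /* hypothetical: demand data read,
                                         any response */

    int main(void) {
        int fd = open("/dev/cpu/0/msr", O_WRONLY);
        if (fd < 0) { perror("open /dev/cpu/0/msr"); return 1; }
        uint64_t val = RSP_VALUE;
        /* The msr driver uses the file offset as the MSR address. */
        if (pwrite(fd, &val, sizeof val, MSR_OFFCORE_RSP_0) != sizeof val) {
            perror("pwrite");
            return 1;
        }
        close(fd);
        return 0;
    }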
Counts all demand & prefetch code reads that hit in the L3
Counts all demand & prefetch code reads that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts all demand & prefetch code reads that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts all demand & prefetch code reads that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts all demand & prefetch code reads that hit in the L3 and the snoops sent to sibling cores return clean response
Counts all demand & prefetch code reads that miss in the L3
Counts all demand & prefetch code reads that miss the L3 and the data is returned from local dram
Counts all demand & prefetch data reads that hit in the L3
Counts all demand & prefetch data reads that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts all demand & prefetch data reads that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts all demand & prefetch data reads that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts all demand & prefetch data reads that hit in the L3 and the snoops sent to sibling cores return clean response
Counts all demand & prefetch data reads that miss in the L3
Counts all demand & prefetch data reads that miss the L3 and the data is returned from local dram
Counts all prefetch code reads that hit in the L3
Counts all prefetch code reads that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts all prefetch code reads that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts all prefetch code reads that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts all prefetch code reads that hit in the L3 and the snoops sent to sibling cores return clean response
Counts all prefetch code reads that miss in the L3
Counts all prefetch code reads that miss the L3 and the data is returned from local dram
Counts all prefetch data reads that hit in the L3
Counts all prefetch data reads that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts all prefetch data reads that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts all prefetch data reads that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts all prefetch data reads that hit in the L3 and the snoops sent to sibling cores return clean response
Counts all prefetch data reads that miss in the L3
Counts all prefetch data reads that miss the L3 and the data is returned from local dram
Counts prefetch RFOs that hit in the L3
Counts prefetch RFOs that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts prefetch RFOs that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts prefetch RFOs that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts prefetch RFOs that hit in the L3 and the snoops sent to sibling cores return clean response
Counts prefetch RFOs that miss in the L3
Counts prefetch RFOs that miss the L3 and the data is returned from local dram
Counts all data/code/rfo reads (demand & prefetch) that hit in the L3
Counts all data/code/rfo reads (demand & prefetch) that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts all data/code/rfo reads (demand & prefetch) that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts all data/code/rfo reads (demand & prefetch) that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts all data/code/rfo reads (demand & prefetch) that hit in the L3 and the snoops sent to sibling cores return clean response
Counts all data/code/rfo reads (demand & prefetch) that miss in the L3
Counts all data/code/rfo reads (demand & prefetch) that miss the L3 and the data is returned from local dram
Counts all requests that hit in the L3
Counts all requests that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts all requests that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts all requests that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts all requests that hit in the L3 and the snoops sent to sibling cores return clean response
Counts all requests that miss in the L3
Counts all requests that miss the L3 and the data is returned from local dram
Counts all demand & prefetch RFOs that hit in the L3
Counts all demand & prefetch RFOs that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts all demand & prefetch RFOs that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts all demand & prefetch RFOs that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts all demand & prefetch RFOs that hit in the L3 and the snoops sent to sibling cores return clean response
Counts all demand & prefetch RFOs that miss in the L3
Counts all demand & prefetch RFOs that miss the L3 and the data is returned from local dram
Counts all demand code reads that hit in the L3
Counts all demand code reads that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts all demand code reads that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts all demand code reads that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts all demand code reads that hit in the L3 and the snoops sent to sibling cores return clean response
Counts all demand code reads that miss in the L3
Counts all demand code reads that miss the L3 and the data is returned from local dram
Counts demand data reads that hit in the L3
Counts demand data reads that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts demand data reads that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts demand data reads that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts demand data reads that hit in the L3 and the snoops sent to sibling cores return clean response
Counts demand data reads that miss in the L3
Counts demand data reads that miss the L3 and the data is returned from local dram
Counts all demand data writes (RFOs) that hit in the L3
Counts all demand data writes (RFOs) that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts all demand data writes (RFOs) that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts all demand data writes (RFOs) that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts all demand data writes (RFOs) that hit in the L3 and the snoops sent to sibling cores return clean response
Counts all demand data writes (RFOs) that miss in the L3
Counts all demand data writes (RFOs) that miss the L3 and the data is returned from local dram
Counts any other requests that hit in the L3
Counts any other requests that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts any other requests that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts any other requests that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts any other requests that hit in the L3 and the snoops sent to sibling cores return clean response
Counts any other requests that miss in the L3
Counts any other requests that miss the L3 and the data is returned from local dram
Counts all prefetch (that bring data to LLC only) code reads that hit in the L3
Counts all prefetch (that bring data to LLC only) code reads that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts all prefetch (that bring data to LLC only) code reads that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts all prefetch (that bring data to LLC only) code reads that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts all prefetch (that bring data to LLC only) code reads that hit in the L3 and the snoops sent to sibling cores return clean response
Counts all prefetch (that bring data to LLC only) code reads that miss in the L3
Counts all prefetch (that bring data to LLC only) code reads that miss the L3 and the data is returned from local dram
Counts prefetch (that bring data to L2) data reads that hit in the L3
Counts prefetch (that bring data to L2) data reads that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts prefetch (that bring data to L2) data reads that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts prefetch (that bring data to L2) data reads that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts prefetch (that bring data to L2) data reads that hit in the L3 and the snoops sent to sibling cores return clean response
Counts prefetch (that bring data to L2) data reads that miss in the L3
Counts prefetch (that bring data to L2) data reads that miss the L3 and the data is returned from local dram
Counts all prefetch (that bring data to L2) RFOs that hit in the L3
Counts all prefetch (that bring data to L2) RFOs that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts all prefetch (that bring data to L2) RFOs that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts all prefetch (that bring data to L2) RFOs that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts all prefetch (that bring data to L2) RFOs that hit in the L3 and the snoops sent to sibling cores return clean response
Counts all prefetch (that bring data to L2) RFOs that miss in the L3
Counts all prefetch (that bring data to L2) RFOs that miss the L3 and the data is returned from local dram
Counts prefetch (that bring data to LLC only) code reads that hit in the L3
Counts prefetch (that bring data to LLC only) code reads that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts prefetch (that bring data to LLC only) code reads that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts prefetch (that bring data to LLC only) code reads that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts prefetch (that bring data to LLC only) code reads that hit in the L3 and the snoops sent to sibling cores return clean response
Counts prefetch (that bring data to LLC only) code reads that miss in the L3
Counts prefetch (that bring data to LLC only) code reads that miss the L3 and the data is returned from local dram
Counts all prefetch (that bring data to LLC only) data reads that hit in the L3
Counts all prefetch (that bring data to LLC only) data reads that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts all prefetch (that bring data to LLC only) data reads that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts all prefetch (that bring data to LLC only) data reads that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts all prefetch (that bring data to LLC only) data reads that hit in the L3 and the snoops sent to sibling cores return clean response
Counts all prefetch (that bring data to LLC only) data reads that miss in the L3
Counts all prefetch (that bring data to LLC only) data reads that miss the L3 and the data is returned from local dram
Counts all prefetch (that bring data to LLC only) RFOs that hit in the L3
Counts all prefetch (that bring data to LLC only) RFOs that hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded
Counts all prefetch (that bring data to LLC only) RFOs that hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded
Counts all prefetch (that bring data to LLC only) RFOs that hit in the L3 and sibling core snoops are not needed as either the core-valid bit is not set or the shared line is present in multiple cores
Counts all prefetch (that bring data to LLC only) RFOs that hit in the L3 and the snoops sent to sibling cores return clean response
Counts all prefetch (that bring data to LLC only) RFOs that miss in the L3
Counts all prefetch (that bring data to LLC only) RFOs that miss the L3 and the data is returned from local dram
Number of times any microcode assist is invoked by HW upon uop writeback.
Number of transitions from AVX-256 to legacy SSE when penalty applicable.
Number of transitions from SSE to AVX-256 when penalty applicable.
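A hedged sketch of the transition pattern and its standard mitigation (the function is illustrative): executing VZEROUPPER after 256-bit AVX code avoids the AVX-to-SSE penalty when legacy-encoded SSE code follows, which matters when that SSE code was compiled without VEX encoding.

    #include <immintrin.h>

    void avx_then_sse(float *a, const float *b) {
        __m256 v = _mm256_loadu_ps(b);            /* 256-bit AVX section */
        _mm256_storeu_ps(a, _mm256_add_ps(v, v));

        _mm256_zeroupper();  /* clear upper YMM halves before any
                                legacy-encoded SSE code executes */

        __m128 s = _mm_loadu_ps(b);               /* SSE section */
        _mm_storeu_ps(a, _mm_add_ps(s, s));
    }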
Number of DTLB page walker hits in the L1+FB
Number of DTLB page walker hits in the L2
Number of DTLB page walker hits in the L3 + XSNP
Number of DTLB page walker hits in Memory
Counts the number of Extended Page Table walks from the DTLB that hit in the L1 and FB.
Counts the number of Extended Page Table walks from the DTLB that hit in the L2.
Counts the number of Extended Page Table walks from the DTLB that hit in the L3.
Counts the number of Extended Page Table walks from the DTLB that hit in memory.
Counts the number of Extended Page Table walks from the ITLB that hit in the L1 and FB.
Counts the number of Extended Page Table walks from the ITLB that hit in the L2.
Counts the number of Extended Page Table walks from the ITLB that hit in the L3.
Counts the number of Extended Page Table walks from the ITLB that hit in memory.
Number of ITLB page walker hits in the L1+FB
Number of ITLB page walker hits in the L2
Number of ITLB page walker hits in the L3 + XSNP
Number of ITLB page walker hits in Memory
Resource-related stall cycles
Cycles stalled due to re-order buffer full.
Cycles stalled due to no eligible RS entry available.
This event counts cycles during which no instructions were allocated because no Store Buffers (SB) were available.
Counts cases of saving a new LBR.
This event counts cycles when the Reservation Station ( RS ) is empty for the thread. The RS is a structure that buffers allocated micro-ops from the Front-end. If there are many cycles when the RS is empty, it may represent an underflow of instructions delivered from the Front-end.
Counts end of periods where the Reservation Station (RS) was empty. Could be useful to precisely locate Frontend Latency Bound issues.
Number of times an RTM execution aborted due to any reasons (multiple categories may count as one).
Number of times an RTM execution aborted due to various memory events (e.g., read/write capacity and conflicts)
Number of times an RTM execution aborted due to various memory events (e.g., read/write capacity and conflicts).
Number of times an RTM execution aborted due to HLE-unfriendly instructions
Number of times an RTM execution aborted due to incompatible memory type
Number of times an RTM execution aborted due to none of the previous 4 categories (e.g. interrupt)
Number of times an RTM execution aborted due to any reasons (multiple categories may count as one).
Number of times an RTM execution successfully committed
Number of times an RTM execution started.
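To make the started/committed/aborted counts concrete, a hedged sketch of an RTM region using the TSX intrinsics from immintrin.h (compile with -mrtm; the fallback path is simplified for illustration):

    #include <immintrin.h>

    static volatile int fallback_lock;

    void increment_tx(int *counter) {
        unsigned status = _xbegin();           /* RTM execution starts   */
        if (status == _XBEGIN_STARTED) {
            if (fallback_lock)
                _xabort(0xff);                 /* keep transactions and
                                                  lock holders coherent  */
            (*counter)++;
            _xend();                           /* successful commit      */
        } else {
            /* Abort path; a real implementation would take the
             * fallback lock here. */
            __atomic_fetch_add(counter, 1, __ATOMIC_SEQ_CST);
        }
    }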
DTLB flush attempts of the thread-specific entries
STLB flush attempts
Counts the number of times a class of instructions that may cause a transactional abort was executed. Since this counts executions, each occurrence does not necessarily cause a transactional abort.
Counts the number of times a class of instructions (e.g., vzeroupper) that may cause a transactional abort was executed inside a transactional region
Counts the number of times an instruction execution caused the supported transactional nesting count to be exceeded
Counts the number of times a XBEGIN instruction was executed inside an HLE transactional region.
Counts the number of times an HLE XACQUIRE instruction was executed inside an RTM transactional region
Number of times a transactional abort was signaled due to a data capacity limitation for transactional writes.
Number of times a transactional abort was signaled due to a data conflict on a transactionally accessed address
Number of times an HLE transactional execution aborted due to XRELEASE lock not satisfying the address and value requirements in the elision buffer
Number of times an HLE transactional execution aborted due to NoAllocatedElisionBuffer being non-zero.
Number of times an HLE transactional execution aborted due to an unsupported read alignment from the elision buffer.
Number of times an HLE transactional region aborted due to a non-XRELEASE-prefixed instruction writing to an elided lock in the elision buffer
Number of times HLE lock could not be elided due to ElisionBufferAvailable being zero.
Each cycle, counts the number of valid entries in the Coherency Tracker queue from allocation until deallocation. Aperture requests (snoops) appear as NC decoded internally and become coherent (snoop L3, access memory).
Number of entries allocated. Accounts for any type: e.g., snoop, core aperture, etc.
Each cycle, counts the number of all core outgoing valid entries. An entry is defined as valid from its allocation until the first of the IDI0 or DRS0 messages is sent out. Accounts for coherent and non-coherent traffic.
Total number of core outgoing entries allocated. Accounts for coherent and non-coherent traffic.
Number of writes allocated, including any write transaction: full/partial writes and evictions.
L3 lookup: any request that accesses the cache and finds the line in E or S state
L3 lookup: any request that accesses the cache and finds the line in I state
L3 lookup: any request that accesses the cache and finds the line in M state
L3 lookup: any request that accesses the cache and finds the line in any MESI state
L3 lookup: external snoop request that accesses the cache and finds the line in E or S state
L3 lookup: external snoop request that accesses the cache and finds the line in I state
L3 lookup: external snoop request that accesses the cache and finds the line in M state
L3 lookup: external snoop request that accesses the cache and finds the line in any MESI state
L3 lookup: read request that accesses the cache and finds the line in E or S state
L3 lookup: read request that accesses the cache and finds the line in I state
L3 lookup: read request that accesses the cache and finds the line in M state
L3 lookup: read request that accesses the cache and finds the line in any MESI state
L3 lookup: write request that accesses the cache and finds the line in E or S state
L3 lookup: write request that accesses the cache and finds the line in I state
L3 lookup: write request that accesses the cache and finds the line in M state
L3 lookup: write request that accesses the cache and finds the line in any MESI state
A cross-core snoop resulting from an L3 eviction hits a modified line in some processor core.
An external snoop hits a modified line in some processor core.
A cross-core snoop initiated by this Cbox due to a processor core memory request hits a modified line in some processor core.
A cross-core snoop resulting from an L3 eviction hits a non-modified line in some processor core.
An external snoop hits a non-modified line in some processor core.
A cross-core snoop initiated by this Cbox due to a processor core memory request hits a non-modified line in some processor core.
A cross-core snoop resulting from an L3 eviction misses in some processor core.
An external snoop misses in some processor core.
A cross-core snoop initiated by this Cbox due to a processor core memory request misses in some processor core.
This 48-bit fixed counter counts the UCLK cycles.
Cycles per thread when uops are executed in port 0
Cycles per thread when uops are executed in port 1
Cycles per thread when uops are executed in port 2
Cycles per thread when uops are executed in port 3
Cycles per thread when uops are executed in port 4
Cycles per thread when uops are executed in port 5
Cycles per thread when uops are executed in port 6
Cycles per thread when uops are executed in port 7
Number of uops executed on the core.
Cycles at least 1 micro-op is executed from any thread on physical core
Cycles at least 2 micro-ops are executed from any thread on physical core
Cycles at least 3 micro-ops are executed from any thread on physical core
Cycles at least 4 micro-ops are executed from any thread on physical core
Cycles with no micro-ops executed from any thread on physical core
This event counts the cycles where at least one uop was executed. It is counted per thread.
This event counts the cycles where at least two uops were executed. It is counted per thread.
This event counts the cycles where at least three uops were executed. It is counted per thread.
Cycles where at least 4 uops were executed per-thread
Counts the number of cycles in which no uops were dispatched to be executed on this thread.
Cycles per thread when uops are executed in port 0
Cycles per core when uops are executed in port 0
Cycles per thread when uops are executed in port 1
Cycles per core when uops are executed in port 1
Cycles per thread when uops are executed in port 2
Cycles per core when uops are dispatched to port 2
Cycles per thread when uops are executed in port 3
Cycles per core when uops are dispatched to port 3
Cycles per thread when uops are executed in port 4
Cycles per core when uops are executed in port 4
Cycles per thread when uops are executed in port 5
Cycles per core when uops are executed in port 5
Cycles per thread when uops are executed in port 6
Cycles per core when uops are executed in port 6
Cycles per thread when uops are executed in port 7
Cycles per core when uops are dispatched to port 7
This event counts the number of uops issued by the Front-end of the pipeline to the Back-end. This event is counted at the allocation stage and will count both retired and non-retired uops.
Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for all threads
Number of flags-merge uops being allocated. Such uops are considered performance-sensitive; added by GSR u-arch.
Number of Multiply packed/scalar single precision uops allocated
Number of slow LEA uops being allocated. A uop is generally considered a slow LEA if it has 3 sources (e.g., 2 sources + immediate), regardless of whether or not it results from an LEA instruction.
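For illustration, a hedged inline-assembly sketch of the three-source case (base + index + displacement) that this description treats as a slow LEA (the helper is hypothetical; GCC/Clang on x86-64):

    static inline long slow_lea(long base, long index) {
        long out;
        /* base + index + immediate displacement: 3 sources. */
        __asm__ ("leaq 8(%1,%2), %0" : "=r"(out) : "r"(base), "r"(index));
        return out;
    }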
Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for the thread
Actually retired uops.
Actually retired uops.
Cycles without actually retired uops.
This event counts the number of retirement slots used each cycle. There are potentially 4 slots that can be used each cycle - meaning, 4 uops or 4 instructions could retire each cycle.
Retirement slots used.
Cycles without actually retired uops.
Cycles with less than 10 actually retired uops.