Intel® C++ Compiler 16.0 User and Reference Guide
Intel Graphics Technology enables you to isolate a part of the level 3 cache and use it as high-bandwidth memory that program code can address explicitly. This memory is called shared local memory (SLM). SLM is useful for storing data that is frequently accessed by multiple threads in a group, because data allocated in SLM is completely protected from level 3 cache misses. Each thread group is assigned its own portion of SLM, which you can access programmatically and share between the threads of that group. A thread group is a set of threads that share certain hardware-defined characteristics, including the same hardware thread group ID and the same synchronization domain for the thread group barrier.
In comparison to main memory, SLM offers increased bandwidth, lower latency, and improved performance for gather and scatter operations. Each half-slice provides up to 64 KB of SLM, which is half the size of the level 3 cache. A half-slice is the basic hardware building block. HD Graphics configurations differ in the number of half-slices they contain: 1 for GT1, 2 for GT2, and 4 for GT3.
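The capacity figures above can be turned into a quick sizing check. The following is a minimal sketch in standard C++, not part of the compiler's API: the names `GtConfig`, `half_slices`, `total_slm_bytes`, and `fits_in_half_slice_slm` are hypothetical, and the 64 KB figure is read as kilobytes and treated as an upper bound. The portion actually assigned to a single thread group may be smaller.

```cpp
#include <cstddef>

// Upper bound on SLM per half-slice, taken from the text (assumed to mean 64 kilobytes).
constexpr std::size_t kSlmBytesPerHalfSlice = 64 * 1024;

// Hypothetical enum for the three configurations named in the text.
enum class GtConfig { GT1, GT2, GT3 };

// Number of half-slices per configuration: 1 for GT1, 2 for GT2, 4 for GT3.
constexpr std::size_t half_slices(GtConfig cfg) {
    return cfg == GtConfig::GT1 ? 1 : cfg == GtConfig::GT2 ? 2 : 4;
}

// Upper bound on the SLM available across all half-slices of a configuration.
constexpr std::size_t total_slm_bytes(GtConfig cfg) {
    return half_slices(cfg) * kSlmBytesPerHalfSlice;
}

// True if `count` elements of type T fit within one half-slice's SLM.
// A thread group cannot use more than this, and may be given less.
template <typename T>
constexpr bool fits_in_half_slice_slm(std::size_t count) {
    return count <= kSlmBytesPerHalfSlice / sizeof(T);
}
```

For example, `fits_in_half_slice_slm<float>(n)` can guard a tile size before choosing to stage the tile in SLM rather than read it repeatedly from main memory.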
To use shared local memory, you need to understand:
extensions to the programming model
the Intel® Cilk™ Plus syntax for using shared local memory
semantics and restrictions