Intel® C++ Compiler 16.0 User and Reference Guide
This topic only applies to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).
The offload runtime system maintains a section of memory at the same virtual address on the CPU and the coprocessor. The keyword _Cilk_shared enables you to use this shared memory as follows:
It places variables in this shared memory address range.
It specifies that a function is defined on both the CPU and the coprocessor.
The compiler allocates shared variables such that:
their virtual addresses are the same on the CPU and the coprocessor
their values are synchronized between the CPU and the coprocessor at a predefined point
Pointers to shared variables have the same value on the CPU and the coprocessor, because shared variables have the same addresses. So offloaded code can easily operate on linked data structures. Memory is synchronized between the CPU and the coprocessor only at offload call sites.
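As a minimal sketch of the behavior described above, the following places one variable and one function in the shared range and relies on synchronization at the offload call site. The names counter, bump, and run_once are hypothetical. The fallback macros at the top are an assumption added for illustration so that the sketch also builds with a compiler that has no offload support; they are not part of the Intel toolchain, and the block has not been verified on coprocessor hardware.

```cpp
#include <cassert>

// Assumption for illustration only: when this is not an Intel offload build,
// define the keywords away so that everything runs on the host.
#if !defined(__INTEL_OFFLOAD) && !defined(__MIC__)
#define _Cilk_shared
#define _Cilk_offload
#endif

// Placed in the shared address range: same virtual address on both sides.
_Cilk_shared int counter = 0;

// Defined on both the CPU and the coprocessor.
_Cilk_shared void bump()
{
    counter += 1;
}

// Hypothetical host-side driver.
int run_once()
{
    counter = 41;          // written on the host
    _Cilk_offload bump();  // shared memory is synchronized at this call site
    return counter;        // host reads the value after the offload returns
}
```

With the fallback macros in effect, run_once() returns 42. On an offload build the same result is expected if synchronization at the call site behaves as described.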
Do not conditionally compile the declaration of a shared variable so that it is seen only when compiling for Intel® MIC Architecture.
When a variable is marked _Cilk_shared, its memory is allocated dynamically rather than statically, and only the host allocates it, within the memory space that the host and the coprocessor share. The code that creates this shared memory is generated as part of the host compilation, so if the host compilation does not see the shared variable, no memory is allocated for it. Because the variable then has no memory in the shared space, any attempt by the coprocessor to access it is an illegal access.
Because the host allocates memory for the shared variable, the variable must be visible on the host even if the host does not use the variable.
For example, the following code is incorrect because the variable is only conditionally compiled for the coprocessor. Therefore, no code is generated on the host to allocate memory for the variable.
#ifdef __MIC__
_Cilk_shared int res;
#endif
If you compile the variable conditionally for the coprocessor, the _Cilk_shared keyword is not necessary because the host does not access the variable. Use _Cilk_shared only when sharing data between the host and the coprocessor.
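One way to keep the declaration visible to the host while still varying behavior per target is to conditionally compile only the code that uses the variable, not the declaration itself. The sketch below assumes hypothetical names (record_target, where_it_ran) and adds fallback macros, which are not part of the Intel toolchain, so that it also builds without offload support.

```cpp
#include <cassert>

// Assumption for illustration only: fall back to plain host code when this
// is not an Intel offload build.
#if !defined(__INTEL_OFFLOAD) && !defined(__MIC__)
#define _Cilk_shared
#define _Cilk_offload
#endif

// Declared unconditionally, so the host compilation sees it and can
// allocate it in the shared memory space.
_Cilk_shared int res;

// Only the body is conditional; the declaration above is not.
_Cilk_shared void record_target()
{
#ifdef __MIC__
    res = 1;  // this copy of the function was compiled for the coprocessor
#else
    res = 0;  // this copy of the function was compiled for the host
#endif
}

// Hypothetical host-side driver.
int where_it_ran()
{
    res = -1;
    _Cilk_offload record_target();
    return res;
}
```

On a host-only build, where_it_ran() returns 0 because only the host copy of record_target exists.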
By default, the compiler builds two binary files:
A CPU version: Includes all functions in the source code, whether marked _Cilk_shared or not.
A target version: Includes only functions marked _Cilk_shared in the source code.
Use the negative form of the [Q]offload compiler option to build only the CPU version.
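To picture the split between the two binaries, the sketch below marks one function _Cilk_shared and leaves another unmarked. The function names are hypothetical, and the fallback macros are an assumption added so that the sketch also builds with a compiler that has no offload support.

```cpp
#include <cassert>

// Assumption for illustration only: fall back to plain host code when this
// is not an Intel offload build.
#if !defined(__INTEL_OFFLOAD) && !defined(__MIC__)
#define _Cilk_shared
#define _Cilk_offload
#endif

// Marked _Cilk_shared: built into both the CPU version and the target version.
_Cilk_shared int square(int x)
{
    return x * x;
}

// Not marked: built into the CPU version only.
int square_on_target(int x)
{
    int result;
    result = _Cilk_offload square(x);  // request execution on the coprocessor
    return result;
}
```

When only the CPU version is built, the same source still compiles and square runs on the host.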
The functions described below are available for allocating and freeing shared memory to work with the _Cilk_shared and _Cilk_offload keywords. These functions revert to the standard malloc or free versions if Intel® MIC Architecture-based hardware is not installed in the system, or if the Intel® Manycore Platform Software Stack (Intel® MPSS) is not loaded.
void *_Offload_shared_malloc(size_t size);
void *_Offload_shared_aligned_malloc(size_t size, size_t alignment);
void _Offload_shared_free(void *p);
void _Offload_shared_aligned_free(void *p);
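A minimal sketch of how these functions might be paired with the keywords: the host allocates a buffer in shared memory, fills it, offloads a function that reads it, and frees it. The names data and sum_first_n are hypothetical. On a compiler without offload support, the sketch substitutes malloc and free for the shared allocation functions, mirroring the fallback described above; that substitution and the keyword macros are assumptions for illustration, and the block has not been verified on coprocessor hardware.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__INTEL_OFFLOAD) || defined(__MIC__)
#include <offload.h>
#else
// Assumption for illustration only: host-only stand-ins.
#include <cstdlib>
#define _Cilk_shared
#define _Cilk_offload
static void *_Offload_shared_malloc(std::size_t size) { return std::malloc(size); }
static void _Offload_shared_free(void *p) { std::free(p); }
#endif

// Shared pointer to shared data: the pointer and the buffer it points to
// are both meant to be visible on the CPU and the coprocessor.
_Cilk_shared int * _Cilk_shared data;

// Reads the shared buffer; defined on both sides.
_Cilk_shared int sum_shared(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

// Hypothetical host-side driver: returns 1 + 2 + ... + n, or -1 on failure.
int sum_first_n(int n)
{
    assert(n > 0);
    data = (_Cilk_shared int *)_Offload_shared_malloc(n * sizeof(int));
    if (data == 0) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        data[i] = i + 1;  // filled on the host before the offload call
    }
    int total;
    total = _Cilk_offload sum_shared(n);
    _Offload_shared_free(data);
    data = 0;
    return total;
}
```

Passing the element count as an argument keeps the offloaded function free of references to non-shared globals.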
By default, containers provided in the standard C++ library allocate non-shared memory. When an object of such a container is marked _Cilk_shared, its data members are allocated in shared memory. However, any memory that those data members point to, such as the element storage that the container allocates internally, is not shared. To enable sharing of this memory, use the shared_allocator<T> class template, which is defined in offload.h.
#pragma offload_attribute (push, _Cilk_shared)
#include <vector>
#include <offload.h>
#pragma offload_attribute (pop)
#include <stdio.h>

using namespace std;

// typedef vector to use the offload shared allocator
typedef vector<int, __offload::shared_allocator<int> > shared_vec_int;

_Cilk_shared shared_vec_int * _Cilk_shared v;

_Cilk_shared int test_result()
{
    int result = 1;
    for (int i = 0; i < 5; i++) {
        if ((*v)[i] != i) {
            result = 0;
        }
    }
    return result;
}

int main()
{
    int result;

    // Use placement new to construct an object in the shared memory space.
    // The allocation is sized for the type that is actually constructed.
    v = new (_Offload_shared_malloc(sizeof(shared_vec_int)))
        _Cilk_shared vector<int, __offload::shared_allocator<int> >(5);

    for (int i = 0; i < 5; i++) {
        (*v)[i] = i;
    }

    result = _Cilk_offload test_result();

    if (result != 1)
        printf("Failed\n");
    else
        printf("Passed\n");

    return 0;
}