Intel® C++ Compiler 16.0 User and Reference Guide
There are two types of SVRNG functions: initialization and service routines, and generation functions. The initialization and service routines introduce two new data types:
svrng_engine_t | A pointer to the engine-specific data structure created by the engine initialization routine. The structure contains pre-computed constants necessary for fast and precise random number vector generation by the engine. The structure size is engine-dependent. |
svrng_distribution_t | A pointer to the distribution-specific data structure created by the distribution initialization routine. The structure contains pre-computed loop-invariant constants used to perform distribution transformations efficiently. The structure size is distribution-dependent. |
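The handle pattern described above can be sketched in plain C. This is only an illustration under stated assumptions: every `demo_*` name is hypothetical, the generator is a simple linear congruential generator rather than any SVRNG engine, and the real SVRNG structures are not shown here. The point is that the initialization routines compute the constants once, so the generation function only performs a few arithmetic operations per value.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; these are NOT the SVRNG
   definitions behind svrng_engine_t and svrng_distribution_t. */
typedef struct demo_engine_data {
    uint32_t state;       /* current generator state */
    uint32_t multiplier;  /* constants fixed once at initialization */
    uint32_t increment;
} *demo_engine_t;

typedef struct demo_distribution_data {
    float shift;  /* lower bound a */
    float scale;  /* (b - a) / 2^24, computed once, loop invariant */
} *demo_distribution_t;

/* Engine initialization: allocate the structure and store its constants. */
demo_engine_t demo_new_engine(uint32_t seed)
{
    demo_engine_t e = malloc(sizeof *e);
    if (e) {
        e->state = seed;
        e->multiplier = 1664525u;      /* common 32-bit LCG constants */
        e->increment = 1013904223u;
    }
    return e;
}

/* Distribution initialization: pre-compute the scale used by every draw. */
demo_distribution_t demo_new_uniform_float(float a, float b)
{
    demo_distribution_t d = malloc(sizeof *d);
    if (d) {
        d->shift = a;
        d->scale = (b - a) / 16777216.0f;  /* 2^24 */
    }
    return d;
}

/* Generation: advance the engine, then apply the stored transformation. */
float demo_generate_float(demo_engine_t e, demo_distribution_t d)
{
    e->state = e->state * e->multiplier + e->increment;  /* wraps mod 2^32 */
    /* Keep the top 24 bits so the integer converts to float exactly. */
    return d->shift + d->scale * (float)(e->state >> 8);
}

/* Service routines: release the structures created above. */
void demo_delete_engine(demo_engine_t e) { free(e); }
void demo_delete_distribution(demo_distribution_t d) { free(d); }
```

A caller would create one engine and one distribution, call `demo_generate_float` in a loop, and delete both handles at the end. Because the handles are pointers, passing them to the generation function is cheap regardless of how large the underlying structures are.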
While scalar SVRNG generation functions return native "C" data types (float, double, 32-bit and 64-bit integers), the SIMD-vector versions produce 1 (see note 1), 2, 4, 8, 16, or 32 packed results in one or several SIMD-vector registers. A set of SVRNG-specific vector types has been introduced to return these packed results. These types are CPU-specific and map to different numbers of SIMD registers depending on the architecture where the program runs:
Type name | Number of packed values | SSE2 (default) | AVX2 (see note 2) | MIC | MIC AVX512 / CPU AVX512 |
Unsigned 32-bit integer ||||||
svrng_uint1_t | 1 | __m128i | __m128i | __m512i | __m128i |
svrng_uint2_t | 2 | __m128i | __m128i | __m512i | __m128i |
svrng_uint4_t | 4 | __m128i | __m128i | __m512i | __m128i |
svrng_uint8_t | 8 | struct { __m128i r[2]; } | __m256i | __m512i | __m256i |
svrng_uint16_t | 16 | struct { __m128i r[4]; } | struct { __m256i r[2]; } | __m512i | __m512i |
svrng_uint32_t | 32 | struct { __m128i r[8]; } | struct { __m256i r[4]; } | struct { __m512i r[2]; } | struct { __m512i r[2]; } |
Unsigned 64-bit integer ||||||
svrng_ulong1_t | 1 | __m128i | __m128i | __m512i | __m128i |
svrng_ulong2_t | 2 | __m128i | __m128i | __m512i | __m128i |
svrng_ulong4_t | 4 | struct { __m128i r[2]; } | __m256i | __m512i | __m256i |
svrng_ulong8_t | 8 | struct { __m128i r[4]; } | struct { __m256i r[2]; } | __m512i | __m512i |
svrng_ulong16_t | 16 | struct { __m128i r[8]; } | struct { __m256i r[4]; } | struct { __m512i r[2]; } | struct { __m512i r[2]; } |
svrng_ulong32_t | 32 | struct { __m128i r[16]; } | struct { __m256i r[8]; } | struct { __m512i r[4]; } | struct { __m512i r[4]; } |
Signed 32-bit integer ||||||
svrng_int1_t | 1 | __m128i | __m128i | __m512i | __m128i |
svrng_int2_t | 2 | __m128i | __m128i | __m512i | __m128i |
svrng_int4_t | 4 | __m128i | __m128i | __m512i | __m128i |
svrng_int8_t | 8 | struct { __m128i r[2]; } | __m256i | __m512i | __m256i |
svrng_int16_t | 16 | struct { __m128i r[4]; } | struct { __m256i r[2]; } | __m512i | __m512i |
svrng_int32_t | 32 | struct { __m128i r[8]; } | struct { __m256i r[4]; } | struct { __m512i r[2]; } | struct { __m512i r[2]; } |
Single-precision floating point ||||||
svrng_float1_t | 1 | __m128 | __m128 | __m512 | __m128 |
svrng_float2_t | 2 | __m128 | __m128 | __m512 | __m128 |
svrng_float4_t | 4 | __m128 | __m128 | __m512 | __m128 |
svrng_float8_t | 8 | struct { __m128 r[2]; } | __m256 | __m512 | __m256 |
svrng_float16_t | 16 | struct { __m128 r[4]; } | struct { __m256 r[2]; } | __m512 | __m512 |
svrng_float32_t | 32 | struct { __m128 r[8]; } | struct { __m256 r[4]; } | struct { __m512 r[2]; } | struct { __m512 r[2]; } |
Double-precision floating point ||||||
svrng_double1_t | 1 | __m128d | __m128d | __m512d | __m128d |
svrng_double2_t | 2 | __m128d | __m128d | __m512d | __m128d |
svrng_double4_t | 4 | struct { __m128d r[2]; } | __m256d | __m512d | __m256d |
svrng_double8_t | 8 | struct { __m128d r[4]; } | struct { __m256d r[2]; } | __m512d | __m512d |
svrng_double16_t | 16 | struct { __m128d r[8]; } | struct { __m256d r[4]; } | struct { __m512d r[2]; } | struct { __m512d r[2]; } |
svrng_double32_t | 32 | struct { __m128d r[16]; } | struct { __m256d r[8]; } | struct { __m512d r[4]; } | struct { __m512d r[4]; } |
1 SIMD functions with a single "packed" result have been added mostly for consistency and to make it easy to adapt manually written intrinsic code to sections of different lengths (including length = 1), in other words, for "peeling", "unrolling", or "clean-up".
2 Note that SVRNG does not have optimizations specific to the Intel® Advanced Vector Extensions (Intel® AVX) instruction set. On hardware that supports only Intel® AVX, the default Intel® Streaming SIMD Extensions 2 (Intel® SSE2) versions are called, so you must use the Intel® SSE2 data types to interpret the results.
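The entries in the table appear to follow a simple pattern: the packed result occupies `count * element_bits` bits, the narrowest available register that holds it is chosen, and once the widest register is too small the result becomes a struct of several widest registers. The sketch below encodes that observation in plain C. It is an assumption inferred from the table, not a documented SVRNG rule, and `demo_layout_t` and `demo_packed_layout` are hypothetical names that do not exist in svrng.h.

```c
/* Hypothetical helper, not part of SVRNG: predicts how a packed result
   would be laid out on a target, following the pattern seen in the table. */
typedef struct {
    unsigned register_bits;   /* width of each SIMD register used */
    unsigned register_count;  /* 1: a bare register; n > 1: struct { ... r[n]; } */
} demo_layout_t;

/* min_reg_bits / max_reg_bits describe the target's register widths:
     SSE2:    128 / 128
     AVX2:    128 / 256
     MIC:     512 / 512
     AVX512:  128 / 512                                                  */
demo_layout_t demo_packed_layout(unsigned count, unsigned elem_bits,
                                 unsigned min_reg_bits, unsigned max_reg_bits)
{
    demo_layout_t out;
    unsigned total_bits = count * elem_bits;
    unsigned width = min_reg_bits;

    /* Widen the register until the result fits or the target's limit is hit. */
    while (width < total_bits && width < max_reg_bits)
        width *= 2;

    out.register_bits = width;
    /* Round up: a partially filled register still counts as one register. */
    out.register_count = (total_bits + width - 1) / width;
    return out;
}
```

For example, `demo_packed_layout(8, 32, 128, 128)` describes two 128-bit registers, which matches the `struct { __m128i r[2]; }` entry for svrng_uint8_t in the SSE2 column, while `demo_packed_layout(8, 32, 128, 256)` describes a single 256-bit register, matching `__m256i` in the AVX2 column.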
All SVRNG routines use the regcall calling convention, which makes the greatest possible use of hardware vector registers for passing parameters and returning results. This avoids unnecessary memory spills and fills of registers and improves performance. See the "C/C++ Calling Conventions" section and the "_vectorcall and __regcall demystified" article referenced in the Introduction.
In addition, this convention provides the opportunity to deploy the "vector variant" declaration feature specific to the Intel® compiler. The declaration specifies a vector variant function that corresponds to its original C/C++ scalar function. This vector variant function can be invoked in a vector context at the call site. See the vector_variant section for more detail. All SIMD-vector SVRNG intrinsics (except those with packed length = 1) are declared in the svrng.h header file as vector_variant to support automatic vectorization.