Data Types and Calling Conventions

Data types specific to the Short Vector Random Number Generator (SVRNG) Library

SVRNG functions fall into two groups: initialization and service routines, and generation functions. The initialization and service routines introduce two new data types:

svrng_engine_t	A pointer to the engine-specific data structure created by the engine initialization routine. The structure contains pre-computed constants necessary for fast and precise random number vector generation by the engine. The structure size is engine-dependent.
svrng_distribution_t	A pointer to the distribution-specific data structure created by the distribution initialization routine. The structure contains pre-computed, loop-invariant constants that allow distribution transformations to be performed efficiently. The structure size is distribution-dependent.
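
The opaque-pointer pattern described above can be illustrated with a minimal sketch. Every name here (demo_uniform, demo_distribution_t, demo_new_uniform, demo_transform, demo_delete) is hypothetical and is not part of SVRNG; the structure layout is an assumption made only to show how an initialization routine can pre-compute constants once so that the per-value transformation stays cheap.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for distribution-specific data. NOT the SVRNG layout. */
struct demo_uniform {
    float scale;   /* (b - a) / 2^32, computed once at initialization */
    float offset;  /* a */
};

/* The handle is a pointer to the structure, in the spirit of svrng_distribution_t. */
typedef struct demo_uniform *demo_distribution_t;

/* Initialization routine: allocate the structure and pre-compute the
   loop-invariant constants. Returns NULL if the allocation fails. */
demo_distribution_t demo_new_uniform(float a, float b)
{
    demo_distribution_t d = malloc(sizeof *d);
    if (d != NULL) {
        d->scale = (b - a) / 4294967296.0f;
        d->offset = a;
    }
    return d;
}

/* Transformation step: maps a raw 32-bit value linearly from a toward b
   using only the pre-computed constants (one multiply, one add). */
float demo_transform(demo_distribution_t d, unsigned int raw)
{
    assert(d != NULL);
    return d->offset + d->scale * (float)raw;
}

/* Service routine: release the structure created by demo_new_uniform. */
void demo_delete(demo_distribution_t d)
{
    free(d);
}
```

Because the caller only ever holds the pointer, the structure size can differ from one distribution to another without changing the calling code.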

While scalar SVRNG generation functions return native "C" data types (float, double, 32-bit and 64-bit integers), the SIMD-vector versions produce 1¹, 2, 4, 8, 16, or 32 packed results in one or several SIMD-vector registers. A set of SVRNG-specific vector types has been introduced to return these packed results. These types are CPU-specific and map to different numbers of SIMD registers depending on the architecture on which the program runs:

Type name	Number of packed values	SSE2 (default)	AVX2²	MIC	MIC AVX512 / CPU AVX512
Unsigned 32-bit integer
*svrng_uint1_t*	1	__m128i	__m128i	__m512i	__m128i
*svrng_uint2_t*	2	__m128i	__m128i	__m512i	__m128i
*svrng_uint4_t*	4	__m128i	__m128i	__m512i	__m128i
*svrng_uint8_t*	8	struct { __m128i r[2]; }	__m256i	__m512i	__m256i
*svrng_uint16_t*	16	struct { __m128i r[4]; }	struct { __m256i r[2]; }	__m512i	__m512i
*svrng_uint32_t*	32	struct { __m128i r[8]; }	struct { __m256i r[4]; }	struct { __m512i r[2]; }	struct { __m512i r[2]; }
Unsigned 64-bit integer
*svrng_ulong1_t*	1	__m128i	__m128i	__m512i	__m128i
*svrng_ulong2_t*	2	__m128i	__m128i	__m512i	__m128i
*svrng_ulong4_t*	4	struct { __m128i r[2]; }	__m256i	__m512i	__m256i
*svrng_ulong8_t*	8	struct { __m128i r[4]; }	struct { __m256i r[2]; }	__m512i	__m512i
*svrng_ulong16_t*	16	struct { __m128i r[8]; }	struct { __m256i r[4]; }	struct { __m512i r[2]; }	struct { __m512i r[2]; }
*svrng_ulong32_t*	32	struct { __m128i r[16]; }	struct { __m256i r[8]; }	struct { __m512i r[4]; }	struct { __m512i r[4]; }
Signed 32-bit integer
*svrng_int1_t*	1	__m128i	__m128i	__m512i	__m128i
*svrng_int2_t*	2	__m128i	__m128i	__m512i	__m128i
*svrng_int4_t*	4	__m128i	__m128i	__m512i	__m128i
*svrng_int8_t*	8	struct { __m128i r[2]; }	__m256i	__m512i	__m256i
*svrng_int16_t*	16	struct { __m128i r[4]; }	struct { __m256i r[2]; }	__m512i	__m512i
*svrng_int32_t*	32	struct { __m128i r[8]; }	struct { __m256i r[4]; }	struct { __m512i r[2]; }	struct { __m512i r[2]; }
Single-precision floating point
*svrng_float1_t*	1	__m128	__m128	__m512	__m128
*svrng_float2_t*	2	__m128	__m128	__m512	__m128
*svrng_float4_t*	4	__m128	__m128	__m512	__m128
*svrng_float8_t*	8	struct { __m128 r[2]; }	__m256	__m512	__m256
*svrng_float16_t*	16	struct { __m128 r[4]; }	struct { __m256 r[2]; }	__m512	__m512
*svrng_float32_t*	32	struct { __m128 r[8]; }	struct { __m256 r[4]; }	struct { __m512 r[2]; }	struct { __m512 r[2]; }
Double-precision floating point
*svrng_double1_t*	1	__m128d	__m128d	__m512d	__m128d
*svrng_double2_t*	2	__m128d	__m128d	__m512d	__m128d
*svrng_double4_t*	4	struct { __m128d r[2]; }	__m256d	__m512d	__m256d
*svrng_double8_t*	8	struct { __m128d r[4]; }	struct { __m256d r[2]; }	__m512d	__m512d
*svrng_double16_t*	16	struct { __m128d r[8]; }	struct { __m256d r[4]; }	struct { __m512d r[2]; }	struct { __m512d r[2]; }
*svrng_double32_t*	32	struct { __m128d r[16]; }	struct { __m256d r[8]; }	struct { __m512d r[4]; }	struct { __m512d r[4]; }

¹ SIMD functions with 1 "packed" result were added mainly for consistency and to make it easy to port hand-written intrinsic code to sections of different lengths (including length = 1), in other words, for "peeling", "unrolling", or "clean-up".

² Note that SVRNG does not have optimizations specific to the Intel® Advanced Vector Extensions (Intel® AVX) instruction set. On hardware that supports Intel® AVX, the default Intel® Streaming SIMD Extensions 2 (Intel® SSE2) versions are called, so you must use the Intel® SSE2 data types to interpret the results.
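
As an illustration of the SSE2 (default) column in the table, the sketch below mirrors the layout listed for svrng_float8_t, struct { __m128 r[2]; }, and copies its eight packed values into a plain float array. The names demo_float8_t, demo_make_float8, and demo_store_float8 are hypothetical and not part of SVRNG. The element order (r[0] holding the first four values) is an assumption made for the example, not something the table states.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also provides __m128 and _mm_loadu_ps/_mm_storeu_ps */

/* Hypothetical mirror of the SSE2 layout listed for svrng_float8_t:
   8 packed floats held in two 128-bit registers. */
typedef struct { __m128 r[2]; } demo_float8_t;

/* Build a packed value from 8 floats in memory (for demonstration only). */
demo_float8_t demo_make_float8(const float *in)
{
    demo_float8_t v;
    assert(in != NULL);
    v.r[0] = _mm_loadu_ps(in);      /* elements 0..3 (assumed order) */
    v.r[1] = _mm_loadu_ps(in + 4);  /* elements 4..7 */
    return v;
}

/* Copy the 8 packed floats into an ordinary array.
   Unaligned stores are used so that out needs no special alignment. */
void demo_store_float8(float *out, demo_float8_t v)
{
    assert(out != NULL);
    _mm_storeu_ps(out, v.r[0]);
    _mm_storeu_ps(out + 4, v.r[1]);
}
```

Code that interprets results this way is tied to one column of the table: on an AVX2 target the same type is a single __m256, so the unpacking step would have to change accordingly.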

SVRNG calling conventions

All SVRNG routines use the regcall calling convention, which maximizes the use of hardware vector registers for passing parameters and returning results. This avoids unnecessary register spills to and fills from memory and improves performance. See the "C/C++ Calling Conventions" section and the "_vectorcall and __regcall demystified" article referenced in the Introduction.

In addition, this convention makes it possible to use the "vector variant" declaration feature specific to the Intel® compiler. The declaration specifies a vector variant function that corresponds to its original C/C++ scalar function. This vector variant function can be invoked in a vector context at the call site. See the vector_variant section for more detail. All SIMD-vector SVRNG intrinsics (except packed length = 1) are declared in the svrng.h header file as vector_variant to support automatic vectorization.
