Intel® C++ Compiler 16.0 User and Reference Guide
This glossary is an alphabetical list of important terms used in this programmer's guide, with brief explanations and definitions.
atomic: Indivisible. An instruction sequence executed by a strand is atomic if it appears at any moment to any other strand as if either no instructions in the sequence have been executed or all instructions in the sequence have been executed.
chip multiprocessor: A general-purpose multiprocessor implemented as a single multicore chip.
cilk_for: A keyword that indicates a for loop whose iterations can be executed independently in parallel.
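For illustration, here is a minimal sketch (the function and array names are placeholders) in which every iteration is independent and may run in parallel:

    #include <cilk/cilk.h>

    void scale(double a[], int n, double factor) {
        // Iterations do not depend on one another, so the runtime
        // may execute them in parallel across the available workers.
        cilk_for (int i = 0; i < n; ++i) {
            a[i] *= factor;
        }
    }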
cilk_spawn: A keyword that indicates that the named subroutine can execute independently and in parallel with the caller.
cilk_sync: A keyword that indicates that all functions spawned within the current function must complete before statements following the cilk_sync can be executed.
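The classic Fibonacci function sketched below illustrates cilk_spawn and cilk_sync together (it is written for exposition, not efficiency):

    #include <cilk/cilk.h>

    int fib(int n) {
        if (n < 2) return n;
        int x = cilk_spawn fib(n - 1); // may run in parallel with the caller
        int y = fib(n - 2);            // executes while fib(n - 1) is in flight
        cilk_sync;                     // wait for the spawned call to return
        return x + y;                  // x may be read only after the sync
    }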
commutative operation: An operation (op), over a type (T), is commutative if a op b = b op a for any two objects, a and b, of type T. Integer addition and set union are commutative, but string concatenation is not.
concurrent agent: A processor, process, thread, or other entity that executes a program instruction potentially simultaneously with other similar concurrent agents.
core: A single processor unit of a multicore chip. The terms "processor" and "CPU" are often used in place of "core", although industry usage varies.
CPU: "Central Processing Unit"; a synonym for "core", or a single processor of a multicore chip.
critical section: The code executed by a strand while holding a lock.
critical-path length: See span.
data race: A race condition that occurs when two or more parallel strands, holding no lock in common, access the same memory location and at least one of the strands performs a write. Compare with determinacy race.
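For example, the following sketch (deliberately incorrect; the names are placeholders) has a data race on counter: parallel strands write the same location while holding no lock in common, so the result varies from run to run. A reducer (see reducer) or a lock would repair it.

    #include <cilk/cilk.h>

    int count_positives(const int a[], int n) {
        int counter = 0;
        cilk_for (int i = 0; i < n; ++i) {
            if (a[i] > 0)
                ++counter;   // DATA RACE: concurrent read-modify-write
        }
        return counter;      // nondeterministic result
    }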
deadlock: A situation in which two or more strands are each waiting for another to release a resource, and the "waiting-for" relation forms a cycle, so that none of them can ever proceed.
determinacy race: A race condition that occurs when two parallel strands access the same memory location and at least one strand performs a write.
determinism: The property of a program that behaves identically from run to run when executed on the same inputs. Deterministic programs are usually easier to debug.
distributed memory: Computer storage that is partitioned among several processors. A distributed-memory multiprocessor is a computer in which processors must send messages to remote processors to access data in remote processor memory. Contrast with shared memory.
execution time: How long a program takes to execute on a given computer system. Also called running time.
false sharing: The situation that occurs when two strands access different memory locations residing on the same cache block, thereby contending for the cache block.
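A minimal sketch (the type and function names are illustrative): the two strands below write different fields, so there is no data race, yet because the fields almost certainly occupy the same cache block, the strands contend for it.

    #include <cilk/cilk.h>

    struct Counters {
        long even_sum;  // adjacent fields: very likely in one cache block
        long odd_sum;
    };

    static void sum_even(const int a[], int n, Counters *c) {
        for (int i = 0; i < n; i += 2) c->even_sum += a[i];
    }

    static void sum_odd(const int a[], int n, Counters *c) {
        for (int i = 1; i < n; i += 2) c->odd_sum += a[i];
    }

    void tally(const int a[], int n, Counters *c) {
        cilk_spawn sum_even(a, n, c);  // touches only even_sum
        sum_odd(a, n, c);              // touches only odd_sum
        cilk_sync;                     // correct, but slowed by false sharing
    }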
global variable: A variable that is bound outside of all local scopes. See also nonlocal variable.
hyperobject: An object which interacts with the Intel® Cilk™ Plus runtime to manage the creation, interaction, and destruction of a set of related data structures as parallel strands are spawned and synced. Hyperobjects include reducers and holders.
instruction: A single operation executed by a processor.
linear speedup: Speedup proportional to the processor count. See also perfect linear speedup.
lock: A synchronization mechanism for providing atomic operation by limiting concurrent access to a resource. Important operations on locks include acquire (lock) and release (unlock). Many locks are implemented as a mutex, whereby only one strand can hold the lock at any time.
lock contention: The situation wherein multiple strands vie for the same lock.
multicore: A semiconductor chip containing more than one processor core.
multiprocessor: A computer containing multiple general-purpose processors.
mutex: A "mutually exclusive" lock that only one strand can acquire at a time, thereby ensuring that only one strand executes the critical section protected by the mutex at a time. Windows* supports several types of locks, including the CRITICAL_SECTION. Linux* supports Pthreads pthread_mutex_t objects.
nondeterminism: The property of a program that behaves differently from run to run when executed on exactly the same inputs. Nondeterministic programs are usually hard to debug.
nonlocal variable: A program variable that is bound outside of the scope of the function, method, or class in which it is used. In Intel® Cilk™ Plus programs, this term refers to variables with a scope outside a cilk_for loop.
parallel loop: A for loop all of whose iterations can be run independently in parallel. The cilk_for keyword designates a parallel loop.
parallelism: The ratio of work to span, which is the largest speedup an application could possibly attain when run on an infinite number of processors.
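In the notation used elsewhere in this glossary (T1 for work, T∞ for span), written in LaTeX form:

    \text{parallelism} \;=\; \frac{\text{work}}{\text{span}} \;=\; \frac{T_1}{T_\infty}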
pedigree: A sequence of 64-bit integers used to uniquely identify a maximal strand.
perfect linear speedup: Speedup equal to the processor count. See also linear speedup.
process: A concurrent agent that has its own address space and is managed by the operating system. Memory can be shared among processes only through the use of explicit operating system calls.
processor: A processor implements the logic to execute program instructions sequentially; the term "core" is used as a synonym.
race condition: A source of nondeterminism whereby the result of a concurrent computation depends on the timing or relative order of the execution of instructions in each individual strand.
receiver: A variable that receives the result of a function call; for example, in the spawn statement x = cilk_spawn f();, the variable x is the receiver.
reducer: A hyperobject used in place of an ordinary variable to allow a parallel associative computation such as an arithmetic sum, string concatenation, or list creation without data races.
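For example, a minimal sketch using reducer_opadd from the Intel® Cilk™ Plus reducer library (the classic wrapper interface is assumed here): the parallel loop sums an array with no lock and no data race.

    #include <cilk/cilk.h>
    #include <cilk/reducer_opadd.h>

    long parallel_sum(const int a[], int n) {
        cilk::reducer_opadd<long> total(0); // each strand updates its own view
        cilk_for (int i = 0; i < n; ++i) {
            total += a[i];                  // race-free accumulation
        }
        return total.get_value();           // views are combined with +
    }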
response time: The time it takes to execute a computation from the time a human user provides an input to the time the user gets the result.
running time: How long a program takes to execute on a given computer system. Also called execution time.
scale down: The ability of a parallel application to run efficiently on one or a small number of processors.
scale out: The ability to run multiple copies of an application efficiently on a large number of processors.
scale up: The ability of a parallel application to run efficiently on a large number of processors. See also linear speedup.
sequential consistency: The memory model for concurrency wherein the effect of concurrent agents is as if their operations on shared memory were interleaved in a global order consistent with the orders in which each agent executed them.
serial execution: Execution of the serialization of an Intel® Cilk™ Plus program.
serial semantics: The property that a deterministic Intel® Cilk™ Plus program produces the same results as the serialization of the program.
serialization: The C/C++ program that results from stubbing out the Intel® Cilk™ Plus keywords of a program, where cilk_spawn and cilk_sync are elided and cilk_for is replaced with an ordinary for. The serialization can be used for debugging and, in the case of a converted C/C++ program, will behave exactly as the original C/C++ program. The term "serial elision" is used in some of the literature.
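The idea can be sketched with the preprocessor (the Intel® Cilk™ Plus headers provide a stub header, cilk/cilk_stub.h, to similar effect); compiling with these definitions in force yields the serialization:

    #define cilk_spawn       /* elided: the spawned call becomes a plain call */
    #define cilk_sync        /* elided: nothing runs in parallel to wait for  */
    #define cilk_for   for   /* the parallel loop becomes an ordinary for     */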
shared memory: Computer storage that is shared among several processors. A shared-memory multiprocessor is a computer in which each processor can directly address any memory location. Contrast with distributed memory.
span: The theoretically fastest execution time for a parallel program when run on an infinite number of processors, discounting overheads for communication and scheduling. Often denoted by T∞ in the literature, and sometimes called critical-path length.
spawn: To call a function without waiting for it to return, as an ordinary call would. The caller can continue to execute in parallel with the called function. See also cilk_spawn.
speedup: How many times faster a program runs in parallel than on one processor. Speedup on P processors can be computed by dividing the program's running time T1 on one processor by its running time TP on P processors.
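In LaTeX form, using the notation of the work and span entries:

    \text{speedup on } P \text{ processors} \;=\; \frac{T_1}{T_P}
    \qquad \text{(perfect linear speedup when } T_1 / T_P = P \text{)}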
strand: A serial chain of instructions containing no spawns, syncs, returns from spawn, or other parallel control.
sync: To wait for a set of spawned functions to return before proceeding. The current function is dependent upon the spawned functions and cannot proceed in parallel with them. See also cilk_sync.
thread: A concurrent agent that shares an address space with other threads within the same process. Scheduling of threads is typically managed by the operating system.
throughput: The number of operations performed per unit time.
view: The state of a hyperobject as seen by a given strand.
work: The running time of a program when run on one processor, sometimes denoted by T1.
work stealing: A scheduling strategy in which processors post parallel work locally and, when a processor runs out of local work, it steals work from another processor. Work-stealing schedulers are notable for their efficiency, because they incur no communication or synchronization overhead when there is ample parallelism. The Intel® Cilk™ Plus runtime system employs a work-stealing scheduler.
worker: A concurrent agent that executes the instructions in one strand, possibly at the same time that another worker executes instructions in a parallel strand. Workers are managed by the Intel® Cilk™ Plus runtime system's work-stealing scheduler. A worker is implemented as an operating system thread.