Intel® C++ Compiler 16.0 User and Reference Guide

Glossary

The glossary is an alphabetical list of important terms used in this programmer's guide, with a brief explanation or definition for each.

atomic

Indivisible. An instruction sequence executed by a strand is atomic if it appears at any moment to any other strand as if either no instructions in the sequence have been executed or all instructions in the sequence have been executed.

chip multiprocessor

A general-purpose multiprocessor implemented as a single multicore chip.

cilk_for

A keyword that indicates a for loop whose iterations can be executed independently in parallel.
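
For illustration, a minimal sketch (the function and array names are hypothetical):

    #include <cilk/cilk.h>

    void vector_add(float *a, const float *b, const float *c, int n)
    {
        cilk_for (int i = 0; i < n; ++i)    // iterations may execute in parallel
            a[i] = b[i] + c[i];
    }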

cilk_spawn

A keyword that indicates that the named subroutine can execute independently and in parallel with the caller.
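
For illustration, the classic recursive sketch, paired with cilk_sync (described below):

    #include <cilk/cilk.h>

    int fib(int n)
    {
        if (n < 2)
            return n;
        int x = cilk_spawn fib(n - 1);  // may run in parallel with the caller
        int y = fib(n - 2);
        cilk_sync;                      // wait for the spawned call to complete
        return x + y;
    }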

cilk_sync

A keyword that indicates that all functions spawned within the current function must complete before statements following the cilk_sync can be executed.

commutative operation

An operation op over a type T is commutative if a op b = b op a for any two objects a and b of type T. Integer addition and set union are commutative, but string concatenation is not.

concurrent agent

A processor, process, thread, or other entity that executes program instructions, potentially simultaneously with other concurrent agents.

core

A single processor unit of a multicore chip. The terms "processor" and "CPU" are often used in place of "core", although industry usage varies.

CPU

"Central Processing Unit"; a synonym for "core", or a single processor of a multicore chip.

critical section

The code executed by a strand while holding a lock.

critical-path length

See span.

data race

A race condition that occurs when two or more parallel strands, holding no lock in common, access the same memory location and at least one of the strands performs a write. Compare with determinacy race.
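
For illustration, a minimal sketch of a loop with a data race (the function and array names are hypothetical):

    #include <cilk/cilk.h>

    int sum_with_race(const int *a, int n)
    {
        int sum = 0;
        cilk_for (int i = 0; i < n; ++i)
            sum += a[i];    // data race: parallel strands read and write sum, holding no lock in common
        return sum;
    }

A reducer (see reducer) removes this race without introducing a lock.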

deadlock

A situation in which two or more strands are each waiting for another to release a resource, and the "waiting-for" relation forms a cycle so that none of them can ever proceed.

determinacy race

A race condition that occurs when two parallel strands access the same memory location and at least one strand performs a write.

determinism

The property of a program that behaves identically from run to run when executed on the same inputs. Deterministic programs are usually easier to debug.

distributed memory

Computer storage that is partitioned among several processors. A distributed-memory multiprocessor is a computer in which processors must send messages to remote processors to access data in remote processor memory. Contrast with shared memory.

execution time

How long a program takes to execute on a given computer system. Also called running time.

false sharing

The situation that occurs when two strands access different memory locations residing on the same cache block, thereby contending for the cache block.
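
For illustration, a sketch in which two strands contend for one cache block while touching different locations (the names are hypothetical, and the example assumes both counters fit in one cache block):

    #include <cilk/cilk.h>

    int counters[2];    // adjacent ints almost certainly share a cache block

    void bump(int i, int n)
    {
        for (int k = 0; k < n; ++k)
            ++counters[i];
    }

    void count_in_parallel(int n)
    {
        cilk_spawn bump(0, n);  // one strand writes counters[0]
        bump(1, n);             // another writes counters[1]: false sharing
        cilk_sync;
    }

Padding each counter to a full cache block, or accumulating into strand-local variables, removes the contention.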

global variable

A variable that is bound outside of all local scopes. See also nonlocal variable.

hyperobject

An object which interacts with the Intel® Cilk™ Plus runtime to manage the creation, interaction, and destruction of a set of related data structures as parallel strands are spawned and synced. Hyperobjects include reducers and holders.

instruction

A single operation executed by a processor.

linear speedup

Speedup proportional to the processor count. See also perfect linear speedup.

lock

A synchronization mechanism for providing atomic operation by limiting concurrent access to a resource. Important operations on locks include acquire (lock) and release (unlock). Many locks are implemented as a mutex, whereby only one strand can hold the lock at any time.

lock contention

The situation wherein multiple strands vie for the same lock.

multicore

A semiconductor chip containing more than one processor core.

multiprocessor

A computer containing multiple general-purpose processors.

mutex

A "mutually exclusive" lock that only one strand can acquire at a time, thereby ensuring that only one strand executes the critical section protected by the mutex at a time. Windows* supports several types of locks, including the CRITICAL_SECTION. Linux* supports Pthreads pthread_mutex_t objects.

nondeterminism

The property of a program that behaves differently from run to run when executed on exactly the same inputs. Nondeterministic programs are usually hard to debug.

nonlocal variable

A program variable that is bound outside of the scope of the function, method, or class in which it is used. In Intel® Cilk™ Plus programs, this term refers to variables with a scope outside a cilk_for loop.

parallel loop

A for loop all of whose iterations can be run independently in parallel. The cilk_for keyword designates a parallel loop.

parallelism

The ratio of work to span, which is the largest speedup an application could possibly attain when run on an infinite number of processors.
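
For example (with hypothetical figures), a computation with work T1 = 1000 units and span T∞ = 10 units has parallelism 1000/10 = 100, so it cannot profitably use more than 100 processors.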

pedigree

A sequence of 64-bit integers used to uniquely identify a maximal strand.

perfect linear speedup

Speedup equal to the processor count. See also linear speedup.

process

A concurrent agent that has its own address space and is managed by the operating system. Memory can be shared among processes only through the use of explicit operating system calls.

processor

A processor implements the logic to execute program instructions sequentially; the term "core" is used as a synonym.

race condition

A source of nondeterminism whereby the result of a concurrent computation depends on the timing or relative order of the execution of instructions in each individual strand.

receiver

A variable that receives the result of a function call; in an Intel® Cilk™ Plus program, typically the variable assigned by a cilk_spawn expression.

reducer

A hyperobject used in place of an ordinary variable to allow a parallel associative computation such as an arithmetic sum, string concatenation, or list creation without data races.
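
For illustration, a minimal sketch that sums an array with the reducer library (the function and array names are hypothetical):

    #include <cilk/cilk.h>
    #include <cilk/reducer_opadd.h>

    int sum_array(const int *a, int n)
    {
        cilk::reducer_opadd<int> sum(0);
        cilk_for (int i = 0; i < n; ++i)
            sum += a[i];            // each strand updates its own view: no data race
        return sum.get_value();     // views are combined as strands sync
    }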

response time

The time it takes to execute a computation from the time a human user provides an input to the time the user gets the result.

running time

How long a program takes to execute on a given computer system. Also called execution time.

scale down

The ability of a parallel application to run efficiently on one or a small number of processors.

scale out

The ability to run multiple copies of an application efficiently on a large number of processors.

scale up

The ability of a parallel application to run efficiently on a large number of processors. See also linear speedup.

sequential consistency

The memory model for concurrency wherein the effect of concurrent agents is as if their operations on shared memory were interleaved in a global order consistent with the orders in which each agent executed them.

serial execution

Execution of the serialization of an Intel® Cilk™ Plus program.

serial semantics

A deterministic Intel® Cilk™ Plus program will produce the same results as the serialization of the program. This is referred to as “serial semantics.”

serialization

The C/C++ program that results from stubbing out the Intel® Cilk™ Plus keywords of a program, where cilk_spawn and cilk_sync are elided and cilk_for is replaced with an ordinary for. The serialization can be used for debugging and, in the case of a converted C/C++ program, will behave exactly as the original C/C++ program. The term "serial elision" is used in some of the literature.
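
For illustration, stubbing out the keywords in this fragment (f, g, and body are hypothetical):

    int x = cilk_spawn f();             // serialization: int x = f();
    g();
    cilk_sync;                          // serialization: elided
    cilk_for (int i = 0; i < n; ++i)    // serialization: for (int i = 0; i < n; ++i)
        body(i);

The runtime headers include a stub header, cilk/cilk_stub.h, that performs this substitution.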

shared memory

Computer storage that is shared among several processors. A shared-memory multiprocessor is a computer in which each processor can directly address any memory location. Contrast with distributed memory.

span

The theoretically fastest execution time for a parallel program when run on an infinite number of processors, discounting overheads for communication and scheduling. Often denoted by T∞ in the literature, and sometimes called critical-path length.

spawn

To call a function without waiting for it to return, as a normal call would. The caller can continue to execute in parallel with the called function. See also cilk_spawn.

speedup

How many times faster a program is when run in parallel than when run on one processor. Speedup can be computed by dividing the program's running time T1 on one processor by its running time TP on P processors.
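
For example (with hypothetical timings), a program that runs in 60 seconds on one processor and in 15 seconds on 4 processors achieves a speedup of 60/15 = 4, which is perfect linear speedup.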

strand

A serial chain of instructions containing no spawns, syncs, returns from spawn, or other parallel control.

sync

To wait for a set of spawned functions to return before proceeding. The current function is dependent upon the spawned functions and cannot proceed in parallel with them. See also cilk_sync.

thread

A concurrent agent that shares an address space with other threads within the same process. Scheduling of threads is typically managed by the operating system.

throughput

The number of operations performed per unit time.

view

The state of a hyperobject as seen by a given strand.

work

The running time of a program when run on one processor, sometimes denoted by T1.

work stealing

A scheduling strategy where processors post parallel work locally and, when a processor runs out of local work, it steals work from another processor. Work-stealing schedulers are notable for their efficiency, because they incur no communication or synchronization overhead when there is ample parallelism. The Intel® Cilk™ Plus runtime system employs a work-stealing scheduler.

worker

A concurrent agent that executes the instructions in one strand, possibly at the same time that another worker executes instructions in a parallel strand. Workers are managed by the Intel® Cilk™ Plus runtime system's work-stealing scheduler. A worker is implemented as an operating system thread.