Using Coarrays

Coarrays are not supported on OS X* systems.

Coarrays, a data sharing concept standardized in Fortran 2008, enable parallel processing using multiple copies of a single program. Each copy, called an image, has ordinary local variables and also shared variables called coarrays or covariables. A covariable, which can be either an array or a scalar, is a variable whose storage spans across all the images in the program. In this Partitioned Global Address Space (PGAS) model, each image can access its own piece of a covariable as a local variable and can access those pieces that live on other images using coindices, which are enclosed in square brackets.
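
The difference between local access and coindexed access can be illustrated with a minimal sketch. The program and variable names (covariable_sketch, counter, samples, neighbor) are hypothetical and chosen only for illustration; the sketch assumes the program is compiled with coarray support enabled.

```fortran
program covariable_sketch
  implicit none
  ! A scalar covariable: one copy of "counter" exists on every image.
  integer :: counter[*]
  ! An array covariable: every image holds its own 4-element piece.
  real :: samples(4)[*]
  integer :: neighbor

  ! No coindex: each image writes its own local piece.
  counter = this_image()
  samples = real(this_image())

  ! Make sure every image has finished writing before any remote read.
  sync all

  ! Coindex in square brackets: read the piece that lives on another image.
  neighbor = this_image() + 1
  if (neighbor > num_images()) neighbor = 1
  print '(a,i0,a,i0,a,i0)', 'image ', this_image(), &
        ' sees counter[', neighbor, '] = ', counter[neighbor]
end program covariable_sketch
```

Each image prints the value stored by its right-hand neighbor, wrapping around to image 1; the order of the output lines is not defined because the images run concurrently.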

Intel® Fortran supports coarray programs that run using shared memory on a multicore or multiprocessor system. With an optional license, coarray programs can also run using distributed memory across a Linux* or Windows* cluster. They can also run on Linux* systems using Intel® Many Integrated Core Architecture (Intel® MIC Architecture).

Please refer to the product system requirements in the Release Notes for further details.

For more information on how to write programs using coarrays, see books on the Fortran 2008 language or the ISO Fortran 2008 standard.

Using Coarray Program Syntax

The additional syntax required by coarrays includes:

CODIMENSION attribute and "[cobounds]" to declare an object a coarray (covariable)
[coindices] notation to reference covariables on other images
SYNC ALL, SYNC IMAGES, and SYNC MEMORY statements to provide points where images must communicate to synchronize shared data
CRITICAL and END CRITICAL statements to form a block of code executed by one image at a time
LOCK and UNLOCK statements to control objects called locks, used to synchronize actions on specific images
ERROR STOP statement to end all images
ALLOCATE and DEALLOCATE statements, which may specify coarrays
Intrinsic procedures IMAGE_INDEX, LCOBOUND, NUM_IMAGES, THIS_IMAGE, and UCOBOUND
ATOMIC_DEFINE and ATOMIC_REF for defining and referencing an atomic variable
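
As a rough illustration of how several of these elements fit together, the following sketch sums the image indices on image 1. All names (syntax_sketch, total, work, me, n) are hypothetical; this is one possible arrangement, not a recommended pattern.

```fortran
program syntax_sketch
  implicit none
  integer, codimension[*] :: total     ! CODIMENSION attribute form
  real, allocatable :: work(:)[:]      ! allocatable coarray
  integer :: me, n

  me = this_image()
  n  = num_images()

  ! ALLOCATE of a coarray: every image executes it with the same bounds.
  allocate(work(8)[*])
  work = real(me)

  total = 0
  sync all                             ! total on image 1 is now initialized

  ! CRITICAL: only one image at a time updates the copy on image 1.
  critical
    total[1] = total[1] + me
  end critical

  sync all                             ! all contributions have arrived

  if (me == 1) then
    ! The sum of 1..n is n*(n+1)/2; ERROR STOP would end all images.
    if (total /= n*(n+1)/2) error stop 'unexpected total'
    print '(a,i0)', 'sum of image indices = ', total
  end if

  deallocate(work)
end program syntax_sketch
```

The two SYNC ALL statements bracket the critical section so that no image updates total[1] before image 1 has zeroed it, and image 1 does not read the result before every image has contributed.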

Using the Coarray Compiler Options

You must use the [Q]coarray compiler option to enable the compiler to recognize coarray syntax. If you do not use this compiler option, a program that uses coarray syntax or features produces a compile-time error.

Only one of the options in the following list takes effect on a command line; if you specify multiple coarray compiler options, the last one specified is used. The exception is the [Q]coarray compiler option with the single keyword: if specified, it takes precedence regardless of where it appears on the command line.

Using [Q]coarray with no keyword is equivalent to running on one node (shared memory).
Using [Q]coarray with the shared keyword causes the underlying Intel® MPI Library parallelization to run on one node with multiple cores or processors with shared memory.
Using [Q]coarray with the distributed keyword requires an Intel® Cluster Toolkit license to be installed and causes the underlying Intel® MPI Library parallelization to run in a multi-node environment (multiple CPUs with distributed memory).
Using -coarray=coprocessor (Linux*) specifies a configuration where the first image runs on the host and other images run on the host or the target. A configuration file specifies your exact configuration and determines where the images will be run. Use the -coarray-config-file=filename compiler option to specify a configuration file.
Using [Q]coarray with the single keyword creates an executable that will not be replicated, resulting in a single running image. (This is in contrast to the self-replicating behavior that occurs when any other coarray keyword is specified.) This option is useful for debugging purposes.

No special procedure is necessary to run a program that uses coarrays; you simply run the executable file. The underlying parallelization implementation uses the Intel® MPI Library. Installation of the compiler automatically installs the necessary Intel® MPI run-time libraries to run on shared memory. The Intel® Cluster Toolkit installs the necessary Intel® MPI Library run-time libraries to run on distributed memory. Use of coarray applications with any other MPI implementation, or with OpenMP*, is not supported.

By default, the number of images created is equal to the number of execution units on the current system. You can override this by specifying a number using the [Q]coarray-num-images compiler option on the ifort command line that compiles the main program. You can also specify the number of images at execution time in the environment variable FOR_COARRAY_NUM_IMAGES.
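
To check which image count actually took effect, a small program can report it. The sketch below is an assumption-laden illustration: the program name report_images and its variables are hypothetical, and it simply echoes the FOR_COARRAY_NUM_IMAGES environment variable next to the value returned by NUM_IMAGES.

```fortran
program report_images
  implicit none
  character(len=32) :: setting
  integer :: length, status

  ! Only image 1 reports, so the output appears once.
  if (this_image() == 1) then
    ! status is 0 when the variable exists and fits in "setting".
    call get_environment_variable('FOR_COARRAY_NUM_IMAGES', setting, &
                                  length, status)
    if (status == 0 .and. length > 0) then
      print '(2a)', 'FOR_COARRAY_NUM_IMAGES = ', setting(1:length)
    else
      print '(a)', 'FOR_COARRAY_NUM_IMAGES is not set'
    end if
    print '(a,i0,a)', 'running with ', num_images(), ' image(s)'
  end if
end program report_images
```

Building this program with a [Q]coarray-num-images value and then running it with and without FOR_COARRAY_NUM_IMAGES set shows which setting was used for a given run.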

Using a Configuration File

Use of the [Q]coarray-config-file compiler option is appropriate only in a limited number of cases:

As mentioned previously, you need to specify a configuration file when using the -coarray=coprocessor (Linux*) compiler option syntax.
You can take advantage of Intel® MPI Library features in the coarray environment. To do so, specify the command line segments used by "mpiexec -config filename" in a file named filename and pass that file name to the Intel® MPI Library using the [Q]coarray-config-file compiler option. If the [Q]coarray-num-images compiler option also appears on the command line, the contents of the configuration file override it. Rules for using an MPI configuration file are as follows:
- The format of a configuration file is described in the Intel® MPI Library documentation; you will need to add the MPI option "-genv FOR_ICAF_STATUS launched" in the configuration file in order for coarrays to work on multi-node (distributed memory) systems.
- You can also set the environment variable FOR_COARRAY_CONFIG_FILE to be the filename and path of the Intel® MPI Library configuration file you want to use at execution time.

Examples on Windows*:

/Qcoarray:shared /Qcoarray-num-images:8 runs a coarray program on shared memory using 8 images.
/Qcoarray:shared /Qcoarray-config-file:filename runs a coarray program on shared memory using the MPI configuration detailed in filename.
/Qcoarray:distributed /Qcoarray-config-file:filename runs a coarray program on distributed memory using the Intel® MPI Library configuration detailed in filename (the Intel® Cluster Toolkit license must be installed).

Examples on Linux*:

-coarray=shared -coarray-num-images=8 runs a coarray program on shared memory using 8 images.
-coarray=distributed -coarray-num-images=8 runs a coarray program on distributed memory across 8 images (the Intel® Cluster Toolkit license must be installed).
-coarray=coprocessor -coarray-config-file=filename runs a coarray program on a configuration where the first image runs on the host processor and other images run on some combination of host processors or coprocessors based on the Intel® Many Integrated Core Architecture (Intel® MIC Architecture). This command line specifies use of the Intel® MPI Library configuration detailed in filename.

Considerations When Running on Coprocessors Based on the Intel® MIC Architecture

The compiler installation provides the necessary Intel® MPI run-time libraries and binaries for the coprocessor, but you must manually copy them to the card's file system each time the card is rebooted.

Move to the mpirt directory and copy the Intel® MPI files as shown:

sudo scp lib/mic/libmpi_mt.so mic0:/lib64/libmpi_mt.so

sudo scp bin/mic/mpiexec.hydra mic0:/bin/mpiexec.hydra

sudo scp bin/mic/pmi_proxy mic0:/bin/pmi_proxy

From that same directory, the Intel® Fortran run-time libraries for coarray support should also be copied to the card's file system:

sudo scp lib/mic/libmpi_mt.so mic0:/lib64/libmpi_mt.so

sudo scp bin/mic/mpiexec.hydra mic0:/bin/mpiexec.hydra

sudo scp bin/mic/pmi_proxy mic0:/bin/pmi_proxy

Sample Coarray Configurations When Running on the Coprocessor

Following are several sample configurations.

Sample Configuration: Images 1-N on coprocessors based on Intel® MIC Architecture

In this configuration, the coarray program runs all its images only on the coprocessor(s).

For this configuration, you would specify the following:

ifort -coarray -mmic myProg.f90 -o myProg.mic

There are two ways myProg.mic can be run.

You can use the micnativeloadex tool, as follows:

/opt/intel/mic/coi/tools/micnativeloadex/release/micnativeloadex myProg.mic

For more information on micnativeloadex, refer to the Intel® Many Integrated Core Platform Software Stack documentation.

The second way to run myProg.mic is to copy it to the card's file system, and run it from the card itself.

This configuration does not require -coarray=coprocessor at compile time.

Sample Configuration: Image 1 on host; images 2-N on coprocessors based on Intel® MIC Architecture

In this configuration, the coarray program runs some of its images on the host and some of its images on coprocessors.

For this configuration, you would specify the following:

ifort -coarray=coprocessor -coarray-config-file=MixedPlatform.conf myProg.f90 -o myProg

This creates two executables: myProg and myProgMIC. Copy myProgMIC to the card's file system, and specify that location in the configuration file.

The MixedPlatform.conf file would contain the following:

-n 1 -host <hostid-of-CPU> -genv FOR_ICAF_STATUS=T ./myProg : \
-n 4 -host <hostid-of-mic> -genv FOR_ICAF_STATUS=T /home/mydir/myProgMIC

Sample Configuration: Image 1 on host processor; images 2-N spread on host processor, coprocessor(s)

This is a variant of the previous configuration. Here, the configuration file specifies additional images on either the host processor or coprocessor. The -n X option is used to specify the number of images associated with each type of host. For example, to run 4 images on the CPU and 2 images each on four coprocessors, the configuration file would contain the following:

-n 4 -host MYCPU -genv FOR_ICAF_STATUS=true ./myProg : \
-n 2 -host CARD1 -genv FOR_ICAF_STATUS=true /home/mydir/myProgMIC : \
-n 2 -host CARD2 -genv FOR_ICAF_STATUS=true /home/mydir/myProgMIC : \
-n 2 -host CARD3 -genv FOR_ICAF_STATUS=true /home/mydir/myProgMIC : \
-n 2 -host CARD4 -genv FOR_ICAF_STATUS=true /home/mydir/myProgMIC

Given the configuration file listed above, myProg should expect num_images() to be 12.

Sample Configuration: Images 1-N on host processor, with offloaded code on coprocessor

In this configuration, all coarray images run on the host processor, but each image has offloaded code that is running on the coprocessors.

For this configuration, you would specify the following:

ifort -coarray -coarray-num-images=X myProg.f90 -o myProg

For this configuration, your code must contain offload regions, marked by directives, such as:

!DIR$ OFFLOAD BEGIN
  <user code>
!DIR$ END OFFLOAD

There are further restrictions on your code:

A coindexed object may not be used within an offload region
An image control statement may not be used within an offload region
A coindexed object may not be used within an OpenMP* region
An image control statement may not be used within an OpenMP* region
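
One way to respect the offload restrictions is to move all coindexed accesses and image control statements outside the region and let the region work only on ordinary local data. The following is a minimal sketch under stated assumptions: the names (offload_sketch, field, local_copy, left) are hypothetical, and the directive lines are copied from the text above without any clauses, which a real offload program may need. A compiler without offload support treats the directive lines as comments.

```fortran
program offload_sketch
  implicit none
  integer, parameter :: n = 1000
  real :: field(n)[*]        ! covariable, one piece per image
  real :: local_copy(n)      ! ordinary local buffer used inside the region
  integer :: left, i

  field = real(this_image())
  sync all                   ! image control statement, outside the region

  ! Coindexed read, done before the region, into a local buffer.
  left = this_image() - 1
  if (left < 1) left = num_images()
  local_copy = field(:)[left]

  ! The region touches only local data: no coindices, no SYNC statements.
  !DIR$ OFFLOAD BEGIN
  do i = 1, n
    local_copy(i) = 2.0 * local_copy(i)
  end do
  !DIR$ END OFFLOAD

  ! Wait until every image has finished reading its neighbor's piece
  ! before any image overwrites its own piece.
  sync all
  field = local_copy

  if (this_image() == 1) print '(a,f0.1)', 'field(1) on image 1 = ', field(1)
end program offload_sketch
```

The second SYNC ALL is needed because a neighbor may still be reading this image's piece of field when the offloaded computation finishes.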

Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804

