Distributed Processing

This mode assumes that the data set is split into nblocks blocks across computation nodes.

PCA computation in the distributed processing mode follows the general schema described in Algorithms.

Algorithm Parameters

The PCA algorithm in the distributed processing mode has the following parameters. Some of them apply only to specific values of the computation method parameter method:

Parameter	method	Default Value	Description
computeStep	defaultDense or svdDense	Not applicable	The parameter required to initialize the algorithm. Can be: step1Local (the first step, performed on local nodes) or step2Master (the second step, performed on a master node).
algorithmFPType	defaultDense or svdDense	double	The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
method	Not applicable	defaultDense	Available methods for PCA computation: defaultDense (the correlation method) or svdDense (the SVD method).
covariance	defaultDense	`SharedPtr<covariance::Distributed <computeStep, algorithmFPType, covariance::defaultDense> >`	The correlation and variance-covariance matrices algorithm to be used for PCA computations with the correlation method. For details, see the Distributed Processing section of Correlation and Variance-covariance Matrices.

Correlation Method (defaultDense)

Use the following two-step schema:

Step 1 - on Local Nodes

In this step, the PCA algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Input ID	Input
data	Pointer to the `n_i` x `p` numeric table that represents the `i`-th data block on the local node. The input can be an object of any class derived from NumericTable.

In this step, PCA calculates the results described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Result ID	Result
nObservationsCorrelation	Pointer to the 1 x 1 numeric table with the number of observations processed so far on the local node. By default, this result is an object of the HomogenNumericTable class, but you can define it as an object of any class derived from NumericTable except CSRNumericTable.
crossProductCorrelation	Pointer to the `p` x `p` numeric table with the cross-product matrix computed so far on the local node. By default, this table is an object of the HomogenNumericTable class, but you can define it as an object of any class derived from NumericTable except PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable.
sumCorrelation	Pointer to the 1 x `p` numeric table with partial sums computed so far on the local node. By default, this table is an object of the HomogenNumericTable class, but you can define it as an object of any class derived from NumericTable except PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable.
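
The three partial results above can be illustrated with a minimal, library-independent C++ sketch. It is not the library's implementation: the `LocalPartial` struct and the `computeLocalPartial` function are hypothetical names introduced here, the data block is assumed to be a row-major `std::vector<double>`, and the cross-product is assumed to be the uncentered product of the block with itself, which may differ from the variant the library stores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container mirroring the three Step 1 partial results.
struct LocalPartial {
    std::size_t nObservations = 0;     // 1 x 1: rows seen on this node
    std::vector<double> sum;           // 1 x p: per-feature sums
    std::vector<double> crossProduct;  // p x p, row-major
};

// Computes partial results for one n_i x p block stored in row-major order.
// Assumption: crossProduct is the uncentered X^T X of the block.
LocalPartial computeLocalPartial(const std::vector<double>& block, std::size_t p) {
    assert(p > 0 && block.size() % p == 0);
    LocalPartial r;
    r.nObservations = block.size() / p;
    r.sum.assign(p, 0.0);
    r.crossProduct.assign(p * p, 0.0);
    for (std::size_t i = 0; i < r.nObservations; ++i) {
        const double* row = &block[i * p];
        for (std::size_t j = 0; j < p; ++j) {
            r.sum[j] += row[j];
            for (std::size_t k = 0; k < p; ++k) {
                r.crossProduct[j * p + k] += row[j] * row[k];
            }
        }
    }
    return r;
}
```

With the uncentered definition assumed here, partial results from different nodes can be combined by element-wise addition, which keeps the amount of data sent to the master node independent of `n_i`.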

Step 2 - on Master Node

In this step, the PCA algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Input ID	Input
partialResults	A collection that contains results computed in Step 1 on local nodes (nObservationsCorrelation, crossProductCorrelation, and sumCorrelation). The collection can contain objects of any class derived from NumericTable except PackedSymmetricMatrix and PackedTriangularMatrix.

In this step, PCA calculates the results described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Result ID	Result
eigenvalues	Pointer to the 1 x `p` numeric table that contains eigenvalues in descending order. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable.
eigenvectors	Pointer to the `p` x `p` numeric table that contains eigenvectors in row-major order. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable.
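
As a rough illustration of what the master node has to do before the eigendecomposition, the sketch below merges per-node partial results into a correlation matrix. It is a standalone approximation, not the library's code: `Partial` and `mergeToCorrelation` are hypothetical names, the cross-products are assumed to be uncentered, and every feature is assumed to have nonzero variance.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-node partial result (n, sums, uncentered cross-product).
struct Partial {
    std::size_t n;
    std::vector<double> sum;           // 1 x p
    std::vector<double> crossProduct;  // p x p, row-major
};

// Merges partial results and returns the p x p correlation matrix (row-major).
// Assumption: cross-products are uncentered, so merging is a plain sum.
std::vector<double> mergeToCorrelation(const std::vector<Partial>& parts, std::size_t p) {
    std::size_t n = 0;
    std::vector<double> sum(p, 0.0), cp(p * p, 0.0);
    for (const Partial& part : parts) {
        assert(part.sum.size() == p && part.crossProduct.size() == p * p);
        n += part.n;
        for (std::size_t j = 0; j < p; ++j) sum[j] += part.sum[j];
        for (std::size_t j = 0; j < p * p; ++j) cp[j] += part.crossProduct[j];
    }
    assert(n > 1);  // the sample covariance needs at least two observations

    // Sample covariance: (X^T X - s s^T / n) / (n - 1).
    std::vector<double> cov(p * p);
    for (std::size_t j = 0; j < p; ++j)
        for (std::size_t k = 0; k < p; ++k)
            cov[j * p + k] = (cp[j * p + k] - sum[j] * sum[k] / n) / (n - 1);

    // Correlation: scale each covariance by the two standard deviations.
    std::vector<double> cor(p * p);
    for (std::size_t j = 0; j < p; ++j)
        for (std::size_t k = 0; k < p; ++k)
            cor[j * p + k] = cov[j * p + k] / std::sqrt(cov[j * p + j] * cov[k * p + k]);
    return cor;
}
```

The eigenvalues and eigenvectors listed in the table would then come from an eigendecomposition of this matrix, which the sketch leaves out.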

Examples

C++:

Java*:

SVD Method (svdDense)

Use the following two-step schema:

Step 1 - on Local Nodes

In this step, the PCA algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Input ID	Input
data	Pointer to the `n_i` x `p` numeric table that represents the `i`-th data block on the local node. The input can be an object of any class derived from NumericTable.

In this step, PCA calculates the results described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Result ID	Result
nObservationsSVD	Pointer to the 1 x 1 numeric table with the number of observations processed so far on the local node. By default, this result is an object of the HomogenNumericTable class, but you can define it as an object of any class derived from NumericTable except CSRNumericTable.
sumSVD	Pointer to the 1 x `p` numeric table with partial sums computed so far on the local node. By default, this table is an object of the HomogenNumericTable class, but you can define it as an object of any class derived from NumericTable except PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable.
sumSquaresSVD	Pointer to the 1 x `p` numeric table with partial sums of squares computed so far on the local node. By default, this table is an object of the HomogenNumericTable class, but you can define it as an object of any class derived from NumericTable except PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable.
auxiliaryDataSVD	A collection of numeric tables, each containing a partial result to transmit to the master node for Step 2. The collection can contain objects of any class derived from NumericTable except PackedSymmetricMatrix and PackedTriangularMatrix.
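
The first three results in this table are simple per-feature accumulations, which the following standalone C++ sketch illustrates. `LocalMoments` and `computeLocalMoments` are hypothetical names, and the block is assumed to be a row-major `std::vector<double>`. The contents of auxiliaryDataSVD are not specified here, so the sketch does not attempt to model them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the count, sums, and sums of squares.
struct LocalMoments {
    std::size_t nObservations = 0;   // 1 x 1
    std::vector<double> sum;         // 1 x p
    std::vector<double> sumSquares;  // 1 x p
};

// Accumulates per-feature sums and sums of squares over one n_i x p block
// stored in row-major order.
LocalMoments computeLocalMoments(const std::vector<double>& block, std::size_t p) {
    assert(p > 0 && block.size() % p == 0);
    LocalMoments m;
    m.nObservations = block.size() / p;
    m.sum.assign(p, 0.0);
    m.sumSquares.assign(p, 0.0);
    for (std::size_t i = 0; i < m.nObservations; ++i) {
        for (std::size_t j = 0; j < p; ++j) {
            const double x = block[i * p + j];
            m.sum[j] += x;
            m.sumSquares[j] += x * x;
        }
    }
    return m;
}
```

Each of these quantities is additive across blocks, so a node can keep updating them as more local data arrives.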

Step 2 - on Master Node

In this step, the PCA algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Input ID	Input
partialResults	A collection that contains results computed in Step 1 on local nodes (nObservationsSVD, sumSVD, sumSquaresSVD, and auxiliaryDataSVD). The collection can contain objects of any class derived from NumericTable except PackedSymmetricMatrix and PackedTriangularMatrix.

In this step, PCA calculates the results described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Result ID	Result
eigenvalues	Pointer to the 1 x `p` numeric table that contains eigenvalues in descending order. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable.
eigenvectors	Pointer to the `p` x `p` numeric table that contains eigenvectors in row-major order. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable.
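
One plausible use of the merged observation counts, sums, and sums of squares on the master node is to recover the global per-feature mean and variance, as in the hedged sketch below. This is an assumption about how such statistics are typically combined, not a description of the library's internals. `Moments`, `GlobalStats`, and `mergeMoments` are hypothetical names, and the auxiliary SVD data is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-node statistics received from Step 1.
struct Moments {
    std::size_t n;
    std::vector<double> sum;         // 1 x p
    std::vector<double> sumSquares;  // 1 x p
};

// Hypothetical merged output: global mean and sample variance per feature.
struct GlobalStats {
    std::vector<double> mean;
    std::vector<double> variance;
};

// Adds up the per-node statistics and derives the global mean and variance.
GlobalStats mergeMoments(const std::vector<Moments>& parts, std::size_t p) {
    std::size_t n = 0;
    std::vector<double> sum(p, 0.0), sq(p, 0.0);
    for (const Moments& part : parts) {
        assert(part.sum.size() == p && part.sumSquares.size() == p);
        n += part.n;
        for (std::size_t j = 0; j < p; ++j) {
            sum[j] += part.sum[j];
            sq[j] += part.sumSquares[j];
        }
    }
    assert(n > 1);  // the sample variance needs at least two observations

    GlobalStats g;
    g.mean.resize(p);
    g.variance.resize(p);
    for (std::size_t j = 0; j < p; ++j) {
        g.mean[j] = sum[j] / n;
        // Sample variance from raw moments: (sum of squares - sum^2 / n) / (n - 1).
        g.variance[j] = (sq[j] - sum[j] * sum[j] / n) / (n - 1);
    }
    return g;
}
```

The raw-moment variance formula is compact but can lose precision when the mean is large relative to the spread of the data; a production implementation would likely use a more stable update.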

Examples

C++: pca_svd_dense_distributed.cpp

Java*: PCASVDDenseDistributed.java