Distributed Processing

Parameters

Centroid initialization for K-Means clustering in the distributed processing mode has the following parameters:

Parameter	Default Value	Description
computeStep	Not applicable	The parameter required to initialize the algorithm. Can be: step1Local - the first step, performed on local nodes; step2Master - the second step, performed on a master node.
algorithmFPType	double	The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
method	defaultDense	Available initialization methods for K-Means clustering: defaultDense - uses the first nClusters points as initial centroids; deterministicCSR - uses the first nClusters points as initial centroids for data in a CSR numeric table; randomDense - uses nClusters random points as initial centroids; randomCSR - uses nClusters random points as initial centroids for data in a CSR numeric table. For more details, see the algorithm description.
nClusters	Not applicable	The number of clusters. Required.
nRowsTotal	0	The total number of rows in all input data sets on all nodes. Required in the distributed processing mode.
seed	777	The seed for generating random numbers.
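As an illustration only, the parameter table can be mirrored in a small configuration object. The class KMeansInitParams and its validate method below are hypothetical and are not part of the library API; the checks (for example, that nRowsTotal must be positive in the distributed mode and at least nClusters) are assumptions derived from the table above.

```python
from dataclasses import dataclass

# Hypothetical mirror of the parameter table; NOT the library's API.
VALID_STEPS = {"step1Local", "step2Master"}
VALID_METHODS = {"defaultDense", "deterministicCSR", "randomDense", "randomCSR"}
VALID_FP_TYPES = {"float", "double"}


@dataclass
class KMeansInitParams:
    computeStep: str                  # required, no default value
    nClusters: int                    # required, no default value
    nRowsTotal: int = 0               # must be set in the distributed mode
    algorithmFPType: str = "double"
    method: str = "defaultDense"
    seed: int = 777

    def validate(self) -> None:
        """Raise ValueError if the settings are unusable (assumed rules)."""
        if self.computeStep not in VALID_STEPS:
            raise ValueError(f"unknown computeStep: {self.computeStep}")
        if self.method not in VALID_METHODS:
            raise ValueError(f"unknown method: {self.method}")
        if self.algorithmFPType not in VALID_FP_TYPES:
            raise ValueError(f"unknown algorithmFPType: {self.algorithmFPType}")
        if self.nClusters <= 0:
            raise ValueError("nClusters must be positive")
        # Assumption: the default of 0 is not valid in the distributed mode.
        if self.nRowsTotal <= 0:
            raise ValueError("nRowsTotal must be the total row count on all nodes")
        if self.nClusters > self.nRowsTotal:
            raise ValueError("nClusters cannot exceed nRowsTotal")
```

For example, KMeansInitParams(computeStep="step1Local", nClusters=3, nRowsTotal=10) picks up the table's defaults for method, seed, and algorithmFPType, while leaving nRowsTotal at 0 makes validate() raise.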

Centroid initialization for K-Means clustering follows the general schema described in Algorithms.

Step 1 - on Local Nodes

In this step, centroid initialization for K-Means clustering accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Input ID	Input
data	Pointer to the n_i x p numeric table that represents the i-th data block on the local node. The input can be an object of any class derived from NumericTable.
inputCentroids	Pointer to the nClusters x p numeric table with the initial cluster centroids. The input can be an object of any class derived from NumericTable.

In this step, centroid initialization for K-Means clustering calculates the results described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Result ID	Result
nPartialClusters	Pointer to the 1 x 1 numeric table that contains the number of clusters computed on the local node. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except CSRNumericTable, PackedTriangularMatrix, and PackedSymmetricMatrix.
partialClusters	Pointer to the nClusters x p numeric table with cluster centroids computed on the local node. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
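The following is a minimal sketch, under stated assumptions, of what Step 1 could produce for the deterministic "first nClusters points" method on one local node. The function step1_local_first_points is hypothetical and represents numeric tables as Python lists of rows. It assumes that a node contributes at most its first nClusters rows and pads the rest of the nClusters x p table with zeros, so that nPartialClusters tells the master how many rows are valid. The library's actual internal behavior may differ.

```python
from typing import List, Tuple

# A numeric table is modeled as a list of rows (hypothetical representation).
Table = List[List[float]]


def step1_local_first_points(data: Table, n_clusters: int) -> Tuple[Table, Table]:
    """Hypothetical local step for the 'first nClusters points' method.

    Returns (nPartialClusters, partialClusters):
      * a 1 x 1 table holding the number of valid candidate rows, and
      * an nClusters x p table whose first nPartialClusters rows are this
        block's leading rows; the remaining rows are zero padding (assumption).
    """
    if not data:
        raise ValueError("data block must not be empty")
    p = len(data[0])
    k = min(n_clusters, len(data))          # this node cannot offer more rows than it has
    partial = [list(row) for row in data[:k]]
    partial += [[0.0] * p for _ in range(n_clusters - k)]
    return [[k]], partial
```

With a two-row block and nClusters = 3, the sketch reports nPartialClusters = 2 and returns a 3 x p table whose last row is padding.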

Step 2 - on Master Node

In this step, centroid initialization for K-Means clustering accepts the input from each local node described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Input ID	Input
partialResults	A collection that contains the results computed in Step 1 on local nodes (two numeric tables, nPartialClusters and partialClusters, from each local node).

In this step, centroid initialization for K-Means clustering calculates the results described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Result ID	Result
centroids	Pointer to the nClusters x p numeric table with cluster centroids. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
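A matching sketch of Step 2, again hypothetical and assuming that each local result is a pair (nPartialClusters, partialClusters) in which only the first nPartialClusters rows of partialClusters are valid. The function step2_master_first_points walks the partialResults collection in node order, takes only the valid rows from each node, and stops once nClusters centroids are collected. Processing the nodes in the order of their data blocks is what makes the output equal to the first nClusters points of the full data set under these assumptions.

```python
from typing import List, Sequence, Tuple

# A numeric table is modeled as a list of rows (hypothetical representation).
Table = List[List[float]]


def step2_master_first_points(
    partial_results: Sequence[Tuple[Table, Table]], n_clusters: int
) -> Table:
    """Hypothetical master step: merge per-node candidates into centroids.

    partial_results holds one (nPartialClusters, partialClusters) pair per
    local node, ordered like the data blocks. Returns an nClusters x p table.
    """
    centroids: Table = []
    for n_partial, partial_clusters in partial_results:
        k = int(n_partial[0][0])            # number of valid rows from this node
        for row in partial_clusters[:k]:
            if len(centroids) == n_clusters:
                return centroids
            centroids.append(list(row))
    if len(centroids) < n_clusters:
        raise ValueError("fewer than nClusters rows were provided by all nodes")
    return centroids
```

For instance, if the first node contributes one valid row and the second contributes two, a request for nClusters = 2 returns the first node's row followed by the second node's first row.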