Batch Processing

Algorithm Input

The K-Means clustering algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Input ID	Input
data	Pointer to the `n` x `p` numeric table with the data to be clustered. The input can be an object of any class derived from NumericTable.
inputCentroids	Pointer to the nClusters x `p` numeric table with the initial centroids. The input can be an object of any class derived from NumericTable.

Algorithm Parameters

The K-Means clustering algorithm has the following parameters:

Parameter	Default Value	Description
algorithmFPType	double	The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
method	defaultDense	Available computation methods for K-Means clustering: defaultDense, an implementation of Lloyd's algorithm; lloydCSR, an implementation of Lloyd's algorithm for CSR numeric tables.
nClusters	Not applicable	The number of clusters. Required to initialize the algorithm.
maxIterations	Not applicable	The maximum number of iterations. Required to initialize the algorithm.
accuracyThreshold	0.0	The threshold for termination of the algorithm.
gamma	1.0	The weight to be used in distance calculation for binary categorical features.
distanceType	euclidean	The measure of closeness between points (observations) being clustered. The only distance type currently supported is the Euclidean distance.
assignFlag	true	A flag that enables computation of assignments, that is, assigning cluster indices to respective observations.
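
To illustrate how nClusters, maxIterations, accuracyThreshold, and assignFlag might interact, the following is a minimal standalone sketch of Lloyd's algorithm in plain C++. It does not use the library API: the Table and KMeansResult types and the lloydSketch function are hypothetical names introduced for this example. The termination rule (stop when the change in the goal function between iterations does not exceed accuracyThreshold) and the handling of empty clusters are assumptions and may differ from the library's behavior. The gamma parameter is not modeled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical row-major dense table standing in for a numeric table.
struct Table {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> values;
    double at(std::size_t r, std::size_t c) const { return values[r * cols + c]; }
    double& at(std::size_t r, std::size_t c) { return values[r * cols + c]; }
};

// Hypothetical result holder mirroring the Result IDs of this section.
struct KMeansResult {
    Table centroids;               // nClusters x p
    std::vector<int> assignments;  // n entries; empty when assignFlag is false
    double goalFunction;           // sum of squared distances to the nearest centroid
    std::size_t nIterations;       // iterations actually performed
};

// Squared Euclidean distance between row i of a and row k of b.
static double squaredDistance(const Table& a, std::size_t i, const Table& b, std::size_t k) {
    double d = 0.0;
    for (std::size_t j = 0; j < a.cols; ++j) {
        const double diff = a.at(i, j) - b.at(k, j);
        d += diff * diff;
    }
    return d;
}

// Index of the centroid nearest to row i; ties go to the lowest index.
static std::size_t nearest(const Table& data, std::size_t i, const Table& centroids, double& best) {
    std::size_t arg = 0;
    best = std::numeric_limits<double>::infinity();
    for (std::size_t k = 0; k < centroids.rows; ++k) {
        const double d = squaredDistance(data, i, centroids, k);
        if (d < best) { best = d; arg = k; }
    }
    return arg;
}

KMeansResult lloydSketch(const Table& data, const Table& inputCentroids,
                         std::size_t maxIterations, double accuracyThreshold, bool assignFlag) {
    assert(inputCentroids.rows > 0 && data.cols == inputCentroids.cols);
    const std::size_t n = data.rows, p = data.cols, nClusters = inputCentroids.rows;
    KMeansResult res{inputCentroids, {}, 0.0, 0};
    double prevGoal = std::numeric_limits<double>::infinity();

    for (std::size_t it = 0; it < maxIterations; ++it) {
        std::vector<double> sums(nClusters * p, 0.0);
        std::vector<std::size_t> counts(nClusters, 0);
        double goal = 0.0;
        // Assignment step: attach each observation to its nearest centroid.
        for (std::size_t i = 0; i < n; ++i) {
            double d = 0.0;
            const std::size_t c = nearest(data, i, res.centroids, d);
            goal += d;
            ++counts[c];
            for (std::size_t j = 0; j < p; ++j) sums[c * p + j] += data.at(i, j);
        }
        // Update step: move each non-empty cluster's centroid to the cluster mean.
        for (std::size_t c = 0; c < nClusters; ++c) {
            if (counts[c] == 0) continue;  // assumption: an empty cluster keeps its centroid
            for (std::size_t j = 0; j < p; ++j)
                res.centroids.at(c, j) = sums[c * p + j] / static_cast<double>(counts[c]);
        }
        res.goalFunction = goal;
        ++res.nIterations;
        // Assumed stopping rule: goal function changed by no more than the threshold.
        if (std::fabs(prevGoal - goal) <= accuracyThreshold) break;
        prevGoal = goal;
    }

    if (assignFlag) {
        // Final assignments are computed against the current centroids.
        res.assignments.resize(n);
        for (std::size_t i = 0; i < n; ++i) {
            double d = 0.0;
            res.assignments[i] = static_cast<int>(nearest(data, i, res.centroids, d));
        }
    }
    return res;
}
```

In this sketch, calling lloydSketch with maxIterations set to zero leaves the centroids and goal function untouched and, when assignFlag is true, still fills in the assignments.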

Algorithm Output

The K-Means clustering algorithm calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Result ID	Result
centroids	Pointer to the nClusters x `p` numeric table with the cluster centroids. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
assignments	Use when assignFlag=true. Pointer to the `n` x 1 numeric table with assignments of cluster indices to feature vectors in the input data. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
goalFunction	Pointer to the 1 x 1 numeric table with the value of the goal function. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except CSRNumericTable.
nIterations	Pointer to the 1 x 1 numeric table with the actual number of iterations done by the algorithm. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Note

You can skip the update of centroids and goalFunction in the result and compute assignments using the original inputCentroids. To do this, set assignFlag to true and maxIterations to zero.
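
Conceptually, the assignment-only mode described in this note reduces to a single nearest-centroid pass over the data. The following standalone C++ sketch shows that pass. It is not the library API: assignToNearest is a hypothetical helper, and the flat row-major layout of the `n` x `p` data and nClusters x `p` centroids is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Assigns each observation to its nearest centroid (squared Euclidean distance).
// data:      n x p values, row-major (assumed layout)
// centroids: nClusters x p values, row-major (assumed layout); never modified
std::vector<int> assignToNearest(const std::vector<double>& data,
                                 const std::vector<double>& centroids,
                                 std::size_t p) {
    assert(p > 0 && data.size() % p == 0 && centroids.size() % p == 0);
    const std::size_t n = data.size() / p;
    const std::size_t nClusters = centroids.size() / p;
    std::vector<int> assignments(n, 0);

    for (std::size_t i = 0; i < n; ++i) {
        double best = std::numeric_limits<double>::infinity();
        for (std::size_t k = 0; k < nClusters; ++k) {
            double d = 0.0;
            for (std::size_t j = 0; j < p; ++j) {
                const double diff = data[i * p + j] - centroids[k * p + j];
                d += diff * diff;
            }
            // Strict comparison: ties go to the lowest cluster index.
            if (d < best) {
                best = d;
                assignments[i] = static_cast<int>(k);
            }
        }
    }
    return assignments;
}
```

Because the centroids are passed by const reference, the sketch makes it explicit that only the assignments are produced and the initial centroids are left as supplied.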

Examples

C++:

Java*: