Developer Guide for Intel® Data Analytics Acceleration Library 2016 Update 4

Distributed Processing

This mode assumes that the data set is split into nblocks blocks across computation nodes.
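
As a rough illustration of this data layout, the sketch below splits a small in-memory data set into nblocks contiguous blocks. The helper name split_into_blocks and the near-equal, contiguous split are assumptions made for the example. The library does not prescribe how the data reaches each node, and this code does not use the library API.

```python
def split_into_blocks(rows, nblocks):
    """Split a list of rows into nblocks contiguous, near-equal blocks.

    Hypothetical helper for illustration only; not part of the library.
    """
    if nblocks <= 0:
        raise ValueError("nblocks must be positive")
    base, extra = divmod(len(rows), nblocks)
    blocks, start = [], 0
    for i in range(nblocks):
        # The first `extra` blocks receive one additional row.
        size = base + (1 if i < extra else 0)
        blocks.append(rows[start:start + size])
        start += size
    return blocks


# Ten observations with p = 2 features, split across nblocks = 3 nodes.
data = [[float(i), float(i) * 0.5] for i in range(10)]
blocks = split_into_blocks(data, 3)

# The sum of the block sizes is the value to supply as nRowsTotal.
n_rows_total = sum(len(block) for block in blocks)
```

Each element of blocks plays the role of the data block held by one computation node.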

Parameters

Centroid initialization for K-Means clustering in the distributed processing mode has the following parameters:

Parameter

Default Value

Description

computeStep

Not applicable

The parameter required to initialize the algorithm. Can be:

  • step1Local - the first step, performed on local nodes
  • step2Master - the second step, performed on a master node

algorithmFPType

double

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Available initialization methods for K-Means clustering:

  • defaultDense - uses the first nClusters points as initial centroids
  • deterministicCSR - uses the first nClusters points as initial centroids for data in a CSR numeric table
  • randomDense - uses nClusters randomly selected points as initial centroids
  • randomCSR - uses nClusters randomly selected points as initial centroids for data in a CSR numeric table

For more details, see the algorithm description.

nClusters

Not applicable

The number of clusters. Required.

nRowsTotal

0

The total number of rows in all input data sets on all nodes. Required in the distributed processing mode.

seed

777

The seed for generating random numbers.

Centroid initialization for K-Means clustering follows the general schema described in Algorithms.
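
To make the difference between the deterministic and random methods concrete, here is a minimal sketch in plain Python. The function pick_initial_centroids is hypothetical, the random branch uses Python's own generator rather than the library's, and it assumes sampling without replacement, so the selected points will not match what the library returns for the same seed.

```python
import random


def pick_initial_centroids(rows, n_clusters, method="defaultDense", seed=777):
    """Choose n_clusters rows as initial centroids (illustrative only)."""
    if n_clusters > len(rows):
        raise ValueError("n_clusters exceeds the number of rows")
    if method == "defaultDense":
        # Deterministic rule: take the first n_clusters points.
        return [list(row) for row in rows[:n_clusters]]
    if method == "randomDense":
        # Assumption: distinct rows drawn with a seeded Python RNG.
        rng = random.Random(seed)
        indices = rng.sample(range(len(rows)), n_clusters)
        return [list(rows[i]) for i in indices]
    raise ValueError("unsupported method: " + method)


points = [[float(i), float(i)] for i in range(6)]
first = pick_initial_centroids(points, 2)
rand_a = pick_initial_centroids(points, 2, "randomDense", seed=777)
rand_b = pick_initial_centroids(points, 2, "randomDense", seed=777)
```

Reusing the same seed reproduces the same selection, which is the practical purpose of the seed parameter.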

Step 1 - on Local Nodes

In this step, centroid initialization for K-Means clustering accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Input ID

Input

data

Pointer to the n_i x p numeric table that represents the i-th data block on the local node. The input can be an object of any class derived from NumericTable.

inputCentroids

Pointer to the nClusters x p numeric table with the initial cluster centroids. This input can be an object of any class derived from NumericTable.

In this step, centroid initialization for K-Means clustering calculates the results described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Result ID

Result

nPartialClusters

Pointer to the 1 x 1 numeric table that contains the number of clusters computed on the local node. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except CSRNumericTable, PackedTriangularMatrix, and PackedSymmetricMatrix.

partialClusters

Pointer to the nClusters x p numeric table with cluster centroids computed on the local node. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
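
The following sketch shows one way the local step could produce these two results under the "first nClusters points" rule. It is not the library implementation. The function step1_local and its offset argument (the global index of the first row in the block) are hypothetical and do not appear in the parameter list above. The sketch also returns only the rows actually selected rather than a full nClusters x p table.

```python
def step1_local(block, offset, n_clusters):
    """Return (n_partial_clusters, partial_clusters) for one local node.

    block      -- rows held by this node
    offset     -- assumed global index of the block's first row
    n_clusters -- total number of centroids requested
    """
    # Global rows [0, n_clusters) become centroids; count how many of
    # them fall inside this block's range [offset, offset + len(block)).
    n_local = max(0, min(n_clusters - offset, len(block)))
    partial_clusters = [list(row) for row in block[:n_local]]
    return n_local, partial_clusters


# Three nodes holding two rows each; nClusters = 3.
node_blocks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[2.0, 2.0], [3.0, 3.0]],
    [[4.0, 4.0], [5.0, 5.0]],
]
local_results = []
row_offset = 0
for node_block in node_blocks:
    local_results.append(step1_local(node_block, row_offset, 3))
    row_offset += len(node_block)
```

In this example the first node contributes two centroids, the second contributes one, and the third contributes none.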

Step 2 - on Master Node

In this step, centroid initialization for K-Means clustering accepts the input from each local node described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Input ID

Input

partialResults

A collection that contains results computed in Step 1 on local nodes (two numeric tables from each local node: nPartialClusters and partialClusters).

In this step, centroid initialization for K-Means clustering calculates the results described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Result ID

Result

centroids

Pointer to the nClusters x p numeric table with cluster centroids. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
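
A minimal sketch of the master step, assuming each local node supplies a pair made of its nPartialClusters count and its partialClusters rows, and that the master simply concatenates the contributions in node order. The function step2_master is hypothetical and the merging rule is an assumption; the library may combine partial results differently.

```python
def step2_master(partial_results, n_clusters):
    """Merge per-node (count, rows) pairs into the final centroid list."""
    centroids = []
    for n_partial, partial_clusters in partial_results:
        # Keep only the rows each node reports as valid centroids.
        centroids.extend(list(row) for row in partial_clusters[:n_partial])
    if len(centroids) != n_clusters:
        raise ValueError(
            "expected %d centroids, got %d" % (n_clusters, len(centroids))
        )
    return centroids


# Example partial results from three local nodes (p = 2, nClusters = 3).
partial_results = [
    (2, [[0.0, 0.0], [1.0, 1.0]]),
    (1, [[2.0, 2.0]]),
    (0, []),
]
centroids = step2_master(partial_results, 3)
```

The length check guards against a mismatch between the requested nClusters and the total number of centroids reported by the nodes.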