Developer Guide for Intel® Data Analytics Acceleration Library 2016 Update 4
The library provides a Multinomial Naïve Bayes classifier [Renie03].
Let $J$ be the number of classes, indexed $0, 1, \ldots, J-1$. The integer-valued feature vector $x_i = (x_{i1}, \ldots, x_{ip})$, $i = 1, \ldots, n$, contains scaled frequencies: the value of $x_{ik}$ is the number of times the $k$-th feature is observed in the vector $x_i$ (in terms of the document classification problem, $x_{ik}$ is the number of occurrences of the word indexed $k$ in the document $x_i$). For a given data set (a set of $n$ documents), $(x_1, \ldots, x_n)$, the problem is to train a Naïve Bayes classifier.
The training stage involves calculation of these parameters:
$\log\theta_{jk} = \log\dfrac{N_{jk} + \alpha_k}{N_j + \alpha}$, where $N_{jk}$ is the number of occurrences of the feature $k$ in the class $j$, $N_j$ is the total number of occurrences of all features in the class $j$, the $\alpha_k$ parameter is the imagined number of occurrences of the feature $k$ (for example, $\alpha_k = 1$), and $\alpha$ is the sum of all $\alpha_k$.
$\log p(\theta_j)$, where $p(\theta_j)$ is the prior class estimate.
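The training computation can be sketched in a few lines of plain Python. This is an illustration only, not the library's API: the function name `train_multinomial_nb` is hypothetical, the smoothed estimate is assumed to be $\theta_{jk} = (N_{jk} + \alpha_k)/(N_j + \alpha)$ with the same $\alpha_k$ for every feature, and the prior $p(\theta_j)$ is assumed to be the fraction of training vectors that belong to class $j$.

```python
import math


def train_multinomial_nb(X, y, n_classes, alpha_k=1.0):
    """Return (log_theta, log_prior) for a multinomial Naive Bayes sketch.

    X: list of n feature vectors holding non-negative counts (length p each).
    y: list of n class labels in 0..n_classes-1.
    alpha_k: imagined count, assumed identical for every feature.
    Assumes every class appears at least once in y.
    """
    n, p = len(X), len(X[0])
    alpha = alpha_k * p  # alpha is the sum of all alpha_k

    # N[j][k]: number of occurrences of feature k in class j
    N = [[0.0] * p for _ in range(n_classes)]
    vectors_per_class = [0] * n_classes
    for x, j in zip(X, y):
        vectors_per_class[j] += 1
        for k, count in enumerate(x):
            N[j][k] += count

    log_theta = []
    for j in range(n_classes):
        N_j = sum(N[j])  # total occurrences of all features in class j
        log_theta.append(
            [math.log((N[j][k] + alpha_k) / (N_j + alpha)) for k in range(p)]
        )

    # Assumed prior: share of training vectors that fall in each class
    log_prior = [math.log(c / n) for c in vectors_per_class]
    return log_theta, log_prior


# Three toy "documents" over a three-word vocabulary, two classes.
X = [[2, 1, 0], [1, 0, 1], [0, 3, 1]]
y = [0, 0, 1]
log_theta, log_prior = train_multinomial_nb(X, y, n_classes=2)
```

For class 0 in this toy data, $N_0 = 5$ and $\alpha = 3$, so the smoothed estimates are $4/8$, $2/8$, and $2/8$; the smoothing keeps the estimate for the unseen third word above zero.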
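Given a new feature vector $x_i$, the classifier determines the class the vector belongs to:
$\mathrm{class}(x_i) = \arg\max_j \left( \log p(\theta_j) + \sum_{k} x_{ik} \log\theta_{jk} \right)$, where $\log p(\theta_j)$ and $\log\theta_{jk}$ are the parameters calculated at the training stage.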
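A minimal Python sketch of the prediction step follows. It assumes the standard multinomial decision rule, in which each class score is the log prior plus the count-weighted sum of the per-feature log estimates. The function name `predict_class` and the parameter values are made up for illustration and are not part of the library.

```python
import math


def predict_class(x, log_theta, log_prior):
    """Return the index of the class with the highest log score.

    x: feature vector of non-negative counts (length p).
    log_theta: J lists of length p with log feature estimates per class.
    log_prior: list of J log prior class estimates.
    """
    scores = [
        log_prior[j] + sum(x_k * lt for x_k, lt in zip(x, log_theta[j]))
        for j in range(len(log_prior))
    ]
    # On a tie, the lowest class index wins.
    return max(range(len(scores)), key=scores.__getitem__)


# Illustrative parameters for two classes and three features.
log_theta = [
    [math.log(1 / 2), math.log(1 / 4), math.log(1 / 4)],
    [math.log(1 / 7), math.log(4 / 7), math.log(2 / 7)],
]
log_prior = [math.log(2 / 3), math.log(1 / 3)]

first = predict_class([3, 0, 0], log_theta, log_prior)   # dominated by feature 0
second = predict_class([0, 4, 1], log_theta, log_prior)  # dominated by feature 1
```

Working in the log domain turns the product of many small probabilities into a sum, which avoids floating-point underflow for long feature vectors.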