Details

The library provides a Multinomial Naïve Bayes classifier [Renie03].

Let J be the number of classes, indexed 0, 1, …, J-1. The integer-valued feature vector x_i = (x_i1, …, x_ip), i = 1, …, n, contains scaled frequencies: the value of x_ik is the number of times the k-th feature is observed in the vector x_i (in terms of the document classification problem, x_ik is the number of occurrences of the word indexed k in the document x_i). For a given data set (a set of n documents), (x_1, …, x_n), the problem is to train a Naïve Bayes classifier.

Training Stage

The training stage involves calculating these parameters:

log(θ_jk) = log((N_jk + α_k) / (N_j + α)), where N_jk is the number of occurrences of the feature k in the class j, N_j is the total number of occurrences of all features in the class j, the α_k parameter is the imagined number of occurrences of the feature k (for example, α_k = 1), and α is the sum of all α_k.
log(p(θ_j)), where p(θ_j) is the prior class estimate.
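
The training computation can be sketched in a few lines of plain Python. This is an illustrative sketch, not the library's API: the function name train_multinomial_nb is hypothetical, the smoothed estimate is assumed to be θ_jk = (N_jk + α_k) / (N_j + α) with the same α_k for every feature, and the prior p(θ_j) is assumed to be the fraction of training vectors that belong to class j.

```python
import math

def train_multinomial_nb(X, y, n_classes, alpha_k=1.0):
    """Hypothetical helper: return (log_theta, log_prior).

    X is a list of count vectors, y the class index of each vector.
    Assumes every class occurs at least once in y.
    """
    p = len(X[0])
    alpha = alpha_k * p  # alpha is the sum of all alpha_k (equal here)

    # N_jk: occurrences of feature k in class j.
    N_jk = [[0.0] * p for _ in range(n_classes)]
    class_counts = [0] * n_classes
    for x_i, j in zip(X, y):
        class_counts[j] += 1
        for k, count in enumerate(x_i):
            N_jk[j][k] += count

    log_theta = []
    for j in range(n_classes):
        N_j = sum(N_jk[j])  # total occurrences of all features in class j
        log_theta.append(
            [math.log((N_jk[j][k] + alpha_k) / (N_j + alpha)) for k in range(p)]
        )

    # Assumed prior: share of training vectors in each class.
    n = len(X)
    log_prior = [math.log(c / n) for c in class_counts]
    return log_theta, log_prior
```

With α_k = 1 this reduces to Laplace smoothing, so a feature that never occurs in a class still receives a non-zero probability.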

Prediction Stage

Given a new feature vector x_i, the classifier determines the class the vector belongs to:

class(x_i) = argmax_j (log(p(θ_j)) + Σ_k x_ik · log(θ_jk)), where the sum runs over k = 1, …, p.
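
A minimal sketch of the prediction step in plain Python follows. It assumes the standard multinomial decision rule, in which each log(θ_jk) is weighted by the count x_ik and the class with the highest log-score wins. The function name predict_multinomial_nb and the parameter values are hypothetical and chosen only for illustration; they are not part of the library.

```python
import math

def predict_multinomial_nb(x, log_theta, log_prior):
    """Hypothetical helper: return the index of the highest-scoring class."""
    scores = [
        log_prior[j] + sum(x_k * log_theta[j][k] for k, x_k in enumerate(x))
        for j in range(len(log_prior))
    ]
    return max(range(len(scores)), key=scores.__getitem__)

# Illustrative parameters for J = 2 classes and p = 3 features (assumed values).
theta = [[6 / 9, 2 / 9, 1 / 9],
         [1 / 6, 2 / 6, 3 / 6]]
log_theta = [[math.log(v) for v in row] for row in theta]
log_prior = [math.log(2 / 3), math.log(1 / 3)]

label_a = predict_multinomial_nb([4, 0, 0], log_theta, log_prior)
label_b = predict_multinomial_nb([0, 0, 3], log_theta, log_prior)
```

Working in log space keeps the score a sum rather than a product of many small probabilities, which avoids floating-point underflow on long documents.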