Details

The library provides the Apriori algorithm for association rule mining [Agrawal94].

Let I = {i₁, i₂, …, i_m} be a set of items (products), and let a subset T⊂I be a transaction associated with the item set I. An association rule has the form X⇒Y, where X⊂I, Y⊂I, and the intersection of X and Y is empty: X∩Y=Ø. The left-hand-side set of items (itemset) X is called the antecedent, while the right-hand-side itemset Y is called the consequent of the rule.

Let D = {T₁, T₂, …, T_n} be a set of transactions, each associated with the item set I. An item subset X⊂I has support s in the transaction set D if s percent of the transactions in D contain X.
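
The definition of support can be illustrated with a minimal Python sketch. The function name support and the toy transaction set D below are hypothetical and are not part of the library API; transactions are modeled as frozensets of item names.

```python
def support(itemset, transactions):
    """Return the percentage of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    # A transaction T contains X when X is a subset of T.
    hits = sum(1 for t in transactions if itemset <= t)
    return 100.0 * hits / len(transactions)


# Hypothetical toy transaction set D (illustration only).
D = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

print(support({"bread"}, D))          # 75.0: three of four transactions
print(support({"bread", "milk"}, D))  # 50.0: two of four transactions
```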

The association rule X⇒Y holds in the transaction set D with confidence c if c percent of the transactions in D that contain X also contain Y. The confidence of the rule can be represented as a conditional probability:

confidence(X⇒Y) = support(X∪Y) / support(X).
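
As a sketch of how this ratio might be evaluated, the following Python snippet computes the confidence of a rule directly from the two supports. The names support and confidence and the toy data are assumptions made for illustration and are not library functions. The function returns the ratio as a fraction between 0 and 1; multiply by 100 to express it in percent.

```python
def support(itemset, transactions):
    """Return the fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Return support(X ∪ Y) / support(X) for the rule X ⇒ Y."""
    x, y = frozenset(antecedent), frozenset(consequent)
    if x & y:
        raise ValueError("antecedent and consequent must be disjoint")
    sx = support(x, transactions)
    if sx == 0:
        raise ValueError("antecedent does not occur in any transaction")
    return support(x | y, transactions) / sx


# Hypothetical toy transaction set D (illustration only).
D = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

# Two of the three transactions that contain bread also contain milk.
print(confidence({"bread"}, {"milk"}, D))
```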

Given a set of transactions D = {T₁, T₂, …, T_n}, a minimum support s, and a minimum confidence c, the problem is to discover all item sets X with support greater than s and to generate all association rules X⇒Y with confidence greater than c.

Therefore, association rule discovery is decomposed into two stages: mining (training) and discovery (prediction). The mining stage generates the large item sets, that is, the item sets whose support is greater than the given minimum support. At the discovery stage, the algorithm generates association rules from the large item sets identified at the mining stage.
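
The two stages can be sketched in plain Python as follows. This is a simplified, unoptimized illustration of level-wise Apriori candidate generation under stated assumptions, not the library's implementation; the function names mine_large_itemsets and discover_rules, the toy data, and the thresholds are hypothetical. Support and confidence are expressed here as fractions rather than percentages.

```python
from itertools import combinations


def mine_large_itemsets(transactions, min_support):
    """Mining stage: map each large itemset to its support (a fraction)."""
    n = len(transactions)

    def frac(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: single items whose support exceeds the threshold.
    items = {i for t in transactions for i in t}
    current = {}
    for i in items:
        s = frac(frozenset([i]))
        if s > min_support:
            current[frozenset([i])] = s
    large = dict(current)

    k = 2
    while current:
        # Join step: unite pairs of large (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be large.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = frac(c)
            if s > min_support:
                current[c] = s
        large.update(current)
        k += 1
    return large


def discover_rules(large, min_confidence):
    """Discovery stage: return (antecedent, consequent, confidence) tuples."""
    rules = []
    for itemset, s in large.items():
        for r in range(1, len(itemset)):
            for x in combinations(itemset, r):
                x = frozenset(x)
                # Every subset of a large itemset is large, so large[x] exists.
                conf = s / large[x]
                if conf > min_confidence:
                    rules.append((x, itemset - x, conf))
    return rules


# Hypothetical toy transaction set D and thresholds (illustration only).
D = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
large = mine_large_itemsets(D, min_support=0.4)
rules = discover_rules(large, min_confidence=0.6)
for x, y, conf in rules:
    print(sorted(x), "=>", sorted(y), round(conf, 3))
```

Keeping the stages separate means the large item sets can be mined once and then reused to generate rules for several confidence thresholds.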