Building an associative classifier with multiple minimum supports

Classification is one of the most important technologies used in data mining. Researchers have recently proposed several classification techniques based on the concept of association rules (also known as CBA-based methods). Experimental evaluations in these studies show that, on average, CBA-based approaches can yield higher accuracy than some conventional classification methods. However, conventional CBA-based methods adopt a single minimum support threshold for all items, resulting in the rare item problem. In other words, the classification rules will contain only frequent items if the minimum support (minsup) is set high, whereas a very large number of item combinations will be discovered as frequent if minsup is set low. To solve this problem, this paper proposes a novel CBA-based method called MMSCBA, which considers the concept of multiple minimum supports (MMSs). Based on MMSs, different classification rules appear at their corresponding minsups. Several experiments were conducted with six real-world datasets selected from the UCI Machine Learning Repository. The results show that MMSCBA achieves higher accuracy than conventional CBA methods, especially when the dataset contains rare items.

Researchers have developed many classification techniques, which can be categorized as rule-based or non-rule-based approaches. The rule-based approaches, such as decision tree (Quinlan 1993), RIPPER (Cohen 1995), PART (Witten et al. 2011), and classification based on associations (CBA) (Liu et al. 1998, 2000), are typically interpretable and easy to implement. On the other hand, non-rule-based approaches, such as support vector machine (SVM) (Vapnik 1999) and artificial neural network (ANN) (Venkatesh and Thangaraj 2008), have a high noise tolerance but require extensive computation.
Researchers have recently proposed several CBA-based methods, including CMAR (Li et al. 2001), CPAR (Yin and Han 2003), MCAR (Thabtah et al. 2005), CBC (Deng et al. 2014), and MMAC (Thabtah et al. 2004). Experimental studies on these methods show that CBA-based approaches can yield higher accuracy than conventional classification methods. Most CBA-based methods (Li et al. 2001; Thabtah et al. 2004, 2005) adopt rule selection or pruning techniques to build accurate classifiers by retaining a limited number of effective rules. These methods typically adopt a single minimum support threshold for all items (an "item" refers to an attribute-name associated with a valid attribute-value), class labels, and itemsets. However, a single minimum support restricts the applicability of current CBA-based methods. Different items of each rule or class label will likely have different levels of importance. For example, some items or class labels may appear frequently in the database, while others may appear rarely. If the minimum support value is set at a high threshold, few items can satisfy this requirement, and rules with rare items cannot be found. To find rules with rare items, the minimum support value must be set relatively low. However, a lower minimum support requires extensive computation because the number of combinatorial itemsets increases exponentially; in addition, most of these itemsets are meaningless.
The class imbalance problem (Guo et al. 2008) has been addressed in previous work. In this case, the distribution of class labels is skewed, and thus, the classifiers perform poorly on rare classes. To solve the class imbalance problem, some CBA-based methods (Liu et al. 2000; Janssens et al. 2005) have applied the concept of multiple minimum supports (MMSs) to differentiate class labels. That is, a different minimum class support is assigned to each class label. It is worth noting that these works (Liu et al. 2000; Janssens et al. 2005) only consider MMSs for different class labels. To the best of our knowledge, previous studies in classification have not applied MMSs to individual items. Research in association rule mining has shown that the rare item problem (Liu et al. 1999) produces poor-quality rules. Because the selection of a proper set of classification rules is the primary factor in determining the effectiveness of associative classifiers, it is essential to address the rare item problem in CBA-based methods.
In this paper, a new approach for classification with MMSs is proposed to handle all items and class labels in CBA rule generation. The proposed approach provides a user-defined minimum support for each item and each class label. Because different classification rules appear at their corresponding minimum supports, an algorithm called MMSCBA, based on the established Multiple Support Apriori (MSapriori) algorithm (Thabtah 2007), is proposed to discover a complete set of classification rules with MMSs. To build MMSCBA-based classifiers, four methods for classification rule selection are considered. Several experiments were conducted with six real-world datasets from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/) to evaluate the performance of these classifiers.
The remainder of this paper is organized as follows. "Related work" section presents related research. "Problem definition" section presents the research problem. "The MMSCBA algorithm" section presents the proposed method. "Experimental evaluation" section presents analysis and discussion. Finally, "Conclusion" section presents the conclusion.

Associative classification
Many studies have shown that associative classification (AC) achieves greater accuracy than other traditional approaches. Several AC-based studies have recently presented classification based on association (CBA) (Liu et al. 1998), classification based on multiple association rules (CMAR) (Li et al. 2001), and classification based on predictive association rules (CPAR) (Yin and Han 2003). An AC-based approach typically consists of three phases: rule generation, rule pruning, and classification.
As one of the earliest approaches, CBA applies association rule mining to classification. In CBA, the system initially executes the Apriori algorithm to progressively generate association rules that satisfy user-defined minimum support and confidence thresholds. A subset of the generated classification rules becomes the final classifier.
Similarly to CBA, the CMAR approach adopts the FP-Growth algorithm (Guo et al. 2008) to generate frequent itemsets. The subset of matching rules is then used to classify a test object instead of one rule, and this, in turn, improves accuracy. The CMAR approach generates and evaluates rules similarly to CBA; however, CMAR uses a more efficient FP-tree structure. In addition, the CMAR approach considers multiple rules in predicting associated weights. Therefore, CMAR yields higher accuracy than CBA.
Both CBA and CMAR incur a high computation cost in rule generation and rule selection if the dataset is large. To avoid this cost, the CPAR (Yin and Han 2003) approach generates a small set of predictive rules directly from the dataset based on rule prediction and coverage analysis instead of generating candidate rules. The core of CPAR is its predictive rule mining capability, in which an object that is correctly covered by a rule is not removed; instead, its weight is decreased by multiplying it by a factor. This is essentially a greedy approach to rule generation and is more efficient than generating all candidate rules. The CPAR approach also uses dynamic programming to avoid repeating calculations during rule generation, and it uses the best k rules in prediction. Previous studies have provided more complete surveys of associative classification (Thabtah 2006, 2007; Deen et al. 2010; Swami and Jain 2005).

Multiple minimum supports
Mining frequent patterns with a single minimum support (abbreviated as minsup) implicitly assumes that every item has the same property (i.e., frequency). If the minsup value is high, the rules involving rare items will not be found. Conversely, if the minsup value is low, a large number of meaningless rules will be generated. The MSapriori (Liu et al. 1999) approach has been proposed to extract frequent rules with rare items. In MSapriori, users are able to discover rare item rules without using frequent items to generate vast numbers of meaningless rules. Based on the definition in (Liu et al. 1999), each item in the database has a minsup that is expressed as minimum item support (MIS), and users can specify different values of MIS for different items. This approach makes it possible to observe the nature of the items and their frequencies. The definition of MIS is given as follows.
Definition 1 Let I = {i_1, i_2, …, i_n} be a set of items, and let MIS(i_p) denote the MIS value of item i_p (i_p ∈ I). The MIS value of an itemset A = {i_1, i_2, …, i_k} (1 ≤ k ≤ n) is defined as follows (Liu et al. 1999):
$$\mathrm{MIS}(A) = \min\left[\mathrm{MIS}(i_1), \mathrm{MIS}(i_2), \ldots, \mathrm{MIS}(i_k)\right]$$
Example 1 Consider a database including three items: Milk, Granola, and Beer. The user-defined MIS values are as follows: MIS(Milk) = 3 %, MIS(Granola) = 1 %, and MIS(Beer) = 0.5 %. If the support of itemset {Milk, Granola} is 0.7 %, then itemset {Milk, Granola} is infrequent because its MIS value is equal to min[MIS(Milk), MIS(Granola)] = 1 %, which is larger than 0.7 %.
In conventional frequent pattern mining, the complete set of frequent patterns satisfies the downward closure property if there is only one minsup. That is, if an itemset is frequent, then all its subsets are also frequent. However, in the case of MMSs, the downward closure property does not hold; that is, certain subsets of a frequent itemset may not be frequent, and their support values are indeterminate.
Example 2 Continuing Example 1, the itemset {Milk, Granola} is infrequent because the support of itemset {Milk, Granola} is 0.7 %. If the support of itemset {Milk, Granola, Beer} is 0.5 %, then itemset {Milk, Granola, Beer} is frequent because MIS(Beer) is only 0.5 %. Clearly, the subset of the frequent itemset is not frequent.
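Examples 1 and 2 can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `itemset_mis` and `is_frequent` are hypothetical, and the MIS values and supports are those stated in the text, written as fractions rather than percentages.

```python
# MIS values from Example 1, expressed as fractions (3 %, 1 %, 0.5 %).
MIS = {"Milk": 0.03, "Granola": 0.01, "Beer": 0.005}


def itemset_mis(itemset, mis=MIS):
    """MIS of an itemset: the smallest MIS among its items (Definition 1)."""
    return min(mis[item] for item in itemset)


def is_frequent(itemset, support, mis=MIS):
    """An itemset is frequent when its support reaches its own MIS value."""
    return support >= itemset_mis(itemset, mis)


# Supports taken from Examples 1 and 2.
pair_frequent = is_frequent({"Milk", "Granola"}, 0.007)
triple_frequent = is_frequent({"Milk", "Granola", "Beer"}, 0.005)
print(pair_frequent, triple_frequent)  # False True
```

The superset is frequent while one of its subsets is not, which is exactly the failure of downward closure described above.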
To solve this problem, the sorted closure property was proposed by Liu et al. (1999). Suppose that all items in an itemset are sorted in ascending order according to their MIS values. The MIS value of the itemset is then that of its first item, and extending the itemset with items that follow it in this order leaves the MIS value unchanged. If an itemset is infrequent based on the MIS value of its first item (i.e., the smallest MIS value among all items in this itemset), then none of these supersets will be frequent. Based on this property, MSapriori (Liu et al. 1999) can reduce the search space when discovering all frequent itemsets with MMSs. Specifically, MSapriori presorts all items according to their MIS values and modifies the procedure for generating candidate sets. Because the supports of certain subsets are indeterminate, MSapriori requires postprocessing to compute the supports of all subsets of frequent itemsets.
Several extensions of the MSapriori algorithm have been proposed. Hu and Chen (2006) proposed a new data structure, the MIS-tree, to enhance the efficiency of MSapriori and to discover frequent patterns with MMSs. The procedure for constructing the MIS-tree only scans a database once. Kiran and Reddy (2010) also proposed an enhanced method. First, they designed a new method of calculating the MIS value called support difference. Second, they proposed an FP-growth-like algorithm to extract rare frequent patterns. Finally, they used an evaluation scheme called "item-to-pattern difference" to adjust the distortion that arises if the frequencies of items vary widely. Lee et al. (2005) considered a new perspective on minimum supports. They proposed the concept of maximum constraint, which provides a thorough explanation for certain domains. They also adopted an Apriori-based algorithm to discover large itemsets and association rules within the constraint. Chen et al. (2009) proposed a fuzzy-based approach called the divide-and-conquer genetic-fuzzy mining algorithm for items with MMSs (DGFMMS). DGFMMS is designed to find minimum supports, membership functions, and fuzzy association rules.

Problem definition
An event e is a non-empty set of items, and each item in e has a different attribute-name. Let Y be a set of class labels. A rule-item r is of the form r = {e, y}, where y is a class label and y ∈ Y.
Definition 3 A database D consists of a set of records (id, γ), where γ is a rule-item and id is the identifier of this rule-item. Given a rule-item β with event e and class label y, the event support count e_supp_D(β) is the number of records in D whose events contain e, the class support count y_supp_D(β) is the number of records in D whose class label is y, and the rule-item support count r_supp_D(β) is the number of records in D that contain both e and y.
Table 1 shows all attribute-values for each attribute and the complete set of items. Table 2 shows the sample database D. Given a rule-item β = {(a, 1)(d, 2)(e, 1), y_1}, the event support count of β in D, e_supp_D(β), is 4 (see sid 1, 2, 4, and 5); the class support count of β in D, y_supp_D(β), is 3 (see sid 1, 4, and 5); and the rule-item support count of β in D, r_supp_D(β), is 3 (see sid 1, 4, and 5).
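The three support counts can be sketched in Python as follows. The database below is a hypothetical toy example constructed only to reproduce the counts quoted in the text (4, 3, and 3); it is not the paper's Table 2, and the function name `support_counts` and the dictionary-based record layout are assumptions made for illustration. The class support count is taken to be the number of records carrying the class label, which is an interpretation consistent with the worked example.

```python
# Hypothetical toy database keyed by sid: (attribute -> value, class label).
# NOT the paper's Table 2; built only to match the counts in the text.
D = {
    1: ({"a": 1, "d": 2, "e": 1}, "y1"),
    2: ({"a": 1, "d": 2, "e": 1}, "y2"),
    3: ({"a": 2, "d": 1, "e": 1}, "y2"),
    4: ({"a": 1, "d": 2, "e": 1}, "y1"),
    5: ({"a": 1, "d": 2, "e": 1}, "y1"),
    6: ({"a": 1, "d": 1, "e": 2}, "y3"),
}


def support_counts(db, event, label):
    """Return (e_supp, y_supp, r_supp) for the rule-item {event, label}."""
    e_supp = y_supp = r_supp = 0
    for attrs, cls in db.values():
        # A record contains the event if every item of the event matches.
        has_event = all(attrs.get(name) == value for name, value in event.items())
        e_supp += int(has_event)
        y_supp += int(cls == label)
        r_supp += int(has_event and cls == label)
    return e_supp, y_supp, r_supp


beta_event = {"a": 1, "d": 2, "e": 1}
counts = support_counts(D, beta_event, "y1")
print(counts)  # (4, 3, 3)
```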
As discussed previously, a single minimum support is inapplicable to real-life cases because of the rare item problem. In this paper, the concept of MMSs is introduced, where a user specifies the minimum support threshold of each item. By using differing minimum item supports for the respective items, users can effectively determine the support requirements for different items. The property of MMSs allows higher minimum supports for the rule-items that only involve frequent items and lower minimum supports for the rule-items that contain rare items.

Table 1 The attribute-name and attribute-values
In summary, this approach discovers all frequent rule-items that satisfy their own MRS. Next, an associative classifier can be built based on the set of all frequent rule-items. For example, a frequent rule-item β can be expressed as a classification rule in which the support and confidence are equal to r_supp_D(β) and r_supp_D(β) / e_supp_D(β), respectively.
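As a short worked instance, using the counts quoted earlier for β = {(a, 1)(d, 2)(e, 1), y_1}, namely e_supp_D(β) = 4 and r_supp_D(β) = 3, the support and confidence of the corresponding rule would be:

```latex
\mathrm{support}(\beta) = r\_supp_D(\beta) = 3,
\qquad
\mathrm{confidence}(\beta) = \frac{r\_supp_D(\beta)}{e\_supp_D(\beta)} = \frac{3}{4} = 0.75
```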

The MMSCBA algorithm
The process of discovering a complete set of frequent rule-items is illustrated in Fig. 1. Initially, the complete database D is scanned once to count the support of each item. Given the lowest minimum rule-item support MRS_all, the items that do not satisfy MRS_all are pruned to form a pruned database D' in which the rule-items are sorted by MIS and MCS in ascending order. Then, D' is divided into partitions, denoted as D'_y, one for each class label y that satisfies its MCS. For each partition D'_y, the Multiple Minimum Supports-Classification Based on Associations (MMSCBA) algorithm is performed to find frequent rule-items. Next, all frequent rule-items and their r_supp_D values are collected from each partition. Because the database D' is divided into separate partitions, the entire database is scanned again to calculate the e_supp_D of the frequent rule-items found in each partition. Finally, all frequent rule-items with their e_supp_D and r_supp_D become classification rules, forming the proposed classifier.
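The pruning and partitioning steps can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' implementation: records are modeled as (set of items, class label) pairs, supports are relative frequencies, items inside each record are sorted by MIS only, and the function name `prune_and_partition` and all data values are hypothetical.

```python
from collections import Counter, defaultdict


def prune_and_partition(db, mis, mcs, mrs_all):
    """Drop items below mrs_all, sort the rest by MIS, and split by class label.

    db  : list of (set_of_items, class_label) records
    mis : item -> minimum item support (fraction)
    mcs : class label -> minimum class support (fraction)
    """
    n = len(db)
    item_counts = Counter(item for items, _ in db for item in items)
    class_counts = Counter(label for _, label in db)
    kept_items = {i for i, c in item_counts.items() if c / n >= mrs_all}

    partitions = defaultdict(list)
    for items, label in db:
        if class_counts[label] / n < mcs[label]:
            continue  # this class label does not satisfy its MCS
        # Ascending MIS order; the item itself breaks ties deterministically.
        ordered = sorted((i for i in items if i in kept_items),
                         key=lambda i: (mis[i], i))
        partitions[label].append(ordered)
    return dict(partitions)


# Hypothetical data for illustration only.
toy_db = [({"a1", "b1"}, "y1"), ({"a1", "b2"}, "y1"),
          ({"a1", "b1"}, "y2"), ({"a2", "b1"}, "y1")]
toy_mis = {"a1": 0.6, "b1": 0.5, "a2": 0.5, "b2": 0.5}
toy_mcs = {"y1": 0.5, "y2": 0.5}
parts = prune_and_partition(toy_db, toy_mis, toy_mcs, mrs_all=0.5)
print(parts)  # {'y1': [['b1', 'a1'], ['a1'], ['b1']]}
```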
The following subsections depict the MMSCBA algorithm and the scoring approaches for class label prediction.
Fig. 1 The frequent rule-item generation process
In Line 1 of the algorithm (Fig. 2), scan the database D'_y to obtain the support count of each item i, denoted as r_supp(i). In Line 2, compare r_supp(i) with the value of MRS(i) to determine whether the item i is frequent. Each rule-item with an r_supp(i) value greater than or equal to MRS(i) is inserted into the frequent 1-rule-item set L_1. From Lines 3 to 6, use L_{k-1} to generate C_k. By calling Candidate-Gen-C_2(L_1), use L_1 to generate all 2-candidate-rule-items to form C_2. Similarly, use Candidate-Gen-C_k(L_{k-1}) (k > 2) to generate all k-candidate-rule-items C_k from L_{k-1}. "Candidate-rule-item generation" section details the procedure of candidate rule-item generation. After generating the set of candidate-rule-items, Line 7 scans D'_y to obtain the support count of each candidate-rule-item c, r_supp(c). From Lines 8 to 9, use the check-MRS(c) function to obtain the minimum support of c, denoted as MRS(c). Then, in Lines 10 and 11, the candidate-rule-item c with r_supp(c) ≥ MRS(c) is inserted into L_k. At the end of this stage, we can identify all frequent rule-items from D'_y.

Candidate-rule-item generation
From the overview in "The MMSCBA algorithm" section, we can see that the basic concept of the MMSCBA algorithm is similar to the traditional Apriori algorithm (Agrawal et al. 1993). There exists, however, a significant difference between our candidate generation functions and the traditional ones. The main reason for this is that we consider the concept of multiple minimum supports, and the downward closure property no longer holds in our approach. In other words, sub-rule-items of a frequent rule-item may not be frequent because the minimum supports of a frequent rule-item and its sub-rule-items may differ. Therefore, to generate a complete set of candidate-rule-items, this study proposes two new candidate generation methods, Candidate-Gen-C_2 and Candidate-Gen-C_k, which are based on the definition of MMSs. Figure 3 presents the function Candidate-Gen-C_2(L_1), which uses L_1 to generate C_2 in D'_y. In L_1, every pair of frequent 1-rule-items is joined to form a 2-candidate-rule-item. For example, two frequent 1-rule-items (i_1, y_1) and (i_2, y_1) can be joined as a 2-candidate-rule-item, {(i_1, i_2), y_1}. Because all rule-items in D'_y have the same class label, we can ignore the class label and only consider the events in two frequent 1-rule-items in the candidate generation process. Note that the attribute-names of i_1 and i_2 cannot be the same (i.e., i_1.attribute-name ≠ i_2.attribute-name), and all items in a candidate are sorted in increasing order of their MIS values.
Fig. 2 The MMSCBA algorithm
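The pairwise join described for Candidate-Gen-C_2 can be sketched as follows. This is a minimal illustration, not the function shown in Fig. 3: items are modeled as hypothetical (attribute-name, attribute-value) tuples, the class label is omitted because all rule-items in one partition share it, and the name `candidate_gen_c2` and the MIS values are assumptions.

```python
from itertools import combinations


def candidate_gen_c2(l1, mis):
    """Join frequent 1-items into 2-candidates within one class partition.

    l1  : iterable of items, each an (attribute_name, attribute_value) tuple
    mis : item -> minimum item support
    """
    # Sorting first guarantees every candidate lists its items in ascending MIS.
    ordered = sorted(l1, key=lambda item: (mis[item], item))
    return [
        (p, q)
        for p, q in combinations(ordered, 2)
        if p[0] != q[0]  # the two items must have different attribute-names
    ]


# Hypothetical frequent 1-items and MIS values.
toy_l1 = [("a", 1), ("a", 2), ("b", 1)]
toy_mis = {("a", 1): 0.3, ("a", 2): 0.1, ("b", 1): 0.2}
c2 = candidate_gen_c2(toy_l1, toy_mis)
print(c2)  # [(('a', 2), ('b', 1)), (('b', 1), ('a', 1))]
```

The pair (("a", 2), ("a", 1)) is skipped because both items share the attribute-name "a".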
As Fig. 4 shows, the function Candidate-Gen-C_k(L_{k-1}) uses L_{k-1} to generate C_k. Given two (k−1)-rule-items p and q, a k-candidate-rule-item (k > 2) can be generated if the following two conditions are satisfied: (1) the first (k − 2) items of p and q are the same; (2) the attribute-names of the last items in p and q are different. Figure 5 shows the two possible k-candidate-rule-items generated by the function Candidate-Gen-C_k(L_{k-1}). Note that if MIS(p.item_{k-1}) ≥ MIS(q.item_{k-1}), then the k-candidate-rule-item cd_1 is generated; otherwise, cd_2 is generated.
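A sketch of this join step is given below. It is an interpretation rather than the function in Fig. 4: condition (2) is read as requiring the two last items to have different attribute-names (as the constraint stated for Candidate-Gen-C_2 implies), and the two possible outputs are assumed to differ only in the order of the last two items, chosen so that the candidate stays sorted by ascending MIS. The name `candidate_gen_ck` and the data are hypothetical.

```python
from itertools import combinations


def candidate_gen_ck(l_prev, mis):
    """Join (k-1)-item events that share their first k-2 items.

    l_prev : list of events, each a tuple of (attribute_name, value) items
             already sorted by ascending MIS
    mis    : item -> minimum item support
    """
    candidates, seen = [], set()
    for p, q in combinations(l_prev, 2):
        if p[:-1] != q[:-1]:
            continue  # condition (1): identical prefix of length k-2
        if p[-1][0] == q[-1][0]:
            continue  # condition (2): last items need different attribute-names
        if mis[p[-1]] >= mis[q[-1]]:
            cand = p[:-1] + (q[-1], p[-1])  # q's last item has the smaller MIS
        else:
            cand = p + (q[-1],)
        if cand not in seen:
            seen.add(cand)
            candidates.append(cand)
    return candidates


# Hypothetical frequent 2-item events and MIS values.
A1, B1, B2, C2 = ("a", 1), ("b", 1), ("b", 2), ("c", 2)
toy_mis = {A1: 0.1, B1: 0.3, C2: 0.2, B2: 0.4}
ck = candidate_gen_ck([(A1, B1), (A1, C2), (A1, B2)], toy_mis)
print(ck)  # [(('a', 1), ('c', 2), ('b', 1)), (('a', 1), ('c', 2), ('b', 2))]
```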
Example 7 Continuing Example 4, consider two frequent 4-rule-items with class label y_2, where d_1 = {(i_11)(i_1)(i_4)(i_2), y_2} and d_2 = {(i_11)(i_1)(i_4)(i_7), y_2}. Join the two 4-rule-items to form a new 5-candidate-rule-item; the first three items in d_1 are identical to those in d_2, but their last items are different. Because MIS(i_7) = 3, which is larger than
Fig. 3 The function Candidate-Gen-C_2(L_1)
Fig. 4 The function Candidate-Gen-C_k(L_{k-1})
It is essential that the complete set of frequent patterns can be discovered through the algorithm. Because MMSCBA adopts the candidate-generation-and-test approach to discover all frequent rule-items, the completeness of the candidate generation method needs to be clarified.
Because our approach considers the concept of MMSs, all frequent rule-items must satisfy the sorted closure property. That is, any sub-rule-item β of a frequent rule-item α is also a frequent rule-item if MRS(β) = MRS(α). Because every record that contains α also contains β, r_supp(β) ≥ r_supp(α); hence, if r_supp(α) ≥ MRS(α), then r_supp(β) ≥ MRS(α) = MRS(β), i.e., β is also a frequent rule-item. This property ensures that our candidate-generation-and-test method is feasible because all possible k-candidate-rule-items can be generated from their (k − 1)-sub-rule-items.

Predicting class label based on classification rules
After generating all classification rules, use them to classify uncertain objects in a testing dataset. The prediction of the class labels in associative classification can be categorized into two main approaches: prediction based on the highest precedence single rule-item and prediction based on multiple rule-items. In this study, four prediction measurements are considered: Maximum likelihood (Liu et al. 1998; Thabtah et al. 2005), Max χ² (Li et al. 2001), Laplace (Yin and Han 2003), and Scoring (Hu et al. 2007).

Maximum likelihood
Given a testing data object α and a set of classification rules, the maximum likelihood approach only considers the highest precedence rule that matches α. If no rule matches α, then the default class label is assigned to α. Several associative classification algorithms (Liu et al. 1998; Thabtah et al. 2005) have adopted the maximum likelihood approach for class label prediction.
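Single-rule prediction can be sketched as follows. The text does not define rule precedence, so this sketch assumes a CBA-style ordering (higher confidence first, then higher support) with a hypothetical tie-break that prefers rules with fewer items; the `Rule` type, the function name `predict_single_rule`, and all rule values are illustrative assumptions.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    event: frozenset      # set of (attribute_name, attribute_value) items
    label: str
    support: float
    confidence: float


def predict_single_rule(rules, obj, default_label):
    """Return the label of the highest-precedence rule whose event is in obj."""
    # Assumed precedence: confidence, then support, then shorter events.
    ranked = sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.event)))
    for rule in ranked:
        if rule.event <= obj:  # every item of the rule occurs in the object
            return rule.label
    return default_label  # no applicable rule


# Hypothetical rules for illustration.
toy_rules = [
    Rule(frozenset({("a", 1)}), "y1", 0.4, 0.70),
    Rule(frozenset({("a", 1), ("b", 2)}), "y2", 0.2, 0.90),
    Rule(frozenset({("c", 3)}), "y1", 0.3, 0.95),
]
p1 = predict_single_rule(toy_rules, {("a", 1), ("b", 2)}, "y0")
p2 = predict_single_rule(toy_rules, {("a", 1), ("b", 1)}, "y0")
p3 = predict_single_rule(toy_rules, {("d", 1)}, "y0")
print(p1, p2, p3)  # y2 y1 y0
```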

Max χ²
Instead of considering a single rule in class label prediction (i.e., Maximum likelihood), the CMAR algorithm (Li et al. 2001) exploits a prediction method that selects a subset of high-confidence rules that are applicable to a class label. The prediction is made by analyzing the correlation among the rules. The correlation is measured using weighted χ² analysis to examine the strength of a rule-item based on its support and class frequency in the set of rule-items.
Following Definition 3, the weighted χ² of a rule-item, denoted as Max χ², is defined as follows:
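For reference, the upper bound and the weighted measure used by CMAR (Li et al. 2001) can be written in this section's notation as below. The transcription into e_supp_D, y_supp_D, and |D| (the number of records in D) is an assumption made here for illustration; it may differ in form from the equation intended at this point in the paper.

```latex
\max\chi^2(r) =
  \left( \min\{ e\_supp_D(r),\, y\_supp_D(r) \}
         - \frac{e\_supp_D(r)\, y\_supp_D(r)}{|D|} \right)^{2} |D| \, \epsilon

\epsilon =
  \frac{1}{e\_supp_D(r)\, y\_supp_D(r)}
  + \frac{1}{e\_supp_D(r)\,\bigl(|D| - y\_supp_D(r)\bigr)}
  + \frac{1}{\bigl(|D| - e\_supp_D(r)\bigr)\, y\_supp_D(r)}
  + \frac{1}{\bigl(|D| - e\_supp_D(r)\bigr)\bigl(|D| - y\_supp_D(r)\bigr)}

\text{weighted } \chi^2(R_y) = \sum_{r \in R_y} \frac{\chi^2(r)\,\chi^2(r)}{\max\chi^2(r)}
```

Here R_y denotes the group of matching rules with class label y (a symbol introduced only for this illustration), and the class whose group has the largest weighted χ² is predicted.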

Laplace
Laplace accuracy (Quinlan 1986) is used to estimate the expected accuracy of a rule-item. Given a rule-item r, Laplace accuracy can be defined as follows:
$$\mathrm{Laplace}(r) = \frac{r\_supp_D(r) + 1}{e\_supp_D(r) + |Y|}$$
where |Y| is the number of classes.
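A small numeric sketch follows, assuming the usual CPAR-style form (r_supp + 1) / (e_supp + |Y|). The helper names are hypothetical, the counts 3 and 4 are the ones quoted earlier for the example rule-item, and the choice of two classes is an assumption made only for the arithmetic. The second helper mirrors the best-k averaging that this subsection uses to score a class label.

```python
def laplace_accuracy(r_supp, e_supp, n_classes):
    """Laplace expected accuracy of one rule-item (assumed CPAR-style form)."""
    return (r_supp + 1) / (e_supp + n_classes)


def average_best_k(accuracies, k):
    """Average of the k largest Laplace accuracies among one class's rules."""
    top = sorted(accuracies, reverse=True)[:k]
    return sum(top) / len(top)


# r_supp = 3 and e_supp = 4 as in the earlier example; |Y| = 2 is assumed.
acc = laplace_accuracy(3, 4, 2)
class_score = average_best_k([0.9, 0.5, 0.7], k=2)
print(round(acc, 3), round(class_score, 3))  # 0.667 0.8
```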
To classify a data object, this approach first identifies all matching rule-items and groups them by class labels. For each rule set (i.e., rules having the same class label), the best k rules are chosen and then used to calculate the average Laplace accuracy of a class label. Finally, the class label with the highest average Laplace accuracy will be selected as the final prediction outcome.

Scoring
Hu et al. (2007) proposed a scoring method to calculate the score of each class label based on all matching rules. Given a frequent rule-item r, the two scoring functions are described as follows:
The whole procedure of the scoring method can be stated as follows. Given a testing data object α, we first identify the complete set of classification rules that match α, meaning that the event part of each such rule-item is a subset of α. Next, we divide these rules into sets according to their class labels. The WeightedSupport and WeightedConfidence of a rule set are accumulated by summing the scores of the rule-items in the set. The class label with the highest WeightedConfidence value is selected as the prediction label. If more than one class label has the highest WeightedConfidence value, then we compare their WeightedSupport values and choose the class label with the highest WeightedSupport as the prediction label.

Data collection and experimental setup
Six real-world datasets were selected from the UCI Machine Learning Repository website (http://archive.ics.uci.edu/ml/). Table 3 provides a description of these datasets. The experiments were run on a Windows 7 PC equipped with an Intel Core i5-4570 3.2 GHz processor and 16 GB of RAM. The proposed methods were implemented in Java. Several well-known classification techniques were also considered in the experimental evaluations, including C4.5, SVM, PART, ANN, RIPPER, and traditional CBA. Among them, C4.5 (Quinlan 1986, 1993), SVM, PART, ANN, and RIPPER were run using WEKA 3.6.10 (www.cs.waikato.ac.nz/ml/weka) (Witten et al. 2011), a popular suite of machine learning software; the CBA algorithm was run using the implementation in (Liu et al. 1998). In all experiments, ten-fold cross-validation (Burman 1989) was adopted to estimate the performance of the proposed method. The accuracy, defined as the proportion of true results (i.e., both true positives and true negatives) among the total number of samples examined, was used as the metric to measure the performance of the algorithms.
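The evaluation protocol can be sketched in plain Python as below. This is only an illustration of accuracy and of a ten-fold split; it is not the WEKA or Java code used in the experiments, the function names are hypothetical, and the folds are formed by striding over indices without shuffling or stratification.

```python
def accuracy(y_true, y_pred):
    """Proportion of predictions that equal the true class label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def k_fold_indices(n, k=10):
    """Yield (train_indices, test_indices) for each of k folds over n samples."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(25, k=10))
acc = accuracy(["a", "b", "a"], ["a", "b", "b"])
print(len(splits), round(acc, 3))  # 10 0.667
```

In a full run, a classifier would be trained on each `train_idx` and scored on the matching `test_idx`, and the ten accuracies would be averaged.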
To easily generate MIS values for each item in MMSCBA, we adopted the method proposed in Thabtah (2007), which considers the actual frequencies of items as the basis for MIS value assignment. The equations are stated as follows:
$$\mathrm{MIS}(i_p) = \begin{cases} M(i_p), & M(i_p) \geq \mathrm{MRS}_{all} \\ \mathrm{MRS}_{all}, & \text{otherwise} \end{cases}$$
$$M(i_p) = \sigma \times f(i_p)$$
where f(i_p) represents the number of times item i_p (i_p ∈ I) occurs in the database, and MRS_all denotes the smallest MIS value among all items. The parameter σ (0 ≤ σ ≤ 1) controls the effect of the item frequency on the MIS value in the mining process. In the experiments, we varied the σ value from 0 to 1. If σ is set to 0, all items will have identical MIS values (i.e., MRS_all), and the results will be the same as those of traditional association rule mining. If σ is set to 1 and M(i_p) ≥ MRS_all, then f(i_p) is the MIS value for i_p.
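The assignment rule can be sketched as follows, assuming the piecewise form implied by the description: M(i_p) = σ × f(i_p), with MRS_all as a floor. To make the comparison meaningful, this sketch expresses f(i_p) as a relative frequency; the function name `assign_mis` and the item frequencies are hypothetical.

```python
def assign_mis(freq, sigma, mrs_all):
    """Assign an MIS value to every item from its frequency.

    freq    : item -> relative frequency in the database (assumed unit)
    sigma   : value in [0, 1] controlling how strongly MIS follows frequency
    mrs_all : lowest allowed MIS value
    """
    mis = {}
    for item, f in freq.items():
        m = sigma * f
        mis[item] = m if m >= mrs_all else mrs_all
    return mis


# Hypothetical item frequencies.
toy_freq = {"Milk": 0.5, "Beer": 0.02}
print(assign_mis(toy_freq, sigma=0.0, mrs_all=0.01))
print(assign_mis(toy_freq, sigma=1.0, mrs_all=0.01))
```

With σ = 0 every item falls back to MRS_all, and with σ = 1 each item whose frequency is at least MRS_all keeps its own frequency as its MIS value, matching the two boundary cases described above.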

Results
For every dataset, the value of minsup is set as follows: (1) 0.2 ≤ minsup ≤ 0.4 for datasets BS, BC, and BCW; (2) 0.1 ≤ minsup ≤ 0.3 for datasets M2 and TF; and (3) 0.005 ≤ minsup ≤ 0.007 for the dataset WF.
Table 4 presents the classification results for the BS dataset using MMSCBA with four rule selection methods. The best accuracies of MMSCBA with maximum likelihood, Laplace, scoring, and Max χ² are 0.748, 0.593, 0.708, and 0.384, respectively. MMSCBA with maximum likelihood performs the best compared with the other three classification rule selection methods. MMSCBA with Max χ² has the worst performance.
For the BC dataset, the results in Table 5 show that the best accuracies of MMSCBA with maximum likelihood, Laplace, scoring, and Max χ² are 0.705, 0.595, 0.706, and 0.624, respectively. MMSCBA with the maximum likelihood and scoring methods performs the best compared with the other two methods. MMSCBA with the Laplace method has the worst performance.
For the BCW dataset, the results in Table 6 show that the best accuracies of MMSCBA with maximum likelihood, Laplace, scoring, and Max χ² are 0.963, 0.950, 0.770, and 0.818, respectively. MMSCBA with the maximum likelihood method performs the best compared with the other three methods. MMSCBA with the scoring method has the worst performance.
For the M2 dataset, the results in Table 7 show that the best accuracies of MMSCBA with maximum likelihood, Laplace, scoring, and Max χ 2 are 0.657, 0.629, 0.672, and 0.604, respectively. MMSCBA with the scoring method performs the best compared with the other three methods. MMSCBA with the Max χ 2 method has the worst performance.
For the TF dataset, the results in Table 8 show that the best accuracies of MMSCBA with maximum likelihood, Laplace, scoring, and Max χ² are 0.759, 0.571, 0.762, and 0.730, respectively. MMSCBA with the scoring method performs the best compared with the other three methods. MMSCBA with the Laplace method has the worst performance.
In summary, the results for the first five datasets show that, on average, MMSCBA with the maximum likelihood method has the highest accuracy and MMSCBA with the Max χ² method has the lowest. The accuracy of MMSCBA with the scoring method is relatively stable for various values of α and minsup. MMSCBA with the Max χ² method achieves better accuracy as the value of α decreases, but its performance is sensitive to α.
Table 9 presents the results of a comparison between non-rule-based classifiers (i.e., ANN and SVM) and rule-based classifiers (i.e., C4.5, PART, RIPPER, CBA, and MMSCBA with maximum likelihood). The results show that the performance of the rule-based classifiers is stable but not the best among all techniques. The accuracy of the non-rule-based classifiers is higher than that of the rule-based classifiers in most of the six datasets.
Table 10 presents the runtime results for all classifiers and datasets. The results show that CBA and our approach require more runtime than other classification techniques such as SVM, C4.5, PART, and RIPPER. This is expected because the runtime of association rule-based approaches (i.e., CBA and MMSCBA) is affected by minsup; that is, a longer execution time may be required when minsup is set too low. Therefore, compared with heuristic approaches (e.g., C4.5, PART, and RIPPER), our approach requires more execution time to discover all possible classification rules from the datasets.
In many real-life applications, non-rule-based classification techniques cannot be adopted because of their low interpretability. In contrast, rule-based classification techniques can generate IF-THEN rules, which can be easily stored in a knowledge base. Expert systems can also be easily built by incorporating the rules into an expert system shell. Therefore, as long as the performance of rule-based classifiers is acceptable, most decision makers would select rule-based classifiers in practice.
Among all the rule-based classifiers, the experimental results also show that the proposed method (i.e., MMSCBA with the maximum likelihood method) outperforms the traditional CBA and other rule-based techniques in three of the six datasets. Compared with other classification methods, the proposed method achieves remarkable accuracy when the dataset contains rare items, such as the BC and BCW datasets. Although C4.5 and CBA perform the best in datasets BC and TF, respectively, MMSCBA with the maximum likelihood method still has a satisfactory performance (i.e., close to the best classifier).

Conclusion
In CBA, it is difficult to discover rules involving rare items using a single minsup threshold because of the rare item problem. This paper presented the concept of integrating MMSs into established classifiers. Unlike conventional multiple thresholds, the proposed method uses three factors (i.e., MIS values for items, MCS values for classes, and MRS values for rule-items) to determine classification rules.
Experimental results involving six real-world datasets demonstrate that MMSCBA with a maximum likelihood classifier achieves higher accuracy than traditional CBA, especially when the dataset contains rare items. In addition, the MMSCBA method can address both the class imbalance problem and the rare item problem.