Building an associative classifier with multiple minimum supports

Abstract

Classification is one of the most important technologies used in data mining. Researchers have recently proposed several classification techniques based on the concept of association rules (also known as CBA-based methods). Experimental evaluations of these studies show that, on average, CBA-based approaches can yield higher accuracy than some conventional classification methods. However, conventional CBA-based methods adopt a single minimum support threshold for all items, resulting in the rare item problem: the classification rules contain only frequent items if the minimum support (minsup) is set high, whereas enormous numbers of meaningless item combinations are reported as frequent if minsup is set low. To solve this problem, this paper proposes a novel CBA-based method called MMSCBA, which considers the concept of multiple minimum supports (MMSs). Based on MMSs, different classification rules appear under their corresponding minsups. Several experiments were conducted with six real-world datasets selected from the UCI Machine Learning Repository. The results show that MMSCBA achieves higher accuracy than conventional CBA methods, especially when the dataset contains rare items.

Background

With the advance of technology in data collection and data processing, enterprises can quickly store large amounts of data. In recent years, data mining has been recognized as a technology that can discover previously unknown and potentially useful information from databases (Witten et al. 2011). Several data mining techniques have been developed, such as association rules mining (Agrawal et al. 1993; Hu and Chen 2006), classification (Cohen 1995; Fernandez-Delgado et al. 2014; Quinlan 1993), clustering (Jain et al. 1999), temporal pattern discovery (Hu et al. 2009; Roddick and Spiliopoulou 2002), and other statistical approaches (Vapnik 1999).

Classification is one of the most important technologies used in data mining. Given a set of data objects as a training set, classification techniques construct classifiers (models) to predict class labels of new data objects. A classifier can be used to infer that a new record belongs to a certain class. Thus far, classification technology has been used in many applications, including customer relationship management, medical diagnosis, and fraud prevention (Jyoti et al. 2011; Ngai et al. 2009; Yoon and Lee 2013). Researchers have developed many classification techniques, which can be categorized as rule-based or non-rule-based approaches. The rule-based approaches, such as decision tree (Quinlan 1993), RIPPER (Cohen 1995), PART (Witten et al. 2011), and classification based on associations (CBA) (Liu et al. 1998, 2000), are typically interpretative and easy to implement. On the other hand, non-rule-based approaches, such as support vector machine (SVM) (Vapnik 1999) and artificial neural network (ANN) (Venkatesh and Thangaraj 2008), have a high noise tolerance but require extensive computation.

Researchers have recently proposed several CBA-based methods, including CMAR (Li et al. 2001), CPAR (Yin and Han 2003), MCAR (Thabtah et al. 2005), CBC (Deng et al. 2014), and MMAC (Thabtah et al. 2004). Experimental studies on these methods show that CBA-based approaches can yield higher accuracy than conventional classification methods. Most CBA-based methods (Li et al. 2001; Thabtah et al. 2004, 2005) adopt rule selection or pruning techniques to build accurate classifiers by retaining a limited number of effective rules. These methods typically adopt a single minimum support threshold for all items (an “item” refers to an attribute-name associated with a valid attribute-value), class labels, and itemsets. However, a single minimum support restricts the applicability of current CBA-based methods. Different items of each rule or class label will likely have different levels of importance. For example, some items or class labels may appear frequently in the database, while others appear rarely. If the minimum support value is set at a high threshold, few items can satisfy this requirement, and rules with rare items cannot be found. To find rules with rare items, the minimum support value must be set relatively low. However, a lower minimum support requires extensive computation because the number of combinatorial itemsets increases exponentially; in addition, most of these itemsets are meaningless.

The class imbalance problem (Guo et al. 2008) has been addressed in past research. In this case, the distribution of class labels is skewed, and thus classifiers perform poorly on rare classes. To solve the class imbalance problem, CBA-based methods (Liu et al. 2000; Janssens et al. 2005) have applied the concept of multiple minimum supports (MMSs) to differentiate class labels; that is, a different minimum class support is assigned to each class label. It is worth noting that these works (Liu et al. 2000; Janssens et al. 2005) only consider MMSs for different class labels. To the best of our knowledge, previous studies in classification have not extended MMSs to individual items. Research in association rule mining has shown that the rare item problem (Liu et al. 1999) produces poor-quality rules. Because the selection of a proper set of classification rules is the primary factor determining the effectiveness of associative classifiers, it is indispensable to address the rare item problem in CBA-based methods.

In this paper, a new approach for classification with MMSs is proposed to handle all items and class labels in CBA rule generation. The proposed approach provides a user-defined minimum support for each item and each class label. Because different classification rules appear under their corresponding minimum supports, an algorithm based on the established Multiple Support Apriori (MSapriori) algorithm (Liu et al. 1999), called MMSCBA, is proposed to discover a complete set of classification rules with MMSs. To build MMSCBA-based classifiers, four methods for classification rule selection are considered. Several experiments were conducted with six real-world datasets from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/) to evaluate the performance of these classifiers.

The remainder of this paper is organized as follows. “Related work” section presents related research. “Problem definition” section presents the research problem. “The MMSCBA algorithm” section presents the proposed method. “Experimental evaluation” section presents analysis and discussion. Finally, “Conclusion” section presents the conclusion.

Related work

Associative classification

Many studies have shown that associative classification (AC) achieves greater accuracy than other traditional approaches. Several AC-based studies have recently presented classification based on association (CBA) (Liu et al. 1998), classification based on multiple association rules (CMAR) (Li et al. 2001), and classification based on predictive association rules (CPAR) (Yin and Han 2003). An AC-based approach typically consists of three phases: rule generation, rule pruning, and classification.

In the early stage, the CBA approach applied the concept of association rules to classification. In CBA, the system initially executes the Apriori algorithm to progressively generate association rules that satisfy user-defined minimum support and confidence thresholds. One subset of the generated classification rules becomes the final classifier.

Similar to CBA, the CMAR approach adopts the FP-growth algorithm to generate frequent itemsets. A subset of matching rules, rather than a single rule, is then used to classify a test object, which in turn improves accuracy. The CMAR approach generates and evaluates rules similarly to CBA; however, CMAR uses a more efficient FP-tree structure. In addition, the CMAR approach considers multiple rules with associated weights in prediction. Therefore, CMAR yields higher accuracy than CBA.

Both CBA and CMAR incur a high computation cost in rule generation and rule selection if the dataset is large. To avoid this cost, the CPAR approach (Yin and Han 2003) generates a small set of predictive rules directly from the dataset based on rule prediction and coverage analysis instead of generating candidate rules. The core of CPAR is its predictive rule mining, in which an object correctly covered by a rule is not removed; instead, its weight is decreased by multiplying it by a factor. This is essentially a greedy approach to rule generation and is more efficient than generating all candidate rules. The CPAR approach also uses a dynamic programming technique to avoid repeated calculations during rule generation and selects the best k rules in prediction. Previous studies have provided more complete surveys of associative classification (Thabtah 2006, 2007; Deen et al. 2010; Swami and Jain 2005).

Multiple minimum supports

Mining frequent patterns with a single minimum support (abbreviated as minsup) implicitly assumes that every item has the same property (i.e., frequency). If the minsup value is high, rules involving rare items will not be found. Conversely, if the minsup value is low, a large number of meaningless rules will be generated. The MSapriori approach (Liu et al. 1999) has been proposed to extract frequent rules with rare items. In MSapriori, users can discover rules involving rare items without generating vast numbers of meaningless rules among frequent items. Based on the definition in Liu et al. (1999), each item in the database has a minsup, expressed as its minimum item support (MIS), and users can specify different MIS values for different items. This approach makes it possible to reflect the nature of the items and their frequencies. The definition of MIS is given as follows.

Definition 1

Let I = {i 1, i 2, …, i n } be a set of items, and let MIS(i p ) denote the MIS value of item i p \((i_{p} \in I)\). The MIS value of itemset A = {i 1, i 2, …, i k } (1 ≤ k ≤ n) is defined as follows (Liu et al. 1999).

$$MIS\left( A \right) = min\left[ {MIS\left( {i_{1} } \right),MIS\left( {i_{2} } \right), \ldots ,MIS\left( {i_{k} } \right)} \right]$$

Example 1

Consider a database including three items: Milk, Granola, and Beer. The user-defined MIS values are described as follows:

$$MIS\left( {Milk} \right) = 3\;\% ,MIS\left( {Granola} \right) = 1\;\% ,MIS\left( {Beer} \right) = 0.5\;\%$$

If the support of itemset {Milk, Granola} is 0.7 %, then itemset {Milk, Granola} is infrequent because the MIS value of itemset {Milk, Granola} is equal to min[MIS(Milk), MIS(Granola)] = 1 %, which is larger than 0.7 %.
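A minimal Python sketch of Definition 1 under the thresholds of Example 1 is shown below; the dictionary-of-thresholds representation is an illustrative assumption, not part of the original formulation.

```python
# MIS of an itemset = smallest MIS among its items (values follow Example 1).
MIS = {"Milk": 0.03, "Granola": 0.01, "Beer": 0.005}   # user-defined minimum item supports

def mis_of_itemset(itemset, mis):
    """MIS(A) = min(MIS(i1), ..., MIS(ik))."""
    return min(mis[item] for item in itemset)

itemset = ("Milk", "Granola")
observed_support = 0.007                                # 0.7 %, as in Example 1

threshold = mis_of_itemset(itemset, MIS)                # min(3 %, 1 %) = 1 %
print(threshold)                                        # 0.01
print(observed_support >= threshold)                    # False -> the itemset is infrequent
```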

In conventional frequent pattern mining, the complete set of frequent patterns satisfies the downward closure property if there is only one minsup. That is, if an itemset is frequent, then all its subsets are also frequent. However, in the case of MMSs, the downward closure property does not hold; that is, certain subsets of a frequent itemset are not frequent and their support values are indeterminate.

Example 2

Continuing Example 1, the itemset {Milk, Granola} is infrequent because its support is 0.7 %. If the support of itemset {Milk, Granola, Beer} is 0.5 %, then itemset {Milk, Granola, Beer} is frequent because MIS(Beer) is only 0.5 %. Clearly, a subset of a frequent itemset is not necessarily frequent.

To solve this problem, the sorted closure property is proposed in (Liu et al. 1999). Suppose that all items in an itemset are sorted in ascending order according to their MIS values. The MIS value of any superset of an itemset is equal to that of the first item in this itemset. If an itemset is infrequent based on the MIS value of its first item (i.e., the smallest MIS value among all items in this itemset), then none of its supersets will be frequent. Based on the above property, MSapriori (Liu et al. 1999) can decrease the search space to discover all frequent itemsets with MMSs. Specifically, MSapriori presorts all items according to their MIS values but modifies the procedure of generating candidate sets. Because the supports of certain subsets are indeterminate, MSapriori requires post-processing to compute the supports of all subsets of frequent itemsets.

Several extensions of the MSapriori algorithm have been proposed. Hu and Chen (2006) proposed a new data structure, the MIS-tree, to enhance the efficiency of MSapriori and to discover frequent patterns with MMSs; the procedure for constructing the MIS-tree scans the database only once. Kiran and Reddy (2010) also proposed an enhanced method. First, they designed a new way of calculating the MIS value, called support difference. Second, they proposed an FP-growth-like algorithm to extract rare frequent patterns. Finally, they used an evaluation scheme called “item-to-pattern difference” to adjust the distortion when the frequencies of items vary widely. Lee et al. (2005) considered a new perspective on minimum supports. They proposed the concept of maximum constraint, which provides a thorough explanation for certain domains, and adopted an Apriori-based algorithm to discover large itemsets and association rules under the constraint. Chen et al. (2009) proposed a fuzzy-based approach called the divide-and-conquer genetic-fuzzy mining algorithm for items with MMSs (DGFMMS), which is designed to find minimum supports, membership functions, and fuzzy association rules.

Problem definition

Let I = {i 1, i 2,…, i n } denote a set of distinct items, where i p (1 ≤ p ≤ n) is an item presented in the format of a pair (attribute-name, attribute-value). An event e is a non-empty set of items, and each item in e has a distinct attribute-name. Let Y be a set of class labels. A rule-item r is of the form r = {e, y}, where y is a class label and \(y \in Y\).

Definition 2

Given two rule-items \(\alpha = \left\{ {(i_{1}^{\alpha } i_{2}^{\alpha } \ldots i_{n}^{\alpha } ),y_{\alpha } } \right\}\) and \(\beta = \left\{ {(i_{1}^{\beta } i_{2}^{\beta } \ldots i_{m}^{\beta } ),y_{\beta } } \right\}\) where \(y_{\alpha } ,y_{\beta } \in Y\) and \(m \le n\) holds. The event in β, i.e., \((i_{1}^{\beta } i_{2}^{\beta } \ldots i_{m}^{\beta } )\), is said to be contained in α if there exist integers 1 ≤ k 1 < k 2 < ··· <k m  ≤ n such that \(i_{1}^{\beta } = i_{{k_{1} }}^{\alpha }\), \(i_{2}^{\beta } = i_{{k_{2} }}^{\alpha }\), …, \(i_{m}^{\beta } = i_{{k_{m} }}^{\alpha }\). Moreover, a rule-item β is contained in α if \((i_{1}^{\beta } i_{2}^{\beta } \ldots i_{m}^{\beta } )\) is contained in α, and \(y_{\alpha } = y_{\beta }\).

Example 3

Suppose there is a rule-item α = {(a, 1)(b, 2)(c, 1)(b, 1)(d, 2), y 1}. The rule-item β = {(a, 1)(b, 2)(c, 1)(d, 2), y 1} is contained in α because α and β satisfy the two conditions presented previously. As another example, the rule-item γ = {(a, 1)(b, 1)(d, 3), y 2} is not contained in α because item (d, 3) is not included in α; that is, condition (1) is not true in the case of α and γ.
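The containment relation of Definition 2 amounts to an order-preserving subsequence test on the events plus an equality test on the class labels. The sketch below is one possible Python rendering of that reading, replaying Example 3; the tuple-of-pairs representation is an assumption made for illustration.

```python
def event_contained(small, large):
    """Return True if `small` is an order-preserving subsequence of `large`."""
    pos = 0
    for item in large:
        if pos < len(small) and item == small[pos]:
            pos += 1
    return pos == len(small)

def rule_item_contained(beta, alpha):
    """beta, alpha: (event, class_label) pairs; events are tuples of (attribute, value) items."""
    return beta[1] == alpha[1] and event_contained(beta[0], alpha[0])

alpha = ((("a", 1), ("b", 2), ("c", 1), ("b", 1), ("d", 2)), "y1")
beta  = ((("a", 1), ("b", 2), ("c", 1), ("d", 2)), "y1")
gamma = ((("a", 1), ("b", 1), ("d", 3)), "y2")
print(rule_item_contained(beta, alpha))   # True, as in Example 3
print(rule_item_contained(gamma, alpha))  # False: labels differ and (d, 3) never appears in alpha
```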

Definition 3

A database D consists of a set of records (id, γ), where γ is a rule-item and id is the identifier of this rule-item. Given a rule-item \(\beta = \left\{ {(i_{1}^{\beta } i_{2}^{\beta } \ldots i_{m}^{\beta } ),y_{\beta } } \right\}\) in D, the event support count e_supp, the class support count y_supp, and the rule-item support count r_supp are defined as:

$$\begin{aligned} e\_supp_{D} \left( \beta \right) & = \left| {\left\{ {\left( {id,\gamma } \right)|\left( {id,\gamma } \right) \in D \wedge (i_{1}^{\beta } i_{2}^{\beta } \ldots i_{m}^{\beta } )\,{\text{is contained in}}\,\gamma } \right\}} \right| \hfill \\ y\_supp_{D} \left( \beta \right) & = \left| {\left\{ {\left( {id,\gamma } \right)|\left( {id,\gamma } \right) \in D \wedge y_{\beta } \,{\text{is contained in}}\,\gamma } \right\}} \right| \hfill \\ r\_supp_{D} \left( \beta \right) & = \left| {\left\{ {\left( {id,\gamma } \right)|\left( {id,\gamma } \right) \in D \wedge \beta \,{\text{is contained in}}\,\gamma } \right\}} \right| \hfill \\ \end{aligned}$$

Example 4

Table 1 shows all attribute-values for each attribute and the complete set of items. Table 2 shows the sample database D. Given a rule-item β = {(a, 1)(d, 2)(e, 1), y 1}, the event support count of β in D, e_supp D (β), is 4 (see sid 1, 2, 4, and 5); the class support count of β in D, y_supp D (β), is 3 (see sid 1, 4, and 5); and the rule-item support count of β in D, r_supp D (β), is 3 (see sid 1, 4, and 5).

Table 1 The attribute-name and attribute-values
Table 2 The sample database D
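Because Table 2 is not reproduced here, the sketch below uses a small hypothetical database whose records were chosen only so that the three counts of Example 4 (e_supp = 4, y_supp = 3, r_supp = 3) come out the same; the record contents themselves are not from the paper. It also reads “y β is contained in γ” as “the class label of γ equals y β”, which matches the counts reported in Example 4.

```python
def event_contained(small, large):
    """Order-preserving subsequence test on events (tuples of (attribute, value) items)."""
    pos = 0
    for item in large:
        if pos < len(small) and item == small[pos]:
            pos += 1
    return pos == len(small)

# Hypothetical records (id, (event, class_label)) standing in for Table 2.
D = [
    (1, ((("a", 1), ("d", 2), ("e", 1)), "y1")),
    (2, ((("a", 1), ("b", 2), ("d", 2), ("e", 1)), "y2")),
    (3, ((("b", 1), ("c", 2), ("d", 1)), "y2")),
    (4, ((("a", 1), ("c", 3), ("d", 2), ("e", 1)), "y1")),
    (5, ((("a", 1), ("c", 1), ("d", 2), ("e", 1)), "y1")),
]

def e_supp(beta, db):                       # records whose event contains beta's event
    event, _ = beta
    return sum(1 for _, (ev, _y) in db if event_contained(event, ev))

def y_supp(beta, db):                       # records carrying beta's class label
    _, label = beta
    return sum(1 for _, (_ev, y) in db if y == label)

def r_supp(beta, db):                       # records matching both the event and the label
    event, label = beta
    return sum(1 for _, (ev, y) in db if y == label and event_contained(event, ev))

beta = ((("a", 1), ("d", 2), ("e", 1)), "y1")
print(e_supp(beta, D), y_supp(beta, D), r_supp(beta, D))   # 4 3 3, as in Example 4
```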

As discussed previously, a single minimum support is inapplicable to real-life cases because of the rare item problem. In this paper, the concept of MMSs is introduced, where a user specifies the minimum support threshold of each item.

Definition 4

Let MIS(i p ) denote the minimum item support of item i p , where i p ∈ I. In addition, MCS(y) represents the minimum class support of a class label y. Given a rule-item \(\beta = \left\{ {(i_{1}^{\beta } i_{2}^{\beta } \ldots i_{m}^{\beta } ),y_{\beta } } \right\}\), the minimum rule-item support of β, denoted as MRS(β), is the minimum value among the MIS values of all its items and MCS(y β ) (i.e., \(\hbox{min} (MIS(i_{1}^{\beta } ),MIS(i_{2}^{\beta } ), \ldots ,MIS(i_{m}^{\beta } ),MCS(y_{\beta } ))\)).

By using differing minimum item supports for the respective items, users can effectively determine the support requirements for different items. The property of MMSs allows higher minimum supports for the rule-items that only involve frequent items and lower minimum supports for the rule-items that contain rare items.

Definition 5

Given a database D and a rule-item \(\beta = \left\{ {(i_{1}^{\beta } i_{2}^{\beta } \ldots i_{m}^{\beta } ),y_{\beta } } \right\}\), we call β a frequent rule-item if r_supp D (β) ≥ MRS(β). Moreover, the confidence of a frequent rule-item β is defined as follows:

$$r\_conf_{D} (\beta ) = \frac{{r\_supp_{D} (\beta )}}{{e\_supp_{D} (\beta )}}.$$

Example 5

Continuing Example 4, the user-specified minimum thresholds are given as follows: MIS(a, 1) = 3, MIS(a, 2) = 4, MIS(a, 3) = 1, MIS(b, 1) = 3, MIS(b, 2) = 4, MIS(c, 1) = 2, MIS(c, 2) = 1, MIS(c, 3) = 2, MIS(d, 1) = 2, MIS(d, 2) = 3, MIS(e, 1) = 2, MIS(e, 2) = 2, MCS(y 1) = 2, and MCS(y 2) = 1. For a rule-item β = {(a, 1)(d, 2)(e, 1), y 1}, MRS(β) is equal to min(MIS(a, 1), MIS(d, 2), MIS(e, 1), MCS(y 1)) = min(3, 3, 2, 2) = 2. Because r_supp D (β) satisfies MRS(β) (i.e., 3 ≥ 2), we call β a frequent rule-item and

$$r\_conf_{D} (\beta ) = \frac{{r\_supp_{D} (\beta )}}{{e\_supp_{D} (\beta )}} = \frac{3}{4} = 0.75.$$

In summary, this approach discovers all frequent rule-items that are satisfied with their own MRS. Next, an associative classifier can be built based on the set of all frequent rule-items. For example, a frequent rule-item \(\beta = \left\{ {(i_{1}^{\beta } i_{2}^{\beta } \ldots i_{m}^{\beta } ),y_{\beta } } \right\}\) indicates a classification rule \((i_{1}^{\beta } i_{2}^{\beta } \ldots i_{m}^{\beta } ) \to y_{\beta }\) in which the support and confidence are equal to \(r\_supp_{D} (\beta )\) and \(\frac{{r\_supp_{D} (\beta )}}{{e\_supp_{D} (\beta )}}\), respectively.
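Putting Definitions 4 and 5 together, a rule-item's threshold is the smallest of its items' MIS values and its class's MCS, and the rule-item becomes a classification rule once its support count reaches that threshold. The sketch below replays Example 5; the dictionary representation of MIS and MCS is an illustrative assumption.

```python
# MRS(beta) = min(MIS of beta's items, MCS of beta's class); beta is frequent
# when r_supp_D(beta) >= MRS(beta); its confidence is r_supp_D / e_supp_D.
MIS = {("a", 1): 3, ("d", 2): 3, ("e", 1): 2}    # only the thresholds needed for Example 5
MCS = {"y1": 2}

def mrs(beta, mis, mcs):
    event, label = beta
    return min(min(mis[item] for item in event), mcs[label])

beta = ((("a", 1), ("d", 2), ("e", 1)), "y1")
r_supp_beta, e_supp_beta = 3, 4                  # support counts taken from Example 4

threshold = mrs(beta, MIS, MCS)                  # min(3, 3, 2, 2) = 2
print(threshold, r_supp_beta >= threshold)       # 2 True  -> beta is a frequent rule-item
print(r_supp_beta / e_supp_beta)                 # 0.75    -> confidence of the rule
```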

The MMSCBA algorithm

The process of discovering a complete set of frequent rule-items is illustrated in Fig. 1. Initially, scan the complete database D once and count the support of each item. Given the lowest minimum rule-item support MRS all , prune the items not satisfying MRS all and form a pruned database D′ in which the items of each rule-item are sorted by their MIS and MCS values in ascending order. Then, divide D′ into partitions, denoted as \(D_{y}^{'}\), one for each class label y that satisfies its MCS. For each partition \(D_{y}^{'}\), the Multiple Minimum Supports Classification Based on Associations (MMSCBA) algorithm is performed to find frequent rule-items. Next, collect all frequent rule-items and their r_supp D from each partition. Because D′ is divided into separate partitions, the entire database must be rescanned to calculate the e_supp D of the frequent rule-items found in each partition. Finally, all frequent rule-items with their e_supp D and r_supp D become classification rules, forming the proposed classifier.

Fig. 1 The frequent rule-item generation process
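As a rough sketch of the flow in Fig. 1 (not the authors' Java implementation), the driver below prunes items whose counts fall below the global floor MRS_all, partitions the pruned records by class label, hands each partition to a mining routine such as the one sketched in the next subsection, and finally rescans the pruned database to obtain the event support of every surviving rule. The `mine` callable and the count-based thresholds are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def build_classifier(D, MIS, MCS, MRS_all, mine):
    """D: list of (event, label); events are tuples of (attribute, value) items.
    MIS/MCS/MRS_all are absolute support counts; `mine` returns (event, label, r_supp) triples."""
    # Step 1: one scan to count item supports and prune items below MRS_all.
    item_counts = Counter(item for event, _ in D for item in event)
    keep = {item for item, count in item_counts.items() if count >= MRS_all}

    # Step 2: form D' with surviving items sorted in ascending order of MIS.
    pruned = [(tuple(sorted((i for i in event if i in keep), key=lambda i: MIS[i])), label)
              for event, label in D]

    # Step 3: partition D' by class label, keeping classes that meet their MCS.
    label_counts = Counter(label for _, label in pruned)
    partitions = defaultdict(list)
    for event, label in pruned:
        if label_counts[label] >= MCS[label]:
            partitions[label].append(event)

    # Step 4: mine each partition for frequent rule-items.
    rules = [rule for label, events in partitions.items() for rule in mine(events, label)]

    # Step 5: rescan the pruned database for the event support of each rule.
    classifier = []
    for event, label, r_supp in rules:
        e_supp = sum(1 for ev, _ in pruned if set(event) <= set(ev))
        classifier.append((event, label, r_supp, e_supp))
    return classifier
```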

The following subsections depict the MMSCBA algorithm and the scoring approaches for class label prediction.

The MMSCBA algorithm

As Fig. 2 shows, the MMSCBA algorithm includes three functions: (1) Candidate-Gen-C2(L 1), (2) Candidate-Gen-C k (L k−1), and (3) Check-MRS(c).

Fig. 2 The MMSCBA algorithm

In Line 1, scan the database \(D_{y}^{'}\) to obtain the support count of each item i, denoted as r_supp(i). In Line 2, compare r_supp(i) with the value of MRS(i) to determine whether the item i is frequent. Each item with an r_supp(i) value greater than or equal to MRS(i) is inserted into the frequent 1-rule-item set L 1. From Lines 3 to 6, use L k−1 to generate C k . By calling Candidate-Gen-C 2(L 1), use L 1 to generate all 2-candidate-rule-items to form C 2. Similarly, use Candidate-Gen-C k (L k−1) (k > 2) to generate all k-candidate-rule-items C k from L k−1. “Candidate-rule-item generation” section details the procedure of candidate rule-item generation. After generating the set of candidate-rule-items, Line 7 scans \(D_{y}^{'}\) to obtain the support count of each candidate-rule-item c, r_supp(c). In Lines 8 and 9, use the Check-MRS(c) function to obtain the minimum support of c, denoted as MRS(c). Then, in Lines 10 and 11, each candidate-rule-item c with r_supp(c) ≥ MRS(c) is inserted into L k . At the end of this stage, we can identify all frequent rule-items from \(D_{y}^{'}\).
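A compact Python paraphrase of this loop (not the original pseudocode of Fig. 2) is given below: it counts the 1-rule-items, then alternates candidate generation and support counting until no frequent rule-items remain, always testing each candidate against its own MRS. The two generator arguments correspond to the functions sketched in the next subsection.

```python
def mine_partition(events, label, MIS, MCS, gen_c2, gen_ck):
    """Level-wise search over one class partition D'_y; events are MIS-sorted item tuples."""
    def r_supp(event):
        return sum(1 for ev in events if set(event) <= set(ev))

    def mrs(event):
        return min(min(MIS[i] for i in event), MCS[label])

    # Lines 1-2: count every item and keep the frequent 1-rule-items, MIS-sorted.
    items = sorted({i for ev in events for i in ev}, key=lambda i: MIS[i])
    Lk = [(i,) for i in items if r_supp((i,)) >= mrs((i,))]
    frequent = list(Lk)
    k = 2
    while Lk:
        # Lines 3-6: build the k-candidate set from the frequent (k-1)-rule-items.
        Ck = gen_c2(Lk, MIS) if k == 2 else gen_ck(Lk, MIS)
        # Lines 7-11: a candidate survives only if its count reaches its own MRS.
        Lk = [c for c in Ck if r_supp(c) >= mrs(c)]
        frequent.extend(Lk)
        k += 1
    return [(event, label, r_supp(event)) for event in frequent]
```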

Candidate-rule-item generation

From the overview in “The MMSCBA algorithm” section, we can see that the basic concept of the MMSCBA algorithm is similar to the traditional Apriori algorithm (Agrawal et al. 1993). There exists, however, a significant difference between our candidate generation functions and the traditional ones. The main reason for this is that we consider the concept of multiple minimum supports, and the downward closure property no longer holds in our approach. In other words, sub-rule-items of a frequent rule-item may not be frequent because the supports of a frequent rule-item and its sub-rule-items may differ. Therefore, to generate a complete set of candidate-rule-items, this study proposes two new candidate generation methods, Candidate-Gen-C 2 and Candidate-Gen-C k , which are based on the definition of MMSs.

Figure 3 presents the function Candidate-Gen-C 2(L 1), which uses L 1 to generate C 2 in \(D_{y}^{'}\). In L 1, every two frequent 1-rule-items are joined to form a 2-candidate-rule-item. For example, two frequent 1-rule-items (i 1, y 1) and (i 2, y 1) can be joined into a 2-candidate-rule-item {(i 1, i 2), y 1}. Because all rule-items in \(D_{y}^{'}\) have the same class label, we can ignore the class label and consider only the events of the two frequent 1-rule-items in the candidate generation process. Note that the attribute-names of i 1 and i 2 cannot be the same (i.e., i 1.attribute-name ≠ i 2.attribute-name), and all items in a candidate are sorted in increasing order of their MIS values.

Fig. 3 The function Candidate-Gen-C 2(L 1)
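A possible rendering of this pairing rule is sketched below: join every two frequent 1-rule-items whose attribute-names differ, keeping the pair in ascending order of MIS. The list-of-tuples representation is assumed, and the class label is omitted because all rule-items in one partition share it.

```python
def candidate_gen_c2(L1, MIS):
    """L1: frequent 1-item events, each a tuple ((attribute, value),), sorted by MIS value."""
    C2 = []
    for a in range(len(L1)):
        for b in range(a + 1, len(L1)):
            i1, i2 = L1[a][0], L1[b][0]
            if i1[0] == i2[0]:
                continue                  # same attribute-name: cannot co-occur in one event
            C2.append(tuple(sorted((i1, i2), key=lambda i: MIS[i])))
    return C2
```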

As Fig. 4 shows, the function Candidate-Gen-C k (L k−1) uses L k−1 to generate C k . Given two (k−1)-rule-items p and q, a k-candidate-rule-item (k > 2) can be generated if the following two conditions are satisfied: (1) the first (k − 2) items of both p and q are the same; (2) the attribute-names of the last items in p and q are different. Figure 5 shows the two possible k-candidate-rule-items generated by the function Candidate-Gen-C k (L k−1). Note that if MIS(p.item k−1) ≥ MIS(q.item k−1), the k-candidate-rule-item cd 1 is generated; otherwise, cd 2 is generated.

Fig. 4 The function Candidate-Gen-C k (L k−1)

Fig. 5 The join method in Candidate-Gen-C k (L k−1)
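A sketch of Candidate-Gen-C k under the join conditions above: two frequent (k−1)-rule-items p and q are joinable when they share their first k−2 items and their last items carry different attribute-names, and the two last items are placed in ascending order of their MIS values (the cd 1/cd 2 distinction of Fig. 5). The representation is the same assumed in the previous sketch.

```python
def candidate_gen_ck(Lk_1, MIS):
    """Lk_1: frequent (k-1)-item events (tuples of (attribute, value) items), MIS-sorted."""
    Ck = []
    for a in range(len(Lk_1)):
        for b in range(a + 1, len(Lk_1)):
            p, q = Lk_1[a], Lk_1[b]
            if p[:-1] != q[:-1]:
                continue                          # condition (1): same first k-2 items
            if p[-1][0] == q[-1][0]:
                continue                          # condition (2): last attribute-names differ
            # The item with the larger MIS value goes last (cd1 vs. cd2 in Fig. 5).
            tail = sorted((p[-1], q[-1]), key=lambda i: MIS[i])
            Ck.append(p[:-1] + tuple(tail))
    return Ck
```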

Example 7

Continuing Example 4, consider two frequent 4-rule-items with class label y 2, where d 1 = {(i 11)(i 1)(i 4)(i 2), y 2} and d 2 = {(i 11)(i 1)(i 4)(i 7), y 2}. The two 4-rule-items can be joined to form a new 5-candidate-rule-item because the first three items in d 1 are identical to those in d 2, while their last items differ. Because MIS(i 2) = 4 is larger than MIS(i 7) = 3, the 5-candidate-rule-item cd 1 = {(i 11)(i 1)(i 4)(i 7)(i 2), y 2} is generated, in which i 7 precedes i 2 so that the items remain in ascending order of their MIS values (as shown in Fig. 6).

Fig. 6 An example of the join method

It is essential that the complete set of frequent patterns can be discovered through the algorithm. Because MMSCBA adopts the candidate-generation-and-test approach to discover all frequent rule-items, the completeness of the candidate generation method needs to be clarified.

Because our approach considers the concept of MMSs, all frequent rule-items must satisfy the sorted closure property. That is, any sub-rule-item β of a frequent rule-item α is also a frequent rule-item if MRS(β) = MRS(α): if r_supp(α) ≥ MRS(α), then r_supp(β) ≥ r_supp(α) ≥ MRS(α) = MRS(β), i.e., β is also a frequent rule-item. This property ensures that our candidate-generation-and-test method is feasible because all possible k-candidate-rule-items can be generated from their (k − 1)-sub-rule-items.

Predicting class label based on classification rules

After generating all classification rules, use them to classify unlabeled objects in a testing dataset. Class label prediction in associative classification falls into two main approaches: prediction based on the highest-precedence single rule-item and prediction based on multiple rule-items. In this study, four prediction measurements are considered: Maximum likelihood (Liu et al. 1998; Thabtah et al. 2005), Max χ2 (Li et al. 2001), Laplace (Yin and Han 2003), and Scoring (Hu et al. 2007).

Maximum likelihood

Given a testing data object α and a set of classification rules, the maximum likelihood approach only considers the highest precedence rule that matches α. If there is no applicable rule to match α, then the default class label is assigned to α. Several associative classification algorithms (Liu et al. 1998; Thabtah et al. 2005) have adopted the maximum likelihood approach for class label prediction.

Max χ2

Instead of considering a single rule in class label prediction (i.e., Maximum likelihood), the CMAR algorithm (Li et al. 2001) exploits a prediction method that selects a subset of high-confidence rules that are applicable to a class label. The prediction is made by analyzing the correlation among the rules. The correlation is measured using weighted χ 2 analysis to examine the strength of a rule-item based on its support and class frequency in the set of rule-items.

Following Definition 3, the weighted χ 2 of a rule-item, denoted as Max χ 2, is defined as follows:

$$Max\chi^{2} = \left\{ \hbox{min} \left[e\_supp_{D} ,y\_supp_{D} \right] - \frac{{e\_supp_{D} \times y\_supp_{D} }}{|D|} \right\}^{2} \times |D| \times u$$

where

$$u = \frac{1}{{e\_supp_{D} \times y\_supp_{D} }} + \frac{1}{{e\_supp_{D} \times (|D| - y\_supp_{D} )}} + \frac{1}{{(|D| - e\_supp_{D} ) \times y\_supp_{D} }} + \frac{1}{{(|D| - e\_supp_{D} ) \times (|D| - y\_supp_{D} )}}.$$
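Read literally, the measure above is a single arithmetic expression in e_supp D , y_supp D , and |D|. The sketch below evaluates it directly; it assumes 0 < e_supp < |D| and 0 < y_supp < |D| so that none of the four denominators vanish, and the example values are illustrative only.

```python
def max_chi_square(e_supp, y_supp, n):
    """Weighted chi-square of one rule-item; n = |D|; requires 0 < e_supp, y_supp < n."""
    u = (1.0 / (e_supp * y_supp)
         + 1.0 / (e_supp * (n - y_supp))
         + 1.0 / ((n - e_supp) * y_supp)
         + 1.0 / ((n - e_supp) * (n - y_supp)))
    return (min(e_supp, y_supp) - e_supp * y_supp / n) ** 2 * n * u

# e.g. a rule-item with e_supp = 4 and y_supp = 3 in a database of 5 records
print(max_chi_square(4, 3, 5))   # 1.875
```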

Laplace

Laplace accuracy (Quinlan 1986) is used to estimate the expected accuracy of a rule-item. Given a rule-item r, Laplace accuracy is defined as follows.

$$Laplace\left( r \right) = \frac{{y\_supp_{D} (r) + 1}}{{e\_supp_{D} (r) + |Y|}}$$

where |Y| is the number of classes.

To classify a data object, this approach first identifies all matching rule-items and groups them by class labels. For each rule set (i.e., rules having the same class label), the best k rules are chosen and then used to calculate the average Laplace accuracy of a class label. Finally, the class label with the highest average Laplace accuracy will be selected as the final prediction outcome.
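A sketch of this selection scheme is shown below, using the Laplace formula exactly as written above (y_supp in the numerator, e_supp plus |Y| in the denominator). The rule representation and the choice of k are assumptions made for illustration.

```python
from collections import defaultdict

def laplace(y_supp, e_supp, n_classes):
    return (y_supp + 1.0) / (e_supp + n_classes)

def predict_laplace(matching_rules, n_classes, k=5):
    """matching_rules: (label, y_supp, e_supp) triples for the rules covering the object."""
    by_class = defaultdict(list)
    for label, y_supp, e_supp in matching_rules:
        by_class[label].append(laplace(y_supp, e_supp, n_classes))
    # Average the best k Laplace accuracies of each class; the highest average wins.
    averages = {label: sum(sorted(vals, reverse=True)[:k]) / min(k, len(vals))
                for label, vals in by_class.items()}
    return max(averages, key=averages.get)
```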

Scoring

Hu et al. (2007) propose a scoring method that calculates the score of each class label based on all matching rules. Given a frequent rule-item r, the two scoring functions are described as follows:

$$\begin{aligned} WeightedSupport(r) & = \frac{{r\_supp_{D} (r )}}{{MRS_{all} }} \\ WeightedConfidence(r) & = r\_conf_{D} (r )\times \frac{{r\_supp_{D} (r)}}{{MRS_{all} }} \\ \end{aligned}$$

The whole procedure of the scoring method can be stated as follows. Given a testing data object α, we first identify the complete set of classification rules satisfying α, meaning that the event part of a rule-item is a subset of α. Next, we divide these rules into sets according to their class labels. The WeightedSupport and WeightedConfidence of a rule set can be accumulated by summing the score of each rule-item in the set. The class label with the highest WeightedConfidence value is selected as the prediction label. If there is more than one class label with the highest WeightedConfidence value, then we compare their WeightedSupport and choose the class label with the highest value of WeightedSupport as the prediction label.
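A sketch of this accumulation is given below: each matching rule contributes r_supp/MRS_all to its class's WeightedSupport and r_conf × (r_supp/MRS_all) to its WeightedConfidence, the class with the highest accumulated WeightedConfidence wins, and ties are broken by WeightedSupport. The triple-based rule representation is assumed.

```python
from collections import defaultdict

def predict_by_scoring(matching_rules, mrs_all):
    """matching_rules: (label, r_supp, r_conf) triples for the rules covering the object."""
    w_supp = defaultdict(float)
    w_conf = defaultdict(float)
    for label, r_supp, r_conf in matching_rules:
        weighted_support = r_supp / mrs_all
        w_supp[label] += weighted_support
        w_conf[label] += r_conf * weighted_support
    # Highest accumulated WeightedConfidence wins; WeightedSupport breaks ties.
    return max(w_conf, key=lambda label: (w_conf[label], w_supp[label]))
```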

Experimental evaluation

Data collection and experimental setup

Six real-world datasets are selected from the UCI machine learning repository website (http://archive.ics.uci.edu/ml/). Table 3 provides a description of these datasets.

Table 3 Detailed information of the UCI datasets

The experiments were run on a Windows 7 PC equipped with an Intel Core i5-4570 3.2 GHz processor and 16 GB of RAM. The proposed methods were implemented in Java. Several well-known classification techniques were also considered in the experimental evaluations, including C4.5, SVM, PART, ANN, RIPPER, and traditional CBA. Among them, C4.5 (Quinlan 1986, 1993), SVM, PART, ANN, and RIPPER were performed using WEKA 3.6.10 (www.cs.waikato.ac.nz/ml/weka) (Witten et al. 2011), a popular suite of machine learning software; the CBA algorithm was performed by adopting its implementation in (Liu et al. 1998). In all experiments, ten-fold cross-validation (Burman 1989) was adopted to estimate the performance of the proposed method. Accuracy, defined as the proportion of true results (i.e., both true positives and true negatives) among the total number of samples examined, was used as the metric to measure the performance of the algorithms.

To generate MIS values for each item in MMSCBA, we adopted the method proposed in Thabtah (2007), which considers the actual frequencies of items as the basis for MIS value assignment. The equations are stated as follows:

$$\begin{aligned} MIS(i_{p} ) & = \left\{ \begin{array}{ll} M(i_{p} ), & \quad {\text{if }}M(i_{p} ) \ge MRS_{all} \hfill \\ MRS_{all} , & \quad {\text{otherwise}} \hfill \\ \end{array} \right. \\ M(i_{p} ) & = \sigma \times f(i_{p} ), \quad 0 \le \sigma \le 1 \\ \end{aligned}$$

where f(i p ) represents the number of times item i p \((i_{p} \in I)\) occurs in the database, and MRS all denotes the smallest MIS value among all items. σ (0 ≤ σ ≤ 1) can be used to control the effect of the MIS value in the mining process. In the experiments, we varied the σ value from 0 to 1. If σ is set to 0, all items have identical MIS values (i.e., MRS all ), producing the same results as traditional association rule mining. If σ is set to 1 and \(M(i_{p} ) \ge MRS_{all}\), \(f(i_{p} )\) is the MIS value for i p .
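This assignment only needs each item's frequency, the scaling factor σ, and the floor MRS_all. The short sketch below computes it from raw counts; the toy records are illustrative.

```python
from collections import Counter

def assign_mis(events, sigma, mrs_all):
    """events: list of item tuples; returns {item: MIS value}, floored at MRS_all."""
    freq = Counter(item for event in events for item in event)
    return {item: max(sigma * count, mrs_all) for item, count in freq.items()}

# With sigma = 0.5 every item's MIS is half its frequency, but never below MRS_all.
events = [(("a", 1), ("b", 2)), (("a", 1), ("c", 1)), (("a", 1),)]
print(assign_mis(events, sigma=0.5, mrs_all=1))   # {('a', 1): 1.5, ('b', 2): 1, ('c', 1): 1}
```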

Results

For every dataset, the value of minsup is set as follows: (1) \(0.2 \le minsup \le 0.4\) for datasets BS, BC, and BCW; (2) \(0.1 \le minsup \le 0.3\) for datasets M2 and TF; and (3) \(0.005 \le minsup \le 0.007\) for the dataset WF.

Table 4 presents the classification results of the BS dataset using MMSCBA with the four rule selection methods. The best accuracies of MMSCBA with maximum likelihood, Laplace, scoring, and Max χ2 are 0.748, 0.593, 0.708, and 0.384, respectively. MMSCBA with maximum likelihood performs the best among the four classification rule selection methods, and MMSCBA with Max χ2 has the worst performance.

Table 4 The experimental results of dataset BS

For the BC dataset, the results in Table 5 show that the best accuracies of MMSCBA with maximum likelihood, Laplace, scoring, and Max χ2 are 0.705, 0.595, 0.706, and 0.624, respectively. MMSCBA with the maximum likelihood and scoring methods performs the best compared with the other two methods, and MMSCBA with the Laplace method has the worst performance.

Table 5 The experimental results of dataset BC

For the BCW dataset, the results in Table 6 show that the best accuracies of MMSCBA with maximum likelihood, Laplace, scoring, and Max χ2 were 0.963, 0.950, 0.770, and 0.818, respectively. MMSCBA with the maximum likelihood method performs the best compared with the other three methods. MMSCBA with the scoring method has the worst performance.

Table 6 The experimental results of dataset BCW

For the M2 dataset, the results in Table 7 show that the best accuracies of MMSCBA with maximum likelihood, Laplace, scoring, and Max χ2 are 0.657, 0.629, 0.672, and 0.604, respectively. MMSCBA with the scoring method performs the best compared with the other three methods. MMSCBA with the Max χ2 method has the worst performance.

Table 7 The experimental results of dataset M2

For the TF dataset, the results in Table 8 show that the best accuracies of MMSCBA with maximum likelihood, Laplace, scoring, and Max χ2 are 0.759, 0.571, 0.762, and 0.730, respectively. MMSCBA with the scoring method performs the best compared with the other three methods. MMSCBA with the Laplace method has the worst performance.

Table 8 The experimental results of dataset TF

In summary, the above results on the first five datasets show that MMSCBA with the maximum likelihood method has the highest accuracy, and MMSCBA with the Max χ2 method has the lowest accuracy on average. The accuracy of MMSCBA with the scoring method is relatively stable for various values of σ and minsup. MMSCBA with the Max χ2 method achieves better accuracy as the value of σ decreases, but its performance is sensitive to σ.

Table 9 presents the results of a comparison between non-rule-based classifiers (i.e., ANN and SVM) and rule-based classifiers (i.e., C4.5, PART, RIPPER, CBA, and MMSCBA with maximum likelihood). The results show that the performance of the rule-based classifiers is stable but not the best among all techniques. The accuracy of the non-rule-based classifiers is higher than that of the rule-based classifiers in most of the six datasets.

Table 9 Classification accuracy (%) for all classification techniques

Table 10 presents the runtime results for all classifiers and datasets. The results show that CBA and our approach require more runtime than other classification techniques such as SVM, C4.5, PART, and RIPPER. This is expected because the runtime of association rule-based approaches (i.e., CBA and MMSCBA) is affected by minsup; they may require longer execution times when minsup is set too low. Therefore, compared with heuristic approaches (e.g., C4.5, PART, and RIPPER), our approach requires more execution time to discover all possible classification rules from the datasets.

Table 10 Runtime (s) for all classification techniques

In many real-life applications, non-rule-based classification techniques cannot be adopted because of their low interpretability. In contrast, rule-based classification techniques generate IF–THEN rules, which can easily be stored in a knowledge base. Expert systems can also be built easily by incorporating the rules into an expert system shell. Therefore, as long as the performance of rule-based classifiers is acceptable, most decision makers would select them in practice.

Among all the rule-based classifiers, the experimental results also show that the proposed method (i.e., MMSCBA with the maximum likelihood method) outperforms the traditional CBA and other rule-based techniques in three of the six datasets. Compared with other classification methods, the proposed method achieves remarkable accuracy when the dataset contains rare items, such as the BC and BCW datasets. Although C4.5 and CBA perform the best in datasets BC and TF, respectively, MMSCBA with the maximum likelihood method still has a satisfactory performance (i.e., close to the best classifier).

Conclusion

In CBA, it is difficult to discover rules involving rare items using a single minsup threshold because of the rare item problem. This paper presented the concept of integrating MMSs into established classifiers. Unlike conventional multiple thresholds, the proposed method uses three factors (i.e., MIS values for items, MCS values for classes, and MRS values for rule-items) to determine classification rules.

Experimental results involving six real-world datasets demonstrate that MMSCBA with the maximum likelihood classifier achieves higher accuracy than traditional CBA, especially when the dataset contains rare items. In addition, the MMSCBA method can alleviate both the class imbalance problem and the rare item problem.

Two related issues are worthy of future research. The first is the applicability of this approach to other types of datasets. Previous studies have proposed varied factors that are useful in specific cases; however, these factors are often impractical for analyzing new (or unknown types of) data. The second issue concerns efficiency. Instead of using the Apriori-like algorithm, the proposed method should be extended to other efficient pattern discovery approaches, such as the FP-growth and distributed computing algorithms.

References

  • Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. ACM SIGMOD Rec 22:207–216

  • Burman P (1989) A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods. Biometrika 76:503–514

  • Chen CH, Hong TP, Tseng VS (2009) An improved approach to find membership functions and multiple minimum supports in fuzzy data mining. Expert Syst Appl 36:10016–10024

  • Cohen WW (1995) Fast effective rule induction. In: Proceedings of the twelfth international conference on machine learning, pp 115–123

  • Deen AA, Nofal M, Bani-Ahmad S (2010) Classification based on association-rule mining techniques: a general survey and empirical comparative evaluation. Ubiquitous Comput Commun J 5:9–17

  • Deng H, Runger G, Tuv E, Bannister W (2014) CBC: an associative classifier with a small number of rules. Decis Support Syst 50:163–170

  • Fernandez-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15:3133–3181

  • Guo X, Yin Y, Dong C, Yang G, Zhou G (2008) On the class imbalance problem. In: Proceedings of the fourth international conference on natural computation, pp 192–201

  • Hu YH, Chen YL (2006) Mining association rules with multiple minimum supports: a new mining algorithm and a support tuning mechanism. Decis Support Syst 42:1–24

  • Hu YH, Chen YL, Lin EH (2007) Classification of time-sequential attributes by using sequential pattern rules. In: Proceedings of the fourth international conference on fuzzy systems and knowledge discovery, pp 735–739

  • Hu YH, Huang TCK, Yang HR, Chen YL (2009) On mining multi-time-interval sequential patterns. Data Knowl Eng 68:1112–1127

  • Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31:264–323

  • Janssens D, Wets G, Brijs T, Vanhoof K (2005) Adapting the CBA algorithm by means of intensity of implication. Inf Sci 173:305–318

  • Jyoti S, Ujma A, Dipesh S, Sunita S (2011) Predictive data mining for medical diagnosis: an overview of heart disease prediction. Int J Comput Appl 17:43–48

  • Kiran RU, Reddy PK (2010) Improved approaches to mine rare association rules in transactional databases. In: Proceedings of the fourth SIGMOD Ph.D. workshop on innovative database research, pp 19–24

  • Lee YC, Hong TP, Lin WY (2005) Mining association rules with multiple minimum supports using maximum constraints. Int J Approx Reason 40:44–54

  • Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Proceedings of IEEE international conference on data mining, pp 369–376

  • Liu B, Ma Y, Wong C (2000) Improving an association rule based classifier. Lect Notes Comput Sci 1910:504–509

  • Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: Proceedings of the fourth ACM SIGKDD international conference on knowledge discovery and data mining, pp 80–86

  • Liu B, Hsu W, Ma Y (1999) Mining association rules with multiple minimum supports. In: Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining, pp 337–341

  • Ngai EWT, Xiu L, Chau D (2009) Application of data mining techniques in customer relationship management: a literature review and classification. Expert Syst Appl 36:2592–2602

  • Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106

  • Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Francisco

  • Roddick JF, Spiliopoulou M (2002) A survey of temporal knowledge discovery paradigms and methods. IEEE Trans Knowl Data Eng 14:750–767

  • Swami DK, Jain RC (2005) A survey of associative classification algorithms. ADIT J Eng 2:51–55

  • Thabtah FA (2006) Pruning techniques in associative classification: survey and comparison. J Digit Inf Manag 4:197–202

  • Thabtah FA (2007) A review of associative classification mining. Knowl Eng Rev 22:37–65

  • Thabtah FA, Cowling P, Peng Y (2004) MMAC: a new multi-class, multi-label associative classification approach. In: Proceedings of the fourth IEEE international conference on data mining, pp 217–224

  • Thabtah FA, Cowling P, Peng Y (2005) MCAR: multi-class classification based on association rule. In: Proceedings of the 3rd ACS/IEEE international conference on computer systems and applications, pp 127–133

  • Vapnik VN (1999) An overview of statistical learning theory. IEEE Trans Neural Netw 10:988–999

  • Venkatesh E, Thangaraj P (2008) Self-organizing map and multi-layer perceptron neural network based data mining to envisage agriculture cultivation. J Comput Sci 4:494–502

  • Witten IH, Frank E, Hall MA (2011) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, San Francisco

  • Yin X, Han J (2003) CPAR: classification based on predictive association rules. In: Proceedings the third SIAM international conference on data mining, pp 331–335

  • Yoon Y, Lee GG (2013) Two scalable algorithms for associative text classification. Inf Process Manag 49:484–496


Authors’ contributions

LY participated in the design of the study and drafted the manuscript. YH participated in the design of the study, performed the statistical analysis, and drafted the manuscript. CF performed the statistical analysis and helped to draft the manuscript. JS carried out the acquisition of data and participated in data analysis and helped to draft the manuscript. MW participated in the design of the study and drafted the manuscript. All authors read and approved the final manuscript.

Acknowledgements

None.

Competing interests

The authors declare that they have no competing interests.

Author information

Corresponding author

Correspondence to Min-Wei Huang.

Additional information

Li-Yu Hu and Ya-Han Hu contributed equally to this work

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


Cite this article

Hu, LY., Hu, YH., Tsai, CF. et al. Building an associative classifier with multiple minimum supports. SpringerPlus 5, 528 (2016). https://doi.org/10.1186/s40064-016-2153-1
