### Ontology

According to Gruber (1993), an ontology is an explicit specification of a conceptualization. Instead of using this very broad explanation, the present research takes a more tangible definition of ontology as a domain representation in which the concepts and the relations among them are made explicit, so as to allow either human negotiation of denotations or machine inference for a specific application. Ontologies are built by people; consequently, there is rarely a consensual understanding of all concepts, and different ontologies can be built to describe the same domain. Since the 1990s, there have been many studies not only on techniques with which to build (Musen 1992; Protégé-2000) and map ontologies (Euzenat and Shvaiko 2007), but also on applications that could benefit from representing knowledge using domain ontology, such as knowledge-based systems, information retrieval mechanisms, agent communication language definition and search methods. The present research investigates the effects of using domain ontology to improve the precision and recall of association-rule extraction from large databases.

Identifying the concepts and the relations between them is crucial to building an effective domain ontology. Although the semantics of the relations can be defined as one builds an ontology, some relations have established meanings, such as “is-a”, “part-of” and “attribute-of”. The first relation, is-a, comes from the set-theoretic relation between a set and its subset. Consequently, the subset carries all definitions of the set: if Peter is-a human, then Peter inherits all characteristics of being human. This relation makes it possible to describe concepts more concisely.

The second relation, part-of, brings to bear the concept of composition/decomposition: the parts make the whole. A motor and a chassis are parts of a car; a motor alone cannot be considered a car.

Attribute-of is a simple relation that represents properties of concepts. For example, a car has a color and might have an owner; both are attributes of the car. It is natural to map the way one describes a domain in natural language onto the relations among its concepts. However, care must be taken, for language can be misleading. A car has a color and has a motor, and although we generally use the same verb to connect the car with the color and the car with the motor, the relations between these concepts are different: the first relation is clearly about a characteristic of the object, while the second refers to composition.
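As a hypothetical illustration (the class names are ours, not taken from any particular ontology tool), the three relations can be sketched in Python, where is-a maps to inheritance, part-of to composition, and attribute-of to plain fields:

```python
class Vehicle:
    """A generic vehicle concept."""
    def __init__(self, color, owner=None):
        # attribute-of: color and owner are properties of the concept
        self.color = color
        self.owner = owner

class Motor:
    """A motor is a part, not a vehicle by itself."""

class Chassis:
    """A structural part of a car."""

class Car(Vehicle):  # is-a: a car inherits everything a vehicle has
    def __init__(self, color, owner=None):
        super().__init__(color, owner)
        # part-of: the whole is composed of its parts
        self.parts = [Motor(), Chassis()]

car = Car(color="red")
print(isinstance(car, Vehicle))  # is-a holds: True
print(len(car.parts))            # part-of: 2 parts
print(car.color)                 # attribute-of: red
```

Note that the motor is held as a component, not inherited: confusing "has a motor" with "has a color" in the model would wrongly turn a part into an attribute.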

### Association rules

Association-rule mining is a popular data-mining technique; it shows correlations between sets of items in a series of data or transactions. Association rules are rules of the form “IF antecedent THEN consequent” that guarantee, with a certain probability (the confidence threshold), that whenever the antecedent occurs, the consequent will follow. These rules are generated from sets of elements (itemsets) that appear together with at least a minimum frequency (the support). The most popular algorithm for obtaining association rules is Agrawal’s apriori (Agrawal and Srikant 1994). For a fixed confidence value, the setting of the support threshold determines whether too many association rules are generated or important relationships in the data are missed.
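The support and confidence definitions above can be sketched with a brute-force enumeration over a toy transaction list (the data and thresholds are hypothetical, and this is not Agrawal's optimized apriori, which prunes the itemset search instead of enumerating it):

```python
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

min_support, min_confidence = 0.5, 0.7
items = set().union(*transactions)

# Frequent itemsets: all itemsets meeting the support threshold.
frequent = [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)
            if support(frozenset(c)) >= min_support]

# Rules "IF antecedent THEN consequent" with enough confidence.
for itemset in frequent:
    for r in range(1, len(itemset)):
        for ant in map(frozenset, combinations(itemset, r)):
            conf = support(itemset) / support(ant)
            if conf >= min_confidence:
                # on this toy data, only butter => bread qualifies
                print(set(ant), "=>", set(itemset - ant),
                      f"support={support(itemset):.2f} confidence={conf:.2f}")
```

Lowering `min_support` here would surface more rules; raising it would discard the butter/bread relationship entirely, which is the trade-off the paragraph above describes.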

#### Semantic treatment of association rules

Pairs of association rules for which items in the antecedent are semantically correlated can be simplified as one single association rule, either more comprehensive or more specific, depending on the context.

In generalizations, we value the summarized view of discovered relationships, whereas in specializations, we value the rules’ discriminatory ability. To verify whether general rules can substitute various specific rules, one must check if the general rules provide enough coverage over all the specific rules. In this case, the rules with instances in the antecedent can be pruned. If not, there are singular specific rules that do not fit into the general rule and cannot be pruned.

This semantic treatment of using predefined domain knowledge to prune an association-rule outcome can be used during post- and pre-processing as described below.

#### Semantic post-processing

Our semantic post-processing consists, initially, of enhancing the rule set with domain information for later analysis and a decision on pruning. Each more general enhancing rule should be able to substitute for a number of specific rules by way of a generalization process. When this is possible, there is simultaneously a semantic enhancement of the set of mined association rules and a reduction in the cardinality of the set of rules.

Our post-processing technique does not work if the dependent attribute chosen in the ontology selection does not cover the determinant attributes. Whenever the rules to be added do not have reasonable coverage, the specialization will discard the semantically generated aggregating rule.

The CRg indicator, defined in formula 1, measures the generality of the more general rule in relation to the attributes that determined the dependent variable. This measure is based on the Coverage interest measure (Lavrac et al. 1999), which represents the fraction of instances covered by the antecedent of the rule. This can be considered a measure of rule generality. The value of the Coverage of a rule is given by the support of the antecedent of this rule.

**Definition** *(CRg). Let D be a multidimensional database. Let R*_{i} *be an association rule in the form Y*_{i} *^ A ⇒ B, and S*_{i} = {*r*_{ij}|*j* = 1.. *n*} *the set of corresponding rules in the form X*_{ij} *^ A ⇒ B, obtained from D. The value of the measure CRg for R*_{i} *and S*_{i} *is given by*

\mathit{\text{CRg}}\left({R}_{i},{S}_{i}\right)=\frac{{\displaystyle \sum _{j=1}^{n}}sup\left(\mathit{\text{ant}}\left({r}_{ij}\right)\right)}{sup\left(\mathit{\text{ant}}\left({R}_{i}\right)\right)}

(1)

The larger the value of CRg, the better the instances covered by the more specific rules represent the instances covered by the more general rule, which in turn indicates uniform behavior of the population. CRg can be interpreted as the conditional probability that an instance satisfies the antecedent of one of the more specific rules, given that it satisfies the antecedent of the more general rule.
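Under formula 1, CRg is the summed antecedent support of the specific rules divided by the antecedent support of the general rule; a minimal numeric sketch with hypothetical support values:

```python
def crg(specific_ant_supports, general_ant_support):
    """CRg = sum of sup(ant(r_ij)) over the specific rules,
    divided by sup(ant(R_i)) of the general rule (formula 1)."""
    return sum(specific_ant_supports) / general_ant_support

# Hypothetical values: three specific rules whose antecedents cover
# 0.10 + 0.12 + 0.15 of the instances; the general rule's antecedent
# covers 0.40 of the instances.
value = crg([0.10, 0.12, 0.15], 0.40)
print(round(value, 3))  # 0.925 — close to 1: the specific rules
                        # account for most of the general rule's coverage
```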

#### Semantic pre-processing

The choice between generalization (preferred) and specialization can be made by assuming a uniform distribution of the support of rules whose dependent attributes have a common father. The natural solution would be to use statistical rules to identify values outside the "mean" range (i.e., "outliers"). However, the standard search for outliers looks for values well above and well below expectations, whereas in the case presented, only the specific association rules with low support have to be pruned. Usually, outliers are characterized by a deviation greater than 1.5 or 2.0 standard deviations of the distribution. We determine experimentally the minimum standard deviation at which outliers are considered meaningless.

With the values obtained, we calculate the standard deviation of the support of the distribution of specific rules. If the ratio of the standard deviation to the arithmetic mean of the distribution is below a specified threshold, the behavior of the specific rules is regular, and these rules may be replaced by the general rule (and be pruned). Otherwise, there are singular specific rules that do not fit the general rule and cannot be pruned.
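That decision can be sketched as follows, with TRg as the inverse coefficient of variation (formula 2) and hypothetical supports and threshold:

```python
def trg(supports):
    """TRg = mu / sigma, the inverse of the coefficient of variation
    (formula 2), over the supports of the specific rules."""
    n = len(supports)
    mu = sum(supports) / n
    # population standard deviation: formula 2 divides by n, not n - 1
    sigma = (sum((x - mu) ** 2 for x in supports) / n) ** 0.5
    return mu / sigma if sigma else float("inf")

trg_min = 3.0                        # hypothetical user-chosen TRgMin
supports = [0.10, 0.11, 0.09, 0.10]  # supports of the specific rules
if trg(supports) >= trg_min:
    print("regular: replace the specific rules with the general rule")
else:
    print("irregular: singular rules exist; the general rule is dropped")
```

Here the supports are tightly clustered, so the inverse coefficient of variation is large and the general rule can stand in for its specializations.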

**Definition** *(TRg). Let D be a multidimensional database. Let R*_{i} *be an association rule in the form Y*_{i} *^ A ⇒ B, and S*_{i} = {*r*_{ij}|*j* = 1.. *n*} *a set of corresponding rules in the form X*_{ij} *^ A ⇒ B, obtained from D. Let x*_{k} *be the value of the support of a rule r*_{ij} *and μ the value of the arithmetic mean of the support of these rules. Let σ be the standard deviation of the support of the population of rules. The value of the measure TRg (formula 2) for R*_{i} *and S*_{i} *is given by the inverse of the coefficient of variation.*

\mathit{\text{TRg}}\left({R}_{i},{S}_{i}\right)=\frac{\mu }{\sigma }

(2)

where

\begin{array}{l}\mu =\frac{{\displaystyle \sum _{k=1}^{n}{x}_{k}}}{n}\\ \sigma =\sqrt{\frac{1}{n}{\displaystyle \sum _{k=1}^{n}{\left({x}_{k}-\mu \right)}^{2}}}\end{array}

The greater the value of TRg, the smaller the average deviation of the support of the specific rules from their mean, meaning the lower the relative importance of specific rules with high support (singular rules). When the distances of the supports exceed the minimum threshold for considering the population regularly uniform (TRgMin), singular rules are those rules whose support, measured as its distance from the mean of the distribution divided by the population standard deviation, exceeds α × TRgMin (where α is an empirical coefficient that is a characteristic of the domain); i.e., those rules that satisfy the inequality

\frac{{x}_{k}-\mu }{\sigma }>\alpha \times \mathit{\text{TRgMin}}

(3)
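Inequality (3) can be applied per rule to flag the singular ones; a sketch with hypothetical supports, α and TRgMin:

```python
def singular_rules(supports, alpha, trg_min):
    """Return the supports of rules flagged as singular by inequality (3):
    support more than alpha * TRgMin population standard deviations
    above the mean of the distribution."""
    n = len(supports)
    mu = sum(supports) / n
    sigma = (sum((x - mu) ** 2 for x in supports) / n) ** 0.5
    return [x for x in supports if (x - mu) / sigma > alpha * trg_min]

# Hypothetical distribution: one specific rule stands well above the rest.
print(singular_rules([0.05, 0.06, 0.05, 0.30], alpha=1.0, trg_min=1.4))
# the rule with support 0.30 is flagged as singular
```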

The subsets of rules that present the same consequent are generated by the algorithm described by Jager (2008). For each of these subsets, a set G of more general rules is generated. In a second stage, each general rule generated is analyzed to verify its generalization capacity.

The next step begins by generating the set E of the more specific rules that are redundant with each general rule. From this set of more specific rules, the value of the measure TRg is calculated. If the value of TRg is greater than or equal to the minimum value specified by the user (TRgMin), the more specific rules are eliminated by way of a generalization process. If it is not, rule specialization is executed instead, with the elimination of the more general rule.
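The generalize-or-specialize decision for each general rule can be sketched as below (rule names, supports and the TRgMin value are all hypothetical):

```python
def trg(supports):
    """Inverse coefficient of variation (formula 2)."""
    n = len(supports)
    mu = sum(supports) / n
    sigma = (sum((x - mu) ** 2 for x in supports) / n) ** 0.5
    return mu / sigma if sigma else float("inf")

def process(groups, trg_min):
    """groups: list of (general_rule, specific_rules) pairs, where
    specific_rules is the set E of (rule, support) tuples redundant
    with the general rule. Generalize when TRg >= TRgMin; otherwise
    specialize, eliminating the general rule."""
    kept = []
    for general, specifics in groups:
        if trg([s for _, s in specifics]) >= trg_min:
            kept.append(general)                  # prune the specific rules
        else:
            kept.extend(r for r, _ in specifics)  # drop the general rule
    return kept

groups = [
    ("fruit => sold", [("apple => sold", 0.10), ("pear => sold", 0.11)]),
    ("meat => sold", [("beef => sold", 0.02), ("pork => sold", 0.30)]),
]
print(process(groups, trg_min=3.0))
# the fruit rules behave uniformly and collapse into their general rule;
# the meat rules do not, so the specific rules survive instead
```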