Table 1 The preliminary notations

From: Computing symmetrical strength of N-grams: a two pass filtering approach in automatic classification of text documents

Notations

Formula

Meaning

a

\(=Count(NG_{i}|C_{j})\)

The number of occurrences of the N-Gram \(NG_{i}\) in the documents of class \(C_{j}\)

b

\(=Count(\bar{NG}_{i}|C_{j})\)

The number of occurrences of the other N-Grams \(\bar{NG}_{i}\) (all N-Grams except \(NG_{i}\)) in the documents of class \(C_{j}\)

c

\(=Count(NG_{i}|\bar{C}_{j})\)

The number of occurrences of the N-Gram \(NG_{i}\) in the documents of the other classes \(\bar{C}_{j}\)

d

\(=Count(\bar{NG}_{i}|\bar{C}_{j})\)

The number of occurrences of the other N-Grams \(\bar{NG}_{i}\) in the documents of the other classes \(\bar{C}_{j}\)

N

\(= (a+b+c+d)\)

The total number of N-Gram occurrences in the documents of all classes

\(p(NG_{i})\)

\(=(a+c)/N\)

The probability of the N-Gram \(NG_{i}\)

\(p(\bar{NG}_{i})\)

\(=(b+d)/N\)

The probability of the other N-Grams \(\bar{NG}_{i}\)

\(p(C_{j})\)

\(=(a+b)/N\)

The probability of the class \(C_{j}\)

\(p(\bar{C}_{j})\)

\(=(c+d)/N\)

The probability of the other classes \(\bar{C}_{j}\)

\(p(NG_{i},C_{j})\)

\(=a/N\)

The joint probability of the N-Gram \(NG_{i}\) occurring in the class \(C_{j}\)

\(p(\bar{NG}_{i},C_{j})\)

\(=b/N\)

The joint probability of the other N-Grams \(\bar{NG}_{i}\) occurring in the class \(C_{j}\)

\(p(NG_{i},\bar{C}_{j})\)

\(=c/N\)

The joint probability of the N-Gram \(NG_{i}\) occurring in the other classes \(\bar{C}_{j}\)

\(p(\bar{NG}_{i},\bar{C}_{j})\)

\(=d/N\)

The joint probability of the other N-Grams \(\bar{NG}_{i}\) occurring in the other classes \(\bar{C}_{j}\)

\(p(NG_{i}|C_{j})\)

\(=a/(a+b)\)

The conditional probability of the N-Gram \(NG_{i}\) given the class \(C_{j}\)

\(p(\bar{NG}_{i}|C_{j})\)

\(=b/(a+b)\)

The conditional probability of the other N-Grams \(\bar{NG}_{i}\) given the class \(C_{j}\)

\(p(NG_{i}|\bar{C}_{j})\)

\(=c/(c+d)\)

The conditional probability of the N-Gram \(NG_{i}\) given the other classes \(\bar{C}_{j}\)

\(p(\bar{NG}_{i} | \bar{C}_{j})\)

\(=d/(c+d)\)

The conditional probability of the other N-Grams \(\bar{NG}_{i}\) given the other classes \(\bar{C}_{j}\)

\(p(C_{j}|NG_{i})\)

\(=a/(a+c)\)

The conditional probability of the class \(C_{j}\) given the N-Gram \(NG_{i}\)

\(p(C_{j}|\bar{NG}_{i})\)

\(=b/(b+d)\)

The conditional probability of the class \(C_{j}\) given the other N-Grams \(\bar{NG}_{i}\)

\(p(\bar{C}_{j}|NG_{i})\)

\(=c/(a+c)\)

The conditional probability of the other classes \(\bar{C}_{j}\) given the N-Gram \(NG_{i}\)

\(p(\bar{C}_{j}|\bar{NG}_{i})\)

\(=d/(b+d)\)

The conditional probability of the other classes \(\bar{C}_{j}\) given the other N-Grams \(\bar{NG}_{i}\)
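
Every row of Table 1 follows from the four counts a, b, c and d, so the notation can be checked with a short Python sketch. The function name notation_table, the dictionary keys (with "~" standing for the bar, that is, "other N-Grams" or "other classes"), and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration; they are not taken from the paper.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when den is zero (an assumption of this sketch)."""
    return num / den if den else 0.0


def notation_table(a, b, c, d):
    """Compute the quantities of Table 1 from the four contingency counts.

    a: count of NG_i in class C_j
    b: count of the other N-Grams in class C_j
    c: count of NG_i in the other classes
    d: count of the other N-Grams in the other classes
    """
    n = a + b + c + d
    return {
        "N": n,
        # Marginal probabilities
        "p(NG)": safe_div(a + c, n),
        "p(~NG)": safe_div(b + d, n),
        "p(C)": safe_div(a + b, n),
        "p(~C)": safe_div(c + d, n),
        # Joint probabilities
        "p(NG,C)": safe_div(a, n),
        "p(~NG,C)": safe_div(b, n),
        "p(NG,~C)": safe_div(c, n),
        "p(~NG,~C)": safe_div(d, n),
        # N-Gram conditioned on class
        "p(NG|C)": safe_div(a, a + b),
        "p(~NG|C)": safe_div(b, a + b),
        "p(NG|~C)": safe_div(c, c + d),
        "p(~NG|~C)": safe_div(d, c + d),
        # Class conditioned on N-Gram
        "p(C|NG)": safe_div(a, a + c),
        "p(~C|NG)": safe_div(c, a + c),
        "p(C|~NG)": safe_div(b, b + d),
        "p(~C|~NG)": safe_div(d, b + d),
    }


# Hypothetical counts, chosen only to exercise the formulas.
probs = notation_table(a=10, b=90, c=5, d=895)
for name, value in probs.items():
    print(name, value)
```

With these made-up counts, N is 1000 and the four joint probabilities sum to 1, which is a quick sanity check that the counts were assigned to the right cells.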