
Table 6 Details of the experimental datasets

From: Computing symmetrical strength of N-grams: a two pass filtering approach in automatic classification of text documents

| S. No. | Dataset | Category names | Number of classes |
|---|---|---|---|
| 1. | Movie review | pos, neg | 2 |
| 2. | ACL IMDB large movie review | pos, neg | 2 |
| 3. | 20Newsgroup | talk.religion.misc, talk.politics.misc, alt.atheism, talk.politics.guns, talk.politics.mideast, comp.os.ms-windows.misc, comp.sys.mac.hardware, comp.graphics, misc.forsale, comp.sys.ibm.pc.hardware, sci.electronics, comp.windows.x, sci.space, rec.autos, sci.med, sci.crypt, rec.sport.baseball, rec.motorcycles, soc.religion.christian, rec.sport.hockey | 20 |
| 4. | Reuters13 | lei, housing, bop, wpi, retail, ipi, jobs, reserves, cpi, gnp, interest, trade, money-fx | 13 |
| 5. | Ohsumed5 | C01, C02, C03, C04, C05 | 5 |
| 6. | Ohsumed10 | C01, C02, C03, C04, C05, C06, C07, C08, C09, C10 | 10 |
| 7. | Ohsumed15 | C01, C02, C03, C04, C05, C06, C07, C08, C09, C10, C11, C12, C13, C14, C15 | 15 |
| 8. | Ohsumed23 | C01, C02, C03, C04, C05, C06, C07, C08, C09, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21, C22, C23 | 23 |
| 9. | Pubmed9 | bird flu, swine flu, proteins, cancer, Bacterial Pneumonia, Fungal Pneumonia, Viral Pneumonia, Idiopathic interstitial pneumonia, Legionnaires | 9 |
| 10. | BBC | business, entertainment, politics, sport, tech | 5 |
| 11. | BBC_Sports | athletics, cricket, football, rugby, tennis | 5 |