Table 3 Test collection characteristics

From: Econo-ESA in semantic text similarity

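| Dataset | Documents | Queries | Rel. | Document terms, Min. | Document terms, Q1 | Document terms, Med. | Document terms, Q3 | Document terms, Max. |
|---|---|---|---|---|---|---|---|---|
| LISA | 6,004 | 35 | 335 | 11 | 68 | 96 | 128.25 | 352 |
| NPL | 11,429 | 93 | 2,083 | 3 | 25 | 39 | 58 | 293 |
| CACM | 3,204 | 64 | 796 | 3 | 10 | 23 | 108 | 455 |
| CISI | 1,460 | 112 | 3,114 | 13 | 97 | 137 | 186 | 676 |
| Cranfield | 1,400 | 225 | 1,838 | 1 | 113 | 165 | 241.25 | 738 |
| Time | 423 | 83 | 324 | 91 | 399 | 612 | 918 | 6,618 |
| Medline | 1,033 | 30 | 696 | 24 | 107 | 159 | 226 | 758 |
| ADI | 82 | 35 | 170 | 28 | 60.25 | 70.5 | 80 | 216 |

| Dataset | Query terms, Min. | Query terms, Q1 | Query terms, Med. | Query terms, Q3 | Query terms, Max. | Explanation |
|---|---|---|---|---|---|---|
| LISA | 23 | 49.5 | 64 | 85 | 142 | Abstracts collection |
| NPL | 4 | 9 | 12 | 15 | 24 | Short text |
| CACM | 3 | 8.75 | 16 | 30 | 62 | CACM articles index |
| CISI | 4 | 20 | 72 | 122.75 | 335 | Index of articles |
| Cranfield | 6 | 12 | 16 | 21 | 43 | Index of articles |
| Time | 8 | 15 | 20 | 23.5 | 46 | Short text |
| Medline | 3 | 9.25 | 16.5 | 23.75 | 60 | Medical text |
| ADI | 4 | 8 | 13 | 21.5 | 57 | Short articles |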
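
The Min., Q1, Med., Q3 and Max. columns form a five-number summary of term counts per document or per query. The following is a minimal sketch of how such a summary could be computed. It is not the authors' procedure: the whitespace tokenizer, the toy documents, the helper name `five_number_summary`, and the linearly interpolated ("inclusive") quartile method are all assumptions made for illustration.

```python
from statistics import quantiles


def five_number_summary(counts):
    """Return (min, Q1, median, Q3, max) for a list of term counts.

    Quartiles use linear interpolation between order statistics
    (the "inclusive" method); this choice is an assumption.
    """
    if len(counts) < 2:
        raise ValueError("need at least two values")
    q1, med, q3 = quantiles(counts, n=4, method="inclusive")
    return min(counts), q1, med, q3, max(counts)


# Hypothetical toy documents, not taken from any of the collections.
docs = [
    "information retrieval with concept vectors",
    "semantic similarity of short texts",
    "a test",
    "explicit semantic analysis maps text to wikipedia concepts",
]

# Assumed tokenizer: split on whitespace, no stop-word removal or stemming.
term_counts = [len(d.split()) for d in docs]
summary = five_number_summary(term_counts)
print(summary)
```

Interpolated quartiles can take fractional values even though term counts are integers. A different tokenizer or quartile definition would give somewhat different figures.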