The present analysis focused on highly accessed articles published in 5 arbitrarily selected open access oncology journals. It revealed interesting differences between these journals, for example in access and citation numbers and in article-type composition (0–19 case reports and 2–30 reviews among the top 50 articles). For each of the 5 journals, and for all journals combined, the correlation between the number of accesses and the number of citations was poor to moderate (r = 0.01–0.30). Considerable variation was also observed when these analyses were restricted to specific article types, such as reviews only (r = 0.21), research articles only (r = 0.34) or case reports only (r = 0.53). Even when year of publication was taken into account, high correlation coefficients were the exception rather than the rule. For example, reviews published in the Journal of Hematology and Oncology in 2009 achieved a correlation coefficient of 0.78, whereas those published in Molecular Cancer in 2003 achieved −0.08. These results contrasted with our expectations and with initial findings from a preliminary analysis of the first author's open access publications (correlation coefficient 0.87). Possibly, correlations become weaker when analyses focus on highly viewed articles rather than all articles published in a given open access journal. The interest of the readers (a heterogeneous group including, for example, practising oncologists, scientists, technicians, nurses, students and patients, given open access without institutional subscription or fees) might not necessarily reflect the scientific impact of a given topic or the practical implications of an unusual case, and hence the likelihood of citation in other articles (Kanaan et al. 2011). Citation frequency also depends on other factors, including but not limited to the number of authors and contributing institutions (Figg et al. 2006; Stringer et al. 2010).
We are not aware of other analyses limited to the most viewed articles.
Potential limitations of the present study include, aside from the restriction of the analyses to the most viewed articles, the low numbers of articles in the different categories and years, and the small number of oncology journals, which cannot fully represent the broad field of cancer causes, epidemiology, research and treatment with all its subspecialties.
Paiva et al. evaluated open access journals from the BMC and Public Library of Science (PLoS) publishing groups (all 6 PLoS journals, as well as the 6 best ranked and the 6 worst ranked BMC journals, according to Journal Citation Reports (JCR) 2010) (Paiva et al. 2012). None of the journals analysed in the present study was included. All original research articles published from September 1, 2008, to September 30, 2008, were analysed (not limited to oncology). Articles classified as review articles, case reports, commentaries, editorials, and letters to the editor were excluded. The three-year period between publication and analysis was considered sufficient to measure the impact of a specific article in the scientific community. The numbers of times an article was viewed at the publisher site, downloaded, and cited according to JCR Science Edition 2010 were collected for the period from December 6, 2011, to December 20, 2011. In total, 423 original research articles were included in the analysis. The median numbers of views and citations were 2533 and 10, respectively (lower than the corresponding figures in our top 50 data). There was a positive correlation between the number of views and citations (r = 0.434, p < 0.001).
For the Journal of Vision (free access), comparable evaluations were performed (Watson 2009). One comparison was between total downloads and total citations; the correlation between these two quantities was 0.74, indicating a strong positive relationship. To neutralize growth with age, Watson compared the total downloads and citations (as of July 1, 2008) of papers published in a given year. There was a strong positive correlation in each year, with a high of around 0.8 in 2003. Because of the lag between downloads and citations, one would not expect correlations to be as high for articles less than three years old. For articles at least three years old, the correlation was always above 0.6 (except for 2001, which was based on only 12 articles). This analysis indicated that download statistics provide a useful indicator, two years in advance, of eventual citations.
In contrast to these two studies, Schloegl and Gorraiz looked at oncology journals only (Schloegl & Gorraiz 2010). None of the journals analysed in the present study was included. They identified a strong correlation between citation frequencies and numbers of downloads for their journal sample. The relationship was weaker on a paper-by-paper basis because of the variance in the citation-to-download ratio among articles. They computed Spearman rank correlation coefficients of 0.89, 0.92 and 0.92 between the 2004 downloads and the citations of 2004, 2005 and 2006, respectively. The corresponding correlations between the downloads and citations of the years 2005 (n = 31) and 2006 (n = 33) were similar (between 0.9 and 0.92). Because downloads and citations diverge most strongly in the publication year itself, a high correlation was not expected for 2006 (for instance 0.32 for Cancer Letters and 0.41 for Gynecologic Oncology).
Our own results, derived from oncology journals other than those evaluated previously, suggest that complex and variable relations exist between downloads and citations. We cannot recommend a universal strategy that substitutes download counts for citation figures in quantitative analyses.