Demand spillovers of smash-hit papers: evidence from the ‘Male Organ Incident’
- Otto Kässi^{1} and
- Tatu Westling^{1}
https://doi.org/10.1186/2193-1801-2-168
© Kässi and Westling; licensee Springer. 2013
Received: 12 October 2012
Accepted: 15 February 2013
Published: 17 April 2013
Abstract
This study explores the short-run spillover effects of popular research papers. We consider the publicity of ‘Male Organ and Economic Growth: Does Size Matter?’ as an exogenous shock to economics discussion paper demand, a natural experiment of a sort. In particular, we analyze how the very substantial visibility influenced the downloads of Helsinki Center of Economic Research discussion papers. Difference in differences and regression discontinuity analyses are conducted to elicit the spillover patterns. This study finds that the spillover effect on average economics paper demand is positive and statistically significant. It seems that hit papers increase the exposure of previously less downloaded papers. We find that part of the spillover effect could be attributable to Internet search engines’ influence on browsing behavior. Conforming to expected patterns, papers residing on the same web page as the hit paper exhibit very significant increases in downloads, which also supports the spillover thesis.
JEL Classification
A11, C21
MSC Classification
97K80
Keywords
Scholarly spillover, Media, Blogs, Downloads, Natural experiment, Difference in differences, Regression discontinuity design
Introduction
Economics research papers seldom make headlines. Just as seldom do they reach the online community beyond academic confines. Something quite different was on display in July 2011 following the publication of ‘Male Organ and Economic Growth: Does Size Matter?’ [henceforth MOEG] (Westling 2011), which explored the link between penile length and economic growth^{a}. In the weeks that followed it amassed some 175 000 downloads and global coverage in print, television and online media. Tim Harford of the Financial Times dubbed it the ‘smash-hit economics research paper of the summer’. Arguably the whole incident, with all its publicity, was completely unanticipated and came as a surprise to everyone involved. If nothing else, an attractive natural experiment came into being.
Events such as this can be viewed from many angles and disciplines. One intriguing facet is the scholarly visibility that ensued. In particular, it is tempting to speculate whether such ‘hit papers’ generate wider interest in research that emanates from the same institution. At least three motivations are clear. First, academic spillover effects can reveal something about the fabric of scholarly discourse. Second, substantial visibility externalities could alter the attractiveness of different publication channels. Third, the incident itself speaks volumes about the impact the Internet already has in the academic sphere. In addition, the natural experiment character of the event has an obvious appeal from the methodological perspective.
The objective of this study is to explore scholarly spillover effects. Namely, we analyze the download data of Helsinki Center of Economic Research [henceforth HECER] to estimate the impact of the ‘male organ incident’ on the short-run demand for the institution’s economics discussion papers, a scholarly spillover effect, if that term is appropriate. The demand shock can be considered exogenous, and indeed the whole incident resembles a natural experiment. The context, therefore, is an attractive venue for causal inference. As a note of caution, however, we remind the reader that the purpose of this study is only to explore the immediate short-run effects. Hence the existence and magnitude of the long-run demand effects remain obscure.
The economic role of blogs in the dissemination of research papers is discussed convincingly in McKenzie and Özler (2011). They find very significant peaks in RePEc^{b} visibility [abstract views and paper downloads] following papers’ coverage in the most influential blogs. Regarding spillover effects, the literature provides supportive findings. In medical research, publicity in the popular media increases citations substantially (Phillips et al. 1991). Somewhat related analyses, concentrating on the economics literature, are also provided in Pieters and Baumgartner (2002) and Brown (2003). Ellison (2011) discusses the role of the Internet in academic publishing and contains many references to related themes.
Much of the existing literature focuses on the long-term effects, which, of course, might be more important than the immediate impact. Nevertheless, we view the very short-term effects as interesting as well. Our view is supported by findings in Edelman and Larkin (2009), who show that researchers seem to manipulate SSRN download statistics to boost their own papers’ visibility.
In this study two datasets and methods are utilized. First, we use a [public] monthly server log that captures itemized download rates for each paper. It contains most research paper series at the University of Helsinki, and hence we are able to form control groups to capture any time fixed effects [FEs]. The data spans a period of 15 months from May 2010 to July 2011. As MOEG went online on the 11th of July and the download activity was at its most intense in the following three weeks, the July data captures the vast majority of the short-term spillover effects. Second, we analyze the raw download log of July 2011. It contains very detailed information on all economics papers’ downloads, and allows us to construct time series of the patterns.
Regression estimations based on difference in differences [DID] methods support the spillover hypothesis. When comparing the downloads in June and July 2011 and allowing for paper FEs, the hit paper effect was positive but not statistically significant: MOEG was found to increase the average downloads of HECER discussion papers by 2 in July. However, when the probability of a paper being downloaded at least once is considered, the spillover effect is statistically significant. A hit paper increases this probability by 11 percentage points. It thus seems that previously less frequently downloaded papers reap most benefits from the spillover effects. One interpretation is that hit papers broaden institutions’ audience.
Analysis based on regression discontinuity design [RDD] corroborates the previous findings. Depending on specification, MOEG is found to increase the average monthly downloads of economics papers by 0.5 to 1.5. Despite a different estimation method and data, the figure approximates those obtained by DID.
We present evidence that browsing via Internet search engines might capture part of the spillover effect. In fact this study documents a substantial increase in the downloads of papers that appeared on the same web page as MOEG through July. The 4 papers on the same web page experienced an increase of 6 monthly downloads, which is significant at the 5% level. RDD analysis yields similar conclusions: residing on the same web page increases monthly downloads by 6.2 to 7.2.
Quite confidently we conclude that MOEG creates positive spillover effects. The magnitudes might be quantitatively modest but qualitatively interesting nevertheless. However, the 4 papers on the same web page experienced substantial spillover effects.
This paper proceeds as follows. Section ‘Data and estimation’ describes the data and estimation procedures. Section ‘Results’ presents results and section ‘Conclusions’ concludes. The tables and figures are included in the Appendix.
Data and estimation
The aggregated monthly data is based on the library’s public server logs, which capture all downloads at a specified time interval^{c} at the University of Helsinki. In total 15 months of data is available. However, due to the addition of new papers we mostly use data from June and July 2011. This ensures that the samples of papers in adjacent months are almost equivalent. Concentrating only on two months of data also reduces problems related to autocorrelated error terms, which may severely bias our standard error estimates (Bertrand et al. 2004).
Table 1. Monthly downloads in June and July 2011, HECER and control groups
| | June | July |
|---|---|---|
| HECER (n=335) | | |
| 25th percentile | 1 | 3 |
| Median | 2 | 4 |
| 75th percentile | 4 | 5.5 |
| Average | 3.0 | 4.7 |
| Humanities (n=93) | | |
| 25th percentile | 0 | 0 |
| Median | 2 | 4 |
| 75th percentile | 4 | 6 |
| Average | 8.2 | 7.3 |
| Natural sciences (n=870) | | |
| 25th percentile | 0 | 0 |
| Median | 0 | 3 |
| 75th percentile | 1 | 5 |
| Average | 1.6 | 3.5 |
To explore the spillover effects we use data on economics discussion papers. As control groups the downloads of natural sciences and humanities papers are used.^{d} If MOEG has a positive demand effect on the control groups, the estimates are biased. In this case the spillover effect on economics papers would be underestimated. However, we find it unlikely that one paper could increase the demand for papers in fields beyond its own, even within the same university. Hence we assume that the spillover is field- but not institution-specific.
The raw download log contains detailed information on all items in the economics discussion paper series for July 2011. Unfortunately for us, the log file does not include HTTP Referer codes which would be needed to study the geographical distribution of downloads. Therefore we only have access to the date and time of all downloads. To construct time series for each paper, we aggregate the itemized downloads at the day level.
In addition, we attempt to clean out downloads by Internet search engines and crawlers. For this, we use the browser field. This is not a complete solution; no amount of clean-up can ensure that all crawler-related downloads are identified and deleted. Hence to a limited extent they can interfere with our results.
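The cleaning and aggregation steps described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the record layout, the paper identifiers, the sample log lines and the list of crawler markers are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical substrings that mark automated clients in the browser field.
CRAWLER_MARKERS = ("bot", "crawler", "spider", "slurp")

def is_crawler(browser: str) -> bool:
    """Return True if the browser field looks like a search engine or crawler."""
    lowered = browser.lower()
    return any(marker in lowered for marker in CRAWLER_MARKERS)

def daily_downloads(records):
    """Count non-crawler downloads per (paper_id, date).

    `records` is an iterable of (paper_id, timestamp, browser) tuples, where
    timestamp is an ISO 8601 string such as "2011-07-15T09:30:00".
    """
    counts = Counter()
    for paper_id, timestamp, browser in records:
        if is_crawler(browser):
            continue  # drop downloads attributed to crawlers
        day = datetime.fromisoformat(timestamp).date()
        counts[(paper_id, day)] += 1
    return counts

# Made-up log lines for illustration only.
sample = [
    ("dp335", "2011-07-15T09:30:00", "Mozilla/5.0 (Windows NT 6.1) Firefox/5.0"),
    ("dp335", "2011-07-15T10:02:11", "Googlebot/2.1 (+http://www.google.com/bot.html)"),
    ("dp334", "2011-07-15T23:59:59", "Mozilla/5.0 (Macintosh) Safari/534.30"),
    ("dp334", "2011-07-16T00:00:01", "Mozilla/5.0 (X11; Linux) Chrome/12.0"),
]
result = daily_downloads(sample)
```

Substring matching of this kind only catches crawlers that identify themselves, which is one reason the clean-up cannot be complete.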
Between-month estimation
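The between-month specification can be written as a standard two-group, two-period DID regression. The form below is a sketch consistent with the variable definitions in the following paragraph and with the rows of Table 2; the symbols α, β, δ and ε are labels assumed here, since only γ is named in the text.

```latex
\begin{equation}
Q_{i,t} = \alpha + \beta\,\mathrm{ECON}_{i} + \delta\,\mathrm{MOEG}_{i,t}
        + \gamma\,\bigl(\mathrm{ECON}_{i} \times \mathrm{MOEG}_{i,t}\bigr)
        + \varepsilon_{i,t} \tag{1}
\end{equation}
```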
where Q_{i,t} is the number of monthly downloads, ECON_{i} is a dummy for inclusion in the HECER series, MOEG_{i,t} is the treatment and i ∈ {1,…, N} denotes individual papers. The parameter of interest is γ, which identifies the average treatment effect on the paper demand. In order to ensure that paper-specific unobservables do not drive our results, we also estimate (1) using paper FEs.^{f}
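The second outcome studied is the probability that a paper’s monthly downloads exceed a cut-off k. A linear probability form consistent with the P[Monthly dls > k] results reported later is sketched below; the indicator notation and the symbols α, β, δ and ε are assumptions, with only γ and k taken from the text.

```latex
\begin{equation}
\mathbf{1}\bigl[Q_{i,t} > k\bigr] = \alpha + \beta\,\mathrm{ECON}_{i} + \delta\,\mathrm{MOEG}_{i,t}
        + \gamma\,\bigl(\mathrm{ECON}_{i} \times \mathrm{MOEG}_{i,t}\bigr)
        + \varepsilon_{i,t} \tag{2}
\end{equation}
```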
where the notation is the same as above. In the baseline specification k = 0, which is used to estimate the spillover effect’s tendency to change the probability that a given paper is downloaded at least once. Subsequently different values of k > 0 are employed to determine the cut-off point at which spillover effects are still observable. The parameter of interest is again γ.
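A pooled OLS version of the two DID regressions can be sketched in a few lines. The data below are simulated, the group sizes and effect sizes are invented for the example, and the helper names are hypothetical; the sketch omits standard errors and paper FEs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-period panel: 300 treated-field papers and 900 control papers.
n_econ, n_ctrl = 300, 900
econ = np.tile(np.r_[np.ones(n_econ), np.zeros(n_ctrl)], 2)         # ECON_i, stacked June then July
moeg = np.r_[np.zeros(n_econ + n_ctrl), np.ones(n_econ + n_ctrl)]   # treatment-month dummy

# Simulated monthly downloads with a true interaction effect of 2.
true_gamma = 2.0
mean = 3.0 + 1.0 * econ + 0.5 * moeg + true_gamma * econ * moeg
downloads = rng.poisson(mean).astype(float)

def did_ols(y, econ, moeg):
    """Pooled OLS of y on a constant, ECON, MOEG and ECON*MOEG; returns the coefficients."""
    X = np.column_stack([np.ones_like(y), econ, moeg, econ * moeg])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

alpha, beta, delta, gamma = did_ols(downloads, econ, moeg)

# Linear probability version: indicator that downloads exceed the cut-off k.
k = 0
gamma_k = did_ols((downloads > k).astype(float), econ, moeg)[3]

def cell_mean(y, e, m):
    """Mean of y in the group/month cell defined by ECON == e and MOEG == m."""
    return y[(econ == e) & (moeg == m)].mean()

# In a saturated 2x2 design, gamma equals the difference in differences of cell means.
gamma_from_means = (cell_mean(downloads, 1, 1) - cell_mean(downloads, 1, 0)) - (
    cell_mean(downloads, 0, 1) - cell_mean(downloads, 0, 0)
)
```

The last lines illustrate why γ is read as the spillover effect: with only group, month and interaction dummies, the OLS coefficient coincides with the change in the treated group minus the change in the controls.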
Within-month estimation
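The within-month regression can be sketched as a level-shift model on daily downloads. The form below is consistent with the parameters γ and η and the controls X discussed in the following paragraphs and with the MOEG and MOEG*PG rows of the RDD table; the symbols q_{i,d}, β and ε and the day index d are labels assumed here.

```latex
\begin{equation}
q_{i,d} = X_{i,d}'\beta + \gamma\,\mathrm{MOEG}_{d}
        + \eta\,\bigl(\mathrm{MOEG}_{d} \times \mathrm{PG}_{i}\bigr)
        + \varepsilon_{i,d} \tag{3}
\end{equation}
```

Here PG_{i} would be a dummy for the papers residing on the same web page as MOEG.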
In our baseline model, X includes only a constant and weekend dummies. Our time trend specification supplements the baseline model with a third degree polynomial time trend interacted with the treatment dummy.^{i} Our third alternative specification includes paper fixed effects and weekend dummies.
The parameters of interest are γ and η. The former captures the effect of MOEG on the average paper downloads, and the latter captures the additional treatment effect for the 4 papers on the same web page. The treatment MOEG takes place on 15th July, and corresponds to its appearance in Marginal Revolution and Freakonomics. Weekend dummies capture the substantial within-week download volatility.
Due to the rotation of the Earth and Helsinki’s location in the GMT+2 time zone, some [local time] Friday [Monday] downloads from the Western [Eastern] Hemisphere are recorded at weekends. However, we postulate that these errors largely cancel each other out, and hence that our spillover estimates are insulated from orbital factors.
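The baseline within-month specification, with a constant, weekend dummies, a treatment dummy from 15th July onward and its interaction with the same-page papers, can be sketched on simulated data as follows. The number of papers, the effect sizes and the function name are assumptions for illustration; the polynomial time trend and paper FE variants are not shown.

```python
import numpy as np
from datetime import date

rng = np.random.default_rng(1)

n_papers, cutoff_day = 200, 15
days = np.arange(1, 32)  # 1..31 July 2011
weekend_by_day = np.array([date(2011, 7, d).weekday() >= 5 for d in days], dtype=float)

# Long format: one row per (paper, day).
paper = np.repeat(np.arange(n_papers), days.size)
day = np.tile(days, n_papers)
weekend = np.tile(weekend_by_day, n_papers)
moeg = (day >= cutoff_day).astype(float)   # treatment switches on at the cut-off day
pg = (paper < 4).astype(float)             # 4 papers sharing the hit paper's web page

# Simulated daily downloads with level shifts of 0.05 (all papers) and 0.5 (same page).
true_gamma, true_eta = 0.05, 0.5
mean = 0.10 - 0.03 * weekend + true_gamma * moeg + true_eta * moeg * pg
downloads = rng.poisson(mean).astype(float)

def rdd_baseline(bandwidth):
    """OLS on days within +/- bandwidth of the cut-off; returns (gamma, eta)."""
    keep = np.abs(day - cutoff_day) <= bandwidth
    X = np.column_stack(
        [np.ones(keep.sum()), weekend[keep], moeg[keep], (moeg * pg)[keep]]
    )
    coef, *_ = np.linalg.lstsq(X, downloads[keep], rcond=None)
    return coef[2], coef[3]

# Same bandwidths as in the RDD table.
estimates = {h: rdd_baseline(h) for h in (15, 10, 5)}
```

Narrowing the bandwidth trades precision for a cleaner comparison of days just before and just after the cut-off, which is why several bandwidths are reported.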
Results
We first describe the download profile of MOEG through July 2011. Subsequently OLS regression estimates with DID and RDD specifications are presented. Then the role of Internet search engines is briefly discussed.
Spillover effect
Table 2. DID estimates on monthly downloads and monthly downloads exceeding zero as dependent variables
| Dep. variable | Monthly dls | | P[Monthly dls > 0] | |
|---|---|---|---|---|
| Model spec. | Pooled OLS DID | FE DID | Pooled OLS DID | FE DID |
| Constant | 6.059*** | | 0.448*** | |
| | (1.062) | | (0.016) | |
| ECON | -3.002. | | 0.444*** | |
| | (1.082) | | (0.024) | |
| MOEG | -0.334 | -0.334 | -0.004 | -0.004 |
| | (1.263) | (1.087) | (0.017) | (0.023) |
| ECON*MOEG | 1.994 | 2.018 | 0.112*** | 0.113*** |
| | (1.280) | (1.811) | (0.023) | (0.034) |
| Fixed effects | No | Yes | No | Yes |
| R^{2} | 0.0003 | 0.16 | 0.20 | 0.55 |
| N | 2525 | 2525 | 2525 | 2525 |
Controlling for the paper FEs does not materially change the estimate of γ. This supports our assumption that the treatment indeed was exogenous and our observations are not driven by paper unobservables.
As can be seen from Table 2, the coefficient of the counter-factual [MOEG] at -0.334 is not significantly different from zero. This suggests that the spillover effect has not contaminated the control groups and supports our prior that the spillover is field- and not institution-specific.
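As a reading aid, the pooled OLS coefficients in the first column of Table 2 can be translated into implied group averages. The arithmetic below uses only the reported point estimates; the bar notation for group means is introduced here for illustration.

```latex
\begin{align*}
\bar{Q}_{\text{control, June}} &= 6.059 \\
\bar{Q}_{\text{control, July}} &= 6.059 - 0.334 = 5.725 \\
\bar{Q}_{\text{HECER, June}}   &= 6.059 - 3.002 = 3.057 \\
\bar{Q}_{\text{HECER, July}}   &= 6.059 - 3.002 - 0.334 + 1.994 = 4.717 \\
\hat{\gamma} &= (4.717 - 3.057) - (5.725 - 6.059) = 1.660 + 0.334 = 1.994
\end{align*}
```

The implied HECER means of about 3.1 and 4.7 are close to the June and July averages of 3.0 and 4.7 reported for HECER in the descriptive table.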
We are also interested in the broader impact of the hit paper effect. Namely, here the objective is to abstract away the high demand for certain particular papers, which is unlikely to be driven by spillovers, and to examine whether the majority of papers experience positive demand effects. This is motivated by the fact that idiosyncratic shocks can substantially change demand for very few individual papers. Indeed, the last two columns in Table 2 provide support for the idea that hit papers can increase demand for previously less downloaded papers. MOEG increases the probability that any paper is downloaded during a month by 11 percentage points. This coefficient is significant at the 1% level in both DID specifications. Again, paper FEs do not have an impact on the qualitative results.
Table 3. DID estimates on the probability of monthly downloads exceeding k
| Cut-off | Spillover effect |
|---|---|
| P[Monthly dls > 5] | 0.092** |
| | (0.030) |
| P[Monthly dls > 10] | 0.005 |
| | (0.017) |
| P[Monthly dls > 15] | 0.0002 |
| | (0.014) |
Table 4. RDD estimates on daily downloads in July 2011
| Dep. variable | Daily downloads | | |
|---|---|---|---|
| Bandwidth | ±15 | ±10 | ±5 |
| Baseline | | | |
| MOEG | 0.021* | 0.012 | 0.043*** |
| | (0.009) | (0.010) | (0.014) |
| MOEG*PG | 0.510*** | 0.427*** | 0.366*** |
| | (0.098) | (0.067) | (0.141) |
| Polynomial time trend | | | |
| MOEG | 0.090*** | 0.073*** | 0.090* |
| | (0.017) | (0.019) | (0.037) |
| MOEG*PG | 0.510*** | 0.427*** | 0.366*** |
| | (0.012) | (0.068) | (0.058) |
| Article FE | | | |
| MOEG | 0.021* | 0.012 | 0.043** |
| | (0.018) | (0.021) | (0.013) |
| MOEG*PG | 0.511*** | 0.427*** | 0.366** |
| | (0.012) | (0.069) | (0.060) |
| Robustness check | | | |
| Quasi treatment | 5th July | 25th July | |
| MOEG | -0.047. | 0.062*** | |
| | (0.031) | (0.016) | |
| MOEG*PG | 0.062 | 0.520*** | |
| | (0.174) | (0.168) | |
Quasi treatments are provided as robustness checks. The robustness checks are roughly in line with our main findings: the coefficient of a quasi treatment prior to the publication of MOEG is negative and statistically significantly different from zero, but only at the 10% level. We suspect that this negative value might be related to a holiday effect: 5th of July in Finland overlaps with July 4th in the United States, which is a national holiday. The later quasi treatment coefficients are largely aligned with our main estimates. We find this reassuring since the spillover effect is likely to exhibit persistence.
Search engines
Table 5. DID estimates on downloads between the first and second month after submission
| Dep. variable | Monthly downloads | |
|---|---|---|
| Model spec. | Pooled OLS DID | FE DID |
| Constant | 4.153*** | |
| | (0.316) | |
| SECOND MONTH | -2.255*** | -2.283 |
| | (0.448) | (0.339) |
| PG | 10.596*** | |
| | (2.84) | |
| SECOND MONTH*PG | 6.005 | 6.033* |
| | (4.016) | (3.034) |
| Fixed effects | No | Yes |
| R^{2} | 0.097 | 0.485 |
| N | 642 | 642 |
Analysis with the RDD specification supports the previous findings, and is presented in Table 4. It shows that appearing on the same web page increases the daily downloads by 0.365 to 0.51 on average. Translated to monthly figures these correspond to an increase of 6.2 to 7.2 downloads. In the absence of other major exogenous changes, beyond reasonable doubt that is, we attribute this level shift to the visibility of MOEG. Hence the RDD results presented here support both theses, namely that hit papers generate spillovers and that part of the spillover is driven by search behavior.
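The conversion from daily to monthly figures follows the rule given in the endnotes, namely multiplying the daily coefficient by 17. A short sketch, using the baseline MOEG*PG coefficients from the RDD table, is given below; the variable names are illustrative, and reading the 17 days as 15th to 31st July inclusive is an assumption.

```python
DAYS_AFTER_CUTOFF = 17  # multiplier from the endnotes; assumed to cover 15 to 31 July inclusive

# Baseline MOEG*PG coefficients from the RDD table, keyed by bandwidth in days.
daily_same_page = {15: 0.510, 10: 0.427, 5: 0.366}

# Scale each daily level shift to a monthly figure.
monthly_same_page = {
    h: round(coef * DAYS_AFTER_CUTOFF, 2) for h, coef in daily_same_page.items()
}
```

Under this rule the ±5 and ±10 bandwidth coefficients scale to roughly 6.2 and 7.3 monthly downloads, in line with the range quoted above, while the ±15 coefficient scales to about 8.7.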
Conclusions
This paper presents evidence of hit papers’ spillover effects by utilizing the demand shock from ‘Male Organ and Economic Growth: Does Size Matter?’ (Westling 2011) as a natural experiment. The paper garnered some 175 000 downloads in just three weeks in July 2011, which is a staggering figure by University of Helsinki standards. We explore how the event changed the download patterns of economics research papers at the institution. For robustness, the estimations are conducted both with monthly and daily data, and by utilizing two different analysis methods, namely DID and RDD.
Reflecting on the findings with both approaches, the spillover thesis seems quite robust. Notwithstanding some caveats, it looks as if hit papers could increase the demand for research in the short run. We stress that only the RDD results are invariably statistically significant. Depending on the method, the ‘male organ incident’ seems to increase the average monthly downloads by 1.2 to 2. However, the probability that a paper is downloaded at least once during the month increases convincingly, by 11 percentage points; hit papers, therefore, generate demand for previously less exposed research. We interpret this as evidence of a ‘Matthew Effect in Science’, which simply states that high research visibility tends to accumulate to the same people and institutions (Merton 1968).
By far the most credible evidence of the spillover effect comes from the within-month analysis. In particular we refer to the papers residing on the same page as MOEG. In monthly figures the incremental downloads reach 6.2 to 7.2. A considerable amount of scepticism is needed to attribute this to chance. We stress, however, that our measures capture only short-term spillover effects.
It must be admitted, though, that the magnitude of the spillover effect is quite modest. A significant amount of publicity is required to generate even a small amount of demand. The numbers imply that on average 0.4% of the 175 000 visitors download research beyond the hit paper. Furthermore, without more detailed log data there is no way to tell how the views are distributed between visitors. On the other hand, the figure of 0.4% might represent the lower bound since only a minor share of the visitors used search engines to locate the paper. Apparently the vast majority came through direct links to the file appearing on blogs and other web pages. The findings presented here lend anecdotal if quite irrefutable support for the prominence of blogs in the dissemination of papers, and hence corroborate the results in McKenzie and Özler (2011). Blogs do matter.
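The derivation of the 0.4% figure is not spelled out. One back-of-the-envelope reading, which is an assumption made here rather than the stated calculation, multiplies the 335 HECER papers by the roughly 2 additional monthly downloads per paper from the DID estimate and divides by the 175 000 downloads of the hit paper:

```latex
\[
\frac{335 \times 2}{175\,000} = \frac{670}{175\,000} \approx 0.0038 \approx 0.4\%
\]
```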
Most importantly, the external validity of the results is somewhat ambiguous. Almost by definition the emergence of a hit paper is a unique event driven by peculiar circumstances. Whether future events would yield similar patterns thus remains unknown.
Endnotes
^{a} Full disclosure: one author of this paper is the author of MOEG.
^{b} Research Papers in Economics
^{c} The data is available online at https://helda.helsinki.fi/simplestats/front.
^{d} We have chosen humanities and natural sciences as our control group because 1) they are the two largest groups and 2) similar to the economics working paper series the two working paper series consist of preliminary research papers written in English.
^{e} For a textbook discussion of DID, see Cameron and Trivedi (2005), pp. 55–57.
^{f} We have also experimented using humanities and science working papers as controls separately. This does not affect the estimates.
^{g} For a textbook discussion of RDD, see Cameron and Trivedi (2005), pp. 879–893.
^{h} We also estimated a separate model for the 4 papers on the MOEG web page. However, due to the small sample the parameter estimates were insignificant.
^{i} The order of the polynomial is chosen by the Akaike Information Criterion.
^{j} In McKenzie and Özler (2011) Freakonomics, Marginal Revolution, Greg Mankiw, Paul Krugman, The New York Times Economix blog, Dani Rodrik, Chris Blattman and Aid Watch are considered the most influential blogs.
^{k} The average treatment effect on monthly downloads is obtained by multiplying the daily figures by 17. This is the number of days after the publication of MOEG in July.
^{l} The download statistics are aggregated by calendar month. Due to different submission dates within months, the data can be quite noisy. Hence the first month downloads on average represent only 15 days of downloads. However, the DID specification accounts for this. We also dropped observations with submission dates in December 2010 and January 2011 since search engines and/or backup procedures added exactly 20 downloads to all papers in the latter month. This January peak can be observed in all papers irrespective of the field or series.
Declarations
Acknowledgements
The term ‘smash-hit economics research paper of the summer’ was coined by Tim Harford in his column ‘Dubious data cut down to size’ in the Financial Times on August 5th 2011. We thank Joonas Kesäniemi of Helsinki University Library for providing us with the download data. We are grateful to Jani-Petri Laamanen and the participants of the FEA 2012 meeting for useful comments. All errors are our own.
Authors’ Affiliations
References
- Bertrand M, Duflo E, Mullainathan S: How much should we trust differences-in-differences estimates? Q J Econ 2004, 119(1):249-275. doi:10.1162/003355304772839588
- Brown L: Ranking journals using social science research network downloads. Rev Quant Finance Account 2003, 20(3):291-307. doi:10.1023/A:1023628613622
- Cameron AC, Trivedi PK: Microeconometrics: methods and applications. Cambridge University Press, New York; 2005.
- Edelman BG, Larkin I: Demographics, career concerns or social comparison: Who games SSRN download counts? Harvard Business School NOM Unit Working Paper, no. 096. Harvard Business School. 2009.
- Ellison G: Is peer review in decline? Econ Inq 2011, 49(3):635-657.
- McKenzie D, Özler B: The impact of economics blogs. World Bank Policy Research Working Paper no. 5783. World Bank Development Research Group. 2011.
- Merton R: The Matthew effect in science. Science 1968, 159(3810):56-63.
- Phillips D, Kanter E, Bednarczyk B, Tastad P: Importance of the lay press in the transmission of medical knowledge to the scientific community. New England J Med 1991, 325(16):1180-1183. doi:10.1056/NEJM199110173251620
- Pieters R, Baumgartner H: Who talks to whom? Intra- and interdisciplinary communication of economics journals. J Econ Lit 2002, 40(2):483-509. doi:10.1257/002205102320161348
- Westling T: Male organ and economic growth: Does size matter? Helsinki Center of Economic Research Discussion Paper, no. 335. HECER. 2011.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.