Underpricing, underperformance and overreaction in initial public offerings: Evidence from investor attention using online searches

Online activity of Internet users has proven very useful in modeling various phenomena across a wide range of scientific disciplines. In our study, we focus on two stylized facts or puzzles surrounding the initial public offerings (IPOs) – the underpricing and the long-term underperformance. Using the Internet searches on Google, we proxy the investor attention before and during the day of the offering to show that the high attention IPOs have different characteristics than the low attention ones. After controlling for various effects, we show that investor attention still remains a strong component of the high initial returns (the underpricing), primarily for the high sentiment periods. Moreover, we demonstrate that the investor attention partially explains the overoptimistic market reaction and thus also a part of the long-term underperformance.


Introduction
The Internet, a revolutionary invention from 1965 with more than two billion users by 2014, has undoubtedly changed the world we live in. It allows its users to access an unprecedented amount of information in a very short time. Due to the abundance of available information, attention has become a scarce resource that needs to be efficiently allocated in order to acquire the information of interest. For a vast majority of Internet users, search engines serve as a gateway to all that information, and Google, with its dominant market share and more than one billion unique visitors every month, is their uncrowned king. Such online behavior leaves a digital trace. All individual search queries that have been typed into the search bar are stored by Google, and the processed statistics on searches are made publicly available by the company via its Google Trends facility. The Google search volume databank thus provides a direct measure of attention which is freely available, timely and representative of the whole population of Internet users. This potential of Internet search data has been put into practice, and it is now being used for tracking or even anticipating various social phenomena. The utilization ranges from influenza tracking (Dugas et al. 2012; Ginsberg et al. 2008) and consumer interest and its impact on product sales (Choi and Varian 2009; Goel et al. 2010; Kulkarni 2012) to macroeconomic indicators (Askitas and Zimmermann 2009; Cooper et al. 2005; Preis et al. 2010). The work of Merton (1987) suggests that attention may also be relevant for the complex reality of financial markets, and Preis et al. (2008) are among the first to support this hypothesis using web search data to proxy attention. Since then, many researchers have used online attention to track, nowcast or forecast various financial indicators.
Here, we utilize Google searches to help us explain two stylized facts of initial public offerings (IPOs): the long-term underperformance and the high initial returns, also known as the IPO underpricing.
The long-term underperformance (i.e. an inferior performance to non-issuing firms) is arguably the most attractive area of the IPO academic research. Stern and Borenstein (1985) show that the issuing firms underperform the S&P 500 index by 22% in the long term. The underperformance has been confirmed by several studies (Ritter 1991; Spiess and Affleck-Graves 1995), most notably by Loughran and Ritter (1995), who labelled the long-term performance of the newly issued stocks as a puzzle. Its existence has been questioned by various studies. Brav et al. (2000) report that the underperformance disappears when the benchmarks are matched on firm size and book-to-market ratios. Conversely, Eckbo and Norli (2000) attribute the potential underperformance to a lower risk of the IPO stocks, providing evidence that the issuing companies have lower leverage ratios and higher liquidity than the matched firms in the years following the IPO. After controlling for the additional risk of peer companies, the authors do not reject the hypothesis of zero abnormal returns of the IPO stocks. Ritter and Welch (2002), in their comprehensive review of the IPO-related literature, argue that the benchmarking of the long-term performance of IPOs is highly sensitive to the employed methodology as well as to the choice of the sample period. In addition, they note that despite the similar (unappealing) performance of issuers and their peers with comparable characteristics, the equally weighted post-IPO returns still underperform market indices.
The existence of the second IPO stylized fact, the underpricing, is rather indisputable. Ritter and Welch (2002) report that the average difference between the offer price and the first day closing price was 18.8% for the US issuers between 1980 and 2001. Furthermore, there was a positive price change for 70% of the issuing firms, while a negative initial return was exhibited by only 14% of the IPOs. The reason why the issuing firms leave money on the table remains unclear. This is further studied by Ritter and Welch (2002), who offer a wide variety of explanations based on both symmetric and asymmetric information arguments. The most promising stream of literature attempting to explain the underpricing seems to be focused on the behavioral side of investors. Ritter (1991) sheds some light on the topic by pointing out that investors tend to be periodically overoptimistic about the potential of issuing firms, and that the firms take advantage of it by timing the offerings correspondingly. Loughran and Ritter (1995) provide some support for the hypothesis by showing that the first day returns are significantly higher following periods when the market has grown. In line with the investor sentiment theory, it has been shown that the underpricing is positively associated with news and non-lead analyst research coverage of IPOs (Aggarwal et al. 2002; Demers and Lewellen 2003). Ljungqvist et al. (2006) and Derrien (2005) offer theoretical models for the IPO pricing and initial returns in the presence of investor sentiment. The former study (Ljungqvist et al. 2006) builds its model on the assumption of budget-constrained sentiment investors who cannot buy the entire IPO. Therefore, the firms must set the offer price below the level noise traders are willing to pay in order to induce rational investors to participate. The latter study (Derrien 2005), on the other hand, highlights the assumption that "aftermarket price support is costly for the underwriter" (Derrien 2005, p. 490). While the models differ in construction, their predictions are rather similar: both predict high underpricing in the presence of high investor sentiment and, consequently, poor long-term performance. Derrien aptly notes that it is not the firms who leave the money on the table but rather "the overoptimistic noise traders who pay excessive prices for IPO shares on the aftermarket" (Derrien 2005). The empirical evidence favors these models. Cook et al. (2006) reveal that underwriters promote IPOs in order to draw the sentiment investors into the market. It has also been reported that sentiment influences the initial pricing and that underwriters do not base their valuation solely on fundamentals and comparable valuation. The higher initial returns of IPOs that exhibited an above-average abnormal attention (measured by Google search volume) and the subsequent long-term return reversal of such stocks form the most notable empirical validation of the sentiment theories (Da et al. 2011).
Here, we focus on these two IPO stylized facts in the USA between 2004 and 2010. As the measure of attention, we utilize search queries provided by Google and we examine whether such attention can be used to explain and describe the IPO underpricing and long-term underperformance.

Variables construction
Studying the two stylized facts about IPOs rests on defining two types of returns: an initial return and a long-term return. We define the initial return (which we also refer to as a first day return, abbreviated as IR) as
$$IR_i = \frac{P_i^{Close} - P_i^{Offer}}{P_i^{Offer}},$$
where $P_i^{Close}$ and $P_i^{Offer}$ refer to the closing price on the first day of trading and the offering price, respectively, for the IPO $i$. The long-term cumulative logarithmic return is defined as
$$LR_i = \log P_{i,t+k} - \log P_{i,t},$$
where $P_{i,t}$ is either the closing price on the first day of trading or the closing price one month after the IPO, and $k$ is equal to either 91, 183 or 366 days, depending on the definition of the long term used. The two starting dates are considered to control for a potential immediate drop in price after the first day of trading.
For the Google search volume (usually referred to as GSV in the literature), we utilize the daily statistics provided by the Google Trends database. Google provides GSV as a normalized measure of online searches, so the value shows changes in the proportion of the given search term in the whole sample of searches rather than the dynamics of the searches themselves. Again in line with the standards in the literature, we utilize the abnormal GSV, usually labeled ASVI (Abnormal Search Volume Index), which is defined as the deviation of the logarithm of the actual GSV from the logarithm of the median GSV over a specific time period. In our application, the median is taken over the last 26 trading days (see endnote a). Therefore, GSV in the text represents the original Google search queries, and ASVI stands for the logarithmic deviation from the 26-day median value.
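
The two firm-level variables can be illustrated with a minimal sketch. It assumes a simple (non-logarithmic) initial return and an ASVI whose median is taken over the 26 observations preceding the current one; whether the current day belongs to the median window is not stated in the text, so this is an assumption. The function names and the numbers are hypothetical.

```python
import math
from statistics import median


def initial_return(p_offer, p_close):
    """Simple first-day return: (P_close - P_offer) / P_offer."""
    return (p_close - p_offer) / p_offer


def asvi(gsv, window=26):
    """Abnormal search volume for the last observation in `gsv`.

    Computes log(GSV_t) minus the log of the median GSV over the
    `window` observations preceding t (assumed window placement).
    All values must be strictly positive; zero search volumes would
    need separate handling, which is not covered here.
    """
    if len(gsv) < window + 1:
        raise ValueError("need at least window + 1 observations")
    baseline = median(gsv[-(window + 1):-1])
    return math.log(gsv[-1]) - math.log(baseline)


# Hypothetical numbers, for illustration only.
ir = initial_return(p_offer=20.0, p_close=23.0)
series = [10.0] * 26 + [20.0]  # flat interest followed by a doubling
a = asvi(series)
```

With a flat baseline followed by a doubling of search volume, the ASVI equals log 2, i.e. roughly 0.69.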

Dataset
We use the firm database of emerging growth IPOs (Kenney and Patton 2013) to identify firms going public between 2004 and 2010. The database contains a complete list of emerging growth firms going public on the US exchanges between 1990 and 2010. We limit ourselves to the period between 2004 and 2010 because the Google search data start in 2004. The complete list of variables can be found in the guide written by the database authors (see endnote b).
The database excludes the following types of firms and filings from the Thomson Financial Venture Expert, SDC data and other comprehensive lists of IPOs: mutual funds, real estate investment trusts (REITs), asset acquisition or blank check companies, foreign F-1 filers, and all spin-offs and other firms that are not true emerging growth firms (Da et al. 2011).
We use all the companies included in the Kenney-Patton database that went public between 2004 and 2010, with the exception of the unit offerings and one firm that went public on the OTC (over the counter) market. This encompasses 547 companies in total. For the identification of relevant search queries, we follow the steps of Bank et al. (2011) and Vlastakis and Markellos (2012). The complete list of search terms is available from the authors upon request. Out of the 547 companies, daily data were available for only 75 (see endnote c). Using daily rather than weekly data thus comes at a cost. However, the frequency of missing values is comparable with other studies (Da et al. 2011), considering the additional information value provided by the higher frequency of the series.
The IPO database (Kenney and Patton 2013) does not contain data on the post-IPO performance. Therefore, the financial data on the first day closing prices come from the "SCOOP Track Record from 2000 to Present" IPO database (endnote d), which has been checked against data from Yahoo! Finance, Google Finance, the NASDAQ web site database and IPO news coverage. For the long-term performance, the data availability is also poor, as some of the companies have already been acquired, merged or delisted, and therefore no longer appear in the freely available databases. For such stocks, we utilize the Quantshare Trading Software (endnote e), more specifically its Historical EOD data Downloader for Delisted/Bankrupt Stocks plug-in (endnote f). When possible, these data have again been checked against the SCOOP Track Record database, Yahoo! Finance, Google Finance, the NASDAQ web site and news coverage. The final IPO data set contains search volumes and stock prices for 75 firms, although long-term cumulative returns are available for only 62 firms. Table 1 lists and describes all variables used in the computational sections for the IPO data set.

Regression analysis
The IPO regressions are all estimated by cross-sectional ordinary least squares (OLS). We apply standard diagnostics to test the OLS assumptions. First, the presence of heteroskedasticity is tested by the Breusch-Pagan test (Breusch and Pagan 1979) and the White test (White 1980). No severe heteroskedasticity is detected in the sample. However, if any of the tests suggests the presence of mild heteroskedasticity, White's heteroskedasticity-consistent standard errors are used (White 1980). Second, multicollinearity is assessed by variance inflation factors. Last, the normality of residuals is tested by the Shapiro-Wilk test (Shapiro and Wilk 1964). When the Shapiro-Wilk test suggests that the residuals are not normally distributed, we use a bootstrapping procedure (1000 replications) to estimate the t-statistics and p-values.
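
The bootstrapping step can be sketched as follows. The text does not specify the resampling scheme, so the sketch assumes a pairs (case) bootstrap of a single-regressor OLS slope, which is only one of several possible choices; the helper names and the data are hypothetical.

```python
import random
from statistics import mean, stdev


def ols_slope(x, y):
    """Slope of a one-regressor OLS fit with an intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_slope(x, y, reps=1000, seed=0):
    """Return the OLS slope and its pairs-bootstrap standard error."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if max(xs) == min(xs):
            continue  # degenerate resample: slope is undefined
        ys = [y[i] for i in idx]
        slopes.append(ols_slope(xs, ys))
    return ols_slope(x, y), stdev(slopes)


# Hypothetical ASVI (x) and initial return (y) values.
x = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.7, 2.0]
y = [0.02, 0.10, 0.08, 0.20, 0.22, 0.35, 0.30, 0.41]
beta, se = bootstrap_slope(x, y)
t_stat = beta / se  # bootstrap-based t-statistic
```

Resampling whole observations rather than residuals keeps the sketch valid under mild heteroskedasticity, at the cost of some efficiency in a sample as small as the one used here.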

Results
We study 75 initial public offerings that took place in the USA between 2004 and 2010, based on the Kenney-Patton database (Kenney and Patton 2013). As a measure of investor attention, we utilize Google searches provided by the Google Trends database (endnote g). For more details about the dataset selection process and variable construction, please refer to the Methods/Data section. Basic descriptive statistics are provided in Table 2. The initial returns are on average positive, positively skewed and fat-tailed, strongly rejecting normality. The long-term returns show the opposite pattern, with a negative mean and a longer left tail, again strongly rejecting normality. These findings are independent of the long-term return definition. We thus observe a reversal between initial and long-term returns, at least on average. A more detailed examination is provided in the following text. The true discount is on average positive and the market reaction is very close to zero. The offering size varies strongly across the examined IPOs.

From Table 1 (variable definitions):
TD_i: True discount of the IPO, defined as in Ma and Tsai (2002): $TD = \frac{P_e - P_o}{P_o}$, where $P_o$ is the offering price and $P_e$ is the so-called equilibrium price, in our case the average price between t + 150 and t + 180, where t is the IPO date.
MR_i: Market reaction to the IPO, defined as in Ma and Tsai (2002): $MR = \frac{P_m - P_e}{P_o}$, where $P_o$ is the offering price, $P_m$ is the first day closing price and $P_e$ is the so-called equilibrium price, in this case the average price between t + 150 and t + 180, where t is the IPO date.
Sentiment: Monthly time-varying aggregate market sentiment orthogonalized with respect to a set of macroeconomic conditions, developed by Baker and Wurgler (2006).
ΔSentiment: Month-on-month difference in the time-varying aggregate market sentiment orthogonalized with respect to a set of macroeconomic conditions, developed by Baker and Wurgler (2006).
To illustrate the importance and potential usefulness of the Google searches in the IPO setting, we start with the average dynamics of the Google Search Volume (GSV) before the IPO takes place. Figure 1 shows the average GSV for the studied 75 IPOs together with the 95% confidence intervals. The dynamics up to 30 days before the IPO are presented. We can see that the investor attention starts rising around five days prior to the IPO. This strongly justifies using daily data in the IPO analysis, in contrast to the commonly used weekly frequency. We now focus on the two IPO stylized facts: the high initial returns and the long-term underperformance.

Initial returns
We analyze whether the search volume carries information or predictive power regarding the IPO first day return, which is labelled IR in the following text. The investor sentiment theory (Aggarwal et al. 2002; Demers and Lewellen 2003; Loughran and Ritter 2002) states that the initial returns tend to be higher in periods of positive sentiment. Da et al. (2011) argue that investor sentiment and attention are closely related for retail investors, as these are prone to sentiment while attention is a necessary condition for sentiment. Nonetheless, we measure the effect of both attention (firm specific) and sentiment (market level) on the first day returns.
Before proceeding to the regression analysis, we examine the relationship between the initial returns and investor attention on a basic level. We divide the firms in the sample into three quantile-based groups (high, medium and low attention) according to their ASVI values (Abnormal Search Volume Index, see the Methods/Data section for more details) prior to the IPO.
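
A minimal sketch of this grouping is given below. It assumes equally sized terciles, since the exact quantile cut-offs are not stated in the text, and it uses hypothetical records with the illustrative field names `asvi` and `ir`.

```python
from statistics import mean


def split_into_terciles(records):
    """Sort by ASVI and cut into low / medium / high groups of near-equal size."""
    ordered = sorted(records, key=lambda r: r["asvi"])
    n = len(ordered)
    cut1, cut2 = n // 3, 2 * n // 3
    return {
        "low": ordered[:cut1],
        "medium": ordered[cut1:cut2],
        "high": ordered[cut2:],
    }


# Hypothetical IPO records, for illustration only.
ipos = [
    {"asvi": 0.1, "ir": 0.05}, {"asvi": 0.9, "ir": 0.30},
    {"asvi": 0.4, "ir": 0.10}, {"asvi": 1.2, "ir": 0.25},
    {"asvi": 0.2, "ir": 0.08}, {"asvi": 0.5, "ir": 0.12},
]
groups = split_into_terciles(ipos)
mean_ir = {name: mean(r["ir"] for r in g) for name, g in groups.items()}
```

Comparing `mean_ir["high"]` with `mean_ir["low"]` corresponds to the group comparison of average initial returns; a significance test of the difference is not part of this sketch.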
The results show that the average initial return of the high attention group is 22.85%, while that of the low attention group is only 12.23%. The difference is statistically significant at the 5% level. Thus, the first look at the data suggests that investor attention very likely drives the first day returns up. To estimate in more detail how an increase in attention prior to the IPO influences the size of the initial return, we model the relationship between the initial return IR and the investor attention ASVI as
$$IR_i = \alpha + \beta\, ASVI_i + \gamma' CON_i + \varepsilon_i.$$
CON represents a set of control variables, specifically the offering size and investor sentiment (both in levels and as a change from the previous month). Table 3 provides the results. Column (1) shows that the steeper the increase in attention prior to the IPO, the higher the corresponding initial returns. The effect is highly significant and of a notable size: a one standard deviation increase in ASVI is associated with an increase in the initial return of 41.4% of its standard deviation. Columns (2) to (9), which display the results of the robustness-check regressions, suggest that neither the offering size nor the investor sentiment (both in levels and in changes from the previous month) is able to predict initial returns. The insignificance of the offering size variable contradicts the results of Da et al. (2011), who used an IPO data set with 185 firms that went public between 2004 and 2007. Thus, it seems that the effect of the offering size on the initial return largely depends on the selected sample of firms as well as on the quality and availability of the Google data, which have been increasing over time. The authors also found the change in investor sentiment modestly significant (at the 10% level), whereas it is not significant in our results.
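
The effect size expressed in standard deviations can be illustrated with a short sketch. It assumes a single regressor, in which case the standardized slope equals the raw slope multiplied by sd(x)/sd(y); with additional controls the raw slope would come from the multiple regression instead. The function name and the data are hypothetical.

```python
from statistics import mean, stdev


def standardized_slope(x, y):
    """OLS slope rescaled to 'standard deviations of y per standard deviation of x'."""
    mx, my = mean(x), mean(y)
    slope = (
        sum((a - mx) * (b - my) for a, b in zip(x, y))
        / sum((a - mx) ** 2 for a in x)
    )
    return slope * stdev(x) / stdev(y)


# Hypothetical data lying exactly on a line, so the standardized slope is 1.
effect = standardized_slope([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

A value of 0.414 from such a calculation would be read as in the text: one standard deviation more ASVI goes with 41.4% of a standard deviation more initial return.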
To test the sentiment hypothesis, we construct dummy variables for positive, normal and negative values of sentiment and interact them with ASVI in regressions (10) to (13) in Table 3. The results show that attention significantly increases initial returns only in positive sentiment periods. In negative and normal sentiment periods, attention boosts initial returns as well, although the effect is not significant. Nevertheless, the difference between the three coefficients in (13) is insignificant when tested by an F-test. In addition, regressions (11) and (12) show that the results are robust if one controls for the original sentiment measures.
Table 3 note: The IPO first day return IR_i is the dependent variable in each regression. IR_i and the independent variables are defined in Table 1. *, **, and *** represent significance at the 10%, 5%, and 1% level, respectively; standard errors are shown in parentheses. N is the number of observations.
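
The construction of the attention-sentiment interaction terms can be sketched as follows. The cut-offs separating positive, normal and negative sentiment are not reported in the text, so the thresholds `low` and `high` below are purely hypothetical placeholders, as are the function and key names.

```python
def sentiment_regime(value, low=-0.1, high=0.1):
    """Classify a sentiment reading; the thresholds are hypothetical."""
    if value > high:
        return "positive"
    if value < low:
        return "negative"
    return "normal"


def interaction_terms(asvi, sentiment, low=-0.1, high=0.1):
    """ASVI multiplied by each regime dummy; exactly one term is non-zero."""
    regime = sentiment_regime(sentiment, low, high)
    return {
        f"asvi_x_{name}": (asvi if name == regime else 0.0)
        for name in ("positive", "normal", "negative")
    }


# Hypothetical IPO issued in a positive sentiment month.
row = interaction_terms(asvi=0.8, sentiment=0.35)
```

Because the three dummies are mutually exclusive, each regression coefficient on these terms measures the effect of attention within one sentiment regime only.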

Long-term returns
We now approach the second stylized fact about IPOs: the long-term underperformance of the IPO firms compared to their already traded peers. The sentiment-based hypothesis regarding high first day returns fits well with the subsequent long-term underperformance. The investors' overoptimism about the offering may lead to overly escalated initial returns, which should be followed by a price reversion towards the fundamental value afterwards, i.e. the long-term underperformance (Ljungqvist et al. 2006; Ritter and Welch 2002). We consider five different time horizons for the long-term performance, for which the cumulative log-returns are calculated: from the first day closing price to the closing price (1) one year, (2) half a year and (3) a quarter of a year after the IPO; and from the closing price one month after the IPO to the closing price (4) one year and (5) half a year after the IPO. Such an approach is used to avoid coincidental results based on an arbitrarily selected period marked as the long term. Figure 2 provides an overview of the cumulative returns over the five specified horizons for the low and high attention IPOs. It seems that, with the exception of the shortest horizon, the high attention IPOs clearly underperform the low attention ones in the long term.
Thus, the first results are in line with the findings of Da et al. (2011) and the attention/sentiment based theory on IPOs.
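
The five horizons can be made concrete with a small sketch. It assumes that prices are indexed by calendar days since the IPO, that "one month" means 30 days, that end dates of 91, 183 and 366 days are counted from the IPO date, and that a missing day is filled with the last available close; all of these are assumptions, and the price path is hypothetical.

```python
import math

# Horizon number -> (start label, end day counted from the IPO date).
HORIZONS = {
    1: ("first_day", 366),
    2: ("first_day", 183),
    3: ("first_day", 91),
    4: ("one_month", 366),
    5: ("one_month", 183),
}


def close_on_or_before(prices, day):
    """Last available close at or before `day` (days counted from the IPO date)."""
    candidates = [d for d in prices if d <= day]
    if not candidates:
        raise KeyError(f"no price at or before day {day}")
    return prices[max(candidates)]


def long_term_returns(prices, month_offset=30):
    """Cumulative log-returns for the five horizons."""
    starts = {"first_day": 0, "one_month": month_offset}
    return {
        h: math.log(
            close_on_or_before(prices, end)
            / close_on_or_before(prices, starts[start])
        )
        for h, (start, end) in HORIZONS.items()
    }


# Hypothetical closing prices keyed by days since the IPO.
prices = {0: 25.0, 30: 24.0, 91: 26.0, 183: 20.0, 366: 18.0}
lr = long_term_returns(prices)
```

If the horizon length is instead counted from the later starting date, only the end days in `HORIZONS` for horizons (4) and (5) need to change.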
We proceed by regressing the long-term returns on the abnormal search volume on the IPO date. Table 4 compares the predictive power of ASVI over the long-term cumulative returns (LR) for the five defined periods. The results provide only weak evidence for the ability of ASVI to forecast the negative long-run returns. For the half-year horizon (measured both from the opening day (2) and from one month after the IPO (5)), ASVI correlates negatively with the LR returns. Nevertheless, we see no significant effect on the one year (1, 4) or quarter of the year (3) cumulative returns, even though all coefficients are negative in sign. Da et al. (2011) construct an interaction variable between ASVI and the initial return (ASVI × IR), as the high initial return of the IPOs that also experience increases in retail investor attention should be partly driven by price pressure and hence revert in the long term. We follow their procedure and regress the cumulative long-term returns on initial returns and the interaction variables. Table 5 shows that there is, as expected, a higher price reversion for the IPOs that experienced high initial returns (1-5), although the effect is significant only for cumulative returns measured from one month after the IPO. The performance of the interaction variable (5-10) matches the findings of Da et al. (2011): the high attention IPOs with a high first day return experience a severe price reversion in the long term. The effect is significant for all considered horizons with the exception of the quarter of the year horizon measured from the offering day. It seems, and the results from the other regressions support this claim, that a quarter of the year horizon is too short for the prices to revert to their long-term level.
We further employ the sentiment (dummy) interaction with ASVI to account for the effect of attention on the long-term returns in positive, medium and negative sentiment periods. We regress the long-term returns on ASVI in different sentiment periods. Results are provided in Table 6. Interestingly, only the IPOs that went public in high sentiment periods and received abnormal attention show the price reversion in the long term. Nevertheless, sentiment itself is also able to predict the long-term reversal, albeit for fewer horizons and with lower significance.
Table note (long-term return regressions): The long-term performance LR_i and the independent variables are defined in Table 1. The columns show over which period the cumulative return is calculated: from the first day closing price to the closing price (1) one year, (2) half a year and (3) 91 days after the IPO; and from the closing price one month after the IPO to the closing price (4) one year and (5) half a year after the IPO. *, **, and *** represent significance at the 10%, 5%, and 1% level, respectively; standard errors are shown in parentheses. N is the number of observations.

Initial returns versus underpricing
The terms "initial return" and "underpricing" are usually used interchangeably. However, Ma and Tsai (2002) argue that under the sentiment hypothesis, the interchangeability is not correct. According to their definition, the initial return has two components, the true discount (TD) and the market reaction (MR), and it is split in the following way
$$IR = \frac{P_m - P_o}{P_o} = \frac{P_e - P_o}{P_o} + \frac{P_m - P_e}{P_o} = TD + MR,$$
where $P_m$ is the first day closing price, $P_o$ is the offer price and $P_e$ is the equilibrium (fundamental) market price. In the previous section, we have shown that the price revision and reversion for the high attention IPOs happens approximately half a year after the offering. Moreover, if the return variance is calculated for 30-day periods up to one year after the IPO, the lowest variance corresponds to a horizon between 150 and 180 days after the offering. Therefore, we use the average price between t+150 and t+180, where t is the IPO date, as an estimate of $P_e$. Note that any estimate of the fundamental price is rather arbitrary, so other definitions are also feasible. According to the authors (Ma and Tsai 2002), positive values of MR suggest that investors overreact, while negative values suggest that investors under-react. The true discount, on the other hand, corresponds to the actual underpricing.
Table note (long-term return regressions): The cumulative long-term return LR_i is the dependent variable in each regression. LR_i and the independent variables are defined in Table 1. The columns show over which period the cumulative return is calculated: from the first day closing price to the closing price (1) one year, (2) half a year and (3) 91 days after the IPO; and from the closing price one month after the IPO to the closing price (4) one year and (5) half a year after the IPO. *, **, and *** represent significance at the 10%, 5%, and 1% level, respectively; standard errors are shown in parentheses. N is the number of observations.
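
The split of the initial return can be checked numerically with a short sketch. It takes TD = (P_e − P_o)/P_o as given in Table 1 and assumes MR = (P_m − P_e)/P_o, so that the two parts add up to the simple initial return; the equilibrium price is the plain average of the available closes between t+150 and t+180. The function name and the prices are hypothetical.

```python
from statistics import mean


def decompose_initial_return(p_offer, p_first_close, closes_by_day, start=150, end=180):
    """Return (TD, MR) using the average close in [start, end] as P_e."""
    window = [p for d, p in closes_by_day.items() if start <= d <= end]
    if not window:
        raise ValueError("no prices inside the equilibrium window")
    p_eq = mean(window)
    td = (p_eq - p_offer) / p_offer        # true discount
    mr = (p_first_close - p_eq) / p_offer  # market reaction (assumed form)
    return td, mr


# Hypothetical closes keyed by days since the IPO.
closes = {150: 21.0, 165: 22.0, 180: 23.0}
td, mr = decompose_initial_return(p_offer=20.0, p_first_close=25.0, closes_by_day=closes)
```

In this example the first day return of 25% splits into a true discount of 10% and a market reaction of 15%; a positive MR is what the text interprets as investor overreaction.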
Thus, we use this setting to confirm the result that ASVI, especially if combined with positive sentiment on the market, drives the investor overreaction. In contrast, we expect that ASVI should not carry any significant information about the underpricing term TD. To see whether such expectations are valid, we calculate the mean TD and MR for the high and low attention IPOs. Figure 3 displays the comparison. As expected, the true discount does not seem to be influenced by attention. Conversely, the market reaction and the attention devoted to the IPO show strong interdependence. The relationship is largely confirmed by the regression results. We regress TD and MR on attention measured by ASVI, on the ASVI interaction with the initial return, and on the attention-sentiment interaction variables. Results are presented in Table 7. On the one hand, no attention-based variable predicts the underpricing term. On the other hand, the market seems to overreact to the high attention IPOs, although the effect is significant only at the 10% level. The effect is more pronounced if we take into account the interaction with the initial return, which is logical as MR is one of the two terms the initial return consists of (the evidence against an interdependence of ASVI and TD is thus stronger, as the interaction term is insignificant for TD). Surprisingly, we see only an insignificant effect of the sentiment interaction variables on the market reaction. While the coefficient is positive for attention in positive sentiment periods, it is insignificant (albeit on the edge of 10% significance). Even more surprising is the positive coefficient for attention in negative sentiment periods, as one would expect this term to be negative. It suggests that investors overreact to IPOs also in low sentiment periods and that it is attention, not sentiment, that drives the overreaction. This is confirmed by regression (8), which shows that sentiment is not able to predict the market reaction on its own. The insignificance is indisputable in this case.
Table 7 note: The dependent variables are the true discount TD_i and the market reaction MR_i as defined by Ma and Tsai (2002). TD_i, MR_i and the independent variables are defined in Table 1. *, **, and *** represent significance at the 10%, 5%, and 1% level, respectively; standard errors are shown in parentheses. N is the number of observations.

Discussion
We confirm that initial returns are higher for the IPOs that receive above-average attention. However, we argue that the effect is significantly present only for the firms going public in positive sentiment periods. In addition, since daily data are used, we are able to demonstrate that Google search volume is capable of forecasting the initial returns within a horizon of a few days. Contrary to Da et al. (2011), we observe only weak evidence of the ability of Google data to forecast (with a negative sign) the long-term cumulative returns. Nevertheless, in line with the authors, we show that the high attention IPOs leaving a lot of money on the table experience a price reversal in the long term. In correspondence with the initial return results, the long-term cumulative returns seem to be inversely related to the IPO investor attention only for firms that issued shares during positive sentiment periods. The findings correspond to the predictions of Derrien (2005), who claims that it is the overoptimistic investors who leave the money on the table rather than the issuing firms.
Finally, we test Google search volume in the setting of the model proposed by Ma and Tsai (2002), which questions the interchangeability of the terms initial return and underpricing. The results suggest that Google search volume is able to predict one part of initial returns, the market overreaction to the offering, while the other part, the true IPO discount (i.e. the underpricing), is not predictable by Google data, which is in fact expected.

Endnotes
a The median period of 26 trading days is chosen as it is close to a trading month and such a choice delivers the best results. However, it needs to be noted that the results do not change qualitatively for median periods between 20 and 30 trading days.
b The guide is available at http://hcd.ucdavis.edu/faculty/webpages/kenney/misc/Firm_IPO_Database_Guide.pdf.
c The Google Trends system allows downloading daily series for a period of up to three months. For our dataset, we have selected a three-month period covering the IPO date for each company.