In addition, only NASDAQ and NYSE equity price data are used, because the U.S.-based stock exchanges were the first to establish facilities supporting the development of algorithmic trading. Consequently, high-frequency trading gained volume share in the U.S. more rapidly than in Europe, as shown in Figure 5 (Kaya, 2016, p. 2). Given these arguments, and considering the limited computing power available, U.S. data are the more suitable choice for studying algorithmic trading.

Figure 5. Share (%) of high-frequency trading in total equity trading per year. Reprinted from “High-frequency trading: reaching the limits,” by O. Kaya, 2016, p. 2. Copyright by Deutsche Bank Research.

CRSP – Daily Stock

First, the daily prices and trading data, such as the daily number of trades and daily volume, are extracted from the CRSP U.S. Stock database within WRDS. This CRSP query serves as the master dataset within the Stata environment and contains end-of-day prices for equity securities on the NYSE and NASDAQ exchanges. CRSP also contains quote data, holding period returns, shares outstanding and trading volume information. Initially, the entire database is extracted for the period from 1999 to 2017, containing over 34 million observations. To start, only common stock observations are kept in the query, to improve post-merge compatibility with the IBES Price Target dataset. For common stock the share code variable equals either 10 or 11, hence only observations with these share codes are kept in the sample. Moreover, tickers with multiple different shares are dropped, as those cannot be matched properly to the IBES identifiers, which will be elaborated on later.
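The sample filter described above can be sketched as follows. This is a minimal illustration in Python rather than the Stata code used in this research. The field names `ticker`, `permno` and `shrcd` are assumptions, as is the choice to detect "multiple different shares" by counting distinct security identifiers per ticker.

```python
from collections import defaultdict


def keep_common_single_class(rows):
    """Filter a list of dicts with (assumed) keys 'ticker', 'permno', 'shrcd'."""
    # Keep only common stock: share code 10 or 11.
    common = [r for r in rows if r["shrcd"] in (10, 11)]

    # Collect the distinct security identifiers observed under each ticker.
    ids_per_ticker = defaultdict(set)
    for r in common:
        ids_per_ticker[r["ticker"]].add(r["permno"])

    # Drop every ticker that maps to more than one security.
    return [r for r in common if len(ids_per_ticker[r["ticker"]]) == 1]
```

The surviving tickers could then be written to a text file and used to restrict subsequent WRDS queries.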

Additionally, a .TXT file containing the remaining company ticker identifiers is derived from the dataset within Stata to simplify the extraction of subsequent queries from WRDS: only information on those predetermined companies is retrieved, which reduces the file size. Within the daily stock price query, the actual price, bid, ask and shares outstanding are adjusted using the so-called adjustment factors to make these variables comparable over the entire 1999-2017 period. The adjustment factors are constructed by CRSP and adjust for corporate actions such as stock splits, dividends and rights offerings. Furthermore, two variables are created. The effective spread variable is constructed similarly to Hendershott et al. (2011), as the difference between the midpoint of the closing bid and ask and the actual transaction price of that day. The volatility variable is calculated as the difference between the daily high and the daily low.
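As an illustration of these steps, the sketch below adjusts one daily observation and derives the two variables. It is written in Python, not the Stata code used in this research, and rests on several assumptions: the field names (`prc`, `bid`, `ask`, `askhi`, `bidlo`, `shrout`, `cfacpr`, `cfacshr`) are assumed names for the CRSP items, prices are divided by the price factor and shares multiplied by the share factor, the spread is taken as an absolute difference, and the volatility is left as an unscaled range.

```python
def adjust_and_derive(row):
    """Return adjusted values and derived variables for one daily observation."""
    price_factor = row["cfacpr"]   # assumed: cumulative price adjustment factor
    share_factor = row["cfacshr"]  # assumed: cumulative share adjustment factor

    # Assumed convention: a negative price flags a bid/ask average, so take abs().
    price = abs(row["prc"]) / price_factor
    bid = row["bid"] / price_factor
    ask = row["ask"] / price_factor
    shares = row["shrout"] * share_factor

    # Effective spread: distance between the transaction price and the midpoint.
    midpoint = (bid + ask) / 2
    effective_spread = abs(price - midpoint)

    # Volatility: adjusted daily high minus adjusted daily low.
    volatility = (row["askhi"] - row["bidlo"]) / price_factor

    return {
        "price": price,
        "bid": bid,
        "ask": ask,
        "shares": shares,
        "effective_spread": effective_spread,
        "volatility": volatility,
    }
```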

IBES – Price Target

IBES, the Institutional Brokers’ Estimate System, is a Thomson Reuters database that holds historical analyst estimates for more than twenty forecast measures, such as earnings per share, revenue, price targets, buy-hold-sell recommendations and gross profits, covering over 60,000 companies. After extracting the price target estimates, including their horizon and analyst name, from WRDS for the same 1995-2017 period as used before, it was found that the IBES data could not be merged directly with the CRSP data. IBES contains two ticker variables, and only the official ticker variable (“oftic”) is compatible with the ticker variable in CRSP; it should not be confused with the variable “ticker” in the IBES dataset. Hence, “oftic” is renamed to its CRSP name: ticker.

Additionally, the so-called “announcement date” in IBES is used as the leading date. Finally, each price target estimate is matched with the actual price on the date it refers to by shifting the forecast forward by its horizon, so that an estimate with a horizon of 6 months is matched with the price 6 months after its announcement date.
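The horizon matching can be sketched as a date shift. The helper names below are hypothetical and the code is a Python illustration rather than the Stata implementation. Clamping to the last day of a shorter month and ignoring non-trading days are simplifying assumptions.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by a number of calendar months, clamping the day if needed."""
    index = d.year * 12 + (d.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def target_date(announcement_date, horizon_months):
    """Date on which a forecast is compared with the realised adjusted price."""
    return add_months(announcement_date, horizon_months)
```

For example, an estimate announced on 15 March 2010 with a 12-month horizon would be compared with the adjusted price on 15 March 2011.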

Federal Reserve Bank – Interest Rates

The WRDS RATES database used in this research is based on the Federal Reserve Board’s H.15 release, which contains selected interest rates for U.S. Treasuries and for private money market and capital market instruments. Daily rates are reported per business day and in annual terms. To include interest rates as a control variable in the regressions, the rates of U.S. Treasury bills with a maturity of 3 months are extracted from the WRDS RATES database for the period 1995 to 2017. The rates are merged with the master dataset using date as the common variable.

Data Analysis Methodology

To shed light on the automation process that entails the shift from human traders to automated trading systems, analyst predictions and their accuracy are examined in relation to algorithmic trading. First, however, the focus is on how algorithmic trading is measured and how dispersion has changed with algorithmic trading. All independent variables used in the regressions are standardized to facilitate economic interpretation. Standardization is performed by subtracting the mean of the corresponding time series from each variable and dividing this deviation by the standard deviation of the time series.

With all independent variables standardized in this fashion, each regression coefficient represents the change in the dependent variable associated with a one-standard-deviation change in the corresponding independent variable. Hence, independent variable X is standardized such that:

(1) $X'_{tj} = \dfrac{X_{tj} - \mu(X)}{\sigma(X)}$
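A minimal Python sketch of this standardization is given below, for illustration only. The use of the sample standard deviation (n - 1 in the denominator) is an assumption; the text does not state which variant is applied.

```python
from statistics import mean, stdev


def standardize(values):
    """Return (x - mean) / standard deviation for every x in the series."""
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample standard deviation
    return [(v - mu) / sigma for v in values]
```

The standardized series has mean zero and unit standard deviation, so a coefficient on it reads as the effect of a one-standard-deviation change.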

Algorithmic Trading Measure

As a preparatory step, a proxy is developed to measure the development of algorithmic trading over time within the available CRSP data. To quantify algorithmic trading, Hendershott, Jones, and Menkveld (2011), hereafter HJM, and Boehmer, Fong & Wu (2015) use the daily number of electronic messages from the TAQ database per $100 of trading volume as a proxy. This is the most established measure in academic research. However, the TAQ database is not at this research’s disposal, and hence an inferior but comparable proxy is created. It is inferior because information on electronic message traffic is not available in CRSP. As volume data is available, the best alternative is a proxy that replaces the number of electronic messages with a comparable variable. Our data shows that volume did not increase over time, whereas the number of trades did, in a way comparable to the electronic messages used in HJM’s proxy. This makes the number of trades a simplified but functioning replacement within our proxy for algorithmic trading. Moreover, algorithmic trading is associated with improved liquidity and an increased number of trades with smaller volume per trade (Hendershott et al., 2011).

Hence the new proxy for algorithmic trading is calculated as the daily number of trades executed for ticker j per dollar trading volume of that day derived from the CRSP database.

(2) $\text{Algorithmic Trading}_{tj} = \dfrac{\text{number of trades}_{tj}}{\text{volume}_{tj}}$
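Equation (2) can be illustrated with the short Python sketch below. The function name is hypothetical, and treating days with missing or zero volume as missing values is an assumption rather than a documented step of this research.

```python
def algo_trading_proxy(num_trades, volume):
    """Number of trades per unit of trading volume for one ticker-day."""
    # Assumption: the proxy is undefined when an input is missing or volume is zero.
    if num_trades is None or volume is None or volume <= 0:
        return None
    return num_trades / volume
```

For a given volume, a day with many small trades yields a higher value than a day with a few large trades, which is the pattern the proxy is meant to capture.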

Although it is a much noisier proxy, it gives a representation of the development of algorithmic trading over time that is very similar to the one established by Glantz & Kissel (2013), as can be seen in Figures 1 and 2.

Effects of Algorithmic Trading on Dispersion

It is assumed that algorithms resemble one another more than human traders do, and for this reason dispersion is expected to decrease with more algorithmic trading. As flash crashes are known to occur with algorithmic trading (Johnson et al., 2012), extreme short-term dispersion might have increased instead. However, since this study can only use daily data, flash crashes are not expected to influence the results. Hence, the hypotheses are formulated as:

H0: Dispersion does not change with increased algorithmic trading

H1: Dispersion changes with increased algorithmic trading

Idiosyncratic, or stock-specific, volatility is used to measure dispersion. Idiosyncratic risk can be calculated in numerous ways; the various measures, however, all give comparable results (Malkiel & Xu, 2003). Moreover, according to Bello (2008), there are no significant differences between the outcomes of the Capital Asset Pricing Model (CAPM), the Fama-French Three-Factor Model and the Carhart Model. Hence, in this study the CAPM is used to calculate idiosyncratic volatility, as this suits the dataset best. The CAPM formula used is as follows:

(3) $R_{tj} - Rf_t = \alpha_j + \beta_j (Rm_t - Rf_t) + \varepsilon_{tj}$

Where $R_{tj}$ is the return of stock j, $Rf_t$ is the risk-free rate, $Rm_t$ is the return of the market portfolio and $\varepsilon_{tj}$ is the error term of returns (i.e. idiosyncratic or company-specific risk). First, two new variables are created to simplify the estimation of alpha and beta within Stata, namely $ERS_{tj} = R_{tj} - Rf_t$ and $ERM_t = Rm_t - Rf_t$. These are then used in a simple OLS regression to estimate alpha and beta per ticker over the entire period. Almost 9,500 regressions similar to (4) below are performed using a loop in Stata, after which the results are saved in the variables α and β.

(4) $Y_{ERS} = \alpha_j + \beta_j \cdot ERM_t$

Once alpha and beta are estimated, $\varepsilon_{tj}$ is calculated as:

(5) $\varepsilon_{tj} = ERS_{tj} - \alpha_j - \beta_j \, ERM_t$

It follows that idiosyncratic volatility, and thus dispersion, is the monthly standard deviation of the error term, as displayed below:

(6) $\text{Idiosyncratic Volatility}_{t(m)j} = \sigma_{t(m)}(\varepsilon_{tj})$
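The procedure in equations (3) to (6) can be sketched for a single ticker as follows. This is an illustrative Python version, not the Stata loop used in this research. The function names and the input layout are assumptions, and months with a single observation are skipped because a standard deviation cannot be computed for them.

```python
from collections import defaultdict
from statistics import mean, stdev


def ols_alpha_beta(ers, erm):
    """Closed-form simple OLS of excess stock return on excess market return."""
    mx, my = mean(erm), mean(ers)
    sxx = sum((x - mx) ** 2 for x in erm)
    sxy = sum((x - mx) * (y - my) for x, y in zip(erm, ers))
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta


def monthly_idio_vol(obs):
    """obs: list of (date, ERS, ERM) tuples for ONE ticker over the full period."""
    alpha, beta = ols_alpha_beta([o[1] for o in obs], [o[2] for o in obs])

    # Residuals as in equation (5), grouped by calendar month.
    residuals_by_month = defaultdict(list)
    for d, ers, erm in obs:
        residuals_by_month[(d.year, d.month)].append(ers - alpha - beta * erm)

    # Monthly standard deviation of the residuals, as in equation (6).
    return {m: stdev(e) for m, e in residuals_by_month.items() if len(e) > 1}
```

Running `monthly_idio_vol` once per ticker mirrors the per-ticker loop described above.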

Finally, idiosyncratic volatility, hereafter referred to as dispersion, is regressed on the algorithmic trading measure, in line with the hypotheses, to analyze whether return dispersion has changed with an increase in algorithmic trading. The model is also estimated while controlling for firm fixed effects and year fixed effects, as Figure 3 shows that dispersion varies considerably across years, in particular in years of financial crisis.

Fixed effects are used instead of random effects because the Hausman test for random versus fixed effects is significant at the 0.1% level (99.9% confidence) for regression (7). This means that the unique errors $\varepsilon_{tj}$ are correlated with the regressors, and hence fixed effects panel data regressions are used to analyze dispersion. In regressions (8) and (9), firm fixed effects and year fixed effects are added respectively, to see whether and how firm- and year-specific effects influence the model. Comparing the results of regressions (7) and (8) shows the influence of firm-specific effects, whereas comparing (8) and (9) shows the influence of year fixed effects.

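(7) ${Y_{\text{Idiosyncratic Volatility}}}_{tj} = \beta_0 + \beta_1 \cdot \text{Algorithmic Trading}_{tj} + \varepsilon_{tj}$

(8) ${Y_{\text{Idiosyncratic Volatility}}}_{tj} = \beta_0 + \beta_1 \cdot \text{Algorithmic Trading}_{tj} + a_j + \varepsilon_{tj}$

(9) ${Y_{\text{Idiosyncratic Volatility}}}_{tj} = \beta_0 + \beta_1 \cdot \text{Algorithmic Trading}_{tj} + a_j + \gamma_t + \varepsilon_{tj}$

*With $a_j$ as firm fixed effects and $\gamma_t$ as year fixed effects.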
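To make the role of the firm fixed effect $a_j$ concrete, the Python sketch below estimates the slope of a one-regressor model after demeaning both variables within each firm (the "within" transformation). It is an illustration under simplifying assumptions, not the Stata estimation used in this research: it covers firm fixed effects only, omits year effects and standard errors, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def within_slope(panel):
    """panel: list of (firm, x, y) tuples. Returns the firm fixed effects slope."""
    groups = defaultdict(list)
    for firm, x, y in panel:
        groups[firm].append((x, y))

    sxx = 0.0
    sxy = 0.0
    for points in groups.values():
        # Demeaning within a firm removes the firm-specific intercept a_j.
        mx = mean([p[0] for p in points])
        my = mean([p[1] for p in points])
        sxx += sum((x - mx) ** 2 for x, _ in points)
        sxy += sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx
```

When firms differ in their average level of dispersion, this slope can differ markedly from the pooled estimate, which is the kind of difference that comparing regressions (7) and (8) brings out.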

Effects of Algorithmic Trading on Analyst Forecast Accuracy

To analyze the prediction accuracy of the remaining human analysts in the market, historical Thomson Reuters analyst estimates obtained from the IBES dataset are used to compute the prediction error of each forecast. The difference between the estimation value at time t and the adjusted price on date t, divided by the adjusted price on that date, gives the prediction error of an estimate by analyst i for stock j. The prediction error is then squared to emphasize the analysts that were furthest off in their forecasts, whether below or above. As the squared prediction error only returns positive values, it focuses on the size of the deviation; the direction of the deviation is not of concern.

(10) $\text{Prediction Error}_{t,i,j} = \left( \dfrac{\text{estimation value}_{t,i,j} - \text{adjusted price}_{t,j}}{\text{adjusted price}_{t,j}} \right)^2$
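Equation (10) translates into a short Python sketch, given here for illustration only. The function name is hypothetical, and returning a missing value for a zero or missing price is an assumption.

```python
def squared_prediction_error(estimate, adjusted_price):
    """Squared relative deviation of a price target from the realised price."""
    # Assumption: the error is undefined when no valid price is available.
    if estimate is None or adjusted_price is None or adjusted_price == 0:
        return None
    return ((estimate - adjusted_price) / adjusted_price) ** 2
```

A target of 110 and a target of 90 against a realised price of 100 both yield 0.01, which shows that the measure ignores the direction of the miss, while a target of 120 yields 0.04, four times the error for twice the deviation.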

Subsequently, the analyst prediction error variable is tested using regression analysis in Stata to see whether analysts’ predictions have become statistically more accurate since the development of automation within stock markets. The dataset can be described as an unbalanced three-dimensional panel in which stock ticker, date and analyst name represent the dimensions: for every ticker there are different numbers of analyst estimates on varying dates. The “missing” data arises because analysts specialize in specific stocks and because the dates on which estimates are issued are irregular; there is, however, no actual missing data.

The ticker and analyst variables are combined into a new variable called tic_alys, in which each group represents the forecasts by analyst i for ticker j. This procedure removes the need to drop the third dimension in order to run a multi-dimensional fixed effects panel data regression within Stata. These dimensions are only combined for regressions (14) and (16), where firm and analyst fixed effects are included jointly. To answer the research question, the following hypotheses are developed:
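The construction of the combined group variable can be sketched as follows. This is a Python illustration rather than the Stata code used in this research, and the field names `ticker` and `analyst` are assumptions.

```python
def add_group_ids(rows):
    """Add a 'tic_alys' integer id that is unique per (ticker, analyst) pair."""
    ids = {}
    for r in rows:
        key = (r["ticker"], r["analyst"])
        # A pair seen for the first time receives the next free id.
        r["tic_alys"] = ids.setdefault(key, len(ids) + 1)
    return rows
```

Every forecast by the same analyst for the same ticker receives the same id, so the pair can serve as a single panel dimension.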

H0: Analysts’ prediction error is not influenced by increased algorithmic trading

H1: Analysts’ prediction error is influenced by increased algorithmic trading

These hypotheses lead to the regressions below. Analyst prediction error is expected to have increased over the period in which automation took place: it seems unlikely that analysts can predict the direction of future stock prices, as they would have to be able to execute transactions faster than the algorithms.

However, it is hard to form a definite hypothesis, as algorithmic trading probably also leads to less dispersion, which could facilitate analyst predictions. For this reason, the hypothesis is two-sided. Time t is in date format, at daily frequency. Testing analyst prediction error against algorithmic trading is the most direct way of examining the effects of algorithmic trading on analyst forecast accuracy. As many other factors potentially affect forecast accuracy, control variables are added and fixed or random effects are controlled for. To determine whether the regressions need to be controlled for fixed or random effects, the Hausman test is used again. The test for random versus fixed effects again gives a significant outcome at the 99.99% confidence level; hence the null hypothesis of the Hausman test is rejected, meaning that fixed effects need to be applied in the panel data regressions.
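For reference, the Hausman statistic in its standard textbook form is given below. The notation is an assumption introduced here and does not appear elsewhere in this text: $\hat{b}_{FE}$ and $\hat{b}_{RE}$ denote the fixed and random effects coefficient vectors, $\hat{V}_{FE}$ and $\hat{V}_{RE}$ their estimated covariance matrices, and k the number of coefficients compared.

```latex
% Hausman test statistic (standard form; notation assumed for illustration).
% Under the null hypothesis that the random effects estimator is consistent,
% H is asymptotically chi-squared distributed with k degrees of freedom.
\[
H = (\hat{b}_{FE} - \hat{b}_{RE})'
    \left[ \hat{V}_{FE} - \hat{V}_{RE} \right]^{-1}
    (\hat{b}_{FE} - \hat{b}_{RE})
    \;\sim\; \chi^2_k
\]
```

A large value of H, and thus a small p-value, indicates that the two sets of estimates differ systematically, which favors the fixed effects specification.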

Accordingly, six different panel data regressions are estimated in Stata to determine how prediction error is influenced. The first model is a plain panel regression that merely tests the effect of algorithmic trading on analyst prediction error, whereas the remaining five are fixed effects panel data regressions that each control for a certain fixed effect. Regression (11) is the plain panel data regression. Firm fixed effects are added in (12) to see how firm-specific effects change the regression output compared to the plain model. Thirdly, year fixed effects are added using year dummies to control for a time trend; comparing regression (13) with (12) should give insight into the effect that time exerts on the dependent variable. Next, analyst fixed effects are controlled for in regression (14); adding only this factor to the model should make clear whether and how the model is influenced by analyst-specific properties. Comparing the outcomes of the four regressions should show whether, how and which fixed effects affect prediction error. The first four regressions amount to: