The concept of liquidity
Every day, hundreds of billions, if not trillions, of dollars change hands on stock markets worldwide. Nasdaq alone reports daily trading volume that is consistently above 100 billion dollars. With this much trading, even a tiny fraction of the total paid as transaction costs is a very large number in itself. While one would intuitively think of transaction costs as the commissions and brokerage fees paid to execute a trade, stock market liquidity refers to costs of transacting that go beyond these traditional ones.
Liquidity is a puzzling concept in finance that has long been investigated. It is difficult to grasp because there is little consensus on a single measure that captures it, most likely because liquidity is hardly a one-dimensional concept. It is often defined as encompassing the costliness and speed with which an investor can buy or sell a large quantity of an asset without severely impacting its price. Holden, Jacobsen and Subrahmanyam (2013) categorize liquidity into three dimensions: cost, quantity, and time, which is also implied by the definition above. Under this definition, a highly liquid asset is one that can be traded quickly and in large amounts at approximately the same price. In contrast, a highly illiquid asset is one that cannot be traded at all, or that can be traded only by severely impacting the price.
This research will mostly focus on the cost dimension, which can be regarded as the transaction cost to a liquidity-demanding trader. The aim is to use previously identified determinants of these transaction costs to test whether the cost of liquidity can be predicted on a daily basis. After all, it would be relevant to virtually anyone interested in trading if the transaction cost of trading tomorrow or even next week could be estimated using data available today.
The transaction cost associated with trading a security can be roughly divided into four components: brokerage fees, the opportunity costs accompanying the trade, the bid-ask spread, and the price impact of the trade. Taking the brokerage fees and opportunity costs as given, the latter two can be defined as the cost of seeking immediacy, i.e. the cost of liquidity.
In most trading environments, be it an open-outcry pit, dealer market, or electronic exchange, a trade is executed between a liquidity demander, i.e. an investor seeking immediacy in buying or selling a security, and a liquidity supplier, who readily has quotes available to buy a security at a bid price or sell it at the ask price. The difference between the bid and the ask is called the spread and is set by the liquidity supplier, also called the market maker. Since the current market price of an asset is the midpoint between the highest bid and the lowest ask quote, the difference between the actual price paid or received and the quoted midpoint is the one-way cost of transacting, also called the half-spread. Damodaran (2005) defines liquidity cost as “the cost of buyer’s remorse: it is the cost of reversing an asset trade almost instantaneously after you make the trade”. In other words, the bid-ask spread can be regarded as the round-trip transaction cost of buying and instantly selling a particular security. Lastly, the price impact of a transaction can be viewed as the difference between the midpoint price of an asset before the trade is initiated and the new midpoint after the trade has occurred. Since quantity is an important factor in liquidity through the quoted depth, trading a large quantity of an illiquid asset can result in a severe price impact.
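As a purely illustrative sketch of these definitions, the arithmetic can be written out as follows. The function names and the quotes used (a bid of 99.75 and an ask of 100.25) are hypothetical and are not taken from the data used in this research.

```python
def midpoint(bid: float, ask: float) -> float:
    """Quoted midpoint of the best bid and the best ask."""
    return (bid + ask) / 2.0


def quoted_spread(bid: float, ask: float) -> float:
    """Round-trip cost of buying at the ask and immediately selling at the bid."""
    return ask - bid


def half_spread(trade_price: float, bid: float, ask: float) -> float:
    """One-way cost: distance between the trade price and the quoted midpoint."""
    return abs(trade_price - midpoint(bid, ask))


def price_impact(mid_before: float, mid_after: float, is_buy: bool) -> float:
    """Midpoint move after a trade, positive when it is in the trade's direction."""
    sign = 1.0 if is_buy else -1.0
    return sign * (mid_after - mid_before)


# Hypothetical quotes: bid 99.75, ask 100.25, so the midpoint is 100.00.
bid, ask = 99.75, 100.25
print(quoted_spread(bid, ask))           # 0.5
print(half_spread(ask, bid, ask))        # 0.25 (a buy executed at the ask)
print(price_impact(100.0, 100.5, True))  # 0.5
```

In this example a buyer who immediately reverses the trade loses the full quoted spread of 0.50, which is twice the half-spread of 0.25.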
Research idea
First formally investigated by Demsetz (1968), the bid-ask spread has been a central factor in the liquidity literature. The investigation of its determinants is intuitively important for market makers, as they are the ones setting the spread and hence directly experience its impact. Moreover, it provides practical relevance to exchange operators as well as regulators, by possibly identifying macro-economic factors in the market that could distort liquidity. Furthermore, establishing predictive factors of liquidity cost can help traders reduce their overall transaction costs by scheduling trades. In general, a better understanding of the drivers of liquidity is useful to both academics and practitioners. Hence, the main research question of this paper is:
Can we predict aggregate stock market liquidity costs?
Answering this question will contribute to the existing literature by considering liquidity cost on an aggregate market-wide level, instead of the cross-sectional firm-level perspective that most studies adhere to. The relation between liquidity cost (the effective spread) and volume, volatility, and size has been extensively researched on a cross-sectional basis (e.g. Stoll and Whaley, 1983). However, few studies examine the predictive factors of aggregate liquidity cost. The study by Chordia, Roll, and Subrahmanyam (2000) does investigate aggregate market liquidity, but reports irregular results such as a negative relation between the spread and volatility. One shortcoming of their paper is that the sample period is a consistently bullish market, which may influence the results. I will build on this by analyzing the behavior of aggregate market liquidity during more turbulent times, i.e. the first decade of this century. Furthermore, Breen, Hodrick and Korajczyk (2002) develop a price impact measure based on cross-sectional firm characteristics and compare the out-of-sample predicted estimate to actual price impact. They find that on average their estimate overstates the actual price impact and suggest that future research investigate, among other things, the possible explanatory power of price impact over other measures of liquidity, like the bid-ask spread. Moreover, Jones (2002) investigates the time series of turnover and spreads over a hundred-year period and finds that variation in aggregate stock market liquidity is an important determinant of expected stock returns. He even presents evidence that spreads and turnover can predict stock returns one year ahead.
As has become evident, the cross-sectional determinants of liquidity costs, as well as their predictive power, have been extensively studied. A few studies have also investigated the time series of aggregate liquidity, but conclusive results explaining why aggregate liquidity varies over time have yet to come. The gap in the current literature therefore concerns three things: the time-series behavior of aggregate liquidity, the sample period (almost all studies have sample periods ending in the previous century), and the interaction between the price impact measure of liquidity and the bid-ask spread. I will try to narrow this gap by analyzing the predictability of aggregate market liquidity using cross-sectionally aggregated spreads and price impact measures during a recent sample period, and I will try to identify macro-economic factors that influence this predictability.
The predictability of aggregate market liquidity will be investigated by performing a regression analysis on various dependent variables that proxy for aggregate market liquidity such as the bid-ask spread and the price impact. Stock price and volume data will be retrieved from the CRSP Compustat database, while intraday data such as the effective and quoted spread will be retrieved from the Wharton Research Data Services database, which has already aggregated intraday Trade and Quote (TAQ) data on a daily level for individual shares, making it possible to analyze intraday data on a regular computer. Furthermore, interest rate data will be obtained from the Federal Reserve database.
The remainder of this paper is organized as follows. The following section is a literature review that examines the current state of the literature, looking at how liquidity is measured, the determinants of liquidity, and how liquidity can be predicted. The next section gives an overview of the data sample and the methodology used; it discusses the definition of variables, the formulation of testable hypotheses, and the statistical methods used to analyze the data. The results section then discusses the empirical results, followed by a conclusion.
II. Literature Review
The aim of this section is to analyze and synthesize the current state of the literature. First, I will identify the various measures that can be employed to capture the cost of liquidity and clarify why it is difficult to build a single measure that can be agreed upon. Then, I will consider the various theoretical determinants of liquidity and how they are currently employed in the literature. Finally, I will explore recent studies on the prediction of liquidity cost and how I will contribute to this literature with my current research by presenting a conceptual framework consisting of hypothesized relations between the theoretical concepts.
Measuring liquidity
As mentioned earlier, liquidity is a multi-dimensional concept that is almost too broad to be narrowed down into a one-sentence definition or a single measure. One possible reason there is so much debate on how to measure liquidity is the difficulty of gathering and analyzing the data. To determine the daily or monthly average bid-ask spread on a security, one has to analyze the transactional data for that security for every single executed trade and its accompanying bid-ask quotes. While the bid-ask spread is only one facet of the concept of liquidity, ignoring quantity and time dimension, it is empirically the most relevant way to estimate transaction costs as it estimates the round-trip cost of a trade that is immediately reversed, i.e. the cost of liquidity.
Holden et al. (2013) define the effective spread as the sum of the price impact and the realized spread. Theoretically, the price impact is the dollar or percentage amount that the price of a security moves after a specific trade, while the realized spread can be seen as the difference between the price paid for a security and its newly quoted midpoint just after the trade. Empirically, the realized spread is measured as the difference between the log of the price of a trade and the log of the midpoint of the best prevailing bid and ask an arbitrary amount of time after that trade, multiplied by the sign of the trade (negative for a sell) and by two to account for the round-trip cost. The price impact is measured as the difference between the log of the midpoint an arbitrary amount of time after the trade and the log of the midpoint at the time of the trade, again multiplied by the sign and by two.
Since Demsetz (1968) started the research on the bid-ask spread, many have followed, while at the same time technology has made an impressive leap forward. Fong, Holden, and Trzcinka (2014) state that the number of trades and quotes globally has grown at a compound rate of 32.8% per year from 1996 to 2007, and Hennessy and Patterson (2012) document that over the same period computing power, measured by CPU performance, has grown at a compound annual rate of 31.0%. Hence, it is relatively safe to say that the growth of global trading volume has at least kept pace with advances in computing power, and consequently handling this intraday transactional data has not become easier.
This difficulty in data analysis is one of the reasons that various alternative methods of measuring liquidity have emerged. The other reason is that intraday transactional data from the New York Stock Exchange TAQ database is only available from 1993 onwards. Time series analysis of the bid-ask spread over longer periods before this date is therefore simply not possible, and alternative measures constructed from available daily data are necessary. Roll (1984) first attempted this by using the serial covariance of daily stock returns to construct a measure of the bid-ask spread. He finds that the resulting estimates of the spread are strongly negatively related to firm size. Lesmond, Ogden, and Trzcinka (1999) estimate transaction costs by modeling the time series of daily returns and incorporating zero returns while doing so. With similar results to Roll (1984), their cross-sectional estimates of transaction costs range from over 10% for small firms to 1.2% for large firms.
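To illustrate how a spread can be backed out of daily data, the sketch below implements the Roll (1984) estimator in its usual textbook form, two times the square root of the negative first-order serial covariance of price changes. The function name is hypothetical, and setting the estimate to zero when the covariance is non-negative is a common convention assumed here, not something prescribed by this research.

```python
import math


def roll_spread(prices):
    """Roll (1984) spread estimate from a series of daily closing prices.

    Computes 2 * sqrt(-cov(dp_t, dp_{t-1})), where dp is the price change.
    Assumption: a non-negative serial covariance yields an estimate of 0.0.
    """
    changes = [later - earlier for earlier, later in zip(prices, prices[1:])]
    lagged, current = changes[:-1], changes[1:]
    n = len(current)
    if n < 2:
        raise ValueError("need at least four prices")
    mean_lagged = sum(lagged) / n
    mean_current = sum(current) / n
    cov = sum(
        (a - mean_lagged) * (b - mean_current) for a, b in zip(lagged, current)
    ) / (n - 1)
    if cov >= 0:
        return 0.0
    return 2.0 * math.sqrt(-cov)


# Hypothetical prices bouncing between a bid of 99.75 and an ask of 100.25.
bounce = [100.25, 99.75] * 4 + [100.25]
print(roll_spread(bounce) > 0)  # True: the bid-ask bounce creates negative covariance
```

The estimator relies on trades alternating between bid and ask in a random fashion, so on a mechanically alternating series such as the one above it will not return the true spread exactly.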
Amihud (2002) employs a measure of illiquidity that is given by the daily ratio of absolute stock return to dollar volume and tests the effect over time of expected market liquidity on expected stock returns. He hypothesizes that excess returns reflect compensation for expected stock illiquidity. This hypothesis is strongly supported by the results, and the evidence suggests the existence of an illiquidity premium in asset pricing. This premium is also investigated by Pastor and Stambaugh (2003), who find that expected returns are related cross-sectionally to variations in market-wide liquidity. Moreover, they find that expected returns of stocks that are highly sensitive to liquidity exceed those of stocks with low sensitivities by 7.5% annually (Pastor and Stambaugh, 2003).
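The Amihud (2002) ratio described above can be sketched in a few lines. The function name and the numbers are hypothetical; averaging the daily ratios over the period and skipping days without volume are assumptions made for the illustration.

```python
def amihud_illiquidity(returns, dollar_volumes):
    """Average daily ratio of absolute return to dollar volume.

    Assumption: days with zero dollar volume are skipped, since the
    ratio is undefined on those days.
    """
    ratios = [
        abs(ret) / volume
        for ret, volume in zip(returns, dollar_volumes)
        if volume > 0
    ]
    if not ratios:
        raise ValueError("no days with positive dollar volume")
    return sum(ratios) / len(ratios)


# Hypothetical two-day sample: a 1% move on $1m and a 2% move on $2m of volume.
illiq = amihud_illiquidity([0.01, -0.02], [1_000_000, 2_000_000])
print(illiq)  # roughly 1e-08, i.e. a 1% price move per $1m traded
```

A higher value means that a given dollar volume moves the price more, so the ratio is a daily-data proxy for price impact.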
Hasbrouck (2009) estimates effective cost (the Gibbs estimate) from daily closing prices and compares these estimates to high-frequency measures such as the bid-ask spread while also looking at the asset pricing implications, expecting a positive relation between returns and trading costs. He indeed finds this positive relation, though the effect is mitigated by seasonality as it is mostly concentrated in January. More importantly, Hasbrouck (2009) finds a high correlation between his Gibbs estimate and the high-frequency data retrieved from the Trade and Quote database of the NYSE.
Finally, Goyenko, Holden, and Trzcinka (2009) attempt to find the best measure amongst all these different estimates. Besides employing their own measure based on daily data, they conduct a horse race between all liquidity measures mentioned above as well as measures calculated from intraday data. They find that their own new measure outperforms all others, although the Amihud (2002) illiquidity estimate is a good measure of price impact.
Evidently, these studies have demonstrated that accurate measures of (il)liquidity can be constructed using daily stock return and volume data. However, considering I want to test the predictability of transaction costs on an aggregate level and decompose those costs into realized spread and the price impact, I will be using the WRDS intraday indicators compiled on a daily basis to construct these aggregated measures.
The determinants of the bid-ask spread
Although the literature clearly suggests various components to the effective spread, it is still not entirely clear what underlying factors cause the bid-ask spread. The two-way decomposition of effective spread into realized spread and price impact mentioned earlier has been modeled by Huang and Stoll (1997) and encompasses the three main components of the spread prevalent in the current literature: order processing costs, adverse selection costs, and an inventory cost component. Order processing costs can be viewed as the explicit costs in making a market, e.g. settlement or execution fees. The negative serial covariance model by Roll (1984) estimates the implicit costs that are assumed to be driven by order processing.
The adverse selection component in liquidity costs, also known as the asymmetric information paradigm and studied by Admati and Pfleiderer (1988), entails the agency cost of adverse selection. The basic premise here is that when liquidity demanders are informed, they will only trade if their expected returns net of transaction costs are positive. Hence, market makers, who are the liquidity suppliers, anticipate that some traders are informed and adjust their quoted spread and depth accordingly. This insider trading argument was first brought forward by Kyle (1985), who is well known for his Lambda, a price-impact coefficient that is the slope of the price function. This measure encompasses both cost and quantity and is measured by the signed square root of volume. Using this measure, Kyle (1985) examines the “informational content” of prices.
Another component of the bid-ask spread is encompassed by a phenomenon known as the inventory paradigm (Demsetz, 1968; Stoll, 1978; Ho and Stoll, 1981). This suggests that the inventory cost component of the bid-ask spread compensates for dealer financing and the risks for the market maker associated with holding inventory. Because a market maker or specialist has to hold inventory to smooth the process of trading if need be, this can impose risks during times of market volatility as well as steep interest rate changes.
The model by Huang and Stoll (1997) identifies all components of the spread. Their results indicate the existence of a large order processing cost as well as smaller but still significant adverse selection and inventory cost components. Moreover, they find that the components vary greatly according to trade size, with medium and large trades carrying a much larger proportion of the adverse selection and inventory cost components. This result also confirms the argument by Stoll (1978) that order processing costs are likely to be fixed, implying that these costs decrease relative to trade size.
Benston and Hagerman (1974) argue that the bid-ask spread is a function of the market demand curve, i.e. the amount of immediacy sought by liquidity demanders, the amount of competition in the market (measured by the number of dealers in the market), and lastly the cost borne by liquidity-supplying dealers. In their study, Benston and Hagerman (1974) take the market or investors’ demand as given, and name inventory costs, insider trading, and competition as important factors affecting the spread. Their results support these hypotheses.
Admati and Pfleiderer (1989) and Foster and Viswanathan (1993) suggest that liquidity could show seasonal patterns, possibly due to the opportunity cost of devoting time to considering a certain trade. This could influence investor sentiment on specific weekdays or around market closures on holidays.
In a more general setting, Hasbrouck and Seppi (2001) examine commonalities in market-wide liquidity. They examine common covariation in liquidity proxies and trade impact coefficients, though their results are not supportive of common factors that are economically significant. Similarly, Huberman and Halka (2001) document the appearance of a systematic time-varying component of liquidity. However, they do not find evidence to support either the inventory-risk or asymmetric information paradigms to explain the variation in systematic liquidity.
Predicting liquidity
Various studies such as Stoll and Whaley (1983), Jegadeesh and Subrahmanyam (1993), and Tinic and West (1972) study the cross-sectional determinants of liquidity. They find the percentage effective spreads to be negatively related with price level, volume, the number of market makers, and positively related with the volatility. However, these studies do not consider liquidity on an aggregate market-wide level.
Engle and Lange (1997) employ a new measure of liquidity that measures the depth of the market. They find that market depth varies positively with past volume, while negatively with the number of transactions. Linking to Kyle’s (1985) informed trader paradigm, this suggests that a higher volume may be connected with an arrival of (informed) speculators, and hence reduces market liquidity.
The study by Chordia et al. (2000) most closely resembles what I am pursuing with this research. They try to find a better understanding of the determinants of both liquidity and trading activity over time, and study this by aggregating daily spreads and depths market-wide. The dependent variables used that proxy for liquidity are the percentage quoted and effective spread, dollar depth, and a composite measure of both spread and depth. To measure trading activity, they employ the dollar volume and the number of trades. In finding explanatory variables the study builds on the aforementioned inventory and asymmetric information paradigms. They suggest that a change in the short rate (overnight Federal Funds Rate) could imply a change in margin trading and financing inventory, and hence affect the spread and depth. Both the short rate and the long-term treasury rate are also used to establish a term structure, in which a change could also affect liquidity. Moreover, a variable that proxies for default spreads is employed since an increase in default spreads could affect the risks of holding inventory. Based on the information asymmetry paradigm, they suggest that major macro-economic announcements such as GDP announcements, unemployment announcements, and CPI announcements would proxy well for information-based trading. Also, they nominate equity market performance and volatility as other causative candidates. Stock price movements could trigger changes in investor expectations as well as optimal portfolio combinations. They do note, however, that the direction of movement could trigger asymmetric changes in liquidity, as market makers could find it more difficult to adjust inventory in a falling market than in a rising market. Volatility is expected to be positively related to the spread as an increase in market volatility increases the risk of holding inventory.
The findings are generally as expected: quoted spread, depth, and trading activity respond to short-term interest rates, the term spread, and market return. However, the response to volatility is not as expected, as volatility seems to be negatively related to the spread and positively to the depth (Chordia et al., 2000). Moreover, they find an indication that bid-ask spreads respond asymmetrically to market movements: while spreads increase heavily in down markets, they only marginally decrease in up markets. I will contribute to this by testing whether this holds true for the predictive power of the determinants as well.
In another paper, Chordia, Sarkar, and Subrahmanyam (2001) study the determinants of the bid-ask spread and volume in both bond and equity markets. They use roughly the same variables as the preceding study by Chordia et al. (2000) but now also include a regression with lagged variables to investigate their predictive power. They find that equity and bond spreads have a significant influence on each other, and that lagged returns, lagged interest rates, lagged volume, and lagged spreads influence both equity and bond spreads. Interestingly, they find that bond spreads lead equity spreads, confirming the view that when an order imbalance occurs, investors turn first to the institutionally crowded bond markets. Moreover, they find that during times of crisis, defined as the Russian and Asian crises of the 1990s, both bond and stock market spreads and volume become more volatile and more correlated.
In a related study, Goyenko and Ukhov (2009) aim to establish a liquidity link between the equity market and the Treasury bond market. Similar to Chordia et al. (2001), they find a lead-lag relationship between the two markets with regard to illiquidity, as well as bi-directional Granger causality. The effect of stock illiquidity on bond illiquidity is consistent with so-called flight-to-quality scenarios where investors partially exit the equity market and move funds to safer (read: more liquid) assets, usually during times of market turbulence (Goyenko and Ukhov, 2009).
On a professional level, reliable forecasts of transaction costs can intuitively cut costs on a massive scale. An investment company called Investment Technology Group (ITG, 2009) has built a structural model to predict future transaction costs, adaptable to any scenario. They rely on “stock-specific econometric models of volatility, price impact, and price improvement, as well as a risk model” (ITG, 2009). Taylor (2001) quantifies the possible impact of scheduling trades according to a spread forecast by assessing the economic and statistical significance of forecasts using bid-ask spreads from the London Stock Exchange. He finds that when the unrestricted VAR model as proposed by Huang and Masulis (1997) is used, spreads incurred are about 35% lower than when trades are not scheduled (Taylor, 2001).
As has been documented by several researchers, volatility may be an important factor in forecasting systemic liquidity (e.g. Chordia et al., 2001; Wyss, 2004). Deuskar (2007) employs a model of liquidity and volatility linking stock return to volatility and liquidity. He finds that markets are more liquid when current stock return and investor sentiment are high. An example given is that when liquidity suppliers believe a risky asset is actually less risky, they are willing to hold more of it and hence push down its risk premium, increasing current return. The market makers now charge a lower premium to liquidity demanders, resulting in more liquidity. Conversely, when liquidity suppliers believe the asset is riskier, they will charge a higher risk premium, pushing down the current return, which leads to a more illiquid market (Deuskar, 2007).
Hypothesis Development
After having closely examined the literature, it is evident that there is a vast amount of research on liquidity. Although a lot of the literature focuses on the measurement of liquidity, many studies have attempted to find common determinants of liquidity and have documented commonalities (e.g. Chordia et al. (2000)). Following the findings of the current state of the literature I hypothesize the following:
H1: Market-wide liquidity can be reliably forecasted on a daily basis.
H2: Macro-economic and market-specific variables such as interest rates, return, and volatility can be used to forecast liquidity on a daily basis.
III. Method and Data
After conducting the literature review, it has become clear that there is an abundance of literature on liquidity measurements and their possible determinants. As the goal of this thesis is to determine whether we can forecast liquidity on an aggregate level, this section will describe the methodology and data used to test the hypotheses set out earlier. First, the methodology used to analyze the data and test the hypotheses is discussed, followed by the method of data collection, data filters, and variable composition. Finally, I will also present descriptive statistics on the main variables used.
To assess the predictability of liquidity, and of the bid-ask spread in particular, a time series ordinary least squares regression is employed. This method closely follows the work of Chordia et al. (2000) and Soderberg (2008). Specifically, these authors use macro-economic predictor variables such as market return and the short-term borrowing rate to forecast the effective spread. As persistence in the bid-ask spread has been widely documented (e.g. Chordia et al., 2000, 2001), it is relevant to compare the forecasting model to a naïve forecast model, i.e. a forecast of the spread based only on its own lagged values. Moreover, following Soderberg (2008), an out-of-sample forecast will be produced to test the reliability of the estimated coefficients.
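The comparison against the naïve model can be summarized in a single number. The sketch below uses the ratio of root mean squared forecast errors over the out-of-sample period; the choice of RMSE as the loss function and the function names are assumptions made for illustration, not the evaluation criterion prescribed by the studies cited above.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between realized and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def relative_rmse(actual, model_forecast, naive_forecast):
    """RMSE of the predictor model divided by the RMSE of the naive model.

    A value below one means the predictor model beats the naive forecast
    over the evaluation period.
    """
    naive_error = rmse(actual, naive_forecast)
    if naive_error == 0:
        raise ValueError("naive forecast is perfect; ratio undefined")
    return rmse(actual, model_forecast) / naive_error


# Hypothetical out-of-sample spreads and two competing forecasts.
actual = [1.0, 2.0, 3.0]
print(relative_rmse(actual, [1.0, 2.0, 2.0], [1.0, 1.0, 2.0]) < 1)  # True
```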
Data Sample
The sample used in this study will cover all NYSE-listed stocks in the period 1999 – 2011 inclusive, as the WRDS intraday indicators are only available up to 2013 and the final two years of available data are reserved for the out-of-sample forecast. Stock return and spread data will be retrieved from the Center for Research in Security Prices (CRSP), as will volatility, measured by the return on the CBOE VIX index. Intraday indicators (i.e. spread and price impact) compiled by the WRDS service, based on data from the NYSE Trade and Quote (TAQ) database, are used to determine aggregate liquidity. Interest rate data is provided on the Federal Reserve website, and macro-economic announcement history can be retrieved from various U.S. government websites as well.
Data screens and filters will be applied following the methodology of Chordia et al. (2000): stocks are either kept or deleted according to the following criteria:
• Stocks are included if they are present in both the CRSP and the TAQ databases in both the beginning and the end of a specific year.
• Stocks are dropped if they change from NYSE to NASDAQ or vice versa during the sample period.
• Securities other than ordinary equity such as ADRs, preferred stocks, and trust components are also dropped.
• To avoid influence from extremely high or low-priced stocks, stocks with a price above $999 or under $2 are deleted from the sample.
• Stocks with the following bid-ask spread anomalies will also be deleted from the sample: quoted spread higher than $5, effective spread more than four times larger than quoted spread.
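A minimal sketch of how these screens could be applied to a single stock observation is given below. The record layout and all field names are hypothetical; the thresholds are those listed above.

```python
from dataclasses import dataclass


@dataclass
class StockObservation:
    price: float              # share price in dollars
    quoted_spread: float      # quoted spread in dollars
    effective_spread: float   # effective spread in dollars
    ordinary_equity: bool     # False for ADRs, preferred stocks, trust components
    in_crsp_and_taq: bool     # in both databases at the start and end of the year
    switched_exchange: bool   # moved between NYSE and NASDAQ during the sample


def keep(obs: StockObservation) -> bool:
    """Return True if the observation survives all data screens."""
    if not obs.in_crsp_and_taq or obs.switched_exchange:
        return False
    if not obs.ordinary_equity:
        return False
    if obs.price > 999.0 or obs.price < 2.0:
        return False
    if obs.quoted_spread > 5.0:
        return False
    if obs.effective_spread > 4.0 * obs.quoted_spread:
        return False
    return True


# Hypothetical observation: a $50 ordinary share with a 10-cent quoted spread.
sample = StockObservation(50.0, 0.10, 0.08, True, True, False)
print(keep(sample))  # True
```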
To assess the predictability of aggregate liquidity, three different models will be employed. The models use the same predictor variables but differ in the dependent variable. As the effective spread can be decomposed into realized spread and price impact, I will follow this two-way decomposition to analyze possible differences in predictor coefficients and in predictability as a whole. Below, the three different dependent variables are outlined, followed by the explanatory variables used in this research. After describing the variables used, a description of the various regression models will be provided.
Dependent variables
For each stock the following dependent variables are defined:
%EffectiveSpread is the daily value weighted percentage effective bid-ask spread on a given stock, which is then averaged across stocks to come up with a daily market-wide effective spread.
%RealizedSpread is the daily value weighted percentage realized bid-ask spread on a given stock, also averaged across stocks to estimate an aggregate daily average.
%PriceImpact is the daily value weighted percentage price impact on a given stock, once again averaged across stocks to determine an aggregate number.
The dependent variables all follow the standard formula to calculate spreads as outlined by Holden et al. (2014):
%EffectiveSpread = 2 * Dk * (Ln(Pk) – Ln(Mk))
%RealizedSpread = 2 * Dk * (Ln(Pk) – Ln(Mk+n))
%PriceImpact = 2 * Dk * (Ln(Mk+n) – Ln(Mk))
Where the number two accounts for the round-trip cost of buying and selling, Dk is an indicator that equals +1 if the kth trade is a buy and –1 if it is a sell, Pk is the realized price of the kth trade, Mk is the quoted midpoint at the time of the kth trade, and Mk+n is the quoted midpoint an arbitrary amount of time after the kth trade. Because the realized spread and the price impact are measured against the same later midpoint Mk+n, the two add up to the effective spread. As can be observed from the formulas, the realized spread accounts for the difference between the price and the new midpoint after a particular trade, while price impact captures the difference in midpoints between the time of the trade and an arbitrary amount of time afterwards (Holden et al. 2014).
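The following sketch evaluates the three measures for a single trade, assuming that the realized spread and the price impact are both measured against the same later midpoint. The function name and the numbers are hypothetical.

```python
import math


def spread_measures(price, mid_at_trade, mid_later, direction):
    """Return (effective spread, realized spread, price impact) for one trade.

    direction is +1 for a buy and -1 for a sell. All three measures are
    round-trip log-percentage quantities, hence the factor of two.
    """
    effective = 2 * direction * (math.log(price) - math.log(mid_at_trade))
    realized = 2 * direction * (math.log(price) - math.log(mid_later))
    impact = 2 * direction * (math.log(mid_later) - math.log(mid_at_trade))
    return effective, realized, impact


# Hypothetical buy at 100.25 when the midpoint is 100.00; the midpoint
# later moves to 100.10.
effective, realized, impact = spread_measures(100.25, 100.00, 100.10, +1)

# The decomposition holds by construction: effective = realized + impact.
print(math.isclose(effective, realized + impact))  # True
```

In this example part of the effective spread paid by the buyer is offset by the subsequent rise in the midpoint, so the liquidity supplier only earns the smaller realized spread.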
These variables are computed per stock on a value weighted basis, using the Lee-Ready (1991) method to estimate the trade direction indicator. To calculate aggregate cross-sectional averages of these daily measures I follow the procedure outlined by Chordia et al. (2001) by taking the value-weighted average across stocks, leaving stocks that did not trade on a day (and hence have a spread and price impact of zero) out of the sample.
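The cross-sectional aggregation step can be sketched as follows. Using market capitalization as the weight is an assumption made for this illustration, and the function name and inputs are hypothetical.

```python
def value_weighted_average(values, market_caps, traded):
    """Value-weighted cross-sectional average of a daily liquidity measure.

    values:      per-stock measure for one day (e.g. effective spread)
    market_caps: per-stock weights (assumed here to be market capitalization)
    traded:      per-stock flags; stocks that did not trade are left out
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for value, cap, has_traded in zip(values, market_caps, traded):
        if not has_traded:
            continue
        weighted_sum += cap * value
        total_weight += cap
    if total_weight == 0:
        raise ValueError("no traded stocks on this day")
    return weighted_sum / total_weight


# Hypothetical day with three stocks, the largest of which did not trade.
spreads = [0.002, 0.004, 0.0]
caps = [300.0, 100.0, 600.0]
print(value_weighted_average(spreads, caps, [True, True, False]))  # roughly 0.0025
```

Leaving the non-traded stock out matters here: including its zero spread with a weight of 600 would pull the market-wide average down to roughly 0.001.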
Predictor variables
Following the conceptual framework and Chordia et al. (2001), the explanatory variables will be mostly based on the inventory paradigm (Stoll, 1978) and the asymmetric information paradigm (Kyle, 1985). The explanatory variables are defined as follows:
ShortRate is the daily change in the overnight Federal Funds Rate.
TermStructure is the daily change in the difference between the overnight Federal Funds Rate and the yield on a 10-year constant maturity Treasury bond.
DefaultSpread is the daily change in the difference between the yield on 10-year constant maturity Treasury bonds and the yield on investment grade corporate bonds.
Ret+ is the daily return on the CRSP market index if positive, and zero otherwise.
Ret- is the daily return on the CRSP market index if negative, and zero otherwise.
Vol is the daily percentage change in the VIX index, which accounts for changes in volatility.
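As an illustration of how the predictor definitions above translate into data, the following Python sketch builds one row of predictors per day from raw daily series. The function and argument names are hypothetical, the toy numbers are invented, and the sign convention of the two spreads (first-named series minus second-named series) is an assumption based on the wording of the definitions.

```python
def first_difference(series):
    """Daily change: value on day t minus value on day t-1."""
    return [b - a for a, b in zip(series, series[1:])]

def build_predictors(fed_funds, treasury_10y, corporate_yield, market_ret, vix):
    """Return one dict of predictors per day, starting from the second day."""
    # Assumed sign convention: first-named series minus second-named series.
    term = [f - t for f, t in zip(fed_funds, treasury_10y)]
    default = [t - c for t, c in zip(treasury_10y, corporate_yield)]
    d_short = first_difference(fed_funds)
    d_term = first_difference(term)
    d_default = first_difference(default)
    rows = []
    for i in range(len(d_short)):
        day = i + 1  # differencing drops the first day
        rows.append({
            "ShortRate": d_short[i],
            "TermStructure": d_term[i],
            "DefaultSpread": d_default[i],
            "Ret+": max(market_ret[day], 0.0),   # market return if positive, else zero
            "Ret-": min(market_ret[day], 0.0),   # market return if negative, else zero
            "Vol": vix[day] / vix[day - 1] - 1.0,  # daily return on the VIX
        })
    return rows

# Invented three-day example (rates in percent, returns as fractions).
rows = build_predictors(
    fed_funds=[2.00, 2.25, 2.25],
    treasury_10y=[4.00, 4.10, 4.00],
    corporate_yield=[6.00, 6.20, 6.00],
    market_ret=[0.0, 0.01, -0.02],
    vix=[20.0, 22.0, 19.8],
)
```

Splitting the market return into Ret+ and Ret- lets a regression assign different coefficients to up and down days, which is the purpose of the two separate variables.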
Forecast models
As previously mentioned, separate models will be employed to assess predictability for the effective spread, the realized spread, and the price impact. Each model comprises three sub-models, which are outlined below.
Model(1,1): EffectiveSpread_t = a + B(EffectiveSpread_t-n) + e_t
Model(1,2): EffectiveSpread_t = a + B(Predictors_t-n) + e_t
Model(1,3): EffectiveSpread_t = a + B(EffectiveSpread_t-n) + C(Predictors_t-n) + e_t
Where EffectiveSpread_t is the market-wide effective bid-ask spread at time t, a is a constant, B and C are the coefficients to be estimated, and e_t is the error term at time t. Predictors_t-n is the array of predictor variables at time t-n, where n depends on the number of lags used in the model.
Model(2,1): RealizedSpread_t = a + B(RealizedSpread_t-n) + e_t
Model(2,2): RealizedSpread_t = a + B(Predictors_t-n) + e_t
Model(2,3): RealizedSpread_t = a + B(RealizedSpread_t-n) + C(Predictors_t-n) + e_t
Where RealizedSpread_t is the market-wide realized bid-ask spread at time t, a is a constant, B and C are the coefficients to be estimated, and e_t is the error term at time t. Predictors_t-n is the array of predictor variables at time t-n, where n depends on the number of lags used in the model.
Model(3,1): PriceImpact_t = a + B(PriceImpact_t-n) + e_t
Model(3,2): PriceImpact_t = a + B(Predictors_t-n) + e_t
Model(3,3): PriceImpact_t = a + B(PriceImpact_t-n) + C(Predictors_t-n) + e_t
Where PriceImpact_t is the market-wide price impact at time t, a is a constant, B and C are the coefficients to be estimated, and e_t is the error term at time t. Predictors_t-n is the array of predictor variables at time t-n, where n depends on the number of lags used in the model.
For each of the aforementioned explanatory variables, lags of up to 5 days will be generated to test their predictive power for the bid-ask spread and price impact. The resulting forecasts will first be compared to a naïve forecast that uses only the lagged dependent variable as a predictor. The root mean squared errors of the forecasts will then be compared to determine which model forecasts more accurately.
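The comparison of the three sub-models can be sketched in Python as follows. This is an illustration under stated assumptions rather than the estimation code of this study: the data are simulated, a single lag is used instead of up to five, the RMSE is computed in-sample, and the helper names (ols, lagged_design, rmse) are hypothetical. The least-squares fit solves the normal equations by Gauss-Jordan elimination so that the sketch needs only the standard library.

```python
import math
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X includes a constant)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(X, beta):
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def lagged_design(target, predictors, lag, own_lag, exog):
    """Pair target_t with a row [1, target_{t-lag}, predictors_{t-lag}]."""
    X, y = [], []
    for t in range(lag, len(target)):
        row = [1.0]
        if own_lag:
            row.append(target[t - lag])
        if exog:
            row.extend(predictors[t - lag])
        X.append(row)
        y.append(target[t])
    return X, y

# Simulated data: a persistent "spread" series and two unrelated predictors.
rng = random.Random(0)
spread = [1.0]
for _ in range(299):
    spread.append(0.05 + 0.95 * spread[-1] + rng.gauss(0.0, 0.02))
predictors = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in spread]

results = {}
for name, own_lag, exog in [("naive", True, False),
                            ("predictors", False, True),
                            ("combined", True, True)]:
    X, y = lagged_design(spread, predictors, lag=1, own_lag=own_lag, exog=exog)
    results[name] = rmse(y, predict(X, ols(X, y)))
```

Because the combined model nests the other two, its in-sample RMSE can never be higher than theirs. A fair comparison of forecasting ability would therefore evaluate the RMSE on a hold-out window that was not used for estimation.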
Descriptive statistics
Before setting up the models and analyzing the forecasts, it is useful to examine the descriptive statistics of the variables. Figure 1 shows the development of the percentage bid-ask spread over time. The blue line indicates the total percentage effective spread, which is composed of the percentage realized spread (orange line) and the price impact (yellow line). As can be observed, the price impact makes up only a very small portion of the total effective spread, so it will be interesting to see how reliable the forecast of the price impact will be. Moreover, spreads clearly vary substantially over time, with remarkable peaks of around 2.5%. These peaks correspond to turbulent market periods, namely the years just after the turn of the century and the 2007-09 economic crisis.
Table 1 presents the descriptive statistics for the dependent and explanatory variables. Before the various data screens and filters are applied and the intraday indicators are averaged to the aggregate level, the dataset contains 25,904,190 observations. After applying the filters and averaging the dependent variables across stocks, 3,270 daily observations remain. Over the entire sample period, the average spread is just 0.95%, with a standard deviation of only 0.43%. It is noteworthy, however, that at times the percentage effective spread rises as high as 2.62%.
The overnight federal funds rate has a mean of 2.68%, ranging from as low as 0.04% to a high of 7.03%. The 10-year Treasury rate ranges from 1.72% to 6.79%, with an average of 4.33%. The last rate considered captures the yield on investment grade corporate bonds (Baa) relative to the 10-year Treasury rate.
Interestingly, the average daily market return on the S&P over the twelve-year period amounts to 0.01%, which is consistent with the idea in finance that daily returns are on average close to zero.
To test the relation between the explanatory and dependent variables, several hypotheses are constructed. The first hypothesis is that changes in interest rates, in the form of short rates, term structure, and default spread, are positively related to changes in the bid-ask spread. Furthermore, I hypothesize that lagged interest rate variables, market returns, and spreads have predictive power for changes in the bid-ask spread.
I will perform an OLS regression to test these hypotheses. The null hypothesis in each case is ‘no effect’, which is rejected if the corresponding coefficient is statistically significant. As mentioned before, I will do a robustness check by performing the regressions on both equal-weighted and value-weighted cross-sectional averages, though no notable difference is expected, similar to Chordia et al. (2000). I will also control for heteroscedasticity and autocorrelation in the data by using the Generalized Method of Moments with the Newey-West correction, following Chordia et al. (2001).
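To make the Newey-West correction concrete, the sketch below computes an OLS slope together with a heteroscedasticity- and autocorrelation-consistent standard error for a regression with a constant and a single regressor. It is a simplified illustration, not the estimation routine of this study: the function name is hypothetical, the Bartlett kernel weights and the choice of lag truncation are assumptions, no small-sample adjustment is applied, and the data are invented.

```python
import math

def ols_newey_west(x, y, max_lag):
    """OLS slope of y on x (with constant) and its Newey-West standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xd = [xi - x_bar for xi in x]                 # demeaned regressor
    sxx = sum(v * v for v in xd)
    slope = sum(v * (yi - y_bar) for v, yi in zip(xd, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    g = [v * e for v, e in zip(xd, resid)]        # moment contributions
    s = sum(v * v for v in g)                     # lag-0 term (White estimator)
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1)        # Bartlett kernel weight
        s += 2.0 * weight * sum(g[t] * g[t - lag] for t in range(lag, n))
    return slope, math.sqrt(s / sxx ** 2)

# Invented four-observation example.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0, 5.0]
slope, se_white = ols_newey_west(x, y, max_lag=0)   # heteroscedasticity-robust only
_, se_nw = ols_newey_west(x, y, max_lag=1)          # adds first-order autocorrelation
t_stat = slope / se_nw
```

With max_lag set to zero the estimator reduces to the heteroscedasticity-robust (White) standard error. Larger values of max_lag add down-weighted autocovariance terms, which matters for highly persistent series such as daily spreads.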
IV. Results
Table 2 reports the results of the estimated forecast of the first model, which has the percentage effective spread as its dependent variable. Three different regressions are estimated. The first column corresponds to the naïve model, which tests for persistence in the bid-ask spread. As can be observed, both the regression coefficient and the adjusted R-squared are very close to 1, implying a strong relation between the effective spread and its lagged values. The t-statistic (in parentheses) is also extremely large, implying a p-value that is effectively zero.
The second column reports the regression of the percentage effective spread on the aforementioned explanatory variables, without the lagged value of the spread itself. The results are again highly statistically significant, with all but one coefficient significant at the 0.1% level. Surprisingly, the change in volatility as measured by the return on the CBOE VIX index has a very small coefficient that is not statistically significant, in contrast to earlier studies linking volatility to liquidity (Holden et al., 2014). The return on the market index also has no large effect on the effective spread, though the effects of negative and positive returns have opposite signs, as would be expected. Most notably, the rate on 10-year Treasury bonds seems to have a large effect on the percentage effective spread, possibly reflecting the inventory risk of market makers. The sign of this coefficient is as expected.
Lastly, the third model employs both the lagged percentage effective spread and the predictors as explanatory variables in the regression. Statistical significance and explanatory power are similar to those of the previous two regressions, although the coefficients on all predictors other than the lagged effective spread are substantially smaller.
The next two models can be found in Tables 3 and 4, respectively. Table 3 displays the model in which the percentage realized spread is the dependent variable; no notable changes compared to the first model are observed. Table 4 reports the forecast with the percentage price impact as the dependent variable. Again, the explanatory power is not very different from that of either of the previous models, although the coefficients are particularly low.
V. Conclusion
This study has aimed to forecast aggregate market-wide liquidity on a daily basis. To do so, I have used known determinants of systematic liquidity such as interest rates, market return, and volatility. The reasoning behind these determinants comes from the adverse selection cost and inventory risk borne by market makers, i.e. liquidity suppliers. I have found that the lagged liquidity proxy itself is the strongest predictor of future liquidity, be it the spread or the price impact. Moreover, the forecasts in general seem to fit liquidity particularly well, as measured by the high R-squared in all the forecasts.
Furthermore, I have been able to reject the hypothesis that there is no relation between the predictor variables and the liquidity variable. However, given that the coefficients are relatively small in many cases and that the lagged dependent variable itself has strong explanatory power, it remains unclear how economically significant the predictor variables are.