Clustering stocks using partial correlation coefficients
Introduction
For decades, the financial market has received an enormous amount of attention from academia. Yet its complex nature remains elusive, and recent financial crises underline the need to understand it better. Statistical physics is one of the popular tools for analyzing financial data, and many important discoveries have been made with it. For example, although a financial asset's returns show no serial correlation, the absolute values of returns are positively correlated over long lags, a phenomenon known as the long memory property [1]. A financial return series also has heavier tails than the Gaussian distribution, and the absolute value of the return series tends to follow a power law [2], [3]. Return series behave nonlinearly as well, as observed in phenomena such as volatility clustering and regime switching [4], [5], [6], [7]. Another interesting finding is that when multiple assets are considered together, their return series are often correlated, and the correlation tends to increase during a financial crisis [8], [9].
Correlation therefore plays a critical role in the analysis of financial time series. For example, it has long been known that the correlation in equity returns is time varying [10]. The co-movement of international markets has also been widely studied, with conditional correlation and dynamic correlation among the most popular tools for the subject [11], [12], [13]. Network theory can be applied to a correlation matrix as well: it was previously demonstrated that a correlation matrix can be associated with a hierarchical tree, and when this was tested on a small set of stocks, stocks in the same sectors were clustered together [14]. In the construction of a portfolio of assets, correlation is one of the key variables an investor needs to consider [15], [16].
This study uses correlation to examine how firms are related to one another over different periods of time, including the recent financial crises. Pearson correlation is widely accepted as a standard measure of co-movement between two financial return series, but it has a few weaknesses. First, as mentioned above, market dynamics are nonlinear in nature and so are the correlations between assets, whereas Pearson correlation measures only linear co-movement between two random variables. Second, when analyzing firms that operate in a similar environment, one should consider the common factors affecting them, such as the growth rate of the market or foreign exchange rates. Those common factors may influence all of the stocks at once and bias the analysis, skewing the results to favor one side over the other and producing a 'false correlation'. It is impossible to remove all external factors from a financial time series completely, since some factors, such as corporate governance, are unobservable or difficult to quantify. Nevertheless, removing some of the obvious and prevalent factors may provide insights that have not yet been discovered.
Recently, Kenett et al. applied the concept of partial (residual) correlation analysis to the study of financial markets [17]. Using partial correlation analysis, they demonstrated the index cohesive effect and identified a dominating sector within a market [18], [19]. A network approach was also used to construct node–node correlation matrices, and it was shown that removing a node with insignificant influence does not disrupt the network [20]. Another study proposed a measure called the Sector Dominance Ratio, based on Pearson and partial correlations, to study market structure, and showed empirically that the financial sector exhibits strong dominance in the US and UK markets [21]. All of these studies rely on partial correlation analysis, whose strength lies in its ability to remove a common factor from the correlation between two variables. A partial correlation measures how a random variable $X$ correlates with another random variable $Y$ when a common factor $Z$ is removed from both of them. In this sense, it is the correlation between the residuals of $X$ and $Y$ after each has been regressed on $Z$.
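For reference, the residual definition can be written out as follows. The notation here is chosen for illustration (not taken from the paper): $X$ and $Y$ are the two return series, $Z$ is the common factor, $a$ and $b$ are ordinary least squares coefficients, and $e_X$, $e_Y$ are the residuals. With a single controlling variable, the correlation of the residuals reduces to the standard closed form in terms of the three pairwise Pearson correlations:

$$
X = a_X + b_X Z + e_X, \qquad Y = a_Y + b_Y Z + e_Y
$$

$$
\rho_{XY\cdot Z} = \operatorname{corr}(e_X, e_Y) = \frac{\rho_{XY} - \rho_{XZ}\,\rho_{YZ}}{\sqrt{\left(1 - \rho_{XZ}^{2}\right)\left(1 - \rho_{YZ}^{2}\right)}}
$$

The closed form assumes both regressions include an intercept and are fitted by least squares; with several common factors, the residuals would come from a multiple regression instead.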
One may notice that if the common factor is chosen to be the market return, the residual returns used to compute the correlation resemble the residual returns of the Capital Asset Pricing Model (CAPM) [22], [23], [24]. CAPM was developed to describe how a risky asset should be priced; if the model were correct, the correlation between the residuals, that is, the partial correlation coefficient of two risky assets, should be statistically zero. Section 3 discusses this point and shows that the coefficients are in fact non-zero. This should not come as a surprise, as CAPM is known for its relatively weak explanatory power [25].
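The argument above can be illustrated with simulated data. The following Python sketch (the paper itself used MATLAB) uses a one-factor market model as a simplified stand-in for CAPM. Stocks A and B load only on a hypothetical market return, so their partial correlation given the market should be close to zero; stocks C and D also share a hypothetical sector shock that the market does not explain, so a non-zero partial correlation survives. All coefficients, the seed and the sector factor are assumptions for illustration, not the paper's data.

```python
import numpy as np

def residuals(y, z):
    """OLS residuals of y after regressing it on an intercept and the factor z."""
    A = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ coef

def partial_corr(x, y, z):
    """Pearson correlation of the residuals of x and y with z removed."""
    return np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]

rng = np.random.default_rng(0)           # hypothetical seed
T = 120                                  # ten years of monthly returns
market = rng.normal(0.01, 0.05, T)       # hypothetical market-index returns
sector = rng.normal(0.0, 0.03, T)        # hypothetical sector shock, not in the market

# A and B follow a pure one-factor model: market beta plus their own noise.
a = 1.0 * market + rng.normal(0.0, 0.02, T)
b = 1.2 * market + rng.normal(0.0, 0.02, T)
# C and D additionally share the sector shock, which the market does not explain.
c = 0.9 * market + sector + rng.normal(0.0, 0.02, T)
d = 1.1 * market + sector + rng.normal(0.0, 0.02, T)

pearson_ab, partial_ab = np.corrcoef(a, b)[0, 1], partial_corr(a, b, market)
pearson_cd, partial_cd = np.corrcoef(c, d)[0, 1], partial_corr(c, d, market)
print(f"A-B: Pearson {pearson_ab:.2f}, partial {partial_ab:.2f}")
print(f"C-D: Pearson {pearson_cd:.2f}, partial {partial_cd:.2f}")
```

Both pairs should show a high Pearson correlation, but only the A-B partial correlation should be near zero; the exact printed values depend on the seed.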
This study is another application of partial correlation analysis. A market index such as the Korea Composite Stock Price Index (KOSPI) reflects the overall size of the Korean economy. Growth of the index means that the economy is growing as well, which will positively affect every firm in the market. By removing the effect of the market index, it may be possible to shed new light on the individual performance of stocks and their correlation with one another. Specifically, this study addresses the following two questions:
- i. How much difference exists between the partial correlation and Pearson correlation? Does the difference persist across different periods of time?
- ii. In terms of the partial correlation, how are firms related to each other?
Both the partial correlation and Pearson correlation are computed over consecutive 30-month windows to minimize the effect of changes in the correlation coefficients that arise from the nonlinearity of the market. The 30-month length is also chosen so that the computed correlation coefficients are statistically significant. The resulting matrices are then visualized to address the first question, and a simple agglomerative clustering approach is used to explore the second.
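A minimal Python sketch of the windowed computation, assuming non-overlapping 30-month windows and the market return as the single common factor, removed by an OLS regression with an intercept. The function names are hypothetical; `R` is a months × firms return matrix and `m` is the market return series. If the December 2004 to December 2014 prices yield 120 monthly returns, this scheme gives four windows.

```python
import numpy as np

WINDOW = 30  # months per estimation window

def correlation_matrices(R, m):
    """Pearson and partial correlation matrices for returns R (T x N),
    with the market return m (length T) removed as the common factor."""
    pearson = np.corrcoef(R, rowvar=False)
    A = np.column_stack([np.ones(len(m)), m])      # intercept + market return
    coef, *_ = np.linalg.lstsq(A, R, rcond=None)   # one OLS fit per stock (column)
    resid = R - A @ coef
    partial = np.corrcoef(resid, rowvar=False)
    return pearson, partial

def windowed_matrices(R, m, window=WINDOW):
    """Compute both matrices on consecutive, non-overlapping windows;
    a trailing remainder shorter than `window` is dropped."""
    results = []
    for start in range(0, len(m) - window + 1, window):
        sl = slice(start, start + window)
        results.append(correlation_matrices(R[sl], m[sl]))
    return results

# Hypothetical data: 120 months of returns for 5 stocks driven by a market factor.
rng = np.random.default_rng(1)
m = rng.normal(0.01, 0.05, 120)
R = np.outer(m, rng.uniform(0.5, 1.5, 5)) + rng.normal(0.0, 0.03, (120, 5))
mats = windowed_matrices(R, m)
print(len(mats))  # 4 windows of 30 months
```

Fitting all stocks in one `lstsq` call works because each column of `R` is an independent right-hand side for the same design matrix.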
The remaining sections of this paper are organized as follows. Section 2 describes the data set used for the analysis and the method used to compute the correlations. Section 3 reports the results of the partial and Pearson correlation analyses. Section 4 presents the clustering analyses. Section 5 concludes.
Data and methods
The monthly adjusted closing price series of KOSPI and of the stocks listed in the index from December 2004 to December 2014 are used in this study. The data are provided by DataGuide from FnGuide (http://www.fnguide.com/), a professional financial analytics service for the Korean market. All computation and analysis are done in MATLAB. A total of 732 firms existed during this period, and trade volume data are used to filter out firms that did not trade during a given period of analysis.
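A Python sketch of this preprocessing, under stated assumptions: returns are taken as log differences of the month-end adjusted closes (the text does not say whether simple or log returns are used), and a firm is kept in a window only if its trade volume is positive in every month of that window. The filtering rule and the function names are illustrative, not the paper's code.

```python
import numpy as np

def monthly_log_returns(prices):
    """Convert a (T+1) x N array of month-end adjusted closes into T x N log returns."""
    return np.diff(np.log(prices), axis=0)

def trading_mask(volume):
    """Return a boolean mask over the N firms of a T x N volume array.

    A firm is kept only if it traded (volume > 0) in every month; missing
    values (NaN, e.g. before listing or after delisting) count as no trade.
    """
    v = np.nan_to_num(volume, nan=0.0)
    return np.all(v > 0, axis=0)

# Hypothetical example: two firms, three month-end prices, two months of volume.
prices = np.array([[100.0, 50.0],
                   [110.0, 50.0],
                   [121.0, 55.0]])
volume = np.array([[1200.0, 0.0],
                   [900.0, 350.0]])
returns = monthly_log_returns(prices)
keep = trading_mask(volume)
filtered = returns[:, keep]   # only firms that traded in every month remain
```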
Pearson correlation vs. partial correlation
Fig. 1 shows color maps of the Pearson correlation matrix and the partial correlation matrix for the entire period from 2005 to 2014. The firms are sorted by market capitalization.
The values are color coded on a gradient: blue to white represents negative values ($-1 \le \rho < 0$), white represents zero ($\rho = 0$), and white to red represents positive values ($0 < \rho \le 1$). It is visually clear that the Pearson correlation matrix contains much more red, which implies that it consists mostly of positive values. However, the partial
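The comparison behind Fig. 1 can be sketched in Python as two small helpers: one reorders a correlation matrix by descending market capitalization, and one reports the fraction of off-diagonal coefficients that are positive, a single number summarizing "much more red". Both helpers and the example values are hypothetical. For plotting, matplotlib's `imshow(S, cmap="bwr", vmin=-1, vmax=1)` would give a blue-white-red scale centred on zero like the one described.

```python
import numpy as np

def sort_by_market_cap(C, market_cap):
    """Reorder rows and columns of C so the largest firm comes first."""
    order = np.argsort(np.asarray(market_cap))[::-1]
    return C[np.ix_(order, order)]

def positive_share(C):
    """Fraction of off-diagonal coefficients that are strictly positive."""
    off_diagonal = ~np.eye(C.shape[0], dtype=bool)
    return float(np.mean(C[off_diagonal] > 0))

# Hypothetical 3-firm correlation matrix and market capitalizations.
C = np.array([[1.0, 0.5, -0.2],
              [0.5, 1.0, 0.1],
              [-0.2, 0.1, 1.0]])
caps = [1.0, 3.0, 2.0]
S = sort_by_market_cap(C, caps)
share = positive_share(C)   # 4 of the 6 off-diagonal entries are positive
```

Applying `positive_share` to the Pearson and the partial matrix of the same window gives a direct numerical counterpart to the visual comparison.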
Clustering analysis
The partial correlation analysis in the previous section suggests that the interconnection between firms differs from what it normally appears to be. This section studies the proximity between firms using partial correlation analysis to see which firms are close to each other. Agglomerative clustering is performed with the Euclidean distance between correlation coefficients as the distance measure for determining the proximity between objects, which in this case are firms. To
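A Python sketch of the clustering step, treating each firm's row of partial correlation coefficients as its feature vector and using the Euclidean distance between rows. The linkage rule is not visible in the truncated text, so average linkage is an assumption here, and the helper names and example matrix are hypothetical. The naive loop is meant for readability; for hundreds of firms, `scipy.cluster.hierarchy.linkage` with `method="average"` performs the same kind of merge sequence far more efficiently.

```python
import numpy as np

def row_distances(C):
    """Euclidean distance between firms' rows of correlation coefficients."""
    diff = C[:, None, :] - C[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def agglomerate(D, n_clusters):
    """Naive average-linkage agglomerative clustering on distance matrix D.

    Starts with one cluster per firm and repeatedly merges the two clusters
    whose mean pairwise distance is smallest, until n_clusters remain.
    """
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical partial correlation matrix with two groups: firms 0-2 and firms 3-4.
C = np.array([[1.0, 0.6, 0.5, -0.1, 0.0],
              [0.6, 1.0, 0.55, 0.0, -0.1],
              [0.5, 0.55, 1.0, -0.05, 0.05],
              [-0.1, 0.0, -0.05, 1.0, 0.7],
              [0.0, -0.1, 0.05, 0.7, 1.0]])
clusters = agglomerate(row_distances(C), 2)
```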
Conclusion
In this paper, partial correlation analysis is performed on the Korean stock market. Partial correlation analysis examines the co-movement of two random variables when common factors are controlled for. This is an important aspect to consider in econophysics, as many financial time series are subject to common factors that may skew the results of an analysis.
In the Korean market, the market index, KOSPI, has a strong influence on stocks, driving up the correlations between them. By
References (25)
- et al. A long memory property of stock market returns and a new model. J. Empir. Finance (1993).
- et al. Exploring the WTI crude oil price bubble process using the Markov regime switching model. Physica A (2015).
- et al. Testing for contagion: a conditional correlation analysis. J. Empir. Finance (2005).
- et al. Asset returns and volatility clustering in financial time series. Physica A (2011).
- Is the correlation in international equity returns constant: 1960–1990? J. Int. Money Finance (1995).
- et al. Dynamic correlation analysis of financial contagion: Evidence from Asian markets. J. Int. Money Finance (2007).
- et al. Correlation, hierarchies, and networks in financial markets. J. Econ. Behav. Organ. (2010).
- et al. Asymmetric correlations of equity portfolios. J. Financ. Econ. (2002).
- et al. Sector dominance ratio analysis of financial markets. Physica A (2015).
- A critique of the asset pricing theory’s tests part 1: On past and potential testability of the theory. J. Financ. Econ. (1977).
- Empirical properties of asset returns: stylized facts and statistical issues. Quant. Finance.
- A theory of power-law distributions in financial market fluctuations. Nature.
Cited by (31)
- Estimating Historical Downside Risks of Global Financial Market Indices via Inflation Rate-Adjusted Dependence Graphs. 2023, Research in International Business and Finance.
- Partial cross-quantilogram networks: Measuring quantile connectedness of financial institutions. 2022, North American Journal of Economics and Finance.
- Forecasting price movements of global financial indexes using complex quantitative financial networks. 2022, Knowledge-Based Systems.
  Citation excerpt: Christodoulakis [32] suggested an evolving correlation matrix and found stock co-movement based on a correlation coefficient. Jung et al. [33] analyzed the financial market structure by performing agglomerative hierarchical clustering through correlation analyses. However, a true correlation is difficult to achieve based on empirical correlation in the financial market due to white noise and market-wide movement [16].
- Clustering framework based on multi-scale analysis of intraday financial time series. 2021, Physica A: Statistical Mechanics and its Applications.
  Citation excerpt: In recent years, with the growth of financial market data and the improvement of computing power, researchers and investors are taking advantage of various data mining and artificial intelligence technologies to find the basic laws in financial market and investment opportunities from huge financial data. There are many studies on applying data mining and artificial intelligence technologies to financial field, such as applying deep learning models to financial prediction [11,12], utilizing text mining technologies to monitor financial market sentiment [13,14], using various clustering algorithms to classify different stocks [15,16], and developing new machine learning models dedicated to financial forecasting or stock selection [17–19]. However, in most studies related to applying data mining and artificial intelligence technologies to financial field, the financial data analyzed are usually daily rather than intraday [18,19].
- Time-varying comovement and changes of comovement structure in the Chinese stock market: A causal network method. 2019, Economic Modelling.
  Citation excerpt: Our study complements the literature by extending the analysis to the comovement of individual stocks in the Chinese stock market, an emerging market, to provide additional empirical evidence of the patterns of stock returns' comovement. The methods used to measure the comovements of asset prices include correlation (e.g., Forbes and Rigobon, 2002; Tse et al., 2010), partial correlation (Jung and Chang, 2016), cointegration relationships (Awokuse et al., 2009; Tu, 2014), variance decomposition (Diebold and Yılmaz, 2014), and causal linkages (Masih and Masih, 1999; Billio et al., 2012). Recently, some studies have used complex network topology to address this issue (Mantegna, 1999; Bonanno et al., 2001; Tse et al., 2010; Peralta and Zareei, 2016; Billio et al., 2012; Tu, 2014).
- Cluster analysis on the structure of the cryptocurrency market via Bitcoin–Ethereum filtering. 2019, Physica A: Statistical Mechanics and its Applications.
  Citation excerpt: From the work of [10], the complex system also possesses a large portion of Econophysics by analyzing the structure of various financial markets. Especially, many correlation-based approaches have discovered the collective market behavior [11–15], clustering phenomenon [16–21], topology of financial markets based on the minimum spanning tree(MST) [18,22–26], and its application to the portfolio management [27–30]. In particular, the studies on the crisis-driven structural changes in the financial markets have presented high correlations during the crisis [31].