Clustering stocks using partial correlation coefficients

https://doi.org/10.1016/j.physa.2016.06.094

Highlights

  • Correlation analyses are conducted on the Korean stock market.

  • Agglomerative hierarchical clustering is performed based on correlation matrices.

  • Each cluster consists of firms from multiple business sectors.

Abstract

A partial correlation analysis is performed on the Korean stock market (KOSPI). The difference between the Pearson correlation and the partial correlation conditioned on the market return is analyzed, and Pearson correlation coefficients are found to be generally greater than the partial correlation coefficients, which implies that the market return tends to drive up the correlation between stock returns. A clustering analysis is then performed to study the market structure given by the partial correlation analysis, and the members of the clusters are compared with the Global Industry Classification Standard (GICS). The initial hypothesis is that firms in the same GICS sector are clustered together since they operate in a similar business and environment. However, the result is inconsistent with the hypothesis: most clusters are a mix of multiple sectors, suggesting that the traditional approach of using sectors to determine the proximity between stocks may not be sufficient to diversify a portfolio.

Introduction

For decades, the financial market has received an enormous amount of attention from academia. Yet its complex nature remains elusive, and recent financial crises support the need to understand it better. Statistical physics is one of the popular tools for analyzing financial data, and many important discoveries have been made with it. For example, though a financial asset’s returns show no serial correlation, the absolute values of returns are positively correlated over long lags, a phenomenon well known as the long memory property [1]. Also, a financial return series has heavier tails than a Gaussian distribution, and the absolute value of the return series tends to follow a power law [2], [3]. It also behaves nonlinearly, as observed in phenomena such as volatility clustering and regime switching [4], [5], [6], [7]. Another interesting finding is that when multiple assets are considered together, the return series are often correlated and the correlation tends to become larger during a financial crisis [8], [9]. Correlation plays a critical role in the analysis of financial time series. For example, it has been known for a while that the correlation in equity returns is time varying [10]. Also, the co-movement of international markets has been widely studied, and methods such as conditional correlation and dynamic correlation have been some of the most popular tools for approaching this subject [11], [12], [13]. Network theory can be applied to study a correlation matrix as well: it was previously demonstrated that a correlation matrix can be associated with a hierarchical tree and that, when tested on a small set of stocks, the stocks in the same sectors were clustered together [14]. When it comes to constructing a portfolio of assets, correlation is one of the key variables an investor needs to consider [15], [16].

This study employs correlation to study how firms are related to one another over different periods of time, including the recent financial crises. Pearson correlation is widely accepted as a standard measure of co-movement between two financial return series, but it has a few weaknesses. For example, as mentioned before, the market dynamics are nonlinear in nature and so are the correlations between assets, but Pearson correlation measures only the linear co-movement between two random variables. Also, when analyzing firms in a similar environment, one should consider the common factors affecting the firms, such as the growth rate of the market or foreign exchange rates. Those common factors may influence the stocks, resulting in a bias in the analysis. Such bias may skew the results in one direction, producing a ‘false correlation’. It is impossible to completely remove all external factors from a financial time series, as some of the factors are unobservable or difficult to quantify, such as corporate governance. Nevertheless, removing some of the obvious and prevalent factors may provide insights which have not yet been discovered.

Recently, Kenett et al. applied the concept of partial (residual) correlation analysis to study the financial markets [17]. They successfully demonstrated the index cohesive effect and identified a dominating sector within a market using the partial correlation analysis [18], [19]. The network approach was also used to construct node–node correlation matrices, and it was shown that a node with insignificant influence does not disrupt the network even when the node is removed [20]. Another study proposed a measure called the Sector Dominance Ratio, based on Pearson and partial correlations, to study the market structure and showed empirically that the financial sector exhibits strong dominance in the US and UK markets [21]. The studies mentioned above utilize the partial correlation analysis, and the strength of this method comes from its ability to remove a common factor from the correlation between two variables. A partial correlation measures how a random variable ‘i’ correlates with another random variable ‘j’ when a common factor ‘k’ is removed from both of them. In this sense, it can be said to be the correlation between the residuals of i and j.
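The residual interpretation above can be illustrated with a minimal sketch. It assumes a single conditioning factor and uses synthetic data; the function names (`pearson`, `residuals`, `partial_corr`) and the simulated series are illustrative assumptions, not taken from the paper. The sketch computes the first-order partial correlation in two ways: from the standard closed form based on the three pairwise Pearson coefficients, and as the Pearson correlation between the residuals of least-squares regressions on the common factor.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, k):
    """Residuals of an ordinary least-squares fit of y on k (with intercept)."""
    n = len(y)
    mk, my = sum(k) / n, sum(y) / n
    beta = (sum((a - mk) * (b - my) for a, b in zip(k, y))
            / sum((a - mk) ** 2 for a in k))
    alpha = my - beta * mk
    return [b - (alpha + beta * a) for a, b in zip(k, y)]


def partial_corr(x, y, k):
    """First-order partial correlation of x and y given k (closed form)."""
    rxy, rxk, ryk = pearson(x, y), pearson(x, k), pearson(y, k)
    return (rxy - rxk * ryk) / math.sqrt((1 - rxk ** 2) * (1 - ryk ** 2))


# Synthetic example (assumption): two series driven by one common factor.
rng = random.Random(0)
market = [rng.gauss(0, 1) for _ in range(120)]
stock_i = [1.2 * m + rng.gauss(0, 1) for m in market]
stock_j = [0.8 * m + rng.gauss(0, 1) for m in market]

rho_pearson = pearson(stock_i, stock_j)
rho_partial = partial_corr(stock_i, stock_j, market)
rho_residual = pearson(residuals(stock_i, market), residuals(stock_j, market))
print(rho_pearson, rho_partial, rho_residual)
```

The two routes to the partial correlation agree up to floating-point error, which is why the coefficient can be read as a correlation between residuals.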

One may notice that if the common factor is chosen to be the market return, the returns used to compute the correlation resemble the returns from the Capital Asset Pricing Model (CAPM) [22], [23], [24]. CAPM was developed to describe how a risky asset should be valued, and if the model were correct, the correlation between the residuals of two risky assets, that is, their partial correlation coefficient, should be statistically zero. Section 3 discusses this point and shows that the correlation coefficients are in fact non-zero. This should not come as a surprise, as CAPM is known for its relatively weak explanatory power [25].
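One hedged way to write this argument down is as a single-factor regression of each stock return on the market return. The notation below (r for returns, m for the market, α, β and ε for intercept, slope and residual) is an assumption introduced for illustration and is not taken from the paper.

```latex
\begin{align}
r_{i,t} &= \alpha_i + \beta_i \, r_{m,t} + \varepsilon_{i,t}, \\
r_{j,t} &= \alpha_j + \beta_j \, r_{m,t} + \varepsilon_{j,t}, \\
\rho_{ij \mid m} &= \operatorname{Corr}\left(\varepsilon_{i}, \varepsilon_{j}\right).
\end{align}
```

If the market return were the only common driver of the two stocks, the residuals would be uncorrelated, so that \rho_{ij \mid m} = 0 for i \neq j. A non-zero partial correlation therefore points to co-movement that the market return alone does not explain.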

This study is another application of the partial correlation analysis. A market index such as the Korea Composite Stock Price Index (KOSPI) reflects the overall size of the Korean economy. Growth of the index means that the economy is growing as well, which positively affects every firm in the market. By removing the effect of the market index, it may be possible to shed new light on the individual performance of the stocks and their correlation with others. Specifically, this study addresses the following two questions:

  • i. How much difference exists between the partial correlation and the Pearson correlation? Does the difference persist across different periods of time?

  • ii. In terms of the partial correlation, how are firms related to each other?

Both the partial correlation and the Pearson correlation are computed every 30 months to minimize the effect of changes in the correlation coefficients that occur due to the nonlinearity of the market. The length of 30 months also ensures that the computed correlation coefficients are statistically significant. The resulting correlation matrices are then visualized to address the first question, and a simple agglomerative clustering approach is used to explore the second question.
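The windowing and the significance argument can be sketched as follows. This is a minimal illustration under stated assumptions: the windows are taken to be consecutive and non-overlapping, the series is assumed to hold 120 monthly returns, and significance is assessed with the standard t-test for a zero correlation. The paper does not state which test or window layout it uses, and the helper names are hypothetical.

```python
import math


def split_windows(series, length=30):
    """Split a sequence into consecutive non-overlapping windows.

    Any incomplete tail shorter than `length` is dropped.
    """
    return [series[s:s + length]
            for s in range(0, len(series) - length + 1, length)]


def corr_t_stat(r, n):
    """t-statistic for H0: rho = 0, given a sample correlation r from n points."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def critical_r(t_crit, n):
    """Smallest |r| that is significant for a critical t value and sample size n."""
    df = n - 2
    return t_crit / math.sqrt(df + t_crit ** 2)


# Placeholder for 120 monthly returns (assumption, not real data).
monthly_returns = list(range(120))
wins = split_windows(monthly_returns, 30)

# Two-sided 5% critical value of Student's t with 28 degrees of freedom.
T_CRIT_28 = 2.048
r_min = critical_r(T_CRIT_28, 30)
print(len(wins), round(r_min, 3))
```

Under these assumptions, a coefficient estimated from 30 monthly observations needs an absolute value of roughly 0.36 to be significant at the 5% level.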

The remaining sections of this paper are organized as follows: Section 2 describes the data set used for the analysis and the method used to compute the correlations. Section 3 reports the results of the partial correlation analysis and the Pearson correlation analysis. In Section 4, clustering analyses are performed. Section 5 provides the conclusion.

Section snippets

Data and methods

The monthly adjusted closing price series of KOSPI and of the stocks listed in the index from December 2004 to December 2014 are used for this study. The data are provided by DataGuide from FnGuide (http://www.fnguide.com/), a professional financial analytics service for the Korean market. All computation and analytics are done using MATLAB. A total of 732 firms existed throughout this period, and trade volume data are used to filter out the ones that did not trade during a period of analysis.
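A minimal sketch of this preparation step is shown below in Python rather than the MATLAB used by the authors. Two choices are assumptions and are not specified in the text above: returns are taken as log returns of the adjusted closing prices, and a firm is kept only if its trade volume is positive in every month of the window. The function names and toy numbers are hypothetical.

```python
import math


def log_returns(prices):
    """Monthly log returns from a series of adjusted closing prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def traded_every_month(volumes):
    """True if the firm has a positive trade volume in every month (assumed rule)."""
    return all(v > 0 for v in volumes)


# Toy inputs (assumption): firm name -> (prices, volumes) for one window.
firms = {
    "firm_a": ([100.0, 110.0, 121.0], [500, 650, 700]),
    "firm_b": ([50.0, 50.0, 50.0], [300, 0, 0]),  # stopped trading
}

# Keep only firms that traded throughout, then convert prices to returns.
returns = {
    name: log_returns(prices)
    for name, (prices, volumes) in firms.items()
    if traded_every_month(volumes)
}
print(returns)
```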

Pearson correlation vs. partial correlation

Fig. 1 shows color maps of the Pearson correlation matrix and the partial correlation matrix for the entire period from 2005 to 2014. The firms are sorted by market capitalization.

The values are color-coded with a gradient so that blue to white represents negative values (−1 ≤ ρ < 0), white represents zero (ρ = 0), and white to red represents positive values (0 < ρ ≤ 1). It is visibly clear that the Pearson correlation matrix has much more red, which implies it mostly consists of positive values. However, the partial

Clustering analysis

The partial correlation analysis from the previous section suggests that the interconnection between firms is different from what it normally seems. This section is dedicated to studying the proximity between firms using partial correlation analysis to see which firms are close to each other. Agglomerative clustering analysis is performed with the Euclidean distance between correlation coefficients as a distance measure to determine the proximity between objects, which are firms in this case. To
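The clustering step described in this snippet can be sketched as follows. Several choices are assumptions, since the snippet is truncated: each firm is represented by its row of the correlation matrix, the distance between two firms is the Euclidean distance between their rows, clusters are merged with average linkage, and merging stops at a fixed number of clusters. The toy matrix and the function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agglomerative(rows, n_clusters):
    """Average-linkage agglomerative clustering of row vectors.

    Returns a list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(rows))]
    dist = [[euclidean(a, b) for b in rows] for a in rows]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair; b > a, so deleting b keeps index a valid.
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Toy partial-correlation matrix for four hypothetical firms.
corr = [
    [1.0, 0.8, -0.1, 0.0],
    [0.8, 1.0, 0.0, -0.1],
    [-0.1, 0.0, 1.0, 0.7],
    [0.0, -0.1, 0.7, 1.0],
]
groups = agglomerative(corr, 2)
print(groups)
```

In this toy matrix, firms 0 and 1 have similar correlation profiles, as do firms 2 and 3, so the two pairs end up in separate clusters. Comparing such clusters with sector labels is then a matter of tabulating the sector of each member.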

Conclusion

In this paper, the partial correlation analysis is performed on the Korean stock market. The partial correlation analysis examines the co-movement of two random variables when common factors are controlled. This is an important aspect to consider in econophysics, as many financial time series are subject to common factors which may skew the results of an analysis.

In the Korean market, the market index, KOSPI, has a strong influence on stocks, driving correlation between them higher. By

References (25)

  • R. Cont, Empirical properties of asset returns: stylized facts and statistical issues, Quant. Finance (2001)

  • X. Gabaix et al., A theory of power-law distributions in financial market fluctuations, Nature (2003)