Article

Yule–Walker Equations Using a Gini Covariance Matrix for the High-Dimensional Heavy-Tailed PVAR Model

School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
*
Author to whom correspondence should be addressed.
Mathematics 2021, 9(6), 614; https://doi.org/10.3390/math9060614
Submission received: 23 January 2021 / Revised: 9 March 2021 / Accepted: 12 March 2021 / Published: 15 March 2021

Abstract

Gini covariance plays a vital role in analyzing the relationship between random variables with heavy-tailed distributions. In this paper, assuming only the existence of a finite second moment, we establish the Gini–Yule–Walker equation to estimate the transition matrix of high-dimensional periodic vector autoregressive (PVAR) processes and derive the asymptotic properties of the resulting estimators. We apply this method to study the Granger causality of heavy-tailed PVAR processes, and the results show that the robust transition matrix estimator yields sign-consistent recovery of Granger causality relations. The effectiveness of the proposed method is verified on both synthetic and real data.

1. Introduction

The heavy-tailed distribution is graphically thicker in the tails and sharper in the peak than the normal distribution. Intuitively, this means that the probability of extreme values is greater for such data than for normally distributed data. The heavy-tailed phenomenon has been encountered empirically in various fields: physics, earth sciences, economics, political science, etc. Periodicity is a wave-like or oscillatory movement around a long-term trend present in a time series. It is well known that cyclicality caused by business and economic activities usually differs from trend movements: rather than moving continuously in a single direction, it alternates between upswings and downswings. When these components, trend and cyclicality, do not evolve independently, traditional differencing filters may not be suitable (see, for example, Franses and Paap [1], Bell et al. [2], Aliat and Hamdi [3]). Periodic time series models are used to model periodically stationary time series. Periodic vector autoregressive (PVAR) models extend the classical vector autoregressive (VAR) models by allowing the parameters to vary with the cyclicality:
$$X_{iT+v} = \sum_{k=1}^{q(v)} \alpha_k(v)\, X_{iT+v-k} + \epsilon_{iT+v}.$$
For fixed $v$ and a predetermined value $T$, the random vector $X_{iT+v}$ represents the realization during the $v$th season, with $v \in \{1, \ldots, T\}$, of year $i$, $i \in \{1, \ldots, n\}$. The PVAR model order at season $v$ is given by $q(v)$, whereas $\alpha_k(v)$, $k = 1, \ldots, q(v)$, represent the PVAR model coefficients during season $v$. The error process $\epsilon = \{\epsilon_{iT+v}\}$ corresponds to a periodic white noise, with $\mathrm{E}(\epsilon_{iT+v}) = \mathbf{0}$ and $\mathrm{var}(\epsilon_{iT+v}) = \sigma^2(v) I$, where $\mathbf{0}$ is a zero vector of order $d$ and $I$ is a unit matrix of order $d$.
This paper seeks to establish a PVAR model for heavy-tailed time series. Let random vectors $X_1, \ldots, X_n$ be from a stationary stochastic process $(X_k)_{k=-\infty}^{\infty}$, $X_k = (X_{kT+1}, \ldots, X_{kT+T})^{\top}$, where the transpose (indicated by $\top$) of a row vector is a column vector.
We consider the first-order PVAR model,
$$X_{kT+v} = A X_{kT+v-1} + E_{kT+v}, \qquad (2)$$
where $k \in \{1, \ldots, n\}$, $v \in \{2, \ldots, T\}$, $T$ is the period of $X_k$, $E_{kT+v}$ denotes the latent heavy-tailed innovation, and $A$ is the transition matrix. In this paper, we assume that the entries of the heavy-tailed innovation $E_{kT+v} = (E_{kT+v}^1, \ldots, E_{kT+v}^d)^{\top}$ obey a power-law distribution with $P(|E_{kT+v}^i| > x) \leq C x^{-\alpha_i}$ and $\alpha_i \geq 2$, so that the second moment is finite while the third moment is infinite. From Equation (2) we know that, for a given $k$, the vector $X_k$ follows a first-order vector autoregressive (VAR) process. We call the first-order PVAR process stable if all eigenvalues of the transition matrix $A$ have modulus less than 1; this condition is equivalent to $\det(I_d - A z) \neq 0$ for all $|z| \leq 1$ (see, for example, Section 2.1 in Lütkepohl [4]). The transition matrix characterizes the temporal dependence of sequential data and plays a fundamental role in forecasting. Moreover, the zero and nonzero entries of the transition matrix are closely related to Granger causality. This manuscript focuses on estimating the transition matrix of high-dimensional heavy-tailed PVAR processes.
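As a quick numerical illustration (ours, not part of the original derivation), the stability condition can be checked by computing the spectral radius of $A$; the sketch below assumes a hypothetical $5 \times 5$ transition matrix rescaled to spectral radius 0.8.

```python
import numpy as np

def is_stable(A, tol=1e-12):
    """PVAR stability check: all eigenvalues of the transition matrix A
    must have modulus strictly less than 1, which is equivalent to
    det(I - A z) != 0 for all |z| <= 1."""
    return np.max(np.abs(np.linalg.eigvals(A))) < 1 - tol

# Hypothetical example: rescale a random matrix so that its spectral
# radius equals 0.8, which guarantees a stable first-order PVAR process.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
A *= 0.8 / np.max(np.abs(np.linalg.eigvals(A)))
print(is_stable(A))  # True
```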
PVAR models have been extensively studied under the Gaussian assumption; Gaussian PVAR models assume that the latent innovations are independent and identically distributed Gaussian random vectors. Under this model, two kinds of methods have been used to estimate the transition matrix in the high-dimensional setting: Lasso-based estimation procedures, see [5,6,7,8], and Dantzig-selector-type estimators, see [9,10,11,12]. For non-Gaussian VAR processes, Qiu et al. [13] proposed a quantile-based Dantzig-selector-type estimator of the transition matrix for elliptical VAR processes. Wong et al. [14] provided an alternative proof of the consistency of the Lasso for sparse non-Gaussian VAR models. Maleki et al. [15] extended the multivariate autoregressive setting by considering multivariate scale mixtures of skew-normal distributions for the VAR innovations.
The statistical second-order information contained in the data is usually expressed through variances and covariances, and most of the time series literature measures dependence using them. To investigate the validity of variance estimates, however, we need the existence of the fourth-order moment of the random variables, while for heavy-tailed financial data even the third-order moments often do not exist. Schechtman and Yitzhaki [16] proposed the concept of Gini covariance, which has been used widely to measure the dependence of heavy-tailed distributions. Let $H$ be the joint distribution of the random variables $X$ and $Y$ with marginal distribution functions $F_1$ and $F_2$, respectively. The standard Gini covariance is defined as
$$\mathrm{Gcov}(X, Y) = \mathrm{cov}(X, F_2(Y)) \quad \text{and} \quad \mathrm{Gcov}(Y, X) = \mathrm{cov}(Y, F_1(X)), \qquad (3)$$
assuming only that the random variables have a finite first moment. The Gini covariance has an advantage when analyzing bivariate data characterized by both variate values and ranks of the values. The representation of the Gini covariance $\mathrm{Gcov}(X, Y)$ indicates that it mixes properties of the variable $X$ and the rank of the variable $Y$, and thus complements the usual covariance and the rank covariance [16,17,18]. In terms of the balance between efficiency and robustness, the Gini covariance plays an important role in measuring association for variables from heavy-tailed distributions [19].
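As a small illustration (our own, not from the paper), the defining identity $\mathrm{Gcov}(X,Y) = \mathrm{cov}(X, F_2(Y))$ suggests a direct plug-in estimate that replaces $F_2$ with the empirical CDF; the toy Python sketch below uses heavy-tailed Student-$t$ data.

```python
import numpy as np

def ecdf_values(y):
    """Empirical CDF of y evaluated at each observation (ranks divided by n)."""
    ranks = np.argsort(np.argsort(y)) + 1.0
    return ranks / len(y)

def gini_cov(x, y):
    """Plug-in analogue of Gcov(X, Y) = cov(X, F_Y(Y)): covariance between
    the values of x and the (rescaled) ranks of y."""
    return np.cov(x, ecdf_values(y))[0, 1]

# Toy heavy-tailed example (Student-t noise, finite first moment).
rng = np.random.default_rng(1)
y = rng.standard_t(df=2.5, size=2000)
x = 0.5 * y + rng.standard_t(df=2.5, size=2000)
print(gini_cov(x, y), gini_cov(y, x))  # the two Gini covariances differ in general
```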
The Yule–Walker equations arise naturally in the problem of linear prediction of a zero-mean weakly stationary process based on a finite number of contiguous observations, and they provide a straightforward connection between the autoregressive model parameters and the covariance function of the process. In this paper, relaxing the strong assumption of the existence of higher-order moments of the regressors, we use a non-parametric method to estimate the Gini covariance matrix and establish the Gini–Yule–Walker equation to estimate the sparse transition matrix of stationary PVAR processes. The estimator falls into the category of Dantzig-selector-type estimators. Assuming only the existence of a finite second moment, we investigate the asymptotic behavior of the estimator in high dimensions.
The paper is organized as follows. In Section 2, we establish the Gini–Yule–Walker equation and estimate the sample Gini covariance matrix. In Section 3, we derive the convergence rate of the transition matrix estimator. In Section 4, we discuss the characterization and estimation of Granger causality under the heavy-tailed PVAR model. In Section 5, both synthetic and real data are used to demonstrate the empirical performance of the proposed methodology.

2. Model

In this section, we first fix the notation. We then establish the Gini–Yule–Walker equation, obtain simple non-parametric estimators of the Gini covariance matrix, and investigate the convergence rate of the sample Gini covariance matrix.

2.1. Notation

Let $v = (v_1, \ldots, v_d)^{\top}$ be a $d$-dimensional real vector, and let $M = [M_{ji}] \in \mathbb{R}^{d_1 \times d_2}$ be a $d_1 \times d_2$ matrix. For $0 < q < \infty$, we define the vector $\ell_q$ norm of $v$ as $\|v\|_q = (\sum_{j=1}^{d} |v_j|^q)^{1/q}$, and the vector $\ell_\infty$ norm of $v$ as $\|v\|_\infty = \max_{1 \le j \le d} |v_j|$. We define the matrix $\ell_1$ norm of $M$ as $\|M\|_1 = \max_i \sum_{j} |M_{ji}|$ and the matrix $\ell_\infty$ norm as $\|M\|_\infty = \max_j \sum_{i} |M_{ji}|$. Let $X_k = (X_{1k}, \ldots, X_{dk})^{\top}$ and $Y_k = (Y_{1k}, \ldots, Y_{dk})^{\top}$ be two random vectors.

2.2. Gini–Yule–Walker Equation

In this paper, we model the time series vector by a stationary PVAR process under the existence of a finite second moment. For each $k$, $1 \le k \le n$, the vectors $X_{1k}, \ldots, X_{Tk} \in \mathbb{R}^d$ follow a lag-one VAR process, with $E_{tk}$ independent of $X_{(t-1)k}$, and $\mathrm{E}(X_{tk}) = 0$, $\mathrm{E}(E_{tk}) = 0$.
We define
$$\mathbf{X}_k = (X_{1k}, \cdots, X_{(T-1)k})^{\top};$$
by concatenating the equation systems, the VAR process can then be written as
$$\mathbf{y}_k = \mathbf{Y}_k \beta + \varepsilon_k,$$
where $\mathbf{Y}_k = \mathbf{X}_k \otimes I_d$, and
$$\mathbf{y}_k = \begin{pmatrix} y_{1k} \\ y_{2k} \\ \vdots \\ y_{dk} \end{pmatrix}, \quad y_{ik} = \begin{pmatrix} X_{2ik} \\ X_{3ik} \\ \vdots \\ X_{Tik} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_d \end{pmatrix}, \quad \beta_i = \begin{pmatrix} A_{i1} \\ A_{i2} \\ \vdots \\ A_{id} \end{pmatrix}, \quad \varepsilon_k = \begin{pmatrix} \varepsilon_{1k} \\ \varepsilon_{2k} \\ \vdots \\ \varepsilon_{dk} \end{pmatrix}, \quad \varepsilon_{ik} = \begin{pmatrix} E_{2ik} \\ E_{3ik} \\ \vdots \\ E_{Tik} \end{pmatrix},$$
where $\otimes$ denotes the Kronecker product.
Since the matrix $\mathbf{Y}_k$ is block diagonal, the estimation problem can be decomposed into $d$ independent sub-problems. Let us consider the $i$-th equation of the system.
$$\begin{pmatrix} X_{2ik} \\ X_{3ik} \\ \vdots \\ X_{Tik} \end{pmatrix} = \begin{pmatrix} X_{11k} & X_{12k} & \cdots & X_{1dk} \\ X_{21k} & X_{22k} & \cdots & X_{2dk} \\ \vdots & \vdots & & \vdots \\ X_{(T-1)1k} & X_{(T-1)2k} & \cdots & X_{(T-1)dk} \end{pmatrix} \begin{pmatrix} A_{i1} \\ A_{i2} \\ \vdots \\ A_{id} \end{pmatrix} + \begin{pmatrix} E_{2ik} \\ E_{3ik} \\ \vdots \\ E_{Tik} \end{pmatrix}.$$
The above equation system can be considered as a multiple regression,
$$X_{tik} = X_{(t-1)1k} A_{i1} + X_{(t-1)2k} A_{i2} + \ldots + X_{(t-1)dk} A_{id} + \epsilon_{tik}, \quad t \in \{2, 3, \ldots, T\},$$
which can be abbreviated as
$$z_{ik} = x_{1k} A_{i1} + x_{2k} A_{i2} + \ldots + x_{dk} A_{id} + \tilde{\epsilon}_{ik}, \qquad (5)$$
with $T-1$ samples available to estimate the $i$-th row of the transition matrix $A$.
Let $F(x_{ik})$ denote the distribution of $x_{ik}$; we assume independence between $\tilde{\epsilon}_{ik}$ and $F(x_{ik})$ for $i = 1, 2, \ldots, d$. Then, from Equation (5), we obtain the Gini covariance matrix equation
$$\begin{pmatrix} \mathrm{cov}(z_{ik}, F(x_{1k})) \\ \mathrm{cov}(z_{ik}, F(x_{2k})) \\ \vdots \\ \mathrm{cov}(z_{ik}, F(x_{dk})) \end{pmatrix} = \begin{pmatrix} \mathrm{cov}(x_{1k}, F(x_{1k})) & \mathrm{cov}(x_{2k}, F(x_{1k})) & \cdots & \mathrm{cov}(x_{dk}, F(x_{1k})) \\ \mathrm{cov}(x_{1k}, F(x_{2k})) & \mathrm{cov}(x_{2k}, F(x_{2k})) & \cdots & \mathrm{cov}(x_{dk}, F(x_{2k})) \\ \vdots & \vdots & & \vdots \\ \mathrm{cov}(x_{1k}, F(x_{dk})) & \mathrm{cov}(x_{2k}, F(x_{dk})) & \cdots & \mathrm{cov}(x_{dk}, F(x_{dk})) \end{pmatrix} \begin{pmatrix} A_{i1} \\ A_{i2} \\ \vdots \\ A_{id} \end{pmatrix}.$$
From the above equation, we obtain the so-called Gini–Yule–Walker equation
$$G = \tilde{G} A, \qquad (6)$$
where $G = \frac{1}{n}\sum_{k=1}^{n} G^k$ and $\tilde{G} = \frac{1}{n}\sum_{k=1}^{n} \tilde{G}^k$. The entries of $G^k$ are given by $G^k_{ij} = \mathrm{Gcov}(z_{jk}, x_{ik})$, and the entries of $\tilde{G}^k$ are given by $\tilde{G}^k_{ij} = \mathrm{Gcov}(x_{jk}, x_{ik})$.
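To make the construction concrete, the following Python sketch (ours; the index convention reflects our reading of Equation (6)) assembles $G^k$ and $\tilde{G}^k$ from one year of observations using a simple rank-based plug-in for the Gini covariance. Averaging these matrices over $k = 1, \ldots, n$ gives $G$ and $\tilde{G}$; note that any common scaling convention in the Gini covariance cancels when the system $G = \tilde{G}A$ is solved for $A$.

```python
import numpy as np

def gini_cov(a, b):
    """Plug-in Gini covariance: covariance between the values of a
    and the (rescaled) ranks of b."""
    ranks = np.argsort(np.argsort(b)) + 1.0
    return np.cov(a, ranks / len(b))[0, 1]

def gini_yw_matrices(X):
    """Assemble G^k and G~^k from one year of data.

    X is a (T, d) array holding X_{1k}, ..., X_{Tk}.  Column i of the
    lagged block plays the role of x_{ik}, and column j of the lead
    block plays the role of z_{jk}."""
    lag, lead = X[:-1], X[1:]   # x_{ik}: t = 1..T-1,  z_{jk}: t = 2..T
    d = X.shape[1]
    G = np.empty((d, d))
    G_tilde = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            G[i, j] = gini_cov(lead[:, j], lag[:, i])        # Gcov(z_jk, x_ik)
            G_tilde[i, j] = gini_cov(lag[:, j], lag[:, i])   # Gcov(x_jk, x_ik)
    return G, G_tilde
```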

2.3. Sample Gini Covariance Matrix

We use a U-statistic method to estimate the Gini covariance matrices $G^k$ and $\tilde{G}^k$. From Equation (6), the elements of $G^k$ and $\tilde{G}^k$ can be divided into two categories: the Gini covariance $\mathrm{Gcov}(\tilde{X}_k, \tilde{Y}_k)$ and the Gini mean difference $\mathrm{Gcov}(\tilde{Y}_k, \tilde{Y}_k)$, with $\tilde{X}_k \neq \tilde{Y}_k$, $\tilde{X}_k \in \{z_{1k}, z_{2k}, \ldots, z_{dk}, x_{1k}, x_{2k}, \ldots, x_{dk}\}$ and $\tilde{Y}_k \in \{x_{1k}, x_{2k}, \ldots, x_{dk}\}$.
For the Gini covariance $\mathrm{Gcov}(\tilde{X}_k, \tilde{Y}_k)$, we have the sample
$$\{(\tilde{X}_{1k}, \tilde{Y}_{1k}), (\tilde{X}_{2k}, \tilde{Y}_{2k}), \ldots, (\tilde{X}_{(T-1)k}, \tilde{Y}_{(T-1)k})\}.$$
The $i$-th ordered variable of $\tilde{X}_k$ is denoted by $\tilde{X}_{i:(T-1)k}$, and the associated variable of $\tilde{Y}_k$ (matched with $\tilde{X}_{i:(T-1)k}$) is denoted by $\tilde{Y}_{[i:(T-1)]k}$, which is the concomitant of the $i$-th order statistic. In this set-up, in the context of non-parametric estimation of a regression function, Yang [20] proposed a statistic of the form
$$R_{F_{(T-1)}}(\tilde{X}_k, \tilde{Y}_k) = \frac{1}{T-1} \sum_{i=1}^{T-1} J\!\left(\frac{i}{T-1}\right) h\!\left(\tilde{X}_{i:(T-1)k}, \tilde{Y}_{[i:(T-1)]k}\right), \qquad (7)$$
where $J(\cdot)$ is a bounded smooth function, $h(\cdot)$ is a real-valued function of $(\tilde{X}_k, \tilde{Y}_k)$, and $F_{T-1}(\cdot)$ is the empirical distribution function corresponding to $F_{\tilde{X}_k, \tilde{Y}_k}(\cdot)$. The Gini covariance defined in Equation (3) can be rewritten as
$$\mathrm{Gcov}(\tilde{X}_k, \tilde{Y}_k) = \int_0^{\infty}\!\!\int_0^{\infty} \frac{1}{2}\tilde{x}\left(2 F_{\tilde{Y}_k}(\tilde{y}) - 1\right) dF_{\tilde{X}_k, \tilde{Y}_k}(\tilde{x}, \tilde{y}).$$
Choosing $J(F_{\tilde{Y}_k}(\tilde{y})) = 2 F_{\tilde{Y}_k}(\tilde{y}) - 1$ and $h(\tilde{x}, \tilde{y}) = \frac{1}{2}\tilde{x}$, from Equation (7) we obtain an estimator of $\mathrm{Gcov}(\tilde{X}_k, \tilde{Y}_k)$ as
$$U(\tilde{X}_k, \tilde{Y}_k) = \frac{2}{(T-1)^2} \sum_{i=1}^{T-1} (2i - T)\, \tilde{X}_{[i:(T-1)]k}. \qquad (8)$$
For the Gini mean difference $\mathrm{Gcov}(\tilde{Y}_k, \tilde{Y}_k)$, we have the sample $\{\tilde{Y}_{1k}, \tilde{Y}_{2k}, \ldots, \tilde{Y}_{(T-1)k}\}$. Letting $\tilde{Y}_{1k}$ and $\tilde{Y}_{2k}$ be two independent random variables with distribution function $F_{\tilde{Y}_k}$, the Gini mean difference $\mathrm{Gcov}(\tilde{Y}_k, \tilde{Y}_k)$ can be expressed as
$$\mathrm{Gcov}(\tilde{Y}_k, \tilde{Y}_k) = \frac{1}{2}\left[\int_0^{\infty} 2\tilde{y}\, F_{\tilde{Y}_k}(\tilde{y})\, dF_{\tilde{Y}_k}(\tilde{y}) - \int_0^{\infty} \tilde{y}\, dF_{\tilde{Y}_k}(\tilde{y})\right] = \frac{1}{2}\,\mathrm{E}\!\left[\max\left(\tilde{Y}_{1k}, \tilde{Y}_{2k}\right) - \tilde{Y}_{1k}\right].$$
The estimator of $\mathrm{Gcov}(\tilde{Y}_k, \tilde{Y}_k)$ based on U-statistics is given by
$$U(\tilde{Y}_k, \tilde{Y}_k) = \binom{T-1}{2}^{-1} \sum_{1 \le i < j \le T-1} k\!\left(\tilde{Y}_{ik}, \tilde{Y}_{jk}\right), \qquad (9)$$
where $k(\tilde{Y}_{1k}, \tilde{Y}_{2k}) = \frac{1}{4}\left[2\max\left(\tilde{Y}_{1k}, \tilde{Y}_{2k}\right) - \tilde{Y}_{1k} - \tilde{Y}_{2k}\right]$. After some simplification, we obtain
$$U(\tilde{Y}_k, \tilde{Y}_k) = \frac{1}{2(T-1)(T-2)} \sum_{i=1}^{T-1} (2i - T)\, \tilde{Y}_{i:(T-1)k},$$
where $\tilde{Y}_{i:(T-1)k}$ is the $i$-th order statistic based on the sample $\{\tilde{Y}_{1k}, \ldots, \tilde{Y}_{(T-1)k}\}$.
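As an illustration only (our sketch, not the authors' code), the two estimators can be computed directly from order statistics and concomitants; the constants follow our reading of the formulas above and may differ from other Gini covariance conventions by a fixed factor, which cancels in the Gini–Yule–Walker system.

```python
import numpy as np

def gini_cov_u(x, y):
    """Concomitant-based estimator in the spirit of Equation (8):
    weight the x values, ordered by their y partners, linearly in the rank."""
    m = len(x)                                   # m plays the role of T - 1
    x_conc = x[np.argsort(y)]                    # concomitants of the order statistics of y
    w = 2.0 * np.arange(1, m + 1) - (m + 1)      # 2i - T with T = m + 1
    return 2.0 / m**2 * np.sum(w * x_conc)

def gini_mean_diff_u(y):
    """Order-statistic form of the U-statistic for the Gini mean difference,
    i.e. an estimate of E|Y1 - Y2| / 4."""
    m = len(y)                                   # m plays the role of T - 1
    w = 2.0 * np.arange(1, m + 1) - (m + 1)      # 2i - T with T = m + 1
    return np.sum(w * np.sort(y)) / (2.0 * m * (m - 1))
```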

2.4. Convergence Rates of the Estimators $U(\tilde{X}_k, \tilde{Y}_k)$ and $U(\tilde{Y}_k, \tilde{Y}_k)$

In this subsection, using a truncation argument, we apply Bernstein's inequality to investigate the convergence rates of the estimators $U(\tilde{X}_k, \tilde{Y}_k)$ and $U(\tilde{Y}_k, \tilde{Y}_k)$. From Equations (8) and (9), we define $\gamma_k = U(\tilde{X}_k, \tilde{Y}_k)$, $\eta_k = U(\tilde{Y}_k, \tilde{Y}_k)$, $1 \le k \le n$, and $\Delta = \max\{D(\gamma_k), D(\eta_k)\}$, where $D(\gamma_k)$ is the variance of $\gamma_k$.
For the analysis, we require the following three assumptions on the time series and on the sizes of the variables $(d, T, n)$:
Assumption A1.
From Equation (2), suppose that the entries of the heavy-tailed innovation $E_{kT+v} = (E_{kT+v}^1, \ldots, E_{kT+v}^d)^{\top}$ obey a power-law distribution with $P(|E_{kT+v}^i| > x) \leq C x^{-\alpha_i}$ and $\alpha_i \geq 2$, so that the second moment is finite and the third moment is infinite.
Assumption A2.
Suppose that $\log n / T \to 0$ and $dT/n \to c$ as $T, n \to \infty$, where $c$ is a finite constant.
Assumption A3.
Suppose that $d/n \to 0$ as $n \to \infty$.
Lemma 1.
Let $\{X_k\}_{k \in \mathbb{Z}}$ be a stationary PVAR process from Equation (2), and let $X_1, \ldots, X_n$ be a sequence of observations from $\{X_k\}_{k \in \mathbb{Z}}$. Suppose that Assumptions (A1)–(A2) are satisfied. Then, for $T$ and $n$ large enough, with probability no smaller than $1 - \left(\frac{8\Delta \log n}{d^2 T} + \frac{2}{n^2}\right)$, we have
$$\left|\frac{1}{n}\sum_{k=1}^{n}(\gamma_k - \mathrm{E}\gamma_k)\right| \le d\sqrt{\frac{8T\log n}{n}},$$
$$\left|\frac{1}{n}\sum_{k=1}^{n}(\eta_k - \mathrm{E}\eta_k)\right| \le d\sqrt{\frac{8T\log n}{n}}.$$
Proof. 
Assume that $m$ and $\lambda$ are positive constants, and define
$$\tilde{\gamma}_k = (\gamma_k - \mathrm{E}\gamma_k)\,\mathbf{1}\{|\gamma_k - \mathrm{E}\gamma_k| \le m\}.$$
Then $\tilde{\gamma}_k$ is a bounded random variable, and the $\tilde{\gamma}_k$ are independent and identically distributed. It follows from Bernstein's inequality that
$$P\left(\left|\sum_{k=1}^{n}\tilde{\gamma}_k\right| \ge \lambda\right) \le 2\exp\left(-\frac{\lambda^2}{2\sigma^2 + \frac{2}{3}m\lambda}\right),$$
where $\sigma^2 = n D(\tilde{\gamma}_k) \le n\Delta$.
Let $m = d\sqrt{\frac{Tn}{8\log n}}$, $\delta = d\sqrt{\frac{8T\log n}{n}}$ and $\lambda = n\delta$; we have
$$P\left(\left|\frac{1}{n}\sum_{k=1}^{n}(\gamma_k - \mathrm{E}\gamma_k)\right| \ge \delta\right) \le n P(|\gamma_k - \mathrm{E}\gamma_k| > m) + P\left(\left|\sum_{k=1}^{n}\tilde{\gamma}_k\right| \ge n\delta\right) \le \frac{n D(\gamma_k)}{m^2} + 2\exp\left(-\frac{8 d^2 T \log n}{2 D(\gamma_k) + \frac{2}{3} d^2 T}\right) \le \frac{8\Delta\log n}{d^2 T} + \frac{2}{n^2}.$$
As $T \to \infty$ and $n \to \infty$, assuming $\log n / T \to 0$ and $dT/n \to c$, we have $\frac{8\Delta\log n}{d^2 T} + \frac{2}{n^2} \to 0$.
With a similar argument, we obtain
$$P\left(\left|\frac{1}{n}\sum_{k=1}^{n}(\eta_k - \mathrm{E}\eta_k)\right| \ge \delta\right) \le \frac{8\Delta\log n}{d^2 T} + \frac{2}{n^2}.$$
This completes the proof. □
From Equations (5) and (6), we define the sample estimate of the Gini covariance matrix $G$ as $\hat{G} = [\hat{G}_{ij}]$ with $\hat{G}_{ij} = \frac{1}{n}\sum_{k=1}^{n} U(z_{jk}, x_{ik})$, and the sample estimate of the Gini covariance matrix $\tilde{G}$ as $\hat{\tilde{G}} = [\hat{\tilde{G}}_{ij}]$ with $\hat{\tilde{G}}_{ij} = \frac{1}{n}\sum_{k=1}^{n} U(x_{jk}, x_{ik})$, $1 \le i, j \le d$.
Next, we investigate the convergence rates of the estimators $\hat{G}$ and $\hat{\tilde{G}}$ under the $\ell_\infty$ norm.
Lemma 2.
Let $\{X_k\}_{k \in \mathbb{Z}}$ be a stationary PVAR process from Equation (2), and let $X_1, \ldots, X_n$ be a sequence of observations from $\{X_k\}_{k \in \mathbb{Z}}$. Suppose that Assumptions (A1)–(A3) are satisfied. Then, for $T$ and $n$ large enough, with probability no smaller than $1 - \left(\frac{8\Delta \log n}{T} + \frac{2 d^2}{n^2}\right)$, we have
$$\|\hat{G} - G\|_{\infty} \le d\sqrt{\frac{8T\log n}{n}},$$
$$\|\hat{\tilde{G}} - \tilde{G}\|_{\infty} \le d\sqrt{\frac{8T\log n}{n}}.$$
Proof. 
Let $\delta = d\sqrt{\frac{8T\log n}{n}}$; based on Lemma 1, we have
$$P\left(\|\hat{G} - G\|_{\infty} \ge \delta\right) \le \sum_{i,j=1}^{d} P\left(|\hat{G}_{ij} - G_{ij}| \ge \delta\right) \le d^2\, P\left(|\hat{G}_{ij} - G_{ij}| \ge \delta\right) = d^2\, P\left(\left|\frac{1}{n}\sum_{k=1}^{n} U(z_{jk}, x_{ik}) - \mathrm{Gcov}(z_{jk}, x_{ik})\right| \ge \delta\right) \le d^2\, P\left(\left|\frac{1}{n}\sum_{k=1}^{n}(\gamma_k - \mathrm{E}\gamma_k)\right| \ge \delta\right) \le \frac{8\Delta\log n}{T} + \frac{2 d^2}{n^2}.$$
As $T \to \infty$ and $n \to \infty$, assuming $\log n / T \to 0$, $dT/n \to c$ and $d/n \to 0$, we have $\frac{8\Delta\log n}{T} + \frac{2 d^2}{n^2} \to 0$.
Next, we study the convergence of the sample Gini covariance matrix $\hat{\tilde{G}}$.
Let $\delta = d\sqrt{\frac{8T\log n}{n}}$; based on Lemma 1, for the diagonal entries we have
$$P\left(|\hat{\tilde{G}}_{jj} - \tilde{G}_{jj}| \ge \delta\right) = P\left(\left|\frac{1}{n}\sum_{k=1}^{n} U(x_{jk}, x_{jk}) - \mathrm{Gcov}(x_{jk}, x_{jk})\right| \ge \delta\right) = P\left(\left|\frac{1}{n}\sum_{k=1}^{n}(\eta_k - \mathrm{E}\eta_k)\right| \ge \delta\right) \le \frac{8\Delta\log n}{d^2 T} + \frac{2}{n^2}. \qquad (15)$$
Now, for the off-diagonal entries, we have
$$P\left(|\hat{\tilde{G}}_{ij} - \tilde{G}_{ij}| \ge \delta\right) = P\left(\left|\frac{1}{n}\sum_{k=1}^{n} U(x_{jk}, x_{ik}) - \mathrm{Gcov}(x_{jk}, x_{ik})\right| \ge \delta\right) = P\left(\left|\frac{1}{n}\sum_{k=1}^{n}(\gamma_k - \mathrm{E}\gamma_k)\right| \ge \delta\right) \le \frac{8\Delta\log n}{d^2 T} + \frac{2}{n^2}. \qquad (16)$$
Combining Equations (15) and (16), we obtain
$$P\left(\|\hat{\tilde{G}} - \tilde{G}\|_{\infty} \ge \delta\right) \le \sum_{i,j=1}^{d} P\left(|\hat{\tilde{G}}_{ij} - \tilde{G}_{ij}| \ge \delta\right) \le d^2\left(\frac{8\Delta\log n}{d^2 T} + \frac{2}{n^2}\right) = \frac{8\Delta\log n}{T} + \frac{2 d^2}{n^2}.$$
This completes the proof. □

3. Theoretical Properties

Using the Gini covariance matrix equation, Equation (6), we propose to estimate $A$ by
$$\hat{A} = \arg\min_{M \in \mathbb{R}^{d \times d}} \sum_{j,k}|M_{jk}| \quad \text{s.t.} \quad \|\hat{G} - \hat{\tilde{G}} M\|_{\infty} \le \tilde{\lambda}. \qquad (17)$$
The above optimization problem can be further decomposed into $d$ subproblems. The $i$-th row of the transition matrix $A$ is estimated by
$$\hat{A}_{i*} = \arg\min_{v \in \mathbb{R}^{d \times 1}} \|v\|_1 \quad \text{s.t.} \quad \|\hat{G}_{*i} - \hat{\tilde{G}} v\|_{\infty} \le \tilde{\lambda}.$$
Compared to Lasso-type procedures, the proposed method can be solved in parallel, and Equation (17) can be solved efficiently using the parametric simplex method for sparse learning in [21].
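For illustration, each row-wise subproblem above is a small linear program. The sketch below is our own reformulation using SciPy's general-purpose `linprog` solver (the paper itself points to the parametric simplex method of [21]); it splits $v$ into positive and negative parts in the usual way.

```python
import numpy as np
from scipy.optimize import linprog

def g_dantzig_row(g_col, G_tilde_hat, lam):
    """Solve  min ||v||_1  s.t.  ||g_col - G_tilde_hat @ v||_inf <= lam,
    where g_col is the i-th column of G_hat.  Variables are split as
    v = v_plus - v_minus with v_plus, v_minus >= 0."""
    d = G_tilde_hat.shape[1]
    c = np.ones(2 * d)                              # objective: sum(v_plus) + sum(v_minus)
    A_ub = np.vstack([
        np.hstack([G_tilde_hat, -G_tilde_hat]),     #  G~ v - g <= lam
        np.hstack([-G_tilde_hat, G_tilde_hat]),     # -(G~ v - g) <= lam
    ])
    b_ub = np.concatenate([lam + g_col, lam - g_col])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    return res.x[:d] - res.x[d:]

def g_dantzig(G_hat, G_tilde_hat, lam):
    """Stack the d independent row-wise solutions into the estimated
    transition matrix; the subproblems could be solved in parallel."""
    d = G_hat.shape[1]
    return np.vstack([g_dantzig_row(G_hat[:, i], G_tilde_hat, lam) for i in range(d)])
```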
Based on Lemma 2, we can further deliver the rates of convergence of $\hat{A}$ under the matrix $\ell_\infty$ norm and $\ell_1$ norm. We start with some additional notation. For $\alpha \in [0, 1)$, $\eta > 0$, and $M_d > 0$ that may scale with $d$, we define the matrix class
$$\mathcal{M}(\alpha, \eta, M_d) = \left\{M \in \mathbb{R}^{d \times d} : \max_{1 \le j \le d} \sum_{k=1}^{d} |M_{jk}|^{\alpha} \le \eta,\ \|M\|_1 \le M_d\right\}.$$
$\mathcal{M}(\alpha, \eta, M_d)$ requires the transition matrices to be sparse in rows. If $\alpha = 0$, the maximum number of nonzero entries in any row of the transition matrix is at most $\eta$. The class $\mathcal{M}(\alpha, \eta, M_d)$ is also investigated in [12].
Theorem 1.
Let $\{X_k\}_{k \in \mathbb{Z}}$ be a stationary PVAR process from Equation (2). Suppose that Assumptions (A1)–(A3) are satisfied and the transition matrix $A \in \mathcal{M}(\alpha, \eta, M_d)$, and choose the tuning parameter $\tilde{\lambda} = (1 + M_d)\, d\sqrt{\frac{8T\log n}{n}}$. Then, for $T$ and $n$ large enough, with probability no smaller than $1 - \left(\frac{8\Delta\log n}{T} + \frac{2 d^2}{n^2}\right)$, we have
$$\|\hat{A} - A\|_{\infty} \le 2\tilde{\lambda}\,\|\tilde{G}^{-1}\|_1, \qquad (20)$$
$$\|\hat{A} - A\|_1 \le 4\eta\left(2\tilde{\lambda}\,\|\tilde{G}^{-1}\|_1\right)^{1-\alpha}. \qquad (21)$$
Proof. 
We first show that, with large probability, $A$ is feasible for the optimization problem. By the Gini–Yule–Walker equation, we have
$$\|\hat{\tilde{G}} A - \hat{G}\|_{\infty} = \|\hat{\tilde{G}} \tilde{G}^{-1} G - \hat{G}\|_{\infty} = \|\hat{\tilde{G}} \tilde{G}^{-1} G - G + G - \hat{G}\|_{\infty} \le \|(\hat{\tilde{G}} \tilde{G}^{-1} - I) G\|_{\infty} + \|G - \hat{G}\|_{\infty} \le \|\hat{\tilde{G}} - \tilde{G}\|_{\infty}\,\|A\|_1 + \|G - \hat{G}\|_{\infty} \le \|\hat{\tilde{G}} - \tilde{G}\|_{\infty}\, M_d + \|G - \hat{G}\|_{\infty}.$$
The last inequality is due to $A \in \mathcal{M}(\alpha, \eta, M_d)$. By Lemma 2, with probability no smaller than $1 - \left(\frac{8\Delta\log n}{T} + \frac{2 d^2}{n^2}\right)$,
$$\|\hat{\tilde{G}} A - \hat{G}\|_{\infty} \le (1 + M_d)\, d\sqrt{\frac{8T\log n}{n}} = \tilde{\lambda}.$$
Thus $A$ is feasible for the optimization problem. By checking Equation (17), we have $\|\hat{A}\|_1 \le \|A\|_1$ with probability no smaller than $1 - \left(\frac{8\Delta\log n}{T} + \frac{2 d^2}{n^2}\right)$.
Next, we prove Equation (20). Since $A = \tilde{G}^{-1} G$, we have
$$\|\hat{A} - A\|_{\infty} = \|\hat{A} - \tilde{G}^{-1} G\|_{\infty} = \|\tilde{G}^{-1}(\tilde{G}\hat{A} - G)\|_{\infty} = \|\tilde{G}^{-1}(\tilde{G}\hat{A} - \hat{\tilde{G}}\hat{A} + \hat{G} - G + \hat{\tilde{G}}\hat{A} - \hat{G})\|_{\infty} \le \|\tilde{G}^{-1}\|_1\left(\|\tilde{G} - \hat{\tilde{G}}\|_{\infty}\,\|\hat{A}\|_1 + \|\hat{G} - G\|_{\infty} + \|\hat{\tilde{G}}\hat{A} - \hat{G}\|_{\infty}\right) \le 2\tilde{\lambda}\,\|\tilde{G}^{-1}\|_1$$
with probability no smaller than $1 - \left(\frac{8\Delta\log n}{T} + \frac{2 d^2}{n^2}\right)$.
Let $\lambda_1 \ge 0$ and $S_j = \{m : |A_{jm}| \ge \lambda_1\}$, $j = 1, \ldots, d$. Define
$$\tilde{D}(\lambda_1) = \max_{1 \le j \le d} \sum_{m=1}^{d} \min\left(\frac{|A_{jm}|}{\lambda_1}, 1\right).$$
Then, we have
$$\|(\hat{A} - A)_{j*}\|_1 \le \|\hat{A}_{j,S_j^c}\|_1 + \|A_{j,S_j^c}\|_1 + \|(\hat{A} - A)_{j,S_j}\|_1 \le \|A_{j*}\|_1 - \|\hat{A}_{j,S_j}\|_1 + \|A_{j,S_j^c}\|_1 + \|(\hat{A} - A)_{j,S_j}\|_1 \le 2\|A_{j,S_j^c}\|_1 + 2\|(\hat{A} - A)_{j,S_j}\|_1 \le 2\|A_{j,S_j^c}\|_1 + 2\|\hat{A} - A\|_{\infty}\,|S_j| \le 2\tilde{D}(\lambda_1)\left(\lambda_1 + 2\tilde{\lambda}\,\|\tilde{G}^{-1}\|_1\right).$$
Suppose that $\max_{1 \le j \le d}\sum_{i=1}^{d}|A_{ij}|^{\alpha} \le \eta$ and set $\lambda_1 = 2\tilde{\lambda}\,\|\tilde{G}^{-1}\|_1$; we have
$$\lambda_1 |S_j| \le \max_{1 \le j \le d}\sum_{i=1}^{d}\min\left(|A_{ij}|, \lambda_1\right) \le \lambda_1 \max_{1 \le j \le d}\sum_{i=1}^{d}\min\left(\frac{|A_{ij}|^{\alpha}}{\lambda_1^{\alpha}}, 1\right) \le \lambda_1^{1-\alpha}\,\eta.$$
Therefore, we have
$$\|(\hat{A} - A)_{j*}\|_1 \le 4\lambda_1 \tilde{D}(\lambda_1) \le 4\eta\,\lambda_1^{1-\alpha} = 4\eta\left(2\tilde{\lambda}\,\|\tilde{G}^{-1}\|_1\right)^{1-\alpha}.$$
Since the above bound holds for any $j = 1, \ldots, d$, the proof is complete. □

4. Granger Causality

In this section, we discuss the characterization and estimation of Granger causality under the heavy-tailed PVAR model; a practical example verifying the effectiveness of the proposed methods is given in Section 5. First, we give the definition of Granger causality.
Definition 1 (Granger [22]). Let $\{X_t\}_{t \in \mathbb{Z}}$ be a stationary process, where $X_t = (X_{t1}, \ldots, X_{td})^{\top}$. For $j \neq k \in \{1, \ldots, d\}$, $\{X_{tk}\}_{t \in \mathbb{Z}}$ Granger causes $\{X_{tj}\}_{t \in \mathbb{Z}}$ if and only if there exists a measurable set $A$ such that
$$P\left(X_{(t+1)j} \in A \mid \{X_s\}_{s \le t}\right) \neq P\left(X_{(t+1)j} \in A \mid \{X_{s,\backslash k}\}_{s \le t}\right)$$
for all $t \in \mathbb{Z}$, where $X_{s,\backslash k}$ is the subvector obtained by removing $X_{sk}$ from $X_s$.
For a Gaussian VAR process $\{X_t\}_{t \in \mathbb{Z}}$, $\{X_{tk}\}_{t \in \mathbb{Z}}$ Granger causes $\{X_{tj}\}_{t \in \mathbb{Z}}$ if and only if the $(j, k)$ entry of the transition matrix is non-zero [4]. For the heavy-tailed PVAR process, let $\{X_k\}_{k \in \mathbb{Z}}$ be a stationary PVAR process from Equation (2) and define
$$\check{X}_t = \frac{1}{n}\sum_{k=1}^{n} X_{tk}, \quad \check{E}_t = \frac{1}{n}\sum_{k=1}^{n} E_{tk}, \quad 1 \le t \le T.$$
In the next theorem, we show that a similar property holds for the heavy-tailed PVAR process.
Theorem 2.
Let $\{X_k\}_{k \in \mathbb{Z}}$ be a stationary PVAR process from Equation (2). Suppose that Assumptions (A1)–(A3) are satisfied, and that $\mathrm{Gcov}\left(\check{X}_{ti}, \check{X}_{ti} \mid \{\check{X}_{s,\backslash i}\}_{s \le t}\right) \neq 0$ for any $i \in \{1, \ldots, d\}$. Then, for $j \neq i \in \{1, \ldots, d\}$, we have:
1. If $A_{ji} \neq 0$, then $\{\check{X}_{ti}\}_{t \in \mathbb{Z}}$ Granger causes $\{\check{X}_{tj}\}_{t \in \mathbb{Z}}$.
2. If we further assume that $\check{E}_{t+1}$ is independent of $\{\check{X}_s\}_{s \le t}$ for any $t \in \mathbb{Z}$, then $\{\check{X}_{ti}\}_{t \in \mathbb{Z}}$ Granger causes $\{\check{X}_{tj}\}_{t \in \mathbb{Z}}$ if and only if $A_{ji} \neq 0$.
Proof. 
In order to prove Part 1, we only need to show that if $\{\check{X}_{ti}\}_{t \in \mathbb{Z}}$ does not Granger cause $\{\check{X}_{tj}\}_{t \in \mathbb{Z}}$, then $A_{ji} = 0$. Suppose that for some $t \in \mathbb{Z}$ we have
$$P\left(\check{X}_{(t+1)j} \in A \mid \{\check{X}_s\}_{s \le t}\right) = P\left(\check{X}_{(t+1)j} \in A \mid \{\check{X}_{s,\backslash i}\}_{s \le t}\right)$$
for any measurable set $A$. The above equation implies that, conditioning on $\{\check{X}_{s,\backslash i}\}_{s \le t}$, $\check{X}_{(t+1)j}$ is independent of $\{\check{X}_{si}\}_{s \le t}$. Hence, we have
$$\mathrm{Gcov}\left(\check{X}_{(t+1)j}, \check{X}_{ti} \mid \{\check{X}_{s,\backslash i}\}_{s \le t}\right) = 0.$$
Plugging $\check{X}_{(t+1)j} = \sum_{l=1}^{d} A_{jl}\check{X}_{tl} + \check{E}_{(t+1)j}$ into the above equation, we have
$$0 = \mathrm{Gcov}\left(A_{ji}\check{X}_{ti}, \check{X}_{ti} \mid \{\check{X}_{s,\backslash i}\}_{s \le t}\right) + \mathrm{Gcov}\left(\sum_{l \neq i} A_{jl}\check{X}_{tl}, \check{X}_{ti} \mid \{\check{X}_{s,\backslash i}\}_{s \le t}\right) + \mathrm{Gcov}\left(\check{E}_{(t+1)j}, \check{X}_{ti} \mid \{\check{X}_{s,\backslash i}\}_{s \le t}\right).$$
The second term on the right-hand side is 0, since, given $\{\check{X}_{s,\backslash i}\}_{s \le t}$, $\sum_{l \neq i} A_{jl}\check{X}_{tl}$ is constant. Since $\check{E}_{t}$ and $\{\check{X}_s\}_{s < t}$ are independent for any $t \in \mathbb{Z}$, we have $\mathrm{Gcov}(\check{E}_{(t+1)j}, \check{X}_{si}) = 0$ for any $s \le t$. Using Theorem 2.18 in Fang et al. [23], the third term is also 0. Thus, we have $A_{ji}\,\mathrm{Gcov}\left(\check{X}_{ti}, \check{X}_{ti} \mid \{\check{X}_{s,\backslash i}\}_{s \le t}\right) = 0$ and hence $A_{ji} = 0$. This proves Part 1.
Given Part 1, to prove Part 2 it remains to show that $A_{ji} = 0$ implies that $\{\check{X}_{ti}\}_{t \in \mathbb{Z}}$ does not Granger cause $\{\check{X}_{tj}\}_{t \in \mathbb{Z}}$. Since $A_{ji} = 0$, we have
$$p\left(\check{X}_{(t+1)j}, \{\check{X}_{si}\}_{s \le t} \mid \{\check{X}_{s,\backslash i}\}_{s \le t}\right) = p\left(\check{X}_{(t+1)j} \mid \{\check{X}_{s,\backslash i}\}_{s \le t}\right)\, p\left(\{\check{X}_{si}\}_{s \le t} \mid \{\check{X}_{s,\backslash i}\}_{s \le t}\right),$$
where $p$ denotes the conditional probability density function. The equation holds because $\check{E}_{t+1}$ is independent of $\{\check{X}_s\}_{s \le t}$ and because $\sum_{l \neq i} A_{jl}\check{X}_{tl}$ is constant given $\{\check{X}_{s,\backslash i}\}_{s \le t}$. Hence, we have
$$p\left(\check{X}_{(t+1)j} \mid \{\check{X}_s\}_{s \le t}\right) = p\left(\check{X}_{(t+1)j} \mid \{\check{X}_{s,\backslash i}\}_{s \le t}\right),$$
which shows that $\{\check{X}_{ti}\}_{t \in \mathbb{Z}}$ does not Granger cause $\{\check{X}_{tj}\}_{t \in \mathbb{Z}}$. This completes the proof. □
Remark 1.
The assumption that $\mathrm{Gcov}\left(\check{X}_{ti}, \check{X}_{ti} \mid \{\check{X}_{s,\backslash i}\}_{s \le t}\right) \neq 0$ requires that $\check{X}_{ti}$ cannot be perfectly predicted from the past or from the other observed random variables at time $t$. Otherwise, we can simply remove $\{\check{X}_{ti}\}_{t \in \mathbb{Z}}$ from the process $\{\check{X}_t\}_{t \in \mathbb{Z}}$, since predicting $\{\check{X}_{ti}\}_{t \in \mathbb{Z}}$ is trivial. Assuming that $\check{E}_{t+1}$ is independent of $\{\check{X}_s\}_{s \le t}$ for any $t \in \mathbb{Z}$, the Granger causality relations among the processes $\{\{\check{X}_{tj}\}_{t \in \mathbb{Z}} : j = 1, \ldots, d\}$ are characterized by the non-zero entries of $A$. To estimate the Granger causality relations, we define $\tilde{A} = [\tilde{A}_{ji}]$, where
$$\tilde{A}_{ji} := \hat{A}_{ji}\, I\left(|\hat{A}_{ji}| \ge \gamma\right)$$
for some threshold parameter $\gamma$. To evaluate the consistency between $\tilde{A}$ and $A$ regarding the sparsity pattern, we define the function $\mathrm{sign}(x) := I(x > 0) - I(x < 0)$; for a matrix $M$, define $\mathrm{sign}(M) := [\mathrm{sign}(M_{ji})]$. The next theorem gives the rate of $\gamma$ such that $\tilde{A}$ recovers the sparsity pattern of $A$ with high probability.
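In code, the thresholding step is a one-liner; the sketch below (ours, with an arbitrary placeholder value for $\gamma$) recovers the estimated sign pattern.

```python
import numpy as np

def threshold_transition(A_hat, gamma):
    """Hard-threshold the estimated transition matrix: entries with
    absolute value below gamma are set to zero, the rest are kept.
    Non-zero entries of the result are read as estimated Granger
    causality relations."""
    return np.where(np.abs(A_hat) >= gamma, A_hat, 0.0)

# Hypothetical usage: Theorem 3 suggests gamma = 2 * lam_tilde * ||G~^{-1}||_1;
# here we simply plug in a placeholder value.
# A_tilde = threshold_transition(A_hat, gamma=0.1)
# sign_consistent = np.array_equal(np.sign(A_tilde), np.sign(A_true))
```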
Theorem 3.
Let $\{X_k\}_{k \in \mathbb{Z}}$ be a stationary PVAR process from Equation (2). Suppose that Assumptions (A1)–(A3) are satisfied and the transition matrix $A \in \mathcal{M}(\alpha, \eta, M_d)$. If we set
$$\gamma = 2\tilde{\lambda}\,\|\tilde{G}^{-1}\|_1,$$
then, with probability no smaller than $1 - \left(\frac{8\Delta\log n}{T} + \frac{2 d^2}{n^2}\right)$, we have $\mathrm{sign}(\tilde{A}) = \mathrm{sign}(A)$, provided that
$$\min_{(j,i) : |A_{ji}| > 0} |A_{ji}| \ge 2\gamma. \qquad (25)$$
Proof. 
The proof is a consequence of Theorem 1. In detail, if $A_{ji} > 0$, then by Equation (25) we have $A_{ji} \ge 2\gamma$. By Theorem 1, with probability no smaller than $1 - \left(\frac{8\Delta\log n}{T} + \frac{2 d^2}{n^2}\right)$, we have $|\hat{A}_{ji} - A_{ji}| \le \gamma$; thus $\hat{A}_{ji} \ge \gamma$ with probability no smaller than $1 - \left(\frac{8\Delta\log n}{T} + \frac{2 d^2}{n^2}\right)$. By the definition of $\tilde{A}$, we have $\tilde{A}_{ji} = \hat{A}_{ji} \ge \gamma > 0$.
If $A_{ji} < 0$, by Equation (25) we have $A_{ji} \le -2\gamma$. Using Theorem 1, we have $\hat{A}_{ji} \le -\gamma$ with probability no smaller than $1 - \left(\frac{8\Delta\log n}{T} + \frac{2 d^2}{n^2}\right)$. By the definition of $\tilde{A}$, we have $\tilde{A}_{ji} = \hat{A}_{ji} \le -\gamma < 0$.
If $A_{ji} = 0$, using Theorem 1 we have $|\hat{A}_{ji}| < \gamma$ with probability no smaller than $1 - \left(\frac{8\Delta\log n}{T} + \frac{2 d^2}{n^2}\right)$, since $P(|\hat{A}_{ji}| = \gamma) = 0$. By the definition of $\tilde{A}$, we have $\tilde{A}_{ji} = 0$. □

5. Experiments

This section provides some numerical results on synthetic and real data. We consider Lasso and Dantzig selector for comparison.
  • Lasso: an $\ell_1$-regularized estimator defined as
$$\hat{A} = \arg\min_{M \in \mathbb{R}^{d \times d}} \frac{1}{n}\sum_{k=1}^{n} \|Y_k - M X_k\|_F^2 + \rho\sum_{i,j}|M_{ij}|,$$
    where $Y_k := (X_{1k}, \ldots, X_{(T-1)k}) \in \mathbb{R}^{d \times (T-1)}$ and $X_k := (X_{2k}, \ldots, X_{Tk}) \in \mathbb{R}^{d \times (T-1)}$.
  • Dantzig selector: the estimator proposed in [12],
$$\hat{A} = \arg\min_{M \in \mathbb{R}^{d \times d}} \sum_{j,k}|M_{jk}| \quad \text{s.t.} \quad \frac{1}{n}\sum_{k=1}^{n}\|S_k M - S_{1k}\|_{\max} \le \lambda,$$
    where $S_k$ and $S_{1k}$ are the marginal and lag-one sample covariance matrices of Equation (2).
  • G-Dantzig selector: the estimator described in Equation (17).

5.1. Synthetic Data

In this subsection, we compare the performance of our method with the Lasso and the Dantzig selector on synthetic data. The "flare" package in R is used to create the transition matrix according to three patterns: cluster, hub, and random. We rescale $A$ such that $\|A\|_2 = 0.8$, generate $\Sigma$ such that $\|\Sigma\|_2 = 2\|A\|_2$, and then calculate the covariance matrix $\Psi$ of the noise vector $E_t$ as $\Psi = \Sigma - A^{\top}\Sigma A$. Let the periodic time series length be $n = 2000$, the period length $T = 100$ and the dimension $d = 150$; with $A$, $\Sigma$ and $\Psi$, we simulate periodic time series according to the model described in Equation (2). Specifically, we consider the following three models (a simulation sketch is given after the list):
  • Model 1: Data generated from Equation (2), where the errors $E_t \sim N(0, \Psi)$. There are no outliers under Model 1.
  • Model 2: Data generated from Equation (2), where the errors $E_t \sim t(0, \Psi, 2)$. No outliers appear, and the tail of the error distribution is heavier than that of the normal distribution.
  • Model 3: Data generated from Equation (2) with the same error distribution as Model 1. We then replace $X_i$ by $X_i + (-1)^{I(U_1 < 1/2)}(20 + 10 U_2)\,\mathbf{1}$ for $i = 1, \ldots, T/5$, where $U_1$ and $U_2$ are independently generated from the $U[0, 1]$ distribution and $\mathbf{1}$ denotes the all-ones vector; there are $T/5$ outliers that deviate from the majority of the observations.
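For concreteness, the sketch below (our own Python illustration, not the authors' R code; the "flare"-generated transition matrix is taken as given, and the multivariate-$t$ construction is a standard choice we assume) simulates one replicate of each model.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_pvar(A, Psi, n, T, model=1):
    """Simulate n years of a lag-one PVAR series of period T in the style
    of Equation (2): X_t = A X_{t-1} + E_t within each year.  This is our
    own sketch of the data-generating process described above; the exact
    initialization and t-noise convention are assumptions."""
    d = A.shape[0]
    L = np.linalg.cholesky(Psi)
    years = []
    for _ in range(n):
        X = np.zeros((T, d))
        for t in range(1, T):
            if model == 2:
                # Multivariate t-type noise: Gaussian scaled by an
                # inverse-chi factor (heavier tails than the normal).
                scale = np.sqrt(2.0 / rng.chisquare(df=2))
            else:
                scale = 1.0
            X[t] = A @ X[t - 1] + scale * (L @ rng.standard_normal(d))
        if model == 3:
            # Contaminate the first T // 5 time points with large shifts.
            for t in range(T // 5):
                u1, u2 = rng.uniform(size=2)
                X[t] += (-1.0) ** (u1 < 0.5) * (20.0 + 10.0 * u2)
        years.append(X)
    return years
```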
The tuning parameter $\lambda$ is chosen by cross-validation. We construct 20,000 replicates and compare the three methods described above. Table 1 presents the averaged estimation errors under three matrix norms. From this table, we have the following findings: under the Gaussian model (Model 1), the G-Dantzig selector has performance comparable to the Dantzig selector and outperforms the Lasso; under Models 2 and 3, our method is more stable than the Lasso and the Dantzig selector. Thus, we conclude that the G-Dantzig selector is robust to the heavy-tailedness of the data and to the possible presence of outliers.
Figure 1 plots the prediction errors against sparsity for the three transition matrix estimators. We observe that the G-Dantzig selector achieves smaller prediction errors than the Lasso and the Dantzig selector.

5.2. Real Data

We further compared the three methods on real equity data from Yahoo Finance. We collected prices at 10-min intervals on the Monday of each week for the 50 stocks with the highest volatility that were consistently in the S&P 500 index from 1 January 2008 to 31 December 2020. The periodic time series length was therefore $n = 672$, the period length was $T = 36$ and the dimension was $d = 50$. Since we chose the data points on the Monday of each week, the data points can be regarded as independent of each other. Estimates $\hat{A}_s$ of the transition matrix were obtained by the Lasso, the Dantzig selector and the G-Dantzig selector, where $s \in [0, 1]$ is the fraction of non-zero entries of $\hat{A}_s$ and can be controlled through the tuning parameters $\lambda$ and $\rho$. We define the prediction error associated with $\hat{A}_s$ as
$$\epsilon_s := \frac{1}{n}\sum_{k=1}^{n}\frac{1}{T-1}\sum_{t=2}^{T}\left\|X_{tk} - \hat{A}_s X_{(t-1)k}\right\|_2.$$
The resulting prediction errors, plotted against the sparsity level $s$, are shown in Figure 1.

6. Conclusions

In this paper, we developed a Gini–Yule–Walker equation for modeling and estimating high-dimensional time series with heavy-tailed data and possible outliers. Our contributions are three-fold. (i) At the model level, we generalized the Gaussian process to time series with the existence of merely second-order moments. (ii) Methodologically, we proposed a Gini–Yule–Walker-based estimator of the transition matrix; experimental results demonstrate that the proposed estimator is robust to the heavy-tailedness of the data and to the possible presence of outliers. (iii) Theoretically, we proved that the adopted method yields a parametric convergence rate in the matrix $\ell_\infty$ and $\ell_1$ norms. In this manuscript, we focused on the stationary vector autoregressive model, and our method is designed for such stationary processes. The stationarity requirement is a common assumption and is adopted by most recent works; see, for example, [14,24]. We note that there is existing work on time-varying PVAR models; see, for example, [25]. We would like to explore this problem in the future.

Author Contributions

Conceptualization, J.Z. and D.H.; methodology, D.H. and J.Z.; software, J.Z.; validation, J.Z.; formal analysis, J.Z.; investigation, J.Z.; resources, J.Z.; data curation, J.Z.; writing—original draft preparation, J.Z.; writing—review and editing, J.Z.; visualization, J.Z.; supervision, J.Z. and D.H.; project administration, J.Z.; funding acquisition, D.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China [grant number 11531001].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Franses, P.H.; Paap, R. Periodic Time Series Models; Oxford University Press: Oxford, UK, 2004.
2. Bell, W.R.; Holan, S.H.; McElroy, T.S. Economic Time Series: Modeling and Seasonality; Chapman and Hall/CRC Press: Boca Raton, FL, USA, 2012.
3. Aliat, B.; Hamdi, F. On Markov-switching periodic ARMA models. Commun. Stat. Theory Methods 2018, 47, 344–364.
4. Lütkepohl, H. New Introduction to Multiple Time Series Analysis; Springer Science & Business Media: Berlin, Germany, 2005.
5. Baek, C.; Davis, R.A.; Pipiras, V. Sparse seasonal and periodic vector autoregressive modeling. Comput. Stat. Data Anal. 2017, 106, 103–126.
6. Gao, W.; Yang, H.; Yang, L. Change points detection and parameter estimation for multivariate time series. Soft Comput. 2020, 24, 6395–6407.
7. Basu, S.; Michailidis, G. Regularized estimation in sparse high-dimensional time series models. Ann. Stat. 2015, 43, 1535–1567.
8. Bai, P.; Safikhani, A.; Michailidis, G. Multiple change points detection in low rank and sparse high dimensional vector autoregressive models. IEEE Trans. Signal Process. 2020, 68, 3074–3089.
9. Han, F.; Liu, H. Transition matrix estimation in high dimensional vector autoregressive models. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 172–180.
10. Hong, D.; Gu, Q.; Whitehouse, K. High-dimensional time series clustering via cross-predictability. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 642–651.
11. Chen, X.; Xu, M.; Wu, W.B. Regularized estimation of linear functionals of precision matrices for high-dimensional time series. IEEE Trans. Signal Process. 2016, 64, 6459–6470.
12. Han, F.; Lu, H.; Liu, H. A direct estimation of high dimensional stationary vector autoregressions. J. Mach. Learn. Res. 2015, 16, 3115–3150.
13. Qiu, H.; Xu, S.; Han, F.; Liu, H.; Caffo, B. Robust estimation of transition matrices in high dimensional heavy-tailed vector autoregressive processes. In Proceedings of the International Conference on Machine Learning, Lille, France, 7–9 July 2015; pp. 1843–1851.
14. Wong, K.C.; Li, Z.; Tewari, A. Lasso guarantees for β-mixing heavy-tailed time series. Ann. Stat. 2020, 48, 1124–1142.
15. Maleki, M.; Wraith, D.; Mahmoudi, M.R.; Contreras-Reyes, J.E. Asymmetric heavy-tailed vector auto-regressive processes with application to financial data. J. Stat. Comput. Simul. 2020, 90, 324–340.
16. Schechtman, E.; Yitzhaki, S. A measure of association based on Gini's mean difference. Commun. Stat. Theory Methods 1987, 16, 207–231.
17. Schechtman, E.; Yitzhaki, S. On the proper bounds of the Gini correlation. Econ. Lett. 1999, 63, 133–138.
18. Schechtman, E.; Yitzhaki, S. A family of correlation coefficients based on the extended Gini index. J. Econ. Inequal. 2003, 1, 129–146.
19. Yitzhaki, S.; Schechtman, E. The Gini Methodology: A Primer on a Statistical Methodology; Springer Science & Business Media: Berlin, Germany, 2012.
20. Yang, S.S. Linear functions of concomitants of order statistics with application to nonparametric estimation of a regression function. J. Am. Stat. Assoc. 1981, 76, 658–662.
21. Pang, H.; Liu, H.; Vanderbei, R. The fastclime package for linear programming and large-scale precision matrix estimation in R. J. Mach. Learn. Res. 2014, 15, 489–493.
22. Granger, C.W. Testing for causality: A personal viewpoint. J. Econ. Dyn. Control 1980, 2, 329–352.
23. Fang, K.T.; Kotz, S.; Ng, K.W. Symmetric Multivariate and Related Distributions; CRC Press: Boca Raton, FL, USA, 2018.
24. Song, S.; Bickel, P.J. Large vector auto regressions. arXiv 2011, arXiv:1106.3915.
25. Haslbeck, J.M.; Bringmann, L.F.; Waldorp, L.J. A tutorial on estimating time-varying vector autoregressive models. Multivar. Behav. Res. 2020, 4, 1–30.
Figure 1. Prediction errors in stock prices plotted against the sparsity of the estimated transition matrix.
Table 1. Comparison of estimation errors of the three methods under different setups. The standard deviations are shown in parentheses. Here $L_1$, $L_\infty$, $L_F$ are the $L_1$, $L_\infty$ and Frobenius matrix norms, respectively.

| Model | Pattern | Lasso $L_1$ | Lasso $L_\infty$ | Lasso $L_F$ | Dantzig $L_1$ | Dantzig $L_\infty$ | Dantzig $L_F$ | Our Method $L_1$ | Our Method $L_\infty$ | Our Method $L_F$ |
|---|---|---|---|---|---|---|---|---|---|---|
| Model 1 | cluster | 1.13 (0.17) | 1.62 (0.16) | 1.67 (0.55) | 1.03 (0.16) | 1.55 (0.07) | 1.58 (0.03) | 1.23 (0.27) | 1.74 (0.37) | 1.79 (0.43) |
| Model 1 | hub | 1.37 (0.34) | 2.16 (0.45) | 2.39 (0.22) | 1.18 (0.17) | 1.25 (0.07) | 1.36 (0.08) | 1.09 (0.46) | 1.17 (0.28) | 1.28 (0.09) |
| Model 1 | random | 1.41 (0.43) | 2.58 (0.67) | 2.18 (0.17) | 0.97 (0.36) | 1.08 (0.73) | 1.75 (0.18) | 1.46 (0.13) | 1.17 (0.25) | 1.95 (0.15) |
| Model 2 | cluster | 1.25 (0.64) | 1.76 (0.45) | 1.79 (0.37) | 1.13 (0.21) | 1.64 (0.36) | 1.77 (0.27) | 0.95 (0.38) | 0.96 (0.46) | 1.18 (0.47) |
| Model 2 | hub | 1.51 (0.41) | 2.76 (0.56) | 1.58 (0.14) | 1.58 (0.19) | 1.64 (0.43) | 1.78 (0.54) | 1.18 (0.85) | 1.59 (1.12) | 1.09 (0.93) |
| Model 2 | random | 1.36 (0.32) | 2.24 (0.17) | 2.17 (0.17) | 1.24 (0.29) | 1.45 (0.15) | 1.85 (0.09) | 1.13 (0.22) | 1.06 (0.19) | 1.07 (0.27) |
| Model 3 | cluster | 1.54 (0.29) | 2.13 (0.13) | 1.96 (0.18) | 1.12 (0.25) | 1.32 (0.46) | 1.71 (0.52) | 1.05 (0.17) | 1.29 (0.19) | 1.59 (0.19) |
| Model 3 | hub | 1.52 (0.26) | 2.67 (0.18) | 1.41 (0.16) | 1.45 (0.07) | 1.91 (0.09) | 1.45 (0.12) | 1.11 (0.24) | 1.29 (0.06) | 1.31 (0.09) |
| Model 3 | random | 1.78 (0.22) | 2.56 (0.26) | 2.21 (0.54) | 1.66 (0.18) | 1.34 (0.09) | 2.15 (0.12) | 1.03 (0.11) | 1.23 (0.18) | 1.76 (0.06) |