Efficient computation of the Bergsma–Dassios sign covariance

  • Original Paper
  • Computational Statistics

Abstract

In an extension of Kendall’s \(\tau \), Bergsma and Dassios (Bernoulli 20(2):1006–1028, 2014) introduced a covariance measure \(\tau ^*\) for two ordinal random variables that vanishes if and only if the two variables are independent. For a sample of size n, a direct computation of \(t^*\), the empirical version of \(\tau ^*\), requires \(O(n^4)\) operations. We derive an algorithm that computes the statistic using only \(O \left( n^2\log (n)\right) \) operations.

Notes

  1. See https://cran.r-project.org/web/packages/TauStar/index.html.

  2. R code to reproduce the results of Tables 1, 2 and 3 can be found on the first author’s webpage: http://www.stat.washington.edu/~lucaw/public_resources/eff_comp_2015/tables.R.

Corresponding author

Correspondence to Luca Weihs.

Appendices

Appendix 1: Modifications for the V-statistic

This section provides an overview of necessary modifications to Algorithm 2 in order to compute the V-statistic version of \(t^*\). Suppose, as usual, that we have reordered the pairs \((x_1,y_1),\ldots ,(x_n,y_n)\) so that \(x_1\le x_2\le \cdots \le x_n\). Then the V-statistic for \(\tau ^*\) is

$$\begin{aligned} t_V^*&= \frac{1}{n^4}\sum _{1\le i,j,k,l\le n} a(x_i,x_j,x_k,x_l)a(y_i,y_j,y_k,y_l) \\&= \frac{1}{n^4}\left( \sum _{1\le i< j< k< l\le n} b_{ijkl} + \sum _{1\le i< j< k\le n} \frac{b_{ijkk} + b_{ijjk} + b_{iijk}}{2} + \sum _{1\le i < k\le n}\frac{b_{iikk}}{4}\right) \\&= \frac{1}{n^4}\left( \sum _{1\le i< j< k< l\le n} b_{ijkl} + \sum _{1\le i< j< k\le n} \frac{b_{ijkk}+ b_{iijk}}{2} + \sum _{1\le i < k\le n}\frac{b_{iikk}}{4}\right) . \end{aligned}$$

Here, the second equality holds since \(a(x_i,x_j,x_k,x_l)a(y_i,y_j,y_k,y_l)=0\) whenever three or more of the indices \(i,j,k,l\) are equal. The third equality holds because \(b_{ijjk}=0\) for all \(i< j< k\); indeed, \(x_i\le x_j \le x_k\) implies that \(b_{ijjk}\) corresponds to an inseparable collection of points. The coefficients \(\frac{1}{2}\) on \(b_{ijkk},b_{iijk}\) and \(\frac{1}{4}\) on \(b_{iikk}\) are corrective factors: each \(b\) sums over all \(|S_4|=24\) permutations, whereas the number of distinct arrangements of four indices in which exactly two are equal is \(|S_4|/2\), and the number of distinct arrangements in which two pairs are equal is \(|S_4|/4\). Now we may continue to rewrite \(t^*_V\) as

$$\begin{aligned} t_V^*&= \frac{1}{n^4} \left( \sum _{1\le i< j< k< l\le n} b_{ijkl} + \sum _{1\le i< j< k\le n} \frac{b_{ijkk}+ b_{iijk}}{2} + \sum _{1\le i < k\le n}\frac{b_{iikk}}{4} \right) \\&= \frac{1}{n^4} \left( \sum _{1\le i< j< k< l\le n} b_{ijkl} + \sum _{1\le i< j< k\le n} \frac{b_{ijkk}}{2}+ \sum _{1\le i< k< l\le n}\frac{b_{iikl}}{2} + \sum _{1\le i < k\le n}\frac{b_{iikk}}{4} \right) \\&= \frac{1}{n^4}\sum _{2\le k\le n} \Bigg (\sum _{k<l\le n}\left( \sum _{1\le i< j<k} b_{ijkl} + \sum _{1\le i < k}\frac{b_{iikl}}{2}\right) + \sum _{1\le i< j < k} \frac{b_{ijkk}}{2} + \sum _{1\le i < k} \frac{b_{iikk}}{4} \Bigg ). \end{aligned}$$

The outer sum starts at \(k=2\) because the terms \(b_{iikl}\) and \(b_{iikk}\) already contribute for \(i=1\), \(k=2\) (for example, with \(n=2\) the only term is \(b_{1122}/4\)). If \(k=2\) then the sums over \(1\le i<j<k\) are empty, and if \(k=n\) then \(\sum _{k<l\le n}\) is empty; we define empty sums to equal 0. For fixed \(k<l\) we already know, from Sect. 3, how to compute \(\sum _{1\le i < j<k} b_{ijkl}\) efficiently using a red–black tree. Since \(b_{iikl},b_{ijkk}\), and \(b_{iikk}\) can only correspond to inseparable or concordant quadruples, it is easy to see that

$$\begin{aligned} \sum _{1\le i <k} \frac{1}{2} b_{iikl}&= 8\times \left( top(k,l) + bot(k,l)\right) , \end{aligned}$$
(15)
$$\begin{aligned} \sum _{1\le i < j <k} \frac{1}{2} b_{ijkk}&= 8\times \left( {top(k,k) \atopwithdelims ()2} + {bot(k,k)\atopwithdelims ()2}\right) , \end{aligned}$$
(16)
$$\begin{aligned} \sum _{1\le i <k} \frac{1}{4}b_{iikk}&= 4\times \left( top(k,k) + bot(k,k)\right) . \end{aligned}$$
(17)
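As an illustrative sanity check, identities (15)–(17) can be verified by brute force on a small sample without ties. The sketch below rests on assumptions that are not restated in this appendix: that Eq. (2) is the Bergsma–Dassios kernel \(a(z_1,\ldots ,z_4)=\mathrm{sign}(|z_1-z_2|+|z_3-z_4|-|z_1-z_3|-|z_2-z_4|)\), that \(b\) sums \(a(x_\pi )a(y_\pi )\) over all \(\pi \in S_4\), and that, for tie-free data, \(top(k,l)\) and \(bot(k,l)\) count the indices \(i<k\) with \(y_i\) above \(\max (y_k,y_l)\) and below \(\min (y_k,y_l)\), respectively. Sect. 3 gives the authoritative definitions; all function names here are hypothetical.

```python
from itertools import permutations
from math import comb


def sign(v):
    return (v > 0) - (v < 0)


def a(z1, z2, z3, z4):
    # Assumed form of Eq. (2): the Bergsma-Dassios sign kernel.
    return sign(abs(z1 - z2) + abs(z3 - z4) - abs(z1 - z3) - abs(z2 - z4))


def b(pts, idx):
    # Sum of a(x)a(y) over all 24 orderings of the (possibly repeated) indices.
    return sum(
        a(*(pts[i][0] for i in p)) * a(*(pts[i][1] for i in p))
        for p in permutations(idx)
    )


def top(pts, k, l):
    # Assumed definition: points before k lying strictly above y_k and y_l.
    hi = max(pts[k][1], pts[l][1])
    return sum(1 for i in range(k) if pts[i][1] > hi)


def bot(pts, k, l):
    # Assumed definition: points before k lying strictly below y_k and y_l.
    lo = min(pts[k][1], pts[l][1])
    return sum(1 for i in range(k) if pts[i][1] < lo)


def identities_hold(points):
    pts = sorted(points)  # order by x; indices below are 0-based
    n = len(pts)
    for k in range(n):
        t, s = top(pts, k, k), bot(pts, k, k)
        # (16), multiplied by 2: sum of b_ijkk = 16 * (C(top,2) + C(bot,2))
        lhs16 = sum(b(pts, (i, j, k, k)) for i in range(k) for j in range(i + 1, k))
        # (17), multiplied by 4: sum of b_iikk = 16 * (top + bot)
        lhs17 = sum(b(pts, (i, i, k, k)) for i in range(k))
        if lhs16 != 16 * (comb(t, 2) + comb(s, 2)) or lhs17 != 16 * (t + s):
            return False
        for l in range(k + 1, n):
            # (15), multiplied by 2: sum of b_iikl = 16 * (top + bot)
            lhs15 = sum(b(pts, (i, i, k, l)) for i in range(k))
            if lhs15 != 16 * (top(pts, k, l) + bot(pts, k, l)):
                return False
    return True


sample = [(1, 3), (2, 1), (3, 4), (4, 2), (5, 5), (6, 0)]
print(identities_hold(sample))
```

The check costs far more than \(O(n^2\log (n))\) and is only meant to confirm the counting argument; it does not address how ties in the data should be handled.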

Thus we may compute \(t_V^*\) by running Algorithm 2 with the following modifications:

  (i) Change line 9 to the line shown in figure c. This corresponds to the outer sum of \(t^*_V\).

  (ii) After line 14, add the lines shown in figure d. This accounts for the effect of (16) and (17).

  (iii) Change line 23 to the line shown in figure e. This corresponds to (15).

  (iv) Change line 42 to the line shown in figure f.

Finally, note that the modified algorithm for computing \(t^*_V\) remains \(O(n^2\log (n))\).
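The decomposition of \(t^*_V\) into sums over index multisets, on which this appendix relies, can be checked against the \(O(n^4)\) definition on a small sample. The following is a minimal sketch, assuming that Eq. (2) is the Bergsma–Dassios kernel \(a(z_1,\ldots ,z_4)=\mathrm{sign}(|z_1-z_2|+|z_3-z_4|-|z_1-z_3|-|z_2-z_4|)\) and that \(b\) sums \(a(x_\pi )a(y_\pi )\) over all \(\pi \in S_4\). The function names are hypothetical and the code is not the efficient algorithm itself.

```python
from fractions import Fraction
from itertools import combinations, permutations, product


def sign(v):
    return (v > 0) - (v < 0)


def a(z1, z2, z3, z4):
    # Assumed form of Eq. (2): the Bergsma-Dassios sign kernel.
    return sign(abs(z1 - z2) + abs(z3 - z4) - abs(z1 - z3) - abs(z2 - z4))


def b(pts, idx):
    # Sum of a(x)a(y) over all 24 orderings of the (possibly repeated) indices.
    return sum(
        a(*(pts[i][0] for i in p)) * a(*(pts[i][1] for i in p))
        for p in permutations(idx)
    )


def t_v_direct(points):
    # O(n^4) definition: average of a(x)a(y) over all index 4-tuples.
    n = len(points)
    total = sum(
        a(*(points[i][0] for i in idx)) * a(*(points[i][1] for i in idx))
        for idx in product(range(n), repeat=4)
    )
    return Fraction(total, n ** 4)


def t_v_grouped(points):
    # Grouped form: distinct indices, one repeated index, two repeated pairs.
    pts = sorted(points)  # x_1 <= ... <= x_n, as assumed in the text
    n = len(pts)
    total = Fraction(0)
    for i, j, k, l in combinations(range(n), 4):
        total += b(pts, (i, j, k, l))
    for i, j, k in combinations(range(n), 3):
        # b_ijjk is omitted because it vanishes once x is sorted.
        total += Fraction(b(pts, (i, j, k, k)) + b(pts, (i, i, j, k)), 2)
    for i, k in combinations(range(n), 2):
        total += Fraction(b(pts, (i, i, k, k)), 4)
    return total / n ** 4


sample = [(1, 3), (2, 1), (2, 4), (3, 2), (5, 5), (4, 1)]
print(t_v_direct(sample) == t_v_grouped(sample))
```

Exact rational arithmetic via `Fraction` avoids any floating-point doubt when comparing the two values. The sample deliberately contains ties in both coordinates, since the grouping argument only requires the \(x\) values to be sorted.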

Appendix 2: Proof of Lemma 1

By permutation invariance, suppose we have relabeled so that \(x_1\le x_2\le x_3\le x_4\). We have 3 cases:

  (i) The points in A are inseparable. The fact that \(b_{1234}=0\) is an immediate consequence of Eq. (2).

  (ii) The points in A are concordant. In this case we must have that \(x_2<x_3\) and either \(\max (y_1,y_2) < \min (y_3,y_4)\) or \(\min (y_1,y_2) > \max (y_3,y_4)\). By symmetry we need only consider the case when \(\max (y_1,y_2) < \min (y_3,y_4)\). By Eq. (2) it follows, with some thought, that \(a(x_{\pi (1,2,3,4)}) = a(y_{\pi (1,2,3,4)})\) for all permutations \(\pi \in S_4\) and thus, for any \(\pi \in S_4\) we have \(a(x_{\pi (1,2,3,4)})a(y_{\pi (1,2,3,4)}) = a(x_{\pi (1,2,3,4)})^2\) with

    $$\begin{aligned} a(x_{\pi (1,2,3,4)})^2&=\left\{ \begin{array}{lll} 1 &{} \quad \text{ if } \max (x_{\pi (1)}, x_{\pi (2)}) < \min (x_{\pi (3)}, x_{\pi (4)})\text { or} \\ &{} \min (x_{\pi (1)}, x_{\pi (2)}) > \max (x_{\pi (3)}, x_{\pi (4)})\text { or}\\ &{} \max (x_{\pi (1)}, x_{\pi (3)}) < \min (x_{\pi (2)}, x_{\pi (4)})\text { or}\\ &{} \min (x_{\pi (1)}, x_{\pi (3)}) > \max (x_{\pi (2)}, x_{\pi (4)}),\\ 0 &{} \quad \text {otherwise.} \end{array} \right. \end{aligned}$$

    But since \(x_1\le x_2 < x_3\le x_4\) we have that \(a(x_{\pi (1,2,3,4)})a(y_{\pi (1,2,3,4)})=1\) if and only if \(\{\pi (1),\pi (2)\} \in \{\{1,2\}, \{3,4\}\}\) or \(\{\pi (1),\pi (3)\} \in \{\{1,2\}, \{3,4\}\}\). There are exactly \(2^4=16\) such permutations and thus \(b_{1234}=16\).

  (iii) The points in A are discordant. Once again we must have that \(x_2<x_3\). It then follows, by the definition of discordant, that \(y_1\not =y_2\) and \(y_3\not =y_4\). We prove an intermediary lemma:

Lemma 2

Suppose that \((x_1,y_1),\ldots ,(x_4,y_4)\) are discordant and \(x_1\le x_2< x_3\le x_4\). Let

$$\begin{aligned} (x_5,y_5)&=\! (x_1,y_2),&(x_6,y_6)&=\! (x_2,y_1),&(x_7,y_7)&=\! (x_3,y_3),&(x_8,y_8)&=\! (x_4,y_4), \end{aligned}$$

so that \((x_5,y_5),\ldots ,(x_8,y_8)\) are simply \((x_1,y_1),\ldots ,(x_4,y_4)\) with \(y_1,y_2\) switched. Then \(b_{1234} = b_{5678}\). Moreover, the same result holds if we switch \(y_3,y_4\) instead of \(y_1,y_2\).

Proof

First note that, trivially, \(a(x_{\pi (1,2,3,4)}) = a(x_{\pi (5,6,7,8)})\) for any \(\pi \in S_4\). Let \(\pi \) be any permutation so that \(a(x_{\pi (1,2,3,4)})^2=1\). From case (ii) we know that we must have \(\{\pi (1),\pi (2)\} \in \{\{1,2\}, \{3,4\}\}\) or \(\{\pi (1),\pi (3)\} \in \{\{1,2\}, \{3,4\}\}\). Suppose that \(\{\pi (1),\pi (2)\}=\{1,2\}\), and let \(\pi '\in S_4\) be the permutation where

$$\begin{aligned} \pi '(1) = \pi (2),\ \ \pi '(2) = \pi (1),\ \ \pi '(3) = \pi (3),\ \ \pi '(4) = \pi (4). \end{aligned}$$

Then clearly \( a(x_{\pi (1,2,3,4)}) = a(x_{\pi '(1,2,3,4)}) = a(x_{\pi (5,6,7,8)}) = a(x_{\pi '(5,6,7,8)})\) but

$$\begin{aligned} a(y_{\pi (1,2,3,4)})&= a(y_{\pi '(5,6,7,8)}),&a(y_{\pi '(1,2,3,4)})&= a(y_{\pi (5,6,7,8)}), \end{aligned}$$

and thus

$$\begin{aligned}&a(x_{\pi (1,2,3,4)})a(y_{\pi (1,2,3,4)}) + a(x_{\pi '(1,2,3,4)})a(y_{\pi '(1,2,3,4)}) \\&\quad = a(x_{\pi (5,6,7,8)})a(y_{\pi (5,6,7,8)}) + a(x_{\pi '(5,6,7,8)})a(y_{\pi '(5,6,7,8)}). \end{aligned}$$

As we may perform a similar procedure to all \(\pi \in S_4\) with \(a(x_{\pi (1,2,3,4)})^2=1\) (changing the choice of \(\pi '\)), we see that \(b_{1234} = b_{5678}\) as claimed.

Finally, pairing \(\pi \) with \(\pi '\) given by

$$\begin{aligned} \pi '(1) = \pi (1),\ \ \pi '(2) = \pi (2),\ \ \pi '(3) = \pi (4),\ \ \pi '(4) = \pi (3) \end{aligned}$$

shows that this result still holds if we switch \(y_3,y_4\) instead of \(y_1,y_2\). \(\square \)
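Lemma 2 can be illustrated numerically on one discordant configuration. The sketch below assumes that Eq. (2) is the Bergsma–Dassios kernel \(a(z_1,\ldots ,z_4)=\mathrm{sign}(|z_1-z_2|+|z_3-z_4|-|z_1-z_3|-|z_2-z_4|)\) and that \(b_{1234}\) sums \(a(x_\pi )a(y_\pi )\) over all \(\pi \in S_4\); the function names and the example points are hypothetical.

```python
from itertools import permutations


def sign(v):
    return (v > 0) - (v < 0)


def a(z1, z2, z3, z4):
    # Assumed form of Eq. (2): the Bergsma-Dassios sign kernel.
    return sign(abs(z1 - z2) + abs(z3 - z4) - abs(z1 - z3) - abs(z2 - z4))


def b_value(points):
    # b_1234 for four points: sum of a(x_pi) * a(y_pi) over all 24 permutations.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return sum(
        a(*(xs[i] for i in pi)) * a(*(ys[i] for i in pi))
        for pi in permutations(range(4))
    )


# x_1 <= x_2 < x_3 <= x_4 with y_1 < y_2, y_3 < y_4, y_2 > y_3, y_1 < y_4.
discordant = [(1, 1), (2, 3), (3, 2), (4, 4)]
switched_12 = [(1, 3), (2, 1), (3, 2), (4, 4)]  # y_1 and y_2 switched
switched_34 = [(1, 1), (2, 3), (3, 4), (4, 2)]  # y_3 and y_4 switched

print(b_value(discordant), b_value(switched_12), b_value(switched_34))
```

All three values agree, which is what the pairing of \(\pi \) with \(\pi '\) in the proof predicts for this example.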

By Lemma 2, we may assume that \(x_1\le x_2<x_3\le x_4\) and \(y_1<y_2\) and \(y_3<y_4\). Note that, by the definition of discordant, we must have that \(y_2 > y_3\) and \(y_1<y_4\). From case (ii) we know that there are only 16 permutations \(\pi \) for which \(a(x_{\pi (1,2,3,4)}) \not = 0\) and they satisfy

$$\begin{aligned} \{\pi (1),\pi (2)\} \in \{\{1,2\}, \{3,4\}\}\text { or }\{\pi (1),\pi (3)\} \in \{\{1,2\}, \{3,4\}\}. \end{aligned}$$

If \(\{\pi (1),\pi (2)\} \in \{\{1,2\}, \{3,4\}\}\) and \(\{\pi (1),\pi (3)\} \in \{\{1,4\},\{2,3\}\}\), then we have \(a(y_{\pi (1,2,3,4)}) = 0\). Similarly, \(a(y_{\pi (1,2,3,4)}) = 0\) if \(\{\pi (1),\pi (3)\} \in \{\{1,2\}, \{3,4\}\}\) and \(\{\pi (1),\pi (2)\} \in \{\{1,4\},\{2,3\}\}\). This leaves only 8 permutations \(\pi \in S_4\) for which \(a(x_{\pi (1,2,3,4)})a(y_{\pi (1,2,3,4)})\) may be non-zero, and we check these explicitly:

$$\begin{aligned} a(x_{1,2,3,4})a(y_{1,2,3,4})&= (-1)\times 1 = -1,&a(x_{2,1,4,3})a(y_{2,1,4,3})&= (-1)\times 1 = -1, \\ a(x_{3,4,1,2})a(y_{3,4,1,2})&= (-1)\times 1 = -1,&a(x_{4,3,2,1})a(y_{4,3,2,1})&= (-1)\times 1 = -1, \\ a(x_{1,3,2,4})a(y_{1,3,2,4})&= 1\times (-1) = -1,&a(x_{2,4,1,3})a(y_{2,4,1,3})&= 1\times (-1) = -1, \\ a(x_{3,1,4,2})a(y_{3,1,4,2})&= 1\times (-1) = -1,&a(x_{4,2,3,1})a(y_{4,2,3,1})&= 1\times (-1) = -1. \end{aligned}$$

We conclude that \(b_{1234} = -8\) as claimed.
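The three values in Lemma 1 can be reproduced by enumeration. The sketch below fixes \(x=(1,2,3,4)\), runs through every arrangement of \(y=(1,2,3,4)\), and tallies the resulting \(b_{1234}\); it then evaluates one configuration with constant \(y\), which we take to be inseparable. It assumes that Eq. (2) is the Bergsma–Dassios kernel \(a(z_1,\ldots ,z_4)=\mathrm{sign}(|z_1-z_2|+|z_3-z_4|-|z_1-z_3|-|z_2-z_4|)\); the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def sign(v):
    return (v > 0) - (v < 0)


def a(z1, z2, z3, z4):
    # Assumed form of Eq. (2): the Bergsma-Dassios sign kernel.
    return sign(abs(z1 - z2) + abs(z3 - z4) - abs(z1 - z3) - abs(z2 - z4))


def b_value(points):
    # b_1234 for four points: sum of a(x_pi) * a(y_pi) over all 24 permutations.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return sum(
        a(*(xs[i] for i in pi)) * a(*(ys[i] for i in pi))
        for pi in permutations(range(4))
    )


# Tally b over all 24 tie-free arrangements of y against x = (1, 2, 3, 4).
counts = Counter(
    b_value(list(zip((1, 2, 3, 4), ys))) for ys in permutations((1, 2, 3, 4))
)
print(counts)

# A configuration with constant y, taken here as an inseparable example.
print(b_value([(1, 5), (2, 5), (3, 5), (4, 5)]))
```

Of the 24 tie-free arrangements, 8 are concordant and 16 are discordant, so the values \(16\) and \(-8\) cancel in total, consistent with \(\tau ^*\) vanishing under independence.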

Cite this article

Weihs, L., Drton, M. & Leung, D. Efficient computation of the Bergsma–Dassios sign covariance. Comput Stat 31, 315–328 (2016). https://doi.org/10.1007/s00180-015-0639-x
