Correlation Clustering

  • Reference work entry

Synonyms

Clustering with advice; Clustering with constraints; Clustering with qualitative information; Clustering with side information

Definition

In its rawest form, correlation clustering is a graph optimization problem. Consider a clustering C to be a mapping from the elements to be clustered, V, to the set \(\{1,\ldots,\vert V \vert \}\), so that u and v are in the same cluster if and only if C[u] = C[v]. Given a collection of items in which each pair (u, v) has two weights \(w_{uv}^{+}\) and \(w_{uv}^{-}\), we must find a clustering C that minimizes

$$\sum_{C[u]=C[v]} w_{uv}^{-} + \sum_{C[u]\neq C[v]} w_{uv}^{+} \tag{1}$$

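or, equivalently, since expressions (1) and (2) always sum to the total weight \(\sum_{u,v}(w_{uv}^{+} + w_{uv}^{-})\), which does not depend on C, maximizes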

$$\sum_{C[u]=C[v]} w_{uv}^{+} + \sum_{C[u]\neq C[v]} w_{uv}^{-} \tag{2}$$

Note that although \(w_{uv}^{+}\) and \(w_{uv}^{-}\) may be thought of as positive and negative evidence towards coassociation, the actual weights are nonnegative.
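
As an illustration only, the two objectives can be evaluated directly for a small instance. The sketch below is not taken from the entry: the function names, the dictionary representation of the weights (keyed by sorted pairs, with missing pairs treated as weight 0), and the three-item example are all assumptions made for the demonstration. The exhaustive search is included only to make "find a clustering C that minimizes (1)" concrete; it is feasible only for very small V.

```python
from itertools import combinations, product


def disagreement_cost(clustering, w_plus, w_minus):
    """Objective (1): w^- for pairs placed together, w^+ for pairs split apart."""
    cost = 0.0
    for u, v in combinations(sorted(clustering), 2):
        if clustering[u] == clustering[v]:
            cost += w_minus.get((u, v), 0.0)
        else:
            cost += w_plus.get((u, v), 0.0)
    return cost


def agreement_score(clustering, w_plus, w_minus):
    """Objective (2): w^+ for pairs placed together, w^- for pairs split apart."""
    score = 0.0
    for u, v in combinations(sorted(clustering), 2):
        if clustering[u] == clustering[v]:
            score += w_plus.get((u, v), 0.0)
        else:
            score += w_minus.get((u, v), 0.0)
    return score


def brute_force_min(items, w_plus, w_minus):
    """Try every labelling in {1, ..., |V|}^V and keep one of minimum cost (1)."""
    items = sorted(items)
    best_cost, best_clustering = None, None
    for labels in product(range(1, len(items) + 1), repeat=len(items)):
        candidate = dict(zip(items, labels))
        cost = disagreement_cost(candidate, w_plus, w_minus)
        if best_cost is None or cost < best_cost:
            best_cost, best_clustering = cost, candidate
    return best_cost, best_clustering


# Hypothetical weights for three items; all weights are nonnegative.
w_plus = {("a", "b"): 2.0, ("a", "c"): 0.0, ("b", "c"): 1.0}
w_minus = {("a", "b"): 0.0, ("a", "c"): 3.0, ("b", "c"): 0.5}
total_weight = sum(w_plus.values()) + sum(w_minus.values())

C = {"a": 1, "b": 1, "c": 2}  # a and b together, c on its own
cost = disagreement_cost(C, w_plus, w_minus)
score = agreement_score(C, w_plus, w_minus)
print(cost, score, cost + score == total_weight)
print(brute_force_min(C.keys(), w_plus, w_minus))
```

For any clustering, the two values add up to the total weight of the instance, which is why minimizing (1) and maximizing (2) select the same clusterings.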

Motivation and Background

The notion of clustering with advice, that is...

Copyright information

© 2017 Springer Science+Business Media New York

About this entry

Cite this entry

Wirth, A. (2017). Correlation Clustering. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_176
