Aggregating inconsistent information: Ranking and clustering

Authors:
Nir Ailon (Google Research, New York, NY)
Moses Charikar (Princeton University, Princeton, NJ)
Alantha Newman (DIMACS, Rutgers University, New Brunswick, NJ)

Journal of the ACM, Volume 55, Issue 5, Article No. 23, pp. 1–27. https://doi.org/10.1145/1411509.1411513

Published: 05 November 2008

Abstract

We address optimization problems in which we are given contradictory pieces of input information and the goal is to find a globally consistent solution that minimizes the extent of disagreement with the respective inputs. Specifically, the problems we address are rank aggregation, the feedback arc set problem on tournaments, and correlation and consensus clustering. We show that for all these problems (and various weighted versions of them), we can obtain improved approximation factors using essentially the same remarkably simple algorithm. Additionally, we almost settle a long-standing conjecture of Bang-Jensen and Thomassen and show that unless NP⊆BPP, there is no polynomial time algorithm for the problem of minimum feedback arc set in tournaments.
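
As a small illustration of the objective described above, the following sketch scores a candidate ranking by how much it disagrees with a set of input rankings. Counting pairs of items ordered differently (the Kendall tau distance) is an assumed reading of "extent of disagreement" for rank aggregation. The function names and the sample data are hypothetical and are not taken from the article, and the sketch does not reproduce the article's algorithm.

```python
from itertools import combinations


def pairwise_disagreements(ranking_a, ranking_b):
    """Count item pairs that the two rankings order differently."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    return sum(
        1
        for x, y in combinations(ranking_a, 2)
        if (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
    )


def total_disagreement(candidate, inputs):
    """Sum the pairwise disagreements of a candidate against every input."""
    return sum(pairwise_disagreements(candidate, r) for r in inputs)


# Hypothetical contradictory inputs: three rankings forming a cycle,
# so no single ranking can agree with all of them.
inputs = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
score = total_disagreement(["a", "b", "c"], inputs)
```

With cyclic inputs like these, every candidate ranking has a strictly positive score, which is the sense in which the input is inconsistent and the best consistent answer must be found by minimization.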

References

Ailon, N. 2008. Aggregation of partial rankings, p-ratings and top-m lists. Algorithmica. DOI 10.1007/s00453-008-9211-1.
Ailon, N., and Charikar, M. 2005. Fitting tree metrics: Hierarchical clustering and phylogeny. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS) (Pittsburgh, PA). IEEE Computer Society Press, Los Alamitos, CA, 73--82.
Ailon, N., Charikar, M., and Newman, A. 2005. Aggregating inconsistent information: Ranking and clustering. In Proceedings of the 37th Annual Symposium on the Theory of Computing (STOC) (Boston, MA). ACM, New York, 684--693.
Ailon, N., and Mohri, M. 2008. An efficient reduction of ranking to classification. In Conference on Learning Theory (COLT) (Helsinki, Finland).
Alon, N. 2006. Ranking tournaments. SIAM J. Disc. Math. 20, 1, 137--142.
Alon, N., and Spencer, J. H. 1992. The Probabilistic Method. Wiley, New York.
Arora, S., Frieze, A., and Kaplan, H. 1996. A new rounding procedure for the assignment problem with applications to dense graph arrangement problems. In Proceedings of the 37th Annual Symposium on the Foundations of Computer Science (FOCS) (Burlington, VT). IEEE Computer Society Press, Los Alamitos, CA, 24--33.
Balcan, M.-F., Bansal, N., Beygelzimer, A., Coppersmith, D., Langford, J., and Sorkin, G. B. 2007. Robust reductions from ranking to classification. In Proceedings of the Conference on Learning Theory (COLT). Lecture Notes in Computer Science, vol. 4539. Springer-Verlag, New York, 604--619.
Bang-Jensen, J., and Thomassen, C. 1992. A polynomial algorithm for the 2-path problem in semicomplete graphs. SIAM J. Disc. Math. 5, 3, 366--376.
Bansal, N., Blum, A., and Chawla, S. 2004. Correlation clustering. Mach. Learn. J. (Special Issue on Theoretical Advances in Data Clustering) 56, 1--3, 89--113. (Extended abstract appeared in FOCS 2002, pages 238--247.)
Bartholdi, J., Tovey, C. A., and Trick, M. 1989. Voting schemes for which it can be difficult to tell who won the election. Social Choice Welf. 6, 2, 157--165.
Borda, J. C. 1781. Mémoire sur les élections au scrutin. Histoire de l'Académie Royale des Sciences.
Cai, M.-C., Deng, X., and Zang, W. 2001. An approximation algorithm for feedback vertex sets in tournaments. SIAM J. Comput. 30, 6, 1993--2007.
Charikar, M., Guruswami, V., and Wirth, A. 2003. Clustering with qualitative information. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS) (Boston, MA). IEEE Computer Society Press, Los Alamitos, CA, 524--533.
Chaudhuri, K., Chen, K., Mihaescu, R., and Rao, S. 2006. On the tandem duplication-random loss model of genome rearrangement. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'06). ACM, New York, 564--570.
Condorcet, M.-J. 1785. Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix.
Coppersmith, D., Fleischer, L., and Rudra, A. 2006. Ordering by weighted number of wins gives a good ranking for weighted tournaments. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'06). ACM, New York, 776--782.
Diaconis, P., and Graham, R. 1977. Spearman's footrule as a measure of disarray. J. Roy. Stat. Soc., Ser. B 39, 2, 262--268.
Dinur, I., and Safra, S. 2002. On the importance of being biased. In Proceedings of the 34th Annual Symposium on the Theory of Computing (STOC). ACM, New York, 33--42.
Dwork, C., Kumar, R., Naor, M., and Sivakumar, D. 2001a. Rank aggregation methods for the web. In Proceedings of the 10th International Conference on the World Wide Web (WWW10) (Hong Kong, China), 613--622.
Dwork, C., Kumar, R., Naor, M., and Sivakumar, D. 2001b. Rank aggregation revisited. Manuscript. (Available from: http://www.eecs.harvard.edu/~michaelm/CS222/rank2.pdf.)
Even, G., Naor, J. S., Sudan, M., and Schieber, B. 1998. Approximating minimum feedback sets and multicuts in directed graphs. Algorithmica 20, 2, 151--174.
Fagin, R., Kumar, R., and Sivakumar, D. 2003. Efficient similarity search and classification via rank aggregation. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (San Diego, CA). ACM, New York, 301--312.
Filkov, V., and Skiena, S. 2003. Integrating microarray data by consensus clustering. In Proceedings of International Conference on Tools with Artificial Intelligence (ICTAI) (Sacramento, CA). 418--425.
Frieze, A., and Kannan, R. 1999. Quick approximations to matrices and applications. Combinatorica 19, 2, 175--220.
Gionis, A., Mannila, H., and Tsaparas, P. 2005. Clustering aggregation. In Proceedings of the 21st International Conference on Data Engineering (ICDE) (Tokyo, Japan).
Håstad, J. 2001. Some optimal inapproximability results. J. ACM 48, 798--859.
Karp, R. M. 1972. Reducibility among combinatorial problems. In Complexity of Computer Computations. Plenum Press, New York, 85--104.
Kemeny, J. G. 1959. Mathematics without numbers. Daedalus 88, 571--591.
Kemeny, J., and Snell, J. 1962. Mathematical Models in the Social Sciences. Blaisdell, New York. (Reprinted by MIT Press, Cambridge, 1972.)
Kenyon-Mathieu, C., and Schudy, W. 2007. How to rank with few errors. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing (STOC) (New York, NY). ACM, New York, 95--103.
Newman, A. 2000. Approximating the maximum acyclic subgraph. M.S. thesis, Massachusetts Institute of Technology, Cambridge, MA.
Newman, A., and Vempala, S. 2001. Fences are futile: On relaxations for the linear ordering problem. In Proceedings of the 8th Conference on Integer Programming and Combinatorial Optimization (IPCO). 333--347.
Potts, C. N. 1980. An algorithm for the single machine sequencing problem with precedence constraints. Math. Prog. 13, 78--87.
Seymour, P. 1995. Packing directed circuits fractionally. Combinatorica 15, 281--288.
Speckenmeyer, E. 1989. On feedback problems in digraphs. Graph Theoretic Concepts in Computer Science, Lecture Notes in Computer Science, vol. 411, Springer-Verlag, New York, 218--231.
Strehl, A. 2002. PhD dissertation. Ph.D. thesis, University of Texas at Austin.
Wakabayashi, Y. 1998. The complexity of computing medians of relations. Resenhas 3, 3, 323--349.
Williamson, D. P., and van Zuylen, A. 2007. Deterministic algorithms for rank aggregation and other ranking and clustering problems. In Proceedings of the 5th Workshop on Approximation and Online Algorithms (WAOA).

Index Terms

Aggregating inconsistent information: Ranking and clustering
1. Mathematics of computing
  1. Discrete mathematics
2. Theory of computation
  1. Design and analysis of algorithms

Published in

Journal of the ACM Volume 55, Issue 5
October 2008
164 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/1411509

Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 5 November 2008
- Accepted: 1 August 2008
- Revised: 1 May 2008
- Received: 1 January 2006
Author Tags
Rank aggregation
consensus clustering
correlation clustering
minimum feedback arc-set
tournaments
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 369
- Total Downloads: 3,275
- Downloads (last 12 months): 228
- Downloads (last 6 weeks): 30