research-article

Aggregating inconsistent information: Ranking and clustering

Published: 05 November 2008

Abstract

We address optimization problems in which we are given contradictory pieces of input information and the goal is to find a globally consistent solution that minimizes the extent of disagreement with the respective inputs. Specifically, the problems we address are rank aggregation, the feedback arc set problem on tournaments, and correlation and consensus clustering. We show that for all these problems (and various weighted versions of them), we can obtain improved approximation factors using essentially the same remarkably simple algorithm. Additionally, we almost settle a long-standing conjecture of Bang-Jensen and Thomassen and show that unless NP⊆BPP, there is no polynomial time algorithm for the problem of minimum feedback arc set in tournaments.
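
The abstract does not spell out the "remarkably simple algorithm", so the following is only a minimal sketch of one quicksort-like pivoting scheme for rank aggregation, under stated assumptions: inputs are full rankings over the same items, pairwise preferences are decided by majority vote, and ties are broken by item order. The function names (`kendall_tau_distance`, `total_disagreement`, `majority_prefers`, `pivot_rank`) are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import random
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b):
    """Count the pairs of items that the two rankings order differently."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    return sum(
        1
        for x, y in combinations(rank_a, 2)
        if (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
    )


def total_disagreement(candidate, rankings):
    """Sum of pairwise disagreements between a candidate and every input."""
    return sum(kendall_tau_distance(candidate, r) for r in rankings)


def majority_prefers(rankings):
    """Build a pairwise 'x beats y' predicate from majority vote.

    Assumption: ties are broken by comparing the items themselves, so that
    exactly one of prefers(x, y) and prefers(y, x) holds for x != y.
    """
    positions = [{item: i for i, item in enumerate(r)} for r in rankings]

    def prefers(x, y):
        votes_x = sum(1 for pos in positions if pos[x] < pos[y])
        votes_y = len(positions) - votes_x
        if votes_x != votes_y:
            return votes_x > votes_y
        return x < y  # arbitrary tie-break (assumption)

    return prefers


def pivot_rank(items, prefers, rng):
    """Order items by recursive random pivoting, as in quicksort.

    Items that beat the pivot go to its left, all others to its right.
    """
    if len(items) <= 1:
        return list(items)
    pivot = rng.choice(items)
    left = [v for v in items if v != pivot and prefers(v, pivot)]
    right = [v for v in items if v != pivot and not prefers(v, pivot)]
    return pivot_rank(left, prefers, rng) + [pivot] + pivot_rank(right, prefers, rng)


if __name__ == "__main__":
    votes = [
        ["a", "b", "c", "d"],
        ["a", "b", "c", "d"],
        ["b", "a", "d", "c"],
    ]
    consensus = pivot_rank(votes[0], majority_prefers(votes), random.Random(0))
    print(consensus)                             # ['a', 'b', 'c', 'd']
    print(total_disagreement(consensus, votes))  # 2
```

In this small example the majority preferences happen to be consistent (no cycles), so every choice of pivot yields the same order. When the majority preferences contain cycles, the output depends on the random pivots, and the quantity of interest is the expected disagreement.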

References

  1. Ailon, N. 2008. Aggregation of partial rankings, p-ratings and top-m lists. Algorithmica. DOI 10.1007/s00453-008-9211-1.
  2. Ailon, N., and Charikar, M. 2005. Fitting tree metrics: Hierarchical clustering and phylogeny. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS) (Pittsburgh, PA). IEEE Computer Society Press, Los Alamitos, CA, 73--82.
  3. Ailon, N., Charikar, M., and Newman, A. 2005. Aggregating inconsistent information: Ranking and clustering. In Proceedings of the 37th Annual Symposium on the Theory of Computing (STOC) (Boston, MA). ACM, New York, 684--693.
  4. Ailon, N., and Mohri, M. 2008. An efficient reduction of ranking to classification. In Conference on Learning Theory (COLT) (Helsinki, Finland).
  5. Alon, N. 2006. Ranking tournaments. SIAM J. Disc. Math. 20, 1, 137--142.
  6. Alon, N., and Spencer, J. H. 1992. The Probabilistic Method. Wiley, New York.
  7. Arora, S., Frieze, A., and Kaplan, H. 1996. A new rounding procedure for the assignment problem with applications to dense graph arrangement problems. In Proceedings of the 37th Annual Symposium on the Foundations of Computer Science (FOCS) (Burlington, VT). IEEE Computer Society Press, Los Alamitos, CA, 24--33.
  8. Balcan, M.-F., Bansal, N., Beygelzimer, A., Coppersmith, D., Langford, J., and Sorkin, G. B. 2007. Robust reductions from ranking to classification. In Proceedings of the Conference on Learning Theory (COLT). Lecture Notes in Computer Science, vol. 4539. Springer-Verlag, New York, 604--619.
  9. Bang-Jensen, J., and Thomassen, C. 1992. A polynomial algorithm for the 2-path problem in semicomplete graphs. SIAM J. Disc. Math. 5, 3, 366--376.
  10. Bansal, N., Blum, A., and Chawla, S. 2004. Correlation clustering. Mach. Learn. J. (Special Issue on Theoretical Advances in Data Clustering) 56, 1--3, 89--113. (Extended abstract appeared in FOCS 2002, pages 238--247.)
  11. Bartholdi, J., Tovey, C. A., and Trick, M. 1989. Voting schemes for which it can be difficult to tell who won the election. Social Choice Welf. 6, 2, 157--165.
  12. Borda, J. C. 1781. Mémoire sur les élections au scrutin. Histoire de l'Académie Royale des Sciences.
  13. Cai, M.-C., Deng, X., and Zang, W. 2001. An approximation algorithm for feedback vertex sets in tournaments. SIAM J. Comput. 30, 6, 1993--2007.
  14. Charikar, M., Guruswami, V., and Wirth, A. 2003. Clustering with qualitative information. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS) (Boston, MA). IEEE Computer Society Press, Los Alamitos, CA, 524--533.
  15. Chaudhuri, K., Chen, K., Mihaescu, R., and Rao, S. 2006. On the tandem duplication-random loss model of genome rearrangement. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'06). ACM, New York, 564--570.
  16. Condorcet, M.-J. 1785. Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix.
  17. Coppersmith, D., Fleischer, L., and Rudra, A. 2006. Ordering by weighted number of wins gives a good ranking for weighted tournaments. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'06). ACM, New York, 776--782.
  18. Diaconis, P., and Graham, R. 1977. Spearman's footrule as a measure of disarray. J. Roy. Stat. Soc., Ser. B 39, 2, 262--268.
  19. Dinur, I., and Safra, S. 2002. On the importance of being biased. In Proceedings of the 34th Annual Symposium on the Theory of Computing (STOC). ACM, New York, 33--42.
  20. Dwork, C., Kumar, R., Naor, M., and Sivakumar, D. 2001a. Rank aggregation methods for the web. In Proceedings of the 10th International Conference on the World Wide Web (WWW10) (Hong Kong, China), 613--622.
  21. Dwork, C., Kumar, R., Naor, M., and Sivakumar, D. 2001b. Rank aggregation revisited. Manuscript. (Available from: http://www.eecs.harvard.edu/~michaelm/CS222/rank2.pdf.)
  22. Even, G., Naor, J. S., Sudan, M., and Schieber, B. 1998. Approximating minimum feedback sets and multicuts in directed graphs. Algorithmica 20, 2, 151--174.
  23. Fagin, R., Kumar, R., and Sivakumar, D. 2003. Efficient similarity search and classification via rank aggregation. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (San Diego, CA). ACM, New York, 301--312.
  24. Filkov, V., and Skiena, S. 2003. Integrating microarray data by consensus clustering. In Proceedings of International Conference on Tools with Artificial Intelligence (ICTAI) (Sacramento, CA). 418--425.
  25. Frieze, A., and Kannan, R. 1999. Quick approximations to matrices and applications. Combinatorica 19, 2, 175--220.
  26. Gionis, A., Mannila, H., and Tsaparas, P. 2005. Clustering aggregation. In Proceedings of the 21st International Conference on Data Engineering (ICDE) (Tokyo, Japan).
  27. Håstad, J. 2001. Some optimal inapproximability results. J. ACM 48, 798--859.
  28. Karp, R. M. 1972. Reducibility among combinatorial problems. In Complexity of Computer Computations. Plenum Press, New York, 85--104.
  29. Kemeny, J. G. 1959. Mathematics without numbers. Daedalus 88, 571--591.
  30. Kemeny, J., and Snell, J. 1962. Mathematical Models in the Social Sciences. Blaisdell, New York. (Reprinted by MIT Press, Cambridge, 1972.)
  31. Kenyon-Mathieu, C., and Schudy, W. 2007. How to rank with few errors. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing (STOC) (New York, NY). ACM, New York, 95--103.
  32. Newman, A. 2000. Approximating the maximum acyclic subgraph. M.S. thesis, Massachusetts Institute of Technology, Cambridge, MA.
  33. Newman, A., and Vempala, S. 2001. Fences are futile: On relaxations for the linear ordering problem. In Proceedings of the 8th Conference on Integer Programming and Combinatorial Optimization (IPCO). 333--347.
  34. Potts, C. N. 1980. An algorithm for the single machine sequencing problem with precedence constraints. Math. Prog. 13, 78--87.
  35. Seymour, P. 1995. Packing directed circuits fractionally. Combinatorica 15, 281--288.
  36. Speckenmeyer, E. 1989. On feedback problems in digraphs. Graph Theoretic Concepts in Computer Science, Lecture Notes in Computer Science, vol. 411, Springer-Verlag, New York, 218--231.
  37. Strehl, A. 2002. PhD dissertation. Ph.D. thesis, University of Texas at Austin.
  38. Wakabayashi, Y. 1998. The complexity of computing medians of relations. Resenhas 3, 3, 323--349.
  39. Williamson, D. P., and van Zuylen, A. 2007. Deterministic algorithms for rank aggregation and other ranking and clustering problems. In Proceedings of the 5th Workshop on Approximation and Online Algorithms (WAOA).

      • Published in

        Journal of the ACM, Volume 55, Issue 5
        October 2008
        164 pages
        ISSN: 0004-5411
        EISSN: 1557-735X
        DOI: 10.1145/1411509

        Copyright © 2008 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 5 November 2008
        • Accepted: 1 August 2008
        • Revised: 1 May 2008
        • Received: 1 January 2006

        Qualifiers

        • research-article
        • Research
        • Refereed
