ABSTRACT
Cluster analysis of ranking data, which arises in consumer questionnaires, voting forms, and other surveys of preferences, attempts to identify typical groups of rank choices. Empirically measured rankings are often incomplete, i.e., different numbers of filled rank positions cause heterogeneity in the data. We propose a mixture approach for clustering heterogeneous rank data: rankings of different lengths can be described and compared by means of a single probabilistic model, and a maximum entropy approach avoids hidden assumptions about missing rank positions. Parameter estimators and an efficient EM algorithm for unsupervised inference are derived for the ranking mixture model. Experiments on both synthetic and real-world data demonstrate significantly improved parameter estimates on heterogeneous data when the incomplete rankings are included in the inference process.
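The abstract describes EM inference for a probabilistic ranking mixture. As an illustration only — this is not the paper's model, which additionally handles incomplete rankings via a maximum entropy construction — the following is a minimal sketch of EM for a mixture of Mallows models over *complete* rankings, using the Kendall tau distance, a fixed dispersion parameter `theta`, and a weighted-Borda heuristic for the cluster-center update. All function names and these simplifications are assumptions of the sketch, not the authors' method.

```python
import math
import random

def kendall_tau(a, b):
    # Number of discordant item pairs between two complete rankings.
    pos_b = {item: i for i, item in enumerate(b)}
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if pos_b[a[i]] > pos_b[a[j]])

def log_Z(theta, n):
    # Mallows normalizer under Kendall tau:
    # Z(theta) = prod_{i=1}^{n-1} sum_{j=0}^{i} exp(-j*theta)
    return sum(math.log(sum(math.exp(-j * theta) for j in range(i + 1)))
               for i in range(1, n))

def em_mallows_mixture(rankings, K, iters=30, theta=1.0, seed=0):
    """EM for a K-component Mallows mixture over complete rankings
    of the same item set. Dispersion theta is held fixed for brevity."""
    rng = random.Random(seed)
    n = len(rankings[0])
    centers = rng.sample(rankings, K)     # random initial consensus rankings
    weights = [1.0 / K] * K
    lz = log_Z(theta, n)
    for _ in range(iters):
        # E-step: responsibilities r_ik ∝ w_k * exp(-theta * d(sigma_i, mu_k)) / Z
        resp = []
        for sigma in rankings:
            logp = [math.log(weights[k]) - theta * kendall_tau(sigma, centers[k]) - lz
                    for k in range(K)]
            m = max(logp)                 # log-sum-exp for numerical stability
            p = [math.exp(l - m) for l in logp]
            s = sum(p)
            resp.append([x / s for x in p])
        # M-step: update mixing weights; update each center by weighted Borda
        # scoring (a common heuristic; exact consensus ranking is NP-hard).
        weights = [sum(r[k] for r in resp) / len(rankings) for k in range(K)]
        for k in range(K):
            score = {item: 0.0 for item in rankings[0]}
            for r, sigma in zip(resp, rankings):
                for pos, item in enumerate(sigma):
                    score[item] += r[k] * pos
            centers[k] = tuple(sorted(score, key=score.get))
    return centers, weights, resp
```

On a toy sample such as five copies each of `(0, 1, 2, 3)` and `(3, 2, 1, 0)`, the two recovered centers and mixing weights separate the two groups; extending this sketch to rankings of different lengths is exactly the heterogeneity problem the paper addresses.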
- Cluster analysis of heterogeneous rank data