A Rating-Ranking Method for Crowdsourced Top-k Computation

ABSTRACT
Crowdsourced top-k computation leverages human judgment to identify the top-k objects from a given set. Most existing studies employ a pairwise-comparison-based method, which first asks workers to compare each pair of objects and then infers the top-k results from the comparison outcomes. Comparing every object pair requires a quadratic number of questions, so these methods incur a huge monetary cost, especially on large datasets. To address this problem, we propose a rating-ranking-based approach that asks the crowd two types of questions. The first is a rating question, which asks a worker to give a score to a single object. The second is a ranking question, which asks a worker to rank several (e.g., 3) objects. Rating questions are coarse-grained: they yield a rough score for each object, which can be used to prune objects whose scores are much smaller than those of the top-k objects. Ranking questions are fine-grained and are used to refine the scores. We propose a unified model that captures both question types and seamlessly combines them to compute the top-k results. We also study how to judiciously select an appropriate rating or ranking question and assign it to an incoming worker. Experimental results on real datasets show that our method significantly outperforms existing approaches.
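The two-phase pipeline described above (coarse rating to prune, fine ranking to refine) can be illustrated with a minimal simulation. This is only a sketch: the pruning margin, the score-adjustment rule, the group size, and the `rating_fn`/`ranking_fn` interfaces are illustrative assumptions, not the paper's actual probabilistic model.

```python
import heapq

def crowd_top_k(objects, rating_fn, ranking_fn, k, group_size=3, margin=1.0):
    """Illustrative rating-ranking top-k (assumed interfaces, not the paper's model)."""
    # Phase 1: coarse-grained rating -- ask the crowd to score each object once.
    scores = {o: rating_fn(o) for o in objects}
    # Prune objects whose score falls well below the k-th best rating.
    kth_best = heapq.nlargest(k, scores.values())[-1]
    candidates = [o for o in objects if scores[o] >= kth_best - margin]
    # Phase 2: fine-grained ranking -- rank small groups of survivors
    # and nudge scores so within-group order agrees with the crowd's ranking.
    for i in range(0, len(candidates), group_size):
        group = candidates[i:i + group_size]
        ranked = ranking_fn(group)  # crowd returns best-to-worst order
        for pos, obj in enumerate(ranked):
            scores[obj] += (len(ranked) - 1 - pos) * 0.1
    return sorted(candidates, key=lambda o: -scores[o])[:k]

# Usage with a simulated, perfectly reliable crowd (ground truth is hypothetical):
truth = {"a": 9, "b": 7, "c": 5, "d": 2, "e": 1}
rating = lambda o: truth[o]
ranking = lambda group: sorted(group, key=lambda o: -truth[o])
result = crowd_top_k(list(truth), rating, ranking, k=2)
```

In a real deployment both `rating_fn` and `ranking_fn` would issue noisy crowd tasks, so scores would be aggregated over multiple workers rather than taken from a single answer.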