DOI: 10.1145/2983323.2983736
research-article

Data Summarization with Social Contexts

Published: 24 October 2016

ABSTRACT

While social data is widely used in applications such as sentiment analysis and trend prediction, its sheer size presents great challenges for storing, sharing, and processing it. These challenges can be addressed by data summarization, which transforms the original dataset into a smaller, yet still useful, subset. Existing methods find such subsets with objective functions based on data properties such as representativeness or informativeness, but they do not exploit social contexts, which are distinct characteristics of social data. Further, to date very little work has focused on topic-preserving data summarization, despite the abundant work on topic modeling. This is a challenging task for two reasons. First, since topic models are based on latent variables, existing methods are not well suited to capturing latent topics. Second, it is difficult to find social contexts that provide valuable information for building an effective topic-preserving summarization model. To tackle these challenges, in this paper we focus on exploiting social contexts to summarize social data while preserving the topics in the original dataset. We take Twitter data as a case study. By analyzing Twitter data, we discover two social contexts that are important for topic generation and dissemination, namely (i) the CrowdExp topic score, which captures the influence of both the crowd and the expert users in Twitter, and (ii) the Retweet topic score, which captures the influence of Twitter users' actions. We conduct extensive experiments on two real-world Twitter datasets using two applications. The experimental results show that, by leveraging social contexts, our proposed solution can enhance topic-preserving data summarization and improve application performance by up to 18%.
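
To make the idea of subset selection with an objective function concrete, the following is a minimal sketch and not the authors' method. It greedily picks k items to maximize a coverage (representativeness) term plus a weighted per-item social term. The abstract does not define how the CrowdExp or Retweet topic scores are computed, so `social_scores` is a hypothetical stand-in for any such score, and `alpha`, `cosine`, `objective`, and `greedy_summary` are names assumed purely for illustration. Items are assumed to be non-negative topic vectors.

```python
from math import sqrt
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def objective(selected: List[int], vectors: Sequence[Sequence[float]],
              social_scores: Sequence[float], alpha: float) -> float:
    """Coverage of the full dataset by `selected`, plus a weighted social term.

    Coverage credits every item with its similarity to the closest selected
    item. The social term is a hypothetical per-item score (an assumption).
    """
    coverage = sum(
        max((cosine(v, vectors[s]) for s in selected), default=0.0)
        for v in vectors
    )
    social = sum(social_scores[s] for s in selected)
    return coverage + alpha * social


def greedy_summary(vectors: Sequence[Sequence[float]],
                   social_scores: Sequence[float],
                   k: int, alpha: float = 0.5) -> List[int]:
    """Return indices of up to k items chosen by largest marginal gain."""
    selected: List[int] = []
    remaining = list(range(len(vectors)))
    while remaining and len(selected) < k:
        base = objective(selected, vectors, social_scores, alpha)
        best = max(
            remaining,
            key=lambda i: objective(selected + [i], vectors,
                                    social_scores, alpha) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: four items spread over two topics, with made-up social scores.
topic_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
scores = [0.1, 0.9, 0.2, 0.1]
print(greedy_summary(topic_vectors, scores, k=2))
```

On this toy input the sketch selects items 1 and 2, one per topic, and within the first topic it prefers the item with the higher social score. A plain greedy loop is used only because it is short. It recomputes the objective for every candidate, so it would not scale to large datasets without further optimization.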

Published in

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
October 2016
2566 pages
ISBN: 9781450340731
DOI: 10.1145/2983323

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

CIKM '16 paper acceptance rate: 160 of 701 submissions, 23%
Overall acceptance rate: 1,861 of 8,427 submissions, 22%
