research-article

Data Summarization with Social Contexts

Authors:
Hao Zhuang, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Rameez Rahman, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Xia Hu, Texas A&M University, Texas, USA
Tian Guo, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Pan Hui, Hong Kong University of Science and Technology, Hong Kong, China
Karl Aberer, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, October 2016, Pages 397–406. https://doi.org/10.1145/2983323.2983736

Published: 24 October 2016


ABSTRACT

While social data is widely used in applications such as sentiment analysis and trend prediction, its sheer size presents great challenges for storing, sharing, and processing it. These challenges can be addressed by data summarization, which transforms the original dataset into a smaller, yet still useful, subset. Existing methods find such subsets with objective functions based on data properties such as representativeness or informativeness, but they do not exploit social contexts, which are distinct characteristics of social data. Further, to date very little work has focused on topic-preserving data summarization, despite the abundant work on topic modeling. This is a challenging task for two reasons. First, since topic models are based on latent variables, existing methods are not well suited to capturing latent topics. Second, it is difficult to find social contexts that provide valuable information for building an effective topic-preserving summarization model. To tackle these challenges, in this paper we focus on exploiting social contexts to summarize social data while preserving the topics in the original dataset. We take Twitter data as a case study. By analyzing Twitter data, we discover two social contexts that are important for topic generation and dissemination, namely (i) the CrowdExp topic score, which captures the influence of both the crowd and the expert users on Twitter, and (ii) the Retweet topic score, which captures the influence of Twitter users' actions. We conduct extensive experiments on two real-world Twitter datasets using two applications. The experimental results show that, by leveraging social contexts, our proposed solution can enhance topic-preserving data summarization and improve application performance by up to 18%.
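
To make the idea of objective-driven subset selection concrete, the following is a minimal sketch of greedy selection under a weighted topic-coverage objective. It is not the method proposed in the paper: the abstract does not define the CrowdExp or Retweet topic scores, so the function name `greedy_summary`, the per-item `item_weight` (a stand-in for some social-context score), and the toy data are all assumptions made for illustration only.

```python
from typing import Dict, List, Set


def greedy_summary(item_topics: Dict[str, Set[int]],
                   item_weight: Dict[str, float],
                   budget: int) -> List[str]:
    """Greedily pick up to `budget` items under a weighted coverage objective.

    Hypothetical objective: f(S) = sum over topics t of the largest weight
    among selected items that mention t. This function is monotone and
    submodular, so greedy selection is a standard heuristic for it.
    """
    best_score: Dict[int, float] = {}   # topic -> best weight covering it so far
    selected: List[str] = []
    remaining = sorted(item_topics)     # sorted for deterministic tie-breaking

    while remaining and len(selected) < budget:
        def gain(item: str) -> float:
            # Marginal gain of adding `item` to the current selection.
            w = item_weight.get(item, 1.0)
            return sum(max(0.0, w - best_score.get(t, 0.0))
                       for t in item_topics[item])

        best = max(remaining, key=gain)
        if gain(best) <= 0.0:
            break  # no remaining item improves the objective

        w = item_weight.get(best, 1.0)
        for t in item_topics[best]:
            best_score[t] = max(best_score.get(t, 0.0), w)
        selected.append(best)
        remaining.remove(best)

    return selected


# Toy data (assumed): each tweet id maps to the topic ids it mentions.
tweets = {"t1": {0, 1}, "t2": {1}, "t3": {2}, "t4": {0, 1, 2}}
weights = {"t1": 1.0, "t2": 0.5, "t3": 2.0, "t4": 1.0}
print(greedy_summary(tweets, weights, budget=2))
```

In this sketch the weight plays the role that a social-context score might play: an item with a higher weight can displace a lower-weight item as the representative of a topic, while items that add no new coverage are skipped.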


Index Terms

Data Summarization with Social Contexts
1. Information systems
  1. Information retrieval
    1. Document representation
      1. Document topic models
    2. Retrieval tasks and goals
      1. Summarization
  2. World Wide Web
    1. Web mining
      1. Data extraction and integration

Published in
CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
October 2016
2566 pages
ISBN: 9781450340731
DOI: 10.1145/2983323
General Chairs:
Snehasis Mukhopadhyay, Indiana University Purdue University Indianapolis, USA
ChengXiang Zhai, University of Illinois at Urbana-Champaign, USA
Program Chairs:
Elisa Bertino, Purdue University
Fabio Crestani, University of Lugano
Javed Mostafa, University of North Carolina
Jie Tang, Tsinghua University
Luo Si, Alibaba Group Inc & Purdue University
Xiaofang Zhou, University of Queensland
Yi Chang, Yahoo Research
Yunyao Li, IBM Research - Almaden
Parikshit Sondhi, WalmartLabs
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data summarization
social context
submodular optimization
topic model
Qualifiers
- research-article
Acceptance Rates
CIKM '16 Paper Acceptance Rate: 160 of 701 submissions, 23%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
Article Metrics
- Total Citations: 12
- Total Downloads: 336
- Downloads (Last 12 months): 6
- Downloads (Last 6 weeks): 1