Adapting LDA Model to Discover Author-Topic Relations for Email Analysis

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5182)

Abstract

Analyzing the relations between authors and topics in an email corpus is an important issue in both social network analysis and text mining. The Author-Topic model is a statistical model that identifies author-topic relations. However, its inference process ignores information at the document level, i.e., the co-occurrence of words within documents is not taken into account when deriving topics. This may not be suitable for email analysis. We propose to adapt the Latent Dirichlet Allocation (LDA) model for analyzing an email corpus. This method takes into account both the author-document relations and the document-topic relations. We use the Author-Topic model as the baseline method and propose measures to compare our method against it. We conducted an empirical analysis on both simulated data sets and the real Enron email data set to show that our method obtains better performance than the Author-Topic model.
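
As a rough illustration of what combining author-document relations with document-topic relations could look like, the sketch below averages per-document topic proportions over the emails each author wrote. This is an assumed aggregation rule chosen for illustration only, not necessarily the estimator used in the paper; the function name `author_topic_from_docs`, the input format, and the toy data are all hypothetical.

```python
from collections import defaultdict


def author_topic_from_docs(doc_topics, doc_authors):
    """Average document-topic proportions over each author's documents.

    doc_topics:  list of per-document topic distributions (each sums to 1),
                 e.g. as estimated by any fitted LDA model.
    doc_authors: list of author lists, one list per document.

    Assumption: a simple unweighted average per author. The paper may
    weight documents differently or use another rule entirely.
    """
    if len(doc_topics) != len(doc_authors):
        raise ValueError("doc_topics and doc_authors must have equal length")

    sums = {}                   # author -> running sum of topic proportions
    counts = defaultdict(int)   # author -> number of documents seen

    for theta, authors in zip(doc_topics, doc_authors):
        for author in authors:
            if author not in sums:
                sums[author] = [0.0] * len(theta)
            for k, p in enumerate(theta):
                sums[author][k] += p
            counts[author] += 1

    # Divide each running sum by the author's document count.
    return {a: [s / counts[a] for s in vec] for a, vec in sums.items()}


# Toy data (hypothetical): three emails, two topics, two authors.
doc_topics = [[0.8, 0.2], [0.4, 0.6], [0.1, 0.9]]
doc_authors = [["alice"], ["alice", "bob"], ["bob"]]

result = author_topic_from_docs(doc_topics, doc_authors)
for author, dist in sorted(result.items()):
    print(author, [round(p, 3) for p in dist])
```

Because the topic proportions here come from a document-level model, word co-occurrence within each email influences the topics before any author information is applied, which is the property the abstract says the Author-Topic model lacks.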

Editor information

Il-Yeol Song, Johann Eder, Tho Manh Nguyen

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Geng, L., Wang, H., Wang, X., Korba, L. (2008). Adapting LDA Model to Discover Author-Topic Relations for Email Analysis. In: Song, IY., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2008. Lecture Notes in Computer Science, vol 5182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85836-2_32

  • DOI: https://doi.org/10.1007/978-3-540-85836-2_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85835-5

  • Online ISBN: 978-3-540-85836-2

  • eBook Packages: Computer Science (R0)
