DOI: 10.1145/3394231.3397918
Research article, Open Access

An Automated Pipeline for Character and Relationship Extraction from Readers' Literary Book Reviews on Goodreads.com

Published: 06 July 2020

ABSTRACT

Reader reviews of literary fiction on social media, especially those in persistent, dedicated forums, create and are in turn driven by underlying narrative frameworks. In their comments about a novel, readers generally include only a subset of characters and their relationships, thus offering a limited perspective on that work. Yet in aggregate, these reviews capture an underlying narrative framework, composed of different actants (people, places, things), their roles, and their interactions, that we label the “consensus narrative framework”. We represent this framework in the form of an actant-relationship story graph. Extracting this graph is a challenging computational problem, which we pose as a latent graphical model estimation problem. Posts and reviews are viewed as samples of subgraphs/networks of the hidden narrative framework. Inspired by the qualitative narrative theory of Greimas, we formulate a graphical generative Machine Learning (ML) model where nodes represent actants, and multi-edges and self-loops among nodes capture context-specific relationships. We develop a pipeline of interlocking automated methods to extract key actants and their relationships, and apply it to thousands of reviews and comments posted on Goodreads.com. We manually derive the ground truth narrative framework from SparkNotes, and then use word embedding tools to compare relationships in ground truth networks with our extracted networks. We find that our automated methodology generates highly accurate consensus narrative frameworks: for our four target novels, with approximately 2900 reviews per novel, we report average coverage/recall of important relationships of >80% and an average edge detection rate of >89%. These extracted narrative frameworks can generate insight into how people (or classes of people) read and how they recount what they have read to others.
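
As a rough illustration of the aggregation idea in the abstract, the following Python sketch merges per-review subgraphs into a single multigraph and computes a simple pair-level detection rate. It is a minimal sketch, not the authors' pipeline: the function names `aggregate_reviews` and `edge_detection_rate`, the (actant, relationship, actant) triple format, the toy actants, and the definition of the detection rate used here are all assumptions made for illustration. In particular, it matches actant pairs exactly, whereas the abstract describes comparing relationships with word embedding tools.

```python
from collections import defaultdict


def aggregate_reviews(review_triples):
    """Merge per-review (source, relation, target) triples into one multigraph.

    Returns a dict mapping each directed actant pair (source, target) to a
    dict of relation label -> number of triples that mention it. Several
    labels on one pair act as multi-edges; source == target is a self-loop.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for triples in review_triples:
        for source, relation, target in triples:
            graph[(source, target)][relation] += 1
    # Convert to plain dicts so later lookups cannot create empty entries.
    return {pair: dict(relations) for pair, relations in graph.items()}


def edge_detection_rate(extracted, ground_truth_pairs):
    """Fraction of ground-truth actant pairs with at least one extracted edge.

    This pair-level definition is an assumption of this sketch.
    """
    if not ground_truth_pairs:
        return 0.0
    found = sum(1 for pair in ground_truth_pairs if pair in extracted)
    return found / len(ground_truth_pairs)


# Invented toy data: each inner list is the partial subgraph from one review.
reviews = [
    [("Alice", "trusts", "Bob"), ("Alice", "finds", "ring")],
    [("Alice", "follows", "Bob"), ("Alice", "doubts", "Alice")],  # self-loop
    [("Alice", "trusts", "Bob")],
]
graph = aggregate_reviews(reviews)

# Hypothetical ground-truth pairs; ("Bob", "ring") never appears in a review.
ground_truth = {("Alice", "Bob"), ("Alice", "ring"), ("Bob", "ring")}
rate = edge_detection_rate(graph, ground_truth)

print(graph[("Alice", "Bob")])  # {'trusts': 2, 'follows': 1}
print(rate)                     # 2 of 3 ground-truth pairs were found
```

No single toy review contains every pair, yet the merged graph recovers two of the three ground-truth pairs, which mirrors the abstract's point that individual reviews are partial samples of a shared framework.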

Supplemental Material

3394231.3397918.mov (mov, 115.8 MB)

  • Published in

    WebSci '20: Proceedings of the 12th ACM Conference on Web Science
    July 2020
    361 pages
    ISBN: 9781450379892
    DOI: 10.1145/3394231

    Copyright © 2020 Owner/Author

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 218 of 875 submissions, 25%
