research-article

Open Access

An Automated Pipeline for Character and Relationship Extraction from Readers Literary Book Reviews on Goodreads.com

Authors:
Shadi Shahsavari, Ehsan Ebrahimzadeh, Behnam Shahbazi, Misagh Falahi, Pavan Holur, Roja Bandari, Timothy R. Tangherlini, and Vwani Roychowdhury

Affiliation (all authors): University of California Los Angeles, USA

WebSci '20: Proceedings of the 12th ACM Conference on Web Science, July 2020, pages 277–286
https://doi.org/10.1145/3394231.3397918

Published: 6 July 2020

ABSTRACT

Reader reviews of literary fiction on social media, especially those in persistent, dedicated forums, create and are in turn driven by underlying narrative frameworks. In their comments about a novel, readers generally mention only a subset of the characters and their relationships, and so offer a limited perspective on that work. In aggregate, however, these reviews capture an underlying narrative framework composed of actants (people, places, things), their roles, and their interactions, which we label the “consensus narrative framework”. We represent this framework as an actant-relationship story graph. Extracting this graph is a challenging computational problem, which we pose as the estimation of a latent graphical model: posts and reviews are treated as samples of subgraphs of the hidden narrative framework. Inspired by the qualitative narrative theory of Greimas, we formulate a generative graphical machine learning (ML) model in which nodes represent actants, and multi-edges and self-loops among nodes capture context-specific relationships. We develop a pipeline of interlocking automated methods to extract key actants and their relationships, and apply it to thousands of reviews and comments posted on Goodreads.com. We manually derive the ground-truth narrative framework from SparkNotes, and then use word embedding tools to compare relationships in the ground-truth networks with those in our extracted networks. We find that our automated methodology generates highly accurate consensus narrative frameworks: for our four target novels, with approximately 2,900 reviews per novel, we report an average coverage (recall) of important relationships above 80% and an average edge detection rate above 89%. These extracted narrative frameworks can offer insight into how people (or classes of people) read and how they recount what they have read to others.
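The abstract describes the story graph as a multigraph of actants whose multi-edges and self-loops carry relationships, and an evaluation that compares extracted relationships with a SparkNotes-derived ground truth using word embeddings. The following Python sketch illustrates that idea under stated assumptions; it is not the authors' pipeline. It assumes undirected edges, represents a relationship phrase as the average of its word vectors, and counts a ground-truth relationship as recovered when some extracted phrase on the same actant pair reaches a cosine similarity of at least 0.8. The function names, the threshold, the reading of "edge detection rate" as the share of ground-truth actant pairs that appear in the extracted graph, and the toy vectors are all hypothetical.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical types: an edge is (actant, actant, relationship phrase).
Edge = Tuple[str, str, str]
Graph = Dict[Tuple[str, str], List[str]]
Embeddings = Dict[str, List[float]]


def build_multigraph(edges: Sequence[Edge]) -> Graph:
    """Collect every relationship phrase per unordered actant pair.

    A list per pair gives multi-edges; a pair (a, a) is a self-loop.
    Direction is ignored here (an assumption made for simplicity).
    """
    graph: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for a, b, rel in edges:
        key = (a, b) if a <= b else (b, a)
        graph[key].append(rel)
    return dict(graph)


def phrase_vector(phrase: str, emb: Embeddings) -> Optional[List[float]]:
    """Average the vectors of the phrase's words found in `emb`; None if none are."""
    vecs = [emb[w] for w in phrase.lower().split() if w in emb]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def relationship_coverage(truth: Graph, extracted: Graph, emb: Embeddings,
                          threshold: float = 0.8) -> float:
    """Share of ground-truth relationships matched by an extracted phrase
    on the same actant pair with cosine similarity >= threshold."""
    total = matched = 0
    for pair, true_rels in truth.items():
        candidates = [v for v in (phrase_vector(r, emb) for r in extracted.get(pair, []))
                      if v is not None]
        for rel in true_rels:
            total += 1
            tv = phrase_vector(rel, emb)
            if tv is not None and any(cosine(tv, c) >= threshold for c in candidates):
                matched += 1
    return matched / total if total else 0.0


def edge_detection_rate(truth: Graph, extracted: Graph) -> float:
    """Share of ground-truth actant pairs present in the extracted graph."""
    if not truth:
        return 0.0
    return sum(pair in extracted for pair in truth) / len(truth)


if __name__ == "__main__":
    # Toy 3-d vectors, invented purely for illustration.
    emb = {"defends": [1.0, 0.0, 0.0], "represents": [0.9, 0.1, 0.0],
           "father": [0.0, 1.0, 0.0], "dad": [0.0, 0.95, 0.1],
           "saves": [0.0, 0.0, 1.0]}
    truth = build_multigraph([("Atticus", "Tom", "defends"),
                              ("Atticus", "Scout", "father"),
                              ("Boo", "Scout", "saves")])
    extracted = build_multigraph([("Tom", "Atticus", "represents"),
                                  ("Scout", "Atticus", "dad")])
    print(relationship_coverage(truth, extracted, emb))  # 2 of 3 relationships matched
    print(edge_detection_rate(truth, extracted))         # 2 of 3 actant pairs found
```

Averaging word vectors is a deliberately simple stand-in; any embedding model that maps a phrase to a vector (for example, the contextual models in the reference list) could replace `phrase_vector` without changing the matching logic.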

Supplemental Material

3394231.3397918.mov (MOV video, 115.8 MB)

References

Collin F Baker, Charles J Fillmore, and John B Lowe. 1998. The Berkeley FrameNet project. In Proceedings of the 17th International Conference on Computational Linguistics, Volume 1. Association for Computational Linguistics, 86–90.
Purnima Bholowalia and Arvind Kumar. 2014. EBK-means: A clustering technique based on elbow method and k-means in WSN. International Journal of Computer Applications 105, 9 (2014).
Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics 5 (2017), 135–146.
Luciano Del Corro and Rainer Gemulla. 2013. ClausIE: clause-based open information extraction. In Proceedings of the 22nd International Conference on World Wide Web. 355–366.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs.CL].
Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying relations for open information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1535–1545.
Daniel Gildea and Daniel Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics 28, 3 (2002), 245–288.
AJ Greimas. 1973. Les actants, les acteurs et les figures. In Sémiotique narrative et textuelle, coll. L. Paris (1973).
Nevena Lazic, Amarnag Subramanya, Michael Ringgaard, and Fernando Pereira. 2015. Plato: A Selective Context Model for Entity Resolution. Transactions of the Association for Computational Linguistics 3 (2015), 503–515. https://doi.org/10.1162/tacl_a_00154
Harper Lee. 1960. To Kill a Mockingbird. Philadelphia & New York.
O-Joun Lee and Jason Jung. 2018. Explainable Movie Recommendation Systems by using Story-based Similarity.
Wendy G Lehnert. 1980. Narrative Text Summarization. In AAAI. 337–339.
Christopher D Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 55–60.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs.CL].
John W Mohr, Robin Wagner-Pacifici, Ronald L Breiger, and Petko Bogdanov. 2013. Graphing the grammar of motives in National Security Strategies: Cultural interpretation, automated text analysis and the drama of global politics. Poetics 41, 6 (2013), 670–700.
Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The proposition bank: An annotated corpus of semantic roles. Computational Linguistics 31, 1 (2005), 71–106.
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Empirical Methods in Natural Language Processing (EMNLP). 1532–1543. http://www.aclweb.org/anthology/D14-1162
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of NAACL.
Xiang Ren, Ahmed El-Kishky, Chi Wang, Fangbo Tao, Clare R Voss, and Jiawei Han. 2015. ClusType: Effective entity recognition and typing by relation phrase-based clustering. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 995–1004.
Mattia Samory and Tanushree Mitra. 2018. Conspiracies online: User discussions in a conspiracy community following dramatic events. In Twelfth International AAAI Conference on Web and Social Media.
Michael Schmitz, Robert Bart, Stephen Soderland, Oren Etzioni, et al. 2012. Open language learning for information extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, 523–534.
Mary Shelley. 2015. Frankenstein. London: Lackington, Hughes, Harding, Mavor, and Jones, 1818. Ed. Stuart Curran. Romantic Circles Electronic Editions 16 (2015).
Robyn Speer, Joshua Chin, and Catherine Havasi. 2016. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. arXiv:1612.03975 [cs.CL].
J Steinbeck. 1937. Of Mice and Men. New York: Covici & Friede.
Timothy R Tangherlini, Vwani Roychowdhury, Beth Glenn, Catherine M Crespi, Roja Bandari, Akshay Wadia, Misagh Falahi, Ehsan Ebrahimzadeh, and Roshan Bastani. 2016. “Mommy Blogs” and the vaccination exemption narrative: results from a machine-learning approach for story aggregation on parenting social media sites. JMIR Public Health and Surveillance 2, 2 (2016), e166.
Mike Thelwall and Karen Bourrier. 2019. The reading background of Goodreads book club members: a female fiction canon? Journal of Documentation (2019).
Mike Thelwall and Kayvan Kousha. 2017. Goodreads: A social network site for book readers. Journal of the Association for Information Science and Technology 68, 4 (2017), 972–983.
John Ronald Reuel Tolkien. 2012. The Hobbit. Houghton Mifflin Harcourt.
Mengting Wan and Julian J. McAuley. 2018. Item recommendation on monotonic behavior chains. In Proceedings of the 12th ACM Conference on Recommender Systems, RecSys 2018, Vancouver, BC, Canada, October 2–7, 2018, Sole Pera, Michael D. Ekstrand, Xavier Amatriain, and John O’Donovan (Eds.). ACM, 86–94. https://doi.org/10.1145/3240323.3240369
Mengting Wan, Rishabh Misra, Ndapa Nakashole, and Julian J. McAuley. 2019. Fine-Grained Spoiler Detection from Large-Scale Review Corpora. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28–August 2, 2019, Volume 1: Long Papers, Anna Korhonen, David R. Traum, and Lluís Màrquez (Eds.). Association for Computational Linguistics, 2605–2610. https://doi.org/10.18653/v1/p19-1248
Fei Wu and Daniel S Weld. 2010. Open information extraction using Wikipedia. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 118–127.

Recommendations

Early narrative experience: positive segue to narrative gameplay
ACE '06: Proceedings of the 2006 ACM SIGCHI international conference on Advances in computer entertainment technology

This paper theorizes that children segue into digital narrative game play easily and 'without pause' because of the perception of narrative they develop through their early print narrative experiences. These experiences are multimodal and socially ...
The Narrative-Communication Structure in Interactive Narrative Works
ICIDS '09: Proceedings of the 2nd Joint International Conference on Interactive Digital Storytelling: Interactive Storytelling

Interactive work on new media platforms differs from familiar work on more traditional media, such as literature, theatre, cinema and television, in terms of their narrative-communication situation. Interactive works, unlike cinematic works, allow the ...


Published in

WebSci '20: Proceedings of the 12th ACM Conference on Web Science
July 2020
361 pages
ISBN: 9781450379892
DOI: 10.1145/3394231

Copyright © 2020 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
graph theory
knowledge base
machine learning
narrative theory
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 218 of 875 submissions, 25%
Article Metrics
- 4 Total Citations
- 799 Total Downloads
- Downloads (Last 12 months): 210
- Downloads (Last 6 weeks): 26