Lingual markers for automating personality profiling: background and road ahead

  • Survey Article
Journal of Computational Social Science

Abstract

Personality is a psychological concept that embodies the unique characteristics of an individual. The Lexical Hypothesis states that language use, and the terms people use to describe one another, can help us infer personality traits. Technological breakthroughs have brought about huge improvements in data collection and processing. These could help to develop automated personality assessment models by deriving linguistic markers from the data present in social media, telecommunication signals, and even signals collected from human–machine interaction. Numerous studies have centred on using machine learning to automate personality recognition from text. However, questions remain about their performance, reliability, and ethical usage. To address these questions, we extensively review and analyse the existing research in the field of personality computing using lingual markers in text. A content-oriented classification of the techniques used is provided. We also examine the existing literature for gaps and limitations with a detailed comparative analysis. The field of personality computing has the potential to impact every field of human life, but progress so far is limited. Our review will help researchers build on what has been achieved so far for faster progress in the field.

Data availability

Essays: [James W. Pennebaker, Laura A. King]. ([Year & Month of dataset creation]). [Essays], [Version 1]. Retrieved [18-08-2022] from [essays.csv | Kaggle]. (https://www.kaggle.com/datasets/manjarinandimajumdar/essayscsv)

MyPersonality: The dataset was collected from Facebook by David Stillwell and Michal Kosinski for the myPersonality project. [David Stillwell, Michal Kosinski]. ([2012; Month of dataset creation]). [MyPersonality], [Version of the dataset]. Retrieved [18-08-2022] from [wiki:mypersonality_final]. Citation for the MyPersonality Project: Kosinski, M., Matz, S., Gosling, S., Popov, V., & Stillwell, D. (2015). Facebook as a Social Science Research Tool: Opportunities, Challenges, Ethical Considerations and Practical Guidelines. American Psychologist. (https://web.archive.org/web/20180428085315/http://mypersonality.org/wiki/lib/exe/fetch.php?media=wiki:mypersonality_final.zip)

PAN-AP-15: [Francisco Rangel, Fabio Celli, Paolo Rosso, Martin Potthast, Benno Stein, and Walter Daelemans]. ([2015; September]). [PAN15-Author-Profiling], [Version 1]. Retrieved [Date Retrieved] from [PAN Data (webis.de)]. (https://pan.webis.de/data.html#pan15-author-profiling)

YouTube: [Joan-Isaac Biel, Daniel Gatica-Perez]. ([2012]). [Youtube Personality], [Version 1]. The dataset was originally released by the Idiap institute and is no longer available for download on the updated website [Youtube Personality—English (idiap.ch)], retrieved [18-08-2022]. However, we were able to download it from OpenML [Youtube], retrieved [18-08-2022]. (https://www.openml.org/search?type=data&sort=runs&id=41411&status=active)

References

  1. Theophrastus. (4th Century BC). The characters.

  2. Papurt, M. J. (1930). A study of the Woodworth psychoneurotic inventory with suggested revision. The Journal of Abnormal and Social Psychology, 25(3), 335.

  3. Cattell, H. E., & Mead, A. D. (2008). The sixteen personality factor questionnaire (16PF).

  4. Costa Jr, P. T., & McCrae, R. R. (2008). The revised NEO personality inventory (NEO-PI-R). Sage.

  5. Briggs, K. C. (1976). Myers–Briggs type indicator. Consulting Psychologists Press.

  6. Vinciarelli, A., & Mohammadi, G. (2014). A survey of personality computing. IEEE Transactions on Affective Computing, 5(3), 273–291.

  7. Pennebaker, J. W., & King, L. A. (1999). Linguistic styles: Language use as an individual difference. Journal of Personality and Social Psychology, 77(6), 1296–1312. https://doi.org/10.1037/0022-3514.77.6.1296

  8. Celli, F., Pianesi, F., Stillwell, D., & Kosinski, M. (2013, June). Workshop on computational personality recognition: Shared task. In Proceedings of the international AAAI conference on web and social media (Vol. 7, No. 1).

  9. Rangel Pardo, F. M., Celli, F., Rosso, P., Potthast, M., Stein, B., & Daelemans, W. (2015). Overview of the 3rd Author Profiling Task at PAN 2015. In CLEF 2015 evaluation labs and workshop working notes papers (pp. 1–8).

  10. Biel, J. I., & Gatica-Perez, D. (2012). The YouTube lens: Crowdsourced personality impressions and audiovisual analysis of vlogs. IEEE Transactions on Multimedia, 15(1), 41–55.

  11. Hurst, M. F. (2006). Temporal text mining. In AAAI spring symposium: Computational approaches to analyzing weblogs (pp. 73–77).

  12. Cutting, D., Kupiec, J., Pedersen, J., & Sibun, P. (1992, March). A practical part-of-speech tagger. In 3rd conference on applied natural language processing (pp. 133–140).

  13. Zhang, Y., Jin, R., & Zhou, Z. H. (2010). Understanding bag-of-words model: A statistical framework. International Journal of Machine Learning and Cybernetics, 1(1), 43–52.

  14. Grishman, R., & Sundheim, B. M. (1996). Message understanding conference-6: A brief history. In COLING 1996 volume 1: The 16th international conference on computational linguistics.

  15. Chung, C., & Pennebaker, J. W. (2007). The psychological functions of function words. Social Communication, 1, 343–359.

  16. Brown, P. F., Della Pietra, V. J., Desouza, P. V., Lai, J. C., & Mercer, R. L. (1992). Class-based n-gram models of natural language. Computational Linguistics, 18(4), 467–480.

  17. Chen, K., Zhang, Z., Long, J., & Zhang, H. (2016). Turning from TF-IDF to TF-IGM for term weighting in text classification. Expert Systems with Applications, 66, 245–260.

  18. Le, Q., & Mikolov, T. (2014, June). Distributed representations of sentences and documents. In International conference on machine learning (pp. 1188–1196). PMLR.

  19. Davison, A. (1984). Readability—Appraising text difficulty. Learning to read in American schools: Basal readers and content texts (pp. 121–139).

  20. Kelledy, F., & Smeaton, A. F. (1997, April). Automatic phrase recognition and extraction from text. In Proceedings of the 19th annual BCS-IRSG colloquium on IR research 19 (pp. 1–9).

  21. Wallach, H. M. (2006, June). Topic modelling: Beyond bag-of-words. In Proceedings of the 23rd international conference on machine learning (pp. 977–984).

  22. Lapponi, E., Read, J., & Øvrelid, L. (2012, December). Representing and resolving negation for sentiment analysis. In 2012 IEEE 12th international conference on data mining workshops (pp. 687–692). IEEE.

  23. Howard, J., & Ruder, S. (2018). Universal language model fine-tuning for text classification. Preprint arXiv:1801.06146.

  24. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26, 1.

  25. Niu, L., Xinyu, D., Jianbing, Z., & Jiajun, C. (2015). Topic2Vec: Learning distributed representations of topics. In 2015 international conference on Asian language processing (IALP) (pp. 193–196). IEEE.

  26. Pennington, J., Socher, R., & Manning, C. D. (2014, October). GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532–1543).

  27. Young, J. C., & Rusli, A. (2019, August). Review and visualization of Facebook's FastText pretrained word vector model. In 2019 international conference on engineering, science, and industrial applications (ICESI) (pp. 1–6). IEEE.

  28. Yao, D., Bi, J., Huang, J., & Zhu, J. (2015, July). A word distributed representation based framework for large-scale short text classification. In 2015 international joint conference on neural networks (IJCNN) (pp. 1–7). IEEE.

  29. Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of NAACL.

  30. Dey, R., & Salem, F. M. (2017, August). Gate-variants of gated recurrent unit (GRU) neural networks. In 2017 IEEE 60th international midwest symposium on circuits and systems (MWSCAS) (pp. 1597–1600). IEEE.

  31. Merity, S., Keskar, N. S., & Socher, R. (2017). Regularizing and optimizing LSTM language models. Preprint arXiv:1708.02182.

  32. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. Preprint arXiv:1810.04805.

  33. Zhou, J., Zhang, Z., Zhao, H., & Zhang, S. (2019). LIMIT-BERT: Linguistic informed multi-task BERT. Preprint arXiv:1910.14296.

  34. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. Preprint arXiv:1907.11692.

  35. Mairesse, F., Walker, M. A., Mehl, M. R., & Moore, R. K. (2007). Using linguistic cues for the automatic recognition of personality in conversation and text. Journal of Artificial Intelligence Research, 30, 457–500.

  36. Coltheart, M. (1981). The MRC psycholinguistic database. The Quarterly Journal of Experimental Psychology Section A, 33(4), 497–505.

  37. Whissell, C., Fournier, M., Pelland, R., Weir, D., & Makarec, K. (1986). A dictionary of affect in language: IV. Reliability, validity, and applications. Perceptual and Motor Skills, 62(3), 875–888.

  38. Moffitt, K., Giboney, J., Ehrhardt, E., Burgoon, J. K., & Nunamaker, J. F. (2010). Structured programming for linguistic cue extraction. The Center for the Management of Information, 1, 1.

  39. Stone, P. J., Dunphy, D. C., & Smith, M. S. (1966). The general inquirer: A computer approach to content analysis.

  40. Cambria, E., & Hussain, A. (2015). SenticNet. In Sentic computing (pp. 23–71). Springer, Cham.

  41. Nielsen, F. Å. (2011). A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. Preprint arXiv:1103.2903.

  42. Mohammad, S. M., & Turney, P. D. (2013). NRC emotion lexicon. National Research Council, Canada, 2.

  43. Mohammad, S. (2018, July). Obtaining reliable human ratings of valence, arousal, and dominance for 20,000 English words. In Proceedings of the 56th annual meeting of the association for computational linguistics (Volume 1: Long papers) (pp. 174–184).

  44. Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41.

  45. Havasi, C., Speer, R., & Alonso, J. (2007, September). ConceptNet 3: A flexible, multilingual semantic network for common sense knowledge. In Recent advances in natural language processing (pp. 27–29). John Benjamins.

  46. Poria, S., Gelbukh, A., Hussain, A., Howard, N., Das, D., & Bandyopadhyay, S. (2013). Enhanced SenticNet with affective labels for concept-based opinion mining. IEEE Intelligent Systems, 28(2), 31–38.

  47. Searle, J. R. (1975). Indirect speech acts. In Speech acts (pp. 59–82). Brill.

  48. Searle, J. R. (1976). A classification of illocutionary acts. Language in Society, 5(1), 1–23.

  49. Walker, M., & Whittaker, S. (1995). Mixed initiative in dialogue: An investigation into discourse segmentation. Preprint arXiv:cmp-lg/9504007.

  50. McCrae, R. R., & John, O. P. (1992). An introduction to the five-factor model and its applications. Journal of Personality, 60(2), 175–215.

  51. Allport, G. W., & Odbert, H. S. (1936). Trait-names: A psycho-lexical study. Psychological Monographs, 47(1), i.

  52. Schwartz, S. H. (2007). Basic human values: Theory, measurement, and applications. Revue Française de Sociologie, 47(4), 929.

  53. Eysenck, H. J. (1982). Personality, genetics, and behavior: Selected papers.

  54. Newman, J. (1981). Myers, Isabel Briggs. The Myers-Briggs type indicator. Palo Alto, CA, Consulting Psychologists Press, 1976. Myers, Isabel Briggs (with Peter B. Myers). Gifts Differing. Palo Alto, CA, Consulting Psychologists Press, 1980.

  55. Paulhus, D. L., & Williams, K. M. (2002). The dark triad of personality: Narcissism, Machiavellianism, and psychopathy. Journal of Research in Personality, 36(6), 556–563.

  56. Goldberg, L. R. (1992). The development of markers for the Big-Five factor structure. Psychological Assessment, 4(1), 26.

  57. Ashton, M. C., & Lee, K. (2007). Empirical, theoretical, and practical advantages of the HEXACO model of personality structure. Personality and Social Psychology Review, 11(2), 150–166.

  58. Gill, A. J., & Oberlander, J. (2002). Taking care of the linguistic features of extraversion. In Proceedings of the annual meeting of the cognitive science society (Vol. 24, No. 24).

  59. Gosling, S. D., Ko, S. J., Mannarelli, T., & Morris, M. E. (2002). A room with a cue: Personality judgments based on offices and bedrooms. Journal of Personality and Social Psychology, 82(3), 379.

  60. Vazire, S., & Gosling, S. D. (2004). e-Perceptions: Personality impressions based on personal websites. Journal of Personality and Social Psychology, 87(1), 123.

  61. Mehl, M. R., Gosling, S. D., & Pennebaker, J. W. (2006). Personality in its natural habitat: Manifestations and implicit folk theories of personality in daily life. Journal of Personality and Social Psychology, 90(5), 862.

  62. Gosling, S. D., Gaddis, S., & Vazire, S. (2007). Personality impressions based on Facebook profiles. ICWSM, 7, 1–4.

  63. Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24–54.

  64. Yarkoni, T. (2010). Personality in 100,000 words: A large-scale analysis of personality and word use among bloggers. Journal of Research in Personality, 44(3), 363–373.

  65. Holtgraves, T. (2011). Text messaging, personality, and the social context. Journal of Research in Personality, 45(1), 92–99.

  66. Iacobelli, F., Gill, A. J., Nowson, S., & Oberlander, J. (2011, October). Large scale personality classification of bloggers. In International conference on affective computing and intelligent interaction (pp. 568–577). Springer.

  67. Qiu, L., Lin, H., Ramsay, J., & Yang, F. (2012). You are what you tweet: Personality expression and perception on Twitter. Journal of Research in Personality, 46, 710–718. https://doi.org/10.1016/j.jrp.2012.08.008

  68. Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M., Agrawal, M., Shah, A., Kosinski, M., Stillwell, D., Seligman, M. E., & Ungar, L. H. (2013). Personality, gender, and age in the language of social media: The open-vocabulary approach. PLoS ONE, 8(9), e73791.

  69. Park, G., Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Kosinski, M., Stillwell, D. J., Ungar, L. H., & Seligman, M. E. (2015). Automatic personality assessment through social media language. Journal of Personality and Social Psychology, 108(6), 934.

  70. Hassanein, M. M., Rady, S., Hussein, W., & Gharib, T. (2021). Extracting relationships between Big Five model and personality characteristics in social networks. International Journal of Intelligent Computing and Information Sciences, 21(2), 41–49.

  71. Štajner, S., & Yenikent, S. (2021, April). Why is MBTI personality detection from texts a difficult task? In Proceedings of the 16th conference of the European chapter of the association for computational linguistics: Main volume (pp. 3580–3589).

  72. Giorgi, S., Nguyen, K. L., Eichstaedt, J. C., Kern, M. L., Yaden, D. B., Kosinski, M., Seligman, M. E., Ungar, L. H., Schwartz, H. A., & Park, G. (2022). Regional personality assessment through social media language. Journal of Personality, 90(3), 405–425.

  73. Celli, F., Lepri, B., Biel, J. I., Gatica-Perez, D., Riccardi, G., & Pianesi, F. (2014, November). The workshop on computational personality recognition 2014. In Proceedings of the 22nd ACM international conference on multimedia (pp. 1245–1246).

  74. Buchanan, T., & Smith, J. L. (1999). Using the Internet for psychological research: Personality testing on the World Wide Web. British Journal of Psychology, 90(1), 125–144.

  75. Youyou, W., Kosinski, M., & Stillwell, D. (2015). Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences, 112(4), 1036–1040.

  76. Novikov, P., Mararitsa, L., & Nozdrachev, V. (2021). Inferred vs. traditional personality assessment: Are we predicting the same thing? arXiv e-prints. arXiv-2103.

  77. Argamon, S., Dhawle, S., Koppel, M., & Pennebaker, J. W. (2005, June). Lexical predictors of personality type. In Proceedings of the 2005 joint annual meeting of the interface and the classification society of North America (pp. 1–16).

  78. Mairesse, F., & Walker, M. (2006). Words mark the nerds: Computational models of personality recognition through language. In Proceedings of the annual meeting of the cognitive science society (Vol. 28, No. 28).

  79. Oberlander, J., & Nowson, S. (2006, July). Whose thumb is it anyway? Classifying author personality from weblog text. In Proceedings of the COLING/ACL 2006 main conference poster sessions (pp. 627–634).

  80. Nowson, S., & Oberlander, J. (2007, March). Identifying more bloggers: Towards large scale personality classification of personal weblogs. In Proceedings of the international conference on weblogs and social.

  81. Estival, D., Gaustad, T., Pham, S. B., Radford, W., & Hutchinson, B. (2007, September). Author profiling for English emails. In Proceedings of the 10th conference of the Pacific association for computational linguistics (Vol. 263, p. 272).

  82. Golbeck, J., Robles, C., Edmondson, M., & Turner, K. (2011, October). Predicting personality from twitter. In 2011 IEEE third international conference on privacy, security, risk and trust and 2011 IEEE third international conference on social computing (pp. 149–156). IEEE.

  83. Golbeck, J., Robles, C., & Turner, K. (2011). Predicting personality with social media. In CHI'11 extended abstracts on human factors in computing systems (pp. 253–262).

  84. Quercia, D., Kosinski, M., Stillwell, D., & Crowcroft, J. (2011, October). Our twitter profiles, our selves: Predicting personality with twitter. In 2011 IEEE third international conference on privacy, security, risk and trust and 2011 IEEE third international conference on social computing (pp. 180–185). IEEE.

  85. Adali, S., & Golbeck, J. (2012, August). Predicting personality with social behavior. In 2012 IEEE/ACM international conference on advances in social networks analysis and mining (pp. 302–309). IEEE.

  86. Bai, S., Zhu, T., & Cheng, L. (2012). Big-five personality prediction based on user behaviours at social network sites. Preprint arXiv:1204.4809.

  87. Kermanidis, K. L. (2012, May). Mining authors’ personality traits from Modern Greek spontaneous text. In Proceedings of workshop on corpora for research on emotion sentiment and social signals, in conjunction with LREC (pp. 90–93).

  88. Wald, R., Khoshgoftaar, T., & Sumner, C. (2012, August). Machine prediction of personality from Facebook profiles. In 2012 IEEE 13th international conference on information reuse and integration (IRI) (pp. 109–115). IEEE.

  89. Shen, J., Brdiczka, O., & Liu, J. (2013, June). Understanding email writers: Personality prediction from email messages. In International conference on user modelling, adaptation, and personalization (pp. 318–330). Springer.

  90. Alam, F., Stepanov, E. A., & Riccardi, G. (2013). Personality traits recognition on social network-Facebook. In Proceedings of the international AAAI conference on web and social media (Vol. 7, No. 2, pp. 6–9).

  91. Verhoeven, B., Daelemans, W., & De Smedt, T. (2013, June). Ensemble methods for personality recognition. In Seventh international AAAI conference on weblogs and social media.

  92. Farnadi, G., Zoghbi, S., Moens, M. F., & De Cock, M. (2013, June). Recognising personality traits using Facebook status updates. In Proceedings of the international AAAI conference on web and social media (Vol. 7, No. 1).

  93. Tomlinson, M. T., Hinote, D., & Bracewell, D. B. (2013, June). Predicting conscientiousness through semantic analysis of Facebook posts. In Seventh international AAAI conference on weblogs and social media.

  94. Markovikj, D., Gievska, S., Kosinski, M., & Stillwell, D. J. (2013, June). Mining Facebook data for predictive personality modelling. In Seventh international AAAI conference on weblogs and social media.

  95. Iacobelli, F., & Culotta, A. (2013, June). Too neurotic, not too friendly: Structured personality classification on textual data. In Seventh international AAAI conference on weblogs and social media.

  96. Appling, D., Briscoe, E., Hayes, H., & Mappus, R. (2013, June). Towards automated personality identification using speech acts. In Proceedings of the international AAAI conference on web and social media (Vol. 7, No. 1).

  97. Mohammad, S., & Kiritchenko, S. (2013, June). Using nuances of emotion to identify personality. In Seventh international AAAI conference on weblogs and social media.

  98. Poria, S., Gelbukh, A., Agarwal, B., Cambria, E., & Howard, N. (2013, November). Common sense knowledge based personality recognition from text. In Mexican international conference on artificial intelligence (pp. 484–496). Springer.

  99. Zuo, X., Feng, B., Yao, Y., Zhang, T., Zhang, Q., Wang, M., & Zuo, W. (2013, September). A weighted ML-KNN model for predicting users’ personality traits. In Proc. Int. Conf. Inf. Sci. Comput. Appl. (ISCA) (pp. 345–350).

  100. Gou, L., Zhou, M. X., & Yang, H. (2014, April). KnowMe and ShareMe: Understanding automatically discovered personality traits from social media and user sharing preferences. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 955–964).

  101. Pratama, B. Y., & Sarno, R. (2015, November). Personality classification based on Twitter text using Naive Bayes, KNN and SVM. In 2015 international conference on data and software engineering (ICoDSE) (pp. 170–174). IEEE.

  102. Arroju, M., Hassan, A., & Farnadi, G. (2015). Age, gender and personality recognition using tweets in a multilingual setting. In 6th conference and labs of the evaluation forum (CLEF 2015): Experimental IR meets multilinguality, multimodality, and interaction (Vol. 23, p. 31).

  103. Poddar, S., Kattagoni, V., & Singh, N. (2015). Personality mining from biographical data with the "Adjectival Marker" technique. In BD (pp. 39–47).

  104. Lukito, L. C., Erwin, A., Purnama, J., & Danoekoesoemo, W. (2016, October). Social media user personality classification using computational linguistic. In 2016 8th international conference on information technology and electrical engineering (ICITEE) (pp. 1–6). IEEE.

  105. Pramodh, K. C., & Vijayalata, Y. (2016, October). Automatic personality recognition of authors using big five factor model. In 2016 IEEE international conference on advances in computer applications (ICACA) (pp. 32–37). IEEE.

  106. Ong, V., Rahmanto, A. D. S., Williem, W., Suhartono, D., Nugroho, A. E., Andangsari, E. W., & Suprayogi, M. N. (2017, November). Personality prediction based on Twitter information in Bahasa Indonesia. In 2017 federated conference on computer science and information systems (FedCSIS) (pp. 367–372). https://doi.org/10.15439/2017F359

  107. Tandera, T., Suhartono, D., Wongso, R., & Prasetio, Y. L. (2017). Personality prediction system from Facebook users. Procedia Computer Science, 116, 604–611.

  108. Ahmad, Z., Lutfi, S. L., Kushan, A. L., & Yixing, R. T. (2017). Personality prediction of Malaysian Facebook users: Cultural preferences and features variation. Advanced Science Letters, 23(8), 7900–7903.

  109. Yata, A., Kante, P., Sravani, T., & Malathi, B. (2018). Personality recognition using multi-label classification. International Research Journal of Engineering and Technology (IRJET), 5(03), 1.

  110. Arjaria, S., Shrivastav, A., Rathore, A. S., & Tiwari, V. (2019). Personality trait identification for written texts using MLNB. In Data, engineering and applications (pp. 131–137). Springer.

  111. Artissa, Y. B. N. D., Asror, I., & Faraby, S. A. (2019, May). Personality classification based on Facebook status text using Multinomial Naïve Bayes method (p. 1192). https://doi.org/10.1088/1742-6596/1192/1/012003

  112. Ergu, İ., Işık, Z., & Yankayış, İ. (2019). Predicting personality with twitter data and machine learning models. In 2019 innovations in intelligent systems and applications conference (ASYU) (pp. 1–5). IEEE.

  113. Rohit, G. V., Bharadwaj, K. R., Hemanth, R., Pruthvi, B., & Kumar, M. (2020, August). Machine intelligence based personality prediction using social profile data. In 2020 3rd international conference on smart systems and inventive technology (ICSSIT) (pp. 1003–1008). IEEE.

  114. Ong, V., Rahmanto, A. D. S., Williem, W., Jeremy, N. H., Suhartono, D., & Andangsari, E. W. (2021). Personality modelling of Indonesian Twitter users with XGBoost based on the five factor model. International Journal of Intelligent Engineering and Systems, 14, 248–261. https://doi.org/10.22266/ijies2021.0430.22

  115. Safitri, G., & Setiawan, E. B. (2022). Optimization prediction of big five personality in twitter users. Journal RESTI (Rekayasa Sistem dan Teknologi Informasi), 6, 85–91. https://doi.org/10.29207/resti.v6i1.3529

  116. Vu, X. S., Flekova, L., Jiang, L., & Gurevych, I. (2018, January). Lexical-semantic resources: Yet powerful resources for automatic personality classification. In Proceedings of the 9th global WORDNET conference (pp. 172–181).

  117. Fernandes, B., González-Briones, A., Novais, P., Calafate, M., Analide, C., & Neves, J. (2020). An adjective selection personality assessment method using gradient boosting machine learning. Processes, 8(5), 618.

  118. Kalghatgi, M. P., Ramannavar, M., & Sidnal, N. S. (2015). A neural network approach to personality prediction based on the big-five model. International Journal of Innovative Research in Advanced Engineering (IJIRAE), 2(8), 56–63.

  119. Su, M. H., Wu, C. H., & Zheng, Y. T. (2016). Exploiting turn-taking temporal evolution for personality trait perception in dyadic conversations. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(4), 733–744.

  120. Liu, F., Perez, J., & Nowson, S. (2016). A language-independent and compositional model for personality trait recognition from short texts. Preprint arXiv:1610.04345.

  121. Xianyu, H., Xu, M., Wu, Z., & Cai, L. (2016, July). Heterogeneity-entropy based unsupervised feature learning for personality prediction with cross-media data. In 2016 IEEE international conference on multimedia and Expo (ICME) (pp. 1–6). IEEE.

  122. Sun, X., Liu, B., Cao, J., Luo, J., & Shen, X. (2018, May). Who am I? Personality detection based on deep learning for texts. In 2018 IEEE international conference on communications (ICC) (pp. 1–6). IEEE.

  123. An, G., & Levitan, R. (2018, February). Lexical and acoustic deep learning model for personality recognition. In INTERSPEECH (pp. 1761–1765).

  124. Yılmaz, T., Ergil, A., & İlgen, B. (2019, October). Deep learning-based document modelling for personality detection from Turkish Texts. In Proceedings of the future technologies conference (pp. 729–736). Springer.

  125. Kazameini, A., Fatehi, S., Mehta, Y., Eetemadi, S., & Cambria, E. (2020). Personality trait detection using bagged SVM over BERT word embedding ensembles. Preprint arXiv:2010.01309.

  126. Leonardi, S., Monti, D., Rizzo, G., & Morisio, M. (2020). Multilingual transformer-based personality traits estimation. Information, 11(4), 179.

  127. Xue, X., Feng, J., & Sun, X. (2021). Semantic-enhanced sequential modeling for personality trait recognition from texts. Applied Intelligence, 51(11), 7705–7717.

  128. El-Demerdash, K., El-Khoribi, R. A., Shoman, M. A. I., & Abdou, S. (2021). Deep learning based fusion strategies for personality prediction. Egyptian Informatics Journal, 1, 1.

  129. Christian, H., Suhartono, D., Chowanda, A., & Zamli, K. Z. (2021). Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging. Journal of Big Data, 8(1), 1–20.

  130. Jeremy, N. H., & Suhartono, D. (2021). Automatic personality prediction from Indonesian user on twitter using word embedding and neural networks. Procedia Computer Science, 179, 416–422.

  131. Mavis, G., Toroslu, I. H., & Karagoz, P. (2021). Personality analysis using classification on Turkish tweets. International Journal of Cognitive Informatics and Natural Intelligence, 15, 1–18. https://doi.org/10.4018/ijcini.287596

  132. Kosan, M. A., Karacan, H., & Urgen, B. A. (2022). Predicting personality traits with semantic structures and LSTM-based neural networks. Alexandria Engineering Journal, 61(10), 8007–8025.

  133. Majumder, N., Poria, S., Gelbukh, A., & Cambria, E. (2017). Deep learning-based document modeling for personality detection from text. IEEE Intelligent Systems, 32(2), 74–79.

  134. Yu, J., & Markov, K. (2017, November). Deep learning based personality recognition from Facebook status updates. In 2017 IEEE 8th international conference on awareness science and technology (iCAST) (pp. 383–387). IEEE.

  135. Giménez, M., Paredes, R., & Rosso, P. (2017, April). Personality recognition using convolutional neural networks. In International conference on computational linguistics and intelligent text processing (pp. 313–323). Springer.

  136. Xue, D., Wu, L., Hong, Z., Guo, S., Gao, L., Wu, Z., & Sun, J. (2018). Deep learning-based personality recognition from text posts of online social networks. Applied Intelligence, 48(11), 4232–4246.

  137. Rahman, M. A., Al Faisal, A., Khanam, T., Amjad, M., & Siddik, M. S. (2019, May). Personality detection from text using convolutional neural network. In 2019 1st international conference on advances in science, engineering and robotics technology (ICASERT) (pp. 1–6). IEEE.

  138. Darliansyah, A., Naeem, M. A., Mirza, F., & Pears, R. (2019). SENTIPEDE: A smart system for sentiment-based personality detection from short texts. Journal of Universal Computer Science, 25, 1323–1352. https://doi.org/10.3217/jucs-025-10-1323

    Article  Google Scholar 

  139. Mehta, Y., Fatehi, S., Kazameini, A., Stachl, C., Cambria, E., & Eetemadi, S. (2020, November). Bottom-up and top-down: Predicting personality with psycholinguistic and language model features. In 2020 IEEE international conference on data mining (ICDM) (pp. 1184–1189). IEEE.

  140. Deilami, F. M., Sadr, H., & Nazari, M. (2022). Using machine learning based models for personality recognition. Preprint arXiv:2201.06248.

  141. Deilami, F. M., Sadr, H., & Tarkhan, M. (2022). Contextualized multidimensional personality recognition using combination of deep neural network and ensemble learning. Neural Processing Letters. https://doi.org/10.1007/s11063-022-10787-9

    Article  Google Scholar 

  142. Guan, Z., Wu, B., Wang, B., & Liu, H. (2020, July). Personality2vec: Network representation learning for personality. In 2020 IEEE 5th international conference on data science in cyberspace (DSC) (pp. 30–37). IEEE.

  143. Wang, Z., Wu, C. H., Li, Q. B., Yan, B., & Zheng, K. F. (2020). Encoding text information with graph convolutional networks for personality recognition. Applied Sciences, 10(12), 4081.

    Article  Google Scholar 

  144. Wang, Y., Zheng, J., Li, Q., Wang, C., Zhang, H., & Gong, J. (2021). Xlnet-caps: Personality classification from textual posts. Electronics (Switzerland). https://doi.org/10.3390/electronics10111360

    Article  Google Scholar 

  145. Ramezani, M., Feizi-Derakhshi, M. R., & Balafar, M. A. (2022). Knowledge graph-enabled text-based automatic personality prediction. Preprint arXiv:2203.09103.

  146. Jiang, H., Zhang, X., & Choi, J. D. (2020, April). Automatic text-based personality recognition on monologues and multiparty dialogues using attentive networks and contextual embeddings (student abstract). In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, No. 10, pp. 13821–13822).

  147. Li, Y., Kazemeini, A., Mehta, Y., & Cambria, E. (2022). Multitask learning for emotion and personality traits detection. Neurocomputing, 493, 340–350. https://doi.org/10.1016/j.neucom.2022.04.049

    Article  Google Scholar 

  148. Celli, F. (2012, March). Unsupervised personality recognition for social network sites. In Procedings of sixth international conference on digital society (pp. 59–62).

  149. Celli, F., & Rossi, L. (2012, April). The role of emotional stability in Twitter conversations. In Proceedings of the workshop on semantic analysis in social media (pp. 10–17).

  150. Liu and Zhu proposed use of stacked AutoEncoders for unsupervised learning of Linguistic Representation Feature Vector (LRFV) based on SLIWC and FFT from Sina microblog. The features obtained were used to train a Linear Regression model and results outperform the selected baselines.

  151. Alsadhan, N., & Skillicorn, D. (2017, November). Estimating personality from social media posts. In 2017 IEEE international conference on data mining workshops (ICDMW) (pp. 350–356). IEEE.

  152. Celli, F., & Lepri, B. (2018). Is big five better than MBTI? A personality computing challenge using Twitter data. Computational Linguistics CLiC-it, 2018, 93.

    Google Scholar 

  153. Lima, A. C., & de Castro, L. N. (2013, September). Multi-label semi-supervised classification applied to personality prediction in Tweets. In 2013 BRICS congress on computational intelligence and 11th Brazilian congress on computational intelligence (pp. 195–203). IEEE.

  154. Lima, A. C. E., & De Castro, L. N. (2014). A multi-label, semi-supervised classification approach applied to personality prediction in social media. Neural Networks, 58, 122–130.

    Article  Google Scholar 

  155. Tighe, E. P., Ureta, J. C., Pollo, B. A. L., Cheng, C. K., & de Dios Bulos, R. (2016, July). Personality trait classification of essays with the application of feature reduction. In SAAIP@ IJCAI (pp. 22–28).

  156. Tighe, E., & Cheng, C. (2018, June). Modeling personality traits of Filipino twitter users. In Proceedings of the 2nd workshop on computational modelling of people’s opinions, personality, and emotions in social media (pp. 112–122).

  157. Mao, Y., Zhang, D., Wu, C., Zheng, K., & Wang, X. (2018, December). Feature analysis and optimisation for computational personality recognition. In 2018 IEEE 4th international conference on computer and communications (ICCC) (pp. 2410–2414). IEEE.

  158. Adi, G. Y. N., Tandio, M. H., Ong, V., & Suhartono, D. (2018). Optimization for automatic personality recognition on Twitter in Bahasa Indonesia. Procedia Computer Science, 135, 473–480.

    Article  Google Scholar 

  159. Carducci, G., Rizzo, G., Monti, D., Palumbo, E., & Morisio, M. (2018). Twitpersonality: Computing personality traits from tweets using word embeddings and supervised learning. Information, 9(5), 127.

    Article  Google Scholar 

  160. Dos Santos, W. R., Ramos, R. M., & Paraboni, I. (2019). Computational personality recognition from facebook text: Psycholinguistic features, words and facets. New Review of Hypermedia and Multimedia, 25(4), 268–287.

    Article  Google Scholar 

  161. Akrami, N., Fernquist, J., Isbister, T., Kaati, L., & Pelzer, B. (2019, December). Automatic extraction of personality from text: Challenges and opportunities. In 2019 IEEE international conference on big data (big data) (pp. 3156–3164). IEEE.

  162. Zheng, H., & Wu, C. (2019, February). Predicting personality using Facebook status based on semi-supervised learning. In Proceedings of the 2019 11th international conference on machine learning and computing (pp. 59–64).

  163. Tighe, E., Aran, O., & Cheng, C. (2020). Exploring neural network approaches in automatic personality recognition of Filipino Twitter users.

  164. Pabón, F. O. L., & Arroyave, J. R. O. (12 2021). Automatic personality evaluation from transliterations of YouTube Vlogs using classical and state of the art word embeddings. Ingeniería e Investigación, 42, e93803. https://doi.org/10.15446/ing.investig.93803

  165. Alamsyah, A., Putra, M. R. D., Fadhilah, D. D., Nurwianti, F., & Ningsih, E. (2018, May). Ontology modelling approach for personality measurement based on social media activity. In 2018 6th international conference on information and communication technology (ICoICT) (pp. 507–513). IEEE.

  166. Alamsyah, A., Nurwiant, F., Rachman, M. F., Hudaya, C. S., Putra, R. P., Rifkyano, A. I., & Nurwianti, F. (2019a). A progress on the personality measurement model using ontology based on social media text cite this paper personality measurement design for ontology-based plat form using social media text. In Andry Alamsyah ontology modelling approach for personality measurement based on social media activity a progress on the personality measurement model using ontology based on social media text.

  167. Alamsyah, A., Dudija, N., & Widiyanesti, S. (2021). New approach of measuring human personality traits using ontology-based model from social media data. Information (Switzerland). https://doi.org/10.3390/info12100413

    Article  Google Scholar 

Download references

Author information

Corresponding author

Correspondence to Mohmad Azhar Teli.

Ethics declarations

Conflict of interest

We declare there are no known competing monetary interests or personal relationships that could influence the work reported in this paper.

Limitations of the survey

We aimed to include all work in the domain of Automatic Personality Recognition from text; the resulting breadth makes comparing the computational techniques extremely difficult. Further, excluding trait theories other than the Big 5 model may have caused us to miss computational techniques that perform better.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 5, 6, 7 and 8.

Table 5 Performance comparison of APRT (Essays)

Table 6 Performance comparison of APRT (Facebook)

Table 7 Performance comparison of APRT (YouTube)

Table 8 Performance comparison of APRT (Twitter)

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Teli, M.A., Chachoo, M.A. Lingual markers for automating personality profiling: background and road ahead. J Comput Soc Sc 5, 1663–1707 (2022). https://doi.org/10.1007/s42001-022-00184-6

  • DOI: https://doi.org/10.1007/s42001-022-00184-6
