Citation contexts as a data source for evaluation of scholarly consumption

Scientometrics

Abstract

In recent years, large datasets of citation contexts from research publications have become available for scientometric studies. Citation contexts capture various characteristics of the relationship between citing and cited papers, including which publications the citing authors used and what motivated that use. Some of these characteristics can serve as indicators of the citing authors' scholarly consumption. Based on citation context data, scholarly consumption can be characterized by four indicators: (a) data on cited (consumed) publications and their authors (suppliers); (b) types of scholarly consumption; (c) its thematics; and (d) temporal changes in these data. The indicators can be grouped and merged in various ways, based on their belonging to common citation contexts and/or on the coincidence of their values. In this way, one can create datasets for various objects and tasks of scientometric evaluation of scholarly consumption. The article proposes a general approach to building scholarly consumption indicators and presents the results of experiments on evaluating the thematic structure of scholarly consumption. For this purpose, thematically significant groups of words (topics) were extracted from the citation contexts using the LDA topic modeling method. Topics were obtained from the citation contexts of three groups of publications: (1) publications of a given author, (2) publications cited by a given author (suppliers), and (3) publications citing a given author (consumers). Thematic structures of scholarly consumption were built for a given author, as well as for that author's suppliers and consumers. The representation of the thematic structure as a word tree and as a flowchart is also considered.
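As a rough illustration of how indicators (a), (c), and (d) might be aggregated from citation context records, the following minimal Python sketch groups contexts by citing author. The record layout (`CitationContext`), the toy stop-word list, and both function names are assumptions made for this example and are not taken from the article. Plain word counts stand in for thematics here; they are not the LDA topic modeling used in the article, and indicator (b), the type of consumption, is not covered.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout; the article's actual data schema is not shown here.
@dataclass(frozen=True)
class CitationContext:
    citing_author: str  # the consumer
    cited_author: str   # the supplier
    year: int           # year of the citing publication
    text: str           # words surrounding the in-text citation


# Toy stop-word list, assumed for illustration only.
STOP_WORDS = {"the", "of", "a", "in", "and", "we", "to", "by", "is", "for"}


def consumption_profile(contexts, author):
    """Aggregate simple consumption indicators for one citing author."""
    suppliers = Counter()  # (a) whose publications were consumed
    by_year = Counter()    # (d) how consumption is spread over time
    words = Counter()      # (c) crude thematic signal: word frequencies
    for ctx in contexts:
        if ctx.citing_author != author:
            continue
        suppliers[ctx.cited_author] += 1
        by_year[ctx.year] += 1
        for token in ctx.text.lower().split():
            token = token.strip(".,;:()[]")
            if token and token not in STOP_WORDS:
                words[token] += 1
    return {"suppliers": suppliers, "by_year": by_year, "words": words}


def consumers_of(contexts, author):
    """Count how often each citing author consumed the given author's work."""
    return Counter(c.citing_author for c in contexts if c.cited_author == author)
```

Calling `consumption_profile(contexts, "A")` returns three counters that can be compared across authors, while `consumers_of(contexts, "A")` gives the reverse view of who consumes that author's publications.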

Data availability

Supplementary material is available at https://clck.ru/SpUFH.

Code availability

Links to the code are provided in the paper.

Notes

  1. https://www.ranepa.ru/eng/.

  2. See data about all of them at the Cirtec project workplace: http://cirtec.ranepa.ru/.

  3. https://authors.repec.org/.

  4. http://citec.repec.org/.

  5. https://github.com/ufal/udpipe.

  6. List of stop-words: https://github.com/sparinov/CitEcCyr/blob/master/stopwords_en.txt.

  7. https://radimrehurek.com/gensim/models/ldamodel.html.

  8. https://developers.google.com/chart/interactive/docs/gallery/wordtree.

  9. https://www.amcharts.com/demos/traceable-sankey-diagram/.

Acknowledgements

Part of this study, the development of an approach to using data from the citation contexts of research publications for supercomputer simulation of interactions among agents and the research community environment, was funded by an RSF grant (project No. 19-18-00240).

Funding

Partial financial support for developing an approach to using data from the citation contexts of research publications for supercomputer simulation of interactions among agents and the research community environment was received from the Russian Science Foundation (RSF project No. 19-18-00240).

Author information

Corresponding author

Correspondence to Sergey Parinov.

Ethics declarations

Conflict of interest

The authors declared that they have no conflict of interest.

About this article

Cite this article

Parinov, S. Citation contexts as a data source for evaluation of scholarly consumption. Scientometrics 126, 9249–9265 (2021). https://doi.org/10.1007/s11192-021-04165-w

  • DOI: https://doi.org/10.1007/s11192-021-04165-w