Elsevier

Journal of Informetrics

Volume 13, Issue 3, August 2019, Pages 793-803

Regular article
Mapping the backbone of the Humanities through the eyes of Wikipedia

https://doi.org/10.1016/j.joi.2019.07.002

Highlights

  • We propose a reproducible methodology to map scientific knowledge in Wikipedia.

  • We analyze how scientific knowledge is established in the field of the Humanities.

  • The average number of citations to Humanities articles in Wikipedia is lower than the overall average.

  • Of the 25 most cited journals on Wikipedia, none is open access.

  • History is the key specialty that connects with other areas of humanistic knowledge.

Abstract

The present study aims to establish a valid method by which to apply the co-citation methodology to Wikipedia article references and, subsequently, to map these relationships between scientific papers. This method, originally applied to scientific literature, will be transferred to the digital environment of collective knowledge generation. To this end, a dataset containing Wikipedia references collected from Altmetric and Scopus’ Journal Metrics journals has been used. The articles have been categorized according to the disciplines and specialties established in the All Science Journal Classification (ASJC). They have also been grouped by journal of publication. A set of articles in the Humanities, comprising 25555 Wikipedia articles with 41655 references to 32245 resources, has been selected. Finally, a descriptive statistical study has been conducted and co-citations have been mapped using networks and indicators of degree and betweenness centrality.

Introduction

When Wikipedia was created in 2001 (DiBona, Cooper, & Stone, 2006), few could have imagined that a voluntary, collective project would so quickly become the main encyclopedic reference work for a large part of humanity. The birth of Wikipedia, in the middle of the dot-com bubble, occurred during the prelude to the emergence of the Web 2.0 paradigm (O’Reilly, 2005), and the project was destined to become one of the greatest exponents of the Web’s ability to activate the collective intelligence of Internet users (Surowiecki, 2005). In January 2018, 17 years later, the English-language version of Wikipedia accounted for 5.5 million of the 47 million articles in the more than 290 editions of Wikipedia; although it had more than 32 million registered users, only 123 966 were active editors.1 The English Wikipedia, the largest edition, represents approximately 11.7% of the whole of Wikipedia and gained more than 600 new articles per day in 2017. According to the Community Engagement Insights 2018 Report,2 prepared by the Wikimedia Foundation, 85% of contributors to Wikimedia communities have post-secondary education (12% have a doctorate).

According to Alexa,3 at the beginning of 2018 Wikipedia ranked 5th among the most visited websites in the world, with a remarkable 66.4% of its traffic coming from user searches. These data refer to organic traffic received by the website and demonstrate that, for a wide variety of terms, Wikipedia is one of the first options that search engines offer as a relevant result on the Web. Hence, it constitutes a much-used reference resource that is of great importance for educational purposes in Science, the Humanities, and other fields. For example, as an encyclopedic digital project, Wikipedia is considered a "very fertile ground for the creation of innovative projects related to the Digital Humanities".4 It has been argued that Wikipedia might be the best and the largest educational platform in history (Tramullas, 2016).

Wikipedia is conceived of as a tool for the dissemination of knowledge through articles generated by its users under Creative Commons licenses (attribution-share alike). Wikipedia has overtaken its competitors by revolutionizing the industry through a profound epistemological transformation that focuses on the social dimension (Fallis, 2008; Fuchs, 2008). Over time, Wikipedia has developed complex rules—generated by the community itself—that are not rigid and remain subject to revision but, at the same time, are strictly observed. Articles should always be verifiable and have reliable sources. Insofar as encyclopedic content is concerned, secondary sources that are "reliable, independent and published" prevail. Among these, particular mention is made of specialized publications:

"Many Wikipedia articles rely on scholarly material. When available, academic and peer-reviewed publications, scholarly monographs, and textbooks are usually the most reliable sources. However, some scholarly material may be outdated, in competition with alternative theories, or controversial within the relevant field. Try to cite current scholarly consensus when available, recognizing that this is often absent. Reliable non-academic sources may also be used in articles about scholarly issues, particularly material from high-quality mainstream publications. Deciding which sources are appropriate depends on context. Material should be attributed in-text where sources disagree."5

At the same time, from the perspective of scientific knowledge evaluation, in recent years digital indicators have been used as an alternative measure of academic impact: the so-called altmetrics indicators (Piwowar, 2013a, 2013b; Priem, Taraborelli, Groth, & Neylon, 2010; Torres-Salinas, Cabezas-Clavijo, & Jiménez-Contreras, 2013).

In this context, Wikipedia faces a dual challenge: on the one hand, the call to guarantee rigor in Wikipedia contents by referencing articles published in scientific journals; on the other, the opportunity to use Wikipedia references to scientific articles as a highly valuable altmetric information source to assess the social impact of research. Evidence of the value of references included in Wikipedia is their high weighting in a synthetic indicator such as the Altmetric Attention Score.6 In this indicator, mentions in Wikipedia articles receive a weight of 3, which is higher than the weights for mentions on Twitter (1) or Facebook (0.25), but lower than those for news feeds (8) and blogs (5).
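To make the relative weights quoted above concrete, the following minimal sketch computes a weighted sum of mentions for a single paper. It is an illustration only: the real Altmetric Attention Score applies further adjustments that are not described here, and the function name `weighted_mentions`, the source labels, and the example counts are hypothetical.

```python
# Per-source weights as quoted in the text. The real Altmetric Attention Score
# is not a plain weighted sum, so this only illustrates the relative weighting.
WEIGHTS = {"news": 8, "blog": 5, "wikipedia": 3, "twitter": 1, "facebook": 0.25}


def weighted_mentions(counts):
    """Sum the mention counts, each multiplied by the weight of its source."""
    return sum(WEIGHTS[source] * n for source, n in counts.items())


# Hypothetical paper: 2 Wikipedia mentions, 4 tweets, 4 Facebook posts.
example = {"wikipedia": 2, "twitter": 4, "facebook": 4}
print(weighted_mentions(example))  # 2*3 + 4*1 + 4*0.25 = 11.0
```

Under these weights, two Wikipedia mentions contribute more to the total than four tweets and four Facebook posts combined.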

The connection between Wikipedia as a social platform and scientific articles has been explored in different ways. For example, through the analysis of reference and citation patterns in a specific scientific area (Serrano-López, Ingwersen, & Sanz-Casado, 2017), as a platform for the promotion of open access scientific literature (Teplitskiy, Lu, & Duede, 2016), or by exploring its limitations as a source in the evaluation of scientific activity (Kousha & Thelwall, 2016). Knowledge representation has also been formulated through reference maps connecting articles (Silva, Viana, Travençolo, & Costa, 2011), or by analyzing differences between the Universal Decimal Classification (UDC) category structure and that generated by Wikipedia itself (Salah, Gao, Suchecki, & Scharnhorst, 2012).

From a bibliometric perspective, co-citations constitute a classic instrument (Small, 1973) that allows knowledge to be mapped by taking account of common references received from a third document. Co-citations can be interpreted as a measure of the similarity between two documents. This approach has been used to observe the connections between words (Leydesdorff & Nerghes, 2017), or between areas of knowledge through scientific articles (Leydesdorff, Carley, & Rafols, 2012). More recently, with the development of the Web, this concept has been transferred to this new space by discussing co-link analysis (Thelwall, 2009)—an approach based on sites or web pages that simultaneously link to other sites or web pages. Co-link analysis has proved a useful means of revealing the cognitive or intellectual structure of a field of study (Zuccala, 2006). Moreover, it has allowed investigators to broaden their scope of study beyond scientific production, having been applied to business (Vaughan & Romero-Frías, 2010), politics (Romero-Frías & Vaughan, 2010) or universities (Vaughan, Kipp, & Gao, 2007).
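The co-citation idea described above can be sketched in a few lines: two papers are co-cited each time the same citing document (here, a Wikipedia article) references both. The code below is a minimal illustration under that definition, not the authors' pipeline; the function name `cocitation_counts` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(references_by_article):
    """Count, for each pair of cited papers, how many citing articles reference both."""
    counts = Counter()
    for refs in references_by_article.values():
        # Deduplicate and sort so that each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy data: Wikipedia article -> scientific papers it references.
wiki_refs = {
    "Wikipedia article 1": ["paper A", "paper B", "paper C"],
    "Wikipedia article 2": ["paper A", "paper B"],
    "Wikipedia article 3": ["paper C"],
}

pairs = cocitation_counts(wiki_refs)
print(pairs[("paper A", "paper B")])  # 2: co-cited by articles 1 and 2
print(pairs[("paper A", "paper C")])  # 1: co-cited by article 1 only
```

Replacing each paper with its journal or subject category before counting would give co-citation counts at that coarser level; the resulting pair counts can serve as edge weights in a network.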

In this regard, to our knowledge, no study has used Wikipedia as a reference to map science by extrapolating classical co-citation methodology to this digital platform in order to discover the structure of journals corresponding to different areas of knowledge and different disciplines. With this approach, scientific knowledge could be mapped from a social perspective, thus offering a radically different view to that of the traditional maps constructed from the relationships between the scientific studies themselves. This approach is in line with the proposal made by Costas, de Rijcke, and Marres (2017) for the study of co-social mediation interaction. Based on this framework, we have focused on the Humanities in order to achieve the following objectives:

  • 1. to establish a methodology to transfer co-citation methodology to a digital environment taking as a reference an altmetric indicator linked to the collective generation of knowledge in Wikipedia; and

  • 2. to analyze how scientific knowledge is established in the field of the Humanities as this is represented in Wikipedia.


Information sources and data processing

This study uses Altmetric.com as its source of information and the Altmetric Explorer to extract the references to scientific articles that are included in Wikipedia articles. To do this we have used the platform’s download functions to obtain a csv file in which each scientific article appears with its basic data and information about the Wikipedia article in which it is referenced. So, all the scientific articles indexed in Altmetric.com and cited in Wikipedia have been downloaded. We have

General data and annual evolution

Table 1 shows descriptive statistics of the Wikipedia article references to scientific articles published in Scopus journals, and of the citations received by these scientific articles both for the whole of Wikipedia (global) and for the Humanities discipline. Note that we only take account of Wikipedia articles that include at least one citation to a scientific journal and scientific journals referenced at least once in Wikipedia. Hence, the minimum mean for references is 1. In total, 784209

Conclusions

In the present study, we have extrapolated the methodology for representing science on the basis of co-citation maps to a different context. Traditionally, science maps have been drawn up from scientific articles, using large databases such as the Web of Science or Scopus and demonstrating their validity as a means of establishing relationships between areas and of determining the structure of science from the scientific knowledge itself (Noyons & Van Raan, 1998). In the present study, these

Author contributions

Daniel Torres-Salinas: Conceived and designed the analysis, Performed the analysis.

Esteban Romero-Frías: Performed the analysis, Wrote the paper.

Wenceslao Arroyo-Machado: Collected the data, Contributed data or analysis tools.

Acknowledgements

This work has been possible thanks to financial support from “Knowmetrics: knowledge evaluation in digital society”, a project funded by scientific research team grants from the BBVA Foundation, 2016. We thank Altmetric.com for the transfer of the data that has allowed us to conduct this study and Elsevier's Research Trends for the figure from the study of Richardson (2013).

References (34)

  • F.N. Silva et al. Investigating relationships within and between category networks in Wikipedia. Journal of Informetrics (2011)

  • D. Torres-Salinas et al. Mapping citation patterns of book chapters in the Book Citation Index. Journal of Informetrics (2013)

  • R. Costas et al. Beyond the dependencies of altmetrics: Conceptualizing ‘heterogeneous couplings’ between social media and science. The 2017 Altmetrics Workshop (2017)

  • C. DiBona et al. Open sources 2.0: The continuing evolution (2006)

  • D. Fallis. Toward an epistemology of Wikipedia. Journal of the American Society for Information Science and Technology (2008)

  • C. Fuchs. Internet and society: Social theory in the information age (2008)

  • K. Kousha et al. Are Wikipedia citations important evidence of the impact of scholarly articles and books? Journal of the Association for Information Science and Technology (2016)

  • L. Leydesdorff et al. Global maps of science based on the new Web-of-Science categories. Scientometrics (2012)

  • L. Leydesdorff et al. Journal maps on the basis of Scopus data: A comparison with the Journal Citation Reports of the ISI. Journal of the American Society for Information Science and Technology (2010)

  • L. Leydesdorff et al. The structure of the Arts & Humanities Citation Index: A mapping on the basis of aggregated citations among 1,157 journals. Journal of the Association for Information Science and Technology (2011)

  • L. Leydesdorff et al. Co-word maps and topic modeling: A comparison using small and medium-sized corpora (N < 1,000). Journal of the Association for Information Science and Technology (2017)

  • K.W. McCain. Mapping authors in intellectual space: A technical overview. Journal of the American Society for Information Science (1990)

  • F. Moya-Anegón et al. A new technique for building maps of large scientific domains based on the cocitation of classes and categories. Scientometrics (2004)

  • E.C. Noyons et al. Advanced mapping of science and technology. Scientometrics (1998)

  • T. O’Reilly. What is web 2.0? Design patterns and business models for the next generation of software (2005)

  • H. Piwowar. Altmetrics: Value all research products. Nature (2013)

  • H. Piwowar. Introduction altmetrics: What, why and where? Bulletin of the American Society for Information Science and Technology (2013)