The academic social network

Fu, Tom Z. J.; Song, Qianqian; Chiu, Dah Ming

doi:10.1007/s11192-014-1356-x

Published: 16 July 2014

Volume 101, pages 203–239, (2014)

Scientometrics

Tom Z. J. Fu¹, Qianqian Song³ & Dah Ming Chiu²

Abstract

By means of their academic publications, authors form a social network. Instead of sharing casual thoughts and photos (as in Facebook), authors select co-authors and reference papers written by other authors. Thanks to various efforts (such as Microsoft Academic Search and DBLP), the data necessary for analyzing the academic social network is becoming more available on the Internet. What types of information and queries would be useful for users to discover, beyond the search queries already available from services such as Google Scholar? In this paper, we explore this question by defining a variety of ranking metrics on different entities: authors, publication venues, and institutions. We go beyond traditional metrics such as paper counts, citations, and h-index. Specifically, we define metrics such as influence, connections, and exposure for authors. An author gains influence by receiving more citations, and especially citations from influential authors. An author increases his or her connections by co-authoring with other authors, especially authors who themselves have high connections. An author receives exposure by publishing in selective venues where publications have received high citations in the past, and the selectivity of these venues also depends on the influence of the authors who publish there. We discuss the computational aspects of these metrics and the similarity between different metrics. With additional information on author-institution relationships, we are able to study institution rankings based on the corresponding authors’ rankings for each type of metric as well as for different domains. We are prepared to demonstrate these ideas with a web site (http://pubstat.org) built from millions of publications and authors.

Notes

As new publication records are added to the MAS data set from time to time, the real count keeps increasing. Therefore, the statistical information presented here is based on a snapshot of the data set (for the Computer Science field only) taken at a certain point in time.
Conceptually, the definitions of the CC and BCC metrics are similar to the traditional notions of full and fractional citation counting.

References

Anagnostopoulos, A., Kumar, R., & Mahdian, M. (2008). Influence and correlation in social networks. In Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 7–15).
ARWU. (2012). The academic ranking of world universities (arwu by sjtu) 2012 in computer science. http://www.shanghairanking.com/SubjectCS2012.html.
Bakshy, E., Karrer, B., & Adamic, L. A. (2009). Social influence and the diffusion of user-created content. In Proceedings of the 10th ACM conference on electronic commerce (EC) (pp. 325–334).
Ball, P. (2005). Index aims for fair ranking of scientists. Nature, 436, 900.
Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.
Bergstrom, C. (2007). Eigenfactor: Measuring the value and prestige of scholarly journals. College and Research Libraries News, 68(5), 314–316.
Bollen, J., Rodriguez, M. A., & Van de Sompel, H. (2006). Journal status. Scientometrics, 69(3), 669–687.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. In Proceedings of the 7th international conference on World Wide Web (WWW).
Budalakoti, S., & Bekkerman, R. (2012). Bimodal invitation-navigation fair bets model for authority identification in a social network. In Proceedings of the 21st international conference on World Wide Web (pp. 709–718). ACM.
Chen, P., Xie, H., Maslov, S., & Redner, S. (2007). Finding scientific gems with google’s pagerank algorithm. Journal of Informetrics, 1(1), 8–15.
Chin, W. S., Juan, Y. C., Zhuang, Y., Wu, F., Tung, H. Y., Yu, T., et al. (2013). Effective string processing and matching for author disambiguation. In Proceedings of the 2013 KDD Cup 2013 workshop (p 7). ACM.
Chiu, D. M., & Fu, T. Z. J. (2010). “Publish or Perish” in the Internet Age: A study of publication statistics in computer networking research. ACM SIGCOMM Computer Communication Review (CCR), 40(1), 34–43.
Crandall, D., Cosley, D., Huttenlocher, D., Kleinberg, J., & Suri, S. (2008). Feedback effects between similarity and social influence in online communities. In Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 160–168).
Ding, Y., Yan, E., Frazho, A., & Caverlee, J. (2009). Pagerank for ranking authors in co-citation networks. Journal of the American Society for Information Science and Technology, 60(11), 2229–2243.
Easley, D. A., & Kleinberg, J. M. (2010). Networks, crowds, and markets—reasoning about a highly connected world. Cambridge: Cambridge University Press.
Egghe, L. (2006). An improvement of the H-index: The G-index. ISSI Newsletter, 2(1), 8–9.
Garfield, E. (1972). Citation analysis as a tool in journal evaluation. Science, 178(60), 471–479.
Getoor, L., & Machanavajjhala, A. (2012). Entity resolution: Theory, practice & open challenges. Proceedings of the VLDB Endowment, 5(12), 2018–2019.
Giles, C. L., Bollacker, K. D., & Lawrence, S. (1998). Citeseer: An automatic citation indexing system. In Proceedings of the third ACM conference on digital libraries (pp. 89–98).
González-Pereira, B., Guerrero Bote, V. P., & Moya-Anegón, F. (2009). The SJR indicator: A new indicator of journals’ scientific prestige. arXiv:0912.4141v1.
Harzing, A. W. (2008). Reflections on the h-index. http://www.harzing.com/pop_hindex.htm/.
Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102, 16569–16572.
Jeong, H., Néda, Z., & Barabási, A. L. (2003). Measuring preferential attachment in evolving networks. Europhysics Letters, 61, 567–572.
Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.
Langville, A. N., & Meyer, C. D. (2009). Google’s PageRank and Beyond: The Science of Search Engine Rankings. Princeton: Princeton University Press.
Ley, M. (2009). DBLP: Some lessons learned. Proceedings of the VLDB Endowment, 2(2), 1493–1500.
Leydesdorff, L., & Bornmann, L. (2011). How fractional counting of citations affects the impact factor: Normalization in terms of differences in citation potentials among fields of science. Journal of the American Society for Information Science and Technology, 62(2), 217–229.
Li, P., Yu, J. X., Liu, H., He, J., & Du, X. (2011). Ranking individuals and groups by influence propagation. In Advances in Knowledge Discovery and Data Mining (pp. 407–419), Berlin Heidelberg: Springer.
Merton, R. K. (1968). The Matthew effect in science. Science, 159, 56–63.
Meyer, C. D. (2000). Matrix analysis and applied linear algebra. SIAM.
Newman, M. E. J. (2001a). Clustering and preferential attachment in growing networks. Physical Review E, 64, 025102.
Newman, M. E. J. (2001b). The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences USA, 98(2), 404–409.
Newman, M. E. J. (2004a). Coauthorship networks and patterns of scientific collaboration. Proceedings of the National Academy of Sciences USA, 101, 5200–5205.
Newman, M. E. J. (2004b). Who is the best connected scientist? A study of scientific coauthorship networks. In E. Ben-Naim, H. Frauenfelder & Z. Toroczkai (eds.), Complex networks (pp. 337–370). Berlin: Springer.
Nie, Z., Wen, J., & Ma, W. (2007). Object-level vertical search. In Proceedings of the 3rd biennial conference on innovative data systems research (CIDR).
Pinski, G., & Narin, F. (1976). Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics. Information Processing & Management, 12(5), 297–312.
QS. (2013). The QS world university rankings by subject 2013: computer science & information systems. http://www.topuniversities.com/university-rankings/university-subject-rankings/2013/computer-science-and-information-systems/.
Radicchi, F., Fortunato, S., Markines, B., & Vespignani, A. (2009). Diffusion of scientific credits and the ranking of scientists. Physical Review E, 80, 056103.
Redner, S. (1998). How popular is your paper? An empirical study of the citation distribution. The European Physical Journal B, 4, 131–134.
Roy, S. B., De Cock, M., Mandava, V., Savanna, S., Dalessandro, B., Perlich, C., et al. (2013). The microsoft academic search dataset and kdd cup 2013. In Proceedings of the 2013 KDD cup 2013 workshop (p 1). ACM.
Seglen, P. O. (1992). The skewness of science. Journal of the American Society for Information Science, 43, 628–638.
Sekercioglu, C. H. (2008). Quantifying coauthors contributions. Science, 322(5900), 371.
de Solla Price, D. J. (1965). Networks of scientific papers. Science, 149(3683), 510–515.
de Solla Price, D. J. (1976). A general theory of bibliometric and other cumulative advantage process. Journal of the American Society for Information Science, 27, 292–306.
Sun, Y., & Giles, C. L. (2007). Popularity weighted ranking for academic digital libraries. In Proceedings of the 29th European conference on information retrieval research (ECIR 2007).
Treeratpituk, P., & Giles, C. L. (2009). Disambiguating authors in academic publications using random forests. In Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries (pp. 39–48), ACM.
US-News. (2010). US News Ranking—the best graduate schools in computer science. http://grad-schools.usnews.rankingsandreviews.com/best-graduate-schools/top-science-schools/computer-science-rankings/.
Walker, D., Xie, H., Yan, K. K., & Maslov, S. (2007). Ranking scientific publications using a model of network traffic. Journal of Statistical Mechanics: Theory and Experiment, 2007(6), P06010.
Walter, G., Bloch, S., Hunt, G., & Fisher, K. (2003). Counting on citations: A flawed way to measure quality? Medical Journal of Australia, 178, 280–281.
Waltman, L., & van Eck, N. J. (2010). The relation between eigenfactor, audience factor, and influence weight. Journal of the American Society for Information Science and Technology, 61(7), 1476–1486.
Yan, E., Ding, Y., & Sugimoto, C. R. (2011). P-rank: An indicator measuring prestige in heterogeneous scholarly networks. Journal of the American Society for Information Science and Technology, 62(3), 467–477.
Zhou, D., Orshanskiy, S. A., Zha, H., & Giles, C. L. (2007). Co-ranking authors and documents in a heterogeneous network. In Proceedings of IEEE International Conference on Data Mining (ICDM).
Zitt, M., & Small, H. (2008). Modifying the journal impact factor by fractional citation weighting: The audience factor. Journal of the American Society for Information Science and Technology, 59(11), 1856–1860.

Acknowledgments

We appreciate the support from the Technology Transfer Office (TBF13ENG004) of the Chinese University of Hong Kong. We also appreciate the valuable comments provided by the reviewers.

Author information

Authors and Affiliations

Illinois at Singapore Pte Ltd, Advanced Digital Sciences Center (ADSC), 1, Fusionopolis Way, #08-10, Connexis North Tower, Singapore, 138632, Singapore
Tom Z. J. Fu
Department of Information Engineering, The Chinese University of Hong Kong, Room 836, Ho Sin Hang Engineering Building, Shatin, NT, Hong Kong
Dah Ming Chiu
Department of Information Engineering, The Chinese University of Hong Kong, Room 725, Ho Sin Hang Engineering Building, Shatin, NT, Hong Kong
Qianqian Song

Corresponding author

Correspondence to Tom Z. J. Fu.

Appendices

Appendix 1: The PageRank algorithm

Given a graph $G=(V,E)$, the PageRank algorithm can be considered as a random walk starting from any node along the edges. After an infinite number of steps, the probability that a node is visited is the PageRank value of that node.

More formally, the probability distribution of visiting each node can be derived as the stationary distribution of a Markov chain. The entries $c_{ij}$ ($i,j=1,2,\dots , n$) of the transition matrix $C$ give the probability that the random walk visits node $j$ next, given that it is currently at node $i$. Thus, $c_{ij}$ can be expressed as

$$c_{ij} = {\text {Prob}}(j \mid i) = \frac{e_{ij}}{\sum _k e_{ik}} \tag{1}$$

where $e_{ij}$ is the $(i,j)$ entry of the adjacency matrix of the graph $G$. If $G$ is the citation graph, for example, then $e_{ij}=1$ if paper $i$ cites paper $j$, and $e_{ij}=0$ otherwise.

In general, $C$ is a substochastic matrix whose rows sum to either 0 (dangling nodes, which in the citation graph are papers that cite no other papers; see also Brin and Page 1998) or 1 (normal nodes, or papers). For each dangling node, the corresponding row is replaced by $\frac{1}{n}{\mathbf {e}}^{\mathrm{{T}}}$, so that $C$ becomes a stochastic matrix.
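As a minimal sketch of the construction just described, the following Python function row-normalizes an adjacency matrix according to Eq. (1) and replaces each all-zero (dangling) row with a uniform row. The function name `transition_matrix` and the three-paper toy graph are illustrative assumptions, not taken from the paper.

```python
def transition_matrix(adj):
    """Row-normalize a 0/1 adjacency matrix into a stochastic matrix C.

    adj[i][j] == 1 means node i links to node j (e.g. paper i cites paper j).
    Rows without outgoing edges (dangling nodes) become uniform rows 1/n.
    """
    n = len(adj)
    C = []
    for row in adj:
        out_degree = sum(row)
        if out_degree == 0:
            # Dangling node: replace the all-zero row by (1/n) e^T.
            C.append([1.0 / n] * n)
        else:
            # Eq. (1): c_ij = e_ij / sum_k e_ik.
            C.append([e / out_degree for e in row])
    return C


# Hypothetical toy citation graph: paper 0 cites papers 1 and 2,
# paper 1 cites paper 2, and paper 2 cites nothing (dangling).
adj = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
C = transition_matrix(adj)
```

After this step every row of `C` sums to 1, which is the property the rest of the derivation relies on.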

To ensure that the Markov chain defined by $C$ is irreducible, so that a unique stationary distribution is guaranteed to exist, $C$ is further transformed as follows:

$$\widetilde{C} = \alpha C + (1-\alpha ){\mathbf {e}}{\mathbf {v}}^{\mathrm{{T}}}, \;\;\alpha \in (0,1). \tag{2}$$

Here, ${\mathbf {e}}$ is the column vector of dimension $n$ whose entries are all 1.

In Eq. (2), ${\mathbf {v}}\in {\mathbb {R}}^{n}$ is a probability vector (i.e., its values are between 0 and 1, and sum to 1). It is referred to as the teleportation vector, and it can be used to bias the random walk toward particular nodes. For our purposes, we let ${\mathbf {v}} = \frac{1}{n}{\mathbf {e}}$ as the default setting.
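The transformation in Eq. (2) can be sketched in a few lines of Python. The helper name `google_matrix`, the choice $\alpha = 0.5$, and the two-node example are assumptions made only for illustration. Since the $(i,j)$ entry of ${\mathbf {e}}{\mathbf {v}}^{\mathrm{{T}}}$ is simply $v_j$, each entry of $\widetilde{C}$ is $\alpha c_{ij} + (1-\alpha ) v_j$.

```python
def google_matrix(C, alpha, v=None):
    """Apply Eq. (2): C_tilde = alpha * C + (1 - alpha) * e v^T.

    C is a row-stochastic matrix given as a list of lists.
    v is the teleportation vector; it defaults to the uniform vector (1/n) e.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n = len(C)
    if v is None:
        v = [1.0 / n] * n
    # (e v^T)[i][j] == v[j], so the rank-one term adds v[j] to column j.
    return [
        [alpha * C[i][j] + (1.0 - alpha) * v[j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical example: a two-node cycle, which is periodic on its own.
cycle = [
    [0.0, 1.0],
    [1.0, 0.0],
]
G = google_matrix(cycle, alpha=0.5)
```

In this toy case every entry of `G` is strictly positive, so the walk can move from any node to any node in a single step, which is what makes the transformed chain irreducible and aperiodic.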

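Now, matrix $\widetilde{C}$ is stochastic, irreducible, and aperiodic, so according to the Perron–Frobenius Theorem (Langville and Meyer 2009; Meyer 2000) it has a unique stationary distribution $\pi$ satisfying $\pi ^{\mathrm{{T}}}=\pi ^{\mathrm{{T}}}\widetilde{C}$. Substituting Eq. (2), and using $\pi ^{\mathrm{{T}}}{\mathbf {e}}=1$ together with ${\mathbf {v}} = \frac{1}{n}{\mathbf {e}}$, gives the equation

$$\pi ^{\mathrm{{T}}}=\alpha \pi ^{\mathrm{{T}}}C+(1-\alpha )\frac{1}{n}{\mathbf {e}}^{\mathrm{{T}}},\;\;\alpha \in (0,1) \tag{3}$$

which can be solved by iterative methods in practice.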
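One common iterative method for Eq. (3) is power iteration: start from the uniform distribution and repeatedly apply the right-hand side until the vector stops changing. The sketch below is an illustration under stated assumptions rather than the authors' implementation; the function name `pagerank`, the default $\alpha = 0.85$, the tolerance, and the iteration cap are all choices made here, since the source only requires $\alpha \in (0,1)$.

```python
def pagerank(C, alpha=0.85, tol=1e-12, max_iter=1000):
    """Solve Eq. (3) by power iteration with a uniform teleportation vector.

    C must be row-stochastic (dangling rows already replaced by 1/n).
    alpha, tol and max_iter are illustrative defaults, not from the paper.
    """
    n = len(C)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # pi_new^T = alpha * pi^T C + (1 - alpha) * (1/n) e^T
        new_pi = [
            alpha * sum(pi[i] * C[i][j] for i in range(n)) + (1.0 - alpha) / n
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new_pi, pi))
        pi = new_pi
        if delta < tol:
            break
    return pi


# Hypothetical toy matrix: paper 0 cites 1 and 2, paper 1 cites 2,
# and paper 2 is dangling, so its row is uniform.
C = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0],
]
ranks = pagerank(C)
```

Because each row of `C` sums to 1, every iteration keeps the entries of `pi` summing to 1, which is the normalization used when deriving Eq. (3). In this toy graph, paper 2 is cited by both other papers and ends up with the largest value.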

Appendix 2: Definition of metrics in matrix form

We list the matrix form for the five metrics discussed in the previous sections in Table 18.

Table 18 Notations and derivations of the ranking metrics

About this article

Cite this article

Fu, T.Z.J., Song, Q. & Chiu, D.M. The academic social network. Scientometrics 101, 203–239 (2014). https://doi.org/10.1007/s11192-014-1356-x

Received: 20 June 2013
Published: 16 July 2014
Issue Date: October 2014
DOI: https://doi.org/10.1007/s11192-014-1356-x
