Abstract
Through their academic publications, authors form a social network. Instead of sharing casual thoughts and photos (as on Facebook), authors select co-authors and reference papers written by other authors. Thanks to various efforts (such as Microsoft Academic Search and DBLP), the data necessary for analyzing the academic social network is becoming more available on the Internet. What types of information and queries would be useful for users, beyond the search queries already available from services such as Google Scholar? In this paper, we explore this question by defining a variety of ranking metrics on different entities: authors, publication venues, and institutions. We go beyond traditional metrics such as paper counts, citations, and h-index. Specifically, we define metrics such as influence, connections, and exposure for authors. An author gains influence by receiving more citations, and especially citations from influential authors. An author increases his or her connections by co-authoring with other authors, especially with authors who themselves have high connections. An author receives exposure by publishing in selective venues whose publications have received many citations in the past; the selectivity of a venue also depends on the influence of the authors who publish there. We discuss the computational aspects of these metrics and the similarity between different metrics. With additional information on author-institution relationships, we are able to study institution rankings based on the corresponding authors’ rankings for each type of metric and for different domains. We are prepared to demonstrate these ideas with a web site (http://pubstat.org) built from millions of publications and authors.
Notes
As new publication records are added to the MAS data set from time to time, the real count keeps increasing. Therefore, the statistical information presented here is based on a snapshot of the data set (Computer Science field only) taken at a particular point in time.
Conceptually, the definitions of the CC and BCC metrics are similar to the traditional notions of full and fractional citation counting.
References
Anagnostopoulos, A., Kumar, R., & Mahdian, M. (2008). Influence and correlation in social networks. In Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 7–15).
ARWU. (2012). The academic ranking of world universities (ARWU by SJTU) 2012 in computer science. http://www.shanghairanking.com/SubjectCS2012.html.
Bakshy, E., Karrer, B., & Adamic, L. A. (2009). Social influence and the diffusion of user-created content. In Proceedings of the 10th ACM conference on electronic commerce (EC) (pp. 325–334).
Ball, P. (2005). Index aims for fair ranking of scientists. Nature, 436, 900.
Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.
Bergstrom, C. (2007). Eigenfactor: Measuring the value and prestige of scholarly journals. College and Research Libraries News, 68(5), 314–316.
Bollen, J., Rodriguez, M. A., & Van de Sompel, H. (2006). Journal status. Scientometrics, 69(3), 669–687.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. In Proceedings of the 7th international conference on World Wide Web (WWW).
Budalakoti, S., & Bekkerman, R. (2012). Bimodal invitation-navigation fair bets model for authority identification in a social network. In Proceedings of the 21st international conference on World Wide Web, ACM (pp. 709–718).
Chen, P., Xie, H., Maslov, S., & Redner, S. (2007). Finding scientific gems with Google’s PageRank algorithm. Journal of Informetrics, 1(1), 8–15.
Chin, W. S., Juan, Y. C., Zhuang, Y., Wu, F., Tung, H. Y., Yu, T., et al. (2013). Effective string processing and matching for author disambiguation. In Proceedings of the 2013 KDD Cup 2013 workshop (p. 7). ACM.
Chiu, D. M., & Fu, T. Z. J. (2010). “Publish or Perish” in the Internet Age: a study of publication statistics in computer networking research. ACM Sigcomm Computer Communication Review (CCR), 40(1), 34–43.
Crandall, D., Cosley, D., Huttenlocher, D., Kleinberg, J., & Suri, S. (2008). Feedback effects between similarity and social influence in online communities. In Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 160–168).
Ding, Y., Yan, E., Frazho, A., & Caverlee, J. (2009). Pagerank for ranking authors in co-citation networks. Journal of the American Society for Information Science and Technology, 60(11), 2229–2243.
Easley, D. A., & Kleinberg, J. M. (2010). Networks, crowds, and markets—reasoning about a highly connected world. Cambridge: Cambridge University Press.
Egghe, L. (2006). An improvement of the H-index: The G-index. ISSI Newsletter, 2(1), 8–9.
Garfield, E. (1972). Citation analysis as a tool in journal evaluation. Science, 178(60), 471–479.
Getoor, L., & Machanavajjhala, A. (2012). Entity resolution: Theory, practice & open challenges. Proceedings of the VLDB Endowment, 5(12), 2018–2019.
Giles, C. L., Bollacker, K. D., & Lawrence, S. (1998). Citeseer: An automatic citation indexing system. In Proceedings of the third ACM conference on digital libraries (pp. 89–98).
González-Pereira, B., Guerrero Bote, V. P., & Moya-Anegón, F. (2009). The SJR indicator: A new indicator of journals’ scientific prestige. arXiv:0912.4141v1.
Harzing, A. W. (2008). Reflections on the h-index. http://www.harzing.com/pop_hindex.htm/.
Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102, 16569–16572.
Jeong, H., Néda, Z., & Barabási, A. L. (2003). Measuring preferential attachment in evolving networks. Europhysics Letters, 61, 567–572.
Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.
Langville, A. N., & Meyer, C. D. (2009). Google’s PageRank and Beyond: The Science of Search Engine Rankings. Princeton: Princeton University Press.
Ley, M. (2009). DBLP: Some lessons learned. Proceedings of the VLDB Endowment, 2(2), 1493–1500.
Leydesdorff, L., & Bornmann, L. (2011). How fractional counting of citations affects the impact factor: Normalization in terms of differences in citation potentials among fields of science. Journal of the American Society for Information Science and Technology, 62(2), 217–229.
Li, P., Yu, J. X., Liu, H., He, J., & Du, X. (2011). Ranking individuals and groups by influence propagation. In Advances in Knowledge Discovery and Data Mining (pp. 407–419), Berlin Heidelberg: Springer.
Merton, R. K. (1968). The Matthew effect in science. Science, 159, 56–63.
Meyer, C. D. (2000). Matrix analysis and applied linear algebra. SIAM.
Newman, M. E. J. (2001a). Clustering and preferential attachment in growing networks. Physical Review E, 64, 025102.
Newman, M. E. J. (2001b). The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences USA, 98(2), 404–409.
Newman, M. E. J. (2004a). Coauthorship networks and patterns of scientific collaboration. Proceedings of the National Academy of Sciences USA, 101, 5200–5205.
Newman, M. E. J. (2004b). Who is the best connected scientist? A study of scientific coauthorship networks. In E. Ben-Naim, H. Frauenfelder & Z. Toroczkai (eds.), Complex networks (pp. 337–370). Berlin: Springer.
Nie, Z., Wen, J., & Ma, W. (2007). Object-level vertical search. In Proceedings of the 3rd biennial conference on innovative data systems research (CIDR).
Pinski, G., & Narin, F. (1976). Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics. Information Processing & Management, 12(5), 297–312.
QS. (2013). The QS world university rankings by subject 2013: computer science & information systems. http://www.topuniversities.com/university-rankings/university-subject-rankings/2013/computer-science-and-information-systems/.
Radicchi, F., Fortunato, S., Markines, B., & Vespignani, A. (2009). Diffusion of scientific credits and the ranking of scientists. Physical Review E, 80, 056103.
Redner, S. (1998). How popular is your paper? An empirical study of the citation distribution. The European Physical Journal B, 4, 131–134.
Roy, S. B., De Cock, M., Mandava, V., Savanna, S., Dalessandro, B., Perlich, C., et al. (2013). The Microsoft Academic Search dataset and KDD Cup 2013. In Proceedings of the 2013 KDD Cup 2013 workshop (p. 1). ACM.
Seglen, P. O. (1992). The skewness of science. Journal of the American Society for Information Science, 43, 628–638.
Sekercioglu, C. H. (2008). Quantifying coauthors contributions. Science, 322(5900), 371.
de Solla Price, D. J. (1965). Networks of scientific papers. Science, 149(3683), 510–515.
de Solla Price, D. J. (1976). A general theory of bibliometric and other cumulative advantage process. Journal of the American Society for Information Science, 27, 292–306.
Sun, Y., & Giles, C. L. (2007). Popularity weighted ranking for academic digital libraries. In Proceedings of the 29th European conference on information retrieval research (ECIR 2007).
Treeratpituk, P., & Giles, C. L. (2009). Disambiguating authors in academic publications using random forests. In Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries (pp. 39–48), ACM.
US-News. (2010). US News Ranking—the best graduate schools in computer science. http://grad-schools.usnews.rankingsandreviews.com/best-graduate-schools/top-science-schools/computer-science-rankings/.
Walker, D., Xie, H., Yan, K. K., & Maslov, S. (2007). Ranking scientific publications using a model of network traffic. Journal of Statistical Mechanics: Theory and Experiment, 2007(6), P06010. UK: IOP Publishing.
Walter, G., Bloch, S., Hunt, G., & Fisher, K. (2003). Counting on citations: A flawed way to measure quality? Medical Journal of Australia, 178, 280–281.
Waltman, L., & van Eck, N. J. (2010). The relation between eigenfactor, audience factor, and influence weight. Journal of the American Society for Information Science and Technology, 61(7), 1476–1486.
Yan, E., Ding, Y., & Sugimoto, C. R. (2011). P-rank: An indicator measuring prestige in heterogeneous scholarly networks. Journal of the American Society for Information Science and Technology, 62(3), 467–477.
Zhou, D., Orshanskiy, S. A., Zha, H., & Giles, C. L. (2007). Co-ranking authors and documents in a heterogeneous network. In Proceedings of IEEE International Conference on Data Mining (ICDM).
Zitt, M., & Small, H. (2008). Modifying the journal impact factor by fractional citation weighting: The audience factor. Journal of the American Society for Information Science and Technology, 59(11), 1856–1860.
Acknowledgments
We appreciate the support from the Technology Transfer Office (TBF13ENG004) of the Chinese University of Hong Kong. We also appreciate the valuable comments provided by the reviewers.
Appendices
Appendix 1: The PageRank algorithm
Given a graph \(G=(V,E)\), the PageRank algorithm can be considered as a random walk starting from any node along the edges. After an infinite number of steps, the probability that a node is visited is the PageRank value of that node.
More formally, the probability distribution of visiting each node can be derived by solving a Markov chain. The entries \(c_{ij}\) (\(i,j=1,2,\dots , n\)) of the transition matrix C represent the probability that the random walk will visit node j next, given that it is currently at node i. Thus, \(c_{ij}\) can be expressed as
\[ c_{ij} = \begin{cases} \dfrac{e_{ij}}{\sum_{k=1}^{n} e_{ik}} & \text{if } \sum_{k=1}^{n} e_{ik} > 0, \\ 0 & \text{otherwise,} \end{cases} \tag{1} \]
where \(e_{ij}\) is the (i, j) entry of the adjacency matrix of the graph G. If G is the citation graph, for example, then \(e_{ij}=1\) if paper i cites paper j; otherwise \(e_{ij}=0\).
In general, C is a substochastic matrix whose rows sum to either 0 (dangling nodes, which represent papers that cite no other papers; see, for example, Brin and Page 1998) or 1 (normal nodes, or papers). For each dangling node, the corresponding row is replaced by \(\frac{1}{n}{\mathbf {e}}^{T}\), that is, a row with every entry equal to 1/n, so that C becomes a stochastic matrix.
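The construction of C described above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `transition_matrix` and the three-paper citation graph `E` are hypothetical and are not taken from the paper.

```python
def transition_matrix(adjacency):
    """Row-normalize a 0/1 adjacency matrix (list of lists).

    Rows that sum to 0 (dangling nodes) are replaced by the uniform
    row (1/n, ..., 1/n), so that every row of the result sums to 1.
    """
    n = len(adjacency)
    matrix = []
    for row in adjacency:
        out_degree = sum(row)
        if out_degree == 0:
            # Dangling node: a paper that cites no other paper.
            matrix.append([1.0 / n] * n)
        else:
            matrix.append([e / out_degree for e in row])
    return matrix


# Hypothetical citation graph: paper 0 cites papers 1 and 2,
# paper 1 cites paper 2, and paper 2 cites nothing (dangling).
E = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
C = transition_matrix(E)
```

In this toy graph the last row of `C` becomes uniform, which is exactly the dangling-node replacement; the other rows are the adjacency rows divided by their out-degree.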
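In order to ensure that the Markov chain is irreducible, and hence that a solution is guaranteed to exist, C is further transformed as follows:
\[ \widetilde{C} = \alpha C + (1-\alpha)\,{\mathbf {e}}{\mathbf {v}}^{T}. \tag{2} \]
Here, \(\alpha \in (0,1)\) is the damping factor, and \({\mathbf {e}}\) is a column vector of dimension n with all entries equal to 1.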
In Eq. (2), \({\mathbf {v}}\in {\mathbb {R}}^{n}\) is a probability vector (i.e., its entries are between 0 and 1 and sum to 1). It is referred to as the teleportation vector and can be used to bias the random walk. For our purposes, we let \({\mathbf {v}} = \frac{1}{n}{\mathbf {e}}\) as the default setting.
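Now, matrix \(\widetilde{C}\) is stochastic, irreducible, and aperiodic, so according to the Perron–Frobenius Theorem (Langville and Meyer 2009; Meyer 2000) the equation
\[ {\boldsymbol{\pi }}^{T} = {\boldsymbol{\pi }}^{T}\widetilde{C}, \qquad {\boldsymbol{\pi }}^{T}{\mathbf {e}} = 1, \tag{3} \]
has a unique solution \({\boldsymbol{\pi }}\), whose entries are the PageRank values of the nodes and which can be computed by iterative methods in practice.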
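As an illustration of the iterative solution mentioned above, the following is a minimal power-iteration sketch in plain Python. It assumes a uniform teleportation vector and a damping factor `alpha` with the conventional value 0.85; that value, the function name `pagerank`, the tolerance, and the toy graph `E` are assumptions made for the example and are not stated in this appendix.

```python
def pagerank(adjacency, alpha=0.85, tol=1e-12, max_iter=1000):
    """Approximate PageRank values by power iteration.

    adjacency[i][j] == 1 means node i links to (cites) node j.
    alpha is an assumed damping factor; v is the uniform teleportation vector.
    """
    n = len(adjacency)
    v = [1.0 / n] * n

    # Row-stochastic transition matrix; dangling rows become uniform.
    C = []
    for row in adjacency:
        out_degree = sum(row)
        if out_degree == 0:
            C.append([1.0 / n] * n)
        else:
            C.append([e / out_degree for e in row])

    pi = v[:]  # start from the uniform distribution
    for _ in range(max_iter):
        # One random-walk step with teleportation. Because pi sums to 1,
        # multiplying pi by (alpha*C + (1-alpha)*e*v^T) reduces to this form.
        new_pi = [
            alpha * sum(pi[i] * C[i][j] for i in range(n)) + (1 - alpha) * v[j]
            for j in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    return pi


# Hypothetical citation graph: paper 0 cites papers 1 and 2,
# paper 1 cites paper 2, and paper 2 cites nothing (dangling).
E = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
ranks = pagerank(E)
```

On this toy graph the most-cited paper (paper 2) receives the highest value and the uncited paper (paper 0) the lowest, and the values sum to 1. For graphs with millions of nodes, a sparse representation would be needed in place of the dense lists used here.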
Appendix 2: Definition of metrics in matrix form
We list the matrix form for the five metrics discussed in the previous sections in Table 18.
Cite this article
Fu, T.Z.J., Song, Q. & Chiu, D.M. The academic social network. Scientometrics 101, 203–239 (2014). https://doi.org/10.1007/s11192-014-1356-x