Abstract
Effectively measuring similarity between patents in a complex patent citation network is crucial for understanding patent relatedness. In the past, techniques such as text mining and keyword analysis have been applied to calculate patent similarity. The drawback of these approaches is that they depend on the word choice and writing style of the authors. Most existing graph-based approaches use common-neighbor measures, which consider only direct adjacency. In this work, we propose new similarity measures for patents that use only the structure of the patent citation network. The proposed measures leverage both direct and indirect co-citation links between patents. One challenge arises when some patents receive a large number of citations and are therefore deemed similar to many other patents in the network. To overcome this, we propose a normalization technique that accounts for pairs ranked as highly similar merely because both are cited by many other patents. We validate the proposed similarity measures using US class codes for US patents and the well-known Jaccard similarity index. Experiments show that the proposed methods perform well compared with the Jaccard similarity index.
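The abstract does not state the paper's formulas, so the following is only a minimal Python sketch of the general ideas it names (direct co-citation, a Jaccard baseline over citing patents, multi-stage co-citation, and a normalization that damps heavily cited patents), not the authors' measure. It assumes the citation network is stored as a dictionary from each citing patent to the set of patents it cites. The toy data, every function name, the two-stage cutoff `max_stage`, the `decay` factor of 0.5, and the choice of cosine normalization are hypothetical illustrations.

```python
from math import sqrt

# Hypothetical toy network (invented for illustration, not data from the paper):
# each key is a citing patent, each value the set of patents it cites.
CITES = {
    "A": {"P1", "P2"},
    "B": {"P1", "P2", "P3"},
    "C": {"P3", "P4"},
    "D": {"A"},
    "F": {"A", "C"},
}


def cited_by(cites):
    """Invert the citation lists: patent -> set of patents that cite it."""
    inv = {}
    for src, targets in cites.items():
        for t in targets:
            inv.setdefault(t, set()).add(src)
    return inv


def cocitation(inv, p, q):
    """Direct co-citation count: number of patents citing both p and q."""
    return len(inv.get(p, set()) & inv.get(q, set()))


def jaccard(inv, p, q):
    """Jaccard index of the sets of patents citing p and q (common-neighbor baseline)."""
    a, b = inv.get(p, set()), inv.get(q, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def citing_within(inv, p, max_stage):
    """Breadth-first search backwards along citations: map each patent that
    reaches p by a citation path of length <= max_stage to its shortest length."""
    dist, frontier = {}, {p}
    for stage in range(1, max_stage + 1):
        nxt = set()
        for node in frontier:
            for citer in inv.get(node, set()):
                if citer not in dist and citer != p:
                    dist[citer] = stage
                    nxt.add(citer)
        frontier = nxt
    return dist


def multistage_similarity(inv, p, q, max_stage=2, decay=0.5):
    """Assumed multi-stage score: each (direct or indirect) co-citer is weighted
    by decay ** (stage - 1); the dot product is then cosine-normalized so that
    patents cited by many others are not automatically ranked as similar."""
    wp = {x: decay ** (d - 1) for x, d in citing_within(inv, p, max_stage).items()}
    wq = {x: decay ** (d - 1) for x, d in citing_within(inv, q, max_stage).items()}
    raw = sum(wp[x] * wq[x] for x in wp.keys() & wq.keys())
    norm = sqrt(sum(v * v for v in wp.values()) * sum(v * v for v in wq.values()))
    return raw / norm if norm else 0.0


inv = cited_by(CITES)
for p, q in [("P1", "P2"), ("P3", "P4"), ("P1", "P4")]:
    print(p, q, cocitation(inv, p, q), round(jaccard(inv, p, q), 3),
          round(multistage_similarity(inv, p, q), 3))
```

On this toy graph, P1 and P4 have no common direct citer (co-citation count 0, Jaccard 0.0), but F cites A (which cites P1) and C (which cites P4), so the sketch gives the pair a small positive score of about 0.141. This illustrates why indirect links can add information that purely common-neighbor measures miss. Cosine normalization is only one plausible way to damp highly cited patents; the paper's own normalization may differ.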
About this article
Cite this article
Rodriguez, A., Kim, B., Turkoz, M. et al. New multi-stage similarity measure for calculation of pairwise patent similarity in a patent citation network. Scientometrics 103, 565–581 (2015). https://doi.org/10.1007/s11192-015-1531-8