New multi-stage similarity measure for calculation of pairwise patent similarity in a patent citation network

Scientometrics

Abstract

Effectively measuring the similarity between patents in a complex patent citation network is crucial to understanding patent relatedness. Earlier work has applied techniques such as text mining and keyword analysis to patent similarity calculation. The drawback of these approaches is that they depend on the authors' word choice and writing style. Most existing graph-based approaches use common-neighbor-based measures, which consider only direct adjacency. In this work, we propose new similarity measures for patents that use only the structure of the patent citation network. The proposed measures leverage both direct and indirect co-citation links between patents. One challenge is that patents receiving a large number of citations tend to be scored as similar to many other patents in the network. To overcome this, we propose a normalization technique that accounts for pairs ranked as highly similar merely because both are cited by many other patents. We validate the proposed similarity measures using US class codes for US patents and compare them against the well-known Jaccard similarity index. Experiments show that the proposed methods perform well when compared to the Jaccard similarity index.
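To make the distinction between direct and indirect co-citation concrete, the following is a minimal Python sketch. It is not the authors' measure and does not reproduce their normalization. It computes the Jaccard index over the sets of patents that cite each of two patents, first using direct citers only and then using citers reachable within a fixed number of citation steps. The helper names (`build_citers`, `citers_within`), the `stages` parameter, and the toy edge list are all assumptions made for illustration.

```python
from collections import defaultdict

def build_citers(citations):
    """Map each cited patent to the set of patents that cite it directly."""
    citers = defaultdict(set)
    for citing, cited in citations:
        citers[cited].add(citing)
    return citers

def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def citers_within(citers, patent, stages):
    """Patents that reach `patent` through at most `stages` citation steps.

    Illustrative assumption: "indirect" is read here as a bounded
    breadth-first walk backwards along citation edges.
    """
    reached, frontier = set(), {patent}
    for _ in range(stages):
        frontier = {c for p in frontier for c in citers.get(p, ())} - reached
        reached |= frontier
    reached.discard(patent)
    return reached

# Hypothetical toy network of (citing, cited) pairs.
edges = [("C", "A"), ("C", "B"), ("D", "A"), ("E", "B"), ("F", "D"), ("F", "E")]
citers = build_citers(edges)

# Direct co-citation: only immediate citers are compared.
direct = jaccard(citers["A"], citers["B"])

# Two-stage co-citation: citers of citers are included as well.
two_stage = jaccard(citers_within(citers, "A", 2), citers_within(citers, "B", 2))
```

On this toy network the direct score for A and B is 1/3, since only C cites both, while the two-stage score rises to 1/2 because F reaches both patents indirectly through D and E. A raw overlap count would also tend to grow with the number of citations a patent receives, which is the kind of effect the normalization described in the abstract is meant to counteract.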

Author information

Corresponding author

Correspondence to Myong K. Jeong.

About this article

Cite this article

Rodriguez, A., Kim, B., Turkoz, M. et al. New multi-stage similarity measure for calculation of pairwise patent similarity in a patent citation network. Scientometrics 103, 565–581 (2015). https://doi.org/10.1007/s11192-015-1531-8

  • DOI: https://doi.org/10.1007/s11192-015-1531-8
