
A Business Intelligence Tool for Explaining Similarity

  • Conference paper
Model-Driven Organizational and Business Agility (MOBA 2022)

Abstract

Agile business often requires identifying similar objects (firms, providers, end users, products) across an older business domain and a newer one. Data-driven tools for aggregating similar resources are now widely used in Business Intelligence applications, and a large majority of them rely on Machine Learning techniques based on similarity metrics. However effective such tools are, the mathematics they are based on does not lend itself to human-readable explanations of their results, leaving a manager who uses them with a dilemma: take the results as they are, or not. To increase trust in such tools, we propose and implement a general method to explain the similarity of a given group of RDF resources. Our tool is based on the theory of Least Common Subsumers (LCS) and can be applied to any domain requiring the comparison of RDF resources, including business organizations. Given a set of RDF resources found to be similar by data-driven tools, we first compute the LCS of the resources, which is a generic RDF resource describing the features shared by the group recursively, i.e., at any depth in feature paths. We then translate the LCS into plain English. Being agnostic to the aggregation criteria, our implementation can be pipelined with any other aggregation tool. To demonstrate this, we cascade an implementation of our method with (i) the comparison of contracting processes in Public Procurement (using TheyBuyForYou), and (ii) the comparison and clustering of drugs (using k-Means) in Drugbank. For both applications, we present a fairly readable description of the commonalities of the cluster given as input.
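To make the idea of "features shared recursively" concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a toy graph of (subject, predicate, object) string triples and only matches identical predicates, whereas the LCS theory referenced in the abstract is more general. The function names (`common_features`, `verbalize`), the depth bound, and the drug data are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Toy RDF-like graph: a set of (subject, predicate, object) triples.
# All identifiers below are made up for illustration.
graph = {
    ("drugA", "targets", "enzymeX"),
    ("drugB", "targets", "enzymeY"),
    ("enzymeX", "locatedIn", "liver"),
    ("enzymeY", "locatedIn", "liver"),
    ("drugA", "hasForm", "tablet"),
    ("drugB", "hasForm", "tablet"),
    ("drugA", "madeBy", "firm1"),
    ("drugB", "madeBy", "firm2"),
}


def predicates(graph, node):
    """Predicates on the outgoing edges of `node`."""
    return {p for s, p, o in graph if s == node}


def objects(graph, node, predicate):
    """Objects reached from `node` through `predicate`, in a stable order."""
    return sorted(o for s, p, o in graph if s == node and p == predicate)


def common_features(graph, nodes, depth):
    """Features shared by all `nodes`, following paths up to `depth` edges.

    Returns a list of (predicate, value) pairs. `value` is a string when
    every node points to the same object, or a nested list of shared
    features when the objects differ but still have something in common.
    """
    features = []
    shared = set.intersection(*(predicates(graph, n) for n in nodes))
    for p in sorted(shared):
        # Try every way of picking one object per node for predicate p.
        for combo in product(*(objects(graph, n, p) for n in nodes)):
            if len(set(combo)) == 1:
                item = (p, combo[0])
            elif depth > 1:
                nested = common_features(graph, list(combo), depth - 1)
                if not nested:
                    continue
                item = (p, nested)
            else:
                continue
            if item not in features:
                features.append(item)
    return features


def verbalize(features):
    """Naive rendering of the shared features as an English-like phrase."""
    parts = []
    for p, value in features:
        if isinstance(value, str):
            parts.append(f"{p} {value}")
        else:
            parts.append(f"{p} something that {verbalize(value)}")
    return " and ".join(parts)


shared = common_features(graph, ["drugA", "drugB"], depth=2)
print("They all " + verbalize(shared))
```

In this sketch the two drugs do not share a target, but their targets share a location, so the commonality only appears at depth 2. This is the kind of information a flat comparison of direct properties would miss. The group passed to `common_features` could come from any clustering step, which mirrors the pipelining described in the abstract.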


Notes

  1. https://www.globalpublicprocurementdata.org/gppd/

  2. https://www.open-contracting.org/2014/04/30/comparing-contract-data-understanding-supply/

  3. https://www.gateshead.gov.uk

  4. https://www.rksk.dk

  5. http://www.azuaga.es

  6. https://www.drugs.com/compare/

  7. https://www.webmd.com/drugs/compare

  8. https://www.wolterskluwer.com/en/solutions/lexicomp/resources/facts-comparisons-user-academy/drug-comparisons

  9. https://old.datahub.io/dataset/fu-berlin-drugbank

  10. https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

Acknowledgements

Projects Regione Lazio-DTC/“SanLo” (CUP F85F21001090003) and MISE (FSC 2014–2020)/“BARIUM5G” (CUP D94I20000160002) partially supported this work.

Author information

Corresponding author

Correspondence to Simona Colucci.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Colucci, S., Donini, F.M., Iurilli, N., Di Sciascio, E. (2022). A Business Intelligence Tool for Explaining Similarity. In: Babkin, E., Barjis, J., Malyzhenkov, P., Merunka, V. (eds) Model-Driven Organizational and Business Agility. MOBA 2022. Lecture Notes in Business Information Processing, vol 457. Springer, Cham. https://doi.org/10.1007/978-3-031-17728-6_5

  • DOI: https://doi.org/10.1007/978-3-031-17728-6_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17727-9

  • Online ISBN: 978-3-031-17728-6

  • eBook Packages: Computer Science, Computer Science (R0)
