Abstract
Data mining (DM) models are knowledge-intensive information products that enable knowledge creation and discovery. As large volumes of data are generated at high velocity from a variety of sources, there is a pressing need to put DM model selection and self-service knowledge discovery in the hands of business users. However, existing knowledge discovery and data mining (KDDM) approaches do not sufficiently address key elements of data mining model management (DMMM), such as model sharing, selection, and reuse. Furthermore, they are designed mainly from a knowledge engineer's perspective, while the requirements of business users are often lost. To bridge these semantic gaps, we propose an ontology-based DMMM approach for self-service model selection and knowledge discovery. We develop the DM3 ontology to translate business requirements into model selection criteria and measurements, provide a detailed deployment architecture for integrating it into an organization's KDDM application, and use the example of a student loan company to demonstrate the utility of DM3.
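The core idea of the approach is to translate a business requirement into model selection criteria and measurements, then pick stored models that meet those criteria. The Python sketch below illustrates only that idea. It is not the DM3 ontology itself, which is an ontology rather than program code. All class names, metric names, thresholds, and model records are hypothetical, chosen loosely to echo the student loan example.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified stand-ins for requirement -> criterion -> measurement;
# the actual DM3 ontology defines its own concepts and relations.

@dataclass
class Measurement:
    name: str                      # assumed metric name, e.g. "recall"
    higher_is_better: bool = True  # direction in which the metric improves

@dataclass
class Criterion:
    name: str
    measurement: Measurement
    threshold: float  # minimum (or maximum, if lower is better) acceptable value

@dataclass
class BusinessRequirement:
    description: str
    criteria: list[Criterion] = field(default_factory=list)

@dataclass
class DMModel:
    model_id: str
    metrics: dict[str, float]  # stored evaluation results for a shared model

def satisfies(model: DMModel, criterion: Criterion) -> bool:
    """Check one criterion; a model lacking the measurement fails it."""
    value = model.metrics.get(criterion.measurement.name)
    if value is None:
        return False
    if criterion.measurement.higher_is_better:
        return value >= criterion.threshold
    return value <= criterion.threshold

def select_models(req: BusinessRequirement, repository: list[DMModel]) -> list[str]:
    """Return IDs of models that meet every criterion derived from the requirement."""
    return [m.model_id for m in repository
            if all(satisfies(m, c) for c in req.criteria)]

# Hypothetical usage: a lender wants to flag borrowers likely to default.
recall = Measurement("recall")
fpr = Measurement("false_positive_rate", higher_is_better=False)
req = BusinessRequirement(
    "Flag borrowers likely to default",
    [Criterion("catch most defaulters", recall, 0.70),
     Criterion("avoid flagging good borrowers", fpr, 0.20)],
)
repository = [
    DMModel("tree_v1", {"recall": 0.75, "false_positive_rate": 0.18}),
    DMModel("logit_v2", {"recall": 0.65, "false_positive_rate": 0.10}),
    DMModel("nn_v1", {"recall": 0.80, "false_positive_rate": 0.25}),
]
print(select_models(req, repository))  # ['tree_v1']
```

Combining criteria with a plain `all()` is the simplest possible aggregation. A fuller model management system might instead weight or rank candidate models across criteria.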
Notes
The DM3 is available at http://128.172.188.35:8080/webprotege.
Appendix
About this article
Cite this article
Li, Y., Thomas, M.A. & Osei-Bryson, KM. Ontology-based data mining model management for self-service knowledge discovery. Inf Syst Front 19, 925–943 (2017). https://doi.org/10.1007/s10796-016-9637-y
DOI: https://doi.org/10.1007/s10796-016-9637-y