Information Systems

Volume 28, Issue 6, September 2003, Pages 619-650

Improving the quality of data models: empirical validation of a quality management framework

https://doi.org/10.1016/S0306-4379(02)00043-1

Abstract

This paper describes the results of a 5-year research programme into evaluating and improving the quality of data models. The theoretical base for this work was a data model quality management framework proposed by Moody and Shanks (In: P. Loucopoulos (Ed.), Proceedings of the 13th International Conference on the Entity Relationship Approach, Manchester, England, December 14–17, 1994). A combination of field and laboratory research methods (action research, laboratory experiments and systems development) was used to empirically validate the framework. This paper describes how the framework was used to: (a) quality assure a data model in a large application development project (product quality); (b) reengineer application development processes to build quality into the data analysis process (process quality); (c) investigate differences between data models produced by experts and novices; (d) provide automated support for the evaluation process (the Data Model Quality Advisor). The results of the research have been used to refine and extend the framework, to the point that it is now a stable and mature approach.

Introduction

The choice of an appropriate representation of data is one of the most crucial tasks in information systems development. Although data modelling represents only a small proportion of the total systems development effort, its impact on the quality of the final system is probably greater than that of any other phase [1]. The data model is a major determinant of system development costs [2], system flexibility [3], integration with other systems [4] and the ability of the system to meet user requirements [5].

The traditional thrust of software quality assurance has been to use “brute force” testing at the end of development [6]. However, Total Quality Management (TQM) approaches suggest that it is faster and cheaper to concentrate effort during the early development phases of a product, in order to detect and correct defects as early as possible [7]. According to Boehm [8], removing a defect during design costs on average 3.5 times more than removing it during the requirements stage, 50 times more at the implementation stage, and 170 times more after delivery (Fig. 1). Empirical studies have shown that shifting quality assurance effort to the early phases of development can be 33 times more cost effective than testing done at the end of development [9].
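To make these multipliers concrete, the following minimal sketch applies them to a hypothetical batch of defects; the defect count and unit cost are invented for illustration and do not come from the studies cited:

```python
# Relative defect-removal costs from Boehm [8], applied to a
# hypothetical batch of defects. The defect count and unit cost are
# invented for this illustration, not taken from the cited studies.

RELATIVE_COST = {          # cost multiplier vs. fixing the defect
    "requirements": 1.0,   # during the requirements stage
    "design": 3.5,
    "implementation": 50.0,
    "post-delivery": 170.0,
}

UNIT_COST = 100.0   # hypothetical cost of fixing one defect found
                    # during requirements
DEFECTS = 20        # hypothetical number of defects in the model

for stage, multiplier in RELATIVE_COST.items():
    total = DEFECTS * UNIT_COST * multiplier
    print(f"{stage:>15}: {total:>12,.0f}")

# The same 20 defects cost 2,000 if caught at requirements time but
# 340,000 after delivery -- the economic argument for quality
# assuring data models during analysis rather than testing at the end.
```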

This suggests that substantially more effort should be spent during early development phases to catch defects when they occur, or to prevent them from occurring altogether. However, it is during analysis that the notion of software development as a craft rather than an engineering discipline is strongest, and quality is therefore most difficult to assess. There are relatively few guidelines for evaluating the quality of data models, and little agreement even among experts as to what makes a “good” data model. As a result, the quality of data models produced in practice is almost entirely dependent on the competence of the data modeller [10], [11].

In the quality management literature, the distinction is frequently made between product and process quality [12]:

  • Product quality focuses on the characteristics of the product. Product quality criteria are used to carry out inspections of the finished product and detect and correct defects. This is the traditional approach to quality assurance.

  • Process quality focuses on the process used to produce the product. The objective is to build quality into the production process rather than trying to add it at the end through reviews and inspections of the finished product. Process quality emphasises defect prevention rather than detection, and aims to reduce reliance on mass inspection as a way of achieving quality [13]. This is the TQM approach to quality assurance.

In the context of data modelling, product quality is concerned with evaluating and improving the quality of the data model (the product), while process quality is concerned with improving the data analysis process (the production process) (see Fig. 2). Product quality matters most in the context of an individual project: it is important to ensure that the data model is free of defects so that a database can be built that meets user requirements. However, process quality is more important in the wider organisational context, where the goal is to improve the organisation's ability to deliver high quality information systems efficiently.

Previous research on data model quality has focused almost exclusively on product quality. A summary of approaches to quality in data modelling is shown in Table 1.

The simplest type of quality evaluation approach is where quality is defined as a list of desirable properties of a data model (e.g. [1], [14], [15], [16]). Such lists provide a useful starting point for understanding and evaluating quality in data models, but they are mostly unstructured, use imprecise definitions, often overlap, and frequently confuse properties of models with properties of languages and methods [17].

More comprehensive approaches to quality evaluation develop theoretical frameworks that define the key concepts underlying data model quality. Lindland et al. [17] propose a framework based on semiotic theory, which treats a conceptual model as a set of statements in a language. For each semiotic level (syntactic, semantic, pragmatic), the framework defines quality goals and means to achieve them. Krogstie et al. [10] extend the framework with a fourth semiotic level: the social level. These frameworks apply to conceptual models generally, not just data models. Kesh [18] develops a framework for evaluating data models based on ontological concepts, which defines criteria and metrics for evaluating the quality of data models.
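As a reading aid, the structure of the Lindland et al. framework can be sketched as a mapping from semiotic levels to goals and means. The encoding below is ours, not part of any cited approach, and the goal/means entries are a summary that should be checked against [17]:

```python
# A reading aid, not part of any cited approach: the three semiotic
# levels of the Lindland et al. framework [17], each paired with its
# quality goal and representative means. The goal/means entries are
# our summary of the framework and should be verified against [17].

from dataclasses import dataclass

@dataclass(frozen=True)
class SemioticLevel:
    name: str     # level at which the model is judged
    goal: str     # the quality goal defined at this level
    means: tuple  # example techniques for achieving the goal

LINDLAND_FRAMEWORK = (
    SemioticLevel("syntactic", "syntactic correctness",
                  ("formal syntax", "syntax checking")),
    SemioticLevel("semantic", "validity and completeness",
                  ("consistency checking",)),
    SemioticLevel("pragmatic", "comprehension",
                  ("visualisation", "explanation", "filtering")),
)

for level in LINDLAND_FRAMEWORK:
    print(f"{level.name:>10}: goal = {level.goal}; "
          f"means = {', '.join(level.means)}")
```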

The most serious deficiencies in the existing literature are:

  • None of the approaches has been empirically validated in practice: all are justified either on theoretical grounds or by the author(s)’ experience. Theoretical justification is limited because methods have no “truth” value—the validity of a method is an empirical rather than a theoretical question [19], [20]. Experiential justification is also limited because personal experience is subject to bias, and a method which works well for one person may not work for another [21].

  • None of the approaches adequately addresses the issue of process quality: they define criteria and, in some cases, measures for evaluating the quality of data models (defect detection), but not how to develop models in a high quality manner (defect prevention).

Both of these issues are addressed in this paper.

The structure of the paper is as follows:

  • Section 2 describes the quality management framework used as the theoretical basis for this research—this represents the a priori theory being tested.

  • Section 3 outlines the research methodology used to validate the framework.

  • Section 4 describes how the framework was used to quality assure a data model for an application development project as part of an action research study (product quality).

  • Section 5 describes how the framework was used to re-engineer the analysis process in an organisation as part of a longitudinal action research study (process quality).

  • Section 6 describes how the framework was used to analyse differences in the quality of models produced by expert and novice data modellers using a laboratory experiment.

  • Section 7 describes how the framework was used to provide automated support for the evaluation process (the Data Model Quality Advisor), and analyses its effectiveness using a laboratory experiment.

  • Section 8 summarises the research findings and their implications for research and practice.

Section snippets

Overview of the framework

The quality management framework used as the basis for this research is defined by the Entity Relationship model in Fig. 3 [11]. This represents the a priori theory being tested by this research. The purpose of the framework is to evaluate and improve the quality of application data models. The framework consists of five major constructs, each of which is shown as a separate entity in Fig. 3:

  • Quality factors define the characteristics of a data model that determine its overall quality. …
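Because the framework is itself defined as an Entity Relationship model, its structure can be sketched directly as data. In the sketch below, the construct and quality factor names follow the published Moody and Shanks framework [11]; the class design and the example stakeholder and weighting values are assumptions made for illustration, not part of the paper:

```python
# Illustrative sketch only: the framework's constructs encoded as
# Python dataclasses. Construct and factor names follow the published
# Moody and Shanks framework [11]; the class design and the example
# values are assumptions made for this sketch.

from dataclasses import dataclass, field

@dataclass
class QualityFactor:
    """A characteristic of a data model that determines its quality."""
    name: str

@dataclass
class QualityMetric:
    """A measure for evaluating a data model on one quality factor."""
    factor: QualityFactor
    description: str

@dataclass
class ImprovementStrategy:
    """A technique for improving a model on one or more factors."""
    name: str
    targets: list

@dataclass
class Stakeholder:
    """A person with an interest in the quality of the data model.

    The `weightings` mapping stands in for the framework's weighting
    construct: the relative importance this stakeholder assigns to
    each quality factor (values invented for the example).
    """
    role: str
    weightings: dict = field(default_factory=dict)

# The eight quality factors named in the framework.
FACTORS = [QualityFactor(name) for name in (
    "completeness", "integrity", "flexibility", "understandability",
    "correctness", "simplicity", "integration", "implementability")]

# A hypothetical stakeholder who weights two factors most heavily.
dba = Stakeholder("database administrator",
                  weightings={"integrity": 0.4, "implementability": 0.3})
```

Weightings are folded into the stakeholder here purely for compactness; in the framework itself they are modelled as a separate construct relating stakeholders to quality factors.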

Validation of IS design methods

The question of how to validate IS design methods has been a longstanding issue in the IS field (e.g. [19], [21], [24], [25], [26], [27], [28]). There are inherent problems in evaluating any methodology or design technique, since there is typically no theory, no hypotheses, no experimental design and no data analysis to which traditional evaluation criteria can be applied [28].

As a result, IS design research tends to emphasise the development of new design methods and frameworks, while addressing …

Action research study I: product quality

This section describes how the framework was used to evaluate and improve the quality of a data model in a large application development project. This was the first real world application of the framework. In this case, the framework is used to improve product quality.

Action research study II: process quality

This section describes how the framework was used to improve the process of developing data models as part of a longitudinal action research study in a single organisation [31]. One of the principles of TQM is that the most effective way to improve the quality of a product is to improve the process by which it is developed [13]. This was also one of the major findings from the first action research study.

Analysis of differences between expert and novice data modellers

This section describes how the framework was used to investigate differences in models produced by expert and novice data modellers. This study focuses on product quality, as the framework is used to evaluate the quality of models produced by experimental subjects.

Systems development as a research method

Systems development is a research method in which scientific knowledge is used to produce devices, systems or methods, including the design and development of prototypes [98]. In this approach, theory is used to develop a prototype system, which is then used to test the theory; it thus provides a way of linking basic and applied research [99]. According to Nunamaker et al. [37]:

The development of a method or system can provide a perfectly acceptable piece of evidence (an artifact) in support of a …

Conclusion

This paper has described how the data model quality evaluation framework proposed by Moody and Shanks [11] has been validated using a variety of research methods. Experiences in practice have been used to refine the framework using an action research approach. The paper describes how the framework has been used to:

  • (a) quality assure individual data models as part of application development projects (product quality);

  • (b) reengineer application development procedures to build quality into the data analysis process (process quality), …

References (110)

  • B.W. Boehm, Software Engineering Economics (1981)

  • C. Walrad et al., Measurement: the key to application development quality, IBM Systems J. (1993)

  • J. Krogstie, O.I. Lindland, G. Sindre, Towards a deeper understanding of quality in requirements engineering, ...

  • D.L. Moody, G.G. Shanks, What makes a good data model? Evaluating the quality of entity relationship models, in: P. ...

  • J.R. Evans, W.M. Lindsay, The Management and Control of Quality, 5th Edition, South-Western (Thomson Learning), ...

  • W.E. Deming, Out of the Crisis (1986)

  • B. von Halle, Data: asset or liability? Database Programming Design 4(7) (1991) ...

  • C. Batini et al., Conceptual Database Design: An Entity Relationship Approach (1992)

  • A. Levitin, T. Redman, Quality dimensions of a conceptual view, Inform. Process. Manage. 31(1) ...

  • O.I. Lindland et al., Understanding quality in conceptual modelling, IEEE Software (1994)

  • J. Ivari, Dimensions of information systems design: a framework for a long range research program, Inform. Systems J. (1986)

  • N. Rescher, Methodological Pragmatism: Systems-Theoretic Approach to the Theory of Knowledge (1977)

  • J.L. Wynekoop et al., Studying systems development methodologies: an examination of research methods, Inform. Systems J. (1997)

  • M. Gibbons et al., The New Production of Knowledge: The Dynamics of Science and Research in Contemporary Societies (1994)

  • D.L. Moody, Metrics for evaluating the quality of entity relationship models, in: T.W. Ling, S. Ram, M.L. Lee (Eds.), ...

  • T.W. Olle, H.G. Sol, A.A. Verrijn-Stuart (Eds.), Information Systems Design Methodologies: A Comparative Review, ...

  • T.W. Olle, H.G. Sol, C.J. Tully (Eds.), Information Systems Design Methodologies: A Feature Analysis, North-Holland, ...

  • T.W. Olle, H.G. Sol, A.A. Verrijn-Stuart (Eds.), Information Systems Design Methodologies: Improving the Practice, ...

  • G. Fitzgerald, in: H.E. Nissen, H.K. Klein, R. Hirschheim (Eds.), Validating New Information Systems Techniques: A ...

  • R.A. Weber, Ontological Foundations of Information Systems, Coopers and Lybrand Accounting Research Methodology ...

  • J.A. Bubenko, in: T.W. Olle, H.G. Sol, A.A. Verrijn-Stuart (Eds.), Information Systems Methodologies—A Research View, ...

  • B. Curtis, in: E. Soloway, S. Iyengar (Eds.), By The Way, Did Anyone Study Any Real Programmers? Empirical Studies of ...

  • D.L. Moody et al., Evaluating and improving the quality of entity relationship models: an action research programme, Aust. Comput. J. (1998)

  • C. Westrup, Information systems methodologies in use, J. Inform. Technol. (1993)

  • N. Rescher, Cognitive Systematization (1979)

  • R.L. Baskerville et al., A critical perspective on action research as a method for information systems research, J. Inform. Technol. (1996)

  • R.D. Galliers, in: H.E. Nissen, H.K. Klein, R. Hirschheim (Eds.), Choosing Information Systems Research Approaches, ...

  • R.D. Galliers, Information Systems Research: Issues, Methods and Practical Guidelines (1992)

  • J. Nunamaker et al., Systems development in information systems research, J. Manage. Inform. Systems (1991)

  • G. Shanks, A. Rouse, D. Arnott, A review of approaches to research and scholarship in information systems, Proceedings ...

  • T.D. Jick, Mixing qualitative and quantitative methods: triangulation in action, Administrative Sci. Q. (1979)

  • B. Kaplan et al., Combining qualitative and quantitative methods in information systems research: a case study, MIS Q. (1988)

  • A. Lee, Integrating positivist and interpretivist approaches to organisational research, Organ. Sci. (1991)

  • W.L. Neuman, Social Research Methods—Qualitative and Quantitative Approaches, 4th Edition, Allyn and Bacon, Needham ...

  • D. Avison et al., Action Research, Comm. ACM (1999)

  • T.L. Baker, Doing Social Research (1998)

  • J. McKernan, Curriculum Action Research: A Handbook of Methods and Resources for the Reflective Practitioner (1991)

  • E.T. Stringer, Action Research—A Handbook for Practitioners (1996)

  • J. Masters, The history of action research, in: I. Hughes (Ed.), Action Research Electronic Reader (on-line), ...

  • B. Dick, A beginner's guide to action research [On line], in: B. Dick, R. Passfield, P. Wildman (Eds.), Action ...

    Recommended by Professor P. Loucopoulos.
