Adding monotonicity to learning algorithms may impair their accuracy

https://doi.org/10.1016/j.eswa.2008.08.021

Abstract

Ordinal (i.e., ordered) classifiers are used to make judgments that we make on a regular basis, both at work and at home. Perhaps surprisingly, there have been no comprehensive studies in the scientific literature comparing the various ordinal classifiers. This paper compares the accuracy of five ordinal and three non-ordinal classifiers on a benchmark of fifteen real-world datasets. The results show that the ordinal classifiers that were tested had no meaningful statistical advantage over the corresponding non-ordinal classifiers. Furthermore, the ordinal classifiers that guaranteed monotonic classifications showed no meaningful statistical advantage over a majority-based classifier. We suggest that the tested ordinal classifiers did not properly utilize the order information in the presence of non-monotonic noise.

Introduction

Ordinal classification problems are those in which the class is neither numeric nor nominal; instead, the class values are ordered. For instance, an employee can be described as “excellent”, “good” or “bad”, and a bond can be rated as “AAA”, “AA”, “A”, “A-”, etc. Like a numeric scale, an ordinal scale has an order, but unlike a numeric scale it does not possess a precise notion of distance. We cannot say that “AA” is closer to “AAA” than it is to “A”, nor the other way round. In this respect an ordinal scale is similar to a nominal one.

Ordinal classification problems are important, since they are very common in our everyday life. Selecting the best route to work, where to shop, which product to buy, and where to live, are just examples of daily ordinal decision-making. Employee selection and promotion, determination of credit rating, bond rating, economic performance of countries, industries and firms, and insurance underwriting, are examples of ordinal problem-solving in business. Rating manuscripts, evaluating lecturers, student admissions, and decisions about scholarships for students, are examples of ordinal decision-making in academic life.

Ordinal problems have been investigated in scientific disciplines such as decision-making, psychology, and statistics for many decades. The machine learning community, on the other hand, has mainly focused on the learning of numeric and nominal problems. The UCI Machine Learning Repository (UCI), for example, the major source of experimental data for machine learning research, currently has more than 160 datasets, fewer than ten of which are ordinal. The Weka Machine Learning Project (WEKA), perhaps the most popular test-bed for machine learning research, has dozens of built-in classifiers, only one of which is ordinal. These examples demonstrate that the learning of ordinal concepts has not been of special interest to the machine learning community.

In the last decade or so, however, an increasing number of publications have reported progress in the artificial learning of ordinal concepts. Machine learning models such as decision trees, neural networks, and support vector machines have been extended to support ordinal classification. Each model makes different assumptions. For instance, one of the major differences among the various approaches to ordinal concept learning is how monotonicity of classifications is handled. A model that guarantees monotonic classifications will never classify a young and healthy applicant into a higher life insurance premium category than an old, unhealthy one. We return to this issue in greater detail later. Some ordinal models ignore monotonicity-related considerations altogether, while others do not. Some monotonic ordinal classifiers need monotonic examples to learn from, while others are capable of learning from non-monotonic examples as well.
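To make the monotonicity constraint concrete, the following minimal Python sketch (our illustration, not code from any of the classifiers discussed here) counts pairs of examples that violate monotonicity — pairs where one example dominates another on every attribute yet receives a strictly lower class label:

```python
from itertools import combinations

def dominates(a, b):
    """True if example a is >= example b on every (ordinal) attribute."""
    return all(x >= y for x, y in zip(a, b))

def monotonicity_violations(examples, labels):
    """Count pairs where one example dominates another yet
    receives a strictly lower class label."""
    violations = 0
    for (xa, ya), (xb, yb) in combinations(zip(examples, labels), 2):
        if dominates(xa, xb) and ya < yb:
            violations += 1
        elif dominates(xb, xa) and yb < ya:
            violations += 1
    return violations

# Toy life-insurance data: attributes = (health, youth), higher is better;
# class = premium category, higher number = better category.
X = [(2, 2), (1, 1), (0, 0)]
assert monotonicity_violations(X, [2, 1, 0]) == 0  # perfectly monotonic
assert monotonicity_violations(X, [0, 1, 2]) == 3  # every pair violated
```

A dataset with many such violating pairs is what the paper calls non-monotonically noisy, and the fraction of violating pairs gives a rough measure of that noise.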

So far, there have been no comprehensive reports in the scientific literature comparing the various ordinal classifiers. As a result, it is not clear whether these models currently offer any significant benefit over non-ordinal classifiers. Intuitively, we would expect ordinal classifiers, which use the information about the order within their domain, to classify more accurately than non-ordinal models, which do not use this information. However, this hypothesis has not yet been tested on a meaningful scale, with sufficient ordinal and non-ordinal classifiers and a large enough benchmark of real-world ordinal datasets. Ordinal classifiers that do guarantee monotonic classifications are expected to achieve this goal at some cost in accuracy, since a constraint, monotonicity, is imposed upon their concepts. Yet it is not clear from the current literature whether the cost in accuracy (if any) is high when such models learn from a benchmark of real-world datasets.

Eight classifiers were tested on fifteen ordinal datasets in our experiment. Five of the classifiers were ordinal, and three were non-ordinal. Two of the ordinal classifiers guaranteed monotonic classifications, and three did not. One of the three non-ordinal classifiers, a majority-based classifier, was used as a baseline.

Some of the results we report are quite unexpected. As far as this benchmark of classifiers and datasets is concerned, the two major findings are:

  • A. The ordinal classifiers were statistically indistinguishable from their non-ordinal counterparts. For instance, the ordinal version of the support vector machine (SVM) failed to show any meaningful statistical advantage in terms of predictive accuracy over a “regular” (i.e., non-ordinal) SVM classifier. A similar observation applies to the ordinal versus non-ordinal versions of logistic regression.

  • B. The two ordinal classifiers that guarantee monotonic classifications were both statistically indistinguishable from a majority-based classifier.

We attribute these unexpected results to the high levels of non-monotonic noise in most of the datasets. The findings of this research suggest that much more needs to be understood about ordinal machine learning classifiers. In particular, models that aim to learn monotonic patterns in the presence of non-monotonic noise should be further investigated and refined. Hopefully, our empirical research will encourage the artificial intelligence and data mining communities to increase research on these topics.

The next section describes the various main approaches towards ordinal classification. The experiment and its major findings are described later. A discussion of the findings follows, as well as some suggestions for future work.

Section snippets

Ordinal classifiers

Several types of ordinal classifiers currently exist. This section begins with a taxonomy of ordinal classifiers. After describing their basic properties, individual classifiers will briefly be presented. In order to keep this overview focused and within a reasonable length, only a general description is given about each classifier, without delving too deeply into mathematical details.

A. Class values

All ordinal classifiers assume that the class values are ordinal (i.e., ordered). However, they
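One widely cited ordinal approach listed in the references (Frank & Hall, 2001) reduces a k-class ordinal problem to k−1 binary problems of the form “is the class greater than v_i?”, then recovers per-class probabilities by differencing the binary estimates. A minimal sketch of that recombination step (our illustration, assuming the k−1 binary probabilities have already been estimated by any standard binary learner):

```python
def ordinal_class_probs(binary_probs):
    """Given estimates of P(y > v_i) for i = 1..k-1 (Frank & Hall, 2001),
    derive per-class probabilities for the k ordered classes."""
    probs = []
    prev = 1.0                      # P(y > v_0) == 1 by convention
    for p in binary_probs:
        probs.append(prev - p)      # P(y = v_i) = P(y > v_{i-1}) - P(y > v_i)
        prev = p
    probs.append(prev)              # P(y = v_k) = P(y > v_{k-1})
    return probs

# e.g. three classes "bad" < "good" < "excellent" with
# P(y > bad) = 0.9 and P(y > good) = 0.4:
print([round(p, 3) for p in ordinal_class_probs([0.9, 0.4])])
# → [0.1, 0.5, 0.4]
```

Note that the differences can be negative if the binary estimates are not themselves monotone in i, which is one reason such a decomposition does not by itself guarantee monotonic classifications.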

The experiment

The main goals of the experiment were to compare the predictive accuracy of:

  • Ordinal versus non-ordinal classifiers.

  • Ordinal classifiers that guarantee monotonic classifications versus those which do not.

The research hypotheses were:

  • Ordinal classifiers should be more accurate than non-ordinal classifiers, since the former utilize the order information that the latter do not use.

  • Ordinal classifiers that guarantee monotonic classifications should be less accurate than those that do not, due to the

The results

Table 3 shows the average accuracy (upper line, labeled A) and the average Kappa (lower line, labeled K) for each dataset. Each value that is shown in Table 3 is the average outcome of one stratified 10-fold cross validation run on the testing (i.e., unseen) portion of the dataset. As has been mentioned above, the Kappa values of ZeroR, the baseline majority classifier, are zero by definition.
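That ZeroR’s Kappa is zero follows directly from Cohen’s (1960) definition: a majority classifier’s observed agreement equals its chance agreement. A small self-contained sketch (our illustration, not the paper’s code) makes this concrete:

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa (Cohen, 1960): observed agreement corrected for
    the agreement expected by chance from the marginal frequencies."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_freq = Counter(y_true)
    pred_freq = Counter(y_pred)
    p_e = sum(true_freq[c] * pred_freq.get(c, 0) for c in true_freq) / n ** 2
    return (p_o - p_e) / (1 - p_e)

y_true = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
zero_r = ["A"] * 10                 # majority classifier: always predict "A"
print(cohen_kappa(y_true, zero_r))  # → 0.0, despite 60% accuracy
```

Here the majority classifier scores 0.6 on accuracy but exactly zero on Kappa, which is why Kappa is the more informative measure on imbalanced ordinal datasets.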

Table 4 shows the average rankings by accuracy, while Table 5 shows the rankings via Kappa. These two

Discussion

Table 6, Table 7 give conflicting conclusions about some of the rankings. For example, SVOR is more accurate than OSDL, yet the two are statistically indistinguishable by Kappa. Conversely, OLR ranks above ZeroR by Kappa, but the two are statistically indistinguishable by accuracy. The first question that arises is which measure to adopt: accuracy or Kappa?

According to a study by Demsar (2006), most machine learning research publications still report the rankings of classifiers by
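Average rankings of the kind reported in Table 4, Table 5 are computed in the style of Demsar (2006): rank the classifiers on each dataset (rank 1 = best, ties sharing the mean of their positions) and average the ranks across datasets. A minimal sketch (our illustration):

```python
def average_ranks(scores):
    """scores[d][c]: score of classifier c on dataset d (higher is better).
    Returns each classifier's rank averaged across datasets
    (rank 1 = best on a dataset; ties share the mean of their positions)."""
    n_clf = len(scores[0])
    totals = [0.0] * n_clf
    for row in scores:
        order = sorted(range(n_clf), key=lambda c: -row[c])
        ranks = [0.0] * n_clf
        i = 0
        while i < n_clf:
            # find the run of classifiers tied with the one at position i
            j = i
            while j + 1 < n_clf and row[order[j + 1]] == row[order[i]]:
                j += 1
            mean_rank = (i + j) / 2 + 1      # average of positions i..j, 1-based
            for k in range(i, j + 1):
                ranks[order[k]] = mean_rank
            i = j + 1
        for c in range(n_clf):
            totals[c] += ranks[c]
    return [t / len(scores) for t in totals]

# Two datasets, three classifiers; the first two tie on the second dataset:
print(average_ranks([[0.9, 0.8, 0.7], [0.6, 0.6, 0.5]]))  # → [1.25, 1.75, 3.0]
```

These average ranks are the quantities that rank-based significance tests such as the Friedman test, recommended by Demsar (2006), operate on.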

Conclusions

Ordinal classifiers should be of major interest to the machine learning community for the simple reason that we, human beings, solve such problems on a daily basis. We classify ordinals all the time, consciously or unconsciously.

Despite a growing interest within the machine learning community, there is still a lot to be investigated. We have shown here a benchmark of fifteen real-world ordinal datasets, where state-of-the-art ordinal classifiers have failed to show any statistically meaningful

Acknowledgements

This research would not have been possible without the assistance we have received from many people and organizations. In particular, we would like to thank Eibe Frank (University of Waikato, New Zealand) for writing a version of the OLM for the Weka environment, Kim Cao-Van and Bernard DeBaets (University of Ghent, Belgium) for the OSDL code, Wei Chu (University College London, UK) for donating the SVOR, and G. Smyth for writing the code for OLR in Matlab. We are also thankful for all those

References (24)

  • A. Ben David et al. (1997). Evaluation of the number of consistent multiattribute classification rules. Engineering Applications of Artificial Intelligence.
  • Y. Ganzach (1993). Goals as determinants of nonlinear noncompensatory judgment strategies. Organizational Behavior and Human Decision Processes.
  • Altendorf, E. E., Restificar, A. C., & Dietterich, T. G. (2005). Learning from sparse data by exploiting monotonicity...
  • A. Ben David et al. (1989). Learning and classification of monotonic ordinal concepts. Computational Intelligence.
  • A. Ben David (1995). Monotonicity maintenance in information-theoretic machine learning algorithms. Machine Learning.
  • L. Breiman et al. (1984). Classification and regression trees.
  • Chu, W., & Keerthi, S. S. (2005). New approaches to support vector ordinal regression. In Proceedings of the 22nd...
  • Cao-Van, Kim (2003). Supervised ranking – from semantics to algorithms. Ph.D. Thesis. Belgium: CS Department, Ghent...
  • J.A. Cohen (1960). Coefficient of agreement for nominal scales. Educational and Psychological Measurement.
  • H. Daniels et al. (1999). Application of MLP networks to bond rating and house pricing. Neural Computing & Applications.
  • J. Demsar (2006). Statistical comparisons of classifiers over multiple datasets. Journal of Machine Learning Research.
  • Frank, E., & Hall, M. (2001). A simple approach to ordinal classification. In 12th European conference on machine...