Towards objective measures of algorithm performance across instance space

https://doi.org/10.1016/j.cor.2013.11.015

Abstract

This paper tackles the difficult but important task of objective algorithm performance assessment for optimization. Rather than reporting average performance of algorithms across a set of chosen instances, which may bias conclusions, we propose a methodology to enable the strengths and weaknesses of different optimization algorithms to be compared across a broader instance space. The results reported in a recent Computers and Operations Research paper comparing the performance of graph coloring heuristics are revisited with this new methodology to demonstrate (i) how pockets of the instance space can be found where algorithm performance varies significantly from the average performance of an algorithm; (ii) how the properties of the instances can be used to predict algorithm performance on previously unseen instances with high accuracy; and (iii) how the relative strengths and weaknesses of each algorithm can be visualized and measured objectively.

Introduction

Objective assessment of optimization algorithm performance is notoriously difficult [1], [2], especially when the conclusions depend so heavily on the chosen test instances of the optimization problem. The popular use of benchmark libraries of instances (e.g. the OR-Library [3]) helps to standardize the testing of algorithms, but may not be sufficient to reveal the true strengths and weaknesses of algorithms. As cautioned by Hooker [1], [2] nearly two decades ago, there is a need to be careful about the conclusions that can be drawn beyond the selected instances. It has been documented that there are some optimization problems where the benchmark library instances are not very diverse [4] and there is a danger that algorithms are developed and tuned to perform well on these instances without understanding the performance that can be expected on instances with diverse properties. Furthermore, while the peer-review process usually ensures that standard benchmark instances are used for well-studied problems, for many real-world or more unusual optimization problems there is a lack of benchmark instances, and a tendency for papers to be published that report algorithm performance based only on a small set of instances presented by the authors. Such papers typically are able to demonstrate that the new algorithm proposed by the authors outperforms other previously published approaches (it is difficult to get published otherwise), and the choice of instances cannot be challenged due to the lack of alternative instances.

The No-Free-Lunch (NFL) Theorems [5], [6] state that all optimization algorithms have identically distributed performance when objective functions are drawn uniformly at random, and all algorithms have identical mean performance across the set of all optimization problems. Does this idea apply also to different instances of a particular optimization problem, which gives rise to only a subset of possible objective functions? Probably not [7], but it still seems unwise to believe that any one optimization algorithm will always be superior for all possible instances of a given problem. We should expect that any algorithm has weaknesses, and that some instances could be conceived where the algorithm would be less effective than its competitors, or at least that instances exist where its competitive advantage disappears. Our current research culture, where negative results are seen as somehow less of a contribution than positive ones, means that the true strengths and weaknesses of an optimization algorithm are rarely exposed and reported. Yet for advancement of the field, surely we must find a way to make it easier for researchers to report the strengths and weaknesses of their algorithms. On which types of instances does an algorithm outperform its competitors? Where is it less effective? How can we describe those instances?
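To make this claim concrete, the central equality of Wolpert and Macready [5] can be restated in their notation (a standard restatement included here only for reference; it is not text from the present paper): for any pair of algorithms $a_1$ and $a_2$ and any number of evaluated points $m$,

$$\sum_{f} P\bigl(d_m^{y} \mid f, m, a_1\bigr) \;=\; \sum_{f} P\bigl(d_m^{y} \mid f, m, a_2\bigr),$$

where the sum runs over all objective functions $f$ and $d_m^{y}$ denotes the sequence of objective values observed after $m$ distinct evaluations. Averaged over all possible objective functions, no algorithm outperforms any other.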

Occasionally we find a paper that presents a well-defined class of instances where an algorithm performs well, and reports its failings outside this class (see [8] for a recent example). Such studies assist our understanding of an algorithm and its applicability. Does the class of instances where an algorithm is effective overlap real-world or other interesting instances? Is an algorithm only effective on instances where its competitors are also effective, or are there some classes where it is uniquely powerful? How do the properties of the instances affect algorithm performance? Until we develop tools that enable researchers to quickly and easily determine the instances they need to consider, so that the boundary of effective algorithm performance can be described and quantified in terms of instance properties, the objectivity of algorithm performance assessment will always be compromised by sample bias.

Recently, we have been developing the components of such a methodology [9]. Instances are represented as points in a high-dimensional feature space, with features chosen intentionally to tease out the similarities and differences between instance classes. For many broad classes of optimization problems, a rich set of features has already been identified that can be used to summarize the properties of instances affecting instance difficulty (see [10] for a survey of suitable features). Representing all available instances of an optimization problem in a single space in this manner can often reveal inadequacies in the diversity of the test instances. We can observe for some problems that benchmark instances appear to be structurally similar to randomly generated instances, eliciting similar performance from algorithms, and are not well designed for testing the strengths and weaknesses of algorithms. We have previously proposed the use of evolutionary algorithms to intentionally construct instances that are easy or hard for specific algorithms [11], thereby guaranteeing diversity of the instance set. Once we have sufficient instances covering most regions of the high-dimensional feature space, we need to be able to superimpose algorithm performance in this space and visualize the boundaries of good performance. Using dimensionality reduction techniques such as Principal Component Analysis, we have previously proposed projecting all instances to a two-dimensional “instance space” [9] where we can visualize the region where an algorithm can be expected to perform well, based on generalization of its observable performance on the test instances. We call this region the algorithm footprint in instance space, and the relative size and uniqueness of an algorithm's footprint can be used as an objective measure of algorithm power. Inspection of the distribution of individual features across the instance space can also be used to generate new insights into how the properties of instances affect algorithm performance, and machine learning techniques can be employed in the feature space (or instance space) to predict algorithm performance on unseen instances [12]. Over the last few years we have applied components of this broad methodology to a series of optimization problems including the Travelling Salesman Problem [9], [11], [13], Job-Shop Scheduling [14], the Quadratic Assignment Problem [15], Graph Coloring [12], [16], and Timetabling Problems [17], [18].
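As an illustration of the projection step only (a minimal sketch, not the authors' implementation; the feature matrix, performance labels, and PCA settings below are hypothetical placeholders), constructing a two-dimensional instance space and visually inspecting an algorithm's footprint could proceed along the following lines:

```python
# Illustrative sketch: project instance feature vectors to a 2D "instance
# space" with PCA and overlay algorithm performance to eyeball a footprint.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical meta-data: 500 instances described by 10 features, plus a
# boolean flag recording whether a given algorithm performed "well".
X = rng.normal(size=(500, 10))            # instance features (placeholder)
good = X[:, 0] + 0.5 * X[:, 1] > 0        # stand-in performance label

# Standardize features so no single feature dominates the projection.
Z = StandardScaler().fit_transform(X)
coords = PCA(n_components=2).fit_transform(Z)

# The region dominated by "good" points hints at the algorithm footprint.
plt.scatter(coords[good, 0], coords[good, 1], c="tab:blue", s=8, label="good")
plt.scatter(coords[~good, 0], coords[~good, 1], c="tab:red", s=8, label="poor")
plt.xlabel("PC1"); plt.ylabel("PC2"); plt.legend(); plt.title("Instance space")
plt.show()
```

In practice the features would be problem-specific measures of instance difficulty rather than random data, but the workflow of standardizing, projecting, and overlaying observed performance is the same.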

While our previous research has generated an initial methodology, it has raised a number of questions that need to be addressed for a more comprehensive tool to be developed: How should we select the right features to represent the instance space most effectively? How can we determine the sufficiency and diversity of the set of instances? Can we more accurately predict algorithm performance in the high-dimensional feature space or the projected two-dimensional space? How should we determine the boundary of where we expect an algorithm to perform well based on limited observations? How can we reveal the strengths and weaknesses of a portfolio of algorithms, as well as their unique strengths and weaknesses within the portfolio?

This paper extends the methodology that has been under development for the last few years by addressing these last remaining questions. We demonstrate the use of the methodology by applying it to some computational results reported recently for an extensive comparison of graph coloring heuristics [19]. This case study reveals insights into the relative powers of the chosen optimization algorithms that were not apparent by considering performance averaged across all chosen instances.

The remainder of this paper is organized as follows: in Section 2 we present the framework upon which our methodology rests – the Algorithm Selection Problem [20] – which considers the relationships between the instance set, features, algorithms, and performance metrics. The detailed steps of the methodology are then described in Section 3, after proposing solutions to the questions raised above. In Section 4, we present a graph coloring case study based on the computational experiments of Lewis et al. [19] and discuss the new insights that the methodology has generated. Our conclusions are presented in Section 5, along with suggestions for use of the methodology and future research directions.

Section snippets

Framework: the algorithm selection problem

In 1976, Rice [20] proposed a framework for the Algorithm Selection Problem (ASP), which seeks to predict which algorithm from a portfolio is likely to perform best based on measurable features of problem instances. While Rice's focus was not on optimization algorithms, instead applying this approach to predict the performance of partial differential equation solvers [21], [22], the framework is one that is readily generalizable to other domains (see the survey paper by Smith-Miles [23] for a
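To make the framework concrete (a generic restatement in standard notation, not taken verbatim from the paper), the ASP can be written as the problem of choosing, for an instance $x$ with feature vector $f(x)$, the algorithm

$$\alpha^{*}(x) \;=\; \arg\max_{\alpha \in \mathcal{A}} \; y\bigl(\alpha, f(x)\bigr),$$

where $\mathcal{A}$ is the portfolio of algorithms, $f : \mathcal{P} \rightarrow \mathcal{F}$ maps instances from problem space to feature space, and $y$ is the performance measure to be maximized.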

Methodology

The proposed methodology comprises three stages:

  1. Generating the instance space – a process whereby instances are selected, their features calculated, and an optimal subset of features is generated to create a high-dimensional summary of the instances in feature space that, when projected to the two-dimensional instance space, achieves good separation of the easy and hard instances.

  2. Algorithm performance prediction – using the location of an instance within the instance space, machine learning
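A minimal sketch of this second, prediction stage (the learner, data, and labels below are hypothetical placeholders, not the experimental setup of the paper) might look as follows, assuming the two-dimensional instance-space coordinates are used as classifier inputs:

```python
# Illustrative sketch: predict whether an algorithm will perform well on
# unseen instances from their 2D instance-space coordinates.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
coords = rng.normal(size=(500, 2))           # hypothetical 2D projections
good = coords[:, 0] > 0.2 * coords[:, 1]     # stand-in performance label

X_train, X_test, y_train, y_test = train_test_split(
    coords, good, test_size=0.3, random_state=0)

# Any classifier could be substituted; k-NN is used purely for illustration.
clf = KNeighborsClassifier(n_neighbors=15).fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

The particular learner is incidental; the point is that location in the instance space is treated as the predictor of algorithm performance on previously unseen instances.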

Graph coloring case study

In this section we introduce the graph coloring problem and a set of algorithms studied by Lewis et al. [19]. We start by defining the meta-data for our study in Section 4.1, including the set of graph coloring instances, the features we use to summarize the instances, the set of algorithms, and the performance metric chosen to measure algorithm performance. We then demonstrate the methodology introduced in the previous section by generating the instance space in Section 4.2, and visualizing

Conclusions

This paper has proposed a new methodology for objective assessment of the relative power of algorithms in general, and of optimization algorithms in particular. It is a methodology based on representing instances of optimization problems as points in a two-dimensional plane, which opens up the opportunity to visualize instance diversity and observe any sample bias; to identify the regions of instance space where algorithms have unique strengths and weaknesses; and to generate new

Acknowledgments

This research is funded by the Australian Research Council under Grant DP120103678. The authors are grateful to Gordon Royle for providing useful discussions about the energy of a graph. They are also grateful to the two reviewers who made useful suggestions to improve the clarity of the paper, and to Simon Bowly for his assistance with some of the figures.

References (60)

  • R. Balakrishnan. The energy of a graph. Linear Algebra Appl (2004).
  • D. Wood. An algorithm for finding a maximum clique in a graph. Oper Res Lett (1997).
  • R. Lewis. A general-purpose hill-climbing method for order independent minimum grouping problems: a case study in graph colouring and bin packing. Comput Oper Res (2009).
  • I. Blöchliger et al. A graph coloring heuristic using partial solutions and a reactive tabu scheme. Comput Oper Res (2008).
  • K.A. Dowsland et al. An improved ant colony optimisation heuristic for graph colouring. Discrete Appl Math (2008).
  • J. Hooker. Needed: an empirical science of algorithms. Oper Res (1994).
  • J. Hooker. Testing heuristics: we have it all wrong. J Heuristics (1995).
  • J. Beasley. OR-Library: distributing test problems by electronic mail. J Oper Res Soc (1990).
  • R. Hill et al. The effects of coefficient correlation structure in two-dimensional knapsack problems on solution procedure performance. Manag Sci (2000).
  • D.H. Wolpert et al. No free lunch theorems for optimization. IEEE Trans Evolut Comput (1997).
  • J. Culberson. On the futility of blind search: an algorithmic view of ‘no free lunch’. Evolut Comput (1998).
  • C. Igel et al. A no-free-lunch theorem for non-uniform distributions of target functions. J Math Model Algorithms (2005).
  • Margulies S, Ma J, Hicks I. The Cunningham–Geelen method in practice: branch-decompositions and integer programming....
  • Smith-Miles K, Tan T. Measuring algorithm footprints in instance space. In: IEEE congress on evolutionary computation...
  • K. Smith-Miles et al. Discovering the suitability of optimisation algorithms by learning from evolved instances. Ann Math Artif Intell (2011).
  • K. Smith-Miles et al. Predicting metaheuristic performance on graph coloring problems using data mining.
  • Smith-Miles K, van Hemert J, Lim X. Understanding TSP difficulty by learning from evolved instances. In: Learning and...
  • Smith-Miles K, James R, Giffin J, Tu Y. A knowledge discovery approach to understanding relationships between...
  • Smith-Miles K. Towards insightful algorithm selection for optimisation using meta-learning concepts. In: IEEE...
  • Smith-Miles K, Baatar D. Exploring the role of graph spectra in graph coloring algorithm performance. Discrete Appl...