Towards objective measures of algorithm performance across instance space
Introduction
Objective assessment of optimization algorithm performance is notoriously difficult [1], [2], especially when the conclusions depend so heavily on the chosen test instances of the optimization problem. The popular use of benchmark libraries of instances (e.g. the OR-Library [3]) helps to standardize the testing of algorithms, but may not be sufficient to reveal the true strengths and weaknesses of algorithms. As cautioned by Hooker [1], [2] nearly two decades ago, there is a need to be careful about the conclusions that can be drawn beyond the selected instances. It has been documented that for some optimization problems the benchmark library instances are not very diverse [4], and there is a danger that algorithms are developed and tuned to perform well on these instances without understanding the performance that can be expected on instances with diverse properties. Furthermore, while the peer-review process usually ensures that standard benchmark instances are used for well-studied problems, for many real-world or more unusual optimization problems there is a lack of benchmark instances, and a tendency for papers to be published that report algorithm performance based only on a small set of instances presented by the authors. Such papers are typically able to demonstrate that the new algorithm proposed by the authors outperforms other previously published approaches (it is difficult to get published otherwise), and the choice of instances cannot be challenged due to the lack of alternative instances.
The No-Free-Lunch (NFL) Theorems [5], [6] state that all optimization algorithms have identically distributed performance when objective functions are drawn uniformly at random, and all algorithms have identical mean performance across the set of all optimization problems. Does this idea also apply to different instances of a particular optimization problem, which give rise to only a subset of possible objective functions? Probably not [7], but it still seems unwise to believe that any one optimization algorithm will always be superior for all possible instances of a given problem. We should expect that any algorithm has weaknesses, and that some instances could be conceived where the algorithm would be less effective than its competitors, or at least instances exist where its competitive advantage disappears. Our current research culture, where negative results are seen as somehow less of a contribution than positive ones, means that the true strengths and weaknesses of an optimization algorithm are rarely exposed and reported. Yet for advancement of the field, surely we must find a way to make it easier for researchers to report the strengths and weaknesses of their algorithms. On which types of instances does an algorithm outperform its competitors? Where is it less effective? How can we describe those instances?
Occasionally we find a paper that presents a well-defined class of instances where an algorithm performs well, and reports its failings outside this class (see [8] for a recent example). Such studies assist our understanding of an algorithm and its applicability. Does the class of instances where an algorithm is effective overlap real-world or other interesting instances? Is an algorithm only effective on instances where its competitors are also effective, or are there some classes where it is uniquely powerful? How do the properties of the instances affect algorithm performance? Until we develop tools that enable researchers to quickly and easily determine the instances needed to describe and quantify the boundary of effective algorithm performance in terms of instance properties, the objectivity of algorithm performance assessment will remain compromised by sample bias.
Recently, we have been developing the components of such a methodology [9]. Instances are represented as points in a high-dimensional feature space, with features chosen intentionally to tease out the similarities and differences between instance classes. For many broad classes of optimization problems, a rich set of features has already been identified that can be used to summarize the properties of instances affecting instance difficulty (see [10] for a survey of suitable features). Representing all available instances of an optimization problem in a single space in this manner can often reveal inadequacies in the diversity of the test instances. We can observe for some problems that benchmark instances appear to be structurally similar to randomly generated instances, eliciting similar performance from algorithms, and are not well designed for testing the strengths and weaknesses of algorithms. We have previously proposed the use of evolutionary algorithms to intentionally construct instances that are easy or hard for specific algorithms [11], thereby guaranteeing diversity of the instance set. Once we have sufficient instances covering most regions of the high-dimensional feature space, we need to be able to superimpose algorithm performance in this space and visualize the boundaries of good performance. Using dimensionality reduction techniques such as Principal Component Analysis, we have previously proposed projecting all instances to a two-dimensional "instance space" [9] where we can visualize the region where an algorithm can be expected to perform well, based on generalization of its observable performance on the test instances. We call this region the algorithm footprint in instance space, and the relative size and uniqueness of an algorithm's footprint can be used as an objective measure of algorithm power.
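As a concrete illustration of the projection and footprint ideas described above, the sketch below builds a two-dimensional instance space from a feature matrix via PCA and measures a footprint as the total area of grid cells containing instances on which an algorithm performed well. This is a simplified illustration, not the exact procedure of [9]: the random feature matrix, the "good performance" labels, and the grid-cell footprint measure are all assumptions made for the example.

```python
import numpy as np

def project_to_instance_space(features):
    """Project an (n_instances, n_features) matrix to 2-D via PCA.

    Features are standardized first so that differences in scale do not
    dominate the principal components.
    """
    X = np.asarray(features, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized data yields the principal axes in Vt.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T  # coordinates in the 2-D instance space

def footprint_area(points, good_mask, cell=0.5):
    """Crude footprint measure: total area of grid cells that contain
    at least one instance on which the algorithm performed well."""
    cells = {tuple(np.floor(p / cell).astype(int)) for p in points[good_mask]}
    return len(cells) * cell ** 2

# Toy example: 200 synthetic instances described by 5 features.
rng = np.random.default_rng(0)
F = rng.normal(size=(200, 5))
Z = project_to_instance_space(F)
good = F[:, 0] > 0  # pretend performance depends on feature 0
area = footprint_area(Z, good)
```

The grid-cell measure is deliberately simple; the methodology itself fits and prunes a boundary around the well-solved instances rather than counting cells.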
Inspection of the distribution of individual features across the instance space can also be used to generate new insights into how the properties of instances affect algorithm performance, and machine learning techniques can be employed in the feature space (or instance space) to predict algorithm performance on unseen instances [12]. Over the last few years we have applied components of this broad methodology to a series of optimization problems including the Travelling Salesman Problem [9], [11], [13], Job-Shop Scheduling [14], Quadratic Assignment Problem [15], Graph Coloring [12], [16], and Timetabling Problems [17], [18].
While our previous research has generated an initial methodology, it has raised a number of questions that need to be addressed for a more comprehensive tool to be developed: How should we select the right features to represent the instance space most effectively? How can we determine the sufficiency and diversity of the set of instances? Can we more accurately predict algorithm performance in the high-dimensional feature space or the projected two-dimensional space? How should we determine the boundary of where we expect an algorithm to perform well based on limited observations? How can we reveal the strengths and weaknesses of a portfolio of algorithms, and the unique strengths and weaknesses of each algorithm within that portfolio?
This paper extends the methodology that has been under development for the last few years by addressing these remaining questions. We demonstrate the use of the methodology by applying it to computational results reported recently for an extensive comparison of graph coloring heuristics [19]. This case study reveals insights into the relative power of the chosen optimization algorithms that were not apparent when performance was averaged across all chosen instances.
The remainder of this paper is organized as follows: in Section 2 we present the framework upon which our methodology rests – the Algorithm Selection Problem [20] – which considers the relationships between the instance set, features, algorithms, and performance metrics. The detailed steps of the methodology are then described in Section 3, after proposing solutions to the questions raised above. In Section 4, we present a graph coloring case study based on the computational experiments of Lewis et al. [19] and discuss the new insights that the methodology has generated. Our conclusions are presented in Section 5, along with suggestions for use of the methodology and future research directions.
Section snippets
Framework: the algorithm selection problem
In 1976, Rice [20] proposed a framework for the Algorithm Selection Problem (ASP), which seeks to predict which algorithm from a portfolio is likely to perform best based on measurable features of problem instances. While Rice's focus was not on optimization algorithms, instead applying this approach to predict the performance of partial differential equation solvers [21], [22], the framework is one that is readily generalizable to other domains (see the survey paper by Smith-Miles [23] for a…
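Rice's framework can be made concrete with a minimal learned selector: given per-instance feature vectors and the algorithm that performed best on each training instance, recommend an algorithm for a new instance from its nearest neighbour in feature space. The feature values and algorithm names below are hypothetical, and this 1-NN rule is only one of many models that could instantiate the ASP mapping.

```python
import numpy as np

def select_algorithm(train_features, best_algo, query_features):
    """1-nearest-neighbour selector: recommend, for each query instance,
    the algorithm that performed best on the most similar training
    instance (Euclidean distance in feature space)."""
    T = np.asarray(train_features, dtype=float)
    Q = np.asarray(query_features, dtype=float)
    # Pairwise squared distances, shape (n_queries, n_training).
    d2 = ((Q[:, None, :] - T[None, :, :]) ** 2).sum(axis=-1)
    return [best_algo[i] for i in d2.argmin(axis=1)]

# Hypothetical meta-data: two features per instance, best-known solver.
train = [[0.1, 0.9], [0.8, 0.2], [0.9, 0.1]]
best = ["tabu", "greedy", "greedy"]
picks = select_algorithm(train, best, [[0.15, 0.85], [0.7, 0.3]])
# picks == ['tabu', 'greedy']
```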
Methodology
The proposed methodology comprises three stages:
- 1.
Generating the instance space – a process whereby instances are selected, their features calculated, and an optimal subset of features is generated to create a high-dimensional summary of the instances in feature space that, when projected to the two-dimensional instance space, achieves good separation of the easy and hard instances.
- 2.
Algorithm performance prediction – using the location of an instance within the instance space, machine learning…
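To illustrate the feature-subset search in stage 1, the sketch below scores every pair of candidate features by how far apart the centroids of the easy and hard instances lie after standardization, and keeps the best-separating pair. The centroid-distance score and the synthetic difficulty labels are assumptions for the example, not the selection criterion used in the methodology itself.

```python
import numpy as np
from itertools import combinations

def separation(X, hard):
    """Distance between the centroids of the hard and easy classes:
    a simple proxy for how well a feature subset separates them."""
    return np.linalg.norm(X[hard].mean(axis=0) - X[~hard].mean(axis=0))

def best_feature_subset(features, hard, k=2):
    """Score every k-feature subset and return the one whose easy/hard
    centroids are furthest apart (features standardized first)."""
    F = np.asarray(features, dtype=float)
    F = (F - F.mean(axis=0)) / F.std(axis=0)
    candidates = combinations(range(F.shape[1]), k)
    return max(candidates, key=lambda s: separation(F[:, list(s)], hard))

# Synthetic meta-data: difficulty driven mainly by features 2 and 0.
rng = np.random.default_rng(1)
F = rng.normal(size=(100, 4))
hard = F[:, 2] + 0.5 * F[:, 0] > 0
subset = best_feature_subset(F, hard)
```

Exhaustive search over pairs is feasible here; for larger feature sets a greedy forward search over the same score would keep the cost manageable.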
Graph coloring case study
In this section we introduce the graph coloring problem and a set of algorithms studied by Lewis et al. [19]. We start by defining the meta-data for our study in Section 4.1, including the set of graph coloring instances, the features we use to summarize the instances, the set of algorithms, and the performance metric chosen to measure algorithm performance. We then demonstrate the methodology introduced in the previous section by generating the instance space in Section 4.2, and visualizing…
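For a flavour of the kind of instance features involved, the following sketch computes a few simple features of a graph: density, degree statistics, and the graph energy (the sum of the absolute eigenvalues of the adjacency matrix, a quantity mentioned in the acknowledgments). The particular feature list here is assumed for illustration rather than taken from the case study.

```python
import numpy as np

def graph_features(adj):
    """Summarize a simple undirected graph (0/1 adjacency matrix) with
    a few illustrative instance features."""
    A = np.asarray(adj, dtype=float)
    n = len(A)
    degrees = A.sum(axis=1)
    eigs = np.linalg.eigvalsh(A)  # adjacency spectrum (symmetric matrix)
    return {
        "density": A.sum() / (n * (n - 1)),  # fraction of possible edges
        "mean_degree": degrees.mean(),
        "degree_std": degrees.std(),
        "energy": np.abs(eigs).sum(),        # graph energy
    }

# Example: the 4-cycle C4, a bipartite (hence 2-colorable) graph.
C4 = [[0, 1, 0, 1],
      [1, 0, 1, 0],
      [0, 1, 0, 1],
      [1, 0, 1, 0]]
feats = graph_features(C4)
# C4 has density 8/12, mean degree 2, and energy |2|+|0|+|0|+|-2| = 4.
```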
Conclusions
This paper has proposed a new methodology for objective assessment of the relative power of algorithms in general, and has focused on optimization algorithms in particular. It is a methodology based on representing instances of optimization problems as points in a two-dimensional plane, which opens up the opportunity to visualize instance diversity and observe any sample bias; to identify the regions of instance space where algorithms have unique strengths and weaknesses; and to generate new…
Acknowledgments
This research is funded by the Australian Research Council under Grant DP120103678. The authors are grateful to Gordon Royle for providing useful discussions about the energy of a graph. They are also grateful to the two reviewers who made useful suggestions to improve the clarity of the paper, and to Simon Bowly for his assistance with some of the figures.
References (60)
- Measuring instance difficulty for combinatorial optimization problems. Comput Oper Res (2012)
- A wide-ranging computational comparison of high-performance graph colouring algorithms. Comput Oper Res (2012)
- GAUSS: an online algorithm selection system for numerical quadrature. Adv Eng Softw (2002)
- A graph-based hyper-heuristic for educational timetabling problems. Eur J Oper Res (2007)
- An introduction to timetabling. Eur J Oper Res (1985)
- Some models of graphs for scheduling sports competitions. Discrete Appl Math (1988)
- House of graphs: a database of interesting graphs. Discrete Appl Math (2013)
- Facet defining inequalities among graph invariants: the system GraPHedron. Discrete Appl Math (2008)
- Use of the Szeged index and the revised Szeged index for measuring network bipartivity. Discrete Appl Math (2010)
- The energy of a graph. Linear Algebra Appl
- An algorithm for finding a maximum clique in a graph. Oper Res Lett
- A general-purpose hill-climbing method for order independent minimum grouping problems: a case study in graph colouring and bin packing. Comput Oper Res
- A graph coloring heuristic using partial solutions and a reactive tabu scheme. Comput Oper Res
- An improved ant colony optimisation heuristic for graph colouring. Discrete Appl Math
- Needed: an empirical science of algorithms. Oper Res
- Testing heuristics: we have it all wrong. J Heuristics
- OR-Library: distributing test problems by electronic mail. J Oper Res Soc
- The effects of coefficient correlation structure in two-dimensional knapsack problems on solution procedure performance. Manag Sci
- No free lunch theorems for optimization. IEEE Trans Evolut Comput
- On the futility of blind search: an algorithmic view of 'no free lunch'. Evolut Comput
- A no-free-lunch theorem for non-uniform distributions of target functions. J Math Model Algorithms
- Discovering the suitability of optimisation algorithms by learning from evolved instances. Ann Math Artif Intell
- Predicting metaheuristic performance on graph coloring problems using data mining