Abstract
Non-informative or redundant features in big data can significantly degrade the performance of a machine learning model: they make training costly and weaken the model's interpretability. Traditional feature selection methods, particularly wrapper methods, typically rely on greedy search and are therefore susceptible to suboptimal solutions, selection bias, and high variability due to noise in the data. Our simulation optimization framework identifies the best subset of features by using resamples of the training and test sets, where the random holdout errors serve as the simulation outputs. The resulting feature subsets are more reliable because they perform well across several resampled datasets. Experiments on four real and simulated datasets demonstrate the competitive advantage of the fixed-sample-size approach across various performance metrics. For sufficiently large datasets, we further develop adaptive sampling strategies in which the number of training and test resamples varies from solution to solution. The adaptive-sample-size variant recommends feature subsets of the same quality as the fixed-sample-size version, but significantly faster.
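To make the evaluation loop concrete, the sketch below (ours, not the authors' implementation) shows how a candidate feature subset can be scored in such a framework: each random train/test resample yields one holdout error, i.e., one simulation output, and the subset's quality is estimated by averaging replications. The learner (logistic regression), the 70/30 split, the stopping tolerance, and all names such as `holdout_error` and `adaptive_estimate` are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of resample-based subset evaluation, assuming a
# scikit-learn-style estimator. All names and constants are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def holdout_error(X, y, subset, rng):
    """One simulation replication: holdout error of a feature subset
    on a fresh random 70/30 train/test resample."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[:, subset], y, test_size=0.3,
        random_state=int(rng.integers(2**31)))
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return 1.0 - model.score(X_te, y_te)  # misclassification rate

def estimate_subset_quality(X, y, subset, n_reps=20, seed=0):
    """Fixed-sample-size estimate: mean error over n_reps resamples,
    with the standard error of the mean as a noise measure."""
    rng = np.random.default_rng(seed)
    errors = [holdout_error(X, y, subset, rng) for _ in range(n_reps)]
    return np.mean(errors), np.std(errors, ddof=1) / np.sqrt(n_reps)

def adaptive_estimate(X, y, subset, tol=0.01, n0=10, n_max=200, seed=0):
    """Adaptive-sample-size variant (our sketch): keep replicating until
    the standard error falls below tol, up to a budget of n_max."""
    rng = np.random.default_rng(seed)
    errors = [holdout_error(X, y, subset, rng) for _ in range(n0)]
    while len(errors) < n_max:
        se = np.std(errors, ddof=1) / np.sqrt(len(errors))
        if se <= tol:
            break
        errors.append(holdout_error(X, y, subset, rng))
    return np.mean(errors), len(errors)
```

An outer discrete simulation-optimization search over binary feature-inclusion vectors would then compare candidate subsets through these noisy estimates; the adaptive variant reflects the idea of varying the number of resamples per solution, spending replications only while the estimate is still too noisy to support a comparison.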
Notes
There is no publicly available code for the nested partitions method, so we compare our results with those published for nested partitions solvers on a single dataset from the UCI repository (Dua and Graff 2017).
References
Abramson MA, Audet C, Chrissis JW, Walston JG (2009) Mesh adaptive direct search algorithms for mixed variable optimization. Optim Lett 3(1):35
Almuallim H, Dietterich TG (1994) Learning Boolean concepts in the presence of many irrelevant features. Artif Intell 69(1–2):279–305
Audet C, Dennis JE Jr (2002) Analysis of generalized pattern searches. SIAM J Optim 13(3):889–903
Audet C, Dennis JE Jr (2006) Mesh adaptive direct search algorithms for constrained optimization. SIAM J Optim 17(1):188–217
Bareiss ER, Porter B (1987) Protos: an exemplar-based learning apprentice. In: Proceedings of the 4th international workshop on machine learning, pp 12–23
Bergstra J, Bardenet R, Bengio Y, Kégl B (2011) Algorithms for hyper-parameter optimization. In: Proceedings of the 24th international conference on neural information processing systems (NIPS'11). Curran Associates Inc., Red Hook, NY, pp 2546–2554
Billingsley P (2012) Probability and measure. Wiley, Hoboken
Bischl B, Lang M, Kotthoff L, Schiffner J, Richter J, Studerus E, Casalicchio G, Jones ZM (2016) mlr: machine learning in R. J Mach Learn Res 17(170):1–5
Bischl B, Richter J, Bossek J, Horn D, Thomas J, Lang M (2017) mlrMBO: a modular framework for model-based optimization of expensive black-box functions. arXiv preprint arXiv:1703.03373
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Cano JR, Herrera F, Lozano M (2003) Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study. IEEE Trans Evol Comput 7(6):561–575
Cardie C (1993) Using decision trees to improve case-based learning. In: Proceedings of the tenth international conference on machine learning, pp 25–32
Chen Y-W, Lin C-J (2006) Combining SVMs with various feature selection strategies. In: Guyon I, Gunn S, Nikravesh M, Zadeh LA (eds) Feature extraction. Springer, Berlin, Heidelberg, pp 315–324
Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge
Derrac J, García S, Herrera F (2012) A survey on evolutionary instance selection and generation. In: Yin P-Y (ed) Modeling, analysis, and applications in metaheuristic computing: advancements and trends. IGI Global, pp 233–266
Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml
Eckman DJ, Henderson SG, Shashaani S (2021) Evaluating and comparing simulation-optimization algorithms (under review)
Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7(1):1–26
Efron B, Tibshirani R (1997) Improvements on cross-validation: the 632+ bootstrap method. J Am Stat Assoc 92(438):548–560
Fisher A, Rudin C, Dominici F (2018) All models are wrong, but many are useful: learning a variable’s importance by studying an entire class of prediction models simultaneously. arXiv preprint arXiv:1801.01489
Fu MC, Hu JQ, Chen CH, Xiong X (2004) Optimal computing budget allocation under correlated sampling. In: Proceedings of the 2004 winter simulation conference, Washington, DC, USA, p 603
Geisser S (1975) The predictive sample reuse method with applications. J Am Stat Assoc 70(350):320–328
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3(Mar):1157–1182
Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422
Haury A-C, Gestraud P, Vert J-P (2011) The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures. PLoS ONE 6:e28210
Hong LJ, Nelson BL (2006) Discrete optimization via simulation using COMPASS. Oper Res 54(1):115–129
Hunter SR, Nelson BL (2017) Parallel ranking and selection. In: Tolk A, Fowler J, Shao G, Yücesan E (eds) Advances in modeling and simulation. Springer, Cham, pp 249–275
Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black-box functions. J Global Optim 13(4):455–492
Jung Y (2018) Multiple predicting k-fold cross-validation for model selection. J Nonparametr Stat 30(1):197–215
Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95—international conference on neural networks, vol 4, pp 1942–1948
Kennedy J, Eberhart RC (1997) A discrete binary version of the particle swarm algorithm. In: 1997 IEEE international conference on systems, man, and cybernetics. Computational cybernetics and simulation, vol 5, pp 4104–4108
Kepplinger D, Filzmoser P, Varmuza K (2017) Variable selection with genetic algorithms using repeated cross-validation of PLS regression models as fitness measure. arXiv preprint arXiv:1711.06695
Kim S, Pasupathy R, Henderson SG (2015) A guide to sample average approximation. In: Fu MC (ed) Handbook of simulation optimization. Springer, pp 207–243
Kira K, Rendell LA (1992) A practical approach to feature selection. In: Machine learning proceedings. Elsevier, pp 249–256
Kleijnen JP (2009) Factor screening in simulation experiments: review of sequential bifurcation. In: Alexopoulos C, Goldsman D, Wilson JR (eds) Advancing the frontiers of simulation. Springer, pp 153–167
Kolda TG, Lewis RM, Torczon V (2003) Optimization by direct search: new perspectives on some classical and modern methods. SIAM Rev 45(3):385–482
Koumi F, Aldasht M, Tamimi H (2019) Efficient feature selection using particle swarm optimization: a hybrid filters-wrapper approach. In: 2019 10th international conference on information and communication systems (ICICS), pp 122–127
Kudo M, Sklansky J (2000) Comparison of algorithms that select features for pattern classifiers. Pattern Recogn 33(1):25–41
Kuhn M (2008) Building predictive models in R using the caret package. J Stat Softw 28(5):1–26
Kuhn M, Johnson K (2013) Applied predictive modeling, vol 26. Springer, New York
Le Digabel S (2011) Algorithm 909: NOMAD: nonlinear optimization with the MADS algorithm. ACM Trans Math Softw 37(4):1–15
Li R, Lu J, Zhang Y, Zhao T (2010) Dynamic adaboost learning with feature selection based on parallel genetic algorithm for image annotation. Knowl-Based Syst 23(3):195–201
Liu W, Wang J (2019) A brief survey on nature-inspired metaheuristics for feature selection in classification in this decade. In: 2019 IEEE 16th international conference on networking, sensing and control (ICNSC), pp 424–429
Mak W-K, Morton DP, Wood RK (1999) Monte Carlo bounding techniques for determining solution quality in stochastic programs. Oper Res Lett 24(1–2):47–56
Marill T, Green D (1963) On the effectiveness of receptors in recognition systems. IEEE Trans Inf Theory 9(1):11–17
Muni DP, Pal NR, Das J (2006) Genetic programming for simultaneous feature selection and classifier design. IEEE Trans Syst Man Cybern B (Cybern) 36(1):106–117
Musavi M, Ahmed W, Chan K, Faris K, Hummels D (1992) On the training of radial basis function classifiers. Neural Netw 5(4):595–603
Muthukrishnan R, Rohini R (2016) Lasso: a feature selection technique in predictive modeling for machine learning. In: 2016 IEEE international conference on advances in computer applications (ICACA), Coimbatore, pp 18–20
Nazzal D, Mollaghasemi M, Hedlund H, Bozorgi A (2012) Using genetic algorithms and an indifference-zone ranking and selection procedure under common random numbers for simulation optimisation. J Simul 6(1):56–66
Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313
Nelder JA, Wedderburn RW (1972) Generalized linear models. J R Stat Soc Ser A (Gen) 135(3):370–384
Ni EC, Ciocan DF, Henderson SG, Hunter SR (2017) Efficient ranking and selection in parallel computing environments. Oper Res 65(3):821–836
Ólafsson S (2004) Two-stage nested partitions method for stochastic optimization. Methodol Comput Appl Probab 6(1):5–27
Ólafsson S, Yang J (2005) Intelligent partitioning for feature selection. INFORMS J Comput 17(3):339–355
Pei L, Nelson BL, Hunter SR (2020) Evaluation of bi-PASS for parallel simulation optimization. In: Proceedings of the 2020 winter simulation conference. IEEE, pp 2960–2971
Porcelli M, Toint PL (2017) BFO, a trainable derivative-free brute force optimizer for nonlinear bound-constrained optimization and equilibrium computations with continuous and discrete variables. ACM Trans Math Softw (TOMS) 44(1):6
Redmond MA, Baveja A (2002) A data-driven software tool for enabling cooperative information sharing among police departments. Eur J Oper Res 141:660–678
Sanz-Garcia A, Fernandez-Ceniceros J, Antonanzas-Torres F, Pernia-Espinoza A, de Pison FM (2015) GA-parsimony: a GA-SVR approach with feature selection and parameter optimization to obtain parsimonious solutions for predicting temperature settings in a continuous annealing furnace. Appl Soft Comput 35:13–28
Sapp S, van der Laan MJ, Canny J (2014) Subsemble: an ensemble method for combining subset-specific algorithm fits. J Appl Stat 41(6):1247–1259
Shashaani S, Hashemi FS, Pasupathy R (2018) ASTRO-DF: a class of adaptive sampling trust-region algorithms for derivative-free stochastic optimization. SIAM J Optim 28(4):3145–3176
Singh DAAG, Appavu S, Leavline EJ (2016) Literature review on feature selection methods for high-dimensional data. Int J Comput Appl 136(1)
Sinha A, Malo P, Kuosmanen T (2015) A multiobjective exploratory procedure for regression model selection. J Comput Graph Stat 24(1):154–182
Snoek J, Larochelle H, Adams RP (2012) Practical Bayesian optimization of machine learning algorithms. In: Pereira F, Burges CJ, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems, pp 2951–2959
Song E, Nelson BL, Staum J (2016) Shapley effects for global sensitivity analysis: theory and computation. SIAM/ASA J Uncertainty Quant 4(1):1060–1083
Song E, Nelson BL, Hong LJ (2015) Input uncertainty and indifference-zone ranking and selection. In: Winter simulation conference (WSC) 2015, pp 414–424
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58(1):267–288
Urraca R, Sodupe-Ortega E, Antonanzas J, Antonanzas-Torres F, de Pison FM (2018) Evaluation of a novel GA-based methodology for model structure selection: The GA-parsimony. Neurocomputing 271:9–17
Vahdat K, Shashaani S (2020) Simulation optimization based feature selection, a study on data-driven optimization with input uncertainty. In: Proceedings of the 2020 winter simulation conference. IEEE, pp 2149–2160
Vahdat K, Shashaani S (2021) Non-parametric uncertainty bias and variance estimation via nested bootstrapping and influence functions. In: Kim S, Feng B, Masoud S, Zheng Z, Loper M (eds) Proceedings of the 2021 winter simulation conference. Institute of Electrical and Electronics Engineers, Inc, Savannah
van der Laan MJ, Polley EC, Hubbard AE (2007) Super learner. Stat Appl Genet Mol Biol 6(1), Article 25
Vasquez D, Shashaani S, Pasupathy R (2021) The complexity of adaptive sampling trust-region methods for nonconvex stochastic optimization. Working paper
Wang H, Pasupathy R, Schmeiser BW (2013) Integer-ordered simulation optimization using R-SPLINE: retrospective search with piecewise-linear interpolation and neighborhood enumeration. ACM Trans Model Comput Simul (TOMACS) 23(3):17
Whitney AW (1971) A direct method of nonparametric measurement selection. IEEE Trans Comput 100(9):1100–1103
Xu J, Nelson BL, Hong LJ (2013) An adaptive hyperbox algorithm for high-dimensional discrete optimization via simulation problems. INFORMS J Comput 25(1):133–146
Xue B, Zhang M, Browne WN (2012) Particle swarm optimization for feature selection in classification: a multi-objective approach. IEEE Trans Cybern 43(6):1656–1671
Xue B, Zhang M, Browne WN, Yao X (2016) A survey on evolutionary computation approaches to feature selection. IEEE Trans Evol Comput 20(4):606–626
Yang J, Ólafsson S (2006) Optimization-based feature selection with adaptive instance sampling. Comput Oper Res 33(11):3088–3106
Yusta SC (2009) Different metaheuristic strategies to solve the feature selection problem. Pattern Recogn Lett 30(5):525–534
Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, Reading
Zeng X, Chen Y, Tao C, van Alphen D (2009) Feature selection using recursive feature elimination for handwritten digit recognition. In: 2009 Fifth international conference on intelligent information hiding and multimedia signal processing, Kyoto, pp 1205–1208
Zhou Q, Zhou H, Zhou Q, Yang F, Luo L (2014) Structure damage detection based on random forest recursive feature elimination. Mech Syst Signal Process 46:82–90
Acknowledgements
The authors would like to thank Seth Guikema for the original discussions, Reha Uzsoy and Anton Panchishin for helpful comments, and the reviewer for excellent questions that helped us improve this work.
Cite this article
Shashaani, S., Vahdat, K. Improved feature selection with simulation optimization. Optim Eng 24, 1183–1223 (2023). https://doi.org/10.1007/s11081-022-09726-3