Improved feature selection with simulation optimization

  • Research Article
  • Published in: Optimization and Engineering

Abstract

Non-informative or redundant features in big data can significantly degrade the performance of any machine learning model. They make model training costly and weaken model interpretability. Traditional feature selection methods, particularly wrapper methods performed with greedy search, are susceptible to suboptimal solutions, selection bias, and high variability due to noise in the data. Our simulation optimization framework seeks the best subset of features by utilizing resamples of the training and test sets, where the random holdout errors provide the simulation outputs. The resulting feature subsets are more reliable because they perform well across several resampled datasets. Our experiments on four real and simulated datasets demonstrate the competitive advantages of the fixed sampling approach on various performance metrics. Further, we develop adaptive sampling strategies for sufficiently large datasets, in which the number of training and test resamples varies with each solution. Adaptive sample sizes reach the same quality of recommended feature subsets as the fixed-sample-size version, but significantly faster.
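To make the objective concrete, the sketch below shows how the simulation output for one candidate feature subset could be computed: its mean holdout error over random train/test resamples, with either a fixed resampling budget or an adaptive rule that stops once the error estimate is precise enough. This is a minimal Python illustration under our own assumptions (a scikit-learn random forest as the learner, a generic standard-error stopping rule, and hypothetical function names), not the authors' implementation.

```python
# Minimal sketch (not the authors' code): the "simulation output" for a
# candidate feature subset is its misclassification error on a random
# train/test resample of the data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def holdout_errors(X, y, subset, n_resamples, test_size=0.3, seed=0):
    """Holdout errors of feature `subset` over random train/test splits."""
    rng = np.random.RandomState(seed)
    errors = []
    for _ in range(n_resamples):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X[:, subset], y, test_size=test_size,
            random_state=rng.randint(2**31 - 1))
        model = RandomForestClassifier(n_estimators=100).fit(X_tr, y_tr)
        errors.append(np.mean(model.predict(X_te) != y_te))
    return np.array(errors)


def fixed_sample_objective(X, y, subset, n_resamples=30):
    """Fixed sampling: every candidate subset gets the same budget."""
    return holdout_errors(X, y, subset, n_resamples).mean()


def adaptive_sample_objective(X, y, subset, n_min=10, n_max=200, rel_se=0.05):
    """Adaptive sampling: keep resampling until the standard error of the
    mean error is small relative to the estimate (a generic stopping rule;
    the paper's actual rule may differ)."""
    errors = list(holdout_errors(X, y, subset, n_min))
    while len(errors) < n_max:
        se = np.std(errors, ddof=1) / np.sqrt(len(errors))
        if se <= rel_se * max(np.mean(errors), 1e-8):
            break  # estimate precise enough for this subset
        errors.extend(holdout_errors(X, y, subset, 1, seed=len(errors)))
    return np.mean(errors)
```

An outer simulation optimization search over binary inclusion vectors would then call one of these objectives for each candidate subset; the adaptive version spends fewer resamples on clearly poor subsets and more on promising ones.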

Notes

  1. There is no publicly available code for the nested partitioning method, so we compare our results with those published for nested partitioning solvers on a dataset from the UCI repository (Dua and Graff 2017).

References

  • Abramson MA, Audet C, Chrissis JW, Walston JG (2009) Mesh adaptive direct search algorithms for mixed variable optimization. Optim Lett 3(1):35

  • Almuallim H, Dietterich TG (1994) Learning Boolean concepts in the presence of many irrelevant features. Artif Intell 69(1–2):279–305

  • Audet C, Dennis JE Jr (2002) Analysis of generalized pattern searches. SIAM J Optim 13(3):889–903

  • Audet C, Dennis JE Jr (2006) Mesh adaptive direct search algorithms for constrained optimization. SIAM J Optim 17(1):188–217

  • Bareiss ER, Porter B (1987) Protos: an exemplar-based learning apprentice. In: Proceedings of the 4th international workshop on machine learning, pp 12–23

  • Bergstra J, Bardenet R, Bengio Y, Kégl B (2011) Algorithms for hyper-parameter optimization. In: Proceedings of the 24th international conference on neural information processing systems (NIPS'11). Curran Associates Inc., Red Hook, NY, pp 2546–2554

  • Billingsley P (2012) Probability and measure. Wiley, Hoboken

  • Bischl B, Lang M, Kotthoff L, Schiffner J, Richter J, Studerus E, Casalicchio G, Jones ZM (2016) mlr: machine learning in R. J Mach Learn Res 17(170):1–5

  • Bischl B, Richter J, Bossek J, Horn D, Thomas J, Lang M (2017) mlrMBO: a modular framework for model-based optimization of expensive black-box functions. arXiv preprint arXiv:1703.03373

  • Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  • Cano JR, Herrera F, Lozano M (2003) Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study. IEEE Trans Evol Comput 7(6):561–575

  • Cardie C (1993) Using decision trees to improve case-based learning. In: Proceedings of the tenth international conference on machine learning, pp 25–32

  • Chen Y-W, Lin C-J (2006) Combining SVMs with various feature selection strategies. In: Guyon I, Gunn S, Nikravesh M, Zadeh LA (eds) Feature extraction. Springer, Berlin, Heidelberg, pp 315–324

  • Cristianini N, Shawe-Taylor J et al (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge

  • Derrac J, García S, Herrera F (2012) A survey on evolutionary instance selection and generation. In: Yin P-Y (ed) Modeling, analysis, and applications in metaheuristic computing: advancements and trends. IGI Global, pp 233–266

  • Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml

  • Eckman DJ, Henderson SG, Shashaani S (2021) Evaluating and comparing simulation-optimization algorithms (under review)

  • Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7(1):1–26

  • Efron B, Tibshirani R (1997) Improvements on cross-validation: the 632+ bootstrap method. J Am Stat Assoc 92(438):548–560

  • Fisher A, Rudin C, Dominici F (2018) All models are wrong, but many are useful: learning a variable’s importance by studying an entire class of prediction models simultaneously. arXiv preprint arXiv:1801.01489

  • Fu MC, Hu J-Q, Chen C-H, Xiong X (2004) Optimal computing budget allocation under correlated sampling. In: Proceedings of the 2004 winter simulation conference, Washington, DC, USA, p 603

  • Geisser S (1975) The predictive sample reuse method with applications. J Am Stat Assoc 70(350):320–328

  • Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3(Mar):1157–1182

  • Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422

  • Haury A-C, Gestraud P, Vert J-P (2011) The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures. PLOS One 6:e28210

  • Hong LJ, Nelson BL (2006) Discrete optimization via simulation using COMPASS. Oper Res 54(1):115–129

  • Hunter SR, Nelson BL (2017) Parallel ranking and selection. In: Tolk A, Fowler J, Shao G, Yücesan E (eds) Advances in modeling and simulation. Springer, Cham, pp 249–275

  • Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black-box functions. J Global Optim 13(4):455–492

  • Jung Y (2018) Multiple predicting k-fold cross-validation for model selection. J Nonparametr Stat 30(1):197–215

  • Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95—international conference on neural networks, vol 4, pp 1942–1948

  • Kennedy J, Eberhart RC (1997) A discrete binary version of the particle swarm algorithm. In: 1997 IEEE international conference on systems, man, and cybernetics. Computational cybernetics and simulation, vol 5, pp 4104–4108

  • Kepplinger D, Filzmoser P, Varmuza K (2017) Variable selection with genetic algorithms using repeated cross-validation of PLS regression models as fitness measure. arXiv preprint arXiv:1711.06695

  • Kim S, Pasupathy R, Henderson SG (2015) A guide to sample average approximation. In: Fu MC (ed) Handbook of simulation optimization. Springer, pp 207–243

  • Kira K, Rendell LA (1992) A practical approach to feature selection. In: Machine learning proceedings. Elsevier, pp 249–256

  • Kleijnen JP (2009) Factor screening in simulation experiments: review of sequential bifurcation. In: Alexopoulos C, Goldsman D, Wilson JR (eds) Advancing the frontiers of simulation. Springer, pp 153–167

  • Kolda TG, Lewis RM, Torczon V (2003) Optimization by direct search: new perspectives on some classical and modern methods. SIAM Rev 45(3):385–482

  • Koumi F, Aldasht M, Tamimi H (2019) Efficient feature selection using particle swarm optimization: a hybrid filters-wrapper approach. In: 2019 10th international conference on information and communication systems (ICICS), pp 122–127

  • Kudo M, Sklansky J (2000) Comparison of algorithms that select features for pattern classifiers. Pattern Recogn 33(1):25–41

  • Kuhn M (2008) Building predictive models in R using the caret package. J Stat Softw 28(5):1–26

  • Kuhn M, Johnson K (2013) Applied predictive modeling, vol 26. Springer, Berlin

  • Le Digabel S (2011) Algorithm 909: NOMAD: nonlinear optimization with the MADS algorithm. ACM Trans Math Softw 37(4):1–15

  • Li R, Lu J, Zhang Y, Zhao T (2010) Dynamic adaboost learning with feature selection based on parallel genetic algorithm for image annotation. Knowl-Based Syst 23(3):195–201

  • Liu W, Wang J (2019) A brief survey on nature-inspired metaheuristics for feature selection in classification in this decade. In: 2019 IEEE 16th international conference on networking, sensing and control (ICNSC), pp 424–429

  • Mak W-K, Morton DP, Wood RK (1999) Monte Carlo bounding techniques for determining solution quality in stochastic programs. Oper Res Lett 24(1–2):47–56

  • Marill T, Green D (1963) On the effectiveness of receptors in recognition systems. IEEE Trans Inf Theory 9(1):11–17

  • Muni DP, Pal NR, Das J (2006) Genetic programming for simultaneous feature selection and classifier design. IEEE Trans Syst Man Cybern B (Cybern) 36(1):106–117

  • Musavi M, Ahmed W, Chan K, Faris K, Hummels D (1992) On the training of radial basis function classifiers. Neural Netw 5(4):595–603

  • Muthukrishnan R, Rohini R (2016) Lasso: a feature selection technique in predictive modeling for machine learning. In: 2016 IEEE international conference on advances in computer applications (ICACA), Coimbatore, pp 18–20

  • Nazzal D, Mollaghasemi M, Hedlund H, Bozorgi A (2012) Using genetic algorithms and an indifference-zone ranking and selection procedure under common random numbers for simulation optimisation. J Simul 6(1):56–66

  • Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313

  • Nelder JA, Wedderburn RW (1972) Generalized linear models. J R Stat Soc Ser A (Gen) 135(3):370–384

  • Ni EC, Ciocan DF, Henderson SG, Hunter SR (2017) Efficient ranking and selection in parallel computing environments. Oper Res 65(3):821–836

  • Ólafsson S (2004) Two-stage nested partitions method for stochastic optimization. Methodol Comput Appl Probab 6(1):5–27

  • Ólafsson S, Yang J (2005) Intelligent partitioning for feature selection. INFORMS J Comput 17(3):339–355

  • Pei L, Nelson BL, Hunter SR (2020) Evaluation of bi-PASS for parallel simulation optimization. In: Proceedings of the 2020 winter simulation conference. IEEE, pp 2960–2971

  • Porcelli M, Toint PL (2017) BFO, a trainable derivative-free brute force optimizer for nonlinear bound-constrained optimization and equilibrium computations with continuous and discrete variables. ACM Trans Math Softw (TOMS) 44(1):6

  • Redmond MA, Baveja A (2002) A data-driven software tool for enabling cooperative information sharing among police departments. Eur J Oper Res 141:660–678

  • Sanz-Garcia A, Fernandez-Ceniceros J, Antonanzas-Torres F, Pernia-Espinoza A, de Pison FM (2015) GA-parsimony: a GA-SVR approach with feature selection and parameter optimization to obtain parsimonious solutions for predicting temperature settings in a continuous annealing furnace. Appl Soft Comput 35:13–28

  • Sapp S, van der Laan MJ, Canny J (2014) Subsemble: an ensemble method for combining subset-specific algorithm fits. J Appl Stat 41(6):1247–1259

  • Shashaani S, Hashemi FS, Pasupathy R (2018) ASTRO-DF: a class of adaptive sampling trust-region algorithms for derivative-free stochastic optimization. SIAM J Optim 28(4):3145–3176

  • Singh DAAG, Appavu S, Leavline EJ (2016) Literature review on feature selection methods for high-dimensional data. Int J Comput Appl 136(1)

  • Sinha A, Malo P, Kuosmanen T (2015) A multiobjective exploratory procedure for regression model selection. J Comput Graph Stat 24(1):154–182

  • Snoek J, Larochelle H, Adams RP (2012) Practical Bayesian optimization of machine learning algorithms. In: Pereira F, Burges CJ, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems, pp 2951–2959

  • Song E, Nelson BL, Staum J (2016) Shapley effects for global sensitivity analysis: theory and computation. SIAM/ASA J Uncertainty Quant 4(1):1060–1083

  • Song E, Nelson BL, Hong LJ (2015) Input uncertainty and indifference-zone ranking and selection. In: Winter simulation conference (WSC) 2015, pp 414–424

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58(1):267–288

  • Urraca R, Sodupe-Ortega E, Antonanzas J, Antonanzas-Torres F, de Pison FM (2018) Evaluation of a novel GA-based methodology for model structure selection: The GA-parsimony. Neurocomputing 271:9–17

  • Vahdat K, Shashaani S (2020) Simulation optimization based feature selection, a study on data-driven optimization with input uncertainty. In: Proceedings of the 2020 winter simulation conference. IEEE, pp 2149–2160

  • Vahdat K, Shashaani S (2021) Non-parametric uncertainty bias and variance estimation via nested bootstrapping and influence functions. In: Kim S, Feng B, Masoud S, Zheng Z, Loper M (eds) Proceedings of the 2021 winter simulation conference. Institute of Electrical and Electronics Engineers, Inc, Savannah

  • van der Laan MJ, Polley EC, Hubbard AE (2007) Super learner. Stat Appl Genet Mol Biol 6(1), Article 25, pp 1–23. Walter de Gruyter, Berlin/Boston

  • Vasquez D, Shashaani S, Pasupathy R (2021) The complexity of adaptive sampling trust-region methods for nonconvex stochastic optimization. Working paper

  • Wang H, Pasupathy R, Schmeiser BW (2013) Integer-ordered simulation optimization using R-SPLINE: retrospective search with piecewise-linear interpolation and neighborhood enumeration. ACM Trans Model Comput Simul (TOMACS) 23(3):17

  • Whitney AW (1971) A direct method of nonparametric measurement selection. IEEE Trans Comput 100(9):1100–1103

  • Xu J, Nelson BL, Hong LJ (2013) An adaptive hyperbox algorithm for high-dimensional discrete optimization via simulation problems. INFORMS J Comput 25(1):133–146

  • Xue B, Zhang M, Browne WN (2012) Particle swarm optimization for feature selection in classification: a multi-objective approach. IEEE Trans Cybern 43(6):1656–1671

  • Xue B, Zhang M, Browne WN, Yao X (2016) A survey on evolutionary computation approaches to feature selection. IEEE Trans Evol Comput 20(4):606–626

  • Yang J, Ólafsson S (2006) Optimization-based feature selection with adaptive instance sampling. Comput Oper Res 33(11):3088–3106

  • Yusta SC (2009) Different metaheuristic strategies to solve the feature selection problem. Pattern Recogn Lett 30(5):525–534

  • Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, Reading, MA

  • Zeng X, Chen Y-W, Tao C, van Alphen D (2009) Feature selection using recursive feature elimination for handwritten digit recognition. In: 2009 fifth international conference on intelligent information hiding and multimedia signal processing, Kyoto, pp 1205–1208

  • Zhou Q, Zhou H, Zhou Q, Yang F, Luo L (2014) Structure damage detection based on random forest recursive feature elimination. Mech Syst Signal Process 46:82–90

Acknowledgements

The authors would like to thank Seth Guikema for the original discussions, Reha Uzsoy and Anton Panchishin for helpful comments, and the reviewer for excellent questions that helped us improve this work.

Author information

Corresponding author

Correspondence to Sara Shashaani.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shashaani, S., Vahdat, K. Improved feature selection with simulation optimization. Optim Eng 24, 1183–1223 (2023). https://doi.org/10.1007/s11081-022-09726-3
