ABSTRACT
Representation (feature) space is an environment in which data points are vectorized, distances are computed, patterns are characterized, and geometric structures are embedded. Extracting a good representation space is critical for tackling the curse of dimensionality, improving model generalization, overcoming data sparsity, and broadening the applicability of classic models. Existing work, such as feature engineering and representation learning, falls short in achieving full automation (e.g., it relies heavily on intensive labor and empirical experience), explainable explicitness (e.g., a traceable reconstruction process and explainable new features), and flexible optimality (e.g., optimal feature space reconstruction is not embedded into downstream tasks). Can we simultaneously address the automation, explicitness, and optimality challenges in representation space reconstruction for a machine learning task? To answer this question, we propose a group-wise reinforcement generation perspective. We reformulate representation space reconstruction as an interactive process of nested feature generation and selection, where feature generation creates new, meaningful, and explicit features, and feature selection eliminates redundant features to control feature set size. We develop a cascading reinforcement learning method that leverages three cascading Markov Decision Processes to learn optimal generation policies that automate the selection of features and operations and the feature crossing. We design a group-wise generation strategy that crosses a feature group, an operation, and another feature group to generate new features, and we find that this strategy enhances exploration efficiency and augments the reward signals of the cascading agents. Finally, we present extensive experiments demonstrating the effectiveness, efficiency, traceability, and explicitness of our system.
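To make the group-wise generation step concrete, the following is a minimal sketch of the core mechanics the abstract describes: crossing one feature group with another under a chosen operation to produce explicit new features, then pruning redundant candidates to control the feature set size. All function names, the correlation-based pruning rule, and the 0.95 threshold are illustrative assumptions, not the paper's actual method (which learns these choices with cascading reinforcement learning agents).

```python
import numpy as np

def cross_groups(X, group_a, group_b, op):
    """Cross feature group A with feature group B under a binary
    operation, yielding |A| * |B| explicit candidate features.
    (Hypothetical helper; group/operation choice is learned in the paper.)"""
    new_cols = [op(X[:, i], X[:, j]) for i in group_a for j in group_b]
    return np.column_stack(new_cols)

def prune_redundant(X, candidates, threshold=0.95):
    """Drop candidates highly correlated with an existing feature.
    A simple stand-in for the paper's feature-selection step."""
    kept = []
    for k in range(candidates.shape[1]):
        c = candidates[:, k]
        corrs = [abs(np.corrcoef(c, X[:, j])[0, 1]) for j in range(X.shape[1])]
        if max(corrs) < threshold:
            kept.append(k)
    return candidates[:, kept]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                                  # toy feature matrix
new_feats = cross_groups(X, group_a=[0, 1], group_b=[2, 3], op=np.multiply)
selected = prune_redundant(X, new_feats)
X_aug = np.hstack([X, selected])                               # reconstructed space
```

Because each new feature is a named operation over named feature groups, the reconstruction remains traceable: every generated column can be written out as an explicit formula over the originals, which is the explicitness property the abstract emphasizes.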
Index Terms
- Group-wise Reinforcement Feature Generation for Optimal and Explainable Representation Space Reconstruction