ABSTRACT
Representation (feature) space is an environment in which data points are vectorized, distances are computed, patterns are characterized, and geometric structures are embedded. Extracting a good representation space is critical for tackling the curse of dimensionality, improving model generalization, overcoming data sparsity, and broadening the applicability of classic models. Existing work, such as feature engineering and representation learning, falls short in achieving full automation (e.g., it relies heavily on intensive labor and empirical experience), explainable explicitness (e.g., a traceable reconstruction process and explainable new features), and flexible optimality (e.g., optimal feature space reconstruction is not embedded into downstream tasks). Can we simultaneously address the automation, explicitness, and optimality challenges in representation space reconstruction for a machine learning task? To answer this question, we propose a group-wise reinforcement generation perspective. We reformulate representation space reconstruction as an interactive process of nested feature generation and selection, where feature generation creates new, meaningful, and explicit features, and feature selection eliminates redundant features to control feature set size. We develop a cascading reinforcement learning method that leverages three cascading Markov Decision Processes to learn optimal generation policies that automate the selection of features and operations and the feature crossing. We design a group-wise generation strategy that crosses a feature group, an operation, and another feature group to generate new features, and we find that this strategy enhances exploration efficiency and augments the reward signals of the cascading agents. Finally, we present extensive experiments demonstrating the effectiveness, efficiency, traceability, and explicitness of our system.
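To make the group-wise generation step concrete, the following is a minimal sketch of the core mechanics the abstract describes: crossing one feature group with another under a chosen operation to produce explicit new features, then pruning redundant candidates to control the feature set size. All function names, the correlation-based pruning rule, and the 0.95 threshold are illustrative assumptions, not the paper's actual method (which learns these choices with cascading reinforcement learning agents).

```python
import numpy as np

def cross_groups(X, group_a, group_b, op):
    """Cross feature group A with feature group B under a binary
    operation, yielding |A| * |B| explicit candidate features.
    (Hypothetical helper; group/operation choice is learned in the paper.)"""
    new_cols = [op(X[:, i], X[:, j]) for i in group_a for j in group_b]
    return np.column_stack(new_cols)

def prune_redundant(X, candidates, threshold=0.95):
    """Drop candidates highly correlated with an existing feature.
    A simple stand-in for the paper's feature-selection step."""
    kept = []
    for k in range(candidates.shape[1]):
        c = candidates[:, k]
        corrs = [abs(np.corrcoef(c, X[:, j])[0, 1]) for j in range(X.shape[1])]
        if max(corrs) < threshold:
            kept.append(k)
    return candidates[:, kept]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                                  # toy feature matrix
new_feats = cross_groups(X, group_a=[0, 1], group_b=[2, 3], op=np.multiply)
selected = prune_redundant(X, new_feats)
X_aug = np.hstack([X, selected])                               # reconstructed space
```

Because each new feature is a named operation over named feature groups, the reconstruction remains traceable: every generated column can be written out as an explicit formula over the originals, which is the explicitness property the abstract emphasizes.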
Index Terms
- Group-wise Reinforcement Feature Generation for Optimal and Explainable Representation Space Reconstruction