Abstract
We propose a fast feature selection method for supervised learning with multi-valued attributes. The main idea is to rewrite the multi-valued problem in the space of examples as a boolean problem in the space of pairwise examples. On the basis of this approach, we can use the point correlation coefficient, which is null under conditional independence and satisfies a formula connecting partial coefficients with marginal coefficients. This property considerably reduces computation time, since a single pass over the database suffices to compute all coefficients. We test our algorithm on benchmark databases.
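The pairwise rewriting described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function name `pairwise_phi`, and the specific choice of computing the phi (point) correlation between the two boolean pair indicators "same attribute value" and "same class value", are assumptions. The sketch does show the single-pass property: the pair-agreement counts are derived from contingency counts gathered in one pass over the examples, without enumerating the pairs themselves.

```python
from collections import Counter
from math import sqrt

def pairwise_phi(xs, ys):
    """Point (phi) correlation between the boolean pair indicators
    'examples i, j share the same attribute value' and 'examples i, j
    share the same class value', computed from contingency counts
    gathered in a single pass over the data (no explicit pair loop)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # single pass over the database
    cx = Counter(xs)              # marginal counts per attribute value
    cy = Counter(ys)              # marginal counts per class value
    pairs = n * (n - 1) // 2
    # Number of pairs agreeing on X, on Y, and on both, via C(c, 2)
    same_x = sum(c * (c - 1) // 2 for c in cx.values())
    same_y = sum(c * (c - 1) // 2 for c in cy.values())
    same_xy = sum(c * (c - 1) // 2 for c in joint.values())
    # Phi coefficient of the implicit 2x2 table over all pairs
    num = pairs * same_xy - same_x * same_y
    den = sqrt(same_x * (pairs - same_x) * same_y * (pairs - same_y))
    return num / den if den else 0.0
```

For a feature perfectly associated with the class, e.g. `pairwise_phi(['a','a','b','b'], [0, 0, 1, 1])`, the coefficient is 1.0; degenerate inputs (a constant attribute or class) return 0.0 by convention.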
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Lallich, S., Rakotomalala, R. (2000). Fast Feature Selection using Partial Correlation for Multi-valued Attributes. In: Zighed, D.A., Komorowski, J., Żytkow, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2000. Lecture Notes in Computer Science, vol 1910. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45372-5_22
DOI: https://doi.org/10.1007/3-540-45372-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41066-9
Online ISBN: 978-3-540-45372-7