A SVM Regression Based Approach to Filling in Missing Values

Honghai, Feng; Guoshun, Chen; Cheng, Yin; Bingru, Yang; Yumei, Chen

doi:10.1007/11553939_83

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3683)

Included in the following conference series:

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems

Abstract

In the KDD process, filling in missing data typically requires a very large investment of time and energy: often 80% to 90% of a data analysis project is spent making the data reliable enough for the results to be trusted. In this paper, we propose an SVM-regression-based algorithm for filling in missing data: the decision attribute (output attribute) is treated as a condition attribute (input attribute), the condition attribute with missing values is treated as the decision attribute, and SVM regression is then used to predict the missing condition attribute values. Experimental results on a SARS data set show that the SVM regression method has the highest precision. The nearest-example method, which fills in a missing value with the value taken from the example at minimum distance to the example with the missing value, takes second place, and the mean and median methods have lower precision.
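
The role swap described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes NumPy and scikit-learn's `SVR` as the regressor, assumes that only one condition attribute has gaps while the other attributes are complete, and uses hypothetical function names (`svr_impute`, `mean_impute`, `nearest_impute`). The mean and nearest-example baselines are included only for comparison.

```python
import numpy as np
from sklearn.svm import SVR


def svr_impute(X, y, col, **svr_params):
    """Fill NaNs in condition attribute `col` using SVM regression.

    Roles are swapped: the decision attribute y joins the inputs and
    column `col` becomes the regression target.
    Assumption: every other column of X, and y, has no missing values.
    """
    X = np.asarray(X, dtype=float).copy()
    y = np.asarray(y, dtype=float)
    missing = np.isnan(X[:, col])
    if not missing.any():
        return X
    # Inputs: the remaining condition attributes plus the decision attribute.
    inputs = np.column_stack([np.delete(X, col, axis=1), y])
    if np.isnan(inputs).any():
        raise ValueError("this sketch expects the other attributes to be complete")
    model = SVR(**svr_params)
    # Train on the examples where the target attribute is observed.
    model.fit(inputs[~missing], X[~missing, col])
    # Predict the attribute for the examples where it is missing.
    X[missing, col] = model.predict(inputs[missing])
    return X


def mean_impute(X, col):
    """Baseline: replace NaNs in `col` with the mean of the observed values."""
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X[:, col])
    X[missing, col] = X[~missing, col].mean()
    return X


def nearest_impute(X, col):
    """Baseline: copy the value from the complete example at minimum
    Euclidean distance, measured on the other attributes."""
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X[:, col])
    others = np.delete(X, col, axis=1)
    donors = others[~missing]
    donor_values = X[~missing, col]
    for i in np.flatnonzero(missing):
        dist = np.linalg.norm(donors - others[i], axis=1)
        X[i, col] = donor_values[np.argmin(dist)]
    return X
```

A call such as `svr_impute(X, y, col=1, kernel="rbf", C=10.0, epsilon=0.1)` returns a filled copy of `X`; the kernel and hyperparameter values here are placeholders, not settings reported in the paper.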

Author information

Authors and Affiliations

Urban & Rural Construction School, Hebei Agricultural University, 071001, Baoding, China
Feng Honghai
Information Engineering School, University of Science and Technology Beijing, 100083, Beijing, China
Feng Honghai & Yang Bingru
Ordnance Technology Institute, Shijiazhuang, 050000, Shijiazhuang, China
Chen Guoshun
Modern Educational Center, Hebei Agricultural University, 071001, Baoding, China
Yin Cheng
Tian’e Chemical Fiber Company of Hebei Baoding, 071000, Baoding, China
Chen Yumei

Editor information

Editors and Affiliations

School of Business, La Trobe University, 3086, Melbourne, Victoria, Australia
Rajiv Khosla
Centre for SMART systems Engineering Research Centre, University of Brighton, BN2 4GJ, Moulsecoomb, Brighton, UK
Robert J. Howlett
School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, 5095, Mawson Lakes, SA, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Honghai, F., Guoshun, C., Cheng, Y., Bingru, Y., Yumei, C. (2005). A SVM Regression Based Approach to Filling in Missing Values. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science, vol 3683. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11553939_83

DOI: https://doi.org/10.1007/11553939_83
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28896-1
Online ISBN: 978-3-540-31990-0
eBook Packages: Computer Science, Computer Science (R0)
