A SVM Regression Based Approach to Filling in Missing Values

  • Conference paper
Knowledge-Based Intelligent Information and Engineering Systems (KES 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3683)

Abstract

In the KDD process, filling in missing data typically requires a very large investment of time and energy: often 80% to 90% of a data analysis project is spent making the data reliable enough that the results can be trusted. In this paper, we propose an SVM regression based algorithm for filling in missing data. The roles of the attributes are swapped: the decision attribute (output attribute) is used as a condition attribute (input attribute), the condition attribute with missing values is used as the decision attribute, and SVM regression then predicts the missing condition attribute values. Experimental results on a SARS data set show that the SVM regression method has the highest precision. The method that fills in a missing value with the value from the example at minimum distance from the example with the missing value takes second place, and the mean and median methods have lower precision.
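The role swap described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's `SVR` as the regressor, a single condition attribute with missing values marked as NaN, and synthetic toy data in place of the SARS data set. The function name `impute_with_svr`, the kernel, and the `C` and `epsilon` values are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.svm import SVR


def impute_with_svr(X, y, col, **svr_params):
    """Fill NaNs in condition attribute `col` of X using SVM regression.

    Roles are swapped: the decision attribute y joins the remaining
    condition attributes as an input, and column `col` becomes the target.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(X[:, col])
    if not missing.any():
        return X.copy()

    # Inputs: every other condition attribute plus the decision attribute.
    inputs = np.column_stack([np.delete(X, col, axis=1), y])
    if np.isnan(inputs).any():
        # Assumption of this sketch, not a limitation stated in the paper.
        raise ValueError("only column `col` may contain missing values")

    # Train on the complete examples, then predict the missing entries.
    model = SVR(**svr_params)
    model.fit(inputs[~missing], X[~missing, col])

    filled = X.copy()
    filled[missing, col] = model.predict(inputs[missing])
    return filled


# Toy data (hypothetical): the decision attribute depends linearly
# on two condition attributes.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(60, 2))
y = X[:, 0] + 2.0 * X[:, 1]

# Hide the first five values of the second condition attribute.
X_missing = X.copy()
X_missing[:5, 1] = np.nan

X_filled = impute_with_svr(
    X_missing, y, col=1, kernel="linear", C=10.0, epsilon=0.01
)
print(np.abs(X_filled[:5, 1] - X[:5, 1]))  # absolute imputation errors
```

Only the complete examples are used for training, so the approach needs enough fully observed rows to fit the regressor. The mean and median baselines mentioned in the abstract would instead replace each NaN with a single column statistic computed from the observed values.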

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Honghai, F., Guoshun, C., Cheng, Y., Bingru, Y., Yumei, C. (2005). A SVM Regression Based Approach to Filling in Missing Values. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science, vol 3683. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11553939_83

  • DOI: https://doi.org/10.1007/11553939_83

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28896-1

  • Online ISBN: 978-3-540-31990-0

  • eBook Packages: Computer Science (R0)
