Data Selection and Machine Learning Algorithm Application Under the Background of Big Data

  • Conference paper
The 2021 International Conference on Machine Learning and Big Data Analytics for IoT Security and Privacy (SPIoT 2021)

Abstract

Machine learning, an important tool in data mining, serves both to explore the human cognitive learning process and to analyze and process data. Faced with very large data sets, current research follows two parallel directions: one improves and develops machine learning algorithms, and the other selects sample data and reduces data sets. Training sample selection is a research hotspot in machine learning. Effective selection extracts the more informative samples and eliminates redundant samples and noisy data, which improves the quality of the training set and yields better learning performance. This article studies data selection and the application of machine learning algorithms in the context of big data. Building on an analysis of machine learning implementation methods, the construction process of random forests, and the random group sampling integration algorithm, it applies random group sampling to select bases accurately. Compared with previous algorithms, the RPSE algorithm greatly speeds up computation over the classifiers and training samples, and ensures that the base classifiers operate on randomly sampled data during training. Support vector machine training data are then selected according to the integrated gap spacing, and the support vector machine is trained on the selected data set to obtain the final classification. The experimental results show that, compared with common traditional data selection algorithms, the RPSE algorithm greatly improves the accuracy and speed of data selection, while reducing the accuracy and precision of the support vector machine classification under the necessary conditions.
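
The selection pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the paper's RPSE implementation, and it rests on several assumptions. The training set is split into disjoint random groups and a simple nearest-centroid learner (a hypothetical stand-in for the base classifier) is trained on each group. The "integrated gap spacing" is interpreted here as the vote margin of the resulting ensemble. The samples with the smallest absolute margin are kept as candidate support vector machine training data. All function names and parameters are illustrative.

```python
import random
from collections import Counter


def random_groups(n_samples, n_groups, rng):
    """Shuffle the sample indices and deal them into n_groups disjoint groups."""
    if not 1 <= n_groups <= n_samples:
        raise ValueError("n_groups must be between 1 and n_samples")
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[g::n_groups] for g in range(n_groups)]


def fit_centroids(X, y, indices):
    """Hypothetical base learner: one mean vector (centroid) per class."""
    sums, counts = {}, Counter()
    for i in indices:
        acc = sums.setdefault(y[i], [0.0] * len(X[i]))
        for d, value in enumerate(X[i]):
            acc[d] += value
        counts[y[i]] += 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def ensemble_margins(X, y, n_groups=5, seed=0):
    """Vote margin per sample: (votes for true label - best other) / n_models."""
    rng = random.Random(seed)
    groups = random_groups(len(X), n_groups, rng)
    models = [fit_centroids(X, y, g) for g in groups]
    margins = []
    for x, label in zip(X, y):
        votes = Counter(predict(m, x) for m in models)
        best_other = max((v for c, v in votes.items() if c != label), default=0)
        margins.append((votes.get(label, 0) - best_other) / len(models))
    return margins


def select_for_svm(X, y, keep_fraction=0.3, n_groups=5, seed=0):
    """Return sorted indices of the samples with the smallest |margin|.

    Assumption: low-|margin| samples lie near the decision boundary and are
    the informative ones; confidently correct samples are treated as
    redundant and confidently wrong ones as likely noise.
    """
    margins = ensemble_margins(X, y, n_groups=n_groups, seed=seed)
    order = sorted(range(len(X)), key=lambda i: abs(margins[i]))
    k = max(1, int(keep_fraction * len(X)))
    return sorted(order[:k])
```

The returned indices would then be used to subset the data before training a support vector machine, for example with scikit-learn's `SVC`. The selection rule and the `keep_fraction` threshold are design choices of this sketch, not values taken from the paper.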

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Qiu, J. (2022). Data Selection and Machine Learning Algorithm Application Under the Background of Big Data. In: Macintyre, J., Zhao, J., Ma, X. (eds) The 2021 International Conference on Machine Learning and Big Data Analytics for IoT Security and Privacy. SPIoT 2021. Lecture Notes on Data Engineering and Communications Technologies, vol 97. Springer, Cham. https://doi.org/10.1007/978-3-030-89508-2_13
