Data Selection and Machine Learning Algorithm Application Under the Background of Big Data

Qiu, Jingyi

doi:10.1007/978-3-030-89508-2_13

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 97)

Included in the following conference series:

International Conference on Machine Learning and Big Data Analytics for IoT Security and Privacy

Abstract

At present, machine learning, as an important tool in data mining, is not only an exploration of the human cognitive learning process but also a means of analyzing and processing data. Faced with the challenge of large amounts of data, some current research focuses on improving and developing machine learning algorithms, while other researchers are devoted to selecting sample data and reducing data sets. These two lines of research proceed in parallel. Training sample selection is a research hotspot in machine learning: by selecting sample data effectively, more informative samples are extracted and redundant samples and noisy data are eliminated, which improves the quality of the training samples and yields better learning performance. This article studies data selection and the application of machine learning algorithms in the context of big data. Building on an analysis of machine learning implementation methods, the construction process of random forests, and random group sampling ensemble algorithms, random group sampling is applied to select base classifiers accurately. Compared with previous algorithms, the RPSE algorithm greatly speeds up computation over the classifiers and training samples, and ensures that each base classifier works on a random selection of samples during training. Based on the ensemble margin, training data for a support vector machine (SVM) can be selected; the filtered data set is then used to train the SVM classifier, which produces the final classification. The experimental results show that, compared with the more common traditional data selection algorithms, the RPSE algorithm greatly improves the accuracy and speed of data selection, and reduces the accuracy and precision of SVM classification only to the extent necessary.
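The abstract describes a two-stage pipeline: an ensemble of base classifiers, each trained on randomly sampled data, scores every training sample by its ensemble margin, and only the selected samples are passed on to SVM training. The details of RPSE (its sampling scheme, base learner and selection threshold) are not given here, so the following is only a minimal stdlib Python sketch under stated assumptions: binary labels in {-1, +1}, decision stumps as base classifiers, each stump fit on a random subset of rows and one random feature (a stand-in for "random group sampling"), and a hypothetical cut-off `tau`. The names `train_stump`, `train_ensemble`, `ensemble_margin` and `select_by_margin` are illustrative, not from the chapter. Samples with |margin| ≤ `tau` are kept, which drops both confidently correct samples (likely redundant) and confidently misclassified ones (likely noise).

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]
Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def train_stump(xs: List[Point], ys: List[int], feature: int) -> Stump:
    """Fit a one-feature threshold classifier on labels in {-1, +1}.

    Predicts `polarity` if x[feature] > threshold, else -polarity.
    """
    values = sorted(set(x[feature] for x in xs))
    # Candidate thresholds: one below all values, plus midpoints of neighbours.
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best: Stump = (feature, thresholds[0], 1)
    best_err = len(xs) + 1
    for t in thresholds:
        for polarity in (1, -1):
            err = sum(1 for x, y in zip(xs, ys)
                      if (polarity if x[feature] > t else -polarity) != y)
            if err < best_err:
                best, best_err = (feature, t, polarity), err
    return best


def predict_stump(stump: Stump, x: Point) -> int:
    feature, t, polarity = stump
    return polarity if x[feature] > t else -polarity


def train_ensemble(xs: List[Point], ys: List[int], n_models: int = 25,
                   sample_frac: float = 0.6, seed: int = 0) -> List[Stump]:
    """Assumed scheme: each stump sees a random row subset and one random feature."""
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    k = max(2, int(sample_frac * n))
    models = []
    for _ in range(n_models):
        idx = rng.sample(range(n), k)  # random rows, drawn without replacement
        feature = rng.randrange(d)
        models.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx], feature))
    return models


def ensemble_margin(models: List[Stump], x: Point, y: int) -> float:
    """(votes for true label - votes for the other label) / number of models, in [-1, 1]."""
    votes_for_y = sum(1 for m in models if predict_stump(m, x) == y)
    return (2 * votes_for_y - len(models)) / len(models)


def select_by_margin(xs: List[Point], ys: List[int], models: List[Stump],
                     tau: float = 0.5) -> List[int]:
    """Keep indices with |margin| <= tau (hypothetical threshold): samples near the boundary."""
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if abs(ensemble_margin(models, x, y)) <= tau]


if __name__ == "__main__":
    # Toy 1-D data (hypothetical): class -1 for x < 5, class +1 for x >= 5.
    xs = [[float(i)] for i in range(10)]
    ys = [-1] * 5 + [1] * 5
    models = train_ensemble(xs, ys, n_models=15, seed=1)
    print(select_by_margin(xs, ys, models, tau=0.5))  # indices near the class boundary
```

In this sketch the selected indices would then be used to build the reduced training set for any SVM implementation, for example scikit-learn's `sklearn.svm.SVC` via its `fit(X, y)` method. Samples near the class boundary are kept because they are the most likely to become support vectors.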

Author information

Authors and Affiliations

Tongda College of Nanjing University of Posts and Telecommunications, Yangzhou, 225002, Jiangsu, China
Jingyi Qiu

Editor information

Editors and Affiliations

University of Sunderland, Sunderland, UK
John Macintyre
University of Shanghai for Science and Technology, Shanghai, China
Jinghua Zhao
Shenzhen University, Shenzhen, China
Xiaomeng Ma

About this paper

Cite this paper

Qiu, J. (2022). Data Selection and Machine Learning Algorithm Application Under the Background of Big Data. In: Macintyre, J., Zhao, J., Ma, X. (eds) The 2021 International Conference on Machine Learning and Big Data Analytics for IoT Security and Privacy. SPIoT 2021. Lecture Notes on Data Engineering and Communications Technologies, vol 97. Springer, Cham. https://doi.org/10.1007/978-3-030-89508-2_13

DOI: https://doi.org/10.1007/978-3-030-89508-2_13
Published: 28 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89507-5
Online ISBN: 978-3-030-89508-2
eBook Packages: Intelligent Technologies and Robotics (R0)