Abstract
Machine learning, an important tool in data mining, serves both to explore the human cognitive learning process and to analyze and process data. Faced with large volumes of data, some current research focuses on improving and developing machine learning algorithms, while other research is devoted to selecting sample data and reducing data sets. These two lines of work proceed in parallel. Training sample selection is a research hotspot in machine learning: by effectively selecting sample data, more informative samples are extracted and redundant samples and noisy data are eliminated, which improves the quality of the training samples and yields better learning performance. This article studies data selection and the application of machine learning algorithms in the context of big data. Based on an analysis of machine learning implementation methods, the construction process of random forests, and random group sampling integration algorithms, the random group sampling method is applied to select the base classifiers. Compared with previous algorithms, the RPSE algorithm greatly speeds up computation over the classifiers and training samples, and ensures that each base classifier is trained on randomly drawn samples. The training data for a support vector machine are then selected according to the integrated gap spacing, and the support vector machine is trained on the selected data set to obtain the final classifier. The experimental results show that, compared with common traditional data selection algorithms, the RPSE algorithm greatly improves the accuracy and speed of data selection while maintaining the accuracy and precision of support vector machine classification under the necessary conditions.
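The selection pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the author's RPSE implementation, and it rests on several assumptions that the abstract does not confirm: the "integrated gap spacing" is read here as a per-sample vote margin across the base classifiers, the base classifier is a hypothetical nearest-centroid model, and samples with the smallest absolute margin are treated as the most informative. All function names and parameters (`random_groups`, `select_near_boundary`, `keep_fraction`, and so on) are illustrative only.

```python
import random
from collections import Counter


def random_groups(n_samples, n_groups, rng):
    """Shuffle the sample indices and deal them into disjoint random groups."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[g::n_groups] for g in range(n_groups)]


def fit_centroids(X, y, indices):
    """Hypothetical base classifier: one mean vector per class in the group."""
    sums, counts = {}, Counter()
    for i in indices:
        label = y[i]
        if label not in sums:
            sums[label] = [0.0] * len(X[i])
        sums[label] = [s + v for s, v in zip(sums[label], X[i])]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], x)),
    )


def vote_margins(X, y, models):
    """Assumed margin: (votes for the true label - top votes for another label) / #models."""
    margins = []
    for x, label in zip(X, y):
        votes = Counter(predict(m, x) for m in models)
        correct = votes.get(label, 0)
        wrong = max((c for lab, c in votes.items() if lab != label), default=0)
        margins.append((correct - wrong) / len(models))
    return margins


def select_near_boundary(X, y, n_groups=5, keep_fraction=0.3, seed=0):
    """Return sorted indices of the samples with the smallest absolute vote margin."""
    rng = random.Random(seed)
    groups = random_groups(len(X), n_groups, rng)
    models = [fit_centroids(X, y, g) for g in groups]
    margins = vote_margins(X, y, models)
    n_keep = max(1, int(round(keep_fraction * len(X))))
    order = sorted(range(len(X)), key=lambda i: abs(margins[i]))
    return sorted(order[:n_keep])


if __name__ == "__main__":
    # Synthetic two-class data, used only to exercise the sketch.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
    X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
    y = [0] * 50 + [1] * 50
    selected = select_near_boundary(X, y)
    print(len(selected), "of", len(X), "samples kept for SVM training")
```

The returned indices identify the reduced training set; in the workflow the abstract describes, that subset would then be passed to a support vector machine trainer, which is omitted here to keep the sketch free of third-party dependencies.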
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Qiu, J. (2022). Data Selection and Machine Learning Algorithm Application Under the Background of Big Data. In: Macintyre, J., Zhao, J., Ma, X. (eds) The 2021 International Conference on Machine Learning and Big Data Analytics for IoT Security and Privacy. SPIoT 2021. Lecture Notes on Data Engineering and Communications Technologies, vol 97. Springer, Cham. https://doi.org/10.1007/978-3-030-89508-2_13
DOI: https://doi.org/10.1007/978-3-030-89508-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89507-5
Online ISBN: 978-3-030-89508-2
eBook Packages: Intelligent Technologies and Robotics (R0)