Feature Selection and Optimization of Random Forest Modeling


Abstract:

The traditional random forest algorithm struggles to classify small-sample data sets well. Because each repeated random (bootstrap) draw contains only a few samples, the resulting trees differ very little from one another; the correct votes are drowned out, the model's generalization error grows, and prediction accuracy drops. For a small sepsis case data set, this paper partitions the features used in random forest modeling into intervals by their correlation with the class label — a high-correlation interval and an uncertain-correlation interval — and selects features from both intervals when building the model. This reduces the model's generalization error and improves prediction accuracy.
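The paper does not include code, but the interval-division idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes absolute Pearson correlation as the relevance measure, and the thresholds `high_thresh` and `low_thresh` are illustrative values, not taken from the paper. Every feature in the high-correlation interval is kept, and a random subset is drawn from the uncertain interval so that individual trees still differ even on a small sample.

```python
import numpy as np

def partition_features(X, y, high_thresh=0.5, low_thresh=0.2):
    """Split feature indices into a high-correlation interval and an
    uncertain-correlation interval, by |Pearson correlation| with y."""
    corrs = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                      for j in range(X.shape[1])])
    high = np.where(corrs >= high_thresh)[0]                      # high interval
    uncertain = np.where((corrs >= low_thresh) &
                         (corrs < high_thresh))[0]                # uncertain interval
    return high, uncertain

def select_features(high, uncertain, n_uncertain, rng):
    """Keep all high-correlation features, plus a random draw of
    n_uncertain features from the uncertain interval."""
    k = min(n_uncertain, len(uncertain))
    if k == 0:
        return high.copy()
    picked = rng.choice(uncertain, size=k, replace=False)
    return np.concatenate([high, picked])

# Usage on synthetic data: feature 0 tracks the label closely,
# the others are weak or pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200).astype(float)
X = np.column_stack([y + 0.1 * rng.normal(size=200),   # strongly correlated
                     rng.normal(size=200),             # noise
                     0.3 * y + rng.normal(size=200)])  # weakly correlated
high, uncertain = partition_features(X, y)
selected = select_features(high, uncertain, n_uncertain=1, rng=rng)
```

Each tree in the forest would then be trained on such a `selected` subset (e.g. via a standard random forest library), so that the shared high-correlation features anchor accuracy while the varying uncertain-interval draws keep the trees diverse.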


Pages: 1416-1419

Online since: November 2014
