
The Research On Random Forest Based On IV Feature Selection

Posted on: 2011-09-19 | Degree: Master | Type: Thesis
Country: China | Candidate: Q J Yang
GTID: 2178360308473006 | Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology, fields such as banking, financial services, e-commerce, bioinformatics and network security are producing data at an explosive rate. The real-world data that mining tasks face are often high-dimensional, contain redundant features and are noisy, which can lower precision and increase running time, especially in classification modeling, where high-quality data are preferred. It is therefore helpful to select the features with predictive power in order to improve performance. In this thesis, research is carried out on feature selection and classification as follows:

(1) Among the challenges that data mining faces, one possible response is to reduce the size of the data effectively, for example through feature selection. We summarize the most classical feature selection methods and, based on this analysis, point out their characteristics and weaknesses.

(2) In view of the defects of the traditional models mentioned above, we analyze the feasibility and the difficulty of using Weight of Evidence (WoE) and Information Value (IV) as feature selection methods. Based on this analysis, a feature selection model, FS-IV, is proposed that builds on the IV index. Experiments show that the model runs in a shortened time and has some noise immunity.

(3) To address the problems that feature selection brings, such as the notable reduction of the data and the concentration of superior features, a suitable classification model, IV-RF, is proposed. Experiments show that the model has a satisfactory time cost with little loss of accuracy compared with C4.5, naïve Bayes, and the same classifiers applied after FS-IV reduction.

(4) We apply improved FS-IV and IV-RF models to high-dimensional data, streaming data and other practical problems, with favorable results.
Keywords/Search Tags: Feature Selection, Information Value, Random Forest, Data Mining
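
The abstract does not give the details of FS-IV, so the following is only a minimal sketch of the standard WoE and IV definitions it builds on, for a binary target and an already discretized feature: WoE for a bin is the log ratio of the bin's share of positives to its share of negatives, and IV sums (share of positives minus share of negatives) times WoE over the bins. The function names, the smoothing constant `eps` and the `threshold` default are illustrative assumptions, not values taken from the thesis.

```python
import math
from collections import Counter


def information_value(feature, target, eps=0.5):
    """IV of one discrete (or pre-binned) feature against a 0/1 target.

    eps is an additive smoothing constant (an assumption of this sketch)
    that keeps empty cells from producing log(0) or division by zero.
    """
    pos = Counter(f for f, t in zip(feature, target) if t == 1)
    neg = Counter(f for f, t in zip(feature, target) if t == 0)
    bins = set(pos) | set(neg)
    pos_total = sum(pos.values()) + eps * len(bins)
    neg_total = sum(neg.values()) + eps * len(bins)

    iv = 0.0
    for b in bins:
        p = (pos[b] + eps) / pos_total   # smoothed share of positives in bin b
        q = (neg[b] + eps) / neg_total   # smoothed share of negatives in bin b
        woe = math.log(p / q)            # Weight of Evidence of bin b
        iv += (p - q) * woe              # each term is non-negative
    return iv


def select_by_iv(columns, target, threshold=0.02):
    """Keep features whose IV reaches the threshold, highest IV first.

    columns maps a feature name to its list of values. The threshold
    default is a hypothetical cut-off chosen for illustration only.
    """
    scores = {name: information_value(vals, target)
              for name, vals in columns.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]


if __name__ == "__main__":
    y = [1, 1, 0, 0, 1, 1, 0, 0]
    data = {
        "informative": ["a", "a", "b", "b", "a", "a", "b", "b"],
        "noise": ["x", "y", "x", "y", "x", "y", "x", "y"],
    }
    print(select_by_iv(data, y))
```

Each feature is scored independently in a single pass over the data, which is one plausible reason an IV-based filter can be fast; continuous features would need to be binned first, a step this sketch leaves out.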