
Random Forest Algorithm Research And Application Based On BLB Method

Posted on: 2018-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Zhang
GTID: 2348330518997609
Subject: Basic mathematics

Abstract/Summary:
Random forest is an ensemble classification algorithm built on decision trees that combines the random subspace method with bagging. It is widely used to solve classification problems in many fields. Although random forest has been studied extensively and with notable results, it still has limitations, and both its theory and its applications need further research. This thesis studies a new random forest algorithm based on the Bag of Little Bootstraps (BLB) and applies it to text classification. The thesis has five chapters.

Chapter one introduces the research background and significance, reviews the state of research in China and abroad, and sets out the main research tasks of the thesis. Chapter two provides the preliminaries: a brief introduction to the basic concepts and classification methods of decision trees and random forests, and to methods of text classification.

Chapter three analyzes the limitations of the current random forest classification algorithm and proposes the BLB-based random forest algorithm. BLB is applied to forest generation for the first time, which addresses the original algorithm's inadequate computational efficiency and makes it better suited to classifying large data sets. The weighting scheme for the decision trees is also improved, in order to prevent the approximate-draw phenomenon in random forest voting, in which the votes for competing classes are nearly tied. Finally, the proposed algorithm is applied to text classification, giving a BLB-based random forest text-classification model and the corresponding algorithm.

Chapter four reports numerical experiments on text classification. Compared with the original algorithm, the proposed algorithm improves both computational efficiency and classification accuracy to some degree, and it resolves the approximate-draw phenomenon that appears in the original algorithm. In a comparison with two widely used text classification algorithms, the Rocchio algorithm and neural-network classification, the improved random forest achieves higher classification performance, and this advantage holds for high-dimensional text.

Chapter five, the conclusion and outlook, summarizes the work done in the thesis and sets out the research problems that remain to be resolved.
Keywords: Decision Tree, Random Forest, Bag of Little Bootstraps, Text Classification, Comprehensive Vote
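
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It shows one standard Bag of Little Bootstraps draw (a subset of about n^gamma distinct rows, reweighted so the multiplicities sum to n) and a weighted vote over per-tree predictions. The function names `blb_counts` and `weighted_vote`, the parameter `gamma`, and the use of a per-tree score such as held-out accuracy as the weight are all hypothetical; the thesis's actual weighting scheme may differ.

```python
import random
from collections import Counter


def blb_counts(n, gamma=0.7, rng=None):
    """Draw one little-bootstrap sample from a data set of n rows.

    Returns a mapping {row_index: multiplicity}. At most int(n ** gamma)
    distinct rows are touched, but the multiplicities sum to n, so a tree
    learner that accepts sample weights sees a full-size bootstrap sample.
    """
    rng = rng or random.Random()
    b = max(1, int(n ** gamma))            # subset size b = n^gamma
    subset = rng.sample(range(n), b)       # b distinct rows, without replacement
    # n draws with replacement from the subset, i.e. multinomial counts
    return Counter(rng.choices(subset, k=n))


def weighted_vote(predictions, tree_weights):
    """Combine per-tree class predictions using one weight per tree.

    The weights are an assumption here (for example, each tree's accuracy
    on held-out data); with unequal weights, an even split of raw votes
    no longer produces a tie.
    """
    scores = {}
    for label, weight in zip(predictions, tree_weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)


# One little-bootstrap draw for a hypothetical data set of 10,000 documents.
counts = blb_counts(10_000, gamma=0.7, rng=random.Random(0))
print(len(counts), sum(counts.values()))

# Four trees split 2-2 on raw votes; the weights decide the outcome.
print(weighted_vote(["sports", "politics", "sports", "politics"],
                    [0.9, 0.6, 0.8, 0.65]))  # sports
```

In a forest built this way, each tree would be trained on one such weighted draw, so the per-tree cost depends on the subset size rather than on the full data set, which is the usual motivation for BLB on large data.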