
Random Forest Based On Attributes Combination

Posted on: 2012-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: L L Sun
Full Text: PDF
GTID: 2178330338495364
Subject: Computer application technology

Abstract/Summary:
In classification problems in machine learning, when many attributes are used to determine a sample's class, the classifier built by a traditional algorithm consumes considerable computation time and space. In recent years, researchers have studied attribute combinations in data sets, and a variety of new ideas for building classifiers from combined attributes have been proposed; in most of this work, however, only a single classifier is built from one attribute combination per data set. Because the selection criteria for combined attributes differ, each algorithm produces a different classifier, and each construction method has its own advantages and disadvantages. Meanwhile, ensemble classification has also seen new research. Building on the work of these scholars, this thesis summarizes their ideas, contributes a modest innovation, and carries out the following work.

This thesis first presents the purpose and significance of constructing classifiers from attribute combinations. For data sets with many condition attributes, we begin by clustering the samples so that similar sample points fall into the same group; we can then select an attribute combination for each cluster of similar examples and build classifiers targeted at samples with different characteristics, and we explain the advantages of this approach. Second, drawing on recent work, we describe the theory and methods of multivariate decision trees: within each cluster, which is smaller and more homogeneous, we randomly select attributes to generate multivariate decision trees. Finally, the multivariate decision trees built from every cluster are combined into a random forest by weighted integration, so that all concepts are covered and classification accuracy is further ensured.
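The prediction side of the pipeline described above can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the function and variable names are hypothetical, the clusters are assumed to come from a clustering step (here represented only by their centroids), and the per-cluster multivariate decision trees are assumed to be already trained and passed in as callables with per-tree weights (e.g. validation accuracy).

```python
# Hypothetical sketch: route a sample to its cluster, then take a
# weighted vote over that cluster's multivariate decision trees.
import math
from collections import Counter


def assign_cluster(x, centroids):
    """Nearest-centroid assignment by Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(x, centroids[k]))


def weighted_forest_predict(x, centroids, cluster_trees, tree_weights):
    """Weighted integration of the trees built for the sample's cluster.

    cluster_trees[k] is a list of predict-callables for cluster k;
    tree_weights[k] holds one weight per tree in the same order.
    """
    k = assign_cluster(x, centroids)
    votes = Counter()
    for tree, w in zip(cluster_trees[k], tree_weights[k]):
        votes[tree(x)] += w          # each tree's vote counts by its weight
    return votes.most_common(1)[0][0]
```

For example, a sample near centroid 0 would be classified only by cluster 0's trees, with a tree weighted 0.9 able to outvote a single tree weighted 0.5 but not two of them together.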
Through the above work, we can handle data sets with large numbers of samples and condition attributes, building the multivariate decision trees from fewer attributes and then integrating them. Experiments show that this method effectively reduces the size of the decision trees and the time cost, while maintaining a satisfactory level of classification accuracy.
Keywords/Search Tags:Combination attribute, Decision tree, Multi-variable decision tree, Random forest