
Random Forest And Its Application In Chromatographic Fingerprints

Posted on: 2010-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: L Sun
Full Text: PDF
GTID: 2178360302960581
Subject: Computer software and theory
Abstract/Summary:
Random forest is an ensemble classification method developed by Leo Breiman in 2001. It is a classifier consisting of a collection of CART trees, and it adds random feature selection on top of Bagging. It offers good classification performance and high stability. Since it was proposed, random forest has become a well-known data analysis method and has been applied in a wide variety of scientific areas. This paper focuses on random forest classification of chromatographic fingerprints.

Based on the characteristics of chromatographic fingerprints, this paper designs two random forest classification models: one based on independent attributes and one based on combinational attributes. The former is suited to classifying chromatographic fingerprints collected in the same period; the latter is suited to classifying chromatographic fingerprints collected in different periods. Fingerprints collected in different periods suffer from problems such as retention time shifts and reduced peak resolution. To reduce the impact of these problems on the classifiers, the basic random forest algorithm is extended with combinational attributes, which are formed by merging the peaks within a time range. When the classifiers are built, both the combinational attributes and the independent attributes in the source data are considered. The data pre-processing in both models includes chromatographic peak matching, data normalization and attribute filtering, of which chromatographic peak matching is the most important.
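The idea of a combinational attribute can be illustrated with a minimal sketch. It is not the algorithm from the thesis: the function names `merge_peaks` and `normalize`, the fixed window boundaries and the toy peak values are all assumptions made for illustration. It only shows why summing peak areas inside a time window makes the resulting feature less sensitive to small retention time shifts.

```python
from bisect import bisect_right

def merge_peaks(peaks, window_edges):
    """Sum the areas of all peaks falling in each retention-time window.

    peaks: list of (retention_time, area) pairs for one chromatogram.
    window_edges: sorted boundaries; window i covers
                  [window_edges[i], window_edges[i + 1]).
    Hypothetical helper, not taken from the thesis.
    """
    merged = [0.0] * (len(window_edges) - 1)
    for rt, area in peaks:
        i = bisect_right(window_edges, rt) - 1
        if 0 <= i < len(merged):  # peaks outside all windows are ignored
            merged[i] += area
    return merged

def normalize(values):
    """Scale values so they sum to 1 (left unchanged if the total is 0)."""
    total = sum(values)
    return [v / total for v in values] if total else list(values)

# Toy data (invented): two runs of the same sample, the second one
# with slightly shifted retention times.
run_a = [(1.20, 30.0), (2.50, 50.0), (4.10, 20.0)]
run_b = [(1.32, 30.0), (2.61, 50.0), (4.25, 20.0)]
edges = [0.0, 2.0, 3.5, 5.0]  # assumed window boundaries

features_a = normalize(merge_peaks(run_a, edges))
features_b = normalize(merge_peaks(run_b, edges))
```

In this toy case both runs map to the same merged feature vector even though no individual peak has the same retention time in the two runs. A shift that moves a peak across a window boundary would still change the features, so the choice of windows matters.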
Drawing on the strengths of existing advanced peak matching algorithms, this paper proposes a chromatographic peak matching algorithm based on the idea of subsection matching.

Using chromatographic fingerprints of "Furong" series tobacco as examples, this paper constructs random forest tobacco classification models for experiments. Based on the experiments, the parameters are optimized, and the node split strategy and data pre-processing methods are discussed. Experimental results show that most accuracies of the final random forest tobacco classification models exceed 90%. The models perform better than other tobacco classification models based on Support Vector Machines, Naive Bayes, PLS-DA and Bagging, respectively.
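The two ingredients of random forest named in the abstract, Bagging and random feature selection, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the thesis implementation: each tree is reduced to a single split (a decision stump) instead of a full CART tree, so choosing a feature subset per tree is the same as choosing it per node. The function names and the toy data are invented.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_stump(X, y, feature_ids):
    """Return (score, feature, threshold, left_label, right_label) for the
    split with the lowest weighted Gini, or None if no split exists."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best

def train_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train stumps on bootstrap samples, each with a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # Bagging
        feats = rng.sample(range(len(X[0])), n_features)      # random features
        stump = best_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        if stump is not None:
            forest.append(stump[1:])  # drop the score, keep the split
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right
                    for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Toy data (invented): two features, two well-separated classes.
X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
     [0.80, 0.90], [0.90, 0.80], [0.85, 0.95]]
y = ["A", "A", "A", "B", "B", "B"]

forest = train_forest(X, y)
label_low = predict(forest, [0.05, 0.05])
label_high = predict(forest, [0.95, 0.99])
```

A real classifier for chromatographic fingerprints would grow full trees and tune parameters such as the number of trees and the number of features tried per split, which is the kind of parameter optimization the abstract refers to.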
Keywords/Search Tags:Random Forest, Chromatographic Fingerprints, Decision Tree, Combinational Attributes