
A Novel Ensemble Classifier And Its Application In Protein Fold Recognition

Posted on: 2008-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: X Guo
Full Text: PDF
GTID: 2120360218458095
Subject: Signal and Information Processing
Abstract/Summary:
Protein structure determines function, and proteins with similar folds have similar functions. About 100,000 three-dimensional protein structures are known, while the number of folds is fewer than one thousand. Fold recognition is therefore not only biologically meaningful but also simplifies the study of protein structure.

Protein fold recognition methods fall into two kinds: the template-based approach and the taxonometric approach. The template-based approach is efficient when similarity is high, but both its sensitivity and its reliability drop when there is little similarity. The taxonometric approach, which does not rely on similarity, can still recognize folds efficiently. A number of ensemble classifiers have recently been proposed for protein fold recognition. The performance of an ensemble classifier depends on the performance of its base classifiers and on the choice of ensemble weighting strategy. ET-KNN has been widely used for multi-class problems, and its performance depends on its parameters. Although a number of methods improve ET-KNN by optimizing its internal parameters, the parameters they find are not globally optimal. Moreover, because the weights of the base classifiers are determined independently rather than jointly, the ensemble classifier cannot reach the optimal result.

In this paper, we present a novel ensemble classifier, GAOEC. First, we use a genetic algorithm (GA) to generate the optimal parameters of ET-KNN, which yields an optimized classifier, GAET-KNN. Then we construct the ensemble from two levels of GAET-KNN: the second-level GAET-KNN determines the class of the query protein from the classes produced by the first-level GAET-KNN classifiers. Finally, we present two ensemble weighting strategies: the global optimal weight strategy and the selective average ensemble strategy. The global optimal weight strategy uses the GA to generate optimal weights for all component classifiers; the selective average ensemble strategy uses the GA to generate binary weights for all component classifiers.

GAOEC is applied to multi-class protein fold recognition, where it achieves higher accuracy than existing classification methods. The results show that GAET-KNN is an efficient and robust classifier, that the two-level learning architecture improves classification accuracy, and that the two weighting strategies are reasonable and efficient.
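
The two weighting strategies can be illustrated with a small sketch. This is not the thesis's implementation. It assumes each component classifier has already produced per-class scores for every sample, and it replaces ET-KNN with those precomputed scores. The function names (`combine`, `accuracy`, `ga_binary_weights`) and the GA operators (truncation selection, one-point crossover, bit-flip mutation) are hypothetical choices made for illustration only. With real-valued weights, `combine` corresponds to a weighted vote. With 0/1 weights, it corresponds to averaging only the selected classifiers, since the predicted class is the same whether the selected scores are summed or averaged.

```python
import random


def combine(scores, weights):
    """Predict a class for one sample from weighted classifier scores.

    scores[k][c] is classifier k's score for class c (assumed input format).
    """
    n_classes = len(scores[0])
    total = [0.0] * n_classes
    for w, s in zip(weights, scores):
        for c in range(n_classes):
            total[c] += w * s[c]
    # Index of the class with the largest combined score.
    return max(range(n_classes), key=lambda c: total[c])


def accuracy(all_scores, labels, weights):
    """Fraction of samples classified correctly under the given weights."""
    hits = sum(combine(s, weights) == y for s, y in zip(all_scores, labels))
    return hits / len(labels)


def ga_binary_weights(all_scores, labels, pop_size=20, generations=30,
                      mutation_rate=0.1, seed=0):
    """Search for a 0/1 weight per classifier (needs at least 2 classifiers)."""
    rng = random.Random(seed)
    n = len(all_scores[0])  # number of component classifiers

    def fitness(mask):
        # An empty selection cannot classify anything.
        return accuracy(all_scores, labels, mask) if any(mask) else 0.0

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged.
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]              # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

In this sketch the fitness is accuracy on a held-out set of labeled samples. A real-valued variant for the global weight strategy would evolve floating-point weights with the same `combine` function.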
Keywords/Search Tags: protein fold recognition, GA, ET-KNN, GAET-KNN, GAOEC