Research On The Decision Model In The Protein Boundary Prediction

Posted on: 2015-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Ai
GTID: 2180330452956853
Subject: Software engineering
Abstract/Summary:
With the development of biotechnology, the complete structure of more and more genomes has been determined, providing scientists with large amounts of raw data. However, genome data alone do not reveal the biological functions of genes and proteins; in fact, the functions of most proteins and ORFs remain unknown. A domain is a subunit of a protein that can fold and evolve independently. Accurately locating the boundaries between domains is usually the first step in determining a protein's function, and ThreaDom is one method for doing so.

This thesis surveys common algorithms for predicting protein domain boundaries from protein sequences and weighs their strengths and weaknesses. It then analyzes the ThreaDom method to identify its deficiencies and concludes that, for some irregular protein sequences, the basic constant threshold cannot provide enough information. We therefore use machine learning and statistical methods to improve prediction for these sequences, and choose the C4.5 algorithm for this purpose. The thesis introduces C4.5 and, following the algorithm's requirements, extracts, selects, and optimizes features from protein sequences. It then describes what is done with these features and how.

Based on this algorithm, we implement a method that improves the results of ThreaDom and test it on various kinds of data. Comparing its results with those of other algorithms, we conclude that the new method predicts the boundaries between protein domains better than ThreaDom and, in particular, notably improves ThreaDom's predictions in irregular cases.
Keywords/Search Tags: Protein Domain, C4.5 algorithm, Decision model
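
The abstract contrasts a constant threshold with a C4.5 decision model but does not give details. As a rough illustration only, the sketch below shows the gain-ratio criterion that C4.5 uses to pick a split on a continuous feature. The feature (a per-residue boundary score), the toy values, and the function names are all assumptions made for this example, not taken from the thesis, and the candidate selection is simplified compared with a full C4.5 implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` vs `value > threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves.
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info


def best_threshold(values, labels):
    """Pick the midpoint between adjacent distinct values with the highest gain ratio."""
    distinct = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(midpoints, key=lambda t: gain_ratio(values, labels, t))


# Hypothetical toy data: a per-residue score and whether the residue is a boundary (1) or not (0).
scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
is_boundary = [0, 0, 0, 1, 1, 1]
print(best_threshold(scores, is_boundary))
```

Unlike a single constant cutoff fixed in advance, a threshold chosen this way is learned from labeled data, and a tree can combine several such splits over different features.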