Study On Intelligent Algorithms For Motifs Discovery

Posted on: 2010-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: D K Sun
Full Text: PDF
GTID: 2178360275954816
Subject: Control theory and control engineering
Abstract/Summary:
Research on motif discovery focuses on two problems: finding motifs that distinguish different protein super-families, and finding motifs that distinguish different sub-families within a single super-family. The latter is harder than the former for two reasons. First, the structural differences between motifs of different super-families are more pronounced than those between motifs of sub-families within one family. Second, the motif information is hidden in the long amino acid sequence of a protein, so it is difficult to reduce the redundant information. The first aim of this thesis is therefore to reduce redundant information so that motifs can be found algorithmically. The second aim is to use the motif information to classify the family to which a protein belongs.

The work is based on the sub-families of the ligase enzymes. Enzymes play a very important role in organisms: they act as the organism's chemical plant and are responsible for supplying the energy and materials it requires. In addition, the ligase enzyme database is relatively complete.

The thesis proceeds in three steps. First, features are selected from the sub-families on the basis of motif characteristics, biological theory and statistical methods, and these features are connected in order to find motifs and construct a classifier. Second, an improved motif discovery algorithm is built to find flexible patterns. Finally, a classifier based on an immune-fuzzy algorithm is proposed to improve classification accuracy and efficiency. The main contents of the thesis are as follows:

The structural characteristics of motif sequences are analyzed on the basis of biological knowledge, and statistical models are constructed to select features. A method based on memory connection then joins these features into longer amino acid sequences, from which motifs are extracted. Finally, a classifier built on these motifs assigns protein sequences to different families and is used to validate the algorithm.

The characteristics of flexible patterns are studied in order to construct the algorithm's objective function and encoding strategy. Based on this function and encoding strategy, a genetic algorithm suited to extracting motifs from protein sequences is proposed to find the optimized structure of motifs. The AMP-binding domain signature is used to validate the algorithm.

The characteristics of the artificial immune algorithm and of fuzzy classification are analyzed in order to construct the immune-fuzzy classifier. The fuzzy classification comprises an amino acid fuzzy set, a sequence length fuzzy set and a fuzzy set for the gap between amino acids. An artificial data set is used to validate the classifier.

In summary, this thesis proposes several artificial intelligence algorithms and statistical methods for extracting motifs and constructing motif-based classifiers. These algorithms are scalable and of significance to bioinformatics.
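The abstract does not define the notation used for a flexible pattern. As an illustration only, the sketch below assumes a PROSITE-style syntax (residue classes in square brackets, excluded residues in braces, `x` for any residue, and gap ranges such as `x(2,4)`) and shows how such a pattern could be matched against a sequence. The function names, the example pattern and the example sequence are hypothetical; this is neither the genetic algorithm of the thesis nor the actual AMP-binding domain signature.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Assumed syntax (an illustration, not taken from the thesis):
      A        a single residue
      [ST]     any one of the listed residues
      {P}      any residue except the listed ones
      x        any residue
      x(2,4)   a flexible gap of 2 to 4 residues; (3) means exactly 3
    Elements are separated by '-'.
    """
    parts = []
    for element in pattern.split("-"):
        # Split an element into its core and an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"
        else:
            token = core  # a single residue or [..] is already valid regex
        if lo and hi:
            token += "{%s,%s}" % (lo, hi)
        elif lo:
            token += "{%s}" % lo
        parts.append(token)
    return "".join(parts)


def find_motif(pattern: str, sequence: str):
    """Return (start, matched_text) for each non-overlapping occurrence."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group(0)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical pattern and sequence, chosen only to exercise the gap rule.
    hits = find_motif("[ST]-x(2,4)-G-{P}-K", "AASQQGAKLL")
    print(hits)
```

The variable-length gap `x(2,4)` is what makes the pattern flexible: the same motif can match occurrences whose conserved residues are separated by different numbers of positions.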
Keywords/Search Tags: motif, immune-fuzzy classifier, statistical memory connection, genetic algorithm, ligase, flexible pattern