Biological Sequence Analysis Based on Data Mining

Posted on: 2007-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Sang
Full Text: PDF
GTID: 2208360185955840
Subject: Computer application technology
Abstract/Summary:
Biological sequence analysis is a main research area of bioinformatics. Its primary mission is to mine knowledge from massive collections of biological sequences and to explore the mysteries of life. Its research content mainly includes sequence alignment, protein structure prediction, and genome sequence analysis. This thesis focuses on pairwise sequence alignment algorithms and protein secondary structure prediction methods.

First, the thesis studies sequence alignment algorithms in detail, including the Needleman-Wunsch algorithm, which is based on dynamic programming (the DP algorithm), and the Smith-Waterman algorithm, and compares their advantages and disadvantages. It then proposes SAFSS, a novel pairwise global sequence alignment method based on frequent subsequences. At the cost of massive, complex computation, the DP algorithm obtains an optimal alignment, but it may neglect the biological significance within the sequences. SAFSS mainly processes the frequent subsequences in a sequence, which makes it easier to discover the biological significance of redundant fragments. Compared with the DP algorithm, SAFSS has lower space complexity and high performance.

The second research subject is protein secondary structure prediction. Several prediction methods are discussed, including neural networks and sequence pattern mining. Improvements are applied to the basic BP neural network to improve its convergence rate and accuracy. During feature extraction for sequence pattern mining, the hydrophobicity of amino acids and their adjacency relationships are mainly taken into account, and this achieved good results.

This thesis discusses two important topics in biological sequence analysis. SAFSS and the data mining method for protein secondary structure prediction are of importance and lay the foundation for the author's further research.
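
As an illustration of the dynamic-programming global alignment that the abstract uses as its baseline, the following is a minimal sketch of Needleman-Wunsch scoring. It is not the thesis's implementation and it is not SAFSS. The function name and the scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen for the example. The sketch returns only the optimal score and keeps two rows of the table, so it does not recover the alignment itself.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions, not taken from the thesis.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b costs i gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap, or b[j-1] with a gap.
            up = prev[j] + gap
            left = curr[j - 1] + gap
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))
```

The running time is proportional to the product of the two sequence lengths. Recovering the alignment by traceback in the straightforward way requires storing the full table, which is the kind of space cost the abstract contrasts SAFSS against.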
Keywords/Search Tags: Bioinformatics, Biological sequence alignment, Data mining, Protein secondary structure prediction