
Research On The Application Of Rough Set Theory For Protein Structure Prediction

Posted on: 2006-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Yu
GTID: 2168360155453045
Subject: Computer application technology
Abstract/Summary:
The main idea of rough set theory is to derive classification rules for concepts through knowledge reduction, while keeping the ability to classify unchanged. Taking a new perspective, it defines knowledge as the ability to partition the universe of discourse, and it uses the mathematical notion of an equivalence relation as its method of analysis. In this way it provides a new mathematical tool for data analysis, especially for imprecise and incomplete data. In addition, rough set theory needs no prior information beyond the data set of the problem itself. It extracts classification rules only by removing redundant information from the observed data and by measuring how imperfect the data are (the roughness, and the dependency and significance of attributes). It therefore offers a powerful method for handling uncertain information and has been widely applied.

Bioinformatics covers the acquisition, processing, storage, distribution, analysis and interpretation of biological information. It is a growing area of science that uses computational approaches to answer biological questions. By combining mathematical, computer science and biological tools, it helps people understand the biological meaning of large volumes of data.

The goal of protein structure prediction is to find a mapping from a protein's linear amino acid sequence to the 3D coordinates of all of its atoms. At present there are two main kinds of methods for predicting protein spatial structure. The first is molecular dynamics, which studies and predicts protein structure and the folding process from basic theory and hypotheses. The second is knowledge-based: it predicts the structure of an unknown protein from rules that have been observed and summarized from proteins of known structure.

Protein secondary structure prediction is one of the important tasks of bioinformatics.
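The rough set notions named above (equivalence classes, dependency, attribute significance, reduction) can be made concrete on a small decision table. The Python sketch below is not the thesis's implementation. It assumes all attribute values are already discretized and computes four things: the equivalence classes of the indiscernibility relation; the dependency degree gamma_C(D) = |POS_C(D)| / |U|; the significance of an attribute, taken as the drop in dependency when that attribute is removed; and a reduced attribute set found by a common greedy heuristic, often called QuickReduct. The function names and the toy table are hypothetical and were invented for illustration. The greedy search keeps the dependency of the full attribute set but is not guaranteed to return a minimal reduct.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Tuple

Row = Dict[str, str]  # one object of the decision table: attribute -> value


def partition(rows: Sequence[Row], attrs: Sequence[str]) -> List[FrozenSet[int]]:
    """Equivalence classes of IND(attrs): objects with identical values on attrs."""
    classes: Dict[Tuple[str, ...], set] = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return [frozenset(c) for c in classes.values()]


def dependency(rows: Sequence[Row], cond: Sequence[str], decision: str) -> float:
    """gamma_C(D) = |POS_C(D)| / |U|: share of objects whose C-class
    lies entirely inside a single decision class."""
    if not rows:
        return 0.0
    pos = sum(len(block) for block in partition(rows, cond)
              if len({rows[i][decision] for i in block}) == 1)
    return pos / len(rows)


def significance(rows: Sequence[Row], cond: Sequence[str], decision: str, attr: str) -> float:
    """Drop in dependency when attr is removed from the condition set."""
    rest = [a for a in cond if a != attr]
    return dependency(rows, cond, decision) - dependency(rows, rest, decision)


def quick_reduct(rows: Sequence[Row], cond: Sequence[str], decision: str) -> List[str]:
    """Greedy forward selection (QuickReduct-style): repeatedly add the attribute
    that raises the dependency most, until it matches the full set's dependency."""
    target = dependency(rows, cond, decision)
    reduct: List[str] = []
    while dependency(rows, reduct, decision) < target:
        best = max((a for a in cond if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct


# Toy decision table (invented): a, b, c are condition attributes,
# d is the decision attribute with secondary-structure labels H/E/C.
TABLE = [
    {"a": "1", "b": "0", "c": "0", "d": "H"},
    {"a": "1", "b": "1", "c": "0", "d": "H"},
    {"a": "0", "b": "0", "c": "0", "d": "E"},
    {"a": "0", "b": "1", "c": "1", "d": "C"},
    {"a": "0", "b": "0", "c": "1", "d": "C"},
]

print(quick_reduct(TABLE, ["a", "b", "c"], "d"))  # ['a', 'c']
```

On this toy table, attribute b has zero significance. Once b is dropped, a and c still classify every object as consistently as the full set does.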
Protein secondary structure prediction assigns a secondary structure state to every amino acid in a sequence. The main types of protein secondary structure are helix, strand and irregular (coil) structure, denoted H, E and C respectively.

The basic premise of protein secondary structure prediction is that neighboring amino acid residues tend to form a particular secondary structure. To predict secondary structure, we collect statistics and analyze them to find these tendencies and rules; in other words, the task is one of classification and pattern detection. Although prediction accuracy is still far from perfect, the results can provide a great deal of useful structural information, especially when the real structure of a protein is unknown. Secondary structure prediction is therefore usually the first step of protein 3D structure prediction, and it can serve as the basis for other tasks as well.

Predicting the secondary structure of a protein from its amino acid sequence is an important step toward understanding protein structure and function. It provides an important foundation for tertiary structure modeling, especially when suitable homologous template models are lacking, and it narrows the search space when simulating protein folding. The prediction results can also be used in other fields of bioinformatics research and offer clues for analyzing protein function and properties. Secondary structure prediction based on artificial neural networks (ANNs) is widely used in bioinformatics. According to the level of information exploited, ANN methods can be divided into three generations. In the first generation, information comes only from individual residues in the sequence. In the second generation, information about interactions between local residues was added to the prediction algorithm.
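The premise that neighboring residues jointly influence a residue's state is usually exploited by giving the network a sliding window centered on the residue to be predicted. This corresponds to the second-generation methods described above. The sketch below shows one common way to build such training samples. It assumes plain one-hot encoding of the 20 standard amino acids, with an extra bit for positions beyond the chain ends, and an assumed window of 13 residues (half = 6). The function names, the toy sequence and its labels are hypothetical. This is a generic encoding, not the one used in the thesis.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
STATES = "HEC"  # helix, strand, coil -> target class indices 0, 1, 2


def one_hot_window(seq: str, center: int, half: int) -> List[int]:
    """Encode seq[center-half .. center+half] as consecutive 21-bit one-hot
    blocks; the last bit of a block marks padding or an unknown residue."""
    vec: List[int] = []
    for pos in range(center - half, center + half + 1):
        block = [0] * (len(AMINO_ACIDS) + 1)
        if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
            block[AMINO_ACIDS.index(seq[pos])] = 1
        else:
            block[-1] = 1  # outside the chain, or a non-standard residue
        vec.extend(block)
    return vec


def make_samples(seq: str, labels: str, half: int = 6) -> List[Tuple[List[int], int]]:
    """One (input vector, target class index) pair per residue of seq."""
    if len(seq) != len(labels):
        raise ValueError("sequence and label string must have equal length")
    return [(one_hot_window(seq, i, half), STATES.index(labels[i]))
            for i in range(len(seq))]


# Made-up sequence and labels, for illustration only.
x, y = make_samples("MKTAYIAK", "CCHHHHEC")[0]
print(len(x), y)  # 273 2
```

Each input vector has (2 * half + 1) * 21 entries, one per input-layer node. The network's three outputs correspond to H, E and C for the central residue.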
In the third generation, homologous sequence information was added on top of the second generation, which further raised prediction accuracy. At present, the most widely used measure of prediction accuracy is Q3, which is used to evaluate the accuracy of a prediction algorithm.

Rough set theory needs no prior information beyond the data set of the problem at hand. The ANN, on the other hand, offers high classification accuracy and robustness; it is a widely used machine learning method and one of the earliest techniques applied in biological data analysis. The two methods complement each other well, so combining them effectively holds great promise for protein secondary structure prediction.

This thesis presents a new method for protein secondary structure prediction based on rough sets and artificial neural networks. The main idea is as follows. First, the original data are abstracted and optimized using domain knowledge. The key step is to apply rough set attribute reduction to the neural network training samples, and then to determine the initial connection weights between the input-layer nodes and the hidden-layer nodes. Second, the ANN is trained on the optimized input data to determine every network parameter, which completes the network design.

The method preprocesses the protein amino acid sequences in advance. This process can be viewed as knowledge acquisition guided by domain knowledge: it removes useless data, turns a complex database into a relatively simple form, and makes the data easier to understand. From the rough set preprocessing of the data and the prior information, we obtain heuristic information and use it to set the initial network connection weights. During the network training phase, the input layer represents the abstracted and optimized "class sequence" of the amino acid sequence, and the output layer represents the central residue's secondary...
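Q3 is not defined in the abstract. By its standard definition, it is the percentage of residues whose three-state (H/E/C) assignment is predicted correctly: Q3 = (N_H + N_E + N_C) / N × 100%. Here N_H, N_E and N_C count the correctly predicted helix, strand and coil residues, and N is the total number of residues. Below is a minimal Python sketch. The function names and the example strings are hypothetical. The per-state accuracies Q_H, Q_E and Q_C are included only as a common companion measure.

```python
from typing import Dict


def q3(predicted: str, observed: str) -> float:
    """Q3 = 100 * (correctly predicted residues) / (all residues), states H/E/C."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def per_state_accuracy(predicted: str, observed: str) -> Dict[str, float]:
    """Q_H, Q_E, Q_C: percentage of residues observed in each state that
    were predicted as that state (states absent from observed are skipped)."""
    result: Dict[str, float] = {}
    for state in "HEC":
        idx = [i for i, o in enumerate(observed) if o == state]
        if idx:
            result[state] = 100.0 * sum(predicted[i] == state for i in idx) / len(idx)
    return result


# Made-up example: 8 of 10 residues are predicted correctly.
obs = "HHHHEEEECC"
pred = "HHHCEEECCC"
print(q3(pred, obs))                  # 80.0
print(per_state_accuracy(pred, obs))  # {'H': 75.0, 'E': 75.0, 'C': 100.0}
```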
Keywords/Search Tags: Application