
Data Mining Of Protein Structure Database

Posted on: 2007-12-29   Degree: Master   Type: Thesis
Country: China   Candidate: T Huang
GTID: 2120360185496564   Subject: Analytical Chemistry
Abstract/Summary:
With the launch of the Human Genome Project and the rapid development of bioinformatics, large amounts of raw protein structure data have been generated by genome sequencing, protein sequencing, and protein structure determination, and many protein structure databases have been built. Among them, the Protein Data Bank (PDB), the fundamental three-dimensional structure database established at Brookhaven National Laboratory in the United States, has become the most comprehensive repository of protein structure information. The PDB underpins research on protein structure and related topics, and it is also the focus of this thesis.

One of the main tasks of bioinformatics is to understand the relationship between a protein's amino acid sequence and its three-dimensional structure. If this relationship can be worked out, protein structure can be inferred from the amino acid sequence; in practice, however, this is very difficult. In this work, a statistical information database built by data mining is used to predict protein secondary structure.

The research comprises three phases. First, a protein sequence and structure slice database is obtained by cutting amino acid sequences and their corresponding structure sequences into slices. Next, data mining is carried out with database techniques and algorithms to extract statistical regularities. Finally, a statistical information database based on the PDB is constructed. The ultimate goal of this database is to support a protein secondary structure prediction system. To verify the proposed method, 20 recently published protein sequences that are not contained in the statistical information database were used as the test set. The average Q3 accuracy is 75.10%, and 6 of the 20 test sequences exceed 80%.

The thesis is organized in three sections. The first section describes the basic principles, main methods, and applications of data mining; the second section introduces in detail the objects studied and the methods adopted; moreover, various statistical information and visualization analyses of the information are...
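The slicing step and the Q3 score described in the abstract can be illustrated with a short Python sketch. It is a toy illustration under stated assumptions, not the thesis's implementation: it assumes structures are already reduced to three states (H helix, E strand, C coil), uses a hypothetical fixed window width, and stores statistics as simple co-occurrence counts. The names `slice_pairs`, `build_slice_statistics`, and `q3` are hypothetical; the abstract does not give the slice length, storage scheme, or how the prediction system combines slice statistics.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, Tuple


def slice_pairs(sequence: str, structure: str, width: int = 5) -> Iterator[Tuple[str, str]]:
    """Yield aligned (sequence slice, structure slice) pairs from a sliding window.

    `width` is a hypothetical slice length; the abstract does not state one.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must be the same length")
    if width < 1:
        raise ValueError("width must be at least 1")
    for start in range(len(sequence) - width + 1):
        yield sequence[start:start + width], structure[start:start + width]


def build_slice_statistics(pairs) -> Dict[str, Counter]:
    """Count how often each structure slice is observed for each sequence slice.

    A simple co-occurrence table, assumed here as a stand-in for the
    thesis's statistical information database.
    """
    stats: Dict[str, Counter] = defaultdict(Counter)
    for seq_slice, ss_slice in pairs:
        stats[seq_slice][ss_slice] += 1
    return stats


def q3(predicted: str, observed: str) -> float:
    """Q3: percentage of residues whose 3-state label (H/E/C) is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    seq = "MKTAYIAK"   # toy sequence, not from the thesis data
    ss = "CCHHHHCC"    # toy 3-state structure string
    stats = build_slice_statistics(slice_pairs(seq, ss, width=3))
    print(stats["TAY"])                # Counter({'HHH': 1})
    print(q3("CCHHHECC", ss))          # 87.5
```

In the toy run, one of eight residues is mislabeled, so Q3 is 7/8 = 87.5%. The reported 75.10% is the same kind of per-residue score, averaged over the 20 test sequences.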
Keywords/Search Tags: Bioinformatics, Protein Data Bank, Data Mining, Protein Secondary Structure, Protein Statistical Information Database