Clustering And Classification Techniques In Bioinformatics Applications

Posted on:2006-05-29

Degree:Master

Type:Thesis

Country:China

Candidate:J Huang

GTID:2190360155961443

Subject:Computer application technology

Abstract/Summary:

Biological experiments have produced enormous volumes of data. How to collect, clean, search, and analyze these data efficiently, and how to extract rules from them, are problems that must be solved. Data mining is a relatively new technology built on databases, statistics, and artificial intelligence, and it is a useful and powerful tool for biologists. In this thesis, we mainly study gene expression data and protein sequence data.

We present a method for classifying protein sequences: we design an algorithm to mine continuous frequent patterns, and test data are classified on the basis of these patterns. We also design a method for clustering protein sequences. In this method, we first mine continuous frequent patterns, then prune the frequent patterns and use those that remain to build a feature space. The sequences are projected into this feature space, a similarity matrix of the sequences is built, and finally k-means is applied to the similarity matrix to find k clusters.

We propose a method for classifying gene expression data. First, we filter genes by the expectation and variance of their expression; then we convert the gene expression data into a P-tree structure; finally, we use the P-tree structure to calculate information gain and build multiple decision trees. We also design a parallel algorithm for clustering gene expression data. In this algorithm, the data are divided into several groups and sent to servers. Each server calculates the density of its data, identifies core genes, and clusters the core genes with k-means to obtain clusters. Finally, the client applies k-means to the clusters returned by all servers. The experiments show that these methods are superior.
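The protein sequence clustering pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several details are assumptions made only for illustration: patterns are contiguous substrings of one fixed length, support is counted per sequence, the projection is binary, similarity is cosine similarity, and k-means uses a deterministic farthest-point initialization. The pruning step is omitted, and all function names are hypothetical.

```python
from collections import Counter
import math


def mine_contiguous_patterns(seqs, length, min_support):
    """Substrings of a fixed length that occur in at least min_support sequences."""
    support = Counter()
    for s in seqs:
        # A set counts each pattern at most once per sequence.
        support.update({s[i:i + length] for i in range(len(s) - length + 1)})
    return sorted(p for p, count in support.items() if count >= min_support)


def project(seq, patterns):
    """Binary feature vector: 1.0 where the pattern occurs in the sequence."""
    return [1.0 if p in seq else 0.0 for p in patterns]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    return [[cosine(u, v) for v in vectors] for u in vectors]


def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(rows, k, iters=100):
    """Plain k-means on the rows; farthest-point initialization (an assumption)."""
    centers = [list(rows[0])]
    while len(centers) < k:
        farthest = max(rows, key=lambda r: min(sqdist(r, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sqdist(r, centers[c])) for r in rows]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_sequences(seqs, k, length=3, min_support=2):
    patterns = mine_contiguous_patterns(seqs, length, min_support)
    vectors = [project(s, patterns) for s in seqs]
    # Each sequence is represented by its row of the similarity matrix.
    return kmeans(similarity_matrix(vectors), k)


# Toy input: two made-up families of short sequences.
toy = ["ACDEFG", "ACDEFH", "ACDEKK", "WYWYWY", "WYWYWA", "QWYWYW"]
toy_labels = cluster_sequences(toy, k=2)
```

Clustering the rows of the similarity matrix, rather than the feature vectors directly, follows the order of steps given in the abstract; the choice of similarity measure and pattern length would need to be tuned for real protein data.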

Keywords/Search Tags:

Bioinformatics, Gene expression, Protein sequence, Classification, Clustering

Related items

1	Research And Application Of Spectral Clustering In Analysis Of Gene Expression Data
2	Bioinformatics Analysis Of Expressed Sequence Tags (EST) In Several Mammals
3	Research On Gene Expression Data Analysis Method And Its Application
4	Clone And Sequence Analyze And Expression Of COR From Descurania Sophia(L.)
5	Computing Analysis On Gene Expression And Its Transcriptional Regulatory Mechanisms
6	Mathematical Description Of The Biological Macromolecules And Its Applications
7	Studies Of Filteration Of Distinguishing Expression Genes Based On Clustering Arithmetic Of Extend CF-tree
8	Gene Prediction And Sequence Analysis Of Insect OBP CSP And Sid-1
9	Dissertation Identification Of Human Novel Genes And Protein Sequence Motifs
10	Protein Function Prediction Based On The Sequence Circular Relationship Network