
Research On Mining Algorithm Of Association Rule And Its Application For Biological Data

Posted on: 2009-05-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Ma
Full Text: PDF
GTID: 1118360242495961
Subject: Computer application technology
Abstract/Summary:
With the rapid development of genomics and proteomics research and the invention of more advanced biological technologies, huge amounts of biological data have accumulated, providing the data basis for uncovering the nature of life. Biological data have distinctive features: many categories, high throughput, and high dimensionality. These features make the data very difficult to analyze, because they far exceed the capacity of traditional statistical analysis methods. Analyzing biological data has become the bottleneck of biological research, and the need to process, mine, analyze, and understand these data is increasingly urgent.
Current research on biological data analysis has some problems. For example, there is a trend toward adopting more and more complicated algorithms and models, and the results of black-box algorithms are hard to interpret biologically. Since the aim of bioinformatics research is to interpret biological phenomena and uncover the nature of life from biological data, more appropriate analysis algorithms are needed.
Association rule mining is an important data mining technique. With it, patterns that are both biologically and mathematically significant can be found in biological data. This dissertation studies the theory and application of association rule algorithms for analyzing biological data. The main contents are described below.
(1) Study of an algorithm for mining multi-association rules and its application
Biological data carry rich information, much of which cannot be mined with traditional association rule algorithms. In order to mine more knowledge from biological data, this dissertation presents a new form of association rule, the multi-association rule.
This dissertation presents the formal definition of the multi-association rule, guidelines for mining useful multi-association rules, and an algorithm for mining multi-association rules. Applying this algorithm to three datasets yielded many useful rules.
(2) Study of analyzing protein structure data using quantitative association rules
In 1961, Anfinsen proposed the hypothesis that the amino acid sequence of a protein molecule fully determines its spatial structure. To validate this hypothesis, it can be divided into the following questions: Are the amino acid sequences of proteins random? Do different types of amino acid have different tendencies toward forming different protein spatial structures? Do recurring patterns exist in amino acid sequences? Do these patterns have different tendencies toward forming protein spatial structures? Most current research focuses on predicting the protein spatial structure at each site from the amino acid sequence, which is qualitative analysis. Little research uses quantitative analysis methods to study the tendency of each type of amino acid toward forming different protein spatial structures. This dissertation uses quantitative association rules to analyze the association between the amino acid composition of a protein and its spatial structure. Many interesting association rules were obtained experimentally. These rules have the potential to give clues about the global interactions among particular sets of amino acids occurring in proteins, and about the guiding information contained in amino acid sequences for the development of protein structure.
These rules will prove very important in the design and synthesis of artificial peptides outside the cell.
(3) Study of applying clustering and association rule mining to gene expression data analysis
Because gene expression data have high dimensionality and small sample sets, it is practically impossible to mine them directly with an association rule mining algorithm. Accordingly, this dissertation combines clustering and association rule mining to analyze gene expression data. First, a clustering method is used to obtain gene clusters; then each gene is discretized into seven items; finally, many rules are obtained from each gene cluster using an association rule mining algorithm. These rules give information not only about the direction of gene regulation but also about its strength.
(4) Study of mining classifying rules from tumor gene expression data
Classification based on association rules is a useful predictive technique. Because gene expression data have high dimensionality but small sample sets, it is hard to construct a classifier from such data using traditional association rule mining methods. Hence, this dissertation provides a new method that mines classifying rules directly from gene expression data and constructs a classifier from these rules. The experimental results show that this method has high predictive accuracy and is easy to interpret biologically.
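As a rough illustration of the discretize-then-mine step described in part (3), the following Python sketch maps expression values to seven levels and extracts simple pairwise rules with support and confidence. It is not the dissertation's algorithm. The cut points, the gene names `g1` and `g2`, the thresholds, and the restriction to one-to-one rules are all assumptions made for this example, and the clustering step is skipped by assuming the genes already belong to one cluster.

```python
from collections import Counter
from itertools import combinations

# Hypothetical cut points on a log-ratio scale; the source does not give them.
CUTS = (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)


def discretize(value, cuts=CUTS):
    """Map one expression value to one of seven levels, -3 .. +3.

    The sign stands for regulation direction, the magnitude for strength.
    """
    return sum(value > c for c in cuts) - 3


def to_transactions(expr, genes):
    """Turn each sample (row) into a set of (gene, level) items."""
    return [{(g, discretize(v)) for g, v in zip(genes, row)} for row in expr]


def pairwise_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return rules lhs -> rhs between single items as
    (lhs, rhs, support, confidence) tuples."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = sorted(t)
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Try both directions of the rule for a frequent pair.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Toy data for one assumed gene cluster: rows are samples, columns are genes.
genes = ["g1", "g2"]
expr = [
    [2.5, -2.5],
    [2.2, -2.1],
    [2.4, -3.0],
    [0.0, 0.1],
]
rules = pairwise_rules(to_transactions(expr, genes))
for lhs, rhs, support, confidence in rules:
    print(lhs, "->", rhs, "support", support, "confidence", confidence)
```

In this toy data, strong up-regulation of `g1` co-occurs with strong down-regulation of `g2` in three of four samples, so a rule between the items `("g1", 3)` and `("g2", -3)` carries both a direction and a strength, which is the kind of information part (3) attributes to its rules.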
Keywords/Search Tags:data mining, association rule, quantitative association rule, multi-association rule, classifying rule, clustering, gene expression data, protein structure data