
Study On The Diagnostic Gene Pattern Discovery Technology Based On Microarray Data

Posted on: 2012-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
GTID: 2180330467464899
Subject: Computer system architecture
Abstract/Summary:
DNA microarray technology has moved bioinformatics into a new era of multi-gene and genome-wide study, while producing massive gene expression data sets far beyond the capabilities of traditional data analysis methods. Designing efficient analysis methods suited to the characteristics of microarray data has become a focus of bioinformatics research. Human disease is usually related to genes. Diagnostic genes are genes closely related to specific disease phenotypes, and they often have high power to distinguish between different classes. This thesis presents a thorough study of diagnostic gene pattern discovery technology based on microarray data.

First, an unsupervised phenotype and diagnostic gene discovery algorithm with outlier consideration, called UPID, is proposed. The algorithm uses a heuristic search that measures the similarity within each sub-division matrix and the differences between sub-division matrices to simultaneously discover the sample phenotypes and the corresponding diagnostic genes of microarray data. UPID overcomes the weaknesses of the basic heuristic search method. It takes the noise present in microarray data into consideration: by reconciling the sample proportion of each phenotype with the pattern quality, the impact of outliers on the phenotype partition is reduced. In addition, an incremental iterative strategy is adopted in the heuristic search, which reduces the amount of computation per iteration and increases the efficiency of the algorithm. Experimental results show that UPID is significantly better than the competing algorithm in both efficiency and effectiveness.

Second, a diagnostic gene pattern discovery algorithm based on interesting non-redundant contrast sequence rules is proposed. To address the limitations of singleton discriminability-based and combination discriminability-based models, this algorithm proposes an EDS model, which characterizes microarray data from a sequence-like perspective. It exploits the ordered expression among genes based on the defined equivalent dimension group sequences, taking into account the "noise" that is universal in real data. Then a novel sequence rule, the interesting non-redundant contrast sequence rule, is devised; it captures the differences between phenotypes and provides diagnosis accuracy as high as possible using as few genes as possible. Furthermore, an efficient algorithm, NRMINER, is presented to find such rules. Unlike conventional column enumeration and row enumeration, it performs a novel template-driven enumeration that makes use of the special characteristics of microarray data. Finally, extensive experiments show that NRMINER is one to two orders of magnitude faster than the competing algorithms and provides higher accuracy using fewer genes. Moreover, the diagnostic genes identified by the algorithm have strong biological significance.
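The idea of contrasting phenotypes through the ordering of gene expression values can be illustrated with a toy sketch. This is not NRMINER or the EDS model: it ignores equivalent dimension groups, redundancy pruning, and template-driven enumeration, and simply brute-forces ordered gene pairs. The function names, the `min_gap` threshold, the gene labels, and the data are all hypothetical and chosen only for illustration.

```python
from itertools import permutations


def order_support(samples, a, b):
    """Fraction of samples in which gene a is expressed below gene b."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s[a] < s[b]) / len(samples)


def contrast_pairs(class_one, class_two, genes, min_gap=0.8):
    """Find ordered pairs (a, b) where 'a < b' holds far more often
    in class_one than in class_two.

    Returns a list of (a, b, gap), where gap is the difference in support.
    Brute force over all ordered pairs; an assumption-laden simplification.
    """
    rules = []
    for a, b in permutations(genes, 2):
        gap = order_support(class_one, a, b) - order_support(class_two, a, b)
        if gap >= min_gap:
            rules.append((a, b, gap))
    # Largest contrast first; ties keep enumeration order.
    return sorted(rules, key=lambda r: -r[2])


# Hypothetical toy expression profiles (one dict per sample).
tumor = [
    {"g1": 1.0, "g2": 5.0, "g3": 3.0},
    {"g1": 2.0, "g2": 6.0, "g3": 2.5},
]
normal = [
    {"g1": 5.0, "g2": 1.0, "g3": 3.0},
    {"g1": 6.0, "g2": 2.0, "g3": 2.5},
]

rules = contrast_pairs(tumor, normal, ["g1", "g2", "g3"])
for a, b, gap in rules:
    print(f"{a} < {b}: support gap {gap:.2f}")
```

In this toy data, the ordering "g1 below g2" holds in every tumor sample and in no normal sample, so a single ordered pair already separates the two classes. That is the intuition behind seeking high accuracy with few genes, although the actual rules in the thesis are defined over longer sequences.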
Keywords/Search Tags: data mining, diagnostic gene, microarray data, sequence rule