The Research And Implementation Of Tree-Based Data Mining Algorithm

Posted on: 2007-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: C Su
Full Text: PDF
GTID: 2178360185486124
Subject: Computer Science and Technology
Abstract/Summary:
Bioinformatics is the science of using computers to store, retrieve, and analyze biological information. Recent research focuses on genomics and proteomics, which investigate the structural and functional information of nucleic acids and proteins. As a technology built on databases, statistics, and AI, data mining provides biologists with useful tools for analyzing this information. Frequent pattern mining is the branch of data mining that finds characteristic patterns occurring frequently in data. According to pattern complexity, the mined patterns can be classified as frequent items, frequent sequences, frequent subtrees, and so on. The paper builds a tree model of RNA molecules and uses a frequent subtree mining algorithm to mine common topological patterns among RNA secondary structures.

The paper summarizes the history of frequent pattern mining and the development of frequent subtree mining algorithms, introduces the main methods of RNA secondary structure prediction, and analyzes the shortcomings of current applications of data mining in bioinformatics. It then formalizes the notions related to frequent subtrees, distinguishes embedded subtrees from direct subtrees, and defines isomorphic overlapped subtrees and minimality. After that, two existing frequent embedded subtree mining algorithms, TreeMiner and PatternMatcher, are analyzed. TreeMiner mines vertically and PatternMatcher mines horizontally, but neither can distinguish isomorphic overlapped subtrees during mining. The paper presents a novel algorithm, DistinctTM (distinct tree mining), which mines embedded subtrees. The algorithm eliminates the redundancy introduced by isomorphic overlapped subtrees and ensures the minimality of the frequent patterns. The experimental results indicate that DistinctTM is superior to both TreeMiner and PatternMatcher.
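The distinction between embedded and direct subtrees can be illustrated at the level of a single pattern edge. The sketch below is not taken from the thesis: it assumes that a "direct" edge must match a parent-child pair in the data tree, while an "embedded" edge only needs to match an ancestor-descendant pair. The toy tree `PARENT` and the function names are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical toy tree, stored as node -> parent (None marks the root).
# Shape:  A -> B -> C   and   A -> D
PARENT: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "B", "D": "A"}


def is_parent(parent: Dict[str, Optional[str]], upper: str, lower: str) -> bool:
    """True if `upper` is the immediate parent of `lower`."""
    return parent[lower] == upper


def is_ancestor(parent: Dict[str, Optional[str]], upper: str, lower: str) -> bool:
    """True if `upper` lies anywhere on the path from `lower` up to the root."""
    current = parent[lower]
    while current is not None:
        if current == upper:
            return True
        current = parent[current]
    return False


def edge_kind(parent: Dict[str, Optional[str]], upper: str, lower: str) -> str:
    """Classify how the pattern edge upper -> lower occurs in the tree."""
    if is_parent(parent, upper, lower):
        # A parent-child pair satisfies both the direct and the embedded definition.
        return "direct"
    if is_ancestor(parent, upper, lower):
        # Only the looser ancestor-descendant relation holds.
        return "embedded only"
    return "none"


print(edge_kind(PARENT, "A", "B"))  # direct
print(edge_kind(PARENT, "A", "C"))  # embedded only
print(edge_kind(PARENT, "D", "C"))  # none
```

Under this assumption, every direct occurrence is also an embedded occurrence, which is why embedded subtree mining can find patterns that skip intermediate nodes.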
Finally, the paper gives a method for transforming RNA secondary structures into tree models and uses the DistinctTM algorithm to mine common topological patterns among RNA molecules.
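The abstract does not say how the thesis encodes a secondary structure as a tree. As a rough illustration only, the sketch below assumes the structure is given in dot-bracket notation and uses one common encoding: each base pair becomes an internal node labeled `P`, and each unpaired base becomes a leaf labeled `U`. The names `Node`, `dot_bracket_to_tree`, and `to_string` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def dot_bracket_to_tree(structure: str) -> Node:
    """Build an ordered tree from a dot-bracket string (assumed input format)."""
    root = Node("root")
    stack = [root]  # path from the root to the currently open base pair
    for ch in structure:
        if ch == "(":
            # Opening bracket: start a new base-pair node under the current one.
            node = Node("P")
            stack[-1].children.append(node)
            stack.append(node)
        elif ch == ")":
            # Closing bracket: the innermost open base pair is complete.
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        elif ch == ".":
            # Unpaired base: a leaf under the current node.
            stack[-1].children.append(Node("U"))
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root


def to_string(node: Node) -> str:
    """Render the tree in a compact nested form for inspection."""
    if not node.children:
        return node.label
    return node.label + "(" + " ".join(to_string(c) for c in node.children) + ")"


print(to_string(dot_bracket_to_tree("((..))."))) # root(P(P(U U)) U)
```

Trees built this way are ordered and labeled, so a collection of them could serve as input to a frequent subtree miner. The encoding used in the thesis may differ.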
Keywords/Search Tags: data mining, embedded subtree, isomorphic overlapped subtree, biological data, RNA secondary structure