
Structure Of Data Mining And Processing Problems

Posted on: 2006-09-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Wang
Full Text: PDF
GTID: 1118360155460411
Subject: Computer software and theory
Abstract/Summary:
In recent years, data mining and its applications have spread into many disciplines and produced substantial results in diverse fields, including artificial intelligence and machine learning, databases, pattern recognition, bioinformatics, and neural computing. It appeals not only to scientists but also to governments and industry, all of which have invested a great deal of money and effort in the corresponding research. Progress in data mining will therefore promote the development of both science and society.

As data mining techniques have advanced, more and more questions have arisen, and demand for mining complex data is rising. Researchers have turned their attention to these areas and tried to solve the problems by drawing on experience from unstructured data mining, such as frequent itemset mining. This dissertation studies structured data mining and processing.

Four open problems are investigated: improving the efficiency of semi-structured data mining, improving the scalability of structured data mining, mining graph data with constraints, and indexing graph databases. The main contributions of the dissertation are summarized as follows:

Firstly, four algorithms, Chopper, XSpanner, ESMiner and ISMiner, are proposed. These algorithms mine frequent induced and embedded subtrees by means of pattern growth and rightmost path growth, respectively. Experimental results show that they perform better than earlier algorithms such as TreeMiner and FREQT.

Secondly, a novel graph indexing structure, ADI, is proposed. It is embedded into a graph mining algorithm to improve scalability. Experimental results show that ADI-Mine performs better than other algorithms such as gSpan, previously the best graph mining algorithm. Building on this, I go on to present ideas for transplanting the ADI indexing structure into other graph mining algorithms to improve their efficiency and scalability.
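
To make the term "rightmost path growth" concrete, here is a minimal sketch of the generic candidate-generation step as it is commonly described for ordered labeled trees. It is not taken from the dissertation and does not reproduce Chopper, XSpanner, ESMiner or ISMiner. The representation (a preorder list of `(depth, label)` pairs) and the function name `rightmost_extensions` are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical representation: an ordered labeled tree as a preorder
# list of (depth, label) pairs, with the root at depth 0.
Node = Tuple[int, str]


def rightmost_extensions(tree: List[Node], labels: List[str]) -> List[List[Node]]:
    """Return every tree obtained by adding one node on the rightmost path.

    The new node becomes the last child of some node on the path from
    the root to the last node in preorder, so its depth ranges from 1
    to (depth of the last node) + 1.
    """
    if not tree:
        # An empty pattern grows into single-node trees.
        return [[(0, label)] for label in labels]

    last_depth = tree[-1][0]
    candidates = []
    for depth in range(1, last_depth + 2):
        for label in labels:
            candidates.append(tree + [(depth, label)])
    return candidates


if __name__ == "__main__":
    pattern = [(0, "a"), (1, "b")]  # root "a" with one child "b"
    for candidate in rightmost_extensions(pattern, ["a", "b"]):
        print(candidate)
```

In a full miner, each candidate would then be counted against the database and kept only if its support meets the threshold; that step is omitted here because the abstract does not describe it.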
Keywords/Search Tags: Data Mining, Semi-structured and structured, Frequent subtree, Induced subtree, Embedded subtree, Pattern growth, Monotonic constraint, Anti-monotonic constraint, Non-monotonic constraint, Indexing, Querying, Weblog, XML, Social network, Genetic sequence