Structure Of Data Mining And Processing Problems

Posted on:2006-09-25

Degree:Doctor

Type:Dissertation

Country:China

Candidate:C Wang

GTID:1118360155460411

Subject:Computer software and theory

Abstract/Summary:

In recent years, data mining and its applications have spread into many disciplines and produced substantial results in diverse fields, including artificial intelligence and machine learning, databases, pattern recognition, bioinformatics, and neural computing. Data mining appeals not only to scientists but also to governments and industry, all of which have invested considerable money and effort in the corresponding research. Progress in data mining can therefore be expected to promote the development of both science and society.

As data mining techniques have advanced, more and more open questions have emerged, and the demand for mining complex data is rising. Researchers have turned their attention to this area and have tried to solve its problems by drawing on experience from unstructured data mining, such as frequent itemset mining. This dissertation studies structured data mining and processing.

Four problems in need of solutions are investigated: improving the efficiency of semi-structured data mining, improving the scalability of structured data mining, mining graph data with constraints, and indexing graph databases. The main contributions of the dissertation are summarized as follows.

Firstly, four algorithms, Chopper, XSpanner, ESMiner and ISMiner, are proposed. These algorithms mine frequent induced and embedded subtrees by means of pattern growth and rightmost path growth, respectively. Experimental results show that they outperform earlier algorithms such as TreeMiner and FREQT.

Secondly, a novel graph indexing structure, ADI, is proposed. It is embedded into a graph mining algorithm to improve scalability. Experimental results show that the resulting algorithm, ADI-Mine, outperforms other algorithms such as gSpan, previously the best-performing graph mining algorithm. Building on this result, I go on to present ideas for transplanting the ADI indexing structure into other graph mining algorithms to improve their efficiency and scalability.
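
The abstract names rightmost path growth without describing it. The following minimal Python sketch illustrates the general idea as it is commonly described for ordered labeled trees: a candidate pattern grows by attaching one new node as the last child of some node on its rightmost path. It is not the dissertation's code and makes no claim about how Chopper, XSpanner, ESMiner or ISMiner are implemented. The tree encoding (a preorder list of depth and label pairs) and the function name rightmost_extensions are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical encoding: a rooted ordered labeled tree is a preorder list
# of (depth, label) pairs, with the root at depth 0.
Tree = List[Tuple[int, str]]


def rightmost_extensions(tree: Tree, labels: List[str]) -> List[Tree]:
    """Return every tree obtained by adding one node on the rightmost path."""
    if not tree:
        # An empty pattern grows into each possible single-node tree.
        return [[(0, label)] for label in labels]

    last_depth = tree[-1][0]  # depth of the rightmost leaf
    candidates: List[Tree] = []
    # A new node at depth d becomes the last child of the rightmost-path
    # node at depth d - 1, so d ranges from last_depth + 1 down to 1.
    for depth in range(last_depth + 1, 0, -1):
        for label in labels:
            candidates.append(tree + [(depth, label)])
    return candidates


if __name__ == "__main__":
    pattern = [(0, "a"), (1, "b")]
    for candidate in rightmost_extensions(pattern, ["a", "b"]):
        print(candidate)
```

A full miner would also count the support of each candidate in the database and discard infrequent ones before extending further. That step is omitted here.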

Keywords/Search Tags:

Data Mining, Semi-structured and structured, Frequent subtree, Induced subtree, Embedded subtree, Pattern growth, Monotonic constraint, Anti-monotonic constraint, Non-monotonic constraint, Indexing, Querying, Weblog, XML, Social network, Genetic sequence

Related items

1	Study On Frequent Subtree Mining And Its Application In XML Mining
2	Embedded And Export Of Frequent Subtree Mining Algorithm
3	Research On Embedded Frequent Subtree Mining
4	The Research On Frequent Subtrees Mining And Corresponding Techniques
5	A Connection And Combination Based Research For Subtree Mining
6	Study On Frequent Subtree Sequence Mining
7	Research On Frequent Subtree Mining And Pruning Strategies
8	Research On Frequent Pattern Mining In XML
9	Research And Application Of Data Mining Algorithm Based On Graph Pattern
10	Research On Frequent Subtree Mining