
Research On Method And Application Of Data Mining Combining Global And Local Analysis

Posted on: 2008-10-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J L Wang
Full Text: PDF
GTID: 1118360215493959
Subject: Computer application technology
Abstract/Summary:
In recent years, data mining and its applications have spread into many disciplines and produced results in diverse fields: many data mining theories and algorithms have been proposed, many kinds of data have been mined, and a variety of data mining platforms have been developed. Data mining is an interdisciplinary science, and researchers tend to approach it from the perspective of their own fields. For example, frequent pattern mining studies fast counting over large-scale data sets, whereas global models that summarize the data come from machine learning and statistics.

This thesis investigates data mining by combining global model construction with local pattern mining, and in particular studies methods and applications of global data mining based on local information. First, following the data mining process, we divide the problem into three phases: improving the efficiency of mining using granular representation, improving the interpretability of local mining using global mining, and improving the efficiency and accuracy of global mining. We then investigate these three problems in turn. Finally, using literature data, we study topic evolution and topic influence in order to understand community development.

In summary, the thesis includes the following work:

(1) Two algorithms, GB-FIM and GrC-FIM, for frequent itemset mining. Because data mining in distorted databases incurs enormous overhead compared with normal data sets, the thesis presents two algorithms based on granular representation to address the efficiency problem of frequent itemset mining in distorted databases. In particular, using the key granule concept and granule inference, the support counts of candidate non-key frequent itemsets can be inferred from the counts of their frequent sub-itemsets obtained earlier in the mining process. This eliminates the tedious support reconstruction for these itemsets.
Moreover, accuracy improves on dense data sets, while on sparse ones it stays the same.

(2) A novel post-processing approach for summarizing frequent sequential patterns. To reduce the number of frequent sequential patterns mined, we propose a hybrid distance function that combines the support and the similarity of patterns, and use it to compress the sequential patterns and extract their order information. We propose a uniform measure function to balance efficiency and quality, and describe the process of obtaining approximate partial orders from clusters. A thorough experimental study on both real and synthetic data sets shows that ApproxPO-Mix can compress sequences into high-quality partial orders efficiently.

(3) Using local patterns to improve global model construction. First, we propose a heuristic algorithm for constructing a global partial order efficiently. By using local sequential pattern information, the method avoids the high mathematical complexity of constructing the partial order model and can obtain accurate results. Moreover, it can reflect users' requirements regarding the support difference between events. Then, to address the interpretability of local information in the global partial order, we propose a transitive closure property to improve the information representation of the dynamic Bayesian network. With this property, a more accurate model can be constructed to represent the temporal features of an evolving research field.

(4) Investigating topic evolution and influence relationships in literature mining. The thesis uses the proposed dynamic Bayesian network model to investigate topic understanding, and experiments on real data sets indicate that the method can effectively discover interesting models and help researchers gain better insight into community evolution in various research fields.
In addition, we propose a solution that first discovers hidden topics and the relative change of their intensity, and then uses them to construct a module network. In this way, we can produce a generalized module across different topics. To eliminate the instability of topic intensity when analyzing topic changes, we adopt a piecewise linear representation so that topic influence can be modeled accurately. Experiments on real data sets validate the effectiveness of the proposed method.
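
The piecewise linear representation mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not say which segmentation procedure the thesis uses, so the greedy sliding-window method below is an assumption chosen for brevity. The names `max_deviation`, `piecewise_linear`, and `tolerance`, and the sample intensity values, are hypothetical.

```python
from typing import List, Tuple


def max_deviation(series: List[float], start: int, end: int) -> float:
    """Largest vertical gap between the series and the straight line
    joining the points at indices start and end."""
    if end - start < 2:
        return 0.0  # no interior points to compare
    slope = (series[end] - series[start]) / (end - start)
    return max(
        abs(series[i] - (series[start] + slope * (i - start)))
        for i in range(start + 1, end)
    )


def piecewise_linear(series: List[float], tolerance: float) -> List[Tuple[int, int]]:
    """Greedy sliding-window segmentation (an assumed, generic method).

    Returns (start_index, end_index) pairs; consecutive segments share
    an endpoint, so the result is a connected broken line.
    """
    segments: List[Tuple[int, int]] = []
    start, end = 0, 1
    while end < len(series):
        # Extend the current segment while the next point still keeps
        # every interior point within the tolerance.
        while end + 1 < len(series) and max_deviation(series, start, end + 1) <= tolerance:
            end += 1
        segments.append((start, end))
        start, end = end, end + 1
    return segments


# Hypothetical topic intensity per time step: a rise followed by a decline.
intensity = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
print(piecewise_linear(intensity, tolerance=0.1))  # -> [(0, 3), (3, 6)]
```

Each returned segment summarizes a stretch of the series by one slope, which smooths out small fluctuations in intensity before trends are compared. A larger `tolerance` yields fewer, coarser segments.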
Keywords/Search Tags: data mining, global, local, granular computing, Bayesian network, literature analysis