Research On Correlation Analysis Algorithms In Data Mining

Posted on: 2011-09-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Li
GTID: 1118330332960589
Subject: Computer application technology
Abstract/Summary:
Association rule mining has been one of the most active research directions in data mining because of its extensive applications in commerce. Algorithms for mining strongly correlated item pairs are an effective way to improve data-mining efficiency, and they are also a key approach to current mining problems in relational databases. In the traditional support-based framework for mining association rules, genuine correlations in the data may go undetected, while many rules with no real relevance may be produced. Researchers have therefore increasingly used statistical correlation measures to make up for these shortcomings of association rules. Correlation analysis is of great theoretical and practical significance for improving data discovery and search efficiency and for promoting database applications in all areas of society. Guided by the needs of this research background, this dissertation explores the mining of strong association rules and association patterns comprehensively and systematically.
First, to reduce the computational cost of candidate pairs, the Taper algorithm is extended into the TaperR algorithm according to the 1NF (first normal form) property. TaperR cuts the number of candidate pairs and thereby improves efficiency. Experimental results show that the new algorithm works well for mining all strong pairs, which makes it more suitable for real relational database systems.
Secondly, an efficient one-pass algorithm for obtaining strongly correlated item pairs is proposed that generates no candidate sets. Finding the support-based top-k strongly correlated item pairs essentially reduces to computing the supports of all 1- and 2-element itemsets and using these supports to identify the top-k pairs. The proposed approach stores the supports of all 1- and 2-element itemsets in a correlogram matrix, which is then used to calculate the correlation coefficient φ of every item pair and to extract the k most strongly correlated pairs.
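A minimal Python sketch of the one-pass correlogram idea follows. It assumes transactions are lists of integer item ids in range(n_items) and uses a dense upper-triangular count matrix as a stand-in for the correlogram; the function names are hypothetical and this is not the dissertation's implementation. For items A and B with relative supports P(A), P(B) and P(AB), the φ coefficient computed here is (P(AB) - P(A)P(B)) / sqrt(P(A)P(B)(1 - P(A))(1 - P(B))).

```python
from heapq import nlargest
from itertools import combinations
from math import sqrt


def build_correlogram(transactions, n_items):
    """Single pass over the data (hypothetical layout):
    cell [i][i] counts item i, cell [i][j] with i < j counts the pair {i, j}."""
    counts = [[0] * n_items for _ in range(n_items)]
    for t in transactions:
        items = sorted(set(t))
        for i in items:
            counts[i][i] += 1
        # combinations() over a sorted list always yields i < j
        for i, j in combinations(items, 2):
            counts[i][j] += 1
    return counts


def phi(sup_a, sup_b, sup_ab):
    """Phi coefficient of two binary items from their relative supports."""
    denom = sqrt(sup_a * sup_b * (1 - sup_a) * (1 - sup_b))
    return (sup_ab - sup_a * sup_b) / denom if denom > 0 else 0.0


def top_k_pairs(transactions, n_items, k):
    """Return the k (phi, i, j) triples with the largest phi, best first."""
    n = len(transactions)
    c = build_correlogram(transactions, n_items)
    scored = (
        (phi(c[i][i] / n, c[j][j] / n, c[i][j] / n), i, j)
        for i in range(n_items)
        for j in range(i + 1, n_items)
    )
    return nlargest(k, scored)
```

For example, with data = [[0, 1], [0, 1, 2], [0, 1], [2, 3], [2, 3], [3]], top_k_pairs(data, 4, 2) ranks pair (0, 1) first with φ = 1.0 and pair (2, 3) second with φ ≈ 0.33. The dense matrix needs memory quadratic in the number of distinct items, so this layout suits item vocabularies of moderate size.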
Experimental results verify the effectiveness of this one-pass method.
Thirdly, for mining the top-K all-strong-pairs in relational databases, a Top-K all-strong-pairs algorithm based on threshold estimation is proposed. It finds the K item pairs with the largest Pearson's correlation coefficients by exploiting structured information together with a comparison algorithm. Experiments show that the proposed method is effective.
Fourthly, an intelligent minimum-support suggestion framework based on a user-preference ontology is proposed. The system finds the queries most similar to the user's mining intention, aggregates them, and derives a recommended support range for the user to consult. With this method, the support threshold for the Apriori algorithm is no longer set purely subjectively but also draws on knowledge from other users' experience. This makes the user's query formulation more efficient, and the resulting rules tend to be closer to the user's requirements.
In addition, to find frequent and correlated pairs of patterns in structured databases, we develop an algorithm with powerful pruning capabilities. We discuss the applicability of the proposed algorithms to discovering pattern pairs in single and multidimensional structured databases and assess their effectiveness through experiments.
Lastly, a domain-knowledge-driven algorithm for mining image association patterns is proposed. The images contain many regions of interest (ROIs) with diagnostic significance: each ROI has its own attributes, spatial relationships exist among the ROIs, and the image itself also carries attributes and a description. Traditional relational data has none of these characteristics. Guided by domain knowledge, we extract ROI features while the images are pre-processed, and mine association rules over the class itemsets obtained by clustering on these features.
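The ROI-to-class-item step can be illustrated with a small Python sketch. Everything here is an assumption for illustration: each ROI is a dict of numeric attributes (names such as "area" and "density" are hypothetical), each attribute is discretized independently with a plain 1-D k-means, and each image becomes one transaction of class items. The abstract does not specify the dissertation's actual features or clustering method, and this sketch ignores spatial relationships among ROIs entirely.

```python
def kmeans_1d(values, k, iters=20):
    """Lloyd's k-means on scalar values; returns one cluster label per value."""
    lo, hi = min(values), max(values)
    if k <= 1 or lo == hi:
        return [0] * len(values)
    # Deterministic start: k centroids evenly spaced between min and max.
    centroids = [lo + (hi - lo) * c / (k - 1) for c in range(k)]

    def nearest(v):
        return min(range(k), key=lambda c: abs(v - centroids[c]))

    labels = [0] * len(values)
    for _ in range(iters):
        labels = [nearest(v) for v in values]  # assignment step
        for c in range(k):  # update step
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = sum(members) / len(members)
    return labels


def rois_to_transactions(images, features, k=2):
    """Hypothetical pre-processing: turn ROI attributes into class items.

    images   -- list of images, each a list of ROI dicts {feature name: number}
    features -- names of the numeric ROI attributes to discretize
    Returns one transaction (a set of "<feature>=c<cluster>" items) per image.
    Spatial relationships between ROIs are ignored in this sketch.
    """
    # Flat index of every ROI as (image index, ROI index within the image).
    slots = [(i, r) for i, rois in enumerate(images) for r in range(len(rois))]
    transactions = [set() for _ in images]
    for f in features:
        # Cluster this attribute over all ROIs of all images at once.
        labels = kmeans_1d([images[i][r][f] for i, r in slots], k)
        for (i, _), lab in zip(slots, labels):
            transactions[i].add(f"{f}=c{lab}")
    return transactions
```

For instance, rois_to_transactions([[{"area": 1.0, "density": 0.1}, {"area": 9.0, "density": 0.9}], [{"area": 1.2, "density": 0.8}]], ["area", "density"]) yields one transaction per image: {"area=c0", "area=c1", "density=c0", "density=c1"} for the first and {"area=c0", "density=c1"} for the second.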
For this task we propose the EXFP-GROWTH algorithm, which quickly mines task-related association rules by filtering out items that have no association with the mining task. The dissertation also presents example analyses of the results and discusses their research significance.
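The filtering idea behind EXFP-GROWTH can be sketched in Python as a pre-processing step in front of a frequent-itemset miner. This is an assumption-laden illustration, not EXFP-GROWTH itself: the set of task-relevant items is supplied by the caller (hypothetical parameter relevant_items), and the miner is a simplified pattern-growth routine that recurses on projected transaction sets instead of building a compressed FP-tree.

```python
from collections import Counter


def mine_frequent(transactions, min_count, prefix=()):
    """Simplified pattern growth (no FP-tree): extend `prefix` with each
    frequent item, then recurse on the transactions containing that item,
    restricted to frequent items that sort after it (avoids duplicates).
    `transactions` is a list of sets; returns {itemset tuple: support count}."""
    counts = Counter(item for t in transactions for item in t)
    frequent = sorted(i for i, c in counts.items() if c >= min_count)
    results = {}
    for idx, item in enumerate(frequent):
        itemset = prefix + (item,)
        results[itemset] = counts[item]
        later = set(frequent[idx + 1:])
        projected = [t & later for t in transactions if item in t]
        results.update(mine_frequent(projected, min_count, itemset))
    return results


def task_filtered_mining(transactions, relevant_items, min_count):
    """Hypothetical pre-filter: drop items unrelated to the mining task
    before any counting, then mine only what is left."""
    filtered = [set(t) & relevant_items for t in transactions]
    filtered = [t for t in filtered if t]  # empty transactions add nothing
    return mine_frequent(filtered, min_count)
```

With transactions [{"a", "b", "x"}, {"a", "b"}, {"a", "c", "x"}, {"b", "c"}], relevant items {"a", "b", "c"} and min_count = 2, the irrelevant item "x" never enters the search and the result is {("a",): 3, ("b",): 3, ("c",): 2, ("a", "b"): 2}. Filtering before mining shrinks every projected database, which is where a pre-filter of this kind can save work.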
Keywords/Search Tags: Data Mining, Multidimensional Structured Database, Association Rules, Top-K Strongly Correlated Item Pair, Domain-Knowledge-Driven