Font Size: a A A

Research Of High Efficient Data Mining Method Of Rough Set Based On Divide And Conquer Method

Posted on:2012-03-25Degree:DoctorType:Dissertation
Country:ChinaCandidate:F HuFull Text:PDF
GTID:1118330371994836Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
In the methods of data mining, rough set (RS) is a valid mathematical theory for dealing with imprecise, uncertain, and vague information. It has been applied in such fields as machine learning, data mining, intelligent data analyzing and control algorithm acquiring, etc. In the research of rough set theory, there are some main steps:data preprocessing, data discretization and knowledge reduction (attribute reduction and value reduction). Through these steps, some useful knowledge can be extracted from original data. When the rough set theory is used to process large data set, large data set can be discreted and its dimensions can be decreased. That is, the complexity of processing large data set will decrease. Therefore, rough set theory is an effective way to data mining.However, some theories and methods about rough set need to be solved further, for example, the data mining methods with high accuracy and efficiency, the principle and methods for huge data mining based on rough set theory, the principle and method of processing dynamic data based on rough set theory, the power generalization of rough set theory, the data mining methods based on rough set theory under cloudy computing, and so on. Due to the above problems, it is necessary to do further research and afford better solutions.Divide and conquer method is a simple granular computing method, which can be used to process complex issues. When the size of the data set is large, it is very difficult to be processed, even be impossible. By the divide and conquer method, the complex issue can be divided into a number of sub-problems, then these sub-problems will be solved recursively, and their solutions will be merged finally. Therefore, if the divide and conquer method can be used to data mining based on rough set theory, some novel methods with high accuracy and efficiency may be developed.In this dissertation, combining with the divide and conquer method, the data mining methods in rough set theory are studied. The contents are organized as follows. First, the existing theory and application of rough set theory are concluded. Second, the data mining methods based on rough set theory are studied. The main contents include:the complexity analysis for quick sorting a two dimension table, the quick algorithm for data discretization, the divide and conquer theory and method in rough set theory and quick algorithms for knowledge reduction based on rough set theory. The main contributions of this dissertation are listed as follows.(1) The average time complexity of sorting a multi dimension table is improved to be only O(nx(m+log n)), which improves the capalibitiy of processing huge data sets based on rough set theory.Based on the divide and conquer method, the average time complexity of sorting a two dimension table is improved to be O(nxmxlog n), where, m is the number of the attributes and n is the number of the records. The result has been applied to improve the existing algorithms for knowledge reduction. Good results of computing attribute core and knowledge reduction have been obtained by the improved result. The result may be important for huge data mining.(Chapter2)(2) Combining with dynamic clustering, a quick algorithm for discretization is developed. By the proposed algorithm, the discretization of huge data set can be processed quickly.Discretization plays an important role in rough set theory. So far, many efficient algorithms for global discretization of decision table have been proposed, but few of them can process data sets quickly with high recognition accuracy. In this dissertation, a two steps strategy is adopted. Firstly, the candidate cuts are dynamic clustered according to the importance of candidate cut points. Secondly, the clustered cut points are selected using heuristics. Then, a heuristic discretization algorithm is developed. The dynamic clustering can decrease the number of condition point cuts effectively which is useful to select candidate cut points. Experiment results show that the proposed algorithm can process data sets quickly with high recognition accuracy.(Chapter3)(3) Based on the divide and conquer method, the algorithms for attribute reduction and value reduction are developed. The proposed algorithms can be used to knowledge reduction of huge data sets.Divide and conquer method is an effective tool to process complex problem for its capability to divide the original problem with big size into many sub-problmes with small sizes. Thus, if the divide and conquer method can be used to design the algorithms based on rough set theory, some high efficient methods may be obtained. In this dissertation, the knowledge reduction methods by divide and conquer method are studied. Firstly, combining with divide and conquer method, the decomposing method of decision table under equivalence relation is presented, which can be used to compute positive region, attribute core, attribute reduction and operate discernibility matrix. Secondly, by the divide and conquer method, the decomposing method of decision table under tolerant relation is also presented, which can be used to compute value reduction of decision table. Thirdly, the abstract processes based on the divide and conquer method for designing rough set algorithms is proposed, which is useful to develop high efficient algorithm of rough set theory. Besides, a case of an algorithm for computing positive region is presented to illustrate the proposed abstract processes. Based on the proposed principle using divide and conquer method, the algorithms for attribute reduction and value reduction based on divide and conquer method are studied. In these two algorithms, the divide and conquer method is used to divide the universe quickly and operate the discernibility matrix of the decision table. One is a quick algorithm for attribute reduction, which is the improved one of the algorithm by Wang Jue. The other is a value attribute reduction for certain rules based on the divide and conquer method. Simulation experimental results show that the recognition accuracy is high using the two proposed methods. Moreover, by the proposed methods in this dissertation, the network based intrusion detection data sets (KDDCUP99) over3000000records can be processed with high recognition accuracy on personal computer. The proposed method improves the data processing capalibitiy of existed data mining methods based on rough set theory.(Chapter4and Chapter5)...
Keywords/Search Tags:data mining, rough set, divide and conquer, discretization, knowledge reduction
PDF Full Text Request
Related items