The Research On Theoretic Methods And Application Of Data Mining In Databases

Posted on: 2006-01-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Luo
GTID: 1118360152470077
Subject: Control theory and control engineering

Abstract/Summary:
Data mining is one of the frontier research directions in the database and information decision fields. This dissertation studies data mining in databases. It mainly covers the following:
1. We review the development of database and data mining technology and discuss the current research, its shortcomings, and its trends. We analyze the causes and features of very large databases and discuss the main features of algorithms suited to mining them. Consequently, it is essential that all such data mining algorithms have near-linear time complexity, O(n), in the data size n.
2. We study the Apriori algorithm for mining association rules. We first point out its defect and then present an improved algorithm. Compared with the original, the new algorithm reduces the number of database scans and, in some cases, improves efficiency dramatically. Next, we present a fast algorithm for mining association rules based on a transaction tree. It is very efficient because it scans the transaction database only once and mines frequent itemsets without candidate generation. Experimental results show that it is about 10 times faster than Apriori. Finally, we offer a method for implementing the Apriori algorithm in Visual FoxPro.
3. We study the judgment criteria for association rules. The existing criteria are support and confidence. When rules are generated according to these criteria alone, many of them are invalid or false. To reduce invalid rules, we analyze the cause and present three improved methods, namely adding effect, relative confidence, or validity to the judgment criteria. According to the values of these measures, association rules are classified as positive, invalid, or negative. In general, only positive association rules are valid, and they make up only a small proportion of all strong association rules.
In addition, we present and test an algorithm that applies the new judgment criteria in mining association rules. The test results show that the methods clearly reduce the number of invalid association rules.
4. We propose standards for evaluating classifiers, derived from analyzing and comparing a variety of typical classifiers. Next, we discuss the SPRINT algorithm and present two new methods for processing categorical attributes. With these methods, SPRINT needs fewer operations to find the optimal splitting point and becomes more efficient. Finally, we present a fast data classification algorithm based on sampling. The algorithm is both scalable and parallel. Experimental results show that it is 10 to 50 times faster than SPRINT.
5. We propose standards for evaluating clustering methods, and identify trends, by analyzing and comparing a variety of typical clustering methods. Next, based on an analysis of the deficiencies of the BIRCH algorithm, we propose a new clustering algorithm based on sampling. Experimental results show that it is clearly faster than BIRCH.
6. To improve the efficiency of data mining, we suggest applying constraint and multidimensional techniques. We introduce the types of constraints in data mining and discuss which kinds of constraints can be applied in mining association rules. We also design a data mining system architecture.
7. We discuss the application of data mining in power systems. We study a fast algorithm for optimal power flow (OPF). Because OPF problems contain many boundary constraints on reactive power, we treat the general inequality constraints and the boundary constraints separately. We present a new method for solving OPF: a projected asymptotically semismooth Newton algorithm.
Numerical examples on several standard IEEE test systems show that the new algorithm has good convergence and computational performance.
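
The Apriori algorithm and the support/confidence criteria discussed in items 2 and 3 can be illustrated with a minimal Python sketch. This is the textbook level-wise Apriori, not the improved or transaction-tree algorithms of the dissertation, whose details the abstract does not give. The function names `apriori` and `rules` and the toy transactions are hypothetical. Lift is used here as a standard stand-in measure; it is an assumption for illustration and is not claimed to match the dissertation's effect, relative confidence, or validity, which the abstract does not define.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all frequent itemsets.

    Textbook level-wise Apriori: one full pass over the data per itemset size.
    """
    n = len(transactions)
    items = {item for t in transactions for item in t}
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # One scan of the whole transaction list for this level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates, then prune
        # every candidate that has an infrequent k-item subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, support, confidence, lift) tuples."""
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                cons = itemset - ante
                conf = supp / frequent[ante]
                if conf >= min_confidence:
                    # lift = confidence / support(consequent); an assumed
                    # stand-in measure, not the dissertation's criteria.
                    yield ante, cons, supp, conf, conf / frequent[cons]


# Hypothetical toy data: 10 transactions.
transactions = (
    [frozenset({"tea", "coffee"})] * 3
    + [frozenset({"tea"})]
    + [frozenset({"coffee"})] * 6
)

frequent = apriori(transactions, min_support=0.3)
strong = list(rules(frequent, min_confidence=0.7))
for ante, cons, supp, conf, lift in strong:
    print(sorted(ante), "->", sorted(cons), round(supp, 2), round(conf, 2), round(lift, 2))
```

On this toy data the only rule that passes both thresholds is tea -> coffee, with support 0.3 and confidence 0.75. Coffee alone appears in 90% of transactions, so the lift is 0.75 / 0.9, which is below 1: buying tea makes coffee less likely than average even though the rule counts as strong. This is the kind of rule that support and confidence alone cannot filter out. The sketch also scans the data once per itemset size, which is the cost that the scan-reducing variants described in item 2 aim to lower.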
Keywords/Search Tags: database, data mining, association rules, classification, clustering, algorithm