
Research On Relevant Technique Of Data Mining And Its Implementation

Posted on: 2005-07-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q D Li
GTID: 1118360122496912
Subject: Computer application technology
Abstract/Summary:
With the popularization of computers, advances in data storage technology and the widespread use of bar codes, large amounts of data are being stored. How to transform these data into useful information and knowledge has become a new topic in information science research. Data warehousing, OLAP and data mining provide an efficient way to address this problem.

This work was carried out while developing a dispatching decision analysis system for the Liaoyang area and a bank card analysis system for the Qingdao Jiaotong branch. It focuses on characteristics of data mining systems such as interactivity, multiple levels and complex data types, and addresses time series similarity mining, integrated mining, and the construction and application of a mining platform. Building on research into several data mining algorithms, the thesis designs and implements the SEI_OLAM platform. The following subjects are discussed.

First, locating the data mining space. Locating the mining space means finding the dimensions that matter for a given classification or prediction task; the process is essentially one of knowledge reduction. The thesis presents a knowledge reduction algorithm for locating the mining space that combines rough set theory with a parallel genetic algorithm. The algorithm is robust and produces excellent results; it helps users locate the mining space quickly, which improves the efficiency and precision of the mining process. The dimensions it identifies are those appropriate for data analysis, so they suggest a refined design of the data cube, which may in turn improve the quality of data warehouse construction. The thesis also concludes that the parallel method is more effective and efficient when the number of attributes is large.

Second, time series similarity matching. Time series are an important kind of complex data, and mining similar patterns in time series has recently attracted growing attention.
To improve the accuracy and efficiency of time series similarity matching, the thesis proposes a novel similarity matching algorithm. It first reduces the dimensionality of the time series with a wavelet packet transform, then builds a multidimensional index structure such as an R-tree on the selected coefficients, using Euclidean distance as the similarity measure; finally, it presents range query and k-nearest-neighbor query algorithms. The method considers not only the first few coefficients but also selected detail coefficients, so it captures more information from the time series. Experimental results on electrical load time series show the effectiveness of the algorithm. The similar load patterns it discovers are important for dispatch scheduling and the economical operation of the power system, and are therefore of practical value.

Third, integrating rough sets and artificial neural networks. To exploit the respective strengths of the two methods and improve precision, the thesis proposes a new data mining method based on rough sets and artificial neural networks. The attribute reduction algorithm based on the parallel genetic algorithm, presented above, is used to quickly select the most relevant inputs of the neural network; after the training data sets are reduced, an artificial neural network makes the predictions. The method combines the strong attribute reduction ability of rough sets with the high precision of neural networks. It has been applied to characteristic analysis of bank card clients with good results, and the parallel reduction algorithm can further improve the efficiency of the combined method.

Fourth, building on the research above, the thesis gives a general introduction to SEI_OLAM, an online analytical mining platform built on a data warehouse, and discusses its applications to electric area dispatching decision analysis and bank card analysis.
The thesis first introduces the architecture and functionality of the platform, and then the design and implementation of its main components: the data warehouse, online analytical proc...
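The knowledge reduction step of the first subject, rough set theory combined with a genetic algorithm, can be illustrated with a minimal Python sketch. The abstract does not give the thesis's chromosome encoding, fitness function or parallelization scheme, so all of these are assumptions here: chromosomes are 0/1 masks over the condition attributes, fitness is the rough set dependency degree plus a small tie-breaking bonus for fewer attributes, and a final greedy pass prunes the best mask down to a reduct. The names `dependency`, `fitness`, `tournament`, `ga_reduct` and the toy decision table are hypothetical.

```python
import functools
import random
from collections import defaultdict

# Sketch under assumptions: the bit-mask encoding, this fitness function and the
# greedy pruning pass are illustrative choices, not the thesis's exact algorithm.


def dependency(mask, rows, labels):
    """Rough set dependency degree gamma_B(d) = |POS_B(d)| / |U|.

    `mask` is a 0/1 vector selecting the condition attributes in B. Objects with
    equal values on B are indiscernible; an indiscernibility class belongs to the
    positive region only if all its objects share one decision label.
    """
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(v for v, keep in zip(row, mask) if keep)
        classes[key].append(label)
    positive = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return positive / len(rows)


def fitness(mask, rows, labels):
    """Dependency degree plus a tie-breaking bonus for smaller attribute subsets.

    The bonus is below 1/|U|, the smallest nonzero gap between two dependency
    degrees, so it never outweighs a difference in dependency.
    """
    size_bonus = (1 - sum(mask) / len(mask)) / (len(rows) + 1)
    return dependency(mask, rows, labels) + size_bonus


def tournament(population, scores, rng):
    """Binary tournament selection: return the fitter of two random masks."""
    a, b = rng.randrange(len(population)), rng.randrange(len(population))
    return population[a] if scores[a] >= scores[b] else population[b]


def ga_reduct(rows, labels, pop_size=20, generations=40, p_mut=0.1, seed=0, map_fn=map):
    """Return attribute indices of a reduct found by a GA plus greedy pruning.

    A reduct keeps the dependency degree of the full attribute set, and no
    attribute can be dropped from it without lowering that degree.
    """
    rng = random.Random(seed)
    n = len(rows[0])
    score = functools.partial(fitness, rows=rows, labels=labels)
    population = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    best, best_score = population[0], float("-inf")
    for _ in range(generations):
        # map_fn is the hook for evaluating the population in parallel.
        scores = list(map_fn(score, population))
        for mask, s in zip(population, scores):
            if s > best_score:
                best, best_score = mask, s
        children = [best]  # elitism: keep the best mask found so far
        while len(children) < pop_size:
            p1 = tournament(population, scores, rng)
            p2 = tournament(population, scores, rng)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = tuple(1 - g if rng.random() < p_mut else g for g in child)  # bit flips
            children.append(child)
        population = children

    full = (1,) * n
    target = dependency(full, rows, labels)
    # Start from the GA's best mask if it reaches gamma_C, else from all attributes.
    mask = list(best) if dependency(best, rows, labels) == target else list(full)
    # Backward pass: drop every attribute whose removal keeps gamma_C.
    for i in range(n):
        if mask[i]:
            mask[i] = 0
            if dependency(mask, rows, labels) < target:
                mask[i] = 1
    return [i for i, keep in enumerate(mask) if keep]


if __name__ == "__main__":
    # Hypothetical toy decision table: the label is a0 XOR a1.
    rows = [(0, 0, 0, 1), (0, 1, 1, 0), (1, 0, 1, 1), (1, 1, 0, 0), (0, 0, 1, 0), (1, 1, 1, 1)]
    labels = [a ^ b for a, b, *_ in rows]
    print(ga_reduct(rows, labels))
```

Fitness evaluation is the natural place to parallelize: passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` as `map_fn` spreads each generation's evaluations across processes (the `functools.partial` over a module-level function keeps the task picklable). This is a simple master-slave scheme and may differ from the parallel genetic algorithm used in the thesis.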
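The similarity matching pipeline of the second subject (wavelet packet transform, coefficient selection, Euclidean distance, range and k-nearest-neighbor queries) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's algorithm: it uses an orthonormal Haar wavelet packet, a hypothetical rule that keeps the coefficient positions with the most energy across the dataset (which may include detail coefficients), and a linear scan in place of the R-tree. Because every Haar split is orthonormal, the distance over any fixed subset of coefficients never exceeds the true Euclidean distance, so filtering on it causes no false dismissals. The class and function names are hypothetical.

```python
import heapq
import math

# Sketch under assumptions: Haar wavelet packet, an energy-based coefficient
# selection rule, and a linear scan standing in for the R-tree index.


def haar_step(x):
    """Split x into orthonormal Haar (approximation, detail) halves."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def wavelet_packet(x, level):
    """Full Haar wavelet packet tree to `level`; len(x) must be divisible by 2**level.

    Unlike the plain wavelet transform, detail nodes are split too. Every split
    is orthonormal, so the concatenated leaves keep the Euclidean norm of x.
    """
    nodes = [list(x)]
    for _ in range(level):
        nodes = [half for node in nodes for half in haar_step(node)]
    return [c for node in nodes for c in node]


def euclid(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


class SimilarityIndex:
    """Linear-scan stand-in for an R-tree over selected wavelet packet coefficients."""

    def __init__(self, series, level=2, keep=4):
        self.series = [list(s) for s in series]
        self.level = level
        coeffs = [wavelet_packet(s, level) for s in self.series]
        # Hypothetical selection rule: keep the `keep` positions with the largest
        # total energy over the dataset, which may include detail coefficients.
        energy = [sum(c[j] ** 2 for c in coeffs) for j in range(len(coeffs[0]))]
        self.positions = sorted(range(len(energy)), key=lambda j: -energy[j])[:keep]
        self.features = [[c[j] for j in self.positions] for c in coeffs]

    def _features_of(self, q):
        c = wavelet_packet(q, self.level)
        return [c[j] for j in self.positions]

    def range_query(self, q, eps):
        """Indices of all series within Euclidean distance eps of q."""
        fq = self._features_of(q)
        # The reduced distance lower-bounds the true one, so the filter is safe;
        # survivors are then checked against the raw series.
        return [i for i, f in enumerate(self.features)
                if euclid(fq, f) <= eps and euclid(q, self.series[i]) <= eps]

    def knn(self, q, k):
        """Exact k nearest neighbours, refining candidates in lower-bound order."""
        fq = self._features_of(q)
        order = sorted((euclid(fq, f), i) for i, f in enumerate(self.features))
        best = []  # max-heap of (-distance, index) holding the current k best
        for bound, i in order:
            if len(best) == k and bound > -best[0][0]:
                break  # every remaining lower bound exceeds the k-th best distance
            d = euclid(q, self.series[i])
            if len(best) < k:
                heapq.heappush(best, (-d, i))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, i))
        return [i for _, i in sorted((-nd, i) for nd, i in best)]


if __name__ == "__main__":
    # Hypothetical load curves of length 8 (real load series would be much longer).
    data = [[1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7, 9], [8, 7, 6, 5, 4, 3, 2, 1],
            [0] * 8, [2, 3, 4, 5, 6, 7, 8, 9]]
    index = SimilarityIndex(data, level=2, keep=4)
    query = [1, 2, 3, 4, 5, 6, 7, 8.4]
    print(index.range_query(query, 1.0), index.knn(query, 2))  # → [0, 1] [0, 1]
```

In a real system the reduced feature vectors would be stored in a multidimensional index such as an R-tree, which prunes whole groups of candidates instead of scanning them one by one; the lower-bounding argument stays the same.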
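The third subject's pipeline, attribute reduction followed by a neural network trained on the reduced inputs, can be sketched as below. The abstract does not specify the network, so a single sigmoid neuron trained by batch gradient descent stands in for it here. The reduct `[0, 2]`, the toy client table and the helper names `project`, `train_neuron` and `predict` are hypothetical; in practice the reduct would come from a rough set reduction step such as the one described in the first subject.

```python
import math

# Sketch under assumptions: a single sigmoid neuron stands in for the thesis's
# (unspecified) neural network, and the reduct is supplied by the caller.


def project(rows, reduct):
    """Keep only the condition attributes whose indices are in `reduct`."""
    return [[row[i] for i in reduct] for row in rows]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(X, y, lr=0.5, epochs=1000):
    """Fit one sigmoid unit by batch gradient descent on cross-entropy loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            # d(loss)/dz for sigmoid + cross-entropy is (prediction - target).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x):
    """Classify one reduced input vector with a 0.5 probability threshold."""
    w, b = model
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


if __name__ == "__main__":
    # Hypothetical client table with four binary attributes; the label is a0 OR a2,
    # and [0, 2] is assumed to be the reduct found by rough set reduction.
    rows = [(0, 1, 0, 1), (0, 0, 0, 0), (1, 1, 0, 0), (0, 1, 1, 1),
            (1, 0, 1, 0), (0, 0, 1, 1), (1, 0, 0, 1), (0, 1, 0, 0)]
    labels = [a | c for a, _, c, _ in rows]
    X = project(rows, [0, 2])
    model = train_neuron(X, labels)
    print([predict(model, x) for x in X])
```

Reducing the inputs first shrinks the network and its training cost; a multilayer network could replace `train_neuron` without changing the projection step.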
Keywords/Search Tags: data mining, data warehouse, OLAP, rough set, genetic algorithm, time series, artificial neural network, wavelet packet