
Research On Business Data Analyzing Method Based On Empirical Mode Decomposition And Dynamic Data Mining

Posted on: 2009-02-19 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: H T Liu
GTID: 1118360245971893 | Subject: Information management and information systems
Abstract/Summary:
With the development of computer technology, the generation, collection, storage, and processing of data in enterprises have steadily improved. However, the sheer volume of data now limits the applicability of traditional data analysis methods. Meanwhile, data mining has emerged as a new interdisciplinary field that draws on machine learning, pattern recognition, databases, statistics, and artificial intelligence. Business data mining is an important application domain of data mining. Static business data analysis models are relatively mature, whereas research on applying empirical mode decomposition and dynamic data mining to business data analysis is still in its infancy.
Unlike Fourier-based spectral analysis, which assumes linear and stationary data, empirical mode decomposition (EMD) is a recently developed time-frequency analysis method. With the aim of analyzing business data by EMD and dynamic data mining, this dissertation investigates the basic theory and algorithms of EMD, proposes new data extension methods to weaken the end effect of EMD, and proposes new algorithms to improve the precision and efficiency of EMD. Building on the distinctive characteristics of EMD, on the covering algorithm for constructive neural networks, and on K-means clustering, and integrating these algorithms with other theories, it provides new solutions to problems in dynamic data mining and applies them to business data analysis. Specifically, the dissertation makes the following contributions:
1. The background of the research is discussed, the development of time-frequency analysis is summarized, and the state of the art of domestic and international research on EMD and dynamic data mining is reviewed. Finally, the main content, overall framework, and innovations of the dissertation are presented.
2. Basic concepts of EMD-based time-frequency analysis are introduced, and the fundamental principles and algorithms of the EMD-based Hilbert transform are given. Simulated signals are then analyzed with EMD-based time-frequency analysis to verify the method. The experimental results show that this method is well suited to analyzing nonlinear and non-stationary sequences.
3. The mechanism behind the EMD endpoint problem is explained, followed by a systematic study of the characteristics and performance of the mirror extension method and of neural-network-based data sequence extension. The advantages and defects of these extension techniques are then compared, and a data extension technique based on polynomial fitting is proposed.
4. The EMD algorithm is studied systematically to improve its efficiency and accuracy. First, a comparison of methods for obtaining the upper and lower envelopes of a sequence shows that cubic spline interpolation performs better than Hermite interpolation, and using cubic spline interpolation to obtain the envelopes of data sequences gives satisfactory results. Next, the problems that arise when envelopes are obtained with spline interpolation are described. An existing solution based on high-degree (higher than cubic) spline interpolation is examined and shown to improve EMD accuracy at the cost of additional computation time. Finally, based on the characteristics of the EMD algorithm, an improved EMD algorithm based on the means of successive extrema is proposed; experimental results show that it outperforms the original algorithm.
5. The problem of time-sequence similarity matching in dynamic data mining is studied. The alternative-covering algorithm is first used to classify sequences, and sequence matching is then performed. Pattern matching with the alternative-covering algorithm alone is effective, but two series with similar trends are sometimes not assigned to the same category because they differ in a few dimensions, which increases the number of "rejection points". To improve the accuracy of similarity matching, a sequence-matching algorithm based on EMD and the alternative-covering algorithm is proposed. Experiments show that this method reduces the number of "rejection points" and improves the accuracy of matching.
6. Clustering in dynamic data mining is studied. Because the sequences to be clustered are high-dimensional, dimension reduction is performed first. This chapter proposes a dimension reduction method that combines EMD with the bottom-up algorithm, and then combines it with the K-means algorithm to cluster data sequences effectively.
7. Within the National High-Tech Research and Development Program (863 Program) thematic project "Research on key technologies of business intelligence oriented manufacturing after-service" (2007AA04Z116), the sequence-matching algorithm based on EMD and the alternative-covering algorithm proposed in Chapter 5 is applied to personal credit evaluation: EMD extracts the trends of customers' credit history sequences, and the alternative-covering algorithm classifies these trends. By studying samples from each category, the characteristics of borrowers who do and do not default are identified, classification rules for measuring a borrower's default risk are derived, and a basis for consumer-credit decisions is provided. The clustering algorithm proposed in Chapter 6 is used to cluster customer behavior in supermarkets: customers are divided into categories (sub-markets) by clustering their transaction data, and each sub-market is then described by the commodities its customers purchase most frequently, so that targeted promotions and advertisements can be designed.
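Items 2 and 4 rest on the EMD sifting process: cubic spline envelopes are drawn through the local maxima and minima, their mean is subtracted, and the step is repeated until an intrinsic mode function (IMF) remains. The following is a minimal NumPy sketch under stated assumptions, not the dissertation's implementation. It uses a natural cubic spline and a fixed number of sifting iterations in place of a formal stopping criterion. It also simply pins both envelopes to the first and last samples, which is the naive end treatment whose distortion motivates item 3. The helper names `natural_cubic_spline`, `local_extrema`, `sift`, and `emd` are hypothetical.

```python
import numpy as np


def natural_cubic_spline(xk, yk, x):
    """Evaluate the natural cubic spline through knots (xk, yk) at points x."""
    xk = np.asarray(xk, dtype=float)
    yk = np.asarray(yk, dtype=float)
    n = len(xk)
    h = np.diff(xk)
    # Tridiagonal system for the knot second derivatives m, with m[0] = m[-1] = 0.
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    for i in range(1, n - 1):
        A[i, i - 1:i + 2] = [h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]]
        rhs[i] = 6.0 * ((yk[i + 1] - yk[i]) / h[i] - (yk[i] - yk[i - 1]) / h[i - 1])
    m = np.linalg.solve(A, rhs)
    # Evaluate the cubic on the interval [xk[j], xk[j+1]] that contains each x.
    j = np.clip(np.searchsorted(xk, x) - 1, 0, n - 2)
    hj = xk[j + 1] - xk[j]
    a, b = xk[j + 1] - x, x - xk[j]
    return ((m[j] * a**3 + m[j + 1] * b**3) / (6.0 * hj)
            + (yk[j] / hj - m[j] * hj / 6.0) * a
            + (yk[j + 1] / hj - m[j + 1] * hj / 6.0) * b)


def local_extrema(x):
    """Indices of interior local maxima and minima (a simple sign-change test)."""
    d = np.diff(x)
    maxima = np.flatnonzero((d[:-1] > 0) & (d[1:] <= 0)) + 1
    minima = np.flatnonzero((d[:-1] < 0) & (d[1:] >= 0)) + 1
    return maxima, minima


def sift(x, t, n_iter=10):
    """Extract one IMF with a fixed number of sifting iterations."""
    h = x.copy()
    last = len(h) - 1
    for _ in range(n_iter):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        # Naive end handling: both envelopes pass through the first and last samples.
        kmax = np.concatenate(([0], maxima, [last]))
        kmin = np.concatenate(([0], minima, [last]))
        upper = natural_cubic_spline(t[kmax], h[kmax], t)
        lower = natural_cubic_spline(t[kmin], h[kmin], t)
        h = h - (upper + lower) / 2.0  # subtract the mean of the two envelopes
    return h


def emd(x, t, max_imfs=6, n_iter=10):
    """Return (imfs, residue) such that x == sum(imfs) + residue."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema left to build envelopes: treat as the trend
        imf = sift(residue, t, n_iter)
        imfs.append(imf)
        residue = residue - imf
    return imfs, residue
```

By construction, the IMFs and the residue add back up to the input. This makes a convenient sanity check when experimenting with other envelope or stopping rules.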
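Once a signal has been decomposed into IMFs, the Hilbert step of item 2 converts each IMF into an instantaneous amplitude and frequency. The sketch below assumes NumPy only and builds the analytic signal with the FFT, which is the same construction `scipy.signal.hilbert` uses. The function names `analytic_signal` and `instantaneous_frequency` are hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal x + i*H[x]: zero the negative frequencies, double the positive ones."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0  # DC and Nyquist bins are kept once
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)


def instantaneous_frequency(imf, fs):
    """Instantaneous amplitude and frequency (Hz) of one IMF sampled at fs Hz."""
    z = analytic_signal(imf)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    freq = np.diff(phase) * fs / (2.0 * np.pi)  # one value per pair of samples
    return amplitude, freq
```

Applying this function to every IMF and plotting amplitude against time and frequency yields the Hilbert spectrum that the time-frequency analysis relies on.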
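Item 3 compares ways of extending a sequence beyond its end points before decomposition, so that the envelopes are not distorted at the boundaries. The sketch below shows two simple NumPy variants. Mirror extension reflects samples about each end point. The second variant extrapolates a low-degree polynomial fitted to the samples nearest each end. The window length, polynomial degree, and function names are illustrative assumptions; the abstract does not specify the dissertation's polynomial-fitting scheme.

```python
import numpy as np


def mirror_extend(x, k):
    """Reflect k samples about each end point (the end point itself is not repeated).

    Assumes k < len(x).
    """
    x = np.asarray(x, dtype=float)
    left = x[1:k + 1][::-1]
    right = x[-k - 1:-1][::-1]
    return np.concatenate((left, x, right))


def polyfit_extend(x, k, window=20, degree=3):
    """Extrapolate k samples at each end from a polynomial fitted to `window` end samples."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    idx = np.arange(window)
    # Left end: fit on the first `window` samples, evaluate at positions -k..-1.
    left_coef = np.polyfit(idx, x[:window], degree)
    left = np.polyval(left_coef, np.arange(-k, 0))
    # Right end: fit on the last `window` samples, evaluate just past the last one.
    right_coef = np.polyfit(idx, x[n - window:], degree)
    right = np.polyval(right_coef, np.arange(window, window + k))
    return np.concatenate((left, x, right))
```

A typical workflow extends the sequence, decomposes the extended signal, and then discards the `k` added samples at each end of every IMF.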
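Items 5 and 7 classify sequences with the alternative-covering algorithm. This is a constructive neural-network method that covers the training samples of each class with spherical domains, and a test point that falls inside no domain is a "rejection point". The following NumPy sketch is a simplified illustration of the covering idea, not the exact algorithm used in the dissertation. It uses Euclidean distance and visits the classes in turn. It also assumes at least two classes and no identical samples with conflicting labels. The function names are hypothetical.

```python
import numpy as np


def build_covers(X, y):
    """Greedy spherical covering; returns a list of (center, radius, label)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    uncovered = np.ones(len(X), dtype=bool)
    covers = []
    while uncovered.any():
        for label in np.unique(y):  # alternate between the classes
            candidates = np.flatnonzero(uncovered & (y == label))
            if candidates.size == 0:
                continue
            center = X[candidates[0]]
            dist = np.linalg.norm(X - center, axis=1)
            d1 = dist[y != label].min()  # nearest sample of another class
            d2 = dist[(y == label) & (dist < d1)].max()  # farthest same-class sample inside d1
            radius = (d1 + d2) / 2.0  # halfway between the two
            covers.append((center, radius, label))
            uncovered &= ~((y == label) & (dist < radius))
    return covers


def classify(covers, x):
    """Label of the cover that x lies deepest inside, or None for a rejection point."""
    x = np.asarray(x, dtype=float)
    best_label, best_margin = None, -1.0
    for center, radius, label in covers:
        margin = radius - np.linalg.norm(x - center)
        if margin >= 0 and margin > best_margin:
            best_label, best_margin = label, margin
    return best_label
```

In the EMD-based variant described in item 5, the input vectors would presumably be trend components extracted by EMD rather than raw sequences. The idea is that two series with similar trends then land in the same cover instead of becoming rejection points.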
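Item 6 reduces the dimension of each sequence with the bottom-up algorithm before clustering. A common form of bottom-up piecewise linear segmentation starts from the finest segmentation and repeatedly merges the adjacent pair of segments whose merged least-squares line has the smallest error. The sketch below assumes that form, uses a target segment count as the stopping rule, and takes the values at the segment boundaries as the reduced features. In the dissertation, the input would be an EMD-derived trend rather than a raw sequence. The function names are hypothetical.

```python
import numpy as np


def line_error(x, start, end):
    """Sum of squared residuals of the least-squares line through x[start..end]."""
    seg = x[start:end + 1]
    if len(seg) < 3:
        return 0.0  # a line through two points fits exactly
    t = np.arange(len(seg))
    coef = np.polyfit(t, seg, 1)
    return float(np.sum((np.polyval(coef, t) - seg) ** 2))


def bottom_up(x, n_segments):
    """Bottom-up piecewise linear segmentation; returns (start, end) index pairs.

    Adjacent segments share their boundary index.
    """
    x = np.asarray(x, dtype=float)
    segs = [(i, i + 1) for i in range(len(x) - 1)]  # finest segmentation
    while len(segs) > n_segments:
        # Cost of replacing each adjacent pair of segments with a single line.
        costs = [line_error(x, segs[i][0], segs[i + 1][1]) for i in range(len(segs) - 1)]
        i = int(np.argmin(costs))
        segs[i:i + 2] = [(segs[i][0], segs[i + 1][1])]  # merge the cheapest pair
    return segs


def segment_features(x, segs):
    """Reduced representation: the sequence values at the segment boundaries."""
    return [float(x[s]) for s, _ in segs] + [float(x[segs[-1][1]])]
```

For clarity, this version recomputes every merge cost on each pass. On long sequences, recomputing only the two costs that change after a merge is much faster.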
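After dimension reduction, item 6 clusters the sequences with K-means. A plain Lloyd's-algorithm sketch in NumPy follows. The random initialization, iteration cap, and convergence test are assumptions, not details taken from the dissertation.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means; returns (centroids, labels). Assumes n_iter >= 1."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: mean of the assigned points; an empty cluster keeps its centroid.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```

In the supermarket application of item 7, each row of `X` would be the reduced representation of one customer's transaction sequence, and each resulting cluster would correspond to a sub-market.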
Keywords/Search Tags: Empirical Mode Decomposition (EMD), Dynamic Data Mining, Business Data Analysis, Endpoint Issue, Spline Interpolation, Alternative-Covering Algorithm, Bottom-Up Algorithm, K-means Algorithm