
Characteristic Identification And Dimensionality Reduction Based Complex Data Forecasting And Classification Research

Posted on: 2016-04-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Shao
Full Text: PDF
GTID: 1109330488493387
Subject: Management Science and Engineering
Abstract/Summary:
Data prediction and classification are two of the most important research topics in data mining and have received widespread attention for a long time. Their theories draw on management science, economics, mathematics, computer science and other disciplines. They are now widely used in energy market analysis, financial market price prediction and risk control, biological information recognition, customer behavior analysis in business intelligence, and many other areas.
In recent years, with the rapid development of information and Internet technology and the gradual maturing of cloud computing and big data analytics, research on complex data prediction and classification faces many opportunities and challenges. On the one hand, emerging technologies and industries have made the fast collection of large volumes of real-time, online data possible, so data mining techniques such as prediction and classification will play a more important role in many real applications. On the other hand, in terms of data dimensions, data types and sample size, the data under study are becoming increasingly diverse and complicated, which makes data mining research considerably more difficult. Research on complex data prediction and classification has therefore become harder than before, and it places higher demands on accurate, reliable and practical knowledge acquisition.
Based on a review and summary of data prediction and classification theory at home and abroad, we consider the complexity of frequency-domain characteristics, the complexity of data dimensionality, and multi-source data matching, and focus on the key problems of data prediction and classification.
In this dissertation, total electricity consumption data and electricity market price data are the main objects of study. Research and experiments on complex data prediction and classification are conducted and illustrated on these data. The research content is as follows:
(1) A probability density prediction framework based on semi-parametric regression and a similarity measure is proposed. First, the scope and problems of traditional data forecasting methods are analyzed systematically. On this basis, considering the characteristic of multi-source correlation, a new semi-parametric regression model based on a nonparametric smoothing strategy is constructed and combined with Bootstrap probability interval estimation. Second, because the operational mechanism between multiple influencing factors and the research object is difficult to determine, a recognition method based on a normalized K-L divergence similarity measure, factor analysis and causality testing is introduced to select and identify the leading variables. Finally, the effectiveness of the proposed prediction framework is verified experimentally. The framework may serve as a reference for exploring the complex operational mechanism between multiple influencing factors and research objects.
(2) A piecewise additive semi-parametric regression prediction framework based on dimension reduction is proposed. Inspired by the additive modeling framework, a piecewise additive semi-parametric regression prediction model is designed to analyze the periodic, multidimensional and multi-granularity characteristics of the research objects, and it is likewise combined with Bootstrap probability interval estimation. To ensure the feasibility and rationality of out-of-sample forecasting, an approach to simulating future trends based on Bootstrap resampling is presented. The framework incorporates additive modeling ideas for data dimension reduction. During the modeling stage, a variable recognition method is introduced to select and identify the leading variables. By modeling the probability distribution of the target variable, a feasible long-span extrapolative prediction method is presented, and the effectiveness and applicability of the framework for predicting electricity demand are demonstrated.
(3) A probability density prediction framework based on semi-parametric regression with feature extraction is presented. To handle the cyclical and multiple frequency-domain characteristics of the demand series, an EEMD-based frequency-domain decomposition method is used for multi-scale analysis. Frequency-domain feature selection and recognition methods are combined to separate the characteristic signals from the random signals of the original sequences, and the different frequency-domain signals are recognized and reconstructed. Then, by combining orthogonal least squares estimation with Bootstrap probability interval estimation, a new semi-parametric regression model based on nonparametric smoothing is constructed. With this model, the different characteristic signals and trend signals are modeled and forecast. The framework thus uses frequency-domain analysis for feature extraction and dimension reduction of the original data. By identifying the characteristic signals (seasonal and trend components) of the original demand and modeling the probability distribution of each separately, the framework provides new insight into non-stationary time series forecasting and analysis.
(4) A new high-dimensional data prediction (classification) framework based on feature selection and the support vector machine (SVM) is presented. Because traditional prediction methods (continuous numerical methods) are unstable when handling non-stationary series with extreme fluctuations, a new modeling approach is needed. From the viewpoint of data classification, data categories can be regarded as a description of how the object data vary. By defining the relevant category objects and category ranges, interval prediction can be converted into a pattern classification problem. By combining Filter and Wrapper feature selection methods, a new multi-class classification approach, which we call the SVM-RFE-MRMR classifier, is presented. A new complex data forecasting framework based on data classification is constructed by combining this classifier with PCA-DP time series segmentation and other methods, which provides new insight into traditional continuous time series prediction. The framework also offers a new way to analyze and predict continuous time series with non-stationary, high-frequency and high-dimensional characteristics.
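As a rough illustration of the conversion described in item (4), the sketch below maps continuous values to interval classes, so that a classifier could be trained to predict the band a value falls in instead of the exact value. It is not the dissertation's SVM-RFE-MRMR pipeline. The function name `to_interval_classes`, the sample prices and the band edges are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def to_interval_classes(values, edges):
    """Map each continuous value to the index of the interval it falls in.

    `edges` must be sorted ascending; k edges define k + 1 classes:
    class 0 is (-inf, edges[0]), class i is [edges[i-1], edges[i]),
    and class k is [edges[k-1], +inf).
    """
    if list(edges) != sorted(edges):
        raise ValueError("edges must be sorted in ascending order")
    # bisect_right counts how many edges are <= v, which is the class index.
    return [bisect_right(edges, v) for v in values]


# Hypothetical prices and band boundaries (assumptions, not from the source).
prices = [18.5, 42.0, 75.3, 30.0, 120.9]
edges = [30.0, 60.0, 90.0]

labels = to_interval_classes(prices, edges)
print(labels)  # [0, 1, 2, 1, 3]
```

In a setup like this, the resulting labels would serve as targets for a multi-class classifier such as an SVM, and the choice of band edges controls how coarse the resulting interval forecast is.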
Keywords/Search Tags:Data prediction, Data classification, Characteristic analysis, Dimension reduction, Semi-parametric regression model, Support vector machine, Frequency domain analysis, Feature selection