The Research On Time Series Analysis Techniques

Posted on: 2014-10-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: VO THI THANH VAN
Full Text: PDF
GTID: 1268330425983974
Subject: Computer application technology
Abstract/Summary:
Data mining is the analysis of observed data sets in order to find models and to summarize the data in new ways that are both understandable and useful. Data arriving in time order arise in many fields, including physics, finance, medicine, and music. Time series are an important class of temporal data objects and can be easily obtained from financial and scientific applications. Time series analysis comprises methods and techniques for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Given how widespread time series data have become and the exponentially growing size of databases, there has recently been an explosion of interest in time series data mining. As extremely large time series data sets become more prevalent in a wide variety of settings, this thesis faces the significant challenge of developing efficient analysis methods. The research in this thesis addresses the problem of designing fast, scalable algorithms for the analysis of time series.

Research on time series analysis, with tasks such as preprocessing and transforming data for prediction, is meaningful and popular when the data are large. Data in general, and time series data in particular, can be preprocessed to improve the efficiency and ease of the mining and discovery processes. There are many data preprocessing techniques: cleaning techniques remove noise and correct inconsistencies in the data; integration techniques merge data from multiple sources into a coherent data store; transformation techniques normalize the data. Data reduction is one of the meaningful techniques in the preprocessing stage of time series analysis; it reduces the data size by aggregating data and eliminating redundant features.

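As a minimal illustration of the cleaning and transformation steps named above, the sketch below forward-fills missing observations and then z-normalizes the series. The function names `fill_missing` and `z_normalize` and the sample data are hypothetical, chosen only for this example; the abstract does not state which cleaning or normalization schemes the thesis uses.

```python
from statistics import mean, pstdev


def fill_missing(series):
    """Cleaning step (assumed scheme): replace None with the previous observation."""
    filled, last = [], None
    for x in series:
        if x is None:
            x = last  # remains None only if the series starts with a gap
        filled.append(x)
        last = x
    return filled


def z_normalize(series):
    """Transformation step (assumed scheme): rescale to zero mean, unit variance."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]  # a constant series carries no shape information
    return [(x - mu) / sigma for x in series]


# Hypothetical sample with two missing observations.
raw = [10.0, None, 12.0, 14.0, None, 16.0]
cleaned = fill_missing(raw)
normalized = z_normalize(cleaned)
```

Normalizing after cleaning keeps the gaps from distorting the mean and standard deviation used for rescaling.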
In general, time series predictability is a measure of how well the future values of a time series, that is, a sequence of observations, can be predicted. It indicates to what extent the past can be used to determine the future of the series. A time series generated by a deterministic linear process has high predictability, and its future values can be predicted very well from its past values. A time series generated by an uncorrelated process has low predictability, and its past values provide only a statistical characterization of its future values.

This thesis makes four major contributions.

First, we propose a data preprocessing method that reduces the dimensionality of a time series while preserving the shape of the original data. The method is based on the idea of turning points, defined as the points at which the trend of the time series changes. More precisely, the turning points are the points that separate two adjacent trends and have the shortest distance from the release time of announcements. Only some of the critical points are preserved; those regarded as interference factors are removed. The method considers only the critical points of each time series in a given period in order to reduce the data size by eliminating redundant features. Applied before the mining process, this preprocessing method can significantly improve both the overall quality of the patterns mined and the time required for the actual mining. Dimensionality reduction techniques in general are valuable for preprocessing a large data set before it is analyzed to discover knowledge.

Second, we propose a method for analyzing the trend of a time series. This is a short-term prediction task, in the sense of one-step-ahead prediction.
The combined method produces predicted values that are then used by trading rules to make decisions. In this task, clustering is applied first to group the data into clusters, so that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. We then consider the classification procedure, in which a classifier is constructed to predict trend labels such as “upward”, “no-trend”, or “downward” for the financial data. Trend classification is carried out in two sub-processes: learning and classification. The learning sub-process analyzes the data with a support vector machine, and the learned classifier is represented in the form of classification rules. The second sub-process estimates the accuracy of these rules on test data. If the accuracy is judged acceptable, the rules can be applied to classify new future values.

Third, we propose a method for predicting future values from historical values in an environment with multiple time series. We consider this an important component of procedures research, because the results often supply the foundation for decision-making models. Modeling time series data is a statistical problem, and time series prediction techniques have been used in many real-world applications. Prediction techniques are used in computational procedures to estimate the parameters of a model that is used to allocate limited resources or to describe random processes such as those mentioned above. The problem of predictive analysis in an environment with multiple time series is also addressed in this thesis.
In the machine learning approach, a support vector machine used for regression is called support vector regression. Support vector regression has been applied successfully to stream time series analysis, but its optimization step usually relies on quadratic programming packages. A sequential minimal optimization algorithm for the support vector machine can improve speed and reduce the long run times that quadratic programming incurs on large data sets.

Fourth, we propose an approach to business intelligence management. The approach addresses collecting and filtering stock time series streams and then reducing their dimensionality, so that different features can easily be optimized, combined, and tested to execute a fast similarity search based on the application’s requirements. Data collection is any process of preparing and collecting data; its purpose is to obtain information to keep on record and to pass on to others. When collecting data, it is important that the data are of high quality so that they can be used reliably as a basis for decisions. Data are primarily collected to provide information for this approach. The collected data can be not only stored but also analyzed and used for monitoring or evaluation. Business intelligence plays an important role in effective decision making, improving business performance and opportunities by understanding the organization’s environment through the systematic processing of information. We regard the business intelligence model as a group of tasks: gathering historical data, filtering the necessary data, and using them to predict future values. This model helps to improve the performance of the organization.
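The turning-point idea behind the first contribution can be sketched roughly as follows. This is an illustrative approximation, not the thesis's algorithm: it treats a turning point as any point where the direction of movement flips, ignores the announcement-time criterion mentioned in the abstract, and uses a hypothetical `min_move` threshold to stand in for the removal of "interference" points. All names and the sample prices are assumptions made for this example.

```python
def turning_points(series):
    """Indices where the direction of movement changes, plus both endpoints."""
    if len(series) < 3:
        return list(range(len(series)))
    keep = [0]
    prev_dir = 0
    for i in range(1, len(series)):
        diff = series[i] - series[i - 1]
        direction = (diff > 0) - (diff < 0)  # +1 up, -1 down, 0 flat
        if direction != 0:
            if prev_dir != 0 and direction != prev_dir:
                keep.append(i - 1)  # the previous point ended the old trend
            prev_dir = direction
    keep.append(len(series) - 1)
    return keep


def reduce_series(series, min_move=0.0):
    """Keep turning points whose move from the last kept point is at least min_move.

    min_move is a hypothetical noise threshold, not a parameter from the thesis.
    Returns a list of (index, value) pairs.
    """
    idx = turning_points(series)
    if len(idx) < 2:
        return [(i, series[i]) for i in idx]
    kept = [idx[0]]
    for i in idx[1:-1]:
        if abs(series[i] - series[kept[-1]]) >= min_move:
            kept.append(i)
    kept.append(idx[-1])  # always keep the final observation
    return [(i, series[i]) for i in kept]


# Hypothetical price series of ten observations.
prices = [10, 11, 13, 12, 11, 11.5, 14, 13, 12.5, 15]
all_points = reduce_series(prices)                 # indices kept: 0, 2, 4, 6, 8, 9
major_points = reduce_series(prices, min_move=2.0)  # indices kept: 0, 2, 4, 6, 9
```

Because only the retained (index, value) pairs are stored, the reduced series keeps the overall shape of the original while dropping points that lie inside a single trend.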
Keywords/Search Tags: Time series, Stream time series, Dimensionality reduction, Pattern matching, Time series trend analysis, Time series predictive analysis, Business intelligence management