
Analysis Of Time Series Data In Big Data Background

Posted on: 2019-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Yang
Full Text: PDF
GTID: 2370330572469012
Subject: Computational Mathematics
Abstract/Summary:
Against the background of big data analysis, this thesis systematically studies the mining and analysis of time series data from four perspectives: data characteristics, research content, application techniques, and application scenarios. Analyzing and modeling collected data is one of the most important steps in current research. As the volume and richness of data grow, it becomes possible to build effective mathematical models from the characteristics of the data, combined with relevant domain or industry knowledge, so as to explain everyday phenomena scientifically and to let data and models help us understand the nature of a problem. The thesis restates the problems of prediction and ordinal classification for time series data and analyzes other issues involving sequential data. Two application scenarios serve as pilot studies of current hot issues in time series mining and analysis in the big data context: hot spot discovery for individual stocks and commercial circle identification in POI data.

From the viewpoint of a mathematical researcher, prediction is a frequently discussed topic and is essentially a regression problem. The era of big data gives the regression problem a new character: it provides rich data sources for modeling, but it also places higher demands on the efficiency and effectiveness of prediction. Forecasting is no longer limited to the value at the next time step; it must also produce forecasts for several future points at once. In this study, monthly data based on macroeconomic indicators are used in prediction experiments for the next 3 months and the next 6 months, and the performance of different prediction algorithms is compared. In addition, when applied to regression problems, the LSTM model, a neural network for time series, tends to be difficult to tune, to be unstable, and to overfit small sample sets. To address this, a model selection strategy based on a validation set is added. This increases the training workload to some extent, but it helps secure the prediction accuracy and stability of the model.

Unlike general classification problems (such as distinguishing among passenger cars, taxis, cars, and trains), ordinal classification (such as labeling a stock market movement as soaring, slight increase, fluctuation, slight decline, or heavy fall) focuses on how to express the relationship between different classes. This thesis proposes an ordinal classification algorithm, Paired_Code, based on a pairwise comparison strategy, and presents the complete implementation steps: pairing samples to balance the sample set; designing the pairwise comparison tags used for coding; using the pairwise comparison strategy to transform the ordinal classification into an unordered classification, that is, a vector regression problem; and transforming the result of the unordered classification (vector regression) back into an ordinal class.

We also study the properties of class-tag codes for ordered and unordered classification. Unordered multiclass codes (such as one-hot coding) are mutually orthogonal in the inner product space: the angle between any two different class codes is π/2. For ordinal classification, however, the codes obtained after pairwise comparison are non-orthogonal, and the angular distance is ordered according to the classes of the paired samples. The angle between any two different categories lies in [arccos 1/(?), π/2).

In the numerical experiments on ordinal classification, the algorithm is compared with ordinal logistic regression [1,2] (referred to as LogisticOP), SVMOP [3,4], SVORIM [3,5], SVOREX [3,5], and the ELMOP algorithm. On test sets with a balanced class distribution, the Paired_Code algorithm performs well on ordinal metrics. However, Paired_Code is slower at classification, which can be improved in follow-up work.

In the context of big data, the demands on the analysis and mining of time series data have evolved along with the development of society. For example, techniques such as web crawlers, Chinese word segmentation, natural language processing, and cluster analysis are continually applied in analysis scenarios. In this study, hot spot discovery for individual stocks and commercial circle identification in POI data are taken as examples to introduce feasible strategies for discovery and identification. This provides a feasible construction plan for the analysis and mining of time series data.
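
The abstract describes forecasting several future points at once (3 or 6 months ahead) and choosing models on a validation set. The sketch below illustrates that general setup only; it is not the thesis's code. The function names (`make_direct_windows`, `repeat_last`, `repeat_mean`, `validation_mse`), the 12-month input window, and the two trivial candidate forecasters are all assumptions made for illustration, and the series is placeholder data rather than real macroeconomic indicators.

```python
def make_direct_windows(series, n_lags, horizon):
    """Split a series into (input window, multi-step target) pairs."""
    inputs, targets = [], []
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        inputs.append(series[start:start + n_lags])
        targets.append(series[start + n_lags:start + n_lags + horizon])
    return inputs, targets


# Two hypothetical stand-in forecasters; a real study would use trained models.
def repeat_last(window, horizon):
    return [window[-1]] * horizon


def repeat_mean(window, horizon):
    return [sum(window) / len(window)] * horizon


def validation_mse(forecaster, inputs, targets):
    """Mean squared error over all forecast points in the validation windows."""
    total, count = 0.0, 0
    for window, target in zip(inputs, targets):
        forecast = forecaster(window, len(target))
        total += sum((f - t) ** 2 for f, t in zip(forecast, target))
        count += len(target)
    return total / count


# Placeholder monthly series (24 points), not real data.
monthly = list(range(1, 25))
inputs, targets = make_direct_windows(monthly, n_lags=12, horizon=3)

# Hold out the most recent windows as the validation set (chronological split).
n_val = 3
val_inputs, val_targets = inputs[-n_val:], targets[-n_val:]

candidates = {"last_value": repeat_last, "window_mean": repeat_mean}
scores = {name: validation_mse(f, val_inputs, val_targets)
          for name, f in candidates.items()}
best_name = min(scores, key=scores.get)
print(best_name, scores)
```

The same selection loop applies unchanged if the candidates are several trained LSTM runs instead of these baselines: each run is scored on the held-out windows and the lowest-error one is kept, at the cost of extra training work.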
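
The geometric claim about class codes can be checked numerically. The sketch below compares one-hot codes with a cumulative ("thermometer") code. The thermometer code is an assumed stand-in for an ordered code: the abstract does not give the exact Paired_Code coding, so this example does not reproduce the thesis's interval, only the qualitative contrast between orthogonal and ordered, non-orthogonal codes. The function names are hypothetical.

```python
import math


def one_hot(k, num_classes):
    """Unordered code: a single 1 at position k."""
    return [1.0 if j == k else 0.0 for j in range(num_classes)]


def thermometer(k, num_classes):
    """Assumed ordered code: ones at positions 0..k, zeros afterwards."""
    return [1.0 if j <= k else 0.0 for j in range(num_classes)]


def angle(u, v):
    """Angle in radians between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


K = 5  # number of classes, chosen for illustration

# Angles between class 0 and every other class under each coding.
one_hot_angles = [angle(one_hot(0, K), one_hot(k, K)) for k in range(1, K)]
thermo_angles = [angle(thermometer(0, K), thermometer(k, K))
                 for k in range(1, K)]

print("one-hot:    ", one_hot_angles)
print("thermometer:", thermo_angles)
```

Under one-hot coding every angle equals π/2, so the code carries no information about class order. Under the assumed thermometer code the angles stay below π/2 and grow as the classes move further apart, which is the kind of ordered angular distance the abstract attributes to the pairwise comparison codes.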
Keywords/Search Tags:Time series data, Medium and long term forecast, Ordinal classification, Hot spot discovery, Commercial circle recognition