
Research and Application of Time Series Mining Method Based on Shapelet

Posted on: 2021-01-16 | Degree: Master | Type: Thesis
Country: China | Candidate: M W Zang
GTID: 2428330614963908 | Subject: Computer technology
Abstract/Summary:
Time series data is an important research object in data mining. It arises widely in many fields and has considerable research value. Time series data is characterized by high dimensionality and large volume. Supervised mining algorithms require substantial human effort to attach class labels to data objects. When a data set carries no class information, unsupervised approaches are a more efficient way to group large amounts of data. In recent years, as emerging concepts such as cloud computing and big data have come into wide use, research on unsupervised approaches such as clustering algorithms has also grown, because these algorithms can extract useful information from large amounts of data. To address the characteristics of time series mining in the context of big data, this thesis proposes ESUs, a time series mining algorithm based on the shapelet, a local subsequence feature. ESUs is a clustering algorithm whose local feature is the U-shapelet, an unsupervised form of the shapelet that offers strong interpretability and strong robustness to noise.

First, the thesis analyzes the principle behind ESUs. ESUs adopts the ideas that were used to improve the original shapelet discovery algorithm (OSF): it analyzes the characteristics of the original and improved algorithms and draws on their improvements to the representation method and the distance measure. Second, an enhanced symbolic representation is proposed. It adds trend information to the symbolized sequence so that the dimension-reduced representation does not lose too much of the series' trend. Third, for the distance measure, the thesis proposes the symbolized Hamming distance, a measure suited to symbolized sequences that is simple and cheap to compute yet still reflects significant differences between symbolic subsequences. Finally, the two improvements are applied to the original U-shapelet clustering algorithm (OUs), and experiments show that ESUs significantly improves both accuracy and time efficiency.

This thesis also considers the practical engineering value of the algorithm and implements a distributed time series processing system. The system follows the currently popular distributed microservices approach and splits the system services into three modules: a user registration, login, and authentication module; a time series storage and sorting module; and a data analysis module. In addition, a simple web interface provides a graphical front end with the following functions: server subscription management, processor status management, and data analysis operations. The service functions are loosely coupled and easy to extend, and the data processing function includes a large set of data processing pipeline templates that are convenient to use and maintain.
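The abstract does not specify how trend information is encoded or how the symbolized Hamming distance weights differences. The Python sketch below shows one plausible reading of the enhanced symbolic representation and the symbolized Hamming distance, and how a U-shapelet could be matched against a series with them. It rests on several assumptions that do not come from the thesis: standard SAX with a four-letter alphabet over PAA segments, a hypothetical three-valued trend symbol per segment (up, down, or flat, taken from the segment's endpoint difference and a threshold `trend_eps`), and a distance that adds 1 for each level mismatch and 1 for each trend mismatch. The names `enhanced_sax`, `symbolic_hamming`, and `shapelet_distance` are illustrative only.

```python
import numpy as np

# Standard SAX breakpoints for a 4-letter alphabet (equiprobable under N(0, 1)).
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])
LETTERS = "abcd"


def znorm(x, eps=1e-8):
    """Z-normalize a 1-D series; a (near-)constant series maps to all zeros."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return np.zeros_like(x) if std < eps else (x - x.mean()) / std


def enhanced_sax(x, n_segments, trend_eps=0.1):
    """Symbolize x into n_segments two-character symbols: level + trend.

    level: SAX letter of each segment mean (the PAA value) of the z-normalized series.
    trend: 'u' (up), 'd' (down) or 'f' (flat) from the segment's last-minus-first
           point; this trend encoding is a hypothetical choice for illustration.
    """
    if len(x) < n_segments:
        raise ValueError("series must have at least n_segments points")
    z = znorm(x)
    word = []
    for seg in np.array_split(z, n_segments):
        level = LETTERS[np.searchsorted(BREAKPOINTS, seg.mean())]
        delta = seg[-1] - seg[0]
        trend = "u" if delta > trend_eps else ("d" if delta < -trend_eps else "f")
        word.append(level + trend)
    return word


def symbolic_hamming(w1, w2):
    """Assumed symbolized Hamming distance: per position, +1 for a level
    mismatch and +1 for a trend mismatch."""
    if len(w1) != len(w2):
        raise ValueError("words must have the same length")
    return sum(int(a[0] != b[0]) + int(a[1] != b[1]) for a, b in zip(w1, w2))


def shapelet_distance(shapelet_word, series, length, n_segments):
    """Smallest symbolic distance between a (U-)shapelet word and any window
    of `length` points in `series` (inf if the series is shorter than length)."""
    return min(
        (symbolic_hamming(shapelet_word,
                          enhanced_sax(series[s:s + length], n_segments))
         for s in range(len(series) - length + 1)),
        default=float("inf"),
    )


if __name__ == "__main__":
    rising = np.linspace(0.0, 1.0, 64)
    print(enhanced_sax(rising, 4))          # ['au', 'bu', 'cu', 'du']
    print(enhanced_sax(rising[::-1], 4))    # ['dd', 'cd', 'bd', 'ad']
    print(symbolic_hamming(enhanced_sax(rising, 4),
                           enhanced_sax(rising[::-1], 4)))  # 8
    ramp = np.linspace(0.0, 1.0, 16)
    series = np.concatenate([np.zeros(20), ramp, np.ones(20)])
    print(shapelet_distance(enhanced_sax(ramp, 4), series, 16, 4))  # 0
```

The sliding-window minimum in `shapelet_distance` follows the usual shapelet convention: the distance between a shapelet and a series is its best match over all windows. Because the symbolic distance only counts mismatches, comparing two words of length w costs O(w). This fits the abstract's claim that the measure is simple and cheap to compute.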
Keywords: time series, data mining, clustering, shapelet, distributed