Data Stream Clustering And Outlier Detection Based On Grid Coupling

Posted on: 2020-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Zhang
GTID: 2428330575489320
Subject: Computer software and theory

Abstract/Summary:
A data stream is a sequence of data that arrives sequentially, rapidly, continuously, and in large volumes over time. In recent years, with the development of the Internet and of hardware and software, more and more data streams have been generated across industries, and data streams have gradually become a mainstream form of data. These streams hide much interesting knowledge and many regularities which, once mined, can guide and inform people's decisions. Unlike static data, however, a data stream is unbounded, time-ordered, evolving, high-dimensional, and temporally local, so traditional data mining algorithms cannot be applied to it directly. How to mine useful information from these massive data streams to support decision-making is therefore a challenging problem that has received wide attention.

Data stream clustering and anomaly detection are two important research branches of data stream mining. Data stream clustering divides a continuously arriving data stream into clusters according to the similarity of its data objects. Data stream anomaly detection finds the data in a stream that deviate from normal values. To process data streams quickly, most existing data stream clustering and anomaly detection algorithms use a grid structure to summarize the stream. However, when mapping the data stream to grids and incrementally updating them, these algorithms ignore the interaction between grids and assume that the grids are independent of each other. As a result, the extracted summary information is inaccurate, which directly affects the accuracy of data stream clustering and anomaly detection algorithms.

To solve these problems, this thesis makes five contributions. First, the idea of grid coupling is proposed. Grid coupling means that, when the data stream is mapped to the grid and the grid is incrementally updated, the impact of a data change on the surrounding grids is taken into account, so that the correlation between data can be expressed more accurately. Second, a grid-coupling-based data stream clustering algorithm, GCStream-CL, is proposed. When mapping data objects to grids, the algorithm considers the interaction between grids according to the distribution of data within each grid, and on that basis determines whether an update to one grid increases or decreases the weight of an adjacent grid. GCStream-CL then generates clusters by searching for density-connected grids and captures the evolution of clusters from changes in high-weight grids. Third, a grid-coupling-based data stream anomaly detection algorithm, GCStream-OD, is proposed. The algorithm also follows the idea of grid coupling when summarizing the data stream, and it proposes a pruning strategy that periodically scans the grid list and treats grids with small weights as possibly anomalous grids. Then, according to grid density and the distance from the majority of data objects, an anomaly factor is assigned to each low-weight grid to quantify its degree of anomaly. Fourth, GCStream-CL and GCStream-OD are evaluated on two synthetic data sets and three real UCI data sets. The GCStream-CL experiments examine parameter selection, data set processing, clustering quality, and clustering efficiency; the GCStream-OD experiments examine parameter selection, algorithm quality, memory usage, and algorithm efficiency. The experiments show that both GCStream-CL and GCStream-OD achieve high accuracy and efficiency. Fifth, an application case is designed for GCStream-CL: the algorithm is applied to the "Completion and Ranking of Monitoring Indicators in Yunnan Province in January-February 2016", yielding a more reasonable assessment of the economic situation of each county.
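
The abstract does not give the update rules of GCStream-CL, so the following Python sketch is only a rough illustration of the general idea of grid coupling, not the thesis's algorithm. It assumes a fixed positive coupling share passed to every adjacent cell (the thesis's rule may also decrease a neighbor's weight), and it assumes that clusters are connected components of cells whose weight reaches a threshold. The class name `CoupledGrid` and the parameters `cell_size`, `coupling`, `decay`, and `min_weight` are all hypothetical.

```python
from collections import defaultdict, deque
from itertools import product
import math


class CoupledGrid:
    """Toy grid summary in which each update also touches adjacent cells."""

    def __init__(self, cell_size=1.0, coupling=0.25, decay=1.0):
        self.cell_size = cell_size  # edge length of a grid cell (assumed)
        self.coupling = coupling    # weight share given to each neighbor (assumed)
        self.decay = decay          # fading per arrival; 1.0 disables it (assumed)
        self.weights = defaultdict(float)

    def cell_of(self, point):
        # Map a point to the integer index of the cell that contains it.
        return tuple(math.floor(x / self.cell_size) for x in point)

    def neighbors(self, cell):
        # Cells whose index differs by at most 1 in every dimension.
        for offset in product((-1, 0, 1), repeat=len(cell)):
            if any(offset):
                yield tuple(c + o for c, o in zip(cell, offset))

    def insert(self, point):
        # Optionally fade all existing weights so that old data counts less.
        if self.decay != 1.0:
            for key in self.weights:
                self.weights[key] *= self.decay
        cell = self.cell_of(point)
        self.weights[cell] += 1.0
        # Coupling step: the update also raises the weight of adjacent cells.
        for nb in self.neighbors(cell):
            self.weights[nb] += self.coupling

    def clusters(self, min_weight):
        # Dense cells are those whose weight reaches the threshold.
        dense = {c for c, w in self.weights.items() if w >= min_weight}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            component, queue = set(), deque([start])
            # Breadth-first search over adjacent dense cells.
            while queue:
                cell = queue.popleft()
                component.add(cell)
                for nb in self.neighbors(cell):
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
            result.append(component)
        return result
```

In this sketch, calling `insert` on each arriving point and then `clusters(min_weight=...)` returns a list of sets of cell indices. Under the same assumptions, cells left below the threshold would be the natural candidates for the low-weight pruning step that the abstract describes for GCStream-OD.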
Keywords/Search Tags: Data stream, Data stream clustering, Data stream outlier detection, Grid coupling