Data Stream Clustering Analysis And Anomaly Detection Method Research

Posted on: 2016-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: W Ji
Full Text: PDF
GTID: 2348330491961525
Subject: Computer application technology
Abstract/Summary:
As information technology is applied ever more widely in industrial production, production processes generate large volumes of data, and this real-time data reflects the state of the production system. Analyzing the data with data mining techniques can improve production efficiency, optimize the production process, and enable real-time monitoring of production. Detecting anomalies in turn makes the production process safer, more efficient, and more environmentally friendly than before. Data stream clustering is an important technique for achieving these objectives, so research on data stream clustering methods has considerable significance and application value. Current data stream clustering algorithms have three main disadvantages: (1) strong dependence on parameters; (2) poor online anomaly detection performance; (3) inadequate handling of historical data.

This thesis studies data stream clustering methods and their application to real-time anomaly detection in chemical production. The specific work is as follows:

(1) The traditional grid clustering method is studied. Because that algorithm is sensitive to the setting of the grid width parameter, a grid clustering method based on rough sets, the SCGD (Soft Cutting the Grid of Dimension space) algorithm, is proposed. Rough sets are used to adjust the clustering granularity in each dimension of the data set and so produce dimension-space clusters. This reduces the effect of unreasonable manually set grids, so that grid sizes better match the actual data distribution. It also reduces the number of grids and effectively improves clustering accuracy and speed.

(2) To address the inability to detect anomalies online in chemical production data, a dynamic incremental clustering method based on information entropy is proposed. Building on the density-grid idea, the thesis proposes DSC-Stream (The Dimension Space Cluster-Stream), a data stream clustering algorithm based on maximum information entropy over the dimension cluster space. The algorithm inherits the fast data processing of density-grid algorithms and combines it with the maximum information entropy principle to detect abnormal data online.

(3) A damping model that simulates the characteristics of human memory is proposed to attenuate historical data in the data stream, which offers a good solution to the handling of historical data. Unlike previous models, the damping model does not treat time as the only factor: different types of data follow different attenuation strategies. By discarding the dross and keeping the essence, it provides a high-quality "source" for data mining and improves the quality and efficiency of mining.

In addition, the sensitivity of the algorithm to its parameters is reduced by using the similarity of dimension clusters. Because this similarity is not constrained by a specific density threshold, the algorithm adapts better to stream data. Experiments on UCI data sets and the TE data set show that the algorithms proposed in this thesis are effective.
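
The general density-grid idea that the abstract builds on can be illustrated with a minimal sketch. This is not the thesis's SCGD or DSC-Stream algorithm: it assumes a fixed grid width, a single exponential time-fading rate (a common simplification, whereas the thesis's damping model uses more than time alone), and a plain density threshold in place of the maximum-entropy criterion. The class name `DecayedGrid` and all parameter values are hypothetical.

```python
import math
from collections import defaultdict


class DecayedGrid:
    """Toy density grid over a stream; NOT the thesis's SCGD or DSC-Stream."""

    def __init__(self, cell_width=1.0, decay=0.1, sparse_threshold=1.5):
        self.cell_width = cell_width              # hypothetical fixed grid width
        self.decay = decay                        # hypothetical fading rate per time unit
        self.sparse_threshold = sparse_threshold  # hypothetical density cut-off
        self.density = defaultdict(float)         # cell -> time-decayed point count
        self.last_seen = {}                       # cell -> time of last update

    def _cell(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def update(self, point, t):
        """Insert a point at time t; return True if its cell is still sparse."""
        cell = self._cell(point)
        elapsed = t - self.last_seen.get(cell, t)
        # Fade the old density by the elapsed time, then count the new point.
        faded = self.density[cell] * math.exp(-self.decay * elapsed)
        self.density[cell] = faded + 1.0
        self.last_seen[cell] = t
        return self.density[cell] < self.sparse_threshold


grid = DecayedGrid()
stream = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.5), (9.0, 9.0)]
flags = [grid.update(p, t) for t, p in enumerate(stream)]
print(flags)  # [True, False, False, True]
```

The first point in any cell is always flagged, so a practical detector would need a warm-up period or a second check before reporting an anomaly. The last point is flagged because it falls in a cell far from the dense region.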
Keywords/Search Tags:data-stream cluster, rough set, grid, maximum entropy, anomaly detection