
An Online Approach To Detecting Correlations Among Large-scale And Multi-source Data Stream

Posted on: 2018-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: X B Wang
Full Text: PDF
GTID: 2348330515983292
Subject: Software engineering
Abstract/Summary:
With the rapid development of sensing technology and the widespread adoption of the Internet of Things, tens of millions of sensors have been deployed in industrial control systems across many fields. Because these sensors continuously generate data streams and send them to the data center of an industrial control system, the volume of accumulated data has grown explosively. Effectively detecting the hidden correlations in such large-scale sensor data streams has therefore become an important and challenging research topic.

At present, many researchers focus on analyzing the statistical correlations among multi-sensor data streams. The straightforward method is to compute the correlation for each pair of sensor data streams. However, because different sensor data streams arrive at different speeds, there may be a delay between them: two sensor data streams may be statistically correlated only when one of them is delayed by t timestamps. To handle such a delay, one sensor data stream can be shifted by a range of distances, and the statistical correlation between the shifted stream and the other stream computed for each shift. The largest value is then selected as the final correlation between the two sensor data streams. However, this approach is too expensive for processing data streams.

In this thesis, we discuss the delay problem described above and model the correlation among multi-sensor data streams as a statistical correlation with time delay. To handle the challenge brought by the delay, we first cluster the input multi-sensor data streams by similarity. Building on existing work in the field of time series analysis, our proposed clustering method first symbolizes the input numeric sensor data streams. Using the frequent sequence length as the similarity among the symbolized sequences, it then discovers frequent sequences to perform the clustering, and finally performs multiple linear regression analysis in each cluster that contains more than two elements.

The main contributions of this thesis are as follows. Firstly, taking the features of sensor data streams into consideration, we model the correlations among multi-sensor data streams as statistical correlation with time delay from the perspective of correlation analysis. Secondly, drawing on studies in the field of time series analysis, we transform the input numeric sensor data streams into symbolic sequences. Using frequent sequence length as the similarity among symbolic sequences, we propose a similarity-based clustering method to cluster sensor data streams. Following this, we perform multiple linear regression analysis in each cluster to handle more than two data streams. Based on these algorithms, we design an online correlation detection framework to detect statistical correlation with time delay among multi-sensor data streams. Thirdly, based on the real scenario of a power plant, we design a correlation-based anomaly detection system. Extensive experiments on a real data set from a power plant verify the effectiveness and efficiency of our method.
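The shift-and-compare baseline that the abstract describes as too expensive can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes that "statistical correlation" means the Pearson coefficient and that candidate shifts range over a fixed window; the function names `pearson` and `best_lagged_correlation` and the `max_lag` parameter are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumed measure)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant window carries no correlation information
    return cov / sqrt(var_x * var_y)


def best_lagged_correlation(x, y, max_lag):
    """Try every shift in [-max_lag, max_lag] and keep the largest correlation.

    A positive lag pairs x[t] with y[t + lag], i.e. y is delayed relative to x.
    Returns (lag, correlation).
    """
    best = None
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        n = min(len(a), len(b))
        if n < 2:
            continue  # overlap too short to correlate
        r = pearson(a[:n], b[:n])
        if best is None or r > best[1]:
            best = (lag, r)
    if best is None:
        raise ValueError("no shift leaves an overlap of at least 2 samples")
    return best


# Usage: y repeats x two timestamps later, so the best shift should be 2.
x = [0, 1, 0, 2, 0, 3, 1, 0, 2, 5, 1, 0]
y = [0, 0] + x[:-2]
lag, r = best_lagged_correlation(x, y, max_lag=3)
print(lag, round(r, 6))
```

Each pair of streams costs on the order of `max_lag` times the window length, and the number of pairs grows quadratically with the number of sensors, which is consistent with the abstract's motivation for clustering the streams before any correlation analysis.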
Keywords/Search Tags:correlation, multi sensor data streams, online detection, anomaly detection