
System Anomaly Analysis Based On Big Data

Posted on: 2021-04-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T S Li
Full Text: PDF
GTID: 1488306305453134
Subject: Control theory and control engineering
Abstract/Summary:
With the development of new technologies such as the Internet and the Internet of Things, the era of big data has arrived. The volume of data is massive, and the relations among the data are very complex. However, only a small fraction of the collected data is valuable; this fraction is the so-called 'anomaly data'. Some anomaly data are negative, such as industrial alarm data; some are positive, such as good news. Research on how to find rare anomalies or exceptions quickly and accurately in a flood of data is therefore important in both theory and practice. This dissertation studies methods for detecting anomaly patterns based on big data. According to data type, the research is divided into three parts: anomaly detection for structured data, anomaly detection for unstructured data, and correlation between anomaly data. The main contents of this dissertation are as follows:

1. A multi-variate alarm detection method for structured data is studied. An extremum extraction algorithm is proposed to extract the variables used for multi-variate alarms. The algorithm pulls out the variables related to the alarm from many monitored variables, one by one, through a procedure similar to a Markov chain. A reward function measures the relation between each candidate variable and the alarm. Based on the sample data extracted with the proposed algorithm, a multi-variate alarm detection model is built with a machine learning method, which improves the accuracy of alarm detection. The proposed method is applied to the icing detection problem of wind turbine blades. A blade icing alarm model is built using a delay timer. The model is verified with real wind turbine monitoring data; the results show that it identifies the variables related to blade icing and gives good prediction results.

2. An anomaly text document detection method for unstructured data is studied. The concept of a text feature cluster is proposed for unstructured data, and a detection algorithm is designed to find the text feature cluster based on the traditional CHI statistical algorithm with an improved TF-IDF word frequency weight. A text feature cluster describes a text more precisely than a single text feature, so it helps improve the accuracy of text classification. A text vector space model is then built to detect the text class based on the text feature cluster concept. The proposed method is applied to massive numbers of bidding announcement documents from the internet. The bidding announcement documents are taken as the text sample set, and the documents each user focuses on are taken as one anomaly text set. The text feature cluster of each user is obtained with the proposed detection algorithm, and the text vector space model then determines whether data acquired in real time from the internet meet the needs of the users. Classification with the text feature cluster is more accurate than traditional classification with the text feature.

3. Correlation between alarms is studied. An improved agglomerative hierarchical clustering algorithm is proposed to cluster time-sequenced alarm data. Taking the time of alarms as the direction, the concept of a time pseudo vector is then proposed in the time dimension. Conditional probability is used to measure the strength of the time pseudo vector relations. With these two concepts, a mining algorithm is designed to detect vector relations between alarms; it computes the degree of correlation between alarms in massive alarm data by statistical analysis. A pre-set correlation threshold determines whether the correlation between alarms is strong, and a strong correlation helps predict the direction and probability of backward alarm transmission. The proposed method is applied to the massive alarm data of a power plant, and good results are obtained.
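The conditional-probability measure described in part 3 can be illustrated with a minimal Python sketch. This is not the dissertation's algorithm: the function names, the `(time, tag)` event format, and the fixed time window are assumptions made for illustration, and the clustering step is omitted. The sketch estimates, for each ordered pair of alarms (a, b), the fraction of occurrences of a that are followed by b within the window, then keeps the pairs that reach a pre-set threshold.

```python
from collections import defaultdict


def directed_alarm_correlation(events, window):
    """Estimate P(b occurs within `window` after a) for each ordered pair (a, b).

    `events` is an iterable of (time, tag) tuples (hypothetical format).
    Returns a dict mapping (a, b) to the estimated conditional probability.
    """
    events = sorted(events)           # order by time, then by tag
    occurrences = defaultdict(int)    # how often each alarm occurs
    followed = defaultdict(int)       # how often a is followed by b in the window

    for i, (t_a, a) in enumerate(events):
        occurrences[a] += 1
        seen = set()                  # count each follower once per occurrence of a
        for t_b, b in events[i + 1:]:
            if t_b - t_a > window:
                break                 # later events are outside the window
            if b != a and b not in seen:
                seen.add(b)
                followed[(a, b)] += 1

    return {pair: n / occurrences[pair[0]] for pair, n in followed.items()}


def strong_relations(probabilities, threshold):
    """Keep only the directed pairs whose probability reaches the threshold."""
    return {pair: p for pair, p in probabilities.items() if p >= threshold}
```

Because the pairs are ordered, (a, b) and (b, a) are scored separately, which is what gives the relation a direction in time. In this sketch, alarms with identical timestamps are ordered by tag, an arbitrary choice that a real implementation would need to revisit.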
Keywords/Search Tags: big data, anomaly detection, text feature cluster, text vector space, alarm correlation