Research On Collective Anomaly Detection Approach For Mining Abnormal Patterns Of Sequence Data

Posted on: 2021-02-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X D Huang
GTID: 1368330614459970
Subject: Business Administration
Abstract/Summary:
Sequence data are data collected in a fixed order, where the order carries meaning and cannot be reversed during analysis. With the advent of the big data era, the Internet, the Internet of Things, and computational intelligence technology have developed rapidly, and the volume of data collected and exchanged by interconnected sensor devices has grown explosively. Sequence data, as one of the most common forms of collected information, exist widely in business processes. Examples include the vibration signals generated by machines, the flow of various kinds of traffic in the urban traffic system, the price fluctuation curves of the stock market, and the biological waveforms of the human body monitored in clinical medicine. Although these data differ significantly in attributes, structure, and relationships, they all share a sequence structure. As the main information output of a business process, sequence features often imply the specific laws and latent characteristics of the underlying system. Thus, how to analyze sequence data to reconstruct the dynamic behavior of the observed system, and to predict and regulate that system by mining business process patterns with management value, is an urgent problem.

Research on sequence data is a new interdisciplinary field that integrates mature theories and tools from databases, probability and statistics, machine learning, and artificial intelligence. In recent years, analysis methods based on pattern mining have been actively studied because they suit the characteristics of sequence data, such as temporality, relevance, and high dimensionality. According to application requirements, pattern mining research can be divided into frequent pattern mining and abnormal pattern mining. Although most research focuses on discovering frequent, periodically changing patterns, in some application scenarios the discovery of abnormal business process patterns is often more valuable.

In this dissertation, sequence data are first divided into temporal and spatial sequence data according to their data attributes. Second, according to the characteristics of each kind of sequence data and the target of abnormal pattern mining, we construct a metric of collective anomaly and transform the problem of abnormal pattern mining into one of collective anomaly detection. Finally, targeted collective anomaly detection methods and algorithm frameworks are designed to improve the efficiency of abnormal pattern mining for each kind of sequence data. The main research work is as follows:

1. For abnormal pattern mining with pre-labeled information, if the data to be tested come with sufficient labels for normal samples, the goal is to mine the pattern in the temporal data that is least similar to the normal pattern. On this basis, a collective anomaly detection approach based on data distribution fitting is proposed. In the model, a mixture of multiple Gaussian distributions is used to fit the distribution function of the collective anomaly in the sequence data. Using the maximum likelihood method, a similarity measure between the distribution characteristics of the sample data and those of the data to be tested is constructed, and a method for solving the likelihood equation based on fixed point iteration is designed. If instead the data to be tested come with sufficient labels for abnormal samples, the goal is to mine the pattern in the time series data that is most similar to the abnormal pattern. For this case, a collective anomaly detection approach based on the hierarchical clustering algorithm is proposed to match the characteristics of abnormal patterns. In the model, hierarchical clustering is first carried out according to different abnormality measurement rules, and a similarity measure between the sample data and the data to be tested is then constructed by comparing clusters at the same level with the information in clusters at the levels above and below. In addition, an improved clustering algorithm, FPK medoids (fixed point k-medoids), based on fixed point iteration is designed, which improves convergence efficiency by processing each cluster in parallel.

2. For mining abnormal patterns without sufficient sample labels, the boundaries of the patterns cannot be determined in advance, and there are no clear criteria for judging abnormal patterns. The goal is to partition the time series data along the boundaries between different patterns and then identify abnormal patterns by comparing the characteristics of each pattern. Hence, we propose a collective anomaly detection method based on the transition probability between patterns. In the model, an ant colony algorithm is first used to fit the data boundaries of the patterns in the sequence data, and the transition probability measure between patterns is then constructed from the pheromone concentration. In addition, a continuous ant colony algorithm based on the fixed point simplex method is designed to optimize the initial parameters.

3. For mining abnormal patterns in spatially homogeneous sequence data, sequence data with different spatial attributes describe similar behavior attributes, and these behavior attributes are intended to express the same goal. Most homogeneous sequence data are therefore generated by similar mechanisms and have similar data distributions. The main idea is to fuse the multi-source sequence data to eliminate the influence of spatial attributes and then analyze the fused data in the same way as temporal sequence data. Taking the fusion of multiple types of traffic data to predict real-time abnormal traffic states in cities as an example, this dissertation proposes a new collective anomaly detection method based on the fusion analysis of homogeneous sequence data at different resolutions. In the model, urban traffic information is first analyzed at three resolutions: single traffic data, multiple traffic data at traffic detection points, and traffic data at traffic hub points. The collective anomaly measure is then constructed by comparison with the overall trend of the data. In addition, an improved DDWk-medoids clustering algorithm is designed based on a "distance-density-weight" metric, which adaptively determines the optimal initial parameters, such as the number of clusters and the initial center points.

4. For abnormal pattern mining in spatially heterogeneous sequence data, sequence data with different spatial attributes describe different behavior attributes of the same object, but these behavior attributes are interrelated. Most heterogeneous sequence data therefore come from different generation mechanisms and differ in data category, structure, and distribution. The main idea is to fuse the analysis results of the multi-source sequence data according to the associations among the heterogeneous sequences and then mine the abnormal pattern from the fused results. For this, we design a collective anomaly detection method based on adaptive weighted fusion of heterogeneous sequence data. In the model, multi-window and correlation analysis techniques are first used to determine the correlations, and the weights are then determined adaptively through a two-layer particle swarm framework. In addition, an improved FP-PSO (fixed point based PSO) algorithm is designed based on the fixed point simplicial method: the approximate set of fixed points found by the fixed point simplicial method in the solution space is taken as the initial population, and the other parameters are set accordingly.
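
The distribution-fitting idea in item 1 can be illustrated with a minimal sketch. This is not the dissertation's algorithm. It assumes a one-dimensional series, a small Gaussian mixture fitted to labeled normal samples, and the standard EM update applied repeatedly until the parameters stop changing, which is one common way to treat the likelihood equations as a fixed point iteration. The function names (`fit_gmm`, `window_scores`, `most_anomalous_window`) and the windowed mean log-likelihood score are hypothetical choices made for illustration.

```python
import math
import random


def gauss_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_step(data, ws, mus, vs):
    """One EM update, i.e. one application of the map theta -> F(theta)."""
    k = len(ws)
    sum_r, sum_rx, sum_rxx = [0.0] * k, [0.0] * k, [0.0] * k
    for x in data:
        p = [ws[j] * gauss_pdf(x, mus[j], vs[j]) for j in range(k)]
        total = max(sum(p), 1e-300)  # guard against underflow
        for j in range(k):
            r = p[j] / total  # responsibility of component j for x
            sum_r[j] += r
            sum_rx[j] += r * x
            sum_rxx[j] += r * x * x
    mass = max(sum(sum_r), 1e-12)
    new_ws, new_mus, new_vs = [], [], []
    for j in range(k):
        nj = max(sum_r[j], 1e-12)
        mu = sum_rx[j] / nj
        new_ws.append(sum_r[j] / mass)
        new_mus.append(mu)
        new_vs.append(max(sum_rxx[j] / nj - mu * mu, 1e-6))  # variance floor
    return new_ws, new_mus, new_vs


def fit_gmm(data, k=2, tol=1e-8, max_iter=500):
    """Iterate the EM map until the largest parameter change is below tol."""
    srt = sorted(data)
    n = len(srt)
    mean = sum(srt) / n
    var0 = max(sum((x - mean) ** 2 for x in srt) / n, 1e-6)
    ws = [1.0 / k] * k
    mus = [srt[int((j + 0.5) * n / k)] for j in range(k)]  # quantile init
    vs = [var0] * k
    for _ in range(max_iter):
        new = em_step(data, ws, mus, vs)
        change = max(abs(a - b)
                     for old, upd in zip((ws, mus, vs), new)
                     for a, b in zip(old, upd))
        ws, mus, vs = new
        if change < tol:
            break
    return ws, mus, vs


def log_density(x, ws, mus, vs):
    """Log of the mixture density at x."""
    p = sum(w * gauss_pdf(x, m, v) for w, m, v in zip(ws, mus, vs))
    return math.log(max(p, 1e-300))


def window_scores(series, params, width):
    """Mean log-likelihood of every contiguous window of the given width."""
    ll = [log_density(x, *params) for x in series]
    return [sum(ll[i:i + width]) / width for i in range(len(ll) - width + 1)]


def most_anomalous_window(series, params, width):
    """Start index and score of the window least similar to the normal model."""
    scores = window_scores(series, params, width)
    start = min(range(len(scores)), key=scores.__getitem__)
    return start, scores[start]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "normal" samples drawn from two regimes (illustrative only).
    normal = [rng.gauss(0.0, 1.0) for _ in range(200)]
    normal += [rng.gauss(5.0, 1.0) for _ in range(200)]
    params = fit_gmm(normal, k=2)
    series = [rng.gauss(0.0, 1.0) for _ in range(100)]
    series[40:50] = [12.0] * 10  # injected collective anomaly
    print(most_anomalous_window(series, params, width=10))
```

Scoring whole windows rather than single points is what makes the anomaly "collective" in this sketch: a run of readings is flagged when it is jointly unlikely under the model of normal behavior, even if the window width and the number of mixture components would need tuning for real data.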
Keywords/Search Tags: Sequence data, Abnormal pattern mining, Clustering algorithm, Collective anomaly detection, Swarm intelligence algorithm, Fixed point theory