Research Of Online Anomaly Detection Method For Streaming Data

Posted on: 2016-02-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z G Ding
GTID: 1108330482477046
Subject: Control theory and control engineering
Abstract/Summary:
With the rapid development of modern information technology, data are now generated at a rate of millions of records per day or even per hour, and the volume grows explosively. In the big data era, this creates pressure and challenges not only for massive data storage but also for online processing. Almost all of these data are normal and reflect only obvious facts, which have little value for decision-making; by contrast, a very small fraction carries much more value. Finding the observations that differ from the rest of the massive data (i.e., anomaly detection) has attracted the attention of many researchers in both academia and engineering; it is much like mining gold from the diggings and has distinctive research and application value. A number of anomaly detection methods already exist. However, the continuous and rapid arrival of massive streaming data makes these traditional methods impractical and unsuitable for online analysis and processing. After analyzing the difficulties and challenges of anomaly detection in streaming data, this dissertation proposes several novel methods for the task. Furthermore, for specific application fields such as wireless sensor networks (WSNs), corresponding anomaly detection methods are proposed that take the network topology and resource constraints into account. The work is summarized as follows.

First, considering the intrinsic characteristic of anomalous data, namely that they are few and different, isolation-based anomaly detection is studied. Because online ensemble learning can cope with the dynamic distribution of streaming data to some extent, an improved anomaly detection framework for streaming data is proposed based on the isolation principle and online ensemble learning. The initial ensemble detector is trained on a historical dataset, and an update strategy is designed to refine the current detector online in response to changes in the data distribution.
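The combination of isolation trees with an online update can be illustrated with a minimal sketch. The abstract does not specify the update strategy, so the one used here (buffer a window of recent points, then replace the oldest trees with trees built on that window) is an assumption, as are all names and default parameters such as `OnlineIsolationEnsemble`, `window_size`, and `n_replace`.

```python
import math
import random
from collections import deque


def c_factor(n):
    """Expected path length of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Isolate points with random axis-aligned splits until the depth limit."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(node, x):
    """Depth at which x lands, plus a correction for the unsplit leaf."""
    depth = 0
    while "dim" in node:
        node = node["left"] if x[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + c_factor(node["size"])


class OnlineIsolationEnsemble:
    """Hypothetical sketch: isolation trees refreshed from a sliding window."""

    def __init__(self, history, n_trees=50, sample_size=64,
                 window_size=128, n_replace=10, seed=0):
        self.rng = random.Random(seed)
        self.sample_size = sample_size
        self.n_replace = n_replace
        self.max_depth = math.ceil(math.log2(sample_size))
        self.window = deque(maxlen=window_size)
        # Initial detector: trained on the historical dataset.
        self.trees = deque(self._new_tree(history) for _ in range(n_trees))

    def _new_tree(self, data):
        sample = self.rng.sample(data, min(self.sample_size, len(data)))
        return build_tree(sample, 0, self.max_depth, self.rng)

    def score(self, x):
        """Anomaly score in (0, 1]; shorter average paths give higher scores."""
        mean_depth = sum(path_length(t, x) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / c_factor(self.sample_size))

    def update(self, x):
        """Assumed update rule: when the window fills, replace the oldest trees."""
        self.window.append(x)
        if len(self.window) == self.window.maxlen:
            data = list(self.window)
            for _ in range(self.n_replace):
                self.trees.popleft()
                self.trees.append(self._new_tree(data))
            self.window.clear()
```

In use, each arriving point would be scored with `score` and then passed to `update`, so that the ensemble gradually follows a drifting distribution while keeping a fixed number of trees.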
Four real-world datasets are used to validate the proposed method, and the results are acceptable. However, because the split value of the selected attribute is chosen at random when building each detection tree, the ensemble needs many more individual detectors, which degrades the overall detection speed. To solve this problem, the isolation mechanism used to build an individual detector is explored further, and a statistical histogram is introduced to select a better split value for the selected split attribute. On this basis, an improved online anomaly detection method is proposed based on statistical histograms and online ensemble learning theory. A sliding-window mechanism is used to adapt to changes in the data distribution. Three key parameters that affect the method's performance are investigated: the sliding window size, the ensemble size, and the histogram bin size. The method is validated on the same four real-world datasets and shows advantages over existing methods.

Second, the isolation principle is studied further and the hyper-grid concept is explored. Because the original hyper-grid-based anomaly detection method has an expensive search cost and a long running time owing to its numerous detection neighborhoods, an improved L1 detection neighborhood is proposed and several heuristic detection rules are defined. Moreover, because obtaining the best hyper-grid structure is not feasible for streaming data, and because online ensemble learning generalizes well and adapts to dynamic streaming data, several hyper-grid-based individual detectors with different hyper-cube sizes are built to form an ensemble detector.
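A rough sketch of the hyper-grid idea follows. It assumes that the "L1 detection neighborhood" means the 2d cells that differ from the query cell by one step along a single axis (instead of all 3^d - 1 surrounding cells), and it uses two illustrative rules that are not taken from the dissertation: a point in a dense cell is normal without any neighbor search, and otherwise it is anomalous when its cell and L1 neighbors together hold fewer than `min_count` points. The class and function names are hypothetical.

```python
import math
from collections import Counter


class HyperGridDetector:
    """Hypothetical hyper-grid detector with a fixed hyper-cube side length."""

    def __init__(self, cell_size, min_count):
        self.cell_size = cell_size
        self.min_count = min_count
        self.counts = Counter()  # number of points per occupied cell

    def cell_of(self, x):
        return tuple(math.floor(v / self.cell_size) for v in x)

    def fit(self, points):
        for p in points:
            self.counts[self.cell_of(p)] += 1
        return self

    def l1_neighbors(self, cell):
        """Yield the 2*d cells one step away along exactly one axis."""
        for d in range(len(cell)):
            for step in (-1, 1):
                yield cell[:d] + (cell[d] + step,) + cell[d + 1:]

    def is_anomaly(self, x):
        cell = self.cell_of(x)
        own = self.counts[cell]
        # Assumed rule 1: a dense cell is normal, so the neighbor search is skipped.
        if own >= self.min_count:
            return False
        # Assumed rule 2: otherwise count the cell together with its L1 neighbors.
        total = own + sum(self.counts[n] for n in self.l1_neighbors(cell))
        return total < self.min_count


def ensemble_is_anomaly(detectors, x):
    """Majority vote over detectors built with different hyper-cube sizes."""
    votes = sum(d.is_anomaly(x) for d in detectors)
    return votes * 2 > len(detectors)
```

Voting over several cell sizes avoids having to pick a single grid resolution in advance, which is the motivation the abstract gives for building an ensemble.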
Experiments on simulated and real-world datasets demonstrate the effectiveness of the proposed method.

Third, because wireless sensor networks (WSNs) are one of the major sources of streaming data, anomaly detection methods are explored for this concrete application field. The network topology is analyzed first, and the spatio-temporal correlation in the data sensed by WSNs is explored. Then, considering the limited resources of each WSN node, a distributed ensemble anomaly detection method is proposed. As a result, the computing and memory requirements are distributed evenly across the sensor nodes, which avoids resource starvation at the cluster head node and lengthens the lifetime of the network. In addition, because communication is much more expensive in WSNs, reducing the amount of communication is an important factor in designing an anomaly detection method. Although the aforementioned method generalizes well, it must broadcast the many individual detectors built on different sensor nodes to the cluster, and the resulting communication cost is hard to accept in real applications. Inspired by the well-known rule in the ensemble learning community that "many could be better than all," an ensemble pruning strategy based on a biology-based optimization method is introduced. The improved method saves communication resources while achieving the same or better detection performance. Simulation results show that the proposed method is useful and can meet the needs of real applications to some extent.

Finally, for the Smart Car Networking project, an anomalous trajectory detection approach is proposed, building on the research results above, to address urban taxi detour fraud. Location data collected online during driving from GPS devices installed in the taxis are analyzed in a timely manner.
Based on the proposed hyper-grid theory, a mapped trajectory representation is introduced and an online anomalous driving path detection method is developed to prevent urban taxi detour fraud. The method is validated on a real-world driving trajectory dataset. The experimental results show that it is effective and useful to some extent: it can detect taxi detours in a timely manner and provides some help for urban supervision and law enforcement...
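One way to picture trajectory mapping on a grid is sketched below. The abstract does not describe the actual mapping or decision rule, so everything here is an assumption for illustration: GPS samples are mapped to square cells of side `cell_deg`, historical trips on the same route give each cell a support value, and a trip in progress is flagged once the share of rarely visited cells exceeds `max_rare_ratio`. The names `map_trajectory` and `DetourMonitor` are hypothetical.

```python
import math
from collections import Counter


def map_trajectory(points, cell_deg=0.01):
    """Map (lat, lon) samples to the ordered sequence of grid cells entered."""
    cells = []
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells


class DetourMonitor:
    """Hypothetical online check of one trip against historical trips."""

    def __init__(self, history, cell_deg=0.01, min_support=0.1, max_rare_ratio=0.3):
        self.cell_deg = cell_deg
        self.min_support = min_support
        self.max_rare_ratio = max_rare_ratio
        self.n_history = max(1, len(history))
        # Support of a cell = number of historical trips that passed through it.
        self.support = Counter()
        for traj in history:
            self.support.update(set(map_trajectory(traj, cell_deg)))
        self.seen = 0
        self.rare = 0
        self.last_cell = None

    def observe(self, lat, lon):
        """Feed one GPS sample; return True if the trip currently looks like a detour."""
        cell = (math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg))
        if cell != self.last_cell:
            self.last_cell = cell
            self.seen += 1
            if self.support[cell] / self.n_history < self.min_support:
                self.rare += 1
        return self.rare / self.seen > self.max_rare_ratio
```

Because `observe` updates two counters per sample, the decision is available while the trip is still in progress, which matches the online setting described above.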
Keywords/Search Tags: Streaming Data, Anomaly/Outlier Detection, Online Ensemble Learning, Isolation Principle, Hyper-grid Space, Resource Constraint, Ensemble Pruning, Wireless Sensor Networks, Taxi Detour