Data Flow Anomaly Detection Technology Research And Application

Posted on: 2011-06-26  Degree: Master  Type: Thesis
Country: China  Candidate: F Chen  Full Text: PDF
GTID: 2208360308467127  Subject: Computer software and theory
Abstract/Summary:
Because of its many applications, such as network security, credit fraud detection and financial analysis, outlier detection has long been an important research area in information science. Techniques from statistics, data mining, information theory and other fields have been applied to it, and many effective outlier detection techniques have been proposed. However, with the growing use of information technology and the rapid development of automated data acquisition, many data sets now arrive in the form of data streams, for example commodity exchange data and online media communication. Data streams, as a kind of dynamic data set, have therefore attracted attention from both academia and industry. In contrast to a static data set, a data stream is large in volume, unbounded in length and subject to changes in its underlying distribution (concept drift), all of which challenge traditional outlier detection techniques. Concept drift in particular makes the outlier detection task much more difficult. Finding an outlier detection approach for data streams that both tracks the changes in the detection model caused by concept drift and finds outliers efficiently and effectively is therefore an important research subject.

This thesis focuses on detecting outliers over data streams. Solutions to concept drift are brought into the area of outlier detection: by capturing and tolerating changes in the distribution of the data stream, an outlier detection approach suited to data streams is put forward. The proposed approach is of practical value in applications.

The thesis first analyzes traditional outlier detection techniques and outlier detection techniques over data streams, giving brief introductions to, and the author's views on, some typical detection approaches. Based on this analysis, a reactive outlier detection approach over data streams is introduced.
First, supervised and unsupervised algorithms are combined to realize the basic outlier detection method over data streams, which can catch previously unknown outliers while keeping relatively high efficiency. To address the key problem of concept drift, a tolerance strategy and concept-drift detection are applied to adapt to changes in the data stream. The robustness of the classifiers is improved through training data selection, training data adaptation and an ensemble of multiple classifiers. Concept-drift detection is implemented by quantitatively analyzing and comparing the characteristics of different data chunks extracted from the data stream at different times. Based on the detection results, adaptation of the outlier detection model can be triggered at the appropriate time. The experiments show that the reactive outlier detection approach proposed in this thesis can not only adapt the detection model in time by capturing concept drift, but also uncover potential outliers in the data stream effectively. Finally, the proposed approach is applied to P2P botnet detection. By capturing P2P bot traffic while the bots are spreading, it can help find compromised hosts hidden in the network and provide evidence for further detection of the whole botnet.
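
The chunk-comparison idea behind the concept-drift detection can be illustrated with a minimal sketch. The abstract does not say which chunk characteristics are compared or how, so everything below is an assumption made for illustration only: each chunk is summarized by its mean and standard deviation, drift is flagged when a chunk's mean moves more than a hypothetical `threshold` number of reference standard deviations away from the reference mean, and the function names `chunk_summary`, `drift_detected` and `process_stream` are invented here.

```python
from statistics import mean, pstdev


def chunk_summary(chunk):
    """Summarize a chunk by its mean and population standard deviation."""
    return mean(chunk), pstdev(chunk)


def drift_detected(reference, current, threshold=3.0):
    """Flag drift when the current chunk's mean lies more than `threshold`
    reference standard deviations from the reference mean.
    (Illustrative criterion only; not the statistic used in the thesis.)"""
    ref_mean, ref_std = chunk_summary(reference)
    cur_mean, _ = chunk_summary(current)
    if ref_std == 0:
        # Degenerate reference chunk: any change of mean counts as drift.
        return cur_mean != ref_mean
    return abs(cur_mean - ref_mean) / ref_std > threshold


def process_stream(chunks, threshold=3.0):
    """Return the indices of chunks at which drift was flagged.
    On drift, the reference chunk is replaced, standing in for the
    'adaptation of the detection model' step described above."""
    reference = chunks[0]
    drift_points = []
    for i, chunk in enumerate(chunks[1:], start=1):
        if drift_detected(reference, chunk, threshold):
            drift_points.append(i)
            reference = chunk  # re-baseline on the new concept
    return drift_points


# Four chunks of a toy one-dimensional stream; the level shifts at chunk 2.
chunks = [
    [1.0, 1.2, 0.9, 1.1],
    [1.1, 0.8, 1.0, 1.2],
    [5.0, 5.2, 4.9, 5.1],
    [5.1, 4.8, 5.0, 5.2],
]
print(process_stream(chunks))  # [2]
```

Re-baselining only when drift is flagged is what makes the scheme reactive: the detection model is left alone while the stream is stable and is updated only after a change has been observed.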
Keywords/Search Tags: Outlier detection, Data stream, Concept-drift, P2P bots traffic