
Distributed Anomaly Detection Based On Data Mining

Posted on: 2011-09-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J L Zhou
Full Text: PDF
GTID: 1118360308965901
Subject: Computer software and theory
Abstract/Summary:
Data mining is an important tool in knowledge discovery. It aims to discover valuable patterns hidden in large volumes of data. Anomaly detection is one of the four main tasks of data mining. Compared with prediction, clustering and association rules, anomaly detection better reflects the original purpose of data mining, and its results are generally more valuable than those of the other tasks. For example, ten thousand normal records may be represented by a single pattern, whereas ten anomalies may indicate ten distinct patterns. Anomaly detection has been widely applied in areas such as credit card fraud detection, drug research, medical analysis, consumer behavior analysis, weather forecasting and network intrusion detection.

The information industry is developing at high speed. With the continued expansion of business scale and the constant updating of service content, enterprises urgently need distributed solutions for managing complex heterogeneous environments in which different hardware devices, software systems, network environments and databases must cooperate. Achieving global anomaly detection in such a distributed environment is a real challenge for researchers and engineers. The main problem is how to transfer the minimum amount of data while sharing as much information as possible, and at the same time ensure both the accuracy of anomaly detection and the data privacy of all parties. To address this problem, we carried out innovative and exploratory research on distributed anomaly detection from the perspective of data mining. Our main work is as follows:

1. We explained the definition of anomaly detection and described in detail a variety of existing anomaly detection methods based on data mining. We then analyzed their respective advantages and disadvantages and compared them with recent related research.

2. By comparing centralized and distributed anomaly detection, we proposed a framework for distributed anomaly detection based on ensemble learning. We applied this framework to both supervised and unsupervised anomaly detection methods. Experimental results showed that the proposed framework achieved detection results comparable to, or even better than, those of centralized methods, while ensuring the data privacy of all parties.

3. We studied anomaly detection on data streams in a distributed environment and proposed a reactive model for detecting concept drift in data streams. The proposed method can effectively and efficiently detect outliers in concept-drifting data streams in a single pass.

4. We studied unsupervised anomaly detection on high-dimensional data in a distributed environment. For high-dimensional scientific data, we proposed an adaptive spectral clustering method. Experimental results on scientific data from numerical simulations of molecular dynamics showed the improvement achieved by the proposed method.

5. We studied privacy issues in distributed anomaly detection and proposed a privacy-preserving support vector machine classifier. Experiments demonstrated that the proposed method can guarantee data privacy while achieving detection performance comparable to that of the original support vector machine. We also studied personalized privacy-preserving data mining and proposed a method for distributed anomaly detection based on data perturbation.
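The ensemble idea in item 2 can be illustrated with a minimal sketch. It is not the framework from the dissertation, whose details the abstract does not give. It assumes each site fits a simple z-score detector on its own records, shares only that fitted model (per-feature mean and standard deviation) rather than raw data, and that the global anomaly score is the average of the local scores. The function names and the sample data are hypothetical.

```python
import statistics


def fit_local_detector(records):
    """Fit a per-site model: mean and standard deviation of each feature.

    Only this summary leaves the site; the raw records stay local.
    """
    columns = list(zip(*records))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 when a feature is constant, to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def local_score(model, record):
    """Score a record by its largest absolute z-score across features."""
    means, stds = model
    return max(abs(x - m) / s for x, m, s in zip(record, means, stds))


def ensemble_score(models, record):
    """Combine the local detectors by averaging their scores (an assumed rule)."""
    return statistics.fmean([local_score(model, record) for model in models])


# Hypothetical data held by two separate sites.
site_a = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (1.1, 2.0)]
site_b = [(1.0, 2.2), (0.8, 2.0), (1.1, 1.8), (1.0, 2.1)]

models = [fit_local_detector(site_a), fit_local_detector(site_b)]

normal_score = ensemble_score(models, (1.0, 2.0))
anomaly_score = ensemble_score(models, (5.0, 2.0))
```

With a threshold such as 3.0 (also an assumption), the second record would be flagged and the first would not. Averaging is only one possible combination rule; voting or taking the maximum score are common alternatives.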
Keywords/Search Tags: data mining, anomaly detection, distributed ensemble learning, concept drifting detection