Research And Implementation On Key Technology Of Data Stream Mining

Posted on: 2016-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: X Deng
Full Text: PDF
GTID: 2308330473455910
Subject: Computer application technology
Abstract/Summary:
Outlier detection and data clustering are classic topics in data mining and have long attracted wide attention in the academic community. With the spread of sensor networks and the arrival of the big data era, however, more and more data has shifted from traditional static data sets to dynamic data streams, which poses new challenges for outlier detection and clustering algorithms designed for static data sets. Compared with a static data set, a data stream is massive, arrives in real time, and changes dynamically over time; in some situations it is also high-dimensional. These features greatly increase the difficulty of outlier detection, classification, and clustering on data streams. Designing outlier detection and clustering models for data streams is therefore a pressing problem.

Starting from the basic features of data streams, this thesis studies problems in data stream mining. The main work and research results are as follows:

1. The traditional sliding-window-based data stream outlier detection algorithm (SWDSOD) may be inaccurate in some scenarios, such as detecting outliers in a sine wave signal. To address this, we proposed an outlier detection method based on a time fading model (Fading Model-based Data Stream Outlier Detection, FMDSOD). The algorithm uses the time distance between any two data points to calculate the weight between them, and it reduces the computation of the original algorithm by defining a dedicated data structure. Experimental analysis shows that FMDSOD outperforms SWDSOD in both accuracy and efficiency.

2. The performance of the data stream clustering algorithm E-Stream declines sharply when clustering high-dimensional data streams. To address this, we proposed a method for calculating the feature dimensions of each cluster. By analyzing the feature dimensions of each cluster formed from the sample data stream, we need to compute only the feature dimensions associated with a cluster when deciding which cluster a newly arrived data point belongs to, and can ignore the redundant dimensions. This reduces the computational cost of the algorithm and improves its execution efficiency while keeping cluster purity comparable to that of the original algorithm. The method compensates for the weakness of E-Stream in high-dimensional data stream clustering.

3. Using the FMDSOD algorithm and the feature-dimension-based high-dimensional data stream clustering algorithm proposed in this thesis, we designed and implemented a distributed data stream processing system that includes data stream outlier detection and data stream clustering. The FMDSOD algorithm drives the system's outlier detection module, and the feature-dimension-based clustering algorithm drives the clustering module for high-dimensional data streams. Through deployment, operation, and testing of the system, both algorithms achieved the desired effect and showed high practical value.
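The time-fading idea in item 1 can be illustrated with a minimal Python sketch. The abstract does not give the FMDSOD weight formula or its data structure, so everything here is an assumption for illustration: the exponential fading function `2^(-decay * |Δt|)` (a common choice in stream mining), the hypothetical class `FadingOutlierDetector`, and the parameters `radius`, `min_weight`, and `decay`. A point is flagged when the faded weight of its nearby predecessors falls below a threshold.

```python
from collections import deque


def fading_weight(t_a, t_b, decay=0.1):
    """Weight of a pair of points from their time distance.

    Assumed form (not taken from the thesis): 2 ** (-decay * |t_a - t_b|).
    """
    return 2.0 ** (-decay * abs(t_a - t_b))


class FadingOutlierDetector:
    """Hypothetical distance-based detector with time-faded neighbor support."""

    def __init__(self, radius, min_weight, decay=0.1, max_points=1000):
        self.radius = radius          # neighbors are points within this value distance
        self.min_weight = min_weight  # minimum faded support for a normal point
        self.decay = decay            # larger values forget old points faster
        self.history = deque(maxlen=max_points)  # bounded store of (timestamp, value)

    def observe(self, timestamp, value):
        """Return True if `value` looks like an outlier, then store the point."""
        support = sum(
            fading_weight(timestamp, t, self.decay)
            for t, v in self.history
            if abs(value - v) <= self.radius
        )
        self.history.append((timestamp, value))
        return support < self.min_weight
```

Because older neighbors contribute less, a value that was normal long ago gives little support to a new point, which is the behavior a hard sliding-window cutoff only approximates. Note that the first few points are always flagged in this sketch, since no history exists yet.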
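The feature-dimension idea in item 2 can be sketched in the same hedged way. The abstract does not say how feature dimensions are selected, so this sketch assumes a simple rule: a dimension counts as a feature dimension of a cluster when the cluster's members have a small standard deviation along it. The function names `feature_dimensions`, `centroid`, and `assign`, and the threshold `max_std`, are hypothetical, and E-Stream itself is not reproduced here.

```python
import math


def feature_dimensions(points, max_std):
    """Indices of dimensions where the cluster is tightly concentrated.

    Assumed rule (not taken from the thesis): standard deviation <= max_std.
    """
    n = len(points)
    dims = []
    for d in range(len(points[0])):
        mean = sum(p[d] for p in points) / n
        var = sum((p[d] - mean) ** 2 for p in points) / n
        if math.sqrt(var) <= max_std:
            dims.append(d)
    return dims


def centroid(points):
    """Per-dimension mean of the cluster members."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def assign(point, clusters):
    """Return the index of the nearest cluster, or None if none is usable.

    `clusters` is a list of (center, dims) pairs. Distance is computed only
    over each cluster's own feature dimensions and normalized by their count,
    so clusters with different numbers of dimensions stay comparable.
    """
    best, best_dist = None, math.inf
    for i, (center, dims) in enumerate(clusters):
        if not dims:
            continue  # a cluster with no feature dimensions cannot be compared
        dist = math.sqrt(sum((point[d] - center[d]) ** 2 for d in dims) / len(dims))
        if dist < best_dist:
            best, best_dist = i, dist
    return best
```

With this layout, the per-point cost of an assignment grows with the number of feature dimensions per cluster instead of the full dimensionality of the stream, which is the saving the abstract describes.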
Keywords/Search Tags:high-dimensional, data stream, sliding window, outlier detection, clustering