
An Outlier Mining And Paralleling Method Based On The Grid Cell And P Weights

Posted on: 2017-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: T T Feng
Full Text: PDF
GTID: 2348330509452860
Subject: Computer Science and Technology
Abstract/Summary:
Data mining is the nontrivial process of extracting valuable information and knowledge from massive data sets to support decision making, and outlier mining is an important part of the field. Outliers often carry new and important information of high value, and outlier mining has been widely applied in network intrusion detection, credit card fraud detection, environmental monitoring, medical science, and other fields. This thesis studies outlier mining methods and their parallelization based on grid cells, P weights, and the MapReduce programming model, taking into account the spatial distribution characteristics of the data set. The main research results are as follows:

1) An outlier mining algorithm, CWOP, is presented that uses grid cells and P weights. The algorithm partitions every dimension of the data set, dividing the data space into grid cells. Cells that contain only outliers and cells that contain only normal data are identified first. For cells that contain both outliers and normal data, the outlier objects are measured and determined using the P weights method, which further improves the accuracy of outlier mining. Experiments on UCI data sets validate the feasibility and effectiveness of the algorithm.

2) A parallel outlier mining method based on grid cells and P weights is presented under the MapReduce programming model. Three Map and Reduce functions are introduced to calculate the maximum and minimum values of the data objects in every dimension and the number of objects in each cell and its neighboring cells. The P weights of the outlier candidate objects are then calculated. The parallel algorithm is designed and implemented in Java using the Eclipse development tool. Finally, experiments on UCI data sets and celestial spectrum data validate the effectiveness, scalability, and extensibility of the algorithm.
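The grid-cell stage described in the abstract can be illustrated with a small single-machine sketch. This is not the thesis's CWOP implementation (which was written in Java and is not shown here). The abstract does not define the P weight, so the sketch substitutes the sum of distances to the k nearest neighbors as a stand-in. The function names and the parameters `bins`, `min_neighbors`, `k`, and `top_n` are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def cell_of(point, mins, maxs, bins):
    """Map a point to its grid cell: one integer index per dimension."""
    idx = []
    for x, lo, hi in zip(point, mins, maxs):
        width = (hi - lo) / bins or 1.0  # guard against a constant dimension
        idx.append(min(int((x - lo) / width), bins - 1))
    return tuple(idx)


def neighborhood_count(cell, counts):
    """Number of objects in a cell plus all of its adjacent cells."""
    total = 0
    for offset in product((-1, 0, 1), repeat=len(cell)):
        neighbor = tuple(c + o for c, o in zip(cell, offset))
        total += counts.get(neighbor, 0)
    return total


def knn_weight(i, data, k):
    """Stand-in weight (assumption): sum of distances to the k nearest neighbors."""
    dists = sorted(math.dist(data[i], q) for j, q in enumerate(data) if j != i)
    return sum(dists[:k])


def find_outliers(data, bins=4, min_neighbors=3, k=2, top_n=1):
    """Select candidates from sparse cell neighborhoods, then rank them by weight."""
    dims = len(data[0])
    # Per-dimension minimum and maximum values define the grid boundaries.
    mins = [min(p[d] for p in data) for d in range(dims)]
    maxs = [max(p[d] for p in data) for d in range(dims)]
    cells = [cell_of(p, mins, maxs, bins) for p in data]
    counts = Counter(cells)
    # Objects in dense neighborhoods are treated as normal and skipped.
    candidates = [i for i, c in enumerate(cells)
                  if neighborhood_count(c, counts) < min_neighbors]
    ranked = sorted(candidates, key=lambda i: knn_weight(i, data, k), reverse=True)
    return [data[i] for i in ranked[:top_n]]


data = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (1.0, 0.8), (10.0, 10.0)]
print(find_outliers(data))  # [(10.0, 10.0)]
```

The expensive pairwise distance computation runs only for objects in sparsely populated neighborhoods, which reflects the abstract's idea of calculating weights only for outlier candidates. The per-dimension minimum and maximum and the per-cell counts are simple aggregations, which suggests why those steps fit the Map and Reduce functions mentioned above.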
Keywords/Search Tags: Data mining, Outlier, Grid cell, P weights, MapReduce programming model