
Research On Algorithm Of Outlier Mining In Data-intensive Computing Environments

Posted on: 2015-03-04 | Degree: Master | Type: Thesis
Country: China | Candidate: Y L Chen
GTID: 2298330431979338 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, data are growing explosively in many fields, such as medicine, commerce, people's livelihood, scientific research and the military. Research on data mining algorithms for data-intensive computing environments has therefore attracted increasing attention. Data mining in data-intensive computing environments covers four main tasks: clustering, classification, frequent itemset mining and outlier mining. Outlier mining is currently one of the research hot spots.

This thesis first describes the characteristics and forms of data in data-intensive computing environments, reviews the current state of research on outlier mining in such environments, and explains why in-depth analysis is needed. It then discusses classical outlier mining algorithms for traditional data sets. This review shows that current outlier mining research focuses mostly on methods based on statistical distributions, depth, distance, clustering and grids, whereas studies of outlier mining algorithms for data-intensive computing environments are rare.

This thesis presents two outlier mining algorithms for data-intensive computing environments, MR_LOF and MR_DBScan, and describes their working principles in detail. MR_LOF and MR_DBScan build on the LOF and DBScan algorithms and run on the MapReduce model; both detect outliers by combining grid and density techniques. In the Map phase, the data are reduced by means of a grid, and representative information about the grid cells is sent to the master node. In the Reduce phase, a density-based outlier mining algorithm is applied, and densely populated regions are screened out with the help of the grid expectation E. The algorithms therefore only need to compute the outlier degrees of objects in sparse regions, which reduces their complexity. The experimental results show that the algorithms are effective for mining outliers in data-intensive computing environments.
Keywords/Search Tags: data-intensive computing, outliers, MR_LOF algorithm, MapReduce
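The Map-phase grid reduction and the screening of dense cells by the grid expectation E, as summarized in the abstract, can be illustrated with a small in-process simulation. This is a sketch under stated assumptions, not the thesis's implementation. The function names are hypothetical. Each "split" stands in for the input of one mapper, and the cell size is a user-chosen parameter. E is assumed here to be the mean number of points per non-empty cell, because the abstract does not give its formula.

```python
import math
from collections import Counter

def cell_of(point, cell_size):
    # Integer coordinates of the grid cell that contains `point`.
    return tuple(math.floor(x / cell_size) for x in point)

def map_phase(split, cell_size):
    # One mapper: summarize its split as {cell: point count}.
    # Only these counts (the "representative information") leave the mapper.
    return Counter(cell_of(p, cell_size) for p in split)

def merge_counts(partial_counts):
    # Master node: add up the per-cell counts reported by all mappers.
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total

def grid_expectation(total_counts):
    # ASSUMPTION: E is the mean number of points per non-empty cell;
    # the abstract does not define E precisely.
    return sum(total_counts.values()) / len(total_counts)

def sparse_candidates(splits, cell_size):
    # Keep only points whose cell holds fewer than E points; points in
    # dense cells are screened out before the expensive density step.
    counts = merge_counts(map_phase(s, cell_size) for s in splits)
    e = grid_expectation(counts)
    sparse = {cell for cell, n in counts.items() if n < e}
    return [p for s in splits for p in s if cell_of(p, cell_size) in sparse]

if __name__ == "__main__":
    splits = [[(0.1, 0.1), (0.2, 0.3), (0.4, 0.2)],
              [(0.3, 0.4), (0.15, 0.25), (5.2, 5.1)]]
    print(sparse_candidates(splits, cell_size=1.0))  # [(5.2, 5.1)]
```

In this example, five points share cell (0, 0) and one lies in cell (5, 5), so E is 3 and only `(5.2, 5.1)` is passed on. In an actual Hadoop job the per-cell counts could instead be produced by mappers emitting (cell, 1) pairs that a combiner sums. The `Counter` merge here only mimics that shuffle.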
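For the LOF-based algorithm, the abstract's point that only objects in sparse regions need an outlier degree can be pictured as running the standard local outlier factor on a list of candidate indices only. This is an assumption about how MR_LOF applies LOF, not the thesis's code. Neighbours are searched over the full data set, because a point in a sparse cell may have its k nearest neighbours in a dense one. The brute-force search, the tie handling and the function name are simplifications chosen for illustration.

```python
import math

def lof_scores(data, candidates, k):
    """Local outlier factor for data[i], for each i in `candidates`.

    Brute-force sketch: neighbours come from the whole data set, ties at
    the k-th distance are broken by index, and the data are assumed to
    contain no duplicate points (so every reachability distance is > 0).
    """
    knn_cache = {}

    def knn(i):
        # k nearest neighbours of data[i] as (distance, index) pairs.
        if i not in knn_cache:
            dists = sorted((math.dist(data[i], data[j]), j)
                           for j in range(len(data)) if j != i)
            knn_cache[i] = dists[:k]
        return knn_cache[i]

    def k_distance(i):
        # Distance from data[i] to its k-th nearest neighbour.
        return knn(i)[-1][0]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability
        # distance reach-dist_k(i, o) = max(k-distance(o), d(i, o)).
        reach = [max(k_distance(o), d) for d, o in knn(i)]
        return len(reach) / sum(reach)

    # LOF(i) = mean over neighbours o of lrd(o) / lrd(i);
    # values near 1 mean "as dense as the neighbourhood", larger = outlier.
    return {i: sum(lrd(o) for _, o in knn(i)) / (len(knn(i)) * lrd(i))
            for i in candidates}

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(lof_scores(data, candidates=[0, 4], k=2))
```

Here the corner (0, 0) of the unit square scores 1.0, while the isolated point (10, 10) scores roughly 13. Because `candidates` would come from the sparse grid cells, the dense majority of the data is never scored, although it is still consulted as neighbours.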
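For the DBScan-based algorithm, one plausible reading of the Reduce phase is that a candidate from a sparse cell is reported as an outlier when DBSCAN would label it noise. Under that reading, the candidate is not a core point, meaning it has fewer than `min_pts` points within `eps`, counting itself. It also lies within `eps` of no core point. The following brute-force sketch works under that assumption. `eps`, `min_pts` and the function names are illustrative and are not taken from the thesis.

```python
import math

def region_query(data, i, eps):
    # Indices of all points within eps of data[i], including i itself.
    return [j for j in range(len(data)) if math.dist(data[i], data[j]) <= eps]

def is_core(data, i, eps, min_pts):
    # Core point: at least min_pts points (itself included) within eps.
    return len(region_query(data, i, eps)) >= min_pts

def dbscan_noise(data, candidates, eps, min_pts):
    # A candidate is reported as an outlier if DBSCAN would label it noise:
    # it is not a core point, and no core point lies within eps of it
    # (otherwise it would be a border point of some cluster).
    noise = []
    for i in candidates:
        if is_core(data, i, eps, min_pts):
            continue
        neighbours = region_query(data, i, eps)
        if any(is_core(data, j, eps, min_pts) for j in neighbours if j != i):
            continue
        noise.append(i)
    return noise

if __name__ == "__main__":
    data = [(0, 0), (0, 0.5), (0.5, 0), (0.5, 0.5), (1.2, 0.5), (10, 10)]
    print(dbscan_noise(data, candidates=[0, 4, 5], eps=0.8, min_pts=4))  # [5]
```

Point 4 at (1.2, 0.5) is not dense enough to be a core point itself, but it is within `eps` of the core point (0.5, 0.5). It therefore counts as a border point rather than noise, and only (10, 10) is reported.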