| Data mining is a relatively new technique that has developed since the 1980s. It aims to extract implicit, previously unknown, and potentially useful knowledge from large volumes of incomplete, fuzzy, and stochastic data. Outlier analysis is an important part of data mining research; its purpose is to find the "small patterns" in a dataset. An outlier is an object that is considerably dissimilar from, or inconsistent with, the remainder of the data. After some 20 years of development, the theory of data mining has become increasingly mature and its application areas keep expanding: data mining is now used in telecommunications, finance, business, weather forecasting, DNA analysis, the stock market, intrusion detection, customer segmentation, and other fields. In this thesis we first study the cell-based outlier detection algorithm and point out its shortcomings, and then design a new algorithm based on the grid model. The main contributions of the thesis are as follows: 1. A summary of the outlier mining problem in terms of its practical significance, algorithms, application areas, detection tools, and algorithm evaluation. 2. To overcome the limitations of existing outlier detection algorithms, a new algorithm based on the grid model, which improves efficiency by modeling the dataset as a grid and transforming that model. 3. A method for partitioning the data space with the grid model, a boundary value for judging whether a grid cell contains outliers, and an algorithm that detects outliers correctly in less time. 4. An experimental platform named ED (Elnino Detection), which integrates the algorithms proposed or improved in this thesis and provides a tool for outlier analysis. The platform can load data from the official Elnino dataset and makes it easy to search and analyze that data. 5.
A discussion, based on how the climate dataset is collected, of why and how outlier detection should be used in a climate monitoring system. Our purpose is to construct an experimental platform for mining outliers from real data. The thesis is organized around the following aspects: partitioning the data space with the grid model; defining the boundary value for judging whether a grid cell contains outliers; presenting the algorithm based on the data grid model; and verifying the algorithm on a real dataset. The platform is implemented on Eclipse RCP and was used to verify the algorithm by analyzing the Elnino dataset. The test results show that the proposed algorithm is efficient. |
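The abstract does not give the grid algorithm itself, so what follows is only a minimal Python sketch of grid (cell) based outlier detection under assumed definitions. It is not the thesis's method. A point is treated as an outlier if at most `max_neighbors` other points lie within distance `d`. This distance-based definition is assumed for illustration, and the function name `grid_outliers` and its parameters are hypothetical. The space is partitioned into cells of side d/(2√k), where k is the dimension. With that side length, any two points in the same or adjacent cells are closer than `d`, so many cells can be classified from their point counts alone. Exact distance checks are needed only for borderline cells.

```python
import math
from collections import defaultdict
from itertools import product


def grid_outliers(points, d, max_neighbors):
    """Hypothetical sketch: indices of points with at most `max_neighbors`
    other points within distance `d` (assumed outlier definition)."""
    if not points:
        return []
    k = len(points[0])                    # dimensionality of the data
    side = d / (2 * math.sqrt(k))         # same/adjacent cells are always closer than d
    reach = math.ceil(2 * math.sqrt(k))   # cells farther than this (per axis) are beyond d

    # Partition the data space: map each point index to its integer grid cell.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(math.floor(c / side) for c in p)].append(idx)

    def block(cell, radius):
        # Collect point indices from all cells within `radius` cells on every axis.
        out = []
        for off in product(range(-radius, radius + 1), repeat=k):
            out.extend(cells.get(tuple(c + o for c, o in zip(cell, off)), ()))
        return out

    outliers = []
    for cell, members in cells.items():
        # Every point in the cell and its adjacent cells is within d of each member.
        if len(block(cell, 1)) - 1 > max_neighbors:
            continue  # the whole cell is dense enough: no outliers here
        candidates = block(cell, reach)   # every possible neighbor lies in this block
        if len(candidates) - 1 <= max_neighbors:
            outliers.extend(members)      # too few points even in the widest block
            continue
        for i in members:                 # borderline cell: count exact distances
            n = sum(1 for j in candidates
                    if j != i and math.dist(points[i], points[j]) <= d)
            if n <= max_neighbors:
                outliers.append(i)
    return sorted(outliers)


pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
print(grid_outliers(pts, d=1.0, max_neighbors=1))  # prints [5]
```

In this toy example the five clustered points share one cell, and the isolated point at (5, 5) is flagged. The per-cell count tests play a role similar to the abstract's "boundary value" for whether a grid cell contains outliers. The sketch uses math.dist, so it requires Python 3.8 or later, and it ignores floating-point edge effects at cell boundaries.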