Data Mining, Outlier Detection Algorithm

Posted on: 2010-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Fan
Full Text: PDF
GTID: 2208360278470294
Subject: Computer application technology
Abstract/Summary:
Data mining, one of the most active fields of database research, is the process of extracting implicit, previously unknown, and potentially useful knowledge from large collections of data. Outlier detection is an important part of data mining research; its purpose is to find the "small patterns" in a dataset. An outlier is an object that is considerably dissimilar to, or inconsistent with, the remainder of the data. After nearly twenty years of development, outlier detection has been applied in many fields. Traditional outlier detection algorithms, however, face obstacles that are hard to overcome. For instance, their parameters are difficult to choose, which leads to unstable results, and they are poorly suited to high-dimensional data. This thesis studies outlier detection algorithms with these problems in mind.

The existing outlier detection algorithms are introduced and analyzed, and their scope of application and shortcomings are pointed out. The main content of this thesis is as follows.

Building on the cell-based outlier detection algorithm, a new outlier analysis algorithm is presented. A dynamic adjustment function on the dataset boundary threshold is used to address the misjudgment of outliers on the boundary. To remove the need to input the distance D manually, the mean is used in place of the manual value. The new algorithm not only reduces the misjudgment of outliers on the boundary but also reduces the number of input parameters and increases the degree of automation of the computation.

To address the difficulty of handling high-dimensional data, an outlier detection algorithm based on rough set theory is proposed, offering a new approach to research on outlier detection algorithms. In addition, to demonstrate its feasibility, relevant experiments have been performed.
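
The idea of replacing the manually supplied distance D with a mean can be illustrated with a minimal sketch. This is not the thesis's algorithm: it uses a brute-force distance-based test rather than the cell-based method, it has no boundary adjustment, and it assumes that "the mean" is the mean pairwise Euclidean distance of the dataset, which the abstract does not specify. The function names and the fraction parameter `p` are hypothetical.

```python
from itertools import combinations
from math import dist


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def distance_based_outliers(points, p=0.9):
    """Return the points for which at least a fraction p of the other
    points lie farther away than D.

    Assumption: D is set to the mean pairwise distance instead of being
    supplied by the user.
    """
    d = mean_pairwise_distance(points)
    outliers = []
    for i, a in enumerate(points):
        # Count the other points that are farther than D from point a.
        far = sum(1 for j, b in enumerate(points) if j != i and dist(a, b) > d)
        if far >= p * (len(points) - 1):
            outliers.append(a)
    return outliers


# Four tightly clustered points and one distant point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
result = distance_based_outliers(data)
print(result)  # [(5.0, 5.0)]
```

Only the fraction `p` remains as an input here; the distance threshold is derived from the data, which matches the abstract's stated goal of reducing the number of manually chosen parameters.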
Keywords/Search Tags: Data mining, Outlier detection, Benford's law, Rough set