
Analysis And Research Of Distance-Based Outliers

Posted on: 2008-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: H X Han
Full Text: PDF
GTID: 2178360242488886
Subject: Computer application technology
Abstract/Summary:
Outlier mining is an important part of data mining; it includes outlier detection and outlier analysis. To detect outliers effectively, researchers have proposed many algorithms, including methods based on statistics, distance, density, depth, and deviation. These algorithms all focus on finding outliers: they can detect outliers but cannot reflect their causes and origins, so in most cases they have limited practical value. The purpose of outlier research is not only to find outliers but also to analyze their causes and origins. These questions are attracting growing attention from users, and answering them is the aim of outlier analysis.

Outlier knowledge sets are the minimal attribute sets that explain and describe why outliers are exceptional; strongest and weak outliers form a classification of outliers; an attribute lattice can intuitively represent all outliers, their knowledge sets, and the relations between them; and outlying similar patterns describe the possibility of outlying behavior. This thesis analyzes outliers thoroughly from these aspects.

On the basis of the above study, the focus shifts to outlier analysis. The author's main work is as follows:

(1) The thesis presents an algorithm for finding outlier knowledge sets that is combined with outlier detection. It analyzes attributes from low to high dimensions and can find all knowledge sets. The experimental results show that the algorithm finds knowledge sets and explains the reasons why outliers are outlying effectively.

(2) To improve efficiency, a sampling-based approximate method has been developed. With this method the algorithm can handle large datasets well. Experiments have been carried out on real data; the results and the theoretical analysis indicate the efficiency of the algorithm combined with the sampling method.

(3) A definition based on knowledge sets is used to classify outliers and to find the relations between different outliers. An attribute lattice can represent the outliers and their knowledge sets effectively and intuitively.

(4) The thesis uses a directed graph to represent outlier groups and the similarity relations between them, so that mining outlying similar patterns becomes the problem of finding the longest path. Examples are used to analyze the similarity and the process of mining outlying similar patterns.
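
As an illustration of item (1), the low-to-high-dimension search for knowledge sets might be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm from the thesis. It assumes a simple distance-based criterion (a record is an outlier in a set of attributes if fewer than `min_neighbors` other records lie within Euclidean distance `radius` on those attributes), and the names `is_outlier`, `knowledge_sets`, `radius`, and `min_neighbors` are hypothetical. Under this criterion, adding attributes can only increase distances, so a record that is outlying on one attribute set stays outlying on every superset; this is why supersets of an already found set can be skipped.

```python
from itertools import combinations
import math


def is_outlier(data, idx, attrs, radius, min_neighbors):
    """Assumed criterion: row `idx` is an outlier on `attrs` if fewer than
    `min_neighbors` other rows lie within `radius` (Euclidean, on `attrs` only)."""
    target = data[idx]
    neighbors = 0
    for j, row in enumerate(data):
        if j == idx:
            continue
        dist = math.sqrt(sum((target[a] - row[a]) ** 2 for a in attrs))
        if dist <= radius:
            neighbors += 1
            if neighbors >= min_neighbors:
                return False  # enough close neighbors: not an outlier here
    return True


def knowledge_sets(data, idx, radius, min_neighbors):
    """Return the minimal attribute sets on which row `idx` is an outlier,
    examining attribute sets from low to high dimension."""
    n_attrs = len(data[idx])
    found = []
    for size in range(1, n_attrs + 1):
        for attrs in combinations(range(n_attrs), size):
            # Skip supersets of a set already found: they are not minimal.
            if any(set(ks) <= set(attrs) for ks in found):
                continue
            if is_outlier(data, idx, attrs, radius, min_neighbors):
                found.append(attrs)
    return found


# Illustrative data (made up): two tight clusters plus one record that looks
# ordinary on each attribute alone but is isolated on the pair (0, 1).
data = [
    (0.0, 0.0), (0.0, 0.1), (0.1, 0.0),   # cluster A
    (3.0, 3.0), (3.0, 3.1), (3.1, 3.0),   # cluster B
    (0.0, 3.0),                           # candidate outlier, index 6
]
result = knowledge_sets(data, idx=6, radius=0.5, min_neighbors=2)
```

Here the record at index 6 has close neighbors on attribute 0 alone (cluster A) and on attribute 1 alone (cluster B), so only the pair of attributes explains why it is exceptional. The exhaustive enumeration is exponential in the number of attributes, which is one reason a sampling-based approximation such as the one mentioned in item (2) may be useful on large data.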
Keywords/Search Tags:knowledge set, strongest outlier, weak outlier, attribute lattice, similarity, outlying similar pattern