Outlier Data Mining Algorithm Based On Distance And Its Application

Posted on:2014-01-27

Degree:Master

Type:Thesis

Country:China

Candidate:S J Lou

Full Text:PDF

GTID:2248330395991763

Subject:Computer application technology

Abstract/Summary:

With the development of information technology, especially network technology, people's capabilities to collect, store, and transmit data continue to improve. The result is a situation that is rich in data but poor in knowledge, and a new discipline, data mining, has come into being. Outlier mining is one of the important research topics in the field of data mining. In this thesis, outlier mining algorithms and their application are studied using pruning techniques based on the triangle inequality, p weights, and information entropy, taking into account the clustering characteristics found in most datasets and the impact of attribute weights on the mining results. The main work is as follows:

1) An outlier mining algorithm based on p weights (OMAW) is presented. First, a candidate outlier set is found using triangle-inequality pruning, so that only the candidate set is kept in memory. Then each data point in the candidate set is handled in one of two cases: if its number of neighbors does not reach K, it is assigned a relative weight value; if it reaches K, the p-weights method is used to compute the sum of the distances between the data object and its K nearest neighbors, and the larger the sum, the more likely the object is an outlier. Finally, the candidate data points are ranked by weight to determine which of them are outliers, so that the masking (concealment) and swamping (submergence) phenomena are overcome in the outlier detection process.

2) A new outlier mining algorithm based on information entropy and Wk-distance is presented. First, information entropy is used to determine a weight indicating the importance of each attribute. Second, the dataset is reduced using pruning techniques based on the neighbor radius, so that the candidate outlier set is obtained by removing in advance the data objects that cannot be outliers. Third, the weighted distance sum Wk of each object in the candidate outlier set is calculated, and the objects whose Wk values rank in the top N are regarded as outliers. The experimental results validate the feasibility of the algorithm.

3) On the basis of the above, an outlier mining system for astronomical spectral data based on attribute weights is designed and implemented using VC 6.0 as the development tool, and its software function modules and implementation techniques are described in detail. The running results show that the system provides a new way of finding unknown and special astronomical spectral data objects.
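The entropy-weighted ranking in item 2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the exact entropy formula, so the sketch assumes the common entropy-weight method (min-max scaling, then one minus the normalized Shannon entropy of each attribute), uses a weighted Euclidean distance, and omits the neighbor-radius pruning step entirely. The function names entropy_weights, wk_scores, and top_n_outliers are hypothetical.

```python
import math


def entropy_weights(data):
    """Assumed entropy-weight scheme: one weight per attribute, summing to 1."""
    n, d = len(data), len(data[0])
    divergences = []
    for j in range(d):
        col = [row[j] for row in data]
        lo, hi = min(col), max(col)
        if hi == lo:
            # A constant attribute cannot separate objects, so it gets no weight.
            divergences.append(0.0)
            continue
        scaled = [(v - lo) / (hi - lo) for v in col]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    if norm == 0:
        return [1.0 / d] * d  # fall back to equal weights
    return [g / norm for g in divergences]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance (an assumption; the abstract names no metric)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def wk_scores(data, weights, k):
    """Wk of each object: sum of weighted distances to its k nearest neighbors."""
    scores = []
    for i, p in enumerate(data):
        dists = sorted(
            weighted_distance(p, q, weights)
            for j, q in enumerate(data)
            if j != i
        )
        scores.append(sum(dists[:k]))
    return scores


def top_n_outliers(data, k, n):
    """Indices of the n objects with the largest Wk (brute force, no pruning)."""
    weights = entropy_weights(data)
    scores = wk_scores(data, weights, k)
    ranked = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


# Toy data: five clustered points and one distant point at index 5.
data = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [1.2, 1.0], [8.0, 9.0]]
print(top_n_outliers(data, k=2, n=1))
```

The brute-force neighbor search costs O(n^2) distance computations, which is the cost the pruning steps described in the abstract are meant to reduce by discarding objects that cannot be outliers before Wk is computed.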

Keywords/Search Tags:

p weights, Outlier mining, Pruning, Similarity Search, Entropy

Related items

1	An Outlier Mining And Paralleling Method Based On The Grid Cell And P Weights
2	Based On Information Entropy And The Subspace Outlier Mining Algorithm
3	Study On Spatial Outlier Mining
4	Outlier Mining And Parallelization Based On Reverse K-Nearest Neighbor Count And Weight Pruning
5	Outlier Detection Based On Distance And Information Entropy Uncertainty
6	Research Of Detection Outlier Based On Outlier Degree
7	Research Of Similarity Search And Outlier Detection Algorithm On Time Series
8	Research And Application Of Outlier Detection Algorithm
9	The Research On Web Searching And Commending For The Topic-specific Search Engine
10	Study On The Density-Based Local Outlier Mining Algorithm