Outlier Data Mining Algorithm Based On Distance And Its Application

Posted on: 2014-01-27  Degree: Master  Type: Thesis
Country: China  Candidate: S J Lou  Full Text: PDF
GTID: 2248330395991763  Subject: Computer application technology
Abstract/Summary:
With the development of information technology, especially network technology, people's capabilities to collect, store, and transmit data continue to improve, creating a situation that is rich in data but poor in knowledge. A new discipline, data mining, has emerged in response. Outlier mining is one of the important research topics in the field of data mining. In this thesis, outlier mining algorithms and their application are studied using pruning techniques based on the triangle inequality, p weights, and information entropy, taking into account the clustering characteristics of most datasets and the impact of attribute weights on the mining results. The main work is as follows:

1) An outlier mining algorithm based on p weights (OMAW) is presented. First, the outlier candidate set is found using triangle-inequality pruning techniques, and only the candidate set is kept in memory. Each data point in the candidate set is then handled in one of two cases: if its number of neighbors does not reach K, it is assigned a relative weight; if it reaches K, the p-weight method is used to compute the sum of the distances between the data object and its k nearest neighbors (KNN), and the larger the sum, the more likely the object is an outlier. Finally, the candidate points are ranked by weight to determine which of them are outliers, so that the masking and swamping phenomena are overcome in the outlier detection process.

2) A new outlier mining algorithm based on information entropy and Wk-distance is presented. First, information entropy is used to determine a weight indicating the importance of each attribute. Second, the dataset is reduced using pruning techniques based on neighbor radius, so that the candidate outlier set is obtained by removing in advance the data objects that cannot be outliers. Third, the weighted distance sum Wk of each object in the candidate outlier set is calculated, and the objects whose Wk values rank in the top N are regarded as outliers. The experimental results validate the feasibility of the algorithm.

3) On the basis of the above, an outlier mining system for astronomical spectral data based on attribute weights is designed and implemented using VC6.0 as the development tool, and its software function modules and implementation techniques are described in detail. The results of running the system show that it provides a new way of finding unknown and special astronomical spectral data objects.
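The weighting and ranking steps of the second algorithm can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the formulas, so the sketch assumes the common entropy weight method (weight proportional to one minus normalized entropy, over an equal-width histogram), a weighted Euclidean distance, and Wk as the sum of distances to the k nearest neighbors. The function names and the `bins` parameter are hypothetical, and the neighbor-radius pruning step is omitted.

```python
import math
from collections import Counter


def entropy_weights(rows, bins=4):
    """Attribute weights from histogram entropy (assumed formula, not from the thesis)."""
    n, dims = len(rows), len(rows[0])
    raw = []
    for j in range(dims):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        width = (hi - lo) / bins or 1.0  # constant column: avoid a zero bin width
        counts = Counter(min(int((v - lo) / width), bins - 1) for v in col)
        h = -sum((c / n) * math.log(c / n) for c in counts.values())
        # Lower entropy gives a higher weight; clamp tiny negative rounding errors.
        raw.append(max(0.0, 1.0 - h / math.log(bins)))
    total = sum(raw)
    if total == 0:
        return [1.0 / dims] * dims  # every attribute is uniform: fall back to equal weights
    return [w / total for w in raw]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two points."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def wk_scores(rows, k, weights):
    """For each point, the sum of weighted distances to its k nearest neighbors."""
    scores = []
    for i, p in enumerate(rows):
        dists = sorted(
            weighted_distance(p, q, weights) for j, q in enumerate(rows) if j != i
        )
        scores.append(sum(dists[:k]))
    return scores


def top_n_outliers(rows, k, n):
    """Indices of the n points with the largest Wk score, largest first."""
    scores = wk_scores(rows, k, entropy_weights(rows))
    return sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)[:n]


data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (1.0, 0.8), (8.0, 9.0)]
print(top_n_outliers(data, k=2, n=1))
```

On this toy dataset the isolated point at index 5 receives the largest score. The brute-force neighbor search is quadratic in the number of points, which is the cost that the pruning steps described in the abstract are meant to reduce.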
Keywords/Search Tags: p weights, Outlier mining, Pruning, Similarity Search, Entropy