
Information-theoretic Spatial Outlier Mining

Posted on: 2014-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: F He
GTID: 2268330422970437
Subject: Computer application technology
Abstract/Summary:
Outlier mining has become one of the most active branches of data mining research and has attracted wide attention in databases, data mining, machine learning, and statistics. It has broad application prospects in fraud detection, intrusion detection, fault detection, ecosystem disturbances, abnormal disease outbreaks in public health, public safety emergencies, the detection of anomalies in the natural climate, and so on. With the development of sensor technology, data acquisition devices are becoming ever more numerous and the data they collect ever more precise, so spatial data sets keep growing in both size and dimensionality. Existing outlier detection methods for spatial data are mainly based on distance and density, and they face the challenges of the curse of dimensionality and of scaling to large data volumes. Outlier detection algorithms based on information theory have mainly been studied for categorical attributes, and they usually assume that the attributes are independent of one another. Because spatial data exhibit autocorrelation and heterogeneity, existing information-theoretic outlier detection methods are difficult to apply to them, and spatial outlier mining algorithms based on information theory have not yet been reported. Therefore, this paper studies a spatial outlier detection algorithm, based on holographic entropy, that can handle both discrete and continuous attributes.
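The abstract names holographic entropy but does not give its formula. A common formulation from the categorical outlier detection literature, assumed here and possibly different from the thesis's own definition, is HL(Y) = H(Y) + C(Y): the joint entropy of all attributes plus their total correlation C(Y) = Σᵢ H(yᵢ) − H(Y), which measures the redundancy shared between attributes. The minimal sketch below computes both terms for discrete attributes; the function names are illustrative only.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def holoentropy(rows):
    """Holographic entropy of records given as tuples of discrete attribute values.

    Assumed definition (not taken from the thesis): HL(Y) = H(Y) + C(Y), where
    H(Y) is the joint entropy of all attributes and C(Y) = sum_i H(y_i) - H(Y)
    is the total correlation, i.e. the redundancy shared between attributes.
    """
    joint = entropy(rows)                     # each whole record is one joint outcome
    marginals = [entropy(col) for col in zip(*rows)]
    total_correlation = sum(marginals) - joint
    return joint + total_correlation, joint, total_correlation, marginals


# Two perfectly correlated attributes: H(Y) = 1 bit, C(Y) = 1 bit, HL(Y) = 2 bits.
hl, joint, tc, marginals = holoentropy([("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")])
print(hl, joint, tc, marginals)
```

Under this definition the joint term cancels, so HL(Y) equals the sum of the marginal entropies and can be computed one attribute at a time, while the split into H(Y) and C(Y) shows how much of it comes from correlation between attributes.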
The algorithm is based on the characteristics of spatial data and on the concept of holographic entropy, which jointly considers the information entropy of the attributes and the correlations between them. The contributions of this paper are as follows.
(1) Through analysis and experimental comparison of existing typical outlier detection algorithms related to spatial outlier detection, their respective advantages and limitations are identified.
(2) Existing methods determine spatial neighbors (neighborhoods) purely from spatial relations, which leads to high computational complexity. To address this, a method is proposed that partitions the region using spatial identifier attributes. It exploits the hierarchical structure of these identifiers to build a hierarchy tree down to a chosen level; spatial neighbors are then determined from spatial relations within each resulting area, using an R*-tree for the search. This reduces the computational complexity and lays a foundation for distributed parallel computing.
(3) Because existing spatial outlier detection algorithms have difficulty handling high-dimensional data, a spatial outlier mining algorithm based on holographic entropy is proposed, which jointly considers the information entropy of the attributes and the mutual information between them. The algorithm defines dissimilarity measures for different attribute types together with entropy-based metrics, and proposes an entropy-based method for computing attribute weights. On this basis, an outlier degree based on weighted holographic entropy is proposed, and a spatial outlier mining algorithm driven by this outlier degree is designed.
Theoretical and experimental results show that, because it takes the characteristics of spatial data into account, the algorithm computes the data partitioning and the attribute weights automatically and effectively. It also has advantages in computational complexity, accuracy, reduced dependence on user input, and interpretability of the results.
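The region partitioning in contribution (2) can be sketched as follows, under assumptions that are not from the thesis: each record carries a hierarchical identifier string in which every level adds two digits (as in six-digit administrative division codes), neighbors are the k nearest points by Euclidean distance, and a brute-force scan inside each partition stands in for the R*-tree search. The field names `region_id` and `xy` and the functions `partition_by_identifier` and `knn_within_partition` are hypothetical.

```python
import heapq
from collections import defaultdict
from math import dist


def partition_by_identifier(records, level, digits_per_level=2):
    """Group records by the first `level` levels of their (hypothetical) hierarchical region_id."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["region_id"][: level * digits_per_level]].append(rec)
    return dict(groups)


def knn_within_partition(group, k):
    """Map each record id to the ids of its k nearest neighbours inside one partition.

    Brute force, O(m^2) per partition of size m; the thesis uses an R*-tree here instead.
    """
    result = {}
    for rec in group:
        others = ((dist(rec["xy"], o["xy"]), o["id"]) for o in group if o["id"] != rec["id"])
        result[rec["id"]] = [oid for _, oid in heapq.nsmallest(k, others)]
    return result


records = [
    {"id": "p1", "region_id": "110101", "xy": (0.0, 0.0)},
    {"id": "p2", "region_id": "110102", "xy": (1.0, 0.0)},
    {"id": "p3", "region_id": "110105", "xy": (5.0, 5.0)},
    {"id": "p4", "region_id": "120101", "xy": (0.5, 0.0)},
]
parts = partition_by_identifier(records, level=2)
neighbours = knn_within_partition(parts["1101"], k=1)
print(sorted(parts), neighbours)
```

Point `p4` lies geographically close to `p1` but has a different level-2 prefix, so it is never compared with the `1101` points. Limiting the search to one partition is what bounds the per-partition cost and lets partitions be processed independently, at the price of ignoring neighbors across partition boundaries.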
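Contribution (3) combines entropy-based attribute weights with an outlier degree derived from weighted holographic entropy. The abstract gives neither formula, so the following is only an illustration under explicit assumptions: continuous attributes are discretised into equal-width bins (a swapped-in stand-in for the thesis's own dissimilarity measures), each attribute weight is the reverse sigmoid 2(1 − 1/(1 + e^(−H))) of the attribute's entropy, weighted holographic entropy is the weighted sum of marginal entropies, and a record's outlier degree is how much that quantity drops when the record is removed. All function names are hypothetical.

```python
from collections import Counter
from math import exp, log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def equal_width_bins(values, bins=3):
    """Discretise a continuous attribute into equal-width intervals (assumed preprocessing)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0           # guard against a constant attribute
    return [min(int((v - lo) / width), bins - 1) for v in values]


def attribute_weights(columns):
    """Assumed reverse-sigmoid weights: low-entropy (consistent) attributes get weights near 1."""
    return [2 * (1 - 1 / (1 + exp(-entropy(col)))) for col in columns]


def weighted_holoentropy(columns, weights):
    """Weighted sum of marginal entropies (holographic entropy with per-attribute weights)."""
    return sum(w * entropy(col) for w, col in zip(weights, columns))


def outlier_degrees(rows):
    """Assumed outlier degree: drop in weighted holographic entropy when a record is removed."""
    columns = [list(col) for col in zip(*rows)]
    weights = attribute_weights(columns)      # computed once on the full data set
    total = weighted_holoentropy(columns, weights)
    degrees = []
    for i in range(len(rows)):
        reduced = [col[:i] + col[i + 1:] for col in columns]
        degrees.append(total - weighted_holoentropy(reduced, weights))
    return degrees


# One categorical attribute and one continuous attribute (temperature, binned).
colours = ["red", "red", "red", "red", "blue"]
temps = [20.1, 20.5, 19.8, 20.3, 35.0]
rows = list(zip(colours, equal_width_bins(temps)))
degrees = outlier_degrees(rows)
print(max(range(len(rows)), key=degrees.__getitem__))  # prints 4, the most outlying record
```

Removing an ordinary record slightly raises the entropy of the remaining data, so its degree is negative, while removing the `blue` record with temperature 35.0 leaves both attributes constant, so it receives the only positive degree. In a spatial setting the same computation would presumably be applied to each point together with its spatial neighbors rather than to the whole data set.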
Keywords/Search Tags: holographic entropy, information entropy, spatial outlier, regional identity, spatial index