
Study On Spatial Outlier Mining

Posted on: 2009-07-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: A R Xue
GTID: 1118360275951020
Subject: Computer application technology
Abstract/Summary:
A spatial outlier is a spatially referenced object whose non-spatial attribute values differ significantly from those of its neighborhood. Spatial outlier mining is an important branch of spatial data mining; it can reveal important phenomena in applications such as traffic control, remotely sensed image analysis, weather forecasting, and demographic data analysis. With the development of sensor technology, data-acquisition devices are becoming more numerous, the required precision is rising, and more items are being collected, so both the volume and the dimensionality of the data keep growing. However, existing spatial outlier mining algorithms mainly target small and medium-sized datasets that are one-dimensional or low-dimensional, and they adapt poorly to large, high-dimensional data. They also do not fully consider the characteristics of spatial data, so what they find are global outliers rather than true spatial outliers. Their disadvantages are high user dependency, low detection accuracy, and low mining efficiency. In addition, with the development of network, sensor, and wireless communication technology, the acquisition, collection, storage, and processing of data have become decentralized, so data mining in distributed environments is also a concern. However, no spatial outlier mining algorithm for distributed environments has been reported.

Based on the characteristics of spatial data, this dissertation studies methods for attribute partition and weight setting and the measurement of the spatial outlier score, in order to achieve high-performance spatial outlier mining algorithms with high mining precision and low user dependency. Existing algorithms are mainly limited to numerical data; by transforming non-numerical data into numerical data, a unified algorithm for mixed attributes is realized. For large high-dimensional datasets, a pruning strategy, subspace-based outlier mining, and ensemble learning are used. For spatial outlier mining in distributed environments, privacy-preserving spatial outlier mining algorithms are proposed. The main contributions of the dissertation are as follows:

(1) An attribute-partition method is proposed for local outlier mining. General local outlier mining methods, such as LOF (Local Outlier Factor), use all attribute dimensions. As a result, determining the local neighborhood is very time-consuming, and because all dimensions are treated indiscriminately as equal, the accuracy of the outlier score suffers, which in turn reduces mining accuracy and speed. The attributes of a data object can be categorized as ID attributes, context attributes, and inherent attributes. ID attributes label the data object, for example its name. Context attributes determine the environment of the object, such as location, time, or sequence, and can be used to identify the neighborhood. Inherent attributes are the attributes specific to the data object, including behavior attributes and status attributes; they determine the object's behavior and state characteristics and can be used to determine its spatial outlier score.

(2) A new measure of the spatial outlier score is proposed: SLOF (Spatial Local Outlier Factor), which is based on the characteristics of spatial data, together with the spatial outlier mining algorithm ASLOF (Algorithm based on SLOF). The attributes of a data object are categorized as ID attributes, spatial attributes, and non-spatial attributes. The spatial attributes are used to determine the spatial neighborhood and to build the spatial index; the non-spatial attributes are used to determine the spatial outlier score; and attribute weights are introduced into the score to improve measurement accuracy. On this basis, a spatial outlier mining algorithm based on the spatial outlier score is proposed. Theoretical analysis and experimental results show that ASLOF outperforms existing algorithms in mining accuracy, user dependency, and efficiency.

(3) A unified spatial outlier score and mining algorithm for mixed attributes is proposed. Starting from the nature of outliers, categorical attributes are transformed into numeric attributes by counting the frequencies of their values; after attribute weighting and standardization, a unified spatial outlier mining algorithm for mixed attributes is obtained. Experimental results show that it effectively achieves unified measurement of the spatial outlier score, and mining, over mixed attributes.

(4) A subspace spatial outlier ensemble algorithm for high-dimensional large datasets (S2OEAHL) is proposed. Because the ID attributes of spatial data objects contain a great deal of geographical identity information, a hierarchical coding tree of the objects is constructed from that information. Based on the tree, the data are partitioned and objects can be searched rapidly. By computing upper and lower bounds for each partition and using the minimum bounding rectangle (MBR) method, partitions that clearly contain no outliers are cut, and partitions that may contain outliers are kept as candidates. This rapid pruning reduces the amount of data to be processed. The candidate partitions are then mined by subspace: to avoid a search whose size grows exponentially with the number of attribute dimensions, subspace-based mining and ensemble learning based on subspace weights are used to address outlier mining in high-dimensional data. The algorithm computes outlier factors in one-dimensional subspaces and uses an optimization method to compute the corresponding attribute weights for the object under examination. On this basis, the outlyingness of each data object is measured by fusing the outlier factors from the different subspaces with a combination function. Outliers are obtained by ranking the outlier factors. Theoretical analysis and experimental results show that the algorithm is effective and computationally efficient.

(5) A privacy-preserving spatial outlier mining algorithm for distributed environments, DPPASLOF (Distributed Privacy Preserving Algorithm based on SLOF), is proposed. The algorithm exploits the locality of spatial data and the active participation of each data holder, and uses spatial index technology and privacy-preserving protocols to improve search capability and privacy protection. Theoretical analysis shows that the algorithm is secure, computationally efficient, and has low communication cost.
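The separation of roles described in contribution (2), with spatial attributes selecting the neighborhood and weighted non-spatial attributes producing the score, can be illustrated with a minimal Python sketch. This is not the dissertation's SLOF definition, which the abstract does not give. The score used here (a weighted absolute difference from the neighbors' mean), the brute-force neighbor search in place of a spatial index, and the names k_nearest and neighborhood_scores are all assumptions made for illustration.

```python
import math

def k_nearest(points, i, k):
    """Indices of the k objects spatially closest to object i (brute force)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def neighborhood_scores(points, values, weights, k):
    """Weighted difference between each object's non-spatial values and the
    mean of its k spatial neighbors. Illustrative score only, not SLOF."""
    scores = []
    for i in range(len(points)):
        nbrs = k_nearest(points, i, k)  # neighborhood from spatial attributes only
        score = 0.0
        for a, w in enumerate(weights):  # score from non-spatial attributes only
            nbr_mean = sum(values[j][a] for j in nbrs) / len(nbrs)
            score += w * abs(values[i][a] - nbr_mean)
        scores.append(score)
    return scores

# Toy data: (x, y) are spatial attributes; one non-spatial attribute per object.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10)]
values = [[5.0], [5.0], [5.0], [20.0], [50.0], [50.0], [50.0]]
scores = neighborhood_scores(points, values, weights=[1.0], k=2)
most_outlying = max(range(len(scores)), key=scores.__getitem__)
```

In this toy data the object at (1, 1) has the value 20, which is not a global extreme (other objects have 5 and 50), yet it receives the highest score because it differs from its spatial neighbors. That is the distinction the abstract draws between spatial outliers and global outliers.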
Keywords/Search Tags: Attribute partition, local outlier factor, spatial outlier, ensemble learning, spatial index, pruning strategy, privacy preserving, data mining