Research On Connectivity-based Outlier Detection And Clustering

Posted on:2015-08-20

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J Q Wan

Full Text:PDF

GTID:1228330422971492

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

Outlier detection and clustering are important research of data mining. Althoughtheir technology has been very mature, there are still many challenges in sorts ofapplications. This paper hopes to continually promote the development of the basictheory of outlier detection and clustering, and solve the problems of current researchand applications. The main work is: Based on the research of relationship description,the models of outlier detection and clustering are studied, and then their solutionschemes are verified and optimized.Describing the relationship between data is basic research in the field of datamining. In practical applications, the quality of the relationship description oftendirectly affects the outcome of data mining. Therefore, this paper discusses therelationship measure–dissimilarity, and presents some related theories. Euclideandistance is the most commonly used measure of dissimilarity. However, this measuredoes not fully characterize the dissimilarity between data, and even sometimes give awrong description of dissimilarity. Hence, a neighborhood-based andconnectivity-based dissimilarity measures are proposed. The two measures respectivelyintroduce density and connectivity information so that dissimilarity can be reflectedwell. For ease of use, this study analysizes the neighborhood-based measure, and obtainits estimated value in a uniform distribution; via some theoretical derivation,connectivity-based dissimilarity is addressed as the longest edge of a minimumspanning tree path.From the point of view of connectivity, the paper presents an outlier detectionalgorithm based on the k-th most similar neighbor. The proposed method takes theconnectivity of the k-th most similar neighbor as the outlier-ness score. Thisconnectivity just corresponds to the k-th smallest value of transitive closuredissimilarity. It is proved that the the k-th smallest value is also equal to theconnectivity of the k-th merged objects of a Prim minimum spanning tree (MST). Then,the outlier-ness of an arbitrary object is defined as the longest edge of the MST pathbetween it and its k-th most similar neighbor. Besides, outlier detection algorithm alsotakes into account density. So, the proposed algorithm is suited to arbitrary density andshaped data, and exhibit better performance in local outlier and outlying clusterdetection. Connectivity is also a noteworthy factor in clustering. This paper proposes apartitioning model based on connectivity and the corresponding clustering algorithm.This model ensures the minimization of a total loss function in polynomial time,namely to get the theoretically optimal partition. The most important is that theclustering algorithm can group data with arbitrary manifold. In particular, the more adataset satisfies the constraints--intra-cluster connectivity greater than or equal to theinter-cluster connectivity, the better the clustering result. In addition, dividingsub-clusters can greatly reduce the time complexity, and can maintain the theoreticallyoptimal partition in high probability. Especially in image segmentation, theoreticallyoptimal time of the clustering algorithm can be much lower than quadratic, andmaintain good segmentation results.

Keywords/Search Tags:

Outlier detection, Clustering, Connectivity, k-th most similar neighbor, k-way partitioning model

PDF Full Text Request

Related items

1	Study Of Parameterless Outlier Detection And Complex-manifold Clustering Algorithm
2	Study On The Algorithms Of Clustering And Outlier Detection Based On Neighborhood
3	Research On Intrusion Detection Based On Clustering And Outlier Detection
4	Outlier Detection Algorithm Without Parameter Based On Natural Neighbor
5	Natural Neighbor:The Concepts And Applications In Data Mining
6	Study On Cluster Analysis And Outlier Detection Based On Natural Neighbor And Local Resultant Force
7	Research On LDoS Attack Detection Based On Shared Nearest Neighbor Clustering And Outlier Factor Algorithm
8	Research Of Shoeprint Image Fine-grained Outlier Detection Algorithm
9	Study On Space Partitioning-based Optimized Clustering Algorithms And Related Techniques
10	Outlier Detection Algorithm And Its Parallelization Based On Weighted K-Nearest Neighbor