Research on Dimension-Oriented High-Dimensional Clustering Boundary Detection Technology

Posted on: 2018-05-08  Degree: Master  Type: Thesis
Country: China  Candidate: X F Cao
GTID: 2348330515973150  Subject: Computer Science and Technology
Abstract/Summary:
Clustering is an important research topic in data mining: it helps us analyze data distributions, study data characteristics, and discover latent structure, so that the data can be exploited more deeply. A clustering boundary consists of objects that have clear labels but are easily missed. Clustering boundaries now play an important role in areas such as recessive diseases in medicine, gene expression data, handwritten signatures, and target tracking. Building on existing clustering boundary theories and techniques, and because little work exists on the topic, this thesis studies clustering boundary detection in high dimensional space from two viewpoints: Space oriented and Dimension oriented. To test the proposed detection models, we also design and introduce a number of high dimensional datasets. The contributions of this thesis are as follows:

(1) We first introduce a varying sampling window based on the k nearest neighbors, which reduces the sensitivity of density estimation to a sliding window of fixed size. We then take the density of each sample as the weight of its corresponding drift vector, and on this basis propose the BorderShift algorithm. Experimental results on both synthetic and real datasets demonstrate that BorderShift successfully detects clustering boundary patterns in high dimensional space.

(2) To improve the accuracy of clustering boundary detection, we extend the Hopkins statistic to high dimensional space and propose the Symmetry Statistics, which describe the uniformity of high dimensional space. We then borrow the notion of particle symmetry from physics: we invert the spatial positions of the k nearest neighbors and project them onto the coordinate system of the high dimensional space. Based on these two techniques, we propose the Spinver algorithm. Experimental results on synthetic, medical, handwritten, and other datasets demonstrate the effectiveness and high accuracy of this algorithm.

(3) Having proved the existence and uniqueness of the fulcrum of a lever, we propose analyzing the characteristics of the neighborhood space from a single perspective, i.e., treating each dimension of a data point's k nearest neighbors (kNN) space as a lever. We then model the distance between the projected coordinate of the data point and the balance fulcrum on each dimension, and propose the Lever algorithm. Experiments on both low and high dimensional datasets validate the effectiveness and higher efficiency of the proposed algorithm.

(4) To detect clustering boundaries in spaces of even higher dimension, we propose a new clustering boundary detection algorithm, called Knight, based on a Markov graph model of the knight's tour. The thesis models high dimensional space as a discrete state space and transforms the Markov process of the knight's tour into a corresponding graph model. The Knight algorithm is then obtained by constructing the Hard coefficient, which judges the difficulty of solving a path. Experimental results on gene expression datasets, target tracking, complex face image datasets, and ten-thousand-dimensional datasets verify the effectiveness of the proposed algorithm.

(5) We propose a simple matrix model and use it to construct the MMC algorithm, with the aim of developing clustering boundary detection technology that is easier to apply in the real world. The main idea of MMC is to judge the symmetry of a data point with respect to its kNN space.

(6) We propose the Dimension oriented technology, i.e., decomposing the high dimensional space into one dimensional subspaces and analyzing the data distribution in each subspace.

This research shows the transition from Space oriented to Dimension oriented detection: the theory of clustering boundary detection in high dimensional space is enriched; the detection performance of the techniques is improved; the difficulty of the techniques is reduced; the research scope of clustering boundaries is extended; and the exploration of gene expression data, face recognition, object tracking, ten-thousand-dimensional datasets, etc. pushes the work from theoretical research toward real world application.
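
The per-dimension idea in contribution (3) can be illustrated with a small sketch. This is not the thesis's Lever algorithm, whose exact fulcrum model and scoring are not given in the abstract. It assumes that the balance fulcrum on each dimension is the mean of the kNN coordinates (the balance point of equal weights), that the offset is normalized by the neighbors' spread on that dimension, and that the per-dimension offsets are averaged into one score. The names `knn_indices` and `lever_style_scores` are hypothetical.

```python
import numpy as np

def knn_indices(X, k):
    """Brute-force k nearest neighbors of every row of X, excluding the row itself."""
    # Pairwise squared Euclidean distances, shape (n, n).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def lever_style_scores(X, k=10):
    """Per-point score: how far the point sits from its neighbors' balance
    point, measured one dimension at a time.

    Assumption (not from the thesis): the fulcrum on each dimension is the
    mean of the kNN coordinates on that dimension.
    """
    neigh = X[knn_indices(X, k)]           # (n, k, d) neighbor coordinates
    fulcrum = neigh.mean(axis=1)           # (n, d) balance point per dimension
    spread = neigh.std(axis=1) + 1e-12     # (n, d) scale, guarded against zero
    offset = np.abs(X - fulcrum) / spread  # (n, d) normalized lever offset
    return offset.mean(axis=1)             # average over dimensions

if __name__ == "__main__":
    # An 11 x 11 grid: corner points have all neighbors on one side,
    # interior points have neighbors on every side.
    g = np.arange(11.0)
    X = np.array([[a, b] for a in g for b in g])
    scores = lever_style_scores(X, k=8)
    print("corner score:", scores[0])
    print("center score:", scores[60])
```

Under these assumptions, points whose neighbors lie mostly on one side get larger scores, so thresholding or ranking the scores gives boundary candidates. The brute-force distance matrix needs memory on the order of n² · d and is only meant for small illustrations.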
Keywords/Search Tags:High dimensional space, clustering boundary, Dimension oriented