Research On Directional Clustering And Its Applications

Posted on: 2007-10-13, Degree: Master, Type: Thesis
Country: China, Candidate: Y Xiu, Full Text: PDF
GTID: 2178360185995885, Subject: Computer application technology
Abstract/Summary:
Clustering analysis uses mathematical methods to study and classify data, with the goal of discovering the structure of unknown data. In recent years, the ability to acquire and generate data has increased significantly. Using clustering analysis, people can discover distribution patterns in these massive data sets and useful relations between data attributes. The technique has been applied successfully in fields such as bioinformatics, archaeology, psychology, information retrieval, digital image processing, and engineering control. High dimensional data, such as gene expression data, text data, and multimedia data, are frequently encountered, and their ubiquity makes research on high dimensional clustering analysis very important. This thesis focuses on clustering analysis problems for high dimensional directional data, including gene expression data and text data, and presents methods to solve them. The work has both theoretical and practical significance. The main contributions are summarized as follows:
1) A new similarity measure, directional similarity, is presented for measuring the proximity of gene expression data, based on an analysis of the directionality of high dimensional data. A new clustering algorithm built on this measure is presented. The algorithm overcomes the initialization sensitivity of other directional clustering algorithms. It is also robust in finding outliers in a data set and automatically estimates the number of classes.
2) To address the shortcomings of conventional spherical K-means clustering algorithms, a new spherical K-means algorithm based on the maximum entropy principle of information theory is presented. The new algorithm can avoid local minima and reach the global minimum, which partially solves problems such as sensitivity to initial conditions.
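To make the idea of spherical K-means with entropy-regularized assignments concrete, the following is a minimal sketch and not the algorithm of the thesis, whose details are not given in this abstract. It assumes cosine similarity on unit-length vectors (the usual measure in spherical K-means, not the thesis's directional similarity) and soft memberships proportional to exp(beta * similarity), the form that commonly arises from maximum entropy arguments. The names `soft_spherical_kmeans`, `beta` and `iters` are hypothetical and chosen only for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length so that only its direction matters."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("a zero vector has no direction")
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity of two unit vectors, i.e. their dot product."""
    return sum(a * b for a, b in zip(u, v))


def soft_memberships(xs, centers, beta):
    """Memberships proportional to exp(beta * cosine similarity)."""
    rows = []
    for x in xs:
        scores = [beta * cosine(x, c) for c in centers]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def soft_spherical_kmeans(points, init_centers, beta=10.0, iters=50):
    """Illustrative soft spherical K-means (an assumption, not the thesis method).

    beta is a hypothetical inverse temperature: a small beta gives
    near-uniform, high-entropy memberships; a large beta approaches
    hard spherical K-means.
    """
    xs = [normalize(p) for p in points]
    centers = [normalize(c) for c in init_centers]
    dim = len(xs[0])
    for _ in range(iters):
        memberships = soft_memberships(xs, centers, beta)
        new_centers = []
        for k in range(len(centers)):
            # Each center is the normalized membership-weighted sum of points.
            acc = [0.0] * dim
            for x, m in zip(xs, memberships):
                for d in range(dim):
                    acc[d] += m[k] * x[d]
            new_centers.append(normalize(acc))
        centers = new_centers
    # Report memberships that match the final centers.
    return centers, soft_memberships(xs, centers, beta)
```

In this sketch, starting with a small `beta` and increasing it gradually is one common way to reduce dependence on the initial centers; whether the thesis uses such a schedule is not stated in the abstract.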
Keywords/Search Tags: Clustering analysis, high dimensional data, similarity-based clustering, directional data, directional similarity, gene expression data, text clustering, maximum entropy theory