Covering Clustering Algorithm Based On Quotient Granularity

Posted on: 2008-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: L L Yan
GTID: 2178360215496606
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, database applications have been expanding in scale, scope and depth, leading to the accumulation of large volumes of data behind which much important information is hidden. Higher-level analysis is needed so that these data can be put to better use. Current database systems can efficiently handle functions such as data entry, query and statistics, but they cannot discover the relations and rules among the data, let alone forecast future trends from the current data. Data clustering analysis (DCA) is one effective way to address this problem and is an important part of data mining. DCA can discover the classes of previously unlabeled objects, which provides powerful support for data mining, and it has been widely studied in recent years.
In data clustering analysis, the data are divided into natural clusters and the characteristics of each cluster are then described. This is a basic approach to data mining and knowledge discovery, but traditional clustering algorithms cannot efficiently handle large volumes of data. This thesis focuses on finding an efficient method for clustering large-scale, high-dimensional data.
A cross-cover algorithm has been developed to cope effectively with the large-scale data clustering problem. At the same time, the coarseness of a clustering can be described by granularity, so this thesis introduces the concept of granularity into clustering based on the cross-cover algorithm.
First, data mining technology is presented, with a detailed account of clustering analysis, and the relevant components of clustering algorithms, such as data representation, distance calculation and common algorithms, are discussed. Second, the basic ideas of the cross-cover algorithm and of quotient space granularity are introduced, and a cover clustering algorithm based on quotient space granularity is presented; simulation results show that the algorithm is effective and feasible for high-dimensional, large-scale data samples. Finally, because supervised feature selection methods cannot be applied directly to text clustering, which lacks class information, a feature selection method based on class information is presented: the information gain feature selection method is applied to the results of a density clustering algorithm to reselect the features that are most informative about class, and the experimental results confirm the feasibility of this method.
Summary of the work:
(1) A cover clustering algorithm that can handle large-scale, high-dimensional data is proposed. Building on traditional clustering algorithms, it extends the cross-cover algorithm, which performs very well on data classification, so that the improved cover clustering algorithm can cluster data automatically.
(2) The concept of granularity is introduced. When different granularities are chosen in the calculation, the physical meaning of within-class and between-class relations is shown directly, which provides guidance for practical application.
(3) Text clustering is an unsupervised learning method, and the lack of class information makes it very difficult to apply supervised feature selection methods directly. This thesis presents a feature selection method based on class information, which makes the information gain feature selection method usable in unsupervised learning; simulation results demonstrate its effectiveness.
Some work has been done on granularity-based clustering, but shortcomings remain and much future work is possible, for example on: (1) the effectiveness of the algorithm; (2) the scalability of the algorithm; (3) the system interaction capability of the algorithm.
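The feature selection idea in item (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes a clustering step has already assigned a label to each document, treats those labels as stand-in class information, and ranks terms by information gain. The function names (`entropy`, `information_gain`, `select_terms`), the binary term-presence representation, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(term_present, cluster_labels):
    """Information gain of a binary term-presence feature.

    The cluster labels stand in for class labels (assumption: they come
    from an earlier clustering step, e.g. a density-based one).
    """
    total = len(cluster_labels)
    gain = entropy(cluster_labels)
    for value in (True, False):
        subset = [label
                  for present, label in zip(term_present, cluster_labels)
                  if present == value]
        if subset:
            gain -= (len(subset) / total) * entropy(subset)
    return gain


def select_terms(docs, cluster_labels, k):
    """Rank vocabulary terms by information gain and keep the top k.

    Each document is a set of terms. Ties are broken alphabetically so
    the result is deterministic.
    """
    vocabulary = sorted({term for doc in docs for term in doc})
    scored = [
        (information_gain([term in doc for doc in docs], cluster_labels), term)
        for term in vocabulary
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


# Hypothetical toy data: four documents and the labels a clustering
# step is assumed to have produced for them.
docs = [
    {"goal", "match", "the"},
    {"goal", "team", "the"},
    {"stock", "market", "the"},
    {"stock", "price", "the"},
]
cluster_labels = [0, 0, 1, 1]

print(select_terms(docs, cluster_labels, 2))
```

In this toy example a term that appears in every document, such as "the", has zero information gain, whereas a term that appears in exactly one cluster separates the clusters perfectly and ranks first.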
Keywords/Search Tags: clustering analysis, cover algorithm, quotient space, granularity