
Research on a Clustering Algorithm Based on Covering

Posted on: 2006-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Yu
Full Text: PDF
GTID: 2168360155961019
Subject: Computer application technology
Abstract/Summary:
"Things of a kind come together, and people fall into groups." Clustering has developed alongside human society: people use it to distinguish objects and to analyze the similarities between different things. Clustering is one of the important research subjects in data mining. It can be used as a standalone tool for discovering how data are distributed, and also as a preprocessing step for other data mining algorithms. Research on improving the performance of clustering algorithms is therefore significant.

This thesis first introduces the basic concepts and methods of clustering and summarizes the preparatory work, such as preprocessing of the sample data and the computation of distances and degrees of correlation. Second, it analyzes the existing clustering algorithms in detail and points out their advantages and disadvantages. Finally, it proposes a new clustering algorithm, called the "Cover Clustering Algorithm", and applies it. The major contents of this thesis are as follows.

(1) The typical current clustering algorithms are discussed, such as hierarchical clustering algorithms based on statistical theory, the partition-based K-Means algorithm, and the vector-based LBG algorithm. Their mathematical foundations, procedures, and functions are reviewed, and their advantages and disadvantages are discussed. This work lays the foundation for the further research.

(2) To address the defects of the current clustering algorithms, such as efficiency, the selection of initial data, and dependence on parameters, a new clustering algorithm, the "Cover Clustering Algorithm", is proposed in this thesis. With this algorithm, samples that lie close together are clustered and the groups implied in the samples are found. For sparse samples, the shortest distance method can be adopted. After covering, the gravity centers of the covers are recalculated to achieve a better result. No iteration over the samples is needed. The algorithm therefore solves problems, such as the choice of initial samples and the clustering speed, that cannot be solved with other algorithms.

(3) Finally, the Cover Clustering Algorithm is applied to drug marketing in a medical business company; the experiments indicate its practicability and effectiveness.
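
The steps named in item (2) can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything specific here is an assumption rather than the thesis's method: the fixed `radius`, the `min_size` threshold that separates dense covers from sparse samples, the choice of the first uncovered sample as each cover's seed, and the reading of the "shortest distance method" as attaching a sparse sample to the cover that holds its nearest covered sample. The names `cover_clustering` and `centroid` are hypothetical.

```python
import math


def centroid(samples, members):
    """Gravity center (coordinate-wise mean) of the samples indexed by members."""
    dim = len(samples[0])
    return tuple(
        sum(samples[i][d] for i in members) / len(members) for d in range(dim)
    )


def cover_clustering(samples, radius, min_size=2):
    """Single-pass covering sketch; radius and min_size are assumed parameters."""
    unassigned = list(range(len(samples)))
    covers = []  # each cover is a list of sample indices

    # Covering pass: each sample is placed in exactly one cover, with no iteration.
    while unassigned:
        seed = unassigned[0]  # assumption: the first uncovered sample seeds a cover
        members = [
            i for i in unassigned
            if math.dist(samples[seed], samples[i]) <= radius
        ]
        covers.append(members)
        unassigned = [i for i in unassigned if i not in members]

    # Covers with fewer than min_size samples are treated as sparse samples.
    dense = [c for c in covers if len(c) >= min_size]
    sparse = [i for c in covers if len(c) < min_size for i in c]
    if not dense:  # every cover is small: keep each one as its own cluster
        dense, sparse = covers, []

    # Shortest-distance step (assumed reading): a sparse sample joins the
    # cover containing its nearest sample among the original dense covers.
    owner = {i: k for k, c in enumerate(dense) for i in c}
    for s in sparse:
        nearest = min(owner, key=lambda i: math.dist(samples[s], samples[i]))
        dense[owner[nearest]].append(s)

    # Recalculate the gravity center of every cover after covering.
    centers = [centroid(samples, c) for c in dense]
    labels = [None] * len(samples)
    for k, c in enumerate(dense):
        for i in c:
            labels[i] = k
    return labels, centers
```

Calling `cover_clustering(points, radius=1.5)` on a list of coordinate tuples returns one cluster label per sample and one gravity center per cover. In this sketch no initial centers are chosen and no reassignment loop is run, which is the property the abstract contrasts with K-Means; the result does depend on `radius` and on sample order, which the abstract does not discuss.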
Keywords/Search Tags: data mining, clustering, covering clustering algorithm, artificial neural network