The Research On Several Issues Of Clustering And Clustering Validity Indexes

Posted on: 2010-11-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z L Lv
GTID: 1118330338476992
Subject: Computer application technology
Abstract/Summary:
Clustering plays an important role in many engineering applications, such as data mining. Many mature clustering methods exist, but their scopes of application differ, and different methods may produce different clusters on the same data set. Clustering validity indexes have been proposed to help choose a suitable clustering method; however, different indexes may lead to different conclusions. Each method is valid only within its own scope of application, and when an application exceeds that scope the method may be invalid. It is therefore important to ensure the validity of both the clustering and the clustering validity index in applications. Starting from the basic concepts of clustering, this dissertation discusses problems related to clustering and clustering validity indexes, covering the following aspects.

1. A normal form of clustering similarity is presented. Clustering is the process of dividing a data set according to a given similarity, which makes similarity the key to clustering. Similarity can be described by many models, such as distance and density, but these models do not reveal its essence. To capture that essence, the dissertation discusses the form of similarity and its intuitive properties, and derives the normal form of similarity from this discussion.

2. A hypothesis space of clustering is presented. The hypothesis space is an important theoretical basis of machine learning. Here, the hypothesis space of clustering is built on the normal form of similarity in order to discuss problems related to clustering. Based on this hypothesis space, the main reason why clusterings and validity indexes become invalid is shown.

3. Clustering is described by modal logic. The dissertation shows how to describe a clustering by a Kripke structure, so that modal formulas can describe the status of each data point syntactically. This description may serve as the theoretical basis for further discussion.

4. A universal clustering validity index is presented. The modal formulas carry the cluster information of each data point. Using this information, a method for obtaining clustering representatives based on the modal logic description is presented. Based on the representatives, a universal clustering validity index is proposed that is not limited by how the similarity is calculated. Besides the quantitative result of the index, the representatives support a qualitative analysis of the clustering results.

5. A theory of incremental clustering risk and a method for checking the validity of incremental clustering are presented. Incremental clustering can be seen as a form of induction, and induction is inherently risky because it is not truth-preserving. The modal description helps check the validity not only of ordinary clustering but also of incremental clustering. The modal formulas of each data point show the risk of induction. High-quality incremental clustering can be obtained by minimizing this risk, and the validity of an incremental clustering can be checked by calculating the total risk.

Finally, the above concepts are verified in engineering applications. The application "combining small samples" shows the feasibility of the clustering hypothesis space and of the validity index based on that space. The other application, "classifying the grades of flight delays", shows the feasibility of the clustering validity index based on modal representatives, as well as the advantages of this method.
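
The idea of describing a clustering by a Kripke structure can be illustrated with a small Python sketch. This is not the dissertation's construction, which the abstract does not specify. It assumes that each data point is a world, that one world is accessible from another when a user-supplied similarity predicate holds, and that the proposition of interest is "belongs to the same cluster as this point". The names build_kripke, box, point_status, and the labels "interior" and "border" are hypothetical.

```python
def build_kripke(points, similar):
    """Accessibility relation: world i sees world j (j != i) when similar(...) holds.

    `similar` is any boolean predicate on two points, so the sketch does not
    depend on how similarity is calculated (an assumption of this example).
    """
    n = len(points)
    return {
        i: {j for j in range(n) if j != i and similar(points[i], points[j])}
        for i in range(n)
    }


def box(access, i, prop):
    """Modal 'necessarily': prop holds at every world accessible from i.

    Vacuously true when i has no accessible worlds.
    """
    return all(prop(j) for j in access[i])


def point_status(points, labels, similar):
    """Label each point 'interior' if box(same cluster) holds, else 'border'."""
    access = build_kripke(points, similar)
    status = []
    for i in range(len(points)):
        same_cluster = lambda j, i=i: labels[j] == labels[i]
        status.append("interior" if box(access, i, same_cluster) else "border")
    return status


if __name__ == "__main__":
    pts = [0.0, 1.0, 2.0, 3.0, 4.0]
    lbl = [0, 0, 0, 1, 1]
    print(point_status(pts, lbl, lambda a, b: abs(a - b) <= 1.0))
```

Under these assumptions, a point is "interior" when everything similar to it lies in its own cluster, and "border" when at least one similar point lies in another cluster. Swapping the predicate for a density-based or any other similarity leaves the modal part of the sketch unchanged.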
Keywords/Search Tags: Machine learning, clustering, hypothesis space, modal logic, representatives, incremental clustering