
Research On Clustering Analysis And Its Applications In Telecom

Posted on: 2008-07-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Niu
Full Text: PDF
GTID: 1118360215483642
Subject: Communication and Information System
Abstract/Summary:
Knowledge Discovery in Databases (KDD) is the process of extracting valid, novel, potentially useful, and ultimately understandable knowledge or patterns from databases. Data mining is the essential step of KDD: intelligent methods are applied to extract data patterns of interest to the user, and the results are explained and visualized with knowledge representation techniques such as trees, tables, rules, and graphs.

Cluster analysis is one of the most important functions of data mining. Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects; a cluster is a collection of data objects that are similar to one another and dissimilar to the objects in other clusters. This thesis focuses on key techniques and algorithms of cluster analysis and their applications in telecom.

Chapter one reviews the data mining field. It first covers the basic concept of knowledge discovery and the origin and evolution of data mining. It then discusses the main functions of data mining, including concept or class description, classification and prediction, cluster analysis, frequent pattern or association rule analysis, outlier analysis, and sequence or time series analysis. Applications of data mining techniques in telecom are then presented. Finally, the main content and structure of the thesis are described.

Chapter two discusses the basic concepts of cluster analysis, including its definition, the main evaluation criteria, and the basic requirements of clustering algorithms. It then presents the main families of clustering methods and their most representative algorithms: partitioning, hierarchical, density-based, grid-based, and model-based methods.

Chapter three first studies the requirements of cluster center initialization and the existing algorithms.
Then two methods for initializing cluster centers, based on composite neighbor and direction pointer, are proposed; they integrate grid-based and density-based clustering algorithms.

Chapter four investigates subspace clustering algorithms for high-dimensional data. First, examples of high-dimensional data are presented, and their characteristics and influence on traditional clustering methods are studied. Then three categories of subspace clustering approaches are introduced: overlapping, non-overlapping, and other subspace clustering. Two subspace clustering methods are then proposed: one is based on maximum clique theory; the other is based on attribute clustering and dispenses with an Apriori-like search strategy. Experiments on both real and synthetic datasets demonstrate their validity.

Chapter five studies outlier detection algorithms. First, current outlier detection methods are reviewed, including the statistical, depth-based, deviation-based, distance-based, and density-based approaches, and the characteristics of these algorithms are discussed. Then two methods for outlier detection in feature space are proposed: one based on the difference of double radius density, and one based on distance distribution clustering. The former detects outliers by comparing, for each point in the data space, the density within the double radius against the density within the radius, and uses sampling techniques to further improve efficiency; the latter redefines the problem by clustering in the distribution difference space rather than in the original feature space. The effectiveness of both methods has been validated by experiments.

Chapter six studies constrained clustering. Constraints of various forms can guide the clustering process and improve clustering performance.
Besides the different types of constraints, the advantages and disadvantages of adding constraints to clustering are discussed. The chapter then concludes that the distortion caused by constraint sets is the fundamental factor affecting the accuracy of constrained clustering algorithms. A minimized-distortion method for K-means is also presented.

Building on the new algorithms in this thesis, chapter seven focuses on applications of cluster analysis in telecom customer relationship management.

Chapter eight concludes the research and outlines future work in this field.
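
The chapter-six remark that constraint sets introduce distortion can be illustrated with a small sketch. This is not the minimized-distortion method of the thesis, which the abstract does not specify. It assumes the usual K-means distortion (sum of squared distances to assigned centers) and pairwise must-link / cannot-link constraints; the function names, the toy points, and the constraint pair are all hypothetical.

```python
from math import dist

def centroids(points, labels, k):
    """Mean of the points assigned to each of the k clusters.
    Assumes every cluster has at least one member."""
    centers = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return centers

def distortion(points, labels, k):
    """K-means distortion: sum of squared distances to the assigned center."""
    centers = centroids(points, labels, k)
    return sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))

def violations(labels, must_link, cannot_link):
    """Number of pairwise constraints that a labeling breaks."""
    broken_ml = sum(1 for i, j in must_link if labels[i] != labels[j])
    broken_cl = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return broken_ml + broken_cl

# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
must_link = [(1, 2)]      # assumed constraint that cuts across the natural split
cannot_link = []

free_labels = [0, 0, 1, 1]      # natural partition, ignores the constraint
forced_labels = [0, 1, 1, 1]    # satisfies the must-link constraint

free_cost = distortion(points, free_labels, 2)
forced_cost = distortion(points, forced_labels, 2)
```

On this toy data, satisfying the must-link pair pulls a point away from its nearest neighbours, so `forced_cost` is larger than `free_cost` even though `forced_labels` breaks no constraints. A method that minimizes distortion would, under these assumptions, prefer the constraint-satisfying labeling with the smallest such increase.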
Keywords/Search Tags: data mining, cluster analysis, cluster center, subspace clustering, outlier detection, constrained clustering