Font Size: a A A

Research On Incremental Clustering Algorithm Based On Artificial Immune Systems And Its Optimization And Application

Posted on:2010-03-25Degree:DoctorType:Dissertation
Country:ChinaCandidate:X H LiFull Text:PDF
GTID:1118360302465968Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the development of internet and communication technology, the abilities of obtaining and collecting data are improving a lot. The amount of information grows rapidly with unprecedented speed. Therefore, data mining is emerged as the time requiring and becomes a hot research field. Clustering analysis is one of important research themes of data mining. It groups the data according to the maximum similarity of intra-cluster and the minimum similarity of inter-cluster. Clustering technology is developed greatly and applied to many fields such as data mining, pattern recognition, machine learning, biological information processing and statistics and so on.There are two different kinds of clustering technology based on static data and dynamic data, namely non incremental clustering and incremental clustering. However, non incremental clustering has not solved some problems effectively because a great deal of information emerges with the development of society. Additionally, the data source of clustering varies over time. New data items are added into it continuously. So the results from clustering analysis do not match for the new data items. Re-clustering costs too much. On the other hand, computing resource is wasted as the previous information of clustering is not used. Therefore, how to design a clustering algorithm to improve clustering efficiency and extract the useful information from data sets, has been an important challenge for the current clustering analysis.Researchers have been inspired by the biology and biological intelligence continuously in recent years. They proposed many learning systems obtained by computing and applied to clustering analysis. The artificial immune system (AIS) is one of them, which comes from the high evolved biological immune system. AIS learns the natural defending mechanism of foreign substance to provide many evolved learning mechanisms, such as unsupervised learning, self organization, memory and so on. Therefore, AIS has the potential to solve problems, which is a novel method. Through analyzing the bionic mechanism of AIS, AIS is suitable for incremental clustering very much. Firstly, AIS has the features which are needed by clustering, such as feature extraction, recognition and learning. Secondly, AIS has two particular features: diversity and immune memory, which make AIS to be more suitable for processing the incremental data. From the computing perspective, it is possible for diversity to recognize the data that have never met before, and immune memory means to memory the relevant information of incremental data that have met before. This process simulates the process of immune response. However, there is little work on it. Thus, the author organizes the paper base on it. The main contributions and detailed contents of the paper are as follows.1 An artificial immune system framework for incremental data clustering is proposed. Meanwhile, the core algorithm-incremental clustering algorithm based on immune response( IRA), is also proposed based on the framework. IRA considers the data distributed in the data space in advance as the free antibodies of immune system. The relevant information of the existing clusters is considered as the memory antibodies. The incremental data which are ready to be analyzed are viewed as the foreign substance-antigens. When the antigen data enter the data space, IRA firstly judges whether it can be recognized by memory antibody data or not. If it can not, IRA simulates the primary immune response. Otherwise, IRA simulates the secondary immune response. In order to improve the scalability of IRA and realize the feature extraction or data compress, a cluster is represented by the centroid and representative points of the cluster. The compacted information will be updated when the antigen data are recognized.2 IRA is analyzed, compared and tested in detailed according to the typical requirement of clustering analysis. In terms of IRA itself, the author analyzes the generating mechanism of Ab population, discusses the computational complex and compress of IRA, displays the differences between IRA and other immune-based incremental clustering algorithms and the advantages of IRA. In the experimental part, IRA is applied to both artificial data set and classic UCI data sets. Both average running time and average correct are used to evaluate the clustering quality. The ability of feature extraction of data set is evaluated by both homogeneity and separation. Meanwhile, the comparing experiments are done. Some conclusions can be obtained through the analysis and experiments of IRA. IRA has few parameters and it is not sensitive to the changing of parameters. IRA is also not sensitive to the order of inputting data. Furthermore, IRA is able to recognize the other shapes of clusters besides spherical clusters. IRA gets better effect for high dimensional data set. In addition, IRA realizes the data compress.3 Two optimizing algorithms are proposed in order to solve the problem of too many small clusters from IRA. They are the optimizing algorithm based on the nearest neighbor and edge betweenness(NN_EB) and the optimizing algorithm based on ant(OAA).Inspired by the clustering algorithm of complex networks and the intra-cluster loss and inter-cluster loss of the nearest neighbors, the learning results from IRA, namely the compress information, can be merged, that is the results form an undirected graph. Then the undirected graph is split by used of edge betweenness. In other words, the optimizing algorithm undergoes a process from merging to splitting. The second optimizing algorithm-OAA can be as an independent incremental clustering algorithm. In the paper, the author firstly proposes an ant movement model represented by five-tuple. The five-tuple concludes the active space of ant, the basic ant, the state of ant, the perceptive rules among ants and the algorithm-OAA whose function is to guide the agglomeration of ants. The learning results from IRA, namely the compress information of each cluster, can be seen as the ants. The ants are distributed on the two-dimensional grid randomly. Then the ants will be merged at certain probability according to OAA. The two algorithms solve the problem of too many small clusters to some degree. The number of clusters is near to the actual cluster numbers after applying the optimizing algorithms. However, the results from both the algorithm are obtained at the cost of lower correct.4 IRA is improved and applied to 3D model retrieval system. According to the high dimensions and dense distribution of 3D models, IRA is improved. The purposes of the improvements are to reduce the recognizing radii of memory antibody data, simplify the computational process and accelerate the computation speed. The improve IRA is applied to some 3D models, sovles the problem of incremental 3D models and obtains better effect. The experiments demonstrate that it is feasible for methods based on biology to apply to 3D model retrieval field. The author hopes that the work in the paper gives meaningful enlightenment to the researchers engaged in 3D models.Summarily, the paper focuses on the incremental clustering algorithm based on immune response and its optimization and application. The algorithms have some theory and application values. In the future work, the algorithms proposed are needed to improve further and the author hopes the algorithms proposed can be applied to more fields。...
Keywords/Search Tags:Artificial immune system, immune response, clustering, optimization, 3D models
PDF Full Text Request
Related items