
The Research Of Clustering Algorithm Of Data Mining Based On Invasive Weed Optimization Algorithm

Posted on: 2015-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: J H Zhou
Full Text: PDF
GTID: 2268330428482628
Subject: Control theory and control engineering
Abstract/Summary:
With the rapid development of computer and information technology, people's lives are immersed in an ocean of data and information, and human society has entered the digital age. People need rich data and information to guide social activities and make life more convenient. In business, enterprise production, and many other engineering fields in particular, large amounts of data and information are used to obtain decision-making information for commercial profit, to carry out online monitoring, identification, and diagnosis of industrial production, and to formulate control strategies. The concept of data mining was proposed in response. As an effective technology for extracting valuable and useful information from large data sets, data mining has attracted the attention of a growing number of scholars and has been applied in many engineering fields.

The invasive weed optimization algorithm (IWO) is a bionic optimization algorithm that simulates the process by which weeds colonize, grow, and compete in nature. Because of its good robustness, strong optimization ability, fast convergence, simple structure, and ease of implementation, and because it outperforms other intelligent optimization algorithms on many optimization problems, it has received wide attention in academia. Fuzzy C-means clustering (FCM), a soft partition method, describes the class membership of samples with uncertainty, which is consistent with the way humans perceive things. It has been widely used in many engineering and scientific fields, but it still has some shortcomings. This thesis studies and analyzes these problems of FCM in depth. The main research content includes the following aspects:

1. The FCM algorithm is sensitive to the initial clustering centers and easily falls into local optima. To address these problems, a fuzzy clustering algorithm for data mining based on IWO (IWO-FCM) is proposed. The method uses IWO to find the optimal initial solution, which ensures that the solution space is searched, and then performs the clustering analysis, effectively overcoming these shortcomings of FCM. Compared with FCM optimized by the genetic algorithm (GFCM) and by particle swarm optimization (PSO-FCM), the simulation results indicate that IWO-FCM has better clustering efficiency than FCM, GFCM, and PSO-FCM.

2. The FCM algorithm based on the invasive weed optimization algorithm (IWO-FCM) has stronger global optimization ability and higher clustering precision than the basic FCM algorithm, but its convergence becomes slow and its clustering precision is not high on high-dimensional, complex test data sets. An improved IWO-FCM algorithm is therefore proposed. It uses a chaos sequence to initialize the population in order to improve the quality of the initial solutions (seeds), and it applies the crossover, mutation, and partial selection operations of the differential evolution algorithm in the spatial distribution and selection stages of IWO-FCM to maintain the diversity of the weed population and enhance global optimization ability. Tests on multiple high-dimensional data sets show that the proposed algorithm converges faster and reaches higher optimization precision than the FCM and IWO-FCM algorithms.

3. Practical chemical industrial process data are high-dimensional and nonlinear, which makes cluster analysis difficult. An IWO-FCM data mining algorithm based on diffusion maps is therefore proposed. The algorithm first uses a diffusion map to extract low-dimensional manifold features from the high-dimensional data, integrating the local characteristics of the data so that the geometric information of the original data is retained. IWO-FCM is then used to cluster the low-dimensional manifold data. On the TE process multiple-fault test data set, the experimental results demonstrate that the proposed algorithm has stronger stability and robustness, and better optimization ability and convergence, than the FCM algorithm based on diffusion maps. The clustering effect is clearly improved, and the proposed algorithm can identify fault features quickly and effectively, which confirms its validity and superiority.

4. The FCM algorithm is sensitive to noise points and outliers. The possibilistic fuzzy clustering (PFCM) algorithm overcomes this problem well, but it is still sensitive to the initial clustering centers, and its clustering results are poor when the noisy test data sets are highly unbalanced. An improved kernel possibilistic fuzzy C-means algorithm (IKPFCM) is proposed. Building on KPFCM, the algorithm first uses IWO to seek the optimal solution as the initial clustering centers, which improves its robustness and optimization ability. At the same time, the sample variance is introduced into the objective function to simplify its parameters, which further enhances the effectiveness of the clustering algorithm. The proposed algorithm is then used to cluster the data. Simulation results on UCI data sets and artificial data sets show that the proposed algorithm has stronger noise immunity, higher clustering accuracy, and faster convergence than the PFCM algorithm.
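
The basic IWO-FCM idea in aspect 1 (IWO searches for good initial clustering centers, then FCM refines them) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names, the parameter values (population sizes, seed counts, the standard-deviation schedule), the use of the FCM objective as the IWO fitness, and the synthetic two-blob data are all assumptions made for illustration. The chaos initialization, differential evolution operators, diffusion map, and kernel possibilistic variants of aspects 2 to 4 are not covered.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(data, centers, m=2.0):
    """FCM membership update; each row sums to 1 over the clusters."""
    u = []
    for x in data:
        d = [max(sq_dist(x, c), 1e-12) for c in centers]
        # d holds squared distances, so the exponent is -1/(m-1).
        w = [dk ** (-1.0 / (m - 1.0)) for dk in d]
        s = sum(w)
        u.append([wk / s for wk in w])
    return u


def objective(data, centers, m=2.0):
    """FCM objective J_m, used here (an assumption) as the IWO fitness."""
    u = memberships(data, centers, m)
    return sum(u[i][k] ** m * sq_dist(x, c)
               for i, x in enumerate(data) for k, c in enumerate(centers))


def fcm(data, centers, m=2.0, iters=100, tol=1e-6):
    """Plain FCM iterations starting from the given centers."""
    dim = len(data[0])
    for _ in range(iters):
        u = memberships(data, centers, m)
        new = []
        for k in range(len(centers)):
            w = [u[i][k] ** m for i in range(len(data))]
            tot = sum(w)
            new.append([sum(wi * x[j] for wi, x in zip(w, data)) / tot
                        for j in range(dim)])
        shift = max(sq_dist(a, b) for a, b in zip(centers, new))
        centers = new
        if shift < tol:
            break
    return centers, memberships(data, centers, m)


def iwo_init(data, c, rng, pop=5, max_pop=15, gens=30, smin=1, smax=4,
             sigma0=1.0, sigma1=0.01, n=3):
    """IWO search over candidate sets of c centers (illustrative parameters)."""
    weeds = [[list(p) for p in rng.sample(data, c)] for _ in range(pop)]
    for g in range(gens):
        fit = [objective(data, w) for w in weeds]
        best, worst = min(fit), max(fit)
        # Dispersal standard deviation shrinks nonlinearly over generations.
        sigma = ((gens - g) / gens) ** n * (sigma0 - sigma1) + sigma1
        seeds = []
        for w, f in zip(weeds, fit):
            # Fitter weeds (lower objective) produce more seeds.
            ratio = 1.0 if worst == best else (worst - f) / (worst - best)
            count = smin + int(round(ratio * (smax - smin)))
            for _ in range(count):
                seeds.append([[v + rng.gauss(0.0, sigma) for v in p]
                              for p in w])
        # Competitive exclusion: keep only the best max_pop plants.
        weeds = sorted(weeds + seeds,
                       key=lambda w: objective(data, w))[:max_pop]
    return weeds[0]


def demo(seed=0):
    """Cluster two synthetic 2-D blobs centered near (0, 0) and (5, 5)."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(30)]
    data += [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(30)]
    start = iwo_init(data, 2, rng)
    centers, u = fcm(data, start)
    return sorted(centers), u


if __name__ == "__main__":
    centers, _ = demo()
    print(centers)
```

Using the FCM objective as the IWO fitness keeps the two stages consistent: IWO ranks candidate center sets by the same quantity that FCM then minimizes locally, so the FCM stage starts from the best region the population search found.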
Keywords/Search Tags: data mining, invasive weed optimization algorithm, fuzzy C-means clustering, chaos, differential evolution, diffusion map, TE process, possibilistic fuzzy clustering, kernel possibilistic fuzzy clustering