
Research And Simulation Of Clustering Algorithm In Data Mining

Posted on: 2022-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: L F Zhu
Full Text: PDF
GTID: 2518306350994729
Subject: Control Science and Engineering
Abstract/Summary:
Clustering is an important and widely used research method in data mining. It is useful in pattern analysis, browsing, aggregation, decision making and machine learning, as well as in data mining, file recovery, image segmentation and pattern classification. Application examples include commercial site selection based on user location information, Chinese address standardization, national grid user portraits, identification of non-human malicious traffic, improvement of job search information, recognition of the inherent structure of biological populations, search engine query clustering for traffic recommendation, website keyword source clustering, and image segmentation. This thesis takes clustering as its research direction and focuses on the clustering validity function in soft clustering with fuzzy C-means (FCM) and on clustering with intelligent optimization algorithms in hard clustering. The main work is divided into three parts:

(1) Determining the optimal number of classes for a data set is the key to partitioning it effectively. A validity function for the fuzzy C-means clustering algorithm is proposed, built from class compactness and class separation; its minimum value indicates the optimal clustering. Simulation experiments then compare the clustering performance of the proposed validity function with that of known typical validity functions. Eleven data sets were used for FCM clustering: three classical data sets, two artificial data sets and six real data sets from the UCI database. The simulation results show that the proposed validity function partitions the data sets effectively.

(2) To address the problem of partitioning data correctly and effectively when the number of clusters is known, a clustering method based on an improved bat algorithm is proposed. The bat algorithm (BA) is a swarm intelligence optimization algorithm inspired by the echolocation-based foraging behavior of bats, but it is easily trapped in local minima and has limited accuracy. In the improved algorithm, a Gaussian-like convergence factor is added to the global search, and five different convergence factors are proposed to improve the global optimization ability. In the local search, the hunting mechanism of the whale optimization algorithm (WOA) and a sine position updating strategy are adopted to improve the local optimization ability. The clustering performance of the improved bat algorithm under six different convergence factors is compared, on seven real data sets, with that of the bat algorithm, the flower pollination algorithm (FPA), the harmony search (HS) algorithm, the whale optimization algorithm, the particle swarm optimization (PSO) algorithm and three improved algorithms. The simulation results show that the improved bat algorithm clusters better than the other intelligent optimization algorithms.

(3) A hybrid dynamic clustering method based on the Marine Predators Algorithm (MPA) and particle swarm optimization is proposed for the dynamic clustering problem, in which the number of clusters cannot be determined in advance. The position updating strategy of PSO compensates for the weakness of MPA in global search. A fixed-length real-number coding strategy handles the variable-length clustering optimization problem. An infeasible-solution handling strategy and a penalty function strategy are proposed to improve the performance of the algorithm, so that the number of clusters and the cluster centers are optimized simultaneously. The proposed MPA-PSO algorithm was compared with PSO, MPA, the differential evolution algorithm (DE), the spotted hyena optimization algorithm (SHO), the lightning search algorithm (LSA) and a balance optimization algorithm in clustering simulation experiments on four artificial data sets and six real data sets from the UCI database (Iris, Wine, Wisconsin breast cancer, Vowel, Seeds, Wdbc). The number of clusters found, ARI and Accuracy were used to evaluate the clustering results. The experimental results show that the proposed method not only finds the correct number of clusters but also obtains stable results for most test problems.
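
The idea in part (1), scoring an FCM partition with a validity function whose minimum marks the preferred number of classes, can be illustrated with a minimal sketch. The abstract does not give the proposed validity function, so the sketch below substitutes the well-known Xie-Beni index (compactness divided by separation) as a stand-in; the function names `sq_dist`, `fcm` and `xie_beni`, the fuzzifier `m = 2`, the iteration count and the random initialization are all assumptions made for illustration, not details taken from the thesis.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm(points, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy C-means (requires m > 1); returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Start from random memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tw = sum(w)
            centers.append([sum(w[i] * points[i][k] for i in range(n)) / tw
                            for k in range(dim)])
        # Membership update from squared distances (floored to avoid 0 ** negative).
        for i in range(n):
            inv = [max(sq_dist(points[i], v), 1e-12) ** (-1.0 / (m - 1.0))
                   for v in centers]
            total = sum(inv)
            u[i] = [x / total for x in inv]
    return centers, u


def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni index (stand-in validity function); smaller is better, needs c >= 2."""
    n, c = len(points), len(centers)
    # Compactness: membership-weighted squared distances to the centers.
    compact = sum(u[i][j] ** m * sq_dist(points[i], centers[j])
                  for i in range(n) for j in range(c))
    # Separation: smallest squared distance between any two centers.
    sep = min(sq_dist(centers[j], centers[k])
              for j in range(c) for k in range(j + 1, c))
    return float("inf") if sep == 0 else compact / (n * sep)
```

In use, one would run `fcm` for each candidate number of classes (for example 2 up to some upper bound), evaluate `xie_beni` on each result, and keep the number of classes with the smallest index value. Any other validity function with the same "minimum is best" convention could be dropped in place of `xie_beni`.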
Keywords/Search Tags:Clustering Analysis, Clustering Validity Index, Dynamic Clustering, Bat Algorithm, Marine Predators Algorithm
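
The fixed-length coding, infeasible-solution handling and penalty ideas named for dynamic clustering in part (3) can be sketched as follows. The abstract does not specify the encoding, so this sketch assumes one common scheme: each candidate vector holds `k_max` activation flags followed by `k_max` candidate centers, and a center takes part in the clustering only when its flag exceeds a threshold. The names `decode` and `fitness`, the threshold of 0.5, the repair rule, the compactness-over-separation score and the penalty weight are all hypothetical choices, not the thesis's method.

```python
def decode(vector, k_max, dim, threshold=0.5):
    """Turn a fixed-length vector into a variable-length list of active centers.

    Assumed layout: [flag_0 .. flag_{k_max-1}, center_0 (dim values), center_1, ...].
    """
    flags = vector[:k_max]
    coords = vector[k_max:]
    centers = [list(coords[j * dim:(j + 1) * dim]) for j in range(k_max)]
    active = [j for j in range(k_max) if flags[j] > threshold]
    if len(active) < 2:
        # Infeasible solution (fewer than two clusters): repair it by
        # activating the two centers with the largest flags.
        active = sorted(sorted(range(k_max), key=lambda j: flags[j],
                               reverse=True)[:2])
    return [centers[j] for j in active]


def fitness(vector, points, k_max, dim, penalty=1e6):
    """Score to minimise: mean within-cluster squared distance divided by the
    smallest squared center separation, plus a penalty per empty cluster."""
    centers = decode(vector, k_max, dim)
    counts = [0] * len(centers)
    sse = 0.0
    for p in points:
        d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        j = d.index(min(d))  # hard assignment to the nearest active center
        counts[j] += 1
        sse += d[j]
    sep = min(sum((a - b) ** 2 for a, b in zip(centers[j], centers[k]))
              for j in range(len(centers)) for k in range(j + 1, len(centers)))
    if sep == 0:
        return float("inf")  # coincident centers are treated as worst case
    return (sse / len(points)) / sep + penalty * counts.count(0)
```

Because every candidate has the same length, a population-based optimizer such as PSO or MPA can update the vectors with its usual position rules, while `decode` lets the number of active clusters vary from one candidate to the next.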