
Theory And Practice Of Hybrid Clustering Algorithm Based On Density And Ant Colony

Posted on: 2020-08-08 | Degree: Master | Type: Thesis
Country: China | Candidate: T He | Full Text: PDF
GTID: 2428330596975061 | Subject: Computer Science and Technology
Abstract/Summary:
Unstructured data (text, pictures, images, videos, etc.) is growing explosively with the rapid development of the Internet, which makes people spend more time screening information. How to dig potentially useful information out of large amounts of data has become a hot topic for scholars. Against this background, the paper focuses on mining text data: it uses clustering algorithms to organize and classify text, find useful information, and reduce the workload of manually sorting documents, a task with a wide range of application scenarios and far-reaching research significance. The paper concentrates on the application of text clustering and analyzes the advantages and disadvantages of traditional clustering algorithms. Based on an evaluation of the effectiveness of clustering results, it proposes a hybrid clustering algorithm that combines density peak clustering with ant colony clustering. The algorithm defines a calculation procedure for each stage, namely the selection of class center points, the allocation of data points, and class merging, and is finally applied to text clustering. The main work of the paper is as follows.

First, the ant colony clustering algorithm is a meta-heuristic algorithm with global optimization ability, randomness, and an exploratory character, but it converges slowly. The paper improves how the probabilities of ants picking up and dropping items are calculated: the probabilities are computed from the similarity between a data item and its surrounding data. This accelerates the convergence of the algorithm to some extent.

Second, although the fast search and find of density peaks algorithm is efficient and concise, it has some problems: center points must be selected manually with the help of a visualization, and when the cluster distribution is uniform, some clusters are divided into several sub-clusters. To address these two problems, the paper considers a new data index ?_i, introduces the ant colony clustering algorithm, and combines the ACO algorithm with the density peaks algorithm into a new clustering method, the DPACO algorithm. The algorithm uses the exploratory and random behavior of ant colony clustering to perform an initial clustering of the data, combines it with ?_i to obtain the cluster centers, and then uses a distance-based clustering method to cluster the data. Experiments show that the method achieves the best results on the dataset.

Finally, the paper applies the DPACO algorithm to text clustering and implements the text clustering pipeline with a modular design. The experimental dataset is the Sogou classification text corpus. The text is segmented with jieba word segmentation, preprocessed, and converted into vectors by several text vectorization models, and the resulting dataset is then clustered with the DPACO algorithm. Experimental results show that the algorithm produces more effective results in text mining.
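
For readers unfamiliar with the two building blocks named in the abstract, the following is a minimal Python sketch of their textbook forms. It is not the improved calculation proposed in the thesis, which the abstract does not spell out. It assumes the standard density peaks quantities (local density rho with a cutoff distance and distance delta to the nearest denser point) and the classic pick-up/drop probability formulas commonly used in ant colony clustering. The function names, the cutoff `d_c`, and the constants `k1` and `k2` are illustrative assumptions.

```python
import math


def density_peak_scores(points, d_c):
    """Return (rho, delta) per point, using the standard cutoff-kernel
    definitions from the density peaks method (assumed, not thesis-specific)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other points closer than the cutoff distance d_c
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        # Distances from point i to every point with strictly higher density
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbour gets its largest distance instead
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def pick_drop_probabilities(f, k1=0.1, k2=0.15):
    """Classic pick-up/drop probabilities for ant colony clustering.

    f is the local similarity (0 to 1) between an item and its neighbours;
    k1 and k2 are illustrative threshold constants.
    """
    p_pick = (k1 / (k1 + f)) ** 2  # dissimilar surroundings: likely picked up
    p_drop = (f / (k2 + f)) ** 2   # similar surroundings: likely dropped
    return p_pick, p_drop


# Toy data: a tight group around the origin and a pair far away
points = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
rho, delta = density_peak_scores(points, d_c=0.15)
```

Points that combine a high rho with a large delta are the usual candidates for class centers. In the original density peaks method they are read off a plot by hand, which is the manual step the abstract describes as a problem.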
Keywords/Search Tags: Text clustering, fast search and find of density peaks algorithm, ant colony clustering algorithm, class center, class merging