Application Of Improved Density Peak Algorithm To Text Clustering

Posted on: 2019-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: C Yu
Full Text: PDF
GTID: 2428330548487412
Subject: Engineering
Abstract/Summary:
In today's era of big data, information technology has penetrated all walks of life. As the demand for information grows, helping users organize many kinds of data into a logical form has become an urgent problem. At present, the most important information carrier is Chinese text, and text clustering has become one of the important ways to quickly organize and analyze the text information that users are interested in. Through data preprocessing and cluster analysis, the original unstructured text data is formally described and finally grouped into clusters according to similarity, which has important theoretical significance for information retrieval and news topic discovery. Building on an in-depth study, this thesis selects the density peak algorithm, which has few parameters, a simple principle, and an easy implementation, for text clustering. To address the density peak algorithm's weakness in selecting cluster centers, this thesis improves the particle swarm optimization (PSO) algorithm and combines it with the density peak algorithm. Finally, the improved algorithm is applied to text clustering to improve the clustering results. The research work consists mainly of the following two parts:

(1) To study the parameters of the standard PSO algorithm in depth, especially the inertia weight, a dynamic inertia weight adjustment strategy is proposed. It assigns different inertia weights according to each particle's fitness value and balances the algorithm's global and local search capabilities at different iterations. Then, the tendency of particles to fall into local optima when optimizing high-dimensional multimodal functions is studied and analyzed. A perturbation factor strategy based on the Cauchy operator is proposed to increase population diversity, broaden the search space of the optimal particle, and help it escape from local optima. Based on these two points, an adaptive exponential inertia weight particle swarm optimization algorithm (AEW-PSO) is proposed. Finally, comparison experiments on different test functions show that the algorithm improves both accuracy and stability.

(2) A new method of constructing the fitness function is proposed as a bridge between AEW-PSO and the density peak algorithm. It combines local density, the distance parameter, and intra-class dispersion to guide the selection of cluster centers in a more principled way and to improve the effectiveness of the algorithm. The density peak algorithm optimized by AEW-PSO is then applied to text clustering. Cosine distance, which is better suited to measuring the distance between texts, replaces the original Euclidean distance measure, and the overall framework of the algorithm is presented. Finally, comparison experiments show that the proposed algorithm achieves the best values on the three evaluation metrics of accuracy, recall, and F1, and clusters text effectively.
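To make the density peak step concrete, the following is a minimal sketch, not the thesis implementation. It computes the two standard density peak quantities, local density and distance to the nearest denser point, using cosine distance on document vectors. It assumes a cutoff-kernel density with a hypothetical cutoff `d_c`. Because the abstract does not give the AEW-PSO fitness function, cluster centers are chosen here by the plain heuristic of ranking points by density times distance. All function and parameter names are illustrative.

```python
import numpy as np

def cosine_distance_matrix(X):
    """Pairwise cosine distance (1 - cosine similarity) between rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against all-zero document vectors
    unit = X / norms
    dist = 1.0 - np.clip(unit @ unit.T, -1.0, 1.0)
    np.fill_diagonal(dist, 0.0)
    return dist

def density_peak_quantities(dist, d_c):
    """Return local density rho, distance delta, nearest denser point, and density order."""
    n = dist.shape[0]
    # Cutoff kernel: count the other points closer than d_c (assumes d_c > 0).
    rho = (dist < d_c).sum(axis=1) - 1
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    # Points sorted by decreasing density; ties are broken by index.
    order = np.argsort(-rho, kind="stable")
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = dist[i].max()  # convention for the densest point
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    return rho, delta, nearest_higher, order

def cluster_texts(X, d_c, k):
    """Cluster rows of X into k groups; centers picked by rho * delta (stand-in heuristic)."""
    dist = cosine_distance_matrix(X)
    rho, delta, nearest_higher, order = density_peak_quantities(dist, d_c)
    centers = np.argsort(-(rho * delta), kind="stable")[:k]
    labels = np.full(len(X), -1)
    labels[centers] = np.arange(k)
    # Visit points from dense to sparse so each inherits an already-assigned label.
    for i in order:
        if labels[i] != -1:
            continue
        j = nearest_higher[i]
        if j == -1:  # densest point was not chosen as a center: use the nearest center
            j = centers[np.argmin(dist[i, centers])]
        labels[i] = labels[j]
    return labels, centers

# Toy term-count vectors: the first three and the last three rows share a topic.
X = np.array([[3, 0, 1], [2, 0, 1], [4, 1, 1],
              [0, 3, 1], [0, 2, 1], [1, 4, 1]], dtype=float)
labels, centers = cluster_texts(X, d_c=0.2, k=2)
print(labels, centers)
```

In the approach the abstract describes, the center-selection line would instead be driven by the AEW-PSO search under the proposed fitness function. The rest of the pipeline (cosine distance, density, and label propagation) is the standard density peak procedure.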
Keywords/Search Tags: Text clustering, Density peak algorithm, Inertia weight, Fitness function