
Text Clustering And Its Application Based On CFSFDP Algorithm

Posted on: 2019-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: C X Zhan
Full Text: PDF
GTID: 2428330548476399
Subject: Computer Science and Technology
Abstract/Summary:
Over the past decade, the rapid development of Internet technology has produced large quantities of network data. Vast numbers of messages are generated every day in many forms, and text is one of the most common ways this information is stored. Processing and analyzing such text data is an active research topic. Text clustering is one of the major branches of text processing and is widely applied in pattern recognition, user recommendation, data mining, sentiment analysis, and topic recognition.

When choosing a university, candidates have many questions for the schools they are interested in. The experimental data used in this thesis consists of questions that candidates put to the Admissions Office, that is, college entrance examination consultation texts. Applying the CFSFDP text clustering algorithm to these texts helps identify the issues candidates care about most, answer their questions quickly and effectively, and reduce the pressure on admissions consulting.

Clustering by fast search and find of density peaks (CFSFDP) is a density-based clustering algorithm published by Alex Rodriguez and Alessandro Laio in Science in 2014. The algorithm is conceptually simple, can find clusters of arbitrary shape, and does not require the number of clusters to be specified in advance. However, the cutoff distance in CFSFDP cannot be obtained automatically. In this thesis, the k-distance is introduced to analyze the structure of the dataset so that a reasonable cutoff distance is obtained automatically. Differences in the cutoff distance affect the local density values and ultimately the accuracy of the whole algorithm. In addition, because the local density and the distance of data points are not of the same order of magnitude, both attributes are normalized. The effectiveness of the improved algorithm is verified through experiments on the college entrance examination consultation texts.

To further improve the clustering results, this thesis analyzes the problem with the basic CFSFDP algorithm's use of the product of local density and distance as the basis for selecting cluster centers, and proposes a CFSFDP algorithm based on particle swarm optimization (PSO). PSO is used to find the optimal density threshold and distance threshold in CFSFDP, which reduces the influence of randomness in these thresholds on clustering accuracy. Finally, experiments on the college entrance examination consultation text dataset show that the PSO-based CFSFDP algorithm achieves results better than or equivalent to those of the DBSCAN algorithm, the basic CFSFDP algorithm, and the Agglomerative Clustering algorithm, which verifies its effectiveness.
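The density-peaks procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the k-distance rule used here (cutoff distance taken as the mean distance to the k-th nearest neighbour), the Gaussian density kernel, the min-max normalization, the function name `cfsfdp`, and the 2-D toy points standing in for text vectors are all assumptions made for illustration. The PSO-based threshold search is not shown.

```python
import math
import random


def cfsfdp(points, k=4, n_centers=2):
    """Density-peaks clustering sketch with an assumed k-distance cutoff."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Assumed heuristic: d_c is the mean distance to the k-th nearest neighbour.
    # Index 0 of each sorted row is the point itself (distance 0).
    d_c = sum(sorted(row)[k] for row in dist) / n

    # Local density rho with a Gaussian kernel, excluding the point itself.
    rho = [sum(math.exp(-(d / d_c) ** 2) for j, d in enumerate(row) if j != i)
           for i, row in enumerate(dist)]

    # delta: distance to the nearest point of higher density.
    order = sorted(range(n), key=lambda i: -rho[i])  # densest first
    delta = [0.0] * n
    parent = [-1] * n
    delta[order[0]] = max(dist[order[0]])  # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda h: dist[i][h])
        delta[i], parent[i] = dist[i][j], j

    def minmax(values):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]

    # Normalize rho and delta to [0, 1] so neither dominates the product.
    gamma = [r * d for r, d in zip(minmax(rho), minmax(delta))]

    # Barring exact ties, the densest point has gamma = 1 and is always a centre.
    centers = sorted(range(n), key=lambda i: -gamma[i])[:n_centers]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # Visit points from densest to sparsest so each parent is labelled first.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers, d_c


# Toy usage: two well-separated 2-D blobs instead of real text vectors.
rng = random.Random(0)
pts = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(30)]
pts += [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(30)]
labels, centers, d_c = cfsfdp(pts, k=4, n_centers=2)
```

In this sketch the number of centres is passed in for simplicity. The original algorithm instead selects centres from the density and distance values, which is the step the thesis tunes with PSO.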
Keywords/Search Tags: Text Clustering, Clustering by Fast Search and Find of Density Peaks, Cutoff Distance, k-distance, Particle Swarm Optimization