Research On Chinese Text Clustering Based On Bare Bones Particle Swarm Optimization

Posted on: 2018-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: M H Wang
Full Text: PDF
GTID: 2348330518986574
Subject: Software engineering
Abstract/Summary:
In recent years, with the rapid development of Internet technology, the volume of available data has grown enormously. Finding useful information and discovering regularities in such large amounts of data is a challenge, and the discipline of data mining arose to address it. As one of the key technologies in data mining, clustering is used to discover the different classes in an unlabeled dataset. Given the large amount of unstructured data, text clustering has attracted more and more attention. Because text data is high-dimensional, sparse, and unstructured, the choice of feature selection method is important for the clustering results. Swarm intelligence algorithms have also been studied in depth in recent years and applied effectively in many fields, owing to their good balance between exploration and exploitation. This thesis improves a swarm intelligence algorithm and a clustering algorithm. The work consists of the following parts:

1. To address the premature convergence of the bare-bones particle swarm optimization algorithm (BBPSO) and its tendency to fall into local optima, an improved BBPSO based on the Von Neumann topology (VBBPSO) is proposed. The evolutionary center and the dispersion control are adjusted by applying the Von Neumann topology, replacing the global best position with the neighborhood best position, and adopting a center adjustment coefficient. In addition, giving consideration to lagging particles improves the exploration and exploitation ability.

2. The difficulties of text clustering that stem from the high dimensionality, sparsity, and latent semantic structure of text data are analyzed. The K-means algorithm requires the value of K to be set manually, and its clustering results are influenced by the initial centers; it is therefore sensitive to noise and unstable. To solve these problems, a new K-means algorithm for text clustering is proposed. It first uses the physical significance of Singular Value Decomposition (SVD) to classify the data roughly, and then uses K-means for text clustering. The new algorithm applies SVD to decompose the text data while keeping semantic features, removing noise, and smoothing the data. It then takes advantage of the physical significance of SVD to obtain a rough classification, and finally uses the classification results as the initial centers of K-means.

3. To mitigate the high dimensionality and sparsity of the text vector space, the VBBPSO algorithm is applied to Chinese text feature selection. First, the text vector is encoded to transform the discrete problem into a linear problem, and a new fitness function is designed for the text clustering feature selection algorithm. Second, the improved BBPSO is used to optimize the global best particle and determine the best feature selection scheme. Finally, based on the selected text feature vectors, the SVD-Kmeans algorithm is used to cluster the text. The clustering results show that feature selection with VBBPSO effectively improves clustering quality and benefits most clustering algorithms.
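As an illustration of the idea in part 1, the following is a minimal sketch of a bare-bones particle swarm in which each particle samples around its neighborhood best on a Von Neumann (four-neighbor grid) topology instead of the global best. It is not the thesis's VBBPSO: the center adjustment coefficient and the handling of lagging particles are not specified in the abstract and are omitted. The function names, the wrap-around grid, the swarm size, and the `sphere` test function are all assumptions made for this example.

```python
import random


def von_neumann_neighbors(i, rows, cols):
    """Return particle i plus its four grid neighbors (wrapping at the edges)."""
    r, c = divmod(i, cols)
    return [
        i,
        ((r - 1) % rows) * cols + c,  # up
        ((r + 1) % rows) * cols + c,  # down
        r * cols + (c - 1) % cols,    # left
        r * cols + (c + 1) % cols,    # right
    ]


def sphere(x):
    """Hypothetical test objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def bbpso_von_neumann(fitness, dim, rows=4, cols=5, iters=200, bound=5.0, seed=0):
    """Minimize `fitness` with a bare-bones swarm on a rows x cols grid."""
    rng = random.Random(seed)
    n = rows * cols
    # In bare-bones PSO there is no velocity; only personal bests are tracked.
    pbest = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n)]
    pbest_fit = [fitness(p) for p in pbest]

    for _ in range(iters):
        # Snapshot so every particle samples from the same generation.
        prev = [p[:] for p in pbest]
        prev_fit = pbest_fit[:]
        for i in range(n):
            # Neighborhood best replaces the global best of plain BBPSO.
            nb = min(von_neumann_neighbors(i, rows, cols), key=lambda j: prev_fit[j])
            lbest = prev[nb]
            # Gaussian sampling: mean is the midpoint of personal and
            # neighborhood best, spread is their per-dimension distance.
            cand = [
                rng.gauss((prev[i][d] + lbest[d]) / 2.0, abs(prev[i][d] - lbest[d]))
                for d in range(dim)
            ]
            f = fitness(cand)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = cand, f

    best = min(range(n), key=lambda j: pbest_fit[j])
    return pbest[best], pbest_fit[best]


if __name__ == "__main__":
    position, value = bbpso_von_neumann(sphere, dim=3)
    print(position, value)
```

Because each particle only sees four neighbors, good positions spread through the grid gradually, which is the usual argument for why a local topology slows premature convergence compared with a single global best. Note that a particle that is its own neighborhood best has zero spread in this sketch and does not move until a neighbor overtakes it.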
Keywords/Search Tags: BBPSO, text clustering, SVD, K-means, feature selection