Research On Short Text Feature Selection Algorithm Based On Fuzzy Entropy And Particle Swarm Optimization

Posted on: 2020-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: F Chai
GTID: 2428330578952876
Subject: Computer application technology
Abstract/Summary:
With the continuous development and deep penetration of Internet technology, Weibo, WeChat, QQ and other online social applications have gradually grown from communication tools for ordinary users into public platforms that also carry content from national leaders, news hosts and entertainment media. In the era of big data, these platforms generate massive amounts of data every day, and the texts themselves keep getting shorter. The root cause is that short texts are more convenient to publish and take less of the user's time. However, short texts are written casually and often in slang; they contain few characters yet carry a large amount of information, and the resulting feature set has a relatively high dimension. These shortcomings make the processing of social media text data a huge challenge. Text feature selection plays an important role in the overall text categorization process. This thesis therefore conducts an in-depth analysis of short text feature selection, as follows.

First, because a short text feature may belong to one or more classes, the concept of fuzzy entropy is introduced. The membership function is designed from a local and a global perspective. The local perspective considers the relationships within a class and between classes, introducing intra-class dispersion and inter-class dispersion. The global perspective adds the feature class frequency, that is, the proportion of a feature word's occurrences in a specific class to its occurrences in the whole training set. Finally, the algorithm is applied to feature selection, and simulation experiments compare the algorithm before and after the improvement to verify its effectiveness and feasibility.

Second, each short text has little content while the volume of data is large, which easily leads to the curse of dimensionality. To reduce dimensionality during feature selection, it is common practice to select the k features that best represent the topic content of the short texts as the feature subset. However, common feature selection algorithms such as mutual information and chi-square statistics, as well as the fuzzy entropy algorithm proposed in this thesis, have difficulty determining the value of k. This thesis therefore introduces the particle swarm optimization algorithm, which continuously updates the particles during the search and finds a good solution for k from each particle's velocity and best position. However, the algorithm may suffer from premature convergence, divergence and low convergence precision. The improvements made in this thesis are to adjust the inertia weight dynamically to counter premature convergence, and to simplify the particle swarm optimization equation so that particle velocity no longer causes divergence and low convergence precision.

Third, simulation comparison experiments are designed to verify the feasibility and effectiveness of the improved algorithm proposed in this thesis, followed by a theoretical analysis and summary.
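The feature class frequency defined in the abstract, and the way a fuzzy entropy score can be computed from membership degrees, can be illustrated with a minimal sketch. The abstract does not give the thesis's membership function, which also involves intra-class and inter-class dispersion, so this sketch makes a simplifying assumption: the feature class frequency alone is used as the membership degree. The entropy form is the standard one by De Luca and Termini, normalized to [0, 1]; whether the thesis uses this exact form is not stated. The function names and the sample counts are hypothetical.

```python
import math


def feature_class_frequency(class_counts):
    """Share of a feature word's occurrences that fall in each class."""
    total = sum(class_counts.values())
    if total == 0:
        return {c: 0.0 for c in class_counts}
    return {c: n / total for c, n in class_counts.items()}


def fuzzy_entropy(memberships):
    """Normalized De Luca and Termini fuzzy entropy, in [0, 1]."""
    def term(mu):
        # By convention 0 * log(0) = 0, so crisp memberships contribute nothing.
        if mu <= 0.0 or mu >= 1.0:
            return 0.0
        return -(mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu))

    if not memberships:
        return 0.0
    return sum(term(mu) for mu in memberships) / (len(memberships) * math.log(2))


# Hypothetical occurrence counts of one feature word per class.
counts = {"sports": 8, "finance": 2}
fcf = feature_class_frequency(counts)

# Assumption: the feature class frequency serves directly as the membership degree.
score = fuzzy_entropy(list(fcf.values()))
```

Under this assumption, a feature word that occurs in only one class gets an entropy of 0, while one spread evenly over two classes gets the maximum of 1, so lower scores indicate more discriminative features.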
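The two particle swarm improvements named in the abstract, a dynamically adjusted inertia weight and a simplified update equation without a velocity term, can be sketched as follows. The abstract gives neither the weight schedule nor the simplified equation, so both are assumptions here: the inertia weight decreases linearly from `w_max` to `w_min`, and the position is updated directly as `w * x + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)`, a commonly used velocity-free form. The function name, parameter defaults and `toy_fitness` are hypothetical; in practice the fitness would be something like classifier accuracy using the top-k ranked features.

```python
import random


def simplified_pso_select_k(fitness, k_max, n_particles=20, n_iters=50,
                            w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, seed=0):
    """Search for a subset size k in [1, k_max] that maximizes `fitness`."""
    rng = random.Random(seed)

    def clip(x):
        return min(max(x, 1.0), float(k_max))

    def score(x):
        # Positions are real-valued; round to an integer k before scoring.
        return fitness(int(round(x)))

    pos = [rng.uniform(1, k_max) for _ in range(n_particles)]
    pbest = pos[:]
    pbest_fit = [score(x) for x in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g], pbest_fit[g]
    history = [gbest_fit]

    for t in range(n_iters):
        # Assumed schedule: inertia weight falls linearly from w_max to w_min.
        w = w_max - (w_max - w_min) * t / max(n_iters - 1, 1)
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity-free update: the position itself is moved directly.
            x = clip(w * pos[i]
                     + c1 * r1 * (pbest[i] - pos[i])
                     + c2 * r2 * (gbest - pos[i]))
            pos[i] = x
            f = score(x)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = x, f
                if f > gbest_fit:
                    gbest, gbest_fit = x, f
        history.append(gbest_fit)

    return int(round(gbest)), history


def toy_fitness(k):
    """Hypothetical stand-in for classification accuracy; peaks at k = 30."""
    return -(k - 30) ** 2


best_k, history = simplified_pso_select_k(toy_fitness, k_max=100)
```

A larger inertia weight early in the run favors exploration and a smaller one later favors refinement, which is the usual motivation for adjusting it during the search. Dropping the velocity term leaves the position as the only state per particle, so no velocity bound has to be tuned.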
Keywords/Search Tags: Text categorization, Short text, Feature Selection, Fuzzy Entropy, Particle Swarm Optimization