Research On Text Feature Dimension Reduction Method Based On Genetic Algorithm

Posted on: 2021-01-02   Degree: Master   Type: Thesis
Country: China   Candidate: C K Liu   Full Text: PDF
GTID: 2428330629950891   Subject: Cyberspace security law enforcement technology
Abstract/Summary:
With the development of the Internet and the need to tap the potential value of data, information processing technologies such as big data and machine learning have risen rapidly. Among them, text classification shows great potential value in many practical fields such as public opinion analysis, topic classification, sentiment analysis, mail filtering, and financial prediction. The effectiveness of text classification is closely related to the selection of text features. To ensure that the selected text features give good classification performance and higher classification efficiency, this thesis adopts and improves a text feature dimension reduction method based on the genetic algorithm. The main improvements are as follows:

1. The selection rules for the gene pool are changed. Based on the advantages and problems of the term frequency-inverse document frequency (TF-IDF) algorithm and the mutual information algorithm, a multi-rule fusion filtering feature selection algorithm is proposed. It is used to make an initial selection from the features of the original text and so form the gene pool for individuals in the population.
2. The way individuals are generated is tied to probability rules. More attention is paid to population diversity: by calculating the internal and external diversity of the first-generation population, the classification performance of the initial individuals is improved, the tension between convergence speed and population diversity is eased, and the number of generations needed for the optimal individual to emerge is reduced.
3. A dimension influence factor is added to the fitness function so that an individual's external performance is measured more comprehensively.
4. Adaptive crossover and mutation operators are used to speed up population convergence.

The improved genetic algorithm is compared with the traditional genetic algorithm and with general filtering feature selection algorithms. The results show that the improved genetic algorithm achieves a larger gain in the fitness function during optimization; its classification performance metrics (accuracy, recall, and macro-average) are the highest; and, compared with the traditional genetic algorithm, the time needed to find the best individual is reduced. This shows that the improved genetic algorithm converges faster and searches better when performing text feature dimension reduction, and that it achieves the intended purpose of text feature dimension reduction.
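
The abstract does not give formulas for the dimension influence factor or the adaptive operators, so the following Python sketch is only an illustration under assumptions, not the thesis's implementation. It assumes a bit-mask individual (1 = feature kept), a fitness equal to a classification score minus a linear penalty on the fraction of features kept (the weight `alpha` is hypothetical), and a simple linear adaptive scheme in which above-average individuals receive lower crossover and mutation rates. The names `fitness`, `adaptive_rate`, `evolve`, and `score_fn` are made up for this example, and the multi-rule initial filtering and diversity-aware initialization described above are not modeled.

```python
import random


def fitness(mask, score_fn, alpha=0.1):
    """Classification score minus a penalty on the fraction of features kept.

    The linear penalty and its weight `alpha` are assumptions for illustration.
    """
    kept = sum(mask)
    if kept == 0:
        return 0.0  # an individual that selects no features is useless
    return score_fn(mask) - alpha * kept / len(mask)


def adaptive_rate(f, f_avg, f_max, high, low):
    """Below-average individuals get `high`; the rate falls linearly to `low` at f_max."""
    if f < f_avg or f_max == f_avg:
        return high
    return high - (high - low) * (f - f_avg) / (f_max - f_avg)


def evolve(score_fn, n_features, pop_size=20, generations=30, seed=0):
    """Run a small GA over feature masks (n_features >= 2) and return the best mask."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(ind, score_fn) for ind in pop]
        f_avg, f_max = sum(fits) / len(fits), max(fits)
        children = [pop[fits.index(f_max)][:]]  # elitism: keep the current best
        while len(children) < pop_size:
            # Binary tournament selection, run twice to pick two parents.
            a = max(rng.sample(range(pop_size), 2), key=fits.__getitem__)
            b = max(rng.sample(range(pop_size), 2), key=fits.__getitem__)
            child = pop[a][:]
            # Crossover rate adapts to the fitter of the two parents.
            pc = adaptive_rate(max(fits[a], fits[b]), f_avg, f_max, 0.9, 0.6)
            if rng.random() < pc:
                cut = rng.randrange(1, n_features)  # single-point crossover
                child = pop[a][:cut] + pop[b][cut:]
            # Mutation rate adapts to the first parent's fitness.
            pm = adaptive_rate(fits[a], f_avg, f_max, 0.1, 0.01)
            child = [1 - g if rng.random() < pm else g for g in child]
            children.append(child)
        pop = children
    fits = [fitness(ind, score_fn) for ind in pop]
    return pop[fits.index(max(fits))]


if __name__ == "__main__":
    # Toy stand-in for a classifier: only the first three features are informative.
    toy_score = lambda mask: sum(mask[:3]) / 3
    print(evolve(toy_score, n_features=10))
```

In practice `score_fn` would train and evaluate a text classifier on the selected features; the toy function here only stands in for that step so the sketch runs on its own.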
Keywords/Search Tags: text features, dimension reduction by feature selection, genetic algorithm