
The Research On Topic Model With Multiple Topic Number

Posted on: 2019-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhao
GTID: 2428330566998894
Subject: Computer technology
Abstract/Summary:
In the information age, the Internet produces massive amounts of text every day, and extracting useful value from these texts has become an important problem. In text processing, topic modeling plays a decisive role: with an effective topic model, large text collections can be put to use in fields such as news recommendation, intelligent dialogue, and information retrieval. Latent Dirichlet Allocation (LDA) has been widely used in these fields because of its efficiency and accuracy. However, building the LDA matrix requires choosing a topic granularity. If the granularity is too coarse, the model cannot capture topic details effectively; if it is too fine, topic information is likely to be excessively fragmented. To address this problem, this thesis studies topic modeling using machine learning and genetic algorithms.

First, the Multiple Topic Number (MTN) algorithm is proposed. It builds a series of LDA matrices, one for each of several topic numbers, and places them side by side to form a single matrix with a compound topic dimension. This compound matrix is combined with a serial tree structure so that each level of the tree uses the part of the matrix that carries the corresponding information granularity, which makes text classification more accurate.

To optimize MTN, a genetic algorithm is introduced to handle the combination of topic dimensions. Each combination of topic dimensions is mapped to a solution vector, which allows the genetic algorithm to be integrated with the topic model. The selection, crossover, and mutation operators then refine the combination of topic dimensions, further improving text classification accuracy.

Experiments were carried out on several text data sets. Compared with classical topic-model algorithms and recent methods from the literature, MTN improves topic classification accuracy to some extent, and combining it with the genetic algorithm improves the results further. In terms of running time, integrating MTN with the genetic algorithm increases running time within a certain range; the increase follows an arithmetic progression and can be controlled. In practice, accuracy and efficiency must be balanced: the algorithm and its parameters should be chosen according to the actual situation so that accurate classification results are obtained in acceptable time.
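The compound-matrix construction in MTN can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes the document-topic matrices for each topic number K have already been produced by some LDA implementation, that "placing them side by side" means column-wise concatenation per document, and that each level of the serial tree reads the block for one topic number, ordered from coarse (small K) to fine (large K). The functions `build_compound_matrix`, `level_features`, and `features_for_level` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]  # rows = documents, columns = topic proportions


def build_compound_matrix(theta_by_k: Dict[int, Matrix]) -> Tuple[Matrix, Dict[int, slice]]:
    """Concatenate per-K document-topic matrices column-wise (assumed MTN layout).

    Returns the compound matrix and, for each topic number K, the column slice
    that holds its block.
    """
    ks = sorted(theta_by_k)  # coarse (small K) to fine (large K)
    n_docs = len(theta_by_k[ks[0]])
    compound: Matrix = [[] for _ in range(n_docs)]
    blocks: Dict[int, slice] = {}
    offset = 0
    for k in ks:
        theta = theta_by_k[k]
        if len(theta) != n_docs or any(len(row) != k for row in theta):
            raise ValueError(f"matrix for K={k} must be {n_docs} x {k}")
        for d, row in enumerate(theta):
            compound[d].extend(row)  # append this K's topic proportions to document d
        blocks[k] = slice(offset, offset + k)
        offset += k
    return compound, blocks


def level_features(compound: Matrix, blocks: Dict[int, slice], k: int) -> Matrix:
    """Return only the columns that belong to topic number k."""
    cols = blocks[k]
    return [row[cols] for row in compound]


def features_for_level(compound: Matrix, blocks: Dict[int, slice],
                       level_order: Sequence[int], level: int) -> Matrix:
    """Hypothetical tree mapping: tree level i reads the block for level_order[i]."""
    return level_features(compound, blocks, level_order[level])
```

For example, with toy matrices for K = 2 and K = 4, each compound row has 6 columns, and `features_for_level(compound, blocks, [2, 4], 0)` returns the K = 2 block for the root level while level 1 would read the finer K = 4 block. Keeping the column slices in `blocks` lets each tree level select its granularity without copying the full matrix per level.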
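The genetic-algorithm step over combinations of topic dimensions can be sketched in Python as well. This is a hedged sketch, not the author's method: it assumes the solution vector is a 0/1 vector over a hypothetical list of candidate topic numbers (bit i selects the LDA matrix built with `candidates[i]`), and it uses tournament selection, single-point crossover, bit-flip mutation, and elitism as one common choice of operators. In the thesis the fitness would be tied to text classification accuracy; here `fitness` is a caller-supplied placeholder, and `decode` and `genetic_search` are hypothetical names.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Chromosome = List[int]  # 0/1 solution vector, one bit per candidate topic number


def decode(chrom: Chromosome, candidates: Sequence[int]) -> List[int]:
    """Map a solution vector to the topic numbers it selects."""
    return [k for bit, k in zip(chrom, candidates) if bit]


def genetic_search(candidates: Sequence[int],
                   fitness: Callable[[List[int]], float],
                   pop_size: int = 20, generations: int = 30,
                   crossover_rate: float = 0.8, mutation_rate: float = 0.1,
                   seed: int = 0) -> Tuple[List[int], float]:
    """Search over combinations of topic numbers with a simple generational GA."""
    rng = random.Random(seed)
    n = len(candidates)
    if n == 0 or pop_size < 1 or generations < 1:
        raise ValueError("need candidates, pop_size >= 1 and generations >= 1")

    def random_chromosome() -> Chromosome:
        chrom = [rng.randint(0, 1) for _ in range(n)]
        if not any(chrom):
            chrom[rng.randrange(n)] = 1  # never start from the empty combination
        return chrom

    def score(chrom: Chromosome) -> float:
        ks = decode(chrom, candidates)
        return fitness(ks) if ks else float("-inf")  # empty combination is invalid

    def tournament(pop: List[Chromosome], scores: List[float]) -> Chromosome:
        i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[i] if scores[i] >= scores[j] else pop[j]

    pop = [random_chromosome() for _ in range(pop_size)]
    best: Optional[Chromosome] = None
    best_score = float("-inf")
    for _ in range(generations):
        scores = [score(c) for c in pop]
        for c, s in zip(pop, scores):
            if best is None or s > best_score:
                best, best_score = c[:], s
        next_pop = [best[:]]  # elitism: keep the best vector seen so far
        while len(next_pop) < pop_size:
            a = tournament(pop, scores)[:]  # selection (copied so parents stay intact)
            b = tournament(pop, scores)[:]
            if n > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n)  # single-point crossover
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for i in range(n):
                    if rng.random() < mutation_rate:
                        child[i] = 1 - child[i]  # bit-flip mutation
                if len(next_pop) < pop_size:
                    next_pop.append(child)
        pop = next_pop
    return decode(best, candidates), best_score
```

A call such as `genetic_search([5, 10, 20, 40, 80], fitness)` returns the best combination found and its score. If `fitness` rebuilds and evaluates an MTN classifier for each combination, every evaluation is expensive, so `pop_size` and `generations` are the main levers for trading accuracy against running time.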
Keywords/Search Tags: LDA topic model, information granularity, multiple topic number, ensemble learning, genetic algorithm