
Research on Text Public Opinion Analysis Algorithms for the WeChat Public Platform

Posted on: 2021-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: H C Gong
GTID: 2428330623481255
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the "explosive" growth in the number of Internet users, massive amounts of data are continually brought together into a very large network database, and processing and analyzing its contents has given rise to online public opinion platforms. Research on public opinion monitoring systems is receiving more and more attention from academia, because public sentiment and public opinion are often derived from massive data. Many Chinese scholars take Weibo, Douban, Zhihu and other platforms as the object of their public opinion research. In recent years, with the popularity and rapid development of WeChat, WeChat has become a household name, and the general public has widely accepted the "fast reading, random comment" habit. It can be said that the public prefers to obtain knowledge and make comments through WeChat public accounts, WeChat subscription accounts and similar channels. According to the literature surveyed, public opinion analysis for the WeChat public platform has received less attention than research on other platforms, so this thesis focuses on the WeChat public platform. Working with text data from the WeChat public platform, it studies the algorithms used in public opinion analysis in three parts: topic classification, automatic abstract extraction and sentiment analysis.

The first part is topic classification. It builds on the popular LDA topic model and combines it with the TF-IDF algorithm. An LDA-based method that compares keyword matching with subjective statistical value words is designed. This method addresses the problem of determining the optimal number of topics when the LDA topic model is used for topic classification. The traditional perplexity-based method can only adjust the number of topics manually and then pick the number with the lowest perplexity. Too few parameter adjustments can leave the model short of convergence, while too many waste computing resources. Compared with traditional methods, the method designed in this thesis gives a better clustering effect.

The second part is abstract extraction. It builds on the TF-IDF algorithm and improves it: a text abstract extraction algorithm based on TF-IDF and multiple features is designed. This method addresses the problem that the traditional method relies only on word frequency to determine the weight of the sentence in which a word occurs. On top of the traditional algorithm, three factors are added to feature extraction (word position, part of speech and topic keywords), and a weight calculation method is proposed for each factor. A weighted sum yields a comprehensive weight that makes the machine-generated summary more accurate. Experiments verify that accuracy and recall are greatly improved compared with traditional methods.

The last part is sentiment analysis. Based on a comparison experiment between the Word2Vec and Doc2Vec models, and combined with the deep learning model Long Short-Term Memory (LSTM), a text sentiment analysis algorithm based on the Doc2Vec model and a deep learning model is designed. The traditional method uses the word embeddings generated by the Word2Vec model but ignores the order of words, whereas the Doc2Vec model can take the preceding and following context into account for "semantic analysis". In this algorithm, the LSTM provides long-term memory and retains the semantic information between words. The Doc2Vec model is therefore used to train the word vectors, and the generated vectors are used as the LSTM input. The final result is verified by experiments, which obtain higher accuracy, recall and F1 values.
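The multi-feature sentence weighting described for abstract extraction can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the per-factor formulas, so the position score, the part-of-speech score, the smoothed TF-IDF variant, the default weights and all function names below are assumptions chosen only to show how a weighted sum of TF-IDF, position, part-of-speech and topic-keyword features could rank sentences. Position is treated here at the sentence level, which is also an assumption.

```python
import math
from collections import Counter


def tfidf_per_sentence(sentences):
    """Mean smoothed TF-IDF of the distinct words in each tokenized sentence.

    Each sentence is treated as one "document" (an assumption of this sketch).
    """
    n = len(sentences)
    # Document frequency: number of sentences in which each word appears.
    df = Counter(w for s in sentences for w in set(s))
    means = []
    for s in sentences:
        if not s:
            means.append(0.0)
            continue
        tf = Counter(s)
        total = sum((c / len(s)) * (math.log((1 + n) / (1 + df[w])) + 1.0)
                    for w, c in tf.items())
        means.append(total / len(tf))
    return means


def score_sentences(sentences, topic_keywords, pos_of,
                    content_tags=("n", "v"), weights=(0.4, 0.2, 0.2, 0.2)):
    """Weighted sum of four hypothetical features per sentence.

    weights = (tf-idf, position, part of speech, topic keywords); the values
    are placeholders, not the weights proposed in the thesis.
    """
    w_tfidf, w_position, w_pos, w_topic = weights
    base = tfidf_per_sentence(sentences)
    last = len(sentences) - 1
    scores = []
    for i, s in enumerate(sentences):
        size = max(len(s), 1)
        # Assumed position feature: first and last sentences count more.
        position = 1.0 if i in (0, last) else 0.5
        # Assumed POS feature: share of words tagged as content words.
        pos_share = sum(pos_of.get(w) in content_tags for w in s) / size
        # Topic feature: share of words that are topic keywords.
        topic_share = sum(w in topic_keywords for w in s) / size
        scores.append(w_tfidf * base[i] + w_position * position
                      + w_pos * pos_share + w_topic * topic_share)
    return scores


def summarize(sentences, k, topic_keywords, pos_of):
    """Indices of the k highest-scoring sentences, in document order."""
    scores = score_sentences(sentences, topic_keywords, pos_of)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return sorted(top[:k])
```

In use, `sentences` would be a list of token lists (for Chinese text, produced by a word segmenter), `pos_of` a word-to-tag mapping from a part-of-speech tagger, and `topic_keywords` the keyword set from the topic model; `summarize(sentences, 2, topic_keywords, pos_of)` then returns the positions of the two sentences to keep.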
Keywords/Search Tags:Internet public opinion, LDA, TF-IDF, Doc2Vec, LSTM