Research And Application Of Hybrid Topic Model For Public Opinion Analysis

Posted on: 2021-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Han
Full Text: PDF
GTID: 2428330602489836
Subject: Mathematics
Abstract/Summary:
As a major hub for the distribution of information, online public opinion is an important channel through which people follow and take part in social topics. However, the sheer volume of public opinion data makes it harder to understand and monitor. Extracting useful information from large amounts of unstructured data, and tracking how specific topics evolve, has therefore become a central task of public opinion analysis research. Against this background, this thesis starts from improving Chinese word segmentation and takes the topic model and its extensions as the main method, aiming to improve the efficiency and quality of current research methods in public opinion analysis. The main work is summarized as follows:

(1) To address the semantically incomplete tokens produced by existing Chinese word segmentation algorithms, this thesis proposes an improved segmentation algorithm built on a definition of generalized nominal tokens. It improves the semantic integrity and unregistered (out-of-vocabulary) word recognition of the Pkuseg segmentation algorithm using three rules: replacement, merging, and modification. In practical application, feature segmentation based on this algorithm shows higher efficiency and quality. Because the improved algorithm changes word-length features, the thesis also examines its effect on three common feature-weighting methods. The experiments find that TFIDF-based feature weighting outperforms the other two methods involving word-length features.

(2) Traditional online topic models handle new words coarsely and give insufficient consideration to the sources of prior information when priors are passed between adjacent topic parameters. To address this, the thesis uses the doc2vec model to extract document vectors and word vectors, expanding the semantic space of the traditional online topic model and thereby improving on existing online topic models. To a certain extent, this deepens the semantic mining of the topic model. Experiments show that in topic evolution analysis of a mixed corpus, the improved model is more stable than existing advanced methods. In addition, in subtopic evolution analysis of similar corpora, it captures the subtopics that emerge from topics more sensitively and accurately.

(3) To address the limited ability of traditional topic models on short text, the thesis proposes an initialization-based extension scheme for short-text topic modeling, built on the structure of the online topic model. First, pseudo-timestamp labels are added to the external corpus and the current modeling corpus: the initial time and the current time, respectively. Then the parameter transfer process of the online topic model is used to improve topic modeling of the current text. This structure is intended to enrich the features of the text while keeping the topic relevance of the external corpus and the modeling corpus in balance, and simulating the parameter transfer process helps reduce the susceptibility of short-text topic modeling to noise. In practical application, topic coherence improves significantly, and the method outperforms the Dirichlet multinomial mixture (DMM) on short text.
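The prior transfer between adjacent topic parameters mentioned in points (2) and (3) can be illustrated with a minimal sketch. It assumes a common online-topic-model convention in which the word prior of a topic at the current time is a weighted mix of that topic's word distributions from earlier times, and words with no history receive only a small base prior. The function name `transfer_prior`, its parameters, and the sample values are hypothetical; this is not the thesis's doc2vec-extended model, only the baseline idea it builds on.

```python
from typing import Dict, List


def transfer_prior(
    history: List[Dict[str, float]],
    weights: List[float],
    vocabulary: List[str],
    base_prior: float = 0.01,
) -> Dict[str, float]:
    """Build the word prior of one topic for the current time slice.

    history    -- the topic's word distributions from earlier slices (assumed input)
    weights    -- one contribution weight per historical slice (assumed input)
    vocabulary -- words present in the current slice, including new ones
    base_prior -- symmetric prior every word receives (hypothetical default)
    """
    if len(history) != len(weights):
        raise ValueError("history and weights must have the same length")

    prior = {}
    for word in vocabulary:
        # Weighted evidence inherited from earlier slices; a word never seen
        # before contributes 0.0 and therefore keeps only the base prior.
        inherited = sum(w * dist.get(word, 0.0) for w, dist in zip(weights, history))
        prior[word] = base_prior + inherited
    return prior


# Pseudo-timestamp usage in the spirit of point (3): the external corpus plays
# the role of the "initial time", the short-text corpus the "current time".
external_topic = {"flood": 0.5, "rain": 0.5}   # illustrative values only
current_vocab = ["flood", "rain", "rescue"]    # "rescue" is a new word
print(transfer_prior([external_topic], [1.0], current_vocab))
```

In this sketch the new word is treated uniformly, which is the kind of coarse handling the abstract criticizes; the thesis's approach of enriching the prior with doc2vec vectors would replace that step.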
Keywords/Search Tags: Chinese word segmentation, nominal segmentation, topic model, online topic model, prior information transfer