
Research On Text Topic Analysis Algorithm Based On Information Fusion

Posted on: 2021-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: D Wu
Full Text: PDF
GTID: 2428330623467771
Subject: Computer Science and Technology
Abstract/Summary:
With the development of information technology and the rise of big data, the Internet has become an important channel for publishing and obtaining information. Most user-generated information accumulates on the network in the form of text. Extracting hidden topics from such large-scale unstructured text is an important research problem in natural language processing, with wide application in public opinion monitoring, comment analysis, and content recommendation. Researchers in China and abroad have proposed a series of algorithms for text topic mining, but the following deficiencies remain.

First, existing methods do not effectively use the auxiliary information in the metadata attached to a document, such as the reviewer's rating and sentiment polarity in a review, or the authors and citations of a paper. Such metadata contains rich structural information, and different types of metadata affect document content differently. Current topic analysis methods either ignore metadata, use only one specific type of metadata, or process different types of metadata identically, so the auxiliary information is not fully exploited. Existing methods also cannot handle the noise contained in metadata. Second, existing methods do not consider the relationship between a word's sentiment information and its topic, although the two are closely related.

To address these problems, this thesis proposes two improved topic analysis algorithms, which incorporate metadata and sentiment information into topic mining, respectively. The work can be summarized as follows:

(1) For academic paper networks containing author and citation metadata, a probabilistic topic model that combines author and citation metadata is proposed. Considering the interaction between the two kinds of metadata and the document topics, the model adopts a differentiated fusion strategy that makes full use of the prior knowledge about document topics carried by author and citation information. Specifically, the model fuses coarse-grained author information into the prior distribution parameters of document topics, and uses a topic propagation mechanism to fuse the topic-related information in fine-grained citation information. To deal with noise in the metadata, the model also represents the importance of each author and the influence of each cited document on the topics of the citing document, which keeps the model robust. Experimental results on real datasets verify that the proposed metadata fusion method improves topic quality.

(2) For online reviews containing sentiment information, a deep text topic analysis algorithm combined with sentiment classification is proposed. Because a single topic vector is inefficient at extracting topic sentiment information, the model assigns an attribute vector and a sentiment vector to each topic category, which extract the attribute information and the sentiment information of the corresponding topic in the text, respectively. The model also extracts the local topic information unique to each sentence and integrates it into the global attribute and sentiment vectors, so that the attention layer can use this information to obtain a more accurate attention distribution. Finally, the model describes the semantic association between attribute words and sentiment words under the same topic through a coupled attention mechanism, and uses a multilayer attention network to model the complex semantic interactions and long-distance grammatical dependencies between the two kinds of words.
Keywords/Search Tags: topic analysis, metadata fusion, probabilistic graphical models, sentiment analysis