Research on Short Text Aspect Extraction Based on Topic Model and Word Embedding Mechanism

Posted on: 2022-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: H X Wu
Full Text: PDF
GTID: 2518306548961179
Subject: Engineering
Abstract/Summary:
Sentiment analysis is one of the most closely watched research directions in text analysis, and its techniques can be applied quickly in industry. Aspect extraction is a basic and important step in sentiment analysis, and its results directly affect the quality of the sentiment analysis that follows. Traditional aspect extraction algorithms were designed for long texts such as newspapers, articles, and blogs; when these models are applied to short texts, they perform poorly. Meanwhile, as the Internet has become more widespread, the volume of short text data has grown explosively, so algorithms designed specifically for this type of data are urgently needed. This thesis studies aspect extraction algorithms for short-text scenarios. The main work and results are as follows:

(1) Short-text data has a small vocabulary, is highly sparse, and is highly ambiguous, and traditional long-text models handle these problems poorly. This thesis proposes an improved aspect extraction algorithm based on BTM. The original BTM model does not consider how the semantic relationship between words affects topic mining, and it also ignores the semantic information of context words. This thesis proposes two improvements: first, a word vector model is introduced to calculate the relevance between words; second, a self-attention mechanism is introduced to strengthen the semantic relevance between words. A series of experiments on two standard data sets shows that the performance of the model is significantly improved compared with the previous model.

(2) To address the high sparsity of short-text data and its lack of contextual semantic information, this thesis proposes WESM, an aspect extraction algorithm based on a word embedding mechanism and a self-attention mechanism. Building on the vocabulary co-occurrence network, WESM introduces a word embedding mechanism and a self-attention mechanism, adding the correlation between words and contextual semantic information, which greatly alleviates the problem of polysemous words in the text. Experiments show that WESM performs well on two standard data sets.
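The ideas named in the abstract (word co-occurrence pairs, word-vector relevance, self-attention over context words) can be illustrated with a small sketch. This is not the thesis's algorithm: the function names, the toy two-dimensional embeddings, and the choice to use attention with identity projections (queries, keys, and values all equal to the input vectors) are assumptions made here for illustration only. The sketch extracts the unordered word pairs that BTM-style models work on, re-represents each word as an attention-weighted mix of its context, and scores each pair by cosine similarity.

```python
import math
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered pair of distinct words from one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)
            if pair[0] != pair[1]]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections (assumption)."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([sum(w * vec[i] for w, vec in zip(weights, vectors))
                        for i in range(dim)])
    return outputs


# Toy embeddings invented for this example; real ones would come from a
# trained word-vector model.
EMBEDDINGS = {
    "battery": [0.9, 0.1],
    "charge": [0.8, 0.2],
    "screen": [0.1, 0.9],
}


def weighted_biterms(tokens, embeddings):
    """Score each biterm by the cosine similarity of its context-aware vectors."""
    # Keep known words once each, preserving their order of appearance.
    known = list(dict.fromkeys(t for t in tokens if t in embeddings))
    contextual = dict(zip(known, self_attention([embeddings[t] for t in known])))
    return {pair: cosine(contextual[pair[0]], contextual[pair[1]])
            for pair in extract_biterms(known)}


scores = weighted_biterms(["battery", "charge", "screen"], EMBEDDINGS)
```

In this toy setting the pair ("battery", "charge") scores higher than ("battery", "screen"), which is the kind of signal a relevance-weighted topic model could use to favour semantically related word pairs. How the thesis actually feeds such weights into BTM or WESM is not stated in the abstract.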
Keywords/Search Tags: aspect extraction, topic model, word embedding, attention mechanism