
Sentiment Analysis Of UGC Video Reviews Based On Self-attention Mechanism

Posted on: 2021-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: W Wang
GTID: 2518306041461384
Subject: Computer application technology
Abstract/Summary:
With the popularity of the mobile Internet, video websites, and related applications, the number of Internet and User Generated Content (UGC) videos has grown explosively. As the number of viewers increases, each video generates a large amount of review data. Analyzing these data can help video creators produce better content, help platforms provide better services, and reveal the cultural phenomena behind the data, thus providing guidance for the formulation of relevant cultural policies. Sentiment analysis is one of the basic tasks in natural language processing and a key technology in big data analysis and artificial intelligence. Sentiment classification mines the sentiment tendency in data and provides important reference information for data analysis.

Existing Chinese sentiment classification work generally targets social networking platforms and service websites such as Taobao and DIANPING. Sentiment classification of video website review data has received little attention from researchers, for three main reasons. First, traditional video websites are mainly based on third-party video content; they offer little support for communication among users, and user retention in the comment area is low. Comments on emerging short video platforms such as TikTok (Douyin), for their part, are subject to a word limit, which leads to inadequate communication and expression of emotion. Second, video review data are unlabeled, so supervised learning tasks such as sentiment classification require labeling a large amount of data. Finally, because of the particular nature of UGC, sub-culture circles with their own unique ways of expression form easily, and annotators who are unfamiliar with these circles cannot reliably produce high-quality annotated data. These special expressions also challenge the language comprehension ability of sentiment classification models.

To address these problems, this thesis explores methods for sentiment classification of video website comments from three aspects: data source selection, manual sentiment annotation, and a targeted sentiment analysis model. For data source selection, video platforms were evaluated on four factors: number of users, user activity, user retention in the comment area, and few restrictions on reviews. After comprehensive consideration, Bilibili was selected as the source of review data. Review data were collected with a web crawler, a complete data cleaning scheme was developed according to the characteristics of the data, and sentiment labels were assigned according to related psychological theories.

To address the insufficient ability of traditional sentiment classification to extract hierarchical information from text, as well as the OOV (Out Of Vocabulary) problem, this thesis proposes a new model that combines the Ordered-Neuron LSTM (ON-LSTM) network with subword embedding input vectors and a self-attention (SA) mechanism. Subword vectors address the OOV problem, the model can learn hierarchical information in the text while retaining key information, and the self-attention mechanism is effective at extracting features from different positions in the same text. Together, these structures help improve classification accuracy on the Bilibili review dataset. In the experiments, comparison models were built by replacing the key structures of the proposed model, with a Long Short-Term Memory (LSTM) network as the baseline. The experimental results demonstrate the effectiveness of the proposed model.

To fully mine the features of the text and overcome the semantic understanding problems caused by single-granularity input, this thesis further proposes a self-attention ordered-neuron (SA-ONLSTM) network model with a mixed multi-granularity input structure. The new model introduces two structures. First, the same text is vectorized at two granularities, and the resulting multi-granularity representations are combined as input. Second, a multi-head mechanism increases the thickness of the model so that different heads can focus on different features in the text. These two structures give the model the ability to extract features from global information. In the experiments, the key structures of the model were replaced and the input structure was adjusted to compare the performance of the improved algorithm on the sentiment classification of Bilibili (B station) reviews. The experimental results demonstrate the feasibility and effectiveness of the algorithm on the Bilibili review dataset.
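
As an illustration of the multi-head self-attention idea mentioned in the abstract, the following is a minimal sketch in plain Python. It is not the thesis's implementation: it assumes standard scaled dot-product attention, uses random weights in place of trained ones, omits the final output projection, and leaves out the ON-LSTM and subword embedding layers entirely. All function names, dimensions, and the random input are hypothetical and chosen only for the example.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_head(x, w_q, w_k, w_v):
    """One scaled dot-product attention head over a sequence x (seq_len x d_model)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)  # seq_len x seq_len similarity between positions
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def multi_head_self_attention(x, heads):
    """Run each head independently and concatenate their outputs per position."""
    outputs = [self_attention_head(x, *h)[0] for h in heads]
    return [sum((out[i] for out in outputs), []) for i in range(len(x))]


def random_matrix(rows, cols):
    """Hypothetical stand-in for trained weights or real token embeddings."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
seq_len, d_model, d_head, num_heads = 5, 8, 4, 2
x = random_matrix(seq_len, d_model)  # stand-in for embedded review tokens
heads = [tuple(random_matrix(d_model, d_head) for _ in range(3)) for _ in range(num_heads)]
output = multi_head_self_attention(x, heads)
print(len(output), len(output[0]))  # 5 positions, 8 features each
```

Each head has its own query, key, and value projections, so its attention weights can emphasize different positions in the same comment; concatenating the heads gives every position a representation that draws on the whole sequence.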
Keywords/Search Tags: UGC video comments, sentiment classification, self-attention mechanism, multi-granularity input