
Research on Text Orientation Analysis Method of Small-Scale Corpus Based on Word Vector Model

Posted on: 2022-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: J B Ren
GTID: 2518306563459974
Subject: Computer technology
Abstract/Summary:
Nowadays, major social network platforms, represented by Weibo, have become an indispensable part of people's social lives. As users communicate and post on these platforms, they generate massive amounts of text that is of great value for data mining. Text orientation analysis, which takes text as its research object and focuses on mining the emotional tendencies it expresses (opinions, attitudes, emotions, etc.), has become an important research direction in natural language processing. Although previous research has achieved some results, many problems remain. First, the representational power of word vectors is limited: existing methods cannot handle polysemy, and the learned word vectors struggle to meet the needs of specific tasks. Second, sentiment dictionaries for specific domains are lacking, and new sentiment words are difficult to identify while such dictionaries are being built. Third, research on text orientation analysis for small-scale corpora is still scarce. To address these problems, the main work of this thesis consists of the following three parts:

1. To address the limited representational ability of word vectors, this thesis starts from the feature representation of text and improves the classic GloVe model. When constructing the co-occurrence matrix, it uses position embeddings to measure the correlation between a word and its context. It also fuses prior sentiment features with semantic features and introduces them into the matrix modeling process. In addition, for binary (two-class) text orientation analysis, it introduces an adjustment parameter that corrects the sentiment prior error caused by class imbalance in the dataset. Experimental results show that the improved model substantially improves the representational ability of the learned word vectors and can meet the needs of specific tasks.

2. To address the lack of domain-specific sentiment lexicons and the difficulty of identifying new sentiment words during lexicon construction, this thesis proposes a multi-level, dynamically updated method for constructing a domain-specific sentiment lexicon. First, the polarity of a subset of sentiment words is labeled with a general sentiment dictionary, and these labels serve as training targets. Then, a polarity classifier based on a neural network is designed, and dynamic update rules for the dictionary are defined to keep it current. Experimental results show that the proposed sentiment polarity classifier is much more accurate than methods that directly use word vector similarity, and that the constructed domain sentiment dictionary is suitable for text orientation analysis.

3. To address the poor performance of deep learning models on small-scale corpora, this thesis optimizes the fine-tuning process of the pre-trained BERT model and proposes several strategies: a dynamic learning rate with early stopping, reducing the number of Transformer blocks in the model, pooling-based sentence encoding, and threshold fine-tuning. Experimental results show that the improved model converges faster, addresses the problem of poor generalization, and can be applied effectively to text orientation analysis on small-scale corpora.
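As a reference point for part 1, the co-occurrence matrix and weighted least-squares loss of the classic GloVe model can be sketched as follows. This is a minimal sketch of the baseline, not the thesis' improved model: the `position_weight` hook defaults to GloVe's standard 1/d distance decay, and the position-embedding-based correlation measure described above would plug in there, but its exact form is not given in the abstract. The function names are hypothetical; the `x_max = 100`, `alpha = 0.75` defaults follow the original GloVe paper.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


def inverse_distance(d: int) -> float:
    """Classic GloVe decay: a context word d positions away adds 1/d."""
    return 1.0 / d


def cooccurrence(
    sentences: List[List[str]],
    window: int = 2,
    position_weight: Callable[[int], float] = inverse_distance,  # hypothetical hook
) -> Dict[Pair, float]:
    """Build a symmetric, position-weighted co-occurrence table X[(word, context)]."""
    X: Dict[Pair, float] = defaultdict(float)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    X[(word, tokens[j])] += position_weight(abs(i - j))
    return dict(X)


def glove_weight(x: float, x_max: float = 100.0, alpha: float = 0.75) -> float:
    """GloVe's f(X_ij): down-weights rare pairs and caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_loss(
    X: Dict[Pair, float],
    W: Dict[str, List[float]],   # word vectors
    C: Dict[str, List[float]],   # context vectors
    bw: Dict[str, float],        # word biases
    bc: Dict[str, float],        # context biases
) -> float:
    """Objective: sum over pairs of f(X_ij) * (w_i . c_j + b_i + b_j - log X_ij)^2."""
    total = 0.0
    for (w, c), x in X.items():
        dot = sum(u * v for u, v in zip(W[w], C[c]))
        total += glove_weight(x) * (dot + bw[w] + bc[c] - math.log(x)) ** 2
    return total


X = cooccurrence([["good", "movie", "very", "good"]])
print(X[("good", "movie")])  # 1.5: once at distance 1, once at distance 2
```

Training then minimizes `glove_loss` over the vectors and biases; the thesis additionally injects sentiment prior features into this matrix modeling step, which the sketch does not attempt.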
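The abstract does not say how the sentiment prior or its imbalance-correcting adjustment parameter is computed. One common way to remove the bias a skewed dataset introduces is to divide each class's counts by the class size. The sketch below assumes that approach; `sentiment_prior` and its `lam` parameter are hypothetical stand-ins, not the thesis' formula.

```python
from collections import Counter
from typing import Dict, List, Tuple


def sentiment_prior(
    docs: List[Tuple[List[str], int]],  # (tokens, label), label 1 = positive, 0 = negative
    lam: float = 1.0,
) -> Dict[str, float]:
    """Per-word prior P(positive | word) in [0, 1]; 0.5 means neutral.

    `lam` is a hypothetical adjustment parameter: each class's document
    frequency is divided by (class size) ** lam, so lam = 0 uses raw counts
    and lam = 1 fully compensates for class imbalance.
    """
    n_pos = sum(1 for _, y in docs if y == 1)
    n_neg = len(docs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one document of each class")
    pos, neg = Counter(), Counter()
    for tokens, y in docs:
        # Count each word at most once per document (document frequency).
        (pos if y == 1 else neg).update(set(tokens))
    prior = {}
    for w in pos.keys() | neg.keys():
        p = pos[w] / n_pos ** lam
        q = neg[w] / n_neg ** lam
        prior[w] = p / (p + q)
    return prior


docs = [(["the", "good"], 1), (["the", "great"], 1), (["the", "fine"], 1), (["the", "bad"], 0)]
raw = sentiment_prior(docs, lam=0.0)
balanced = sentiment_prior(docs, lam=1.0)
print(raw["the"], balanced["the"])  # 0.75 0.5
```

With three positive documents and one negative, the neutral word "the" looks positive under raw counts (0.75) but returns to 0.5 once counts are normalized by class size, which is the kind of prior error the adjustment parameter is meant to correct.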
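The abstract mentions pooling-based sentence encoding but does not say which pooling is used. A common alternative to taking only the `[CLS]` vector is to pool BERT's final-layer token vectors while ignoring padding positions. The sketch below shows mean and max pooling over plain lists that stand in for one sentence's hidden states and attention mask. The shapes and function names are assumptions, not the thesis' implementation.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask value is 1, skipping padding."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


def max_pool(token_vectors: Sequence[Sequence[float]],
             attention_mask: Sequence[int]) -> List[float]:
    """Element-wise maximum over the non-padding token vectors."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return [max(column) for column in zip(*kept)]


hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # last row is a padding position
mask = [1, 1, 0]
print(mean_pool(hidden, mask), max_pool(hidden, mask))  # [2.0, 3.0] [3.0, 4.0]
```

The mask matters: without it, the padding row would dominate both results, and how much padding a sentence has would change its encoding.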
Keywords: Word vector, Text orientation analysis, Domain emotion dictionary, BERT