
Research On Short Text Sentiment Analysis And Its Applications

Posted on: 2019-03-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Li
Full Text: PDF
GTID: 1368330623453328
Subject: Computer Science and Technology Network and Information Security
Abstract/Summary:
Short text is very much a product of its time. With the development of the Internet, short texts have become highly real-time, widely spread, rapidly updated, fragmented, non-standard in wording, and sparse in features, which makes it very difficult for traditional algorithms to extract effective features from them. Short text sentiment analysis is an important branch of artificial intelligence and is currently in great demand. However, it still faces many problems: the word vectors it relies on have limited expressive power; labeled short text data is hard to expand; the implicit meaning of short text is hard to express, which makes sentiment analysis with few labeled examples difficult; and sentiment analysis requires advanced programming skills, which makes it hard to popularize.

1. To address the limited expressive power of word vectors, a word vector model for sentiment analysis is proposed, in which a sentiment prior is added to the learned word vectors. The models DLJT1, DLJT2, DLJC1, DLJC2, WLJT, and WLJC are designed according to the different ways of incorporating the sentiment prior. Comparison and analysis show that the best results are obtained by DLJT2, the model that uses the sentiment ratio derived from the current word as the sentiment prior. DLJT2 also achieves the best results in short text sentiment analysis.

2. To address the shortage of short text data, this paper proposes the data augmentation model CS-GAN, which is based on the generative adversarial network and also uses a conditional LSTM and reinforcement learning as basic modules. In addition to the generator and discriminator, it includes a classifier to ensure that the generated sentences carry the label information. The inner consistency and outer consistency of the generated samples are verified against real samples. Ablation experiments then validate the roles played by the GAN and by reinforcement learning. Finally, adding the generated labeled text data greatly improves the generalization ability of the supervised model. Sentiment classification experiments show that CS-GAN works well on data sets with fewer training examples or more label information. CS-GAN therefore handles short text data augmentation well.

3. Besides generating labeled text data, the shortage of text data can also be addressed with semi-supervised learning. This paper proposes SDVAE, a semi-supervised disentangled VAE model. The core idea is to add to the ELBO an equality constraint based on the short text sentiment prior, and to assume that the disentangled variable and the non-interpretable variable extracted from the text data are independent. The disentangled variable reduces the number of classifier parameters and improves training efficiency. Based on the different ways of adding the equality constraint, the models SDVAE-I and SDVAE-II are designed, and they are then extended to SDVAE-I&IAF and SDVAE-II&IAF using the inverse autoregressive flow (IAF). To visualize the roles played by the disentangled variable and the non-interpretable variable, t-SNE is applied. Together with results on image data, the visualization shows that the disentangled variable carries the category information but is not sufficient on its own to reconstruct the input, whereas the non-interpretable variable carries no category information but does contain the features needed for reconstruction. SDVAE also effectively addresses the problem of expressing the implicit meaning of short texts. The final experimental results show that SDVAE by itself achieves good sentiment classification, and that with the help of the inverse autoregressive flow, SDVAE-II&IAF achieves the best classification results.

4. To make sentiment analysis easier to use, we also design and build a prototype deep learning platform that integrates the key elements of neural networks. Users can build their models as if stacking building blocks. All of the models proposed in this paper are included in the platform as example experiments, which lets users start a new experiment quickly. On this platform, the identification of sensitive information based on sentiment analysis is studied. After verifying the intrinsic link between the sentiment of short texts and the sensitive information they contain, the model DS, based on sensitive word frequency, is designed. This model can effectively identify sensitive information on social media. Compared with traditional sensitive information identification methods, the accuracy of the model in sensitive information detection increases by about 20% in general. Finally, the sensitivity of short texts and the sentiment tendencies they contain are visualized in the deep learning platform.
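The abstract does not define the "sentiment ratio" that DLJT2 uses as a sentiment prior. As a rough illustration only, the Python sketch below assumes it is the smoothed share of positive texts among the labeled texts that contain a given word. The function name `sentiment_ratios`, the binary label scheme, and the additive smoothing are all assumptions made for this example and are not taken from the dissertation.

```python
from collections import Counter


def sentiment_ratios(texts, labels, smoothing=1.0):
    """Return {word: smoothed share of positive texts among texts containing it}.

    Assumptions (not from the dissertation): labels are 1 (positive) or
    0 (negative), tokens are whitespace-separated, and additive smoothing
    pulls rarely seen words toward a neutral 0.5.
    """
    pos_counts = Counter()
    total_counts = Counter()
    for text, label in zip(texts, labels):
        # Count each word once per text so long texts do not dominate.
        for word in set(text.lower().split()):
            total_counts[word] += 1
            if label == 1:
                pos_counts[word] += 1
    return {
        word: (pos_counts[word] + smoothing) / (total_counts[word] + 2 * smoothing)
        for word in total_counts
    }


# Toy labeled short texts (made up for illustration).
texts = ["great phone love it", "terrible battery hate it", "love the screen"]
labels = [1, 0, 1]

ratios = sentiment_ratios(texts, labels)
print(ratios["love"])  # 0.75: seen in two texts, both positive
print(ratios["it"])    # 0.5: seen in one positive and one negative text
```

A per-word score of this kind could in principle be supplied as an extra signal while training word vectors. How the dissertation's models actually inject the prior is not described in the abstract.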
Keywords/Search Tags: short text sentiment analysis, word vector model, short text data augmentation, semi-supervised learning, sensitive information recognition