
A Study On The Sentiment Orientation Of Tibetan Short Texts

Posted on: 2018-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: T Huang
Full Text: PDF
GTID: 2358330515482173
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the mobile Internet and Web 2.0 technology, more and more users express their views and opinions on microblogs, blogs, forums, shopping sites, and other websites. Large volumes of user behavior and review data are produced on the Internet every day. These data are highly valuable, but manual statistical analysis cannot cope with data at this scale, so analyzing the data automatically by computer has become an urgent problem. Sentiment orientation analysis, a key technology for solving this problem, aims to judge the sentiment orientation of a given text automatically and so helps people analyze massive data. After many years of development, sentiment orientation analysis is relatively mature for Chinese and English; however, because Tibetan information processing started late, research on Tibetan sentiment orientation analysis lags behind. This thesis studies sentiment orientation analysis of Tibetan short texts. The main research contents are as follows.

First, we present a Tibetan sentiment classification method based on SVM multi-feature fusion. We begin by selecting Tibetan sentiment features manually. At the same time, we use an available Tibetan-Chinese dictionary and a Chinese sentiment dictionary to build a Tibetan sentiment dictionary and other sentiment resources through automatic machine matching followed by manual proofreading. Using the manually selected sentiment features, we train an SVM classifier for Tibetan text sentiment classification. The experimental results show that classification performance is poor because the coverage of the sentiment dictionary we built is not high enough. To solve this problem, we introduce feature selection based on information gain, and experiments show that it is more effective than manual feature selection. We then introduce feature fusion and combine the SVM classifiers built on the two feature selection methods. Experimental results show that the fused classifier outperforms either single classifier, with a classification accuracy of 80.83%.

Second, we apply semi-supervised Recursive Autoencoders to Tibetan sentiment classification. The Recursive Autoencoders method represents the semantic information of text better and represents sentence-level text as a distributed vector, which improves classification. The input to the model is word vectors, so we crawl a large amount of Tibetan text from the web and train a word vector model that performs well. We then experimentally examine the effectiveness of the method and the effect of word vector dimension and other parameters on the classification results. The results show that the method achieves its best classification result when the alpha parameter is 0.3 and the word vector dimension is 60. The accuracy reaches 82.25%, which is higher than that of the feature fusion method.
Keywords/Search Tags: Tibetan short text, sentiment orientation analysis, multi-feature fusion, semi-supervised Recursive Autoencoders
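
The abstract names information gain as the feature selection method but gives no formula or code. The sketch below shows one common way to score a term by information gain for text classification. It assumes each document is a set of tokens and that only term presence or absence matters. The function names (`entropy`, `information_gain`, `top_terms`) and the English placeholder tokens are hypothetical illustrations, not the thesis's implementation or data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """Reduction in label entropy from knowing whether `term` occurs.

    docs   -- list of token sets, one per document (assumed representation)
    labels -- list of class labels aligned with docs
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    conditional = 0.0
    for subset in (with_term, without_term):
        if subset:  # skip an empty split to avoid dividing by zero
            conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


def top_terms(docs, labels, k):
    """Return the k vocabulary terms with the highest information gain."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(
        vocab, key=lambda t: information_gain(docs, labels, t), reverse=True
    )
    return ranked[:k]


# Toy corpus with placeholder tokens (not Tibetan data from the thesis).
docs = [{"good", "film"}, {"good", "day"}, {"bad", "film"}, {"bad", "day"}]
labels = ["pos", "pos", "neg", "neg"]
selected = top_terms(docs, labels, 2)
```

In this toy corpus the terms "good" and "bad" separate the two classes perfectly and receive the highest scores, while "film" and "day" carry no class information. In a pipeline like the one the abstract describes, the selected terms would then serve as features for an SVM classifier.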