
Research On Multilingual Micro-blog Hot Event Opinion Tendency Analysis Method

Posted on: 2019-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Guo
Full Text: PDF
GTID: 2428330566966998
Subject: Computer application technology
Abstract/Summary:
With the popularity of social networks and the wide application of social media, users can not only obtain large amounts of data easily but also take part in publishing and disseminating information. Micro-blog is now a popular social platform, and more and more people use it to share, spread, and comment on what they have seen and heard. By analyzing micro-blog users' views on hot issues, we can therefore understand how the public perceives events and how those events are developing. However, most opinion tendency analysis targets single-language micro-blogs, and analysis of multilingual micro-blogs is rare. The characteristics of multilingual text and the nature of micro-blog make multilingual opinion tendency analysis more difficult. This thesis studies a method for analyzing opinion tendency on hot events in multilingual micro-blogs. The main research contents are as follows.

First, this thesis compares four language recognition models, Langid.py, TextCat, CLD, and LangDetect, selects Langid for its high accuracy as the language recognition tool, and identifies several problems that Langid shows in the experiments of this thesis. To address them, the thesis studies the number of domains, the number of groups, the text type, the ratio of English to transliterated Uighur, and the number of extracted features, and on this basis establishes a language recognition model. The model effectively recognizes four languages: Chinese, English, Uighur, and transliterated Uighur. The recognition accuracy for Uighur reaches 100%. For English and transliterated Uighur, which are similar in structure and writing, the accuracy is 97% on long text and 85% on short text. This model is a prerequisite for building bilingual parallel corpora and for multilingual micro-blog opinion tendency analysis.

Second, a bilingual parallel corpus is an important foundation for multilingual research, and at present parallel corpora of Uighur and transliterated Uighur are lacking. This thesis therefore designs eight kinds of translation rules and builds a Uighur and transliterated Uighur parallel corpus through manual translation, machine proofreading, and manual proofreading. This corpus is the basis of the multilingual opinion tendency analysis.

Finally, based on the tagged bilingual parallel corpus, the thesis studies how to train opinion-aware word vectors for opinion tendency analysis of hot topics in multilingual micro-blogs. The bilingual parallel corpus is used to train bilingual word vectors, opinion constraints are added to the bilingual word vectors to train opinion-aware word vectors, and a logistic regression algorithm is used to establish the opinion tendency analysis model. The influence of sentence length and of key words on classification is analyzed in this model: sentence length affects the classification result, while key words have no effect. An improved opinion tendency analysis method based on word vectors is then proposed. This method divides sentence vectors by sentence length, and after training, the error between the sentence vector before and after updating is divided evenly among the words of the sentence. Experimental results show that this method reduces the influence of sentence length on opinion tendency analysis.
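
The improved method described at the end of the abstract can be illustrated with a minimal Python sketch. This is one possible reading of the description, not the thesis's implementation: the function names `sentence_vector` and `update_step`, the learning rate `lr`, and the single logistic-regression gradient step are all assumptions introduced here for illustration. The sentence vector is taken as the sum of its word vectors divided by the sentence length, and the change in the sentence vector after an update is split evenly among the words.

```python
import math


def sentence_vector(word_vecs):
    """Length-normalised sentence vector: sum of word vectors / word count."""
    n = len(word_vecs)
    dim = len(word_vecs[0])
    return [sum(v[d] for v in word_vecs) / n for d in range(dim)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def update_step(word_vecs, weights, label, lr=0.1):
    """One illustrative update (assumed form, not taken from the thesis).

    1. Build the length-normalised sentence vector.
    2. Take one logistic-regression gradient step on that sentence vector.
    3. Split the difference between the updated and original sentence
       vector evenly among the words of the sentence.
    """
    n = len(word_vecs)
    s_old = sentence_vector(word_vecs)
    p = sigmoid(sum(w * x for w, x in zip(weights, s_old)))
    # Gradient of the log-loss with respect to the sentence vector.
    s_new = [x - lr * (p - label) * w for x, w in zip(s_old, weights)]
    # Each word receives an equal share of the sentence-level change.
    share = [(a - b) / n for a, b in zip(s_new, s_old)]
    return [[x + d for x, d in zip(v, share)] for v in word_vecs]


if __name__ == "__main__":
    words = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical 2-d word vectors
    classifier = [0.5, -0.5]           # hypothetical classifier weights
    print(update_step(words, classifier, label=1))
```

Because every word receives the same share, a long sentence changes each of its word vectors less than a short sentence does, which is one way the dependence on sentence length could be reduced.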
Keywords/Search Tags: Sina Weibo, Opinion Tendency Analysis, Multilingual Recognition, Langid Model