Research On Sentiment Analysis Of Uyghur Text

Posted on: 2014-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: J Huang
Full Text: PDF
GTID: 2248330398967937
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, a large number of Uyghur-language websites and communication platforms have been established and improved. As the volume of Uyghur text online has exploded, the amount of text expressing personal emotion has grown steadily. Because such text has great practical value, text sentiment analysis has become a research focus in natural language processing. It draws on information extraction, information retrieval, text classification, computational linguistics, machine learning, and other research areas, and its results play a significant role in product recommendation, public opinion surveys, business intelligence, newspaper editing, and other applications.

This thesis proposes a method for sentiment analysis of Uyghur text. The main research content covers the following three points:

(1) Word-level sentiment analysis: a CRF-based model for Uyghur word-level sentiment is proposed. Uyghur word sentiment is divided into eight categories: praise, derogatory, happiness, sadness, anger, fear, panic, and objective. The model selects morphology, part of speech (POS), stem, adverbial modification, and other properties as features for Uyghur sentiment classification, and constructs five feature sets, including the word and POS feature set, the POS collocation feature set, the word and stem feature set, and the POS and stem feature set. Combined with the CRF model, these feature sets realize eight-way sentiment classification of Uyghur words.

(2) Sentence-level sentiment analysis: a method based on the sentiment words in a sentence is proposed. The eight categories (praise, derogatory, happiness, sadness, anger, fear, panic, and objective) form the sentiment category set. First, if the sentence contains an adversative conjunction, only the latter clause is retained. Then a Conditional Random Field model automatically tags the words with sentiment types, and each sentiment type of the sentence is scored according to the automatically tagged words. The highest-scoring sentiment type is chosen as the candidate sentiment type. Finally, the candidate is revised according to negation rules, sentence structure, and rhetorical rules to obtain the final sentiment of the sentence.

(3) Topic extraction: a claim-level topic extraction method is proposed. The method uses GLR-Cascaded LDA (a cascaded LDA model for global topics, local topics, and the relation between them) to extract paragraph-level local topics and document-level global topics, establish the global-local topic relationship, and map these relationships to each opinion claim. It then introduces a redundant pattern mechanism and applies a Bootstrapping algorithm together with pattern matching, such as fuzzy matching and multiple matching, to extract the topics of explicit claims. Finally, an implicit topic inference algorithm deduces the topics of implicit claims. The ultimate goal of topic extraction is to establish a claim-topic opinion quadruple OC, GT, LT, LT for each opinion claim.
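
The sentence-level procedure in point (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the English word lists, the lexicon-based `toy_tagger` that stands in for the trained CRF tagger, the one-point-per-tagged-word scoring, and the polarity-flipping negation rule are all assumptions made for illustration. Only the eight category names and the order of the steps come from the abstract.

```python
from collections import Counter

# The eight sentiment categories named in the abstract.
CATEGORIES = ("praise", "derogatory", "happiness", "sadness",
              "anger", "fear", "panic", "objective")

# Hypothetical English stand-ins; the thesis works on Uyghur text.
ADVERSATIVES = {"but", "however"}
NEGATORS = {"not", "never"}
# Assumed flips applied by the toy negation rule.
FLIP = {"praise": "derogatory", "derogatory": "praise",
        "happiness": "sadness", "sadness": "happiness"}
# Hypothetical lexicon used by the stand-in tagger.
LEXICON = {"wonderful": "praise", "bad": "derogatory",
           "happy": "happiness", "afraid": "fear"}


def toy_tagger(tokens):
    """Stand-in for the CRF tagger: one sentiment label per token."""
    return [LEXICON.get(tok, "objective") for tok in tokens]


def keep_latter_clause(tokens):
    """Keep only the tokens after the last adversative conjunction."""
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in ADVERSATIVES:
            return tokens[i + 1:]
    return tokens


def classify_sentence(tokens, tagger=toy_tagger):
    """Score sentiment types from word tags, then apply a negation rule."""
    clause = keep_latter_clause(tokens)
    labels = tagger(clause)
    # Assumed scoring: one point per non-objective tagged word.
    scores = Counter(label for label in labels if label != "objective")
    if not scores:
        return "objective"
    candidate = scores.most_common(1)[0][0]
    # Assumed negation rule: any negator in the clause flips the candidate.
    if any(tok in NEGATORS for tok in clause):
        candidate = FLIP.get(candidate, candidate)
    return candidate


sentence = "the food was bad but the service was wonderful".split()
print(classify_sentence(sentence))
```

In this sketch the adversative step discards "the food was bad", so only "wonderful" contributes to the score. A trained sequence tagger could be passed in place of `toy_tagger` as long as it returns one label per token.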
Keywords/Search Tags:Uyghur, Sentiment analysis, Word level, Sentence level, Topic extraction