
Automatic Identification Of New Sentiment Words For A Register Of Language Based On Semantic Association

Posted on: 2017-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: X Chen
GTID: 2348330512451085
Subject: Software engineering
Abstract/Summary:
Identifying new sentiment words is a sub-task that has come to occupy a vital position in sentiment analysis. General-purpose methods, however, do not exploit the characteristics of a particular register of language when identifying new sentiment words. This thesis studies the automatic identification of new sentiment words for a register of language based on semantic association, drawing on the distinctive traits of microblogs and prose. The work is as follows:

(1) Targeting the colloquial character of microblogs, an automatic method for extracting new words from microblogs is proposed based on semantic association. First, a new word that the Chinese automatic word segmentation system has incorrectly split into several words is reassembled as a candidate word. Next, to make full use of the semantic information in a word's context, a vector representation of each candidate word is obtained by training a neural network. Finally, with an existing sentiment lexicon as a guide, the final new sentiment words are selected from the candidate words by combining the association-sort algorithm based on a vocabulary list with the max association-sort algorithm. Experimental results on Task 3 of COAE2014 show that the precision of the proposed method is at least 22% higher than that of Pointwise Mutual Information (PMI), Enhanced Mutual Information (EMI), Normalized Multi-word Expression Distance (NMED), New Word Probability (NWP), and the identification of new sentiment words based on word embedding, which demonstrates the effectiveness of the proposed method.

(2) Targeting the standardized character of prose, an automatic method for extracting new words from prose is proposed based on semantic association. First, intersection and subtraction operations are applied to the corpus and the general sentiment dictionary, yielding a difference set, DSet, and an intersection set, JSet; DSet is treated as the candidate set of new words. Next, the thesis presents an algorithm that calculates sentiment-semantic intensity based on the modern Chinese dictionary, and the intensity of the words in JSet is calculated with it. Finally, guided by JSet and a Chinese thesaurus, the new sentiment words are selected from the candidate set using a sorting algorithm based on semantic association. Experiments on 71,460 prose texts show that the method is effective, reaching a precision of 49% in the best case.

(3) The thesis designs and implements an automatic system for identifying new sentiment words in microblogs and prose, which uses the above algorithms and is built on a client/server (C/S) architecture. An independent class is designed for each sub-function to satisfy the principle of "high cohesion and low coupling". In addition, an inverted index and multithreading are introduced into the system to process large volumes of data. The system improves the visualization of results, facilitates future research, and increases the usability of the methods.
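The set split and ranking described in (2) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: it assumes DSet is the corpus vocabulary minus the dictionary, uses cosine similarity over toy vectors as a stand-in for semantic association, and scores each candidate by its strongest association with any word in JSet. The function names, the English toy words, and the vectors are all hypothetical, and the sentiment-semantic intensity weighting from the abstract is not reproduced.

```python
import math


def split_vocabulary(corpus_words, sentiment_dict):
    """Split the corpus vocabulary against a general sentiment dictionary.

    Assumption: DSet = corpus words absent from the dictionary (candidates),
    JSet = corpus words present in the dictionary (known sentiment words).
    """
    corpus = set(corpus_words)
    known = set(sentiment_dict)
    dset = corpus - known   # candidate new words
    jset = corpus & known   # known sentiment words that occur in the corpus
    return dset, jset


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_max_association(candidates, seeds, vectors):
    """Rank candidates by their highest similarity to any seed word.

    Words without a vector are skipped; ties are broken alphabetically.
    """
    scored = []
    for word in candidates:
        if word not in vectors:
            continue
        best = max(
            (cosine(vectors[word], vectors[s]) for s in seeds if s in vectors),
            default=0.0,
        )
        scored.append((word, best))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy data: English words stand in for Chinese vocabulary.
vectors = {
    "happy": [1.0, 0.0],
    "joyful": [0.9, 0.1],
    "table": [0.0, 1.0],
}
dset, jset = split_vocabulary(["happy", "joyful", "table"], ["happy", "angry"])
ranked = rank_by_max_association(dset, jset, vectors)
for word, score in ranked:
    print(f"{word}\t{score:.3f}")
```

In this toy run, "joyful" ranks above "table" because its vector lies close to the seed word "happy". A cutoff on the score or on the rank would then select the final new sentiment words; the abstract does not say which the thesis uses.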
Keywords/Search Tags:Recognition of new sentiment words, Semantic relatedness, Microblog, Prose, Sort algorithm