
Research Of Cross-domain Word Sentiment Orientation Identification

Posted on: 2016-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: F Wu
Full Text: PDF
GTID: 2308330473957055
Subject: Computer software and theory
Abstract/Summary:
With the continuous development of the Internet, the volume of online information is growing explosively, and with it has come a large number of microblog posts, product reviews and similar content, which often carry a certain sentiment bias. Automatically identifying the sentiment polarity of these comments is therefore becoming increasingly important in natural language processing. However, the cost of obtaining datasets with the same distribution is too high, so cross-domain sentiment classification has become a widespread concern.

Words are the basic units of a statement, so identifying word sentiment orientation has become a research focus. Compared with single-domain word sentiment orientation identification, the cross-domain setting raises additional problems such as sentiment divergence, which make the task more challenging.

In this thesis, we study how to identify word sentiment orientation across domains. The main contributions are as follows:

1) First, to address word sentiment divergence in reviews, we propose a method based on paradigm words without sentiment divergence (COI). The algorithm automatically extracts paradigm words from a given corpus and filters out words with sentiment divergence using a co-occurrence matrix. The sentiment orientation of a target word is then identified by calculating the similarity between words. Experimental results demonstrate the effectiveness and feasibility of the algorithm.

2) Second, to address word mismatch caused by data sparsity, we propose a synonym-based method of word sentiment orientation identification (PCOI). Building on COI, the algorithm mines synonyms from the target domain and uses them to replace the mismatched words. The paradigm words can then be matched well in the corpus, which avoids the mismatch caused by data sparsity. The method also makes good use of the unlabeled data in the target domain, and the experimental results improve.
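The idea of scoring a target word by its similarity to paradigm words can be illustrated with a minimal Python sketch. It is not the thesis's COI implementation: the abstract does not specify the similarity measure, the co-occurrence window, or the divergence filter, so the sketch assumes review-level co-occurrence counts and cosine similarity, takes the paradigm words as given, and omits both the filtering step and the synonym replacement of PCOI. The function names `cooccurrence`, `cosine` and `orientation` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence(docs):
    """Count how often each pair of distinct words appears in the same review."""
    counts = {}
    for doc in docs:
        for a, b in combinations(sorted(set(doc.split())), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse co-occurrence vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def orientation(word, counts, positive, negative):
    """Assumed scoring rule: similarity to positive paradigm words
    minus similarity to negative ones. A score above 0 suggests a
    positive orientation, below 0 a negative one."""
    vec = counts.get(word, Counter())
    pos = sum(cosine(vec, counts.get(p, Counter())) for p in positive)
    neg = sum(cosine(vec, counts.get(n, Counter())) for n in negative)
    return pos - neg
```

For example, given a toy corpus in which "sturdy" co-occurs with the same words as the paradigm word "good" and "flimsy" with the same words as "bad", `orientation` returns a positive score for "sturdy" and a negative score for "flimsy".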
Keywords/Search Tags:cross-domain, sentiment orientation, sentiment words, data sparsity