
Research On Chinese Sentiment Analysis Based On Expanded Sentiment Lexicon And Word Embedding

Posted on: 2017-03-07  Degree: Master  Type: Thesis
Country: China  Candidate: B J Ding  Full Text: PDF
GTID: 2308330482999728  Subject: Computer software and theory
Abstract/Summary:
Sentiment analysis draws on natural language processing, data mining, and artificial intelligence, and has recently become a hot topic. Through text mining and analysis, it identifies the sentiment a text expresses (negative, positive, or neutral). This thesis studies Chinese text and determines its sentiment polarity (positive or negative). The key factors in sentiment analysis are the sentiment lexicon, semantic context, word order, and emotional information. Traditional sentiment analysis methods, however, have several problems: existing sentiment lexicons have poor coverage; a sentiment word can have different polarities in different domains; and traditional feature selection produces high-dimensional features while ignoring semantics and word order. To address these problems, the thesis makes the following contributions. First, it studies methods for expanding the sentiment lexicon in order to improve its coverage. Two methods are considered: rule-based expansion and expansion based on an English sentiment lexicon. The rule-based method has three main stages: collecting rules manually, extracting candidate sentiment words, and determining their sentiment polarity; it relies mainly on rules and pointwise mutual information (PMI). The method based on an English sentiment lexicon expands the Chinese sentiment lexicon using the English lexicon together with word alignment information between Chinese and English words in an English-Chinese parallel corpus; this method can obtain more sentiment words. Second, the thesis studies feature selection and representation based on word embedding, to address the high feature dimensionality and the neglect of semantic and word-order information in traditional methods. Two methods are proposed: one that combines word embeddings with sentiment information, and one based on sentence vectors.
The method that combines word embeddings with sentiment information uses the word embedding of each word, together with the sentiment words from the sentiment lexicon, as features, so it captures both semantic context and sentiment information. The method based on sentence vectors trains a vector for each sentence as a whole and uses it to train the classification model, so it captures both semantics and word order. Finally, to verify the effectiveness of the proposed methods, the thesis applies the lexicon expansion methods and the word-embedding-based feature selection and representation to sentiment analysis, yielding three sentiment analysis frameworks that are used in the experiments. The experiments were carried out in Python: a web crawler collected product review datasets, a large corpus was used to train the word embeddings and sentence vectors, and a parallel corpus was used to obtain the alignment information between Chinese and English words. Two experiments were conducted. The first, to verify the effectiveness of the lexicon building and expansion method, compared sentiment analysis using a traditional sentiment lexicon with sentiment analysis using the proposed lexicon building and expansion method. The second, to verify the effectiveness of the word-embedding-based feature selection and representation, compared sentiment analysis based on word embeddings combined with sentiment information, and sentiment analysis based on sentence vectors, against sentiment analysis based on traditional machine learning methods. The experimental results indicate that both the proposed lexicon building and expansion method and the proposed word-embedding-based feature selection and representation are effective.
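The abstract says that the rule-based expansion determines the polarity of candidate words mainly with rules and PMI, but it does not give the formula. As an illustration only, the following minimal Python sketch shows one common way PMI is used for this purpose: score a candidate by its PMI with positive seed words minus its PMI with negative seed words, using document-level co-occurrence. The function name `so_pmi`, the add-one smoothing, the seed words, and the toy reviews are all assumptions made for this sketch and are not taken from the thesis.

```python
import math


def so_pmi(docs, candidate, pos_seeds, neg_seeds, smoothing=1.0):
    """Hypothetical helper: PMI-based polarity score for one candidate word.

    docs is a list of already-segmented reviews (lists of tokens).
    The score is sum(PMI(candidate, positive seed)) minus
    sum(PMI(candidate, negative seed)); a score above 0 suggests positive.
    """
    doc_sets = [set(tokens) for tokens in docs]
    n_docs = len(doc_sets)

    def prob(*words):
        # Smoothed fraction of documents containing all the given words.
        hits = sum(1 for s in doc_sets if all(w in s for w in words))
        return (hits + smoothing) / (n_docs + smoothing)

    def pmi(a, b):
        # PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) )
        return math.log2(prob(a, b) / (prob(a) * prob(b)))

    positive = sum(pmi(candidate, seed) for seed in pos_seeds)
    negative = sum(pmi(candidate, seed) for seed in neg_seeds)
    return positive - negative


# Toy, hand-made segmented reviews (assumed data, for illustration only).
docs = [
    ["手机", "很", "好", "给力"],
    ["质量", "好", "给力"],
    ["物流", "差", "坑爹"],
    ["屏幕", "差", "坑爹"],
    ["价格", "好"],
    ["包装", "差"],
]

score_pos = so_pmi(docs, "给力", pos_seeds=["好"], neg_seeds=["差"])
score_neg = so_pmi(docs, "坑爹", pos_seeds=["好"], neg_seeds=["差"])
print(score_pos > 0, score_neg < 0)
```

In this toy corpus the candidate that co-occurs with the positive seed receives a positive score and the one that co-occurs with the negative seed receives a negative score. A real lexicon expansion would need a much larger corpus, a proper Chinese word segmenter, and a threshold for words whose score is close to zero.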
Keywords/Search Tags: sentiment analysis, sentiment lexicon, natural language processing, word embedding