Semi-supervised Learning Based Sentiment Analysis In Online Catering Reviews

Posted on: 2017-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: J Feng
Full Text: PDF
GTID: 2348330503472361
Subject: Electronics and Communications Engineering
Abstract/Summary:
Sentiment analysis has attracted considerable attention over the last decade, driven by the rapid development of the Internet. Extracting opinions, sentiments and attitudes from massive volumes of text has become an important field, in which sentiment lexicon extraction and sentiment orientation classification are two typical and significant tasks. Traditional approaches to sentiment lexicon extraction rely mainly on relationships between words or on syntactic rules, and ignore the latent semantics behind words and phrases. Most current sentiment classification algorithms are based on supervised or semi-supervised learning. Supervised learning, however, requires a large amount of labeled data, and labeling is time-consuming and labor-intensive. Semi-supervised learning needs far less labeled data, but it usually neglects the latent information and features in the unlabeled data. As a result, classifier performance depends heavily on the initial labeled data: it decays sharply when only a small number of labeled samples are available, and it also drops when the quality of the initial training data declines. To address these problems, this thesis makes the following contributions.
First, we propose an approach to sentiment lexicon extraction based on dependency parsing and Word2Vec. By extracting certain dependency relations from positive and negative reviews, we collect nouns, adjectives, adverbs and verbs as the initial sentiment lexicon. Then, using a Word2Vec model trained on all reviews, we select the words nearest to the entries of the initial lexicon and filter them by word frequency. Experiments demonstrate that the extracted lexicon improves classifier performance.
Second, we propose a latent information miner that makes full use of unlabeled data and enhances classifier performance by bringing unlabeled data into classification through an improved exemplar SVM (E-SVM) model.
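The lexicon expansion step of the first contribution (nearest Word2Vec neighbours of the seed words, filtered by word frequency) might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the toy two-dimensional vectors stand in for a trained Word2Vec model, cosine similarity is assumed as the distance measure, and the function name `expand_lexicon` and the parameters `top_k` and `min_freq` are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds, vectors, freqs, top_k=2, min_freq=2):
    """Add each seed's top_k nearest words, keeping only frequent enough ones.

    seeds    -- initial sentiment lexicon (e.g. from dependency relations)
    vectors  -- word -> embedding; a stand-in for a trained Word2Vec model
    freqs    -- word -> corpus frequency, used as the filter
    """
    expanded = set(seeds)
    for seed in seeds:
        if seed not in vectors:
            continue
        # Rank all non-seed words by similarity to this seed.
        neighbours = sorted(
            (w for w in vectors if w not in seeds),
            key=lambda w: cosine(vectors[seed], vectors[w]),
            reverse=True,
        )[:top_k]
        # Frequency filter: drop rare neighbours.
        expanded.update(w for w in neighbours if freqs[w] >= min_freq)
    return expanded


# Hypothetical toy data; real vectors would come from a model trained on reviews.
vectors = {
    "delicious": [0.9, 0.1],
    "tasty": [0.85, 0.2],
    "yummy": [0.8, 0.15],
    "slow": [0.1, 0.9],
    "table": [0.5, 0.5],
}
freqs = Counter({"delicious": 40, "tasty": 25, "yummy": 1, "slow": 30, "table": 50})
lexicon = expand_lexicon({"delicious"}, vectors, freqs, top_k=2, min_freq=2)
```

In this toy run the two nearest neighbours of "delicious" are "yummy" and "tasty"; "yummy" is dropped by the frequency filter because it occurs only once.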
Based on the latent information miner, we propose a more robust and better-performing semi-supervised sentiment classification algorithm. We first train two latent information miners, one for the positive class and one for the negative class, which together we call the bilateral information miner. Using the bilateral information miner and a filter, we select credible samples from the unlabeled data and add them to the initial labeled data. We then train the final sentiment learner with a self-learning algorithm. Experiments show that performance improves as the amount of initial labeled or unlabeled data increases, and that the proposed system achieves better performance with a small amount of initial training data. Moreover, the system proves more robust than the other methods: its performance does not weaken as the quality of the initial training data changes.
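The general shape of selecting credible unlabeled samples and then self-training can be sketched as below. This is a deliberately simplified stand-in, not the thesis's method: a nearest-centroid classifier replaces the E-SVM-based bilateral information miner and the final learner, and a distance margin plays the role of the credibility filter. The names `fit`, `predict`, `self_train` and the `margin` threshold are assumptions made for illustration.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def fit(labeled):
    """One centroid per class (0 = negative, 1 = positive)."""
    return {y: centroid([x for x, lab in labeled if lab == y]) for y in (0, 1)}


def predict(x, centroids):
    """Return (label, confidence); confidence is the gap between distances."""
    d_neg = distance(x, centroids[0])
    d_pos = distance(x, centroids[1])
    return (1 if d_pos < d_neg else 0), abs(d_pos - d_neg)


def self_train(labeled, unlabeled, margin=1.0, max_rounds=5):
    """Repeatedly move confidently predicted samples into the labeled set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit(labeled)
        scored = [(x, predict(x, centroids)) for x in pool]
        accepted = [(x, lab) for x, (lab, conf) in scored if conf >= margin]
        if not accepted:
            break  # nothing credible left to add
        labeled.extend(accepted)
        pool = [x for x, (_, conf) in scored if conf < margin]
    return fit(labeled), labeled, pool


# Hypothetical toy feature vectors; real inputs would be review features.
seed = [([0.0, 0.0], 0), ([4.0, 4.0], 1)]
unlabeled = [[0.5, 0.0], [3.5, 4.0], [2.0, 2.0]]
model, grown, leftover = self_train(seed, unlabeled)
```

Here the two samples close to a class centroid are absorbed into the labeled set, while the ambiguous point midway between the classes never passes the margin filter and stays unlabeled.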
Keywords/Search Tags:Sentiment analysis, Semi-supervised learning, Dependency parsing, Word2Vec, E-SVM