
Research on Cross-Language Sentiment Analysis Based on Instance Transfer

Posted on: 2017-10-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Geng
Full Text: PDF
GTID: 2428330596457444
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the Web 2.0 era, platforms such as blogs, BBS forums, and social networks have emerged. A large volume of user comments has appeared online, reflecting users' subjective views on events and products and, to a certain extent, the emotional state of society as a whole. Because labeled English corpora are plentiful, most sentiment analysis research has been conducted on English and far less on other languages. English sentiment resources can therefore be used to support sentiment analysis in other languages. This thesis studies how to achieve cross-language sentiment analysis through instance transfer learning: a classification model for the target language is built by transferring knowledge from the source-language training dataset, which reduces the amount of labeled target-language data required. Building on the existing TrAdaBoost algorithm, the thesis studies cross-language sentiment classification and proposes two methods based on instance transfer. The main work is as follows:
1) To address the polarization of weights between source-language and target-language training samples, the IMTrAdaBoost algorithm is proposed. It modifies the weight-updating strategy of TrAdaBoost to limit how much the weights of target-language training samples grow, which effectively improves the classification performance of the target-language classifier.
2) The source-language training samples are not filtered before the classifier is built, so some of them may harm transfer learning for the target task. To address this problem, a dataset reconstruction algorithm based on self-learning, BoostTrA, is proposed. A self-learning selection strategy first picks the source-language training data that benefit transfer learning for the target task. Dataset reconstruction then uses bootstrapping to divide the large pool of selected source-language training data randomly into equal parts and combines each part with the small amount of target-language training data, yielding multiple reconstructed training datasets. A classifier is trained on each reconstructed dataset, the classifiers are integrated, and the integrated classifier is then tested on the dataset.
3) BoostTrA is verified experimentally on data from three domains: BOOK, DVD, and MUSIC. Compared with TrAdaBoost, BoostTrA improves classification accuracy by an average of 10% across these three domains.
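The dataset reconstruction step in item 2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and the abstract leaves several details open, so the following are assumptions: the selected source data are partitioned without replacement, the classifiers are integrated by unweighted majority vote, and a toy word-vote classifier stands in for the real base learner. The self-learning selection step is not shown. All function names (`reconstruct_datasets`, `train_word_vote`, `ensemble_predict`) and the sample data are hypothetical.

```python
import random
from collections import Counter


def reconstruct_datasets(source, target, n_parts, seed=0):
    """Shuffle the (already selected) source samples, split them into
    n_parts near-equal parts, and append the small target set to each.
    Assumption: a partition without replacement."""
    rng = random.Random(seed)
    shuffled = source[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    return [part + target for part in parts]


def train_word_vote(samples):
    """Toy stand-in classifier (assumption): each word gets +1 per
    positive sample and -1 per negative sample it occurs in."""
    score = Counter()
    for tokens, label in samples:
        for tok in tokens:
            score[tok] += 1 if label == 1 else -1

    def predict(tokens):
        # Unseen words contribute 0; ties default to the positive class.
        return 1 if sum(score[t] for t in tokens) >= 0 else 0

    return predict


def ensemble_predict(classifiers, tokens):
    """Integrate the classifiers by unweighted majority vote (assumption)."""
    votes = Counter(clf(tokens) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical data: source samples (e.g. translated English reviews)
# and a small labeled target-language set. 1 = positive, 0 = negative.
source = [
    (["good", "great"], 1), (["bad", "awful"], 0),
    (["great", "love"], 1), (["awful", "hate"], 0),
    (["good", "love"], 1), (["bad", "hate"], 0),
]
target = [(["excellent", "good"], 1), (["boring", "bad"], 0)]

datasets = reconstruct_datasets(source, target, n_parts=3)
classifiers = [train_word_vote(d) for d in datasets]
print(ensemble_predict(classifiers, ["excellent", "good"]))
print(ensemble_predict(classifiers, ["boring", "bad"]))
```

Using an odd number of parts avoids tied votes for binary labels. Because every reconstructed dataset contains the full target set, each classifier in the ensemble sees all of the scarce target-language data while seeing only one slice of the source data.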
Keywords/Search Tags: sentiment analysis, instance transfer, weight updating, dataset reconstruction