
Research Of Cross-lingual Sentiment Classification Method Based On Improved Boosting

Posted on: 2019-08-18    Degree: Master    Type: Thesis
Country: China    Candidate: S S Dong    Full Text: PDF
GTID: 2428330623468767    Subject: Engineering
Abstract/Summary:
Sentiment classification aims to mine and judge, by computational means, the sentiment and attitude of text authors, providing valuable reference information for decision makers. However, the relevant techniques depend heavily on the quality and quantity of training corpora, and high-quality annotated corpora and sentiment lexicons are unevenly distributed across languages; this gap gave rise to research on cross-lingual sentiment classification. Researchers have devoted much effort to cross-lingual sentiment classification, but the following problems remain: when mapping between languages, the dependence of words on their context and domain is neglected; topic shift and translation errors may cause differences in data distribution; and all source-language instances are used to train the classifier, even though their distribution may differ from that of the target language. In view of these problems, the research work of this paper comprises the following:

1) Combining transfer learning with the Boosting framework, the ClAdaBoost algorithm is proposed and applied to cross-lingual sentiment classification. An initial weak classifier is first trained on the joint training set of source- and target-language samples; the error rate computed on the target-language training set is then used to update the sample weights, and a new weak classifier is trained. Iterating in this way, the resulting series of weak classifiers is combined under certain rules into a strong classifier that favors the target-language samples.

2) Building on the algorithm above, the ClKAdaBoost algorithm is proposed, which uses the K-nearest-neighbor algorithm to filter the source-language training instances. Under the improved Boosting algorithm (ClAdaBoost), the weights of mispredicted instances are increased so that the next base classifier can learn previously missed knowledge. In the transfer-learning setting, however, source instances that are mispredicted may lie far from the target domain, and increasing their weights may mislead the base classifier into learning excessive source noise. This paper therefore applies the K-nearest-neighbor algorithm to filter the source-language training samples before constructing the classifier, selecting those source-language instances that are "useful" for learning the target task, and then building a high-accuracy strong classifier with the Boosting technique.

3) The two proposed algorithms are compared with several benchmark algorithms and the original algorithms on the NLP&CC 2013 dataset. The experimental results show that ClAdaBoost improves classification performance over the corresponding algorithm without transfer learning, and that ClKAdaBoost achieves higher classification accuracy than ClAdaBoost.
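The abstract does not give pseudocode for ClAdaBoost, so the following Python sketch shows one plausible reading of the loop described in point 1: weak learners are trained on the joint source+target set, while the error rate that drives the weight update is measured on the target-language portion only. The stump base learner, the exact weight-update rule, and the alpha-weighted vote are assumptions, not the thesis's exact formulas.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cl_adaboost(X_src, y_src, X_tgt, y_tgt, n_rounds=10):
    """Sketch of a ClAdaBoost-style loop (binary labels 0/1 assumed)."""
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([y_src, y_tgt])
    n, n_tgt = len(y), len(y_tgt)
    w = np.full(n, 1.0 / n)                    # uniform initial sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        clf = DecisionTreeClassifier(max_depth=1, random_state=0)
        clf.fit(X, y, sample_weight=w)
        pred = clf.predict(X)
        # error rate computed on the target-language training samples only
        tgt_w = w[-n_tgt:]
        tgt_err = np.sum(tgt_w * (pred[-n_tgt:] != y_tgt)) / np.sum(tgt_w)
        tgt_err = np.clip(tgt_err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - tgt_err) / tgt_err)
        # AdaBoost-style update: raise the weights of misclassified samples
        w *= np.exp(alpha * (pred != y))
        w /= w.sum()
        learners.append(clf)
        alphas.append(alpha)
    def predict(X_new):
        # combine the weak classifiers by an alpha-weighted vote
        votes = sum(a * (2 * clf.predict(X_new) - 1)
                    for a, clf in zip(alphas, learners))
        return (votes > 0).astype(int)
    return predict
```

Because the error rate is computed on the target set, rounds that misclassify target samples receive small alphas, so the final vote is dominated by learners that fit the target language well.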
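For the ClKAdaBoost filtering step in point 2, the thesis states only that K-nearest neighbors selects "useful" source instances. One hypothetical realization, sketched below, keeps a source instance only when its label agrees with the majority label of its k nearest target-language neighbors; the agreement criterion is an assumption for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_filter_source(X_src, y_src, X_tgt, y_tgt, k=5):
    """Hypothetical pre-filter: keep a source instance only if its label
    matches the majority label of its k nearest target neighbors."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_tgt)
    _, idx = nn.kneighbors(X_src)              # indices of k target neighbors
    keep = []
    for i, neigh in enumerate(idx):
        # majority vote over the labels of the k nearest target instances
        majority = int(np.mean(y_tgt[neigh]) >= 0.5)
        keep.append(y_src[i] == majority)
    keep = np.asarray(keep)
    return X_src[keep], y_src[keep]
```

The filtered set would then be passed to the boosting stage, so that mislabeled or off-domain source instances never receive inflated weights in the first place.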
Keywords/Search Tags:sentiment classification, cross-lingual, Boosting, transfer learning, K-Nearest Neighbor