
Research On Sentiment Classification Method Based On Semi-supervised Learning

Posted on: 2020-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: W T Liu
GTID: 2428330575456413
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of Internet technology, more and more users are keen to post comments on the web. Automatically mining the emotional tendencies contained in these subjective texts would have great application and economic value for individuals, enterprises, and the government. Text sentiment classification is an effective tool for this problem. As a general-purpose machine learning technique, semi-supervised learning can make full use of unlabeled samples to improve classification performance. Because many text sentiment classification scenarios lack a sufficient labeled corpus, and labeling samples is time-consuming and laborious, this thesis focuses on semi-supervised sentiment classification. The main innovations and work of this thesis are as follows.

First, this thesis proposes a co-training (collaborative training) sentiment classification algorithm based on stratified-sampling random subspaces. When the semi-supervised learning algorithm based on random feature subspaces is applied directly to text sentiment classification, some subspaces may contain no strongly correlated attributes. The proposed algorithm addresses this by constructing the subspaces with stratified sampling, which improves the sufficiency of each subspace while preserving the diversity among subspaces. Experiments show that the algorithm classifies better than the semi-supervised learning algorithm based on random feature subspaces and other commonly used semi-supervised learning algorithms.

Second, this thesis proposes a semi-supervised sentiment classification algorithm based on diversity and high-confidence estimation. During iterative training, incremental self-training algorithms tend to introduce mislabeled samples. The proposed algorithm combines the posterior probability and the prior distribution information of each sample to mitigate this problem. If the selected samples are concentrated in one region, the resulting training data becomes inconsistent with the real distribution; to avoid this, the algorithm uses diversity metrics to ensure that the selected samples differ from one another. Experiments show that the proposed algorithm achieves better classification performance than several commonly used incremental semi-supervised learning algorithms.
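
The abstract does not give the details of the stratified-sampling subspace construction, so the following is only a minimal sketch of one way the idea could work, not the thesis's actual algorithm. It assumes each feature already has a relevance score (for example, a chi-squared statistic against the sentiment label), ranks the features by that score, cuts the ranking into strata, and draws the same number of features from every stratum for each subspace. The function name `stratified_subspaces`, its parameters, and the toy scores are all hypothetical.

```python
import random


def stratified_subspaces(scores, n_subspaces, n_strata, per_stratum, seed=0):
    """Build feature subspaces by sampling from relevance-ranked strata.

    scores: dict mapping feature name -> relevance score (assumed given).
    Returns a list of n_subspaces lists of feature names.
    """
    rng = random.Random(seed)
    # Rank features from most to least relevant.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Cut the ranking into n_strata contiguous strata of near-equal size.
    size, extra = divmod(len(ranked), n_strata)
    strata, start = [], 0
    for i in range(n_strata):
        end = start + size + (1 if i < extra else 0)
        strata.append(ranked[start:end])
        start = end
    subspaces = []
    for _ in range(n_subspaces):
        subspace = []
        for stratum in strata:
            # Draw without replacement inside a stratum, so every subspace
            # receives some strongly relevant features (sufficiency) while
            # the random draws keep the subspaces different (diversity).
            k = min(per_stratum, len(stratum))
            subspace.extend(rng.sample(stratum, k))
        subspaces.append(subspace)
    return subspaces


# Toy relevance scores (made up for illustration only).
toy_scores = {
    "great": 9.1, "awful": 8.7, "boring": 6.2, "love": 5.9,
    "plot": 1.3, "scene": 0.9, "movie": 0.4, "the": 0.1,
}
subspaces = stratified_subspaces(toy_scores, n_subspaces=3, n_strata=2, per_stratum=2)
for features in subspaces:
    print(features)
```

With purely uniform sampling, a subspace could by chance consist only of weak features such as "the" and "movie". Here every subspace is guaranteed two features from the top stratum, which is the sufficiency property the abstract describes. In a co-training setup, one classifier would then be trained per subspace.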
Keywords/Search Tags:sentiment classification, semi-supervised learning, stratified sampling, high confidence, diversity metrics