
Research On Micro-blog Sentiment Classification Based On Co-training

Posted on: 2019-01-24 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Li | Full Text: PDF
GTID: 2348330563454791 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and growing public awareness of self-expression, microblogs and other self-media platforms have emerged. The public generally likes to use platforms such as microblogs to get information, discover new things, and post comments expressing their views. Microblog comment text is short, loosely formatted, and carries little information per post, but it is produced quickly and reflects the public's emotional inclination. This thesis combines microblog acquisition, crowd computing, Spark parallelization, and semi-supervised co-training to study microblog sentiment classification based on co-training. We first study a Weibo API-based and a crawler-based microblog text collection method, then use the crawler-based method to obtain the training and test datasets for the semi-supervised co-training classification algorithm, and propose a new co-training classification model that incorporates a crowd computing system model. Finally, we parallelize the co-training algorithm on the Spark platform. Specifically, the main research contents of this thesis are as follows:

First, the thesis introduces the background and significance of research on Weibo sentiment classification and the current state of research at home and abroad.

Second, two methods are used to collect microblog data: one based on the microblog API and one based on a crawler. The two methods are compared experimentally. In addition, three Chinese microblog text preprocessing steps are introduced: Chinese word segmentation, text vector representation, and sentiment feature weight calculation.

Third, the microblog sentiment classification model based on co-training is improved. The semi-supervised co-training algorithm has several shortcomings, such as the large number of unlabeled samples it must handle and the problem of noisy samples being introduced. By introducing the crowd computing system model, a new co-training classification model is proposed. The test results show that the classification accuracy of the microblog sentiment classification model improves once the crowd computing system model is introduced.

Fourth, to make Weibo sentiment classification more efficient, the co-training algorithm is improved. The sentiment classification preprocessing is parallelized on the Spark platform, as are the two classifier algorithms, support vector machine (SVM) and naive Bayes. The test results show that, after the introduction of the Spark platform, the parallel execution of the co-training algorithm performs better in terms of speedup ratio, scale ratio, and scalability ratio.
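As a rough illustration of the semi-supervised co-training idea described in the abstract, the following is a minimal single-machine sketch, not the thesis's implementation. It makes several assumptions that do not come from the source: both classifiers are small naive Bayes models (the thesis pairs SVM with naive Bayes), the two feature views are hypothetical (words and emoticons), each unlabeled post receives the label of whichever view is more confident, and the names `TinyNB`, `co_train`, and `predict` and the confidence threshold are invented for this example. The crowd computing step and the Spark parallelization are omitted.

```python
import math
from collections import Counter


class TinyNB:
    """Multinomial naive Bayes with add-one smoothing over token lists."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, y in zip(docs, labels):
            self.counts[y].update(tokens)
        self.vocab = {t for c in self.classes for t in self.counts[c]}
        return self

    def predict_proba(self, tokens):
        n = sum(self.prior.values())
        logp = {}
        for c in self.classes:
            total = sum(self.counts[c].values()) + len(self.vocab)
            logp[c] = math.log(self.prior[c] / n) + sum(
                math.log((self.counts[c][t] + 1) / total)
                for t in tokens if t in self.vocab)  # unseen tokens are skipped
        m = max(logp.values())
        z = sum(math.exp(v - m) for v in logp.values())
        return {c: math.exp(v - m) / z for c, v in logp.items()}


def train_views(labeled):
    """Train one classifier per view on the current labeled set."""
    ys = [y for _, y in labeled]
    return (TinyNB().fit([x[0] for x, _ in labeled], ys),
            TinyNB().fit([x[1] for x, _ in labeled], ys))


def co_train(labeled, unlabeled, rounds=5, threshold=0.6):
    """labeled: [((view_a, view_b), label)]; unlabeled: [(view_a, view_b)]."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        clf_a, clf_b = train_views(labeled)
        remaining = []
        for x in pool:
            pa, pb = clf_a.predict_proba(x[0]), clf_b.predict_proba(x[1])
            best_a, best_b = max(pa, key=pa.get), max(pb, key=pb.get)
            # Trust whichever view is more confident about this sample.
            if pa[best_a] >= pb[best_b]:
                label, conf = best_a, pa[best_a]
            else:
                label, conf = best_b, pb[best_b]
            if conf >= threshold:
                labeled.append((x, label))  # pseudo-label joins the training set
            else:
                remaining.append(x)         # too uncertain, keep for a later round
        if len(remaining) == len(pool):     # nothing was added, stop early
            break
        pool = remaining
    return train_views(labeled) + (labeled,)


def predict(clf_a, clf_b, x):
    """Combine both views by multiplying their class probabilities."""
    pa, pb = clf_a.predict_proba(x[0]), clf_b.predict_proba(x[1])
    return max(pa, key=lambda c: pa[c] * pb[c])


# Toy data: view A holds words, view B holds emoticons (both hypothetical).
seed = [((["good", "happy"], [":)"]), "pos"),
        ((["bad", "sad"], [":("]), "neg")]
pool = [(["good", "day"], [":)"]),
        (["sad", "day"], [":("]),
        (["happy"], [":)"])]
clf_a, clf_b, grown = co_train(seed, pool)
```

Pseudo-labels that clear the threshold are never revisited in this sketch, so a wrong early label propagates into later rounds. That is the noisy-sample problem the abstract mentions, and it is the point where an external check such as a crowd-sourced judgment could plausibly be inserted.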
Keywords/Search Tags: Sentiment classification, Micro-blog, Crowd computing, Co-training, SVM, Naïve Bayes, Spark