
Research On Cross-Domain Text Classification Of Tendency Analysis Based On Ensemble Learning

Posted on: 2019-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: S S Yao
GTID: 2428330593951695
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of Internet social platforms, traditional modes of social interaction and business structures are undergoing tremendous change. More and more people prefer to communicate through WeChat, micro-blogs, and other social tools, so a large amount of subjective information now appears on the Internet. Text data containing views and opinions are of great value, and tendency-analysis text classification based on such data has become one of the main research topics in Natural Language Processing. This thesis studies two aspects of tendency-analysis text classification: on the classification-algorithm side, it designs an algorithm for absolutely imbalanced data; on the text-feature-extraction side, it proposes a sentiment classification algorithm based on multi-feature fusion. The main work and contributions are as follows:

(1) For text sentiment analysis, this thesis proposes a transfer learning method based on a cascade structure, which addresses the absolute imbalance of target-domain data at both the data level and the algorithm level. At the algorithm level, a TrAdaBoost algorithm with a weight recovery factor is proposed. It not only solves the problem in TrAdaBoost that auxiliary-data weights, once reduced, cannot recover, but also applies cost-sensitive learning so that different types of samples in different domains follow different weight-update strategies. At the data level, a cascade structure oversamples the target-domain data and undersamples the auxiliary data, which balances the data set while effectively avoiding negative transfer. Experimental results show that the proposed cascade-structure ensemble learning algorithm handles absolutely imbalanced classification, and that its classification performance is better than that of current imbalanced classification algorithms and instance-based transfer learning algorithms.

(2) Distributed word-vector training models mainly capture the contextual co-occurrence of words and ignore their intrinsic sentiment characteristics. Some studies have begun to exploit existing sentiment-rich resources, but they ignore domain dependency. This thesis proposes a sentiment classification algorithm based on multi-feature fusion. On the one hand, word representations are built from word context information, part-of-speech features, and a sentiment lexicon; on the other hand, CNN-LSTM structures with different convolution kernels are used to build sentence representations. Experimental results show that the proposed multi-feature fusion model improves sentiment classification performance.
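The algorithm-level idea in item (1) builds on the TrAdaBoost weight update. The abstract does not give the form of the weight recovery factor or of the cost-sensitive rule, so the sketch below implements only the original TrAdaBoost update for one boosting round, with binary labels and 0/1 error indicators. It adds a hypothetical per-sample `target_cost` exponent purely to illustrate what a cost-sensitive variant could look like. The function name, its parameters, and the cost rule are assumptions for illustration, not the thesis's method.

```python
import math

import numpy as np


def tradaboost_update(w_aux, w_tgt, err_aux, err_tgt, n_rounds, target_cost=None):
    """One round of the original TrAdaBoost weight update (illustrative sketch).

    w_aux, w_tgt     : current weights of auxiliary / target-domain samples
    err_aux, err_tgt : 0/1 integer arrays, 1 where the weak learner was wrong
    n_rounds         : total number of boosting rounds N
    target_cost      : HYPOTHETICAL per-sample exponent on the target boost,
                       e.g. > 1 for minority-class samples (not from the thesis)
    Returns (new_w_aux, new_w_tgt, beta_t). Weights are not normalised here;
    TrAdaBoost normalises them at the start of the next round.
    """
    n_aux = len(w_aux)
    # Weighted error is measured on the target domain only.
    eps = float(np.sum(w_tgt * err_tgt) / np.sum(w_tgt))
    eps = min(max(eps, 1e-10), 0.499)  # clip so that beta_t stays in (0, 1)
    beta_t = eps / (1.0 - eps)
    # Fixed factor for auxiliary samples: beta = 1 / (1 + sqrt(2 ln n / N)).
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_aux) / n_rounds))
    # Misclassified auxiliary samples are down-weighted (beta < 1).
    new_aux = w_aux * beta ** err_aux
    # Misclassified target samples are up-weighted (1 / beta_t > 1).
    factor = beta_t ** (-err_tgt)
    if target_cost is not None:
        # Hypothetical cost-sensitive tweak: raise the boost to a per-sample power.
        factor = factor ** target_cost
    new_tgt = w_tgt * factor
    return new_aux, new_tgt, beta_t
```

Because auxiliary weights in this original rule can only shrink, a sample that is down-weighted early never regains influence; that is the limitation the thesis's weight recovery factor is said to address.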
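For the data level of item (1), the abstract states only that target-domain data are oversampled and auxiliary data undersampled within a cascade structure. The following minimal sketch builds one balanced training set for a single stage, assuming 0/1 labels, plain random oversampling with replacement for the target minority class, and random undersampling without replacement for the auxiliary majority class. The function `resample_stage` and these sampling choices are assumptions; the thesis's actual sampling scheme and cascade rule are not described here.

```python
import numpy as np


def resample_stage(X_tgt, y_tgt, X_aux, y_aux, minority=1, seed=0):
    """Build one balanced training set for a cascade stage (illustrative sketch).

    Assumptions (not from the thesis): labels are 0/1, `minority` is the rare
    class, the target minority class is randomly oversampled with replacement,
    and the auxiliary majority class is randomly undersampled without replacement.
    """
    rng = np.random.default_rng(seed)
    tgt_min = np.flatnonzero(y_tgt == minority)
    tgt_maj = np.flatnonzero(y_tgt != minority)
    aux_min = np.flatnonzero(y_aux == minority)
    aux_maj = np.flatnonzero(y_aux != minority)
    if len(tgt_min) == 0:
        raise ValueError("target domain needs at least one minority sample")

    # Target domain: keep every original sample, then add random duplicates of
    # minority samples until the minority class matches the majority class.
    n_extra = max(0, len(tgt_maj) - len(tgt_min))
    extra = rng.choice(tgt_min, size=n_extra, replace=True)
    tgt_idx = np.concatenate([tgt_maj, tgt_min, extra])

    # Auxiliary domain: keep every minority sample and a random subset of
    # majority samples of the same size.
    n_keep = min(len(aux_min), len(aux_maj))
    aux_maj_kept = rng.choice(aux_maj, size=n_keep, replace=False)
    aux_idx = np.concatenate([aux_min, aux_maj_kept])

    return X_tgt[tgt_idx], y_tgt[tgt_idx], X_aux[aux_idx], y_aux[aux_idx]
```

In a cascade, one simple (assumed) arrangement is to call this once per stage with a different seed, for example `seed=k` at stage `k`, and train one ensemble member on each resulting set.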
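The word-level fusion in item (2) combines context information, part-of-speech features, and a sentiment lexicon. One common way to fuse such features, assumed here, is plain concatenation: a pretrained context vector, a one-hot part-of-speech vector, and a scalar lexicon polarity. The tag set `POS_TAGS`, the zero vector for out-of-vocabulary words, and the helper names are hypothetical; the abstract does not specify how the thesis combines the features.

```python
import numpy as np

# Hypothetical coarse part-of-speech tag set; the thesis's tag set is not stated.
POS_TAGS = ["n", "v", "a", "d", "other"]


def fuse_word_features(word, pos, embeddings, lexicon, dim):
    """Return [context embedding | one-hot POS | lexicon polarity] for one word."""
    # Context feature: a distributed word vector (zeros if out of vocabulary).
    ctx = np.asarray(embeddings.get(word, np.zeros(dim)), dtype=float)
    # Part-of-speech feature: one-hot over POS_TAGS; unknown tags map to "other".
    pos_vec = np.zeros(len(POS_TAGS))
    pos_vec[POS_TAGS.index(pos if pos in POS_TAGS else "other")] = 1.0
    # Sentiment feature: polarity score from a sentiment lexicon (0 if absent).
    polarity = np.array([float(lexicon.get(word, 0.0))])
    return np.concatenate([ctx, pos_vec, polarity])


def sentence_matrix(tokens, embeddings, lexicon, dim):
    """Stack fused vectors for (word, pos) pairs into a (T, dim + len(POS_TAGS) + 1) matrix."""
    return np.stack([fuse_word_features(w, p, embeddings, lexicon, dim) for w, p in tokens])
```

For example, `sentence_matrix([("good", "a"), ("movie", "n")], {"good": np.ones(3)}, {"good": 1.0}, dim=3)` yields a 2 x 9 matrix whose last column carries the lexicon polarity.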
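For the sentence representation in item (2), the abstract says only that CNN-LSTM structures with different convolution kernels are used. The NumPy sketch below shows one possible arrangement as a forward pass with untrained weights. Parallel 1-D convolutions with odd kernel widths and zero "same" padding produce ReLU feature maps, which are concatenated along the feature axis and fed to a single-layer LSTM whose last hidden state serves as the sentence vector. The input `S` is any (T, d) sentence matrix, such as stacked word feature vectors. The branch layout, padding, gate ordering, and function names are all assumptions.

```python
import numpy as np


def conv1d_same(S, W, b):
    """'Same'-padded 1-D convolution over time followed by ReLU.

    S: (T, d) sentence matrix; W: (k, d, f) kernel with odd width k; b: (f,).
    Returns a (T, f) feature map.
    """
    k = W.shape[0]
    pad = k // 2
    Sp = np.pad(S, ((pad, pad), (0, 0)))  # zero-pad along the time axis
    T = S.shape[0]
    out = np.stack([np.tensordot(Sp[t:t + k], W, axes=([0, 1], [0, 1])) for t in range(T)]) + b
    return np.maximum(out, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_last_hidden(X, Wx, Wh, b):
    """Single-layer LSTM over X: (T, n_in); returns the final hidden state.

    Wx: (n_in, 4h), Wh: (h, 4h), b: (4h,); gates packed as input, forget,
    candidate, output.
    """
    h_dim = Wh.shape[0]
    h = np.zeros(h_dim)
    c = np.zeros(h_dim)
    for x_t in X:
        z = x_t @ Wx + h @ Wh + b
        i = sigmoid(z[:h_dim])
        f = sigmoid(z[h_dim:2 * h_dim])
        g = np.tanh(z[2 * h_dim:3 * h_dim])
        o = sigmoid(z[3 * h_dim:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


def cnn_lstm_sentence(S, conv_params, lstm_params):
    """Run each (W, b) convolution branch, concatenate the maps, then apply the LSTM."""
    maps = [conv1d_same(S, W, b) for W, b in conv_params]
    X = np.concatenate(maps, axis=1)  # (T, total number of filters)
    return lstm_last_hidden(X, *lstm_params)
```

Using "same" padding keeps every branch at length T, so maps from different kernel widths can be concatenated step by step before the LSTM; a trained model would learn `W`, `b`, and the LSTM weights instead of drawing them at random.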
Keywords/Search Tags: Text classification of tendency analysis, Ensemble learning, Transfer learning, Text representation, Imbalanced learning