Research On Chinese Word Segmentation And Sentiment Analysis For Micro-blog Text

Posted on: 2017-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: J M Shi
GTID: 2308330485980925
Subject: Software engineering
Abstract/Summary:
With the arrival of the mobile internet era, micro-blogs, as a representative form of social media, have attracted great attention. The brevity and arbitrariness of social media text pose new challenges to many natural language processing tasks. Sentiment classification for micro-blogs has been a hot topic in natural language processing in recent years. The current mainstream framework for sentiment classification is to (1) design elaborate features for the data set and (2) select a suitable machine learning classifier. This approach can achieve very high precision, but the features depend heavily on the data set, so it does not carry over when the data set is replaced with real-world data. This thesis proposes a representation learning method for sentiment classification that obtains sentence embeddings by combining word embeddings, which avoids designing features manually. Because the sentence embedding is generated by combining word embeddings, Chinese Word Segmentation (CWS) is the foundation of representation learning, and the performance of sentiment analysis relies heavily on the performance of CWS. The many new words coined online reduce the performance of existing segmentation algorithms; therefore, a representation learning CWS method for micro-blog text is proposed first. The thesis first introduces and compares the currently popular word segmentation methods and then proposes a representation learning CWS method. The algorithm uses an unsupervised neural network to obtain character embeddings that contain semantic information; the character embeddings are then treated as features and fed into a sequence labeling model. Some post-processing steps are also added to account for the inherent characteristics of micro-blog text.
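Treating segmentation as sequence labeling means assigning one tag to each character and reading the word boundaries off the tags. The sketch below illustrates only that formulation. The B/M/E/S tag set (begin, middle, end, single) is an assumption, since the abstract does not name the tag scheme, and the function names `words_to_tags` and `tags_to_words` are hypothetical.

```python
def words_to_tags(words):
    """Turn a list of segmented words into one B/M/E/S tag per character."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first character, any middle characters, last character
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their predicted B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        # A new word starts at B or S; flush any unfinished word first.
        if tag in ("B", "S") and current:
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words
```

In a full system the tags would come from the trained sequence labeling model rather than from gold-standard words; `words_to_tags` is the step that would prepare training labels from a segmented corpus.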
This CWS method is then applied to sentiment classification: a convolutional neural network is trained to obtain sentence embeddings by combining word embeddings, and the sentence embeddings are used directly as features to classify sentiment. Specifically, the innovative contributions of this thesis are mainly reflected in the following points:
(1) A representation learning Chinese word segmentation method is proposed. The method obtains character embeddings in an unsupervised way and feeds them as features into a sequence labeling model; some post-processing steps are also added to account for the inherent characteristics of micro-blog text.
(2) The thesis explores how to use the embeddings as features in a CRF. It also evaluates the performance of the sentence embeddings when this segmentation method is used.
(3) The thesis obtains sentence embeddings with convolutional neural networks: a convolutional neural network combines the word embeddings, and the resulting sentence embedding is used as the feature vector for sentiment classification.
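As a rough illustration of how a convolutional layer can combine word embeddings into a fixed-length sentence embedding, the sketch below slides each filter over windows of consecutive word vectors and keeps the maximum response per filter. The window width, the tanh activation, max-over-time pooling, zero padding for short sentences, and the function name `sentence_embedding` are all assumptions; the abstract does not describe the actual network architecture.

```python
import math


def sentence_embedding(word_vectors, filters, biases, width):
    """Combine word vectors into one vector with a 1-D convolution and max pooling.

    word_vectors: non-empty list of equal-length lists, one per word
    filters:      list of weight lists, each of length width * dim
    biases:       one bias per filter
    width:        number of consecutive words covered by each filter
    """
    dim = len(word_vectors[0])
    padded = list(word_vectors)
    while len(padded) < width:  # zero-pad sentences shorter than the window
        padded.append([0.0] * dim)

    pooled = []
    for weights, bias in zip(filters, biases):
        activations = []
        for start in range(len(padded) - width + 1):
            # Concatenate the word vectors inside the current window.
            window = [x for vec in padded[start:start + width] for x in vec]
            score = sum(w * x for w, x in zip(weights, window)) + bias
            activations.append(math.tanh(score))
        pooled.append(max(activations))  # max-over-time pooling
    return pooled
```

The output has one entry per filter regardless of sentence length, which is what allows it to be passed to a classifier as a feature vector. In a trained model the filter weights would be learned together with the classifier rather than set by hand.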
Keywords/Search Tags: Representation Learning, Sentiment Analysis, Chinese Word Segmentation, Convolutional Neural Networks