Research Of Chinese Weibo Opinionated Sentence Identification Based On Feature Template And SVM

Posted on: 2016-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
GTID: 2308330464466360
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Web 2.0 technology and microblog platforms, posting microblogs has become an everyday habit for much of the public. Chinese Weibo posts give individuals and organizations a way to learn what the public thinks about events and issues, and give the state a way to follow, manage, and guide media coverage and public opinion. However, the massive volume of data on microblog platforms is redundant, noisy, and largely non-opinionated, so extracting opinionated sentences by hand is not feasible. How to identify and extract sentences that express users' opinions from such large-scale data has therefore become an active research topic.

This thesis covers the following work. First, we summarize and analyze the distinctive characteristics of Weibo text. Based on these characteristics, we expand the HowNet lexicon and preprocess the raw text to handle phenomena such as informal language use. Next, we use information gain to select features, and then apply a score function to explore how to construct a feature rule template. With an SVM classifier, we then implement in Java a binary classification experiment that separates opinionated from non-opinionated sentences in Chinese Weibo text. Finally, by analyzing the characteristics of non-opinionated sentences, we design a negation template and run an experiment to verify its effectiveness.

The main contributions of this thesis are: 1) Taking several aspects of Weibo text into account, we expand the HowNet lexicon so that word segmentation becomes more accurate. 2) By combining information gain, a score function, and the corresponding analysis, we design a feature template that improves classification performance. 3) We introduce a negation template, which offers an alternative approach to opinionated sentence identification.
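The information gain step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the class name `InfoGainSketch`, the method names, and the ASCII placeholder tokens are assumptions made for illustration, and the sketch uses the standard binary-presence definition IG(t) = H(C) - P(t) H(C | t) - P(not t) H(C | not t) over two classes (opinionated, non-opinionated). Sentences are assumed to be already segmented into tokens.

```java
import java.util.Arrays;
import java.util.List;

public class InfoGainSketch {

    // Shannon entropy (in bits) of a two-class distribution given raw counts.
    static double entropy(int a, int b) {
        int total = a + b;
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (int count : new int[] {a, b}) {
            if (count > 0) {
                double p = (double) count / total;
                h -= p * (Math.log(p) / Math.log(2));
            }
        }
        return h;
    }

    // Information gain of a term with respect to the opinionated / non-opinionated label.
    // sentences: segmented sentences; opinionated[i] is the label of sentences.get(i).
    static double informationGain(List<List<String>> sentences, boolean[] opinionated, String term) {
        int withPos = 0, withNeg = 0, withoutPos = 0, withoutNeg = 0;
        for (int i = 0; i < sentences.size(); i++) {
            boolean hasTerm = sentences.get(i).contains(term);
            if (hasTerm) {
                if (opinionated[i]) withPos++; else withNeg++;
            } else {
                if (opinionated[i]) withoutPos++; else withoutNeg++;
            }
        }
        int n = sentences.size();
        if (n == 0) {
            return 0.0;
        }
        double hClass = entropy(withPos + withoutPos, withNeg + withoutNeg);
        double pWith = (double) (withPos + withNeg) / n;
        // Class entropy minus the entropy that remains after observing the term.
        return hClass
                - pWith * entropy(withPos, withNeg)
                - (1.0 - pWith) * entropy(withoutPos, withoutNeg);
    }

    public static void main(String[] args) {
        // Toy, hypothetical data: ASCII placeholders stand in for segmented Weibo tokens.
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("really", "like"),
                Arrays.asList("like"),
                Arrays.asList("today"),
                Arrays.asList("weather"));
        boolean[] opinionated = {true, true, false, false};
        for (String term : new String[] {"like", "today"}) {
            System.out.println(term + " -> " + informationGain(sentences, opinionated, term));
        }
    }
}
```

Terms with higher information gain separate the two classes better, so one plausible use is to rank candidate terms by this value and keep the top ones before building a feature template; the cutoff and how the ranking feeds into the score function are not given in the abstract.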
Keywords/Search Tags: Opinionated sentence, Feature template, SVM, Text classification