
The Research of Opinion Sentence Identification and Element Extraction in Chinese Micro Blogs

Posted on: 2017-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: G Q Wang
GTID: 2348330488958158
Subject: Computer application technology
Abstract/Summary:
Micro blogs, as a new medium, have received widespread attention. Driven by numerous evaluation campaigns in China and abroad, sentiment analysis of micro blogs has become a hot topic in natural language processing (NLP). Micro-blogging platforms hold a large amount of opinionated text, and analyzing it reveals user preferences, so the research has both theoretical significance and practical value.

This thesis addresses opinion sentence identification and element extraction for Chinese micro blogs, a special kind of text, and seeks analysis methods suited to their characteristics. Because opinion sentence identification is the prerequisite for element extraction, we adopted a supervised machine learning method to ensure its accuracy: an SVM classifier using lexical unigram features classifies sentiment. We compared the classification performance of several feature representations and used information gain to reduce the number of features in the feature set. The experiments show that TF-IDF weighting is better suited to sentiment classification of Chinese micro blogs, and that the highest accuracy, 95.54%, is reached when the selected features make up 20% of the full feature set. We also compared different feature representations on single-clause and multi-clause micro blogs. Discrete representations and sentence modeling with distributed representations achieve higher accuracy on multi-clause micro blogs, whereas sentence modeling with compositional representations is better suited to single-clause micro blogs.

For element extraction, to avoid mutual interference between different categories of micro blogs, we first classified the posts by subject using an LDA model and determined the subject words of each class.
Second, we used two-level association rules to extract object-level and feature-level frequent itemsets, refining them with both compactness pruning and confidence pruning at the structural and semantic levels. On the basis of the frequent itemsets found in each micro blog, we defined filtering and delimiting rules to obtain object-level and feature-level elements. Finally, we matched objects with their features using word position information and pointwise mutual information, and determined the sentiment orientation of each element from the results of opinion sentence identification. The data released for the sixth Chinese Opinion Analysis Evaluation served as experimental data. Our overall results are essentially on par with the best results of the 2014 evaluation, with an F-measure of 23.83%, and our object-level and feature-level extraction results both exceed the best evaluation results, with F-measures of 46.66% and 46.48%, respectively.
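The opinion sentence identification step combines TF-IDF weighting of unigram features with information-gain feature selection before SVM training. The following is a minimal sketch, not the thesis's implementation. It assumes posts are already word-segmented into token lists, uses the plain tf * log(N/df) variant of TF-IDF, computes information gain from term presence or absence, and keeps the top fraction of terms (a `ratio` of 0.2 would mirror the 20% figure reported above). All function names and the toy data are hypothetical.

```python
# Hypothetical sketch: TF-IDF weighting of unigrams plus information-gain
# feature selection. Not the thesis's code; names and variants are assumptions.
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: tf * idf} dict per doc."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def entropy(counts):
    """Shannon entropy (bits) of a class-count distribution."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels, term):
    """IG = H(C) - [P(t) H(C | t) + P(~t) H(C | ~t)], using term presence."""
    classes = sorted(set(labels))
    base = entropy([labels.count(c) for c in classes])
    present = [y for doc, y in zip(docs, labels) if term in doc]
    absent = [y for doc, y in zip(docs, labels) if term not in doc]
    conditional = 0.0
    for subset in (present, absent):
        if subset:
            weight = len(subset) / len(labels)
            conditional += weight * entropy([subset.count(c) for c in classes])
    return base - conditional


def select_features(docs, labels, ratio=0.2):
    """Keep the top `ratio` fraction of terms ranked by information gain."""
    vocab = sorted({t for doc in docs for t in doc})
    ranked = sorted(
        vocab, key=lambda t: information_gain(docs, labels, t), reverse=True
    )
    k = max(1, int(len(ranked) * ratio))
    return set(ranked[:k])


# Toy, made-up data: label 1 = opinion sentence, 0 = non-opinion sentence.
docs = [["love", "phone"], ["love", "screen"], ["weather", "today"], ["today", "rain"]]
labels = [1, 1, 0, 0]
kept = select_features(docs, labels, ratio=0.34)
# Restrict each TF-IDF vector to the selected terms.
vectors = [{t: w for t, w in v.items() if t in kept} for v in tfidf_vectors(docs)]
```

In a full pipeline, the reduced vectors would then train a linear SVM, for example scikit-learn's `LinearSVC`; that step is not shown here.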
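The two-level association-rule step can be illustrated with a simplified Apriori-style miner. This sketch rests on assumptions the abstract does not state: each micro blog is reduced to a set of candidate object words and a set of candidate feature words; level one mines frequent object itemsets; level two mines frequent feature itemsets only among posts that mention a given object; and confidence pruning keeps a rule O -> F when support(O ∪ F) / support(O) reaches a threshold. Compactness pruning and the semantic checks are omitted, and all names, data, and thresholds are hypothetical.

```python
# Hypothetical sketch of two-level frequent-itemset mining with confidence
# pruning; compactness and semantic pruning from the thesis are not modeled.
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Apriori-style mining. transactions: list of sets of words.
    Returns {frozenset(itemset): support_count} with count >= min_support."""
    frequent = {}
    for size in range(1, max_size + 1):
        counts = Counter()
        for t in transactions:
            for combo in combinations(sorted(t), size):
                # Apriori property: skip candidates with an infrequent subset.
                if size > 1 and any(
                    frozenset(sub) not in frequent
                    for sub in combinations(combo, size - 1)
                ):
                    continue
                counts[frozenset(combo)] += 1
        level = {s: c for s, c in counts.items() if c >= min_support}
        if not level:
            break  # no frequent itemsets of this size, so none larger either
        frequent.update(level)
    return frequent


def two_level_elements(posts, min_obj_support, min_feat_support, min_conf):
    """posts: list of (object_words, feature_words) set pairs, one per post.
    Returns {(object_itemset, feature_itemset): confidence} for kept rules."""
    objects = frequent_itemsets([objs for objs, _ in posts], min_obj_support)
    rules = {}
    for obj, obj_support in objects.items():
        # Level 2: mine features only in posts containing this object itemset.
        feature_sets = [feats for objs, feats in posts if obj <= objs]
        mined = frequent_itemsets(feature_sets, min_feat_support)
        for feat, joint_support in mined.items():
            confidence = joint_support / obj_support  # support(O u F) / support(O)
            if confidence >= min_conf:  # confidence pruning
                rules[(obj, feat)] = confidence
    return rules


# Toy, made-up posts: (candidate objects, candidate features).
posts = [
    ({"phone"}, {"screen", "battery"}),
    ({"phone"}, {"screen"}),
    ({"phone", "camera"}, {"battery", "screen"}),
    ({"camera"}, {"lens"}),
]
strict = two_level_elements(posts, 2, 2, 0.7)
loose = two_level_elements(posts, 2, 2, 0.6)
```

Raising `min_conf` from 0.6 to 0.7 in this toy run prunes the weaker phone -> battery rules and leaves only phone -> screen.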
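Matching objects with features through pointwise mutual information can be sketched as follows. As an assumption, PMI is estimated from post-level co-occurrence, PMI(o, f) = log2(N * c(o, f) / (c(o) * c(f))), where N is the number of posts, and each feature in a post is attached to the candidate object with the highest PMI, with ties broken by the smaller token distance as a stand-in for the thesis's word location information. The function names and toy data are hypothetical.

```python
# Hypothetical sketch: attach features to objects using post-level PMI, with
# token distance as a tie-breaker standing in for the thesis's location cues.
import math
from collections import Counter


def pmi_table(posts):
    """posts: list of (objects, features) set pairs.
    PMI(o, f) = log2(N * c(o, f) / (c(o) * c(f))) over post co-occurrence."""
    n = len(posts)
    c_obj = Counter(o for objs, _ in posts for o in objs)
    c_feat = Counter(f for _, feats in posts for f in feats)
    c_pair = Counter((o, f) for objs, feats in posts for o in objs for f in feats)
    return {
        (o, f): math.log2(n * c / (c_obj[o] * c_feat[f]))
        for (o, f), c in c_pair.items()
    }


def attach_features(tokens, objects, features, pmi):
    """Map each feature in one tokenized post to its most likely object:
    highest PMI first, then the smallest distance in token positions."""
    first_pos = {}
    for i, tok in enumerate(tokens):
        first_pos.setdefault(tok, i)
    candidates = [o for o in objects if o in first_pos]
    links = {}
    for f in features:
        if f not in first_pos or not candidates:
            continue
        links[f] = max(
            candidates,
            key=lambda o: (pmi.get((o, f), float("-inf")),
                           -abs(first_pos[o] - first_pos[f])),
        )
    return links


# Toy, made-up corpus used only to estimate PMI.
posts = [
    ({"phone"}, {"screen"}),
    ({"phone"}, {"screen"}),
    ({"camera"}, {"lens"}),
    ({"phone", "camera"}, {"lens", "screen"}),
]
pmi = pmi_table(posts)
tokens = ["phone", "camera", "lens", "great", "screen", "bright"]
links = attach_features(tokens, {"phone", "camera"}, {"lens", "screen"}, pmi)
```

Here "lens" links to "camera" because that pair co-occurs more often than chance, even though "phone" also appears in the post.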
Keywords/Search Tags: Sentiment Analysis, Feature Representation, SVM, Subject Classification, Association Rule