
Research on Feature Extraction Algorithms for Sentiment Analysis and Opinion Clustering Algorithms

Posted on: 2016-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: S Wang
GTID: 2208330470968114
Subject: Software engineering
Abstract/Summary:
With the rapid development of web technology, more and more people freely express their opinions on the Internet. This makes the Internet an extremely valuable resource for mining user opinions on all kinds of topics, and it is in this context that sentiment analysis and opinion mining are studied. However, earlier text-processing techniques cannot be applied directly to sentiment analysis and opinion mining of web reviews, so methods suited to the sentiment characteristics of web text are needed. Unlike the topic features of traditional text classification, which occur with high frequency, the sentiment features of web reviews are low-frequency and sparsely distributed. Motivated by these properties, this thesis focuses on sentiment analysis and opinion-mining algorithms for web reviews, a topic of both theoretical and practical value. Our main contributions are as follows.

First, based on a statistical analysis of two corpora, we propose a POS-based feature selection algorithm for sentiment classification. The statistical analysis shows that adjectives, adverbs, verbs and nouns carry sentiment orientation, so they can serve as sentiment feature terms. Filtering out every word whose part of speech is not an adjective, adverb, verb or noun significantly reduces the feature space and excludes words that do not contribute to sentiment classification. Improved information gain and χ² statistic feature selection methods are then applied to select the sentiment features.
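The POS filter and the two scoring functions described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the tag set `KEEP_POS` (ICTCLAS-style `a`/`d`/`v`/`n`), all helper names and the binary `"pos"`/`"neg"` labels are hypothetical, and `information_gain` is the standard formula, because the abstract does not describe how the thesis improves it.

```python
import math
from collections import Counter

# Assumed ICTCLAS-style coarse tags: a=adjective, d=adverb, v=verb, n=noun.
KEEP_POS = {"a", "d", "v", "n"}


def pos_filter(tagged_doc):
    """Keep only words tagged as adjective, adverb, verb or noun."""
    return [word for word, pos in tagged_doc if pos in KEEP_POS]


def chi_square(n11, n10, n01, n00):
    """χ² score of a term for one class from a 2x2 document-count table.

    n11: in class, has term     n10: not in class, has term
    n01: in class, lacks term   n00: not in class, lacks term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(labels).values()) if total else 0.0


def information_gain(docs, labels, term):
    """Standard (not the thesis's improved) information gain of a term's presence.

    docs is a list of sets of words, aligned with labels.
    """
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional


def select_features(tagged_docs, labels, k, positive="pos"):
    """POS-filter every document, then keep the k terms with the highest χ² score."""
    docs = [set(pos_filter(d)) for d in tagged_docs]
    vocab = set().union(*docs)
    n = len(docs)
    scores = {}
    for t in vocab:
        n11 = sum(1 for d, y in zip(docs, labels) if t in d and y == positive)
        n10 = sum(1 for d, y in zip(docs, labels) if t in d and y != positive)
        n01 = sum(1 for d, y in zip(docs, labels) if t not in d and y == positive)
        scores[t] = chi_square(n11, n10, n01, n - n11 - n10 - n01)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]
```

Ranking terms by χ² against the positive class is only one way to turn per-class scores into a single ranking; the abstract does not say which combination the thesis uses, and `information_gain` could be substituted as the ranking key in the same loop.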
Experiments also show that POS-based feature selection yields markedly better sentiment classification performance than traditional feature selection based on word frequency.

Second, classification based on the N-gram model produces a great deal of redundant information, and this noise degrades classification performance. To address this shortcoming, we propose an N-pos-based feature selection algorithm for sentiment classification. A statistical analysis of the POS combination patterns of N-pos terms shows that only a limited number of these patterns carry sentiment orientation. We exploit this regularity to filter out N-pos terms that do not contribute to sentiment classification, which reduces the dimensionality of the feature space and improves classification accuracy. Comparative experiments confirm that the N-pos-based feature selection algorithm outperforms N-gram-based feature extraction for sentiment classification.

Third, we study the opinion-sentence clustering problem in opinion integration and propose a PLSA-based opinion-sentence clustering algorithm. Given the characteristics of opinion sentences in web reviews, we first apply SVD to reduce dimensionality and alleviate the synonymy problem. We then cluster the opinion sentences into categories with the PLSA clustering algorithm.
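The N-pos filtering step of the second contribution can be sketched for bigrams (N = 2) as below. The whitelist `ALLOWED_PATTERNS` is purely hypothetical: the abstract states only that the set of sentiment-oriented POS combination patterns is limited, not which patterns the thesis keeps. The function names are likewise illustrative.

```python
from collections import Counter

# Hypothetical whitelist of sentiment-bearing POS bigram patterns
# (d=adverb, a=adjective, n=noun, v=verb); the thesis's actual list is not given.
ALLOWED_PATTERNS = {("d", "a"), ("a", "n"), ("n", "a"), ("d", "v"), ("v", "a")}


def n_pos_terms(tagged_sentence, n=2):
    """Yield (words, pos_pattern) for every contiguous n-gram of a tagged sentence."""
    for i in range(len(tagged_sentence) - n + 1):
        window = tagged_sentence[i:i + n]
        yield tuple(w for w, _ in window), tuple(p for _, p in window)


def filter_n_pos(tagged_sentence, allowed=ALLOWED_PATTERNS, n=2):
    """Keep only n-grams whose POS pattern is in the whitelist."""
    return [words for words, pattern in n_pos_terms(tagged_sentence, n)
            if pattern in allowed]


def n_pos_features(tagged_docs, allowed=ALLOWED_PATTERNS, n=2):
    """Corpus counts of surviving N-pos terms, joined with '_' as feature names."""
    counts = Counter()
    for doc in tagged_docs:
        counts.update("_".join(words) for words in filter_n_pos(doc, allowed, n))
    return counts
```

For the tagged sentence `the/u screen/n is/v very/d bright/a`, a plain bigram model would produce four features, while this filter keeps only `very_bright` (pattern `d a`). The surviving counts could then be passed to a selector such as χ².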
Experiments confirm that the algorithm clusters opinion sentences effectively.

Finally, we develop a prototype sentiment analysis and opinion-mining system to support the experiments in this thesis. It implements a text preprocessing module, a sentiment classification module and an opinion clustering module, and provides a user-friendly interface.
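The opinion-sentence clustering pipeline of the third contribution (SVD for dimensionality reduction, then PLSA) can be sketched with NumPy as follows. Several pieces are assumptions: the abstract does not say how the SVD output is passed to PLSA, so this sketch feeds PLSA a rank-`k` SVD reconstruction of the sentence-term count matrix with negative entries clipped to zero (PLSA needs nonnegative counts); each sentence is assigned to the topic with the highest P(z|d); and all function names are hypothetical.

```python
import numpy as np


def lsa_smooth(counts, k):
    """Rank-k SVD reconstruction of a sentence-term matrix; negatives clipped to 0.

    Assumption: this is how the SVD step is chained to PLSA in this sketch.
    """
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    approx = (u[:, :k] * s[:k]) @ vt[:k, :]
    return np.clip(approx, 0.0, None)


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit PLSA by EM; returns P(z|d) (docs x topics) and P(w|z) (topics x words)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z|d,w) with shape (docs, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-12)
        # M-step: re-estimate both distributions from count-weighted posteriors.
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), 1e-12)
        p_z_d = weighted.sum(axis=2)
        p_z_d /= np.maximum(p_z_d.sum(axis=1, keepdims=True), 1e-12)
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over observed (d, w) of n(d,w) * log P(w|d); EM never decreases it."""
    p_w_d = p_z_d @ p_w_z
    mask = counts > 0
    return float((counts[mask] * np.log(p_w_d[mask])).sum())


def cluster_sentences(counts, n_clusters, k=None, n_iter=100, seed=0):
    """Optionally SVD-smooth the counts, fit PLSA, and label each sentence by argmax P(z|d)."""
    x = lsa_smooth(counts, k) if k is not None else counts
    p_z_d, _ = plsa(x, n_clusters, n_iter=n_iter, seed=seed)
    return p_z_d.argmax(axis=1)
```

Passing `k=None` skips the SVD step and runs PLSA on the raw counts, which makes it easy to compare clustering with and without the dimensionality-reduction stage.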
Keywords/Search Tags: sentiment analysis, opinion mining, classification, clustering, algorithm